Computer Vision Definitions

Learn more about computer vision and AI with this short list of top-level computer vision and AI terminology. Don’t know what AI training is? An AI model? Object recognition? Find out from these quick computer vision definitions.

  • AI API: An application programming interface (API) for users to gain access to artificial intelligence tools and functionality. By offering third-party AI services, AI APIs save developers from having to build their own AI in-house. 
  • AI Demo: A demonstration of the features and capabilities of an AI platform, or of artificial intelligence in general.
  • AI Model: The result of training an AI algorithm, given the input data and settings (known as “hyperparameters”). An AI model is a distilled representation that attempts to encapsulate everything that the AI algorithm has learned during the training process. AI models can be shared and reused on new data for use in real-world environments.
  • AI Platform: A software library or framework for users to build, deploy, and manage applications that leverage artificial intelligence. AI platforms are less static and more extensive than AI APIs: whereas AI APIs return the results of a third-party pre-trained model, AI platforms allow users to create their own AI models for different purposes.
  • AI Training: The process of training one or more AI models. During the training process, AI models “learn” over time by looking at more and more input data. After making a prediction about a given input, the AI model discovers whether its prediction was correct; if it was incorrect, it adjusts its parameters to account for the error (a minimal training-loop sketch appears after this list).
  • Annotation: The process of labeling the input data in preparation for AI training. In computer vision, the input images and video must be annotated according to the task you want the AI model to perform. For example, if you want the model to perform image segmentation, the annotations must include the location and shape of each object in the image.
  • Computer Vision: A subfield of computer science, artificial intelligence, and machine learning that seeks to give computers a rapid, high-level understanding of images and videos, “seeing” them in the same way that human beings do. In recent years, computer vision has made great strides in accuracy and speed, thanks to deep learning and neural networks.
  • Data Collection: The process of accumulating large quantities of information for use in training an AI model. Data can be collected from proprietary sources (e.g. your own videos) or from publicly available datasets, such as the ImageNet database. Once collected, data must be annotated or tagged for use in AI training.
  • Deep Learning: A subfield of artificial intelligence and machine learning that uses neural networks with multiple “hidden” (deep) layers. Thanks to both algorithmic improvements and technological advancements, recent years have seen deep learning successfully used to train AI models that can perform many advanced human-like tasks—from recognizing speech to identifying the contents of an image.
  • Dense Classification: A method for training deep neural networks from only a few examples, first proposed in the 2019 academic paper “Dense Classification and Implanting for Few-Shot Learning” by Lifchitz et al. Broadly, dense classification encourages the network to look at all aspects of the object it seeks to identify, rather than focusing on only a few details.
  • Edge AI: The use of AI and machine learning algorithms running on edge devices to process data on local hardware, rather than uploading it to the cloud. Perhaps the greatest benefit of Edge AI is faster speeds (since data does not have to travel to and from the cloud), enabling real-time decision-making.
  • Edge Devices: Internet-connected hardware devices that are part of the Internet of Things (IoT) and act as gateways in the IoT network: on one side, the local sensors and devices that collect data; on the other, the full capability of IoT in the cloud. For fastest results, many edge devices can perform computations locally rather than offloading this responsibility to the cloud.
  • Ensemble Learning: The use of predictions from multiple AI models trained on the same input (or samples of the same input) to reduce error and increase accuracy. Due to natural variability during the training phase, different models may return different results given the same data. Ensemble learning combines the predictions of all these models, e.g. by taking a majority vote (see the sketch after this list), with the goal of improving performance.
  • Facial Authentication: A subfield of facial recognition that seeks to verify a person’s identity, usually for security purposes. Facial authentication is often performed on edge devices that are powerful enough to identify a subject almost instantaneously and with a high degree of accuracy.
  • Facial Recognition: The use of human faces as a biometric characteristic by examining various facial features (e.g. the distance and location of the eyes, nose, mouth, and cheekbones). Facial recognition is used both for facial authentication (identifying individual people with their consent) and in video surveillance systems that capture people’s images in public.
  • GPU: Short for “graphics processing unit,” a specialized hardware device used in computers, smartphones, and embedded systems originally built for real-time computer graphics rendering. However, the ability of GPUs to efficiently process many inputs in parallel has made them useful for a wide range of applications—including training AI models.
  • Hash: The result of a mathematical function known as a “hash function” that converts arbitrary data into a unique (or nearly unique) numerical output. In facial authentication, for example, a complex hash function encodes the identifying characteristics of a user’s face and returns a numerical result. When a user attempts to access the system, their face is rehashed and compared with existing hashes to verify their identity (see the hashing example after this list).
  • Image Enrichment: The use of AI and machine learning to perform automatic “enrichment” of visual data, such as images and videos, by adding metadata (e.g. an image’s author, date of creation, or contents). In the media industry, for example, image enrichment is used to quickly and accurately tag online retail listings or news agency photos.
  • Image Quality Control: The use of AI and machine learning to perform automatic quality control on visual data, such as images and videos. For example, image quality control tools can detect image defects such as blurriness, nudity, deepfakes, and banned content, and correct the issue or delete the image from the dataset.
  • Image Recognition: A subfield of AI and computer vision that seeks to recognize the contents of an image by describing them at a high level. For example, a trained image recognition model might be able to distinguish between images of dogs and images of cats. Image recognition is contrasted with image segmentation, which seeks to divide an image into multiple parts (e.g. the background and different objects).
  • Internet of Things/IoT: A vast, interconnected network of devices and sensors that communicate and exchange information via the Internet. As one of the fastest-growing tech trends (with an estimated 127 new devices being connected every second), the IoT has the potential to transform industries such as manufacturing, energy, transportation, and more.
  • JSON Response: A response to an API request that uses the popular and lightweight JSON (JavaScript Object Notation) file format. A JSON response typically consists of a top-level object containing one or more key-value pairs (e.g. { "name": "John Smith", "age": 30 }); see the parsing example after this list.
  • Labeling: The process of assigning a label that provides the correct context for each input in the training dataset, or the “answer” that you would like the AI model to return during training. In computer vision, there are two types of labeling: annotation and tagging. Labeling can be performed in-house or through outsourcing or crowdsourcing services.
  • Liveness Detection: A security feature for facial authentication systems to verify that a given image or video represents a live, authentic person, and not an attempt to fraudulently bypass the system (e.g. by wearing a mask of a person’s likeness, or by displaying a sleeping person’s face). Liveness detection is essential to guard against malicious actors.
  • Machine Learning: A subfield of AI and computer science that studies algorithms that can improve themselves over time by gaining more experience or viewing more data. Machine learning includes both supervised learning (in which the algorithm is given the expected results or labels) and unsupervised learning (in which the algorithm must find patterns in unlabeled data).
  • Machine Vision: A subfield of AI and computer vision that combines hardware and software to enable machines to “see” at a high level as humans can. Machine vision is distinct from computer vision: a machine vision system consists of both a mechanical “body” that captures images and videos, as well as computer vision software that interprets these inputs.
  • Metadata: Data that describes and provides information about other data. For visual data such as images and videos, metadata consists of three categories: technical (e.g. the camera type and settings), descriptive (e.g. the author, date of creation, title, contents, and keywords), and administrative (e.g. contact information and copyright).
  • Neural Network: An AI and machine learning algorithm that seeks to mimic the high-level structure of a human brain. Neural networks have many interconnected artificial “neurons” arranged in multiple layers, each one storing a signal that it can transmit to other neurons. The use of larger neural networks with many hidden layers is known as deep learning.
  • Object Recognition: A subfield of AI and computer vision that seeks to recognize one or more objects contained in an image. Object recognition is related to, but distinct from, image recognition. For example, given an image of a soccer game, an image recognition model might return only “soccer game,” while an object recognition model would return the different objects in the image (e.g. “player,” “soccer ball,” “goal,” etc.).
  • Perception: In the Chooch AI platform, a pre-trained neural network model that has been trained on a set of visual data for tasks such as image recognition and object recognition. Given an image or video, the Chooch AI API selects the appropriate perception and returns its best estimate for the contents.
  • Pre-Trained Model: An AI model that has already been trained on a set of input training data. Given an input, a pre-trained model can rapidly return its prediction on that input, without needing to train the model again. Pre-trained models can also be used for transfer learning, i.e. applying knowledge to a different but similar problem (for example, from recognizing car manufacturers to truck manufacturers).
  • Segmentation: A subfield of AI and computer vision that seeks to divide an image or video into multiple parts (e.g. the background and different objects). For example, an image of a crowd of people might be segmented into the outlines of each individual person, as well as the image’s background. Image segmentation is widely used for applications such as healthcare (e.g. identifying cancerous cells in a medical image).
  • Sentiment Detection: A subfield of AI and natural language processing that seeks to understand the tone of a given text. This may include determining whether a text has a positive, negative, or neutral opinion, or whether it contains a certain emotional state (e.g. “sad,” “angry,” or “happy”).
  • Tagging: The process of labeling the input data with a single tag in preparation for AI training. Tagging is similar to annotation, but uses only a single label for each piece of input data. For example, if you want to perform image recognition for different dog breeds, your tags may be “golden retriever,” “bulldog,” etc.
  • Video Analytics: The use of AI and computer vision to automatically analyze the contents of a video. This may include facial recognition, motion detection, and/or object detection. Video analytics is widely used in industries such as security, construction, retail, and healthcare, for applications from loss prevention to health and safety.
  • Visual AI: The use of artificial intelligence to interpret visual data (i.e. images and videos), roughly synonymous with computer vision.
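
A few of these definitions describe mechanisms that are easiest to see in code. The short Python sketches below are minimal illustrations under stated assumptions, not Chooch’s implementation; every dataset, value, and name in them is hypothetical.

First, the “predict, check, adjust” loop described under AI Training, reduced to a model with a single trainable parameter:

```python
# A minimal sketch of the training loop described under "AI Training".
# The toy dataset, single-weight model, and learning rate are hypothetical.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # labeled examples; the hidden rule is y = 2x

weight = 0.0          # the model's only trainable parameter
learning_rate = 0.05  # hyperparameter: how strongly each error adjusts the weight

for epoch in range(100):
    for x, y_true in data:
        y_pred = weight * x                   # make a prediction for this input
        error = y_pred - y_true               # discover how wrong the prediction was
        weight -= learning_rate * error * x   # adjust the parameter to reduce the error

print(f"learned weight: {weight:.3f}")  # approaches 2.0, the rule behind the data
```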
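
Next, the majority vote mentioned under Ensemble Learning, combining hypothetical predictions from three models trained on samples of the same input:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model predictions for one input by taking the most common label."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs from three models looking at the same image.
print(majority_vote(["cat", "dog", "cat"]))  # -> "cat"
```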
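
The Hash entry describes turning arbitrary data into a (nearly) unique output that can be compared later. This sketch uses SHA-256 from Python’s standard hashlib as a stand-in; real facial authentication systems use specialized encodings of facial features rather than a general-purpose hash function:

```python
import hashlib

# Hash the same (hypothetical) enrollment data twice and compare the results.
enrolled = hashlib.sha256(b"face template for user 42").hexdigest()
attempt = hashlib.sha256(b"face template for user 42").hexdigest()

print(enrolled)             # a fixed-length hexadecimal digest of the input
print(enrolled == attempt)  # True: identical input always yields an identical hash
```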
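
Finally, parsing a small, hypothetical JSON Response with Python’s built-in json module:

```python
import json

# A hypothetical JSON response from an image-recognition API.
raw = '{"name": "John Smith", "age": 30, "tags": ["person", "portrait"]}'

response = json.loads(raw)  # parse the JSON text into a Python dict
print(response["tags"])     # key-value pairs are accessed by key: ['person', 'portrait']
```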

Want to learn more about computer vision from Chooch AI? Contact us about our computer vision solutions.
