Cogniac Applications

The Cogniac system is designed to automate many different types of visual observation tasks. This flexibility is enabled by allowing complex visual processing tasks to be decomposed into far simpler operations. Applications form the basic building blocks of media processing within the Cogniac system. Many of the application types are based upon deep convolutional neural networks, but other simpler types of applications also have roles to play. The application types currently supported by the Cogniac system developer preview include:

Classification
Detection Full-frame
Detection
Detection Point
Detection Area v2
Optical Character Recognition
Static Count
Static Focus
Camera Capture
Video Segmentation
Chipper

Other application types available by special arrangement include:

Line Crossing
Dynamic Tracking
NSFW
Motion Detection

The highlighted types use deep convolutional neural networks.

Other pre-trained applications for detecting common subjects such as people or vehicles are also available by special arrangement.

Application-Specific Data

Some application types use the concept of optional application-specific subject-media data, or app_data, that identifies the relationship between the subject and the media object in the context of the observation task performed by that application, such as box regions in Detection applications and ranges of frames in Detection Full-frame applications.

📘

Application-Specific Data

The Cogniac system supports the following application data types:

box_set - A list of box regions corresponding to the subject. A box region is defined as:

{
    "box":            (dictionary of integers) "x0", "x1", "y0", "y1"
                          Values correspond to pixel offsets in the media item.
                          (x0, y0) is the upper-left corner of the bounding box;
                          (x1, y1) is the lower-right corner of the bounding box.
                          Note that x1 must be greater than x0
                          and y1 must be greater than y0.

    "probability":    (optional) (float) probability for the box
}

segment_list - A list of video frame segments corresponding to the subject. A segment is defined as:

{
    "segment":        (dictionary of integers) "f0", "f1"
                          f0 and f1 correspond to frame offsets in a video media item
                          (frame count starts at 0).
                          f0 is the first frame in the media item segment with the subject;
                          f1 is the last frame in the media item segment with the subject.
                          Note that f1 must be greater than, or equal to, f0.

    "probabilities":  [ordered list of per-frame model output probabilities]
                          The first list entry corresponds to the probability for the frame at f0;
                          the last list entry corresponds to the probability for the frame at f1.
                          Note that the length of the list, if present, must equal (f1 - f0 + 1).
}

ocr - A string representing the characters detected.

"01234"

count - A float (>= 0) representing the number of occurrences of the subject detected.

2.5
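
The app_data structures above map directly to JSON-style dictionaries. The following sketch (plain Python, with made-up example values) builds a box_set and a segment_list and checks the constraints described above.

# Illustrative app_data payloads; the coordinate and frame values are hypothetical.

# box_set: one box region with its optional per-box probability
box_set = [
    {
        "box": {"x0": 120, "y0": 80, "x1": 340, "y1": 260},   # x1 > x0, y1 > y0
        "probability": 0.93,
    }
]

# segment_list: frames 30..34 contain the subject (frame counts start at 0)
segment_list = [
    {
        "segment": {"f0": 30, "f1": 34},                       # f1 >= f0
        "probabilities": [0.81, 0.84, 0.90, 0.88, 0.79],       # length == f1 - f0 + 1
    }
]

def check_box(entry):
    b = entry["box"]
    assert b["x1"] > b["x0"] and b["y1"] > b["y0"], "x1/y1 must exceed x0/y0"

def check_segment(entry):
    s = entry["segment"]
    assert s["f1"] >= s["f0"], "f1 must be >= f0"
    probs = entry.get("probabilities")
    if probs is not None:
        assert len(probs) == s["f1"] - s["f0"] + 1, "one probability per frame"

for e in box_set:
    check_box(e)
for e in segment_list:
    check_segment(e)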

Classification

A Classification application identifies, or tags, each media item with the single most prominent subject from a list of output subjects of interest, to the exclusion of the other subjects. The list of subjects must be mutually exclusive and exhaustive. In other words, exactly one of the subjects must apply to every media item presented to a classification app. The maximum number of output subjects supported is 20.

📘

Classification of Media

All media that is processed by a Classification application will be associated with a single output subject.

Useful Classification applications should have at least two output subjects. The subjects must be mutually exclusive and exhaustive.

Creating a Classification Application

Classification applications require the following fields:

  1. An application name; e.g. 'Security Cam People Classifier'.

  2. A list, output_subjects, of subjects by which to classify the input media, e.g. 'security alert', 'employee', and 'visitor'.

  3. An optional list, input_subjects, of one or more subjects to store input media. Can be output subjects from other Cogniac applications.

To view additional, optional application creation fields and their default values, see Applications - Create
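
As a rough illustration, the required fields can be assembled into a creation request along the lines below. This is a sketch only: the endpoint URL, header, and exact field spellings are assumptions, and the authoritative parameters are documented in Applications - Create.

import requests

# Assumed endpoint and credentials; consult Applications - Create for the real API details.
API_URL = "https://YOUR_COGNIAC_API_HOST/1/applications"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"

classification_app = {
    "name": "Security Cam People Classifier",
    "type": "classification",                               # assumed type identifier
    "output_subjects": ["security alert", "employee", "visitor"],
    "input_subjects": ["security_cam_frames"],               # optional; may be another app's output subject
}

resp = requests.post(
    API_URL,
    json=classification_app,
    headers={"Authorization": "Bearer " + ACCESS_TOKEN},
)
resp.raise_for_status()
print(resp.json())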

📘

Application Name

It is best-practice to create applications with short but descriptive names that easily identify the end-goal of the application to other users in the same tenant.

Classification Subjects and Feedback

Classification applications select a single output subject for each input media item. It is best practice when providing application feedback to select the subject that appears most prominently, or is best represented, in the media item.

To attain an accurate and stable application, continue to upload new media to the application's input subjects and provide feedback on the application's subject assertions. Monitor the application's model performance statistics (precision, recall, and F1 score) until sufficient accuracy is reached.
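
For reference, the F1 score is the harmonic mean of precision and recall; it can be computed from the counts of true positives, false positives, and false negatives as in this small example:

def f1_score(true_pos, false_pos, false_neg):
    # F1 = harmonic mean of precision and recall
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# e.g. 90 correct assertions, 10 false alarms, 5 misses
print(f1_score(90, 10, 5))   # ~0.92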

📘

Feedback Context

It is best practice to provide feedback in a manner that matches the feedback of a reasonable human expert in the application's domain. Providing highly-contextual feedback for a classification application that will be used to process out-of-context images may result in lower model performance.

For example, when providing feedback to a classification application trained to identify whether an image contains a particular person or subject, it is not recommended to provide positive feedback for blurry images, or for images where the person providing feedback knows the subject is present but the subject is not actually visible in the image.

The semantics of a classification application always requires the selection of exactly one output subject. If for some reason there are media items presented to the application with none of the desired subjects, or multiple of the desired subjects, an additional class or classes can be added to deal with these exceptional cases.

For example, if no subject of interest appears in some of the media items, an output subject representing 'None' or 'No Cats or Dogs' can be added. Similarly, if some of the media items contain multiple subjects, you can add a subject like "Multiple Cats or Dogs" with which to classify these images. This strategy works best if these cases are relatively rare. If you expect to have many images where there are no subjects or multiple subjects then the 'Detection Full-frame' application type is probably a better choice.

Note that because subjects are arbitrary and their semantics are learned by the classifier, it is possible to combine both the "none" exceptional subject and the "multiple" exceptional subject into a single subject. For example, in a 'cat versus dog' classifier with a cat subject and a dog subject, a third class "cat dog exceptions" could be added to cover both exception cases of 'no cats or dogs' and 'multiple cats or dogs'. The application will automatically learn to select this class whenever either of these two conditions applies.

🚧

Use an 'Exception Class' to handle problematic images

Use an 'exception subject' in a classifier application to aggregate the relatively infrequent exceptional cases that often occur when none of the normal subjects are present in the media, multiple of the normal subjects are present, or the media is otherwise off-domain. Keep track of the subject semantics with the subject description field.

If the 'no subject' or 'multiple subjects' cases are relatively frequent then a Detection Full-frame application type may be more appropriate.

Note that because neural networks can easily learn arbitrary functions, it is possible to use this aforementioned technique to great effect. For example, it is possible to have a classification app where there is a 'cat' subject and a 'cat with mouse in mouth' subject where, through consistent feedback, the application model learns that 'cat with mouse in mouth' should always be prioritized over 'cat' even though 'cat with mouse in mouth' by definition contains a cat.

Detection Full-Frame

A Detection Full-frame application, much like a Classification application, identifies subjects of interest in media. However, in a Detection Full-frame application the association of each output subject with a media item is assessed independently. So a single media item can have strong associations with multiple subjects; e.g. an image can contain both a cat and a dog, with a subject-media association probability around 1.0 for both subjects. With Detection Full-frame applications any number of subjects can be present, or the media can have a strongly negative association with all subjects, allowing for more complex subject-media relationships than a simple classification application. The maximum number of output subjects supported is 20.

A Detection Full-frame application is an appropriate choice when a single subject-media association is not sufficient for the needs of the application’s end product, or when it is not guaranteed that even a single output subject will appear in the input media. For example, in contrast to a Classification application, a Detection Full-frame application with one output subject is useful for filtering raw input media into relevant and irrelevant subjects, so that relevant media can be fed into other applications.

One of the most common uses of Detection Full-frame applications is as a 'filter' for a relatively rare subject.

Creating a Detection Full-Frame Application

Detection Full-frame applications require the following fields:

  1. An application name; e.g. 'Car Detector'.

  2. A list, output_subjects, of one or more subjects to identify within the input media, e.g. 'car'.

  3. An optional list, input_subjects, of one or more subjects to store input media. Can be output subjects from other Cogniac applications.

  4. An optional boolean, feedback_all_positives, which, when True, surfaces all positive subject-media associations for user feedback. This field is best used for applications with rare subjects to ensure all positive detections receive feedback.

To view additional, optional application creation fields and their default values, see Applications - Create

📘

Detection Full-Frame Subject-Media Associations

In contrast to Classification applications that are guaranteed to output exactly one positive association between a media item and output subject, output media from Detection Full-frame applications are related positively or negatively to every output subject.

When processed by a Detection Full-frame application, media items are tagged with a more complex set of positive and negative relationships with the application's output subjects. Hence, even an application with a single output subject will generate two sets of subject-media associations: positively associated media and negatively associated media.

Detection Full-Frame Feedback

Detection Full-frame applications assess the strength of the association between each output subject and a media item independently. When providing feedback in a Detection Full-frame application, it is best practice to weigh the relevance, both positive and negative, of each output subject individually against the provided media.

📘

False Subject-Media Associations

In Detection Full-frame applications subjects and media can be negatively associated in the same way a subject is positively associated with media. If no output subjects are sufficiently represented in a media item it is perfectly valid to provide False feedback for every subject.
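
To make the independence of these associations concrete, the sketch below (plain Python with hypothetical field names, not the literal API schema) represents feedback for a single image against three output subjects; each subject gets its own True/False judgment, and an all-False set is equally valid.

# Hypothetical feedback for one media item in a Detection Full-frame application
# with output subjects 'cat', 'dog', and 'bird'.  Each association is judged
# independently; there is no requirement that exactly one be True.
feedback = {
    "media_id": "example-media-0001",            # hypothetical identifier
    "subject_feedback": [
        {"subject": "cat",  "present": True},
        {"subject": "dog",  "present": True},    # multiple positives are allowed
        {"subject": "bird", "present": False},
    ],
}

# If none of the subjects appear in the image, every entry may be False:
all_negative = [{"subject": s, "present": False} for s in ("cat", "dog", "bird")]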

Detection Application

A Detection application boxes regions of interest that are associated with a subject in each input media item. Zero or more regions of interest may be identified for each subject.

Creating a Detection Application

Detection applications require the following fields:

  1. An application name; e.g. 'License Plate Boxer'.

  2. A list, output_subjects, with a single subject to identify within the input media, e.g. 'license_plate'.

  3. An optional list, input_subjects, of one or more subjects to store input media. Can be output subjects from other Cogniac applications.

  4. An integer, max_boxes, identifying the maximum number of box regions to detect.

  5. A float value, iou_threshold, used to decide whether a predicted box overlaps sufficiently with a ground-truth box with respect to Intersection over Union, the area of overlap between the bounding boxes divided by the area of their union (a computation sketch follows this list).

  6. A float value, nms_threshold, denoting the non-maximum suppression threshold, used to decide whether boxes that significantly overlap each other should be considered the same box (the box with the largest detection probability is kept).

To view additional, optional application creation fields and their default values, see Applications - Create
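
The iou_threshold compares a predicted box against a ground-truth box using Intersection over Union. The sketch below computes IoU using the box coordinate convention from Application-Specific Data; the example coordinates are arbitrary.

def intersection_over_union(a, b):
    # IoU of two boxes given as {'x0', 'y0', 'x1', 'y1'} pixel offsets
    ix0, iy0 = max(a["x0"], b["x0"]), max(a["y0"], b["y0"])
    ix1, iy1 = min(a["x1"], b["x1"]), min(a["y1"], b["y1"])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)      # overlap area (0 if disjoint)
    area_a = (a["x1"] - a["x0"]) * (a["y1"] - a["y0"])
    area_b = (b["x1"] - b["x0"]) * (b["y1"] - b["y0"])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

predicted    = {"x0": 100, "y0": 100, "x1": 200, "y1": 200}
ground_truth = {"x0": 120, "y0": 110, "x1": 210, "y1": 190}
print(intersection_over_union(predicted, ground_truth))   # ~0.59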

Detection Feedback

When a subject is positively associated with a media item in a Detection application, the subject-media association must be accompanied by application-specific data (app_data) with an app_data_type of box_set, which identifies a list of box regions. For more information on application-specific data and the valid format of box_sets, see Application-Specific Data.

When the subject is not positively associated with the media item, no additional application-specific data (app_data) is passed.
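
Putting the two cases together, detection feedback might be structured as in the sketch below. The field names around the box_set are hypothetical; the box_set format itself is the one defined under Application-Specific Data.

# Positive association: the boxes locating each license plate are supplied as
# app_data with an app_data_type of 'box_set'.
positive_feedback = {
    "media_id": "example-media-0002",     # hypothetical identifier
    "subject": "license_plate",
    "present": True,
    "app_data_type": "box_set",
    "app_data": [
        {"box": {"x0": 412, "y0": 305, "x1": 560, "y1": 352}},
        {"box": {"x0": 88,  "y0": 290, "x1": 231, "y1": 334}},
    ],
}

# Negative association: the subject is not present, and no app_data is passed.
negative_feedback = {
    "media_id": "example-media-0003",
    "subject": "license_plate",
    "present": False,
}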

📘

Detection Boxes and Focus Areas

As mentioned previously, the output of a Cogniac application can be used as an input to another application for creating more complex visual recognition tasks. The applications that accept the output of other Cogniac applications are sometimes referred to as "downstream" applications, as they focus primarily on subject-media that has specifically been processed and filtered by one or more applications prior to being processed by the downstream application.

When an application outputs subject-media with detection boxes in its app_data, any downstream applications that accept the detection output subject media will process only the portions of the media that have been positively identified with the subject. This is accomplished through the concept of subject-media focus areas. Subject-media focus areas are box regions, or in the case of video media, ranges of frames, that identify subregions of the whole media to be processed by the handling application, as opposed to the whole media object. Detection application output app_data box_sets are transformed to subject-media focus areas prior to being fed to downstream applications. For more information on focus areas see Subject Media Associations

Detection Area v2

Optical Character Recognition Application

Optical Character Recognition (OCR) applications identify letters, numbers, and spaces in media. Due to the potentially infinite combinations of characters that can be observed, OCR applications use a single output subject to associate media, with the specific identified characters listed in the subject-media association's app_data.

Creating an OCR Application

OCR applications require the following fields:

  1. An application name; e.g. 'License Plate Reader'.

  2. A list, output_subjects, with a single output subject to identify within the input media, e.g. 'license_plates'.

  3. An optional list, input_subjects, of one or more subjects to store input media. Can be output subjects from other Cogniac applications.

  4. A string, character_set, of characters to identify. OCR applications are case-sensitive; i.e. 'AaBbCc' will identify both cases of the letters 'A', 'B', and 'C'.

  5. An integer, max_characters, denoting the maximum length of character strings to identify in media. For example, a license plate reader focused on California license plates could use a max_characters value of 7. Spaces can be used to denote missing characters.

To view additional, optional application creation fields and their default values, see Applications - Create
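
The distinctive OCR fields are character_set and max_characters. A creation payload might look like the following sketch (the type identifier is an assumption, and the request would be sent as in the earlier Classification example); see Applications - Create for the authoritative fields.

# Hypothetical creation payload for a California license plate reader.
ocr_app = {
    "name": "License Plate Reader",
    "type": "ocr",                                # assumed type identifier
    "output_subjects": ["license_plates"],
    "input_subjects": ["license_plate_regions"],  # e.g. the output subject of a plate-boxing Detection app
    "character_set": "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ",   # include a space to allow missing characters
    "max_characters": 7,
}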

OCR Feedback

When providing feedback in an OCR application, only one output subject is required, as the characters identified are specified in the subject-media association's app_data field with an app_data_type of ocr.

📘

Negative OCR Subject-Media

False or empty subject-media associations are not accepted for an OCR application; however, an empty OCR detection can be made by passing an OCR value consisting only of spaces, with a number of spaces equal to max_characters (the space character must be included as an allowed character).

For best results, it is recommended to use either a Detection Full-frame or Detection application to identify the media, or focus areas of media, that are positively associated with characters being present.

For example, a license plate reader OCR application can use the output of a license plate region detection application that identifies a box region in an image of automobiles that contains the license plate. As a downstream application, the OCR application will receive the output of the license plate detection application as subject-media focus areas, and only the boxed region identified by the detection application will be processed for character recognition.
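
In terms of app_data, OCR feedback reduces to a single string. The sketch below uses hypothetical field names around the ocr value.

# Feedback for a plate reading '4ABC123'; the detected characters go in app_data
# with an app_data_type of 'ocr'.
ocr_feedback = {
    "media_id": "example-media-0004",
    "subject": "license_plates",
    "app_data_type": "ocr",
    "app_data": "4ABC123",
}

# An 'empty' reading is expressed as spaces padded to max_characters (here 7),
# since False or empty associations are not accepted for OCR applications.
empty_reading = dict(ocr_feedback, media_id="example-media-0005", app_data=" " * 7)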

Static Count

Static Count applications identify the number of instances that a single subject is positively associated with a media item.

Creating a Static Count Application

Static Count applications require the following fields:

  1. An application name; e.g. 'Car Counter'.

  2. A list, output_subjects, with a single output subject to identify within the input media, e.g. 'car'.

  3. An integer, min_count, denoting the minimum allowable number of instances of the subject in media. If, during feedback, a user asserts that fewer than min_count instances of a subject are present, the application will automatically update the min_count value to reflect that.

  4. An integer, max_count, denoting the maximum allowable number of instances of the subject in media. If, during feedback, a user asserts that more than max_count instances of a subject are present, the application will automatically update the max_count value to reflect that.

To view additional, optional application creation fields and their default values, see Applications - Create

Static Count Feedback

When providing feedback in a Static Count application, only one output subject is required, with the total count of subject instances identified in the subject-media association's app_data field, with an app_data_type of count.
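
Since the count app_data type is a single float, Static Count feedback is minimal; the field names below are hypothetical.

# Feedback asserting that 3.5 cars are visible (e.g. one only partially in frame);
# the total goes in app_data with an app_data_type of 'count'.
count_feedback = {
    "media_id": "example-media-0006",
    "subject": "car",
    "app_data_type": "count",
    "app_data": 3.5,     # float >= 0
}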

Non-feedback Applications

The Cogniac system also offers a number of application types that perform various image collection, processing, and filtering tasks that do not require feedback for model tuning.

Motion Detection

Motion Detection applications detect and output images when a change in a sequence of images occurs. For additional information on motion detection applications, see the Motion Detector Tutorial.

Creating a Motion Detection Application

Motion Detection applications require the following fields:

  1. An application name; e.g. 'Security Camera Motion'.

  2. A list, output_subjects, with a single output subject to identify within the input media, e.g. 'motion'.

  3. An integer, min_detection_scale, that affects the sensitivity of the detector. The range is 0 through 9, where 0 is most sensitive (allowing small changes in images to be detected) and 9 is least sensitive.

  4. An integer, median_filter_kernel, that controls the size of the aperture used for smoothing an image mask. The mask results from comparing the latest image against a reference background image; the resulting difference can be viewed as just the changes between the reference background image and the latest image. The edges of these differences are smoothed using a filter. The size of this filter must be an odd number greater than 1; a value of 5 is recommended for most applications.

  5. Integers, min_detection_height and min_detection_width, that control the size of the output image generated when motion is detected. These default to 227x227, which means a box of 227x227 pixels will be cropped from the image in which a detection occurred. This box will be associated with the output subject for the motion detector. A value of -1 for both the height and width will cause the whole image, rather than the region in which detections occurred, to be associated with the output subject.

  6. An integer, dilation_iterations, controlling dilation of the detected motion regions. A larger dilation region results in larger boxes, which can group multiple moving objects into one box rather than into individual boxes. A recommended value is 17 iterations.
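
Gathering the fields above into one place, a Motion Detection configuration might be sketched as below. The snake_case field spellings and the type identifier are assumptions; the numeric values are the recommended or default ones from the list above, except min_detection_scale, which is just a mid-range example.

motion_detection_app = {
    "name": "Security Camera Motion",
    "type": "motion_detection",          # assumed type identifier
    "output_subjects": ["motion"],
    "min_detection_scale": 5,            # example value; 0 (most sensitive) .. 9 (least sensitive)
    "median_filter_kernel": 5,           # odd and > 1; 5 recommended for most applications
    "min_detection_height": 227,         # set both to -1 to associate the whole image
    "min_detection_width": 227,
    "dilation_iterations": 17,           # larger values merge nearby motion into one box
}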

Camera Capture

Camera Capture applications provide a way of grouping cameras together, controlling how media enters your app pipeline(s), and controlling how your cameras are triggered. Every camera you create in the Cogniac system is auto-assigned a subject, so all images from a given camera will end up in that camera's subject. By adding a camera to your Camera Capture app, you allow the Camera Capture app to take control of how that camera is triggered. All cameras within a single Camera Capture app will be triggered together.

Creating a Camera Capture Application

Creating a Camera Capture app requires the following fields. Please contact Cogniac support if you need assistance configuring your Camera Capture app (see Support).

  1. An application name; e.g. ‘manufacturing line 1 camera capture’
  2. An input subject (optional) that will be used to route trigger events to Camera Capture apps.
  3. A set of Network Cameras whose subjects will become the output subjects of this Camera Capture app.
  4. An EdgeFlow that your GigE-Vision cameras are connected to (not required for IP cameras)
  5. The Video Segment Duration specifies the length of video you want your video stream to be segmented into (assumes continuous IP camera stream).
  6. The Frames Per Second dictates how many still image frames the Camera Capture app should collect per second.
  7. The Action Device Key is the device key used for triggering GigE cameras.
  8. The Action Group Key is the group key used for triggering GigE cameras.
  9. The Action Group Mask is the group mask used for triggering GigE cameras.
  10. The Action Subnet is the subnet your GigE cameras reside on.
  11. The Trigger Delay Seconds is the period of time we wait before triggering cameras after receiving a trigger event.
  12. The Trigger Hold Down Seconds is the minimum duration of time that must elapse between consecutive trigger events.
  13. The Trigger Duration Seconds can be used in conjunction with the Frames Per Second setting to capture frames at a given frame rate for a specified number of seconds.

Video Segmentation

Video Segmentation applications segment a video file into one or more fixed-duration segments. For example, a video with a 5 minute duration can be segmented into 30 ten-second segments. The segments are associated with an output subject for further downstream processing.

Creating a Video Segmentation Application

Video Segmentation applications require the following fields:

  1. An application name; e.g. 'Pier Cam Segments'.

  2. A list, output_subjects, with a single output subject to capture the input media, e.g. 'pier_cam_footage'.

  3. An integer, video_segment_duration, denoting the duration, in seconds, of each segment.
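
As a quick check of the arithmetic, a 5 minute video with a video_segment_duration of 10 seconds yields 30 segments:

import math

video_duration_s = 5 * 60          # 5 minute input video
video_segment_duration = 10        # seconds per segment

num_segments = math.ceil(video_duration_s / video_segment_duration)
print(num_segments)                # 30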

Chipper

The Chipper App can take very large images (up to 100 megapixels) and ‘chip’ them into smaller individual images. This allows large media to be easily interpreted by the user and improves the user experience in the Cogniac web-app. The currently supported image formats are PNG, JPG, and JPEG.

Creating a Chipper Application

Chipper applications require the following fields:

  1. An application name; e.g. ‘chip satellite images’
  2. An input_subject where the large media resides.
  3. An output_subject where the chip media will be output.
  4. Overlap Pixels, the number of overlapping pixels between adjacent chips.
  5. X Chips, the number of chips desired in the X dimension.
  6. Y Chips, the number of chips desired in the Y dimension.
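
One plausible way to reason about the resulting chip size, assuming chips of equal size that tile the image with the specified overlap (an illustrative assumption; the actual Chipper layout may differ):

import math

def chip_size(image_px, n_chips, overlap_px):
    # Approximate per-chip size along one dimension, assuming n equal chips whose
    # adjacent edges overlap by overlap_px (illustrative assumption only).
    return math.ceil((image_px + (n_chips - 1) * overlap_px) / n_chips)

# e.g. a 10000-pixel-wide image split into 5 chips in X with 100 px of overlap
print(chip_size(10000, 5, 100))    # 2080 pixels per chip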