Applications
Applications process incoming media from 'input_subjects' and associate the media with outbound 'output_subjects'.
The attributes associated with an individual application include:
| Name | Example | Description |
| --- | --- | --- |
application_id string | "di71rG94" | (read only) Unique application_id automatically assigned to each Application object upon creation |
tenant_id string | "2ijgkd9we8dksd" | (read only) Unique tenant_id automatically assigned to each tenant object upon creation |
name string | "Person detector" | Name should be brief and descriptive |
description string | "Find people walking into the front door" | A full description of the purpose of the application. Use this field to capture detailed subject and any exceptional feedback instructions. |
type string | "classification" | (read only) Cogniac application type. See below for the valid types |
input_subjects array | ["flicker_cats", "animals_pics"] | Array of input subject IDs (note that input subjects must already be created and have a valid ID before being added to an Application) |
output_subjects array | ["cat", "dog", "face"] | List of subject tags corresponding to objects, patterns, or features that are of interest in this application |
release_metrics string | "best_F1" | The performance measure used to assess model performance. |
detection_thresholds dictionary | {"cat": 0.5, "dog": 0.7, "face": 0.5} | Map between subjects and associated probability thresholds. Detections below the specified probability threshold will not be forwarded to subsequent applications (if any) and will not be posted to the detection_post_urls (if any). |
custom_fields dictionary (Being deprecated, replaced by app_type_config) | { "min_px": 50, "max_px": 450, "stride_px": 20 } | Field values for application parameters that are unique to a particular application type. |
detection_post_urls array | ["http://127.0.0.1:9999/my_model_output.net", "https://detections-now.com/detections"] | A list of URLs where model detections will be surfaced in addition to the web and iOS interfaces. Posts are retried for thirty seconds; URLs whose posts still fail after thirty seconds are blacklisted for five minutes. |
gateway_post_urls array | ["http://127.0.0.1:9999/my_model_output.net", "https://detections-now.com/detections"] | A list of URLs where model detections will be surfaced from the gateway. Posts are retried for thirty seconds; URLs whose posts still fail after thirty seconds are blacklisted for five minutes. Specifying a gateway post URL implies that the gateway will implement the application along with any linked applications. |
active bool | true | Flag to control whether the application is active. Inactive applications do not process images submitted to their input subjects or request feedback. |
replay bool | false | Switch to turn on replay of the input subjects to the app. This is used to 'skim' from the pool of input subject images for the purpose of creating more consensus data. |
refresh_feedback bool | false | Flag to control whether the images waiting for user feedback should be re-evaluated by the new model when a new model is released. |
model_id string | "Hpo-d-bf30-019ahMzliYY2KT4iWI-YN_mtsv1_4426.tgz" | Current model in use by cloud inference. |
staging_gateway_model_id string | "Hpo-d-bf30-019ahMzliYY2KT4iWI-YN_mtsv1_4426.tgz" | Force the gateway to use a specific staging model. If a staging model is specified and the gateway is configured to use staging models, the gateway will download and use the specified model. If the gateway is configured to use staging models but no staging model is specified for a particular application, the gateway will default to the latest staging model. |
production_gateway_model_id string | "Hpo-d-bf30-019ahMzliYY2KT4iWI-YN_mtsv1_4426.tgz" | Force the gateway to use a specific production model. If a production model is specified and the gateway is configured to use production models, the gateway will download and use the specified model. If the gateway is configured to use production models but no production model is specified for a particular application, the gateway will default to the latest production model. |
app_managers array | ["[email protected]", "[email protected]"] | List of user email addresses. These users are given the app_manager role, which is authorized to manage application settings and maintain feedback control. |
system_feedback_per_hour integer | 48 | (read only) The current target number of feedback requests per hour for the application. By default this is determined automatically by the system based on the current model performance and the number of subject-media associations that have reached consensus. The user can override the system-selected value by setting the requested_feedback_per_hour configuration item to the desired feedback level. |
requested_feedback_per_hour integer | 50 | Override the target rate of feedback to surface per hour. A null value (the default) indicates that the system feedback rate should be used. Set a higher value to schedule more feedback requests, or a lower value to schedule fewer. |
hpo_credit integer | 10 | (read only) Earned based on the amount of feedback given; 1 credit is good for 1 immediately prioritized hyperparameter optimization training run for this application. Training is scheduled giving priority to higher credit holders. |
created_at float | 1455044755 | (read only) Unix Timestamp |
modified_at float | 1455044770 | (read only) Unix Timestamp |
created_by string | "[email protected]" | (read only) email address of the user who created this application |
current_performance float | 0.9 | (read only) Performance of the current winning model based on current validation images with respect to release_metrics |
best_model_ccp_filename string | "Hpo-d-bf30-019ahMzliYY2KT4iWI-YN_mtsv1_4426.tgz" | (read only) Filename of the current winning model |
last_candidate_at float | 1455044770 | (read only) Timestamp when the last model was trained for this app |
last_released_at float | 1455044770 | (read only) Timestamp when the last winning model was released for this app |
candidate_model_count int | 20 | (read only) Number of models trained for this app |
release_model_count int | 5 | (read only) Number of models released for this app |
training_data_count int | 750 | (read only) Number of training images used by the current winning model |
validation_data_count int | 250 | (read only) Number of validation images used to evaluate the current winning model performance |
inference_execution_policies dict | { "replicas": 1, "max_batch": 8, "runtime_policy": { "rtc_timeout_seconds": 5, "model_seconds": 0, "model_load_policy": "realtime", "gpu_simul_load": 1, "gpu_selection_policy": "instance-ix" } } | Controls the tradeoffs between the number of applications, application throughput, and GPU memory consumption. See Inference Execution Policies below for details. |
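The attributes above can be assembled into an application creation payload. The sketch below shows a minimal application object using field names from the table, along with a small helper illustrating how detection_thresholds gate forwarding; the payload shape is an assumption drawn from the table, not a verbatim API request, so consult the full API reference before using it.

```python
# Minimal application object built from the attribute table above.
# Subject IDs and values are the table's own examples; read-only fields
# (application_id, tenant_id, model_id, ...) are assigned by the system
# and therefore omitted here.
application = {
    "name": "Person detector",
    "description": "Find people walking into the front door",
    "type": "classification",
    "input_subjects": ["flicker_cats", "animals_pics"],
    "output_subjects": ["cat", "dog", "face"],
    "release_metrics": "best_F1",
    "detection_thresholds": {"cat": 0.5, "dog": 0.7, "face": 0.5},
    "active": True,
    "requested_feedback_per_hour": None,  # None -> system selects the feedback rate
}

def passes_threshold(subject, probability, thresholds):
    """Illustrates the thresholding rule: detections below a subject's
    threshold are not forwarded to subsequent applications or post URLs."""
    return probability >= thresholds.get(subject, 0.0)
```

For example, with the thresholds above a "dog" detection at probability 0.6 would be dropped, while a "cat" detection at 0.6 would be forwarded.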
Inference Execution Policies
Cogniac EdgeFlows and CloudFlow instances provide a high degree of flexibility for executing application models. Workloads can range from a single application to hundreds of simultaneous applications, and different EdgeFlow and CloudFlow models can contain from one GPU to dozens of GPUs in clustered environments.
The following controls are available on a per-application basis to tune the tradeoffs between the number of applications, application throughput, and GPU memory consumption.
replicas The number of instances of the application model that are simultaneously running. Running multiple instances of a model can increase the application throughput when multiple GPUs are available, at the expense of more overall GPU memory consumption by the application's models. The default replicas is 1.
max_batch The number of media items that may be batched in a single model inference pass. Increasing the max_batch size may increase application throughput for smaller media items at the expense of higher latency. The default max_batch is 1.
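The replicas tradeoff described above can be made concrete with a rough back-of-envelope sketch. The functions and numbers below are illustrative assumptions (not measured Cogniac values): throughput scales with replicas only up to the number of available GPUs, while GPU memory consumption grows linearly with every replica.

```python
def estimated_throughput(per_replica_fps, replicas, gpu_count):
    """Illustrative model: replicas beyond the available GPUs contend for
    the same devices, so they add memory cost without adding throughput."""
    return per_replica_fps * min(replicas, gpu_count)

def estimated_gpu_memory_mb(model_mb, replicas):
    """Each replica loads its own copy of the model into GPU memory."""
    return model_mb * replicas

# With 4 GPUs, going from 2 to 8 replicas of a 500 MB model quadruples
# memory consumption but only doubles throughput in this simple model.
```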
runtime_policy specifies controls for managing scenarios where all applications (including all application replicas) cannot fit in GPU memory at once, in which case models must be dynamically loaded and unloaded from GPU memory. Loading and unloading models from GPU memory is relatively slow (temporarily reducing throughput and increasing latency). Furthermore, in highly oversubscribed scenarios there can be large delays in acquiring a GPU with sufficient available memory. These policies allow the tradeoffs to be tuned for different usage patterns. runtime_policy consists of the following:

- model_load_policy
- model_seconds
- rtc_timeout_seconds
- gpu_selection_policy
- gpu_simul_load

where
model_load_policy is one of "realtime", "timebound", "run-to-completion". This controls the policy for UNLOADING a model from GPU memory. This policy is needed because there is a latency associated with loading a model into GPU memory, and potentially an even larger latency associated with finding a GPU with sufficient available memory if the EdgeFlow is highly oversubscribed with respect to number of applications relative to the amount of GPU memory. The default model_load_policy is "realtime".
With the "realtime" policy the model will never be unloaded (once successfully loaded). This provides the highest sustained throughput for applications that are processing constant media streams or can not otherwise absorb the latency or potential uncertainty associated with acquiring a GPU with sufficient memory available combined with the subsequent latency of loading the model into memory.
With the "timebound" policy a model will be unloaded model_seconds (int) after being successfully loaded. If model_seconds expires while an inference is in progress, the inference will still complete. Thus a "timebound" policy with model_seconds of 0 always results in the model being unloaded immediately after processing a single media batch. The default model_seconds is 0.
With the "run-to-completion" policy a model will be unloaded rtc_timeout_seconds (float) after its input queue is emptied of input media items. This policy is useful for certain periodic application input patterns where the inputs may be somewhat spread out in time. For example, a usage pattern that expects to receive 4 images every 20 seconds, with the 4 images arriving over several seconds, would be a good candidate for the "run-to-completion" policy. The default rtc_timeout_seconds is 5.
gpu_selection_policy is one of "instance-ix" or "by-free-memory". The default gpu_selection_policy is "by-free-memory".
When "instance-ix" policy is selected a model will be assigned to a GPU based on the replica index (0 to replicas - 1) modulo the total gpu count. This is more deterministic and most appropriate for realtime applications.
When the "by-free-memory" policy is selected, gpu_simul_load controls the number of models that are allowed to simultaneously contend for a given GPU's available memory. This is only really relevant when many tens of apps are contending for each GPU. The default gpu_simul_load is 1.
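Putting the policies together, here is a sketch of two inference_execution_policies values matching the two usage patterns described above: a pinned configuration for a constant media stream, and a load-on-demand configuration for periodic inputs. Field names follow the attribute table; the specific replica counts and batch sizes are illustrative assumptions.

```python
# Constant media stream: keep the model resident and pin replicas to GPUs.
realtime_policy = {
    "replicas": 2,                 # two resident instances for throughput
    "max_batch": 1,
    "runtime_policy": {
        "model_load_policy": "realtime",       # never unload once loaded
        "gpu_selection_policy": "instance-ix", # deterministic GPU per replica
    },
}

# Periodic inputs (e.g. a burst of images every 20 s): load on demand and
# unload shortly after the input queue drains.
periodic_policy = {
    "replicas": 1,
    "max_batch": 4,                # batch the burst for throughput
    "runtime_policy": {
        "model_load_policy": "run-to-completion",
        "rtc_timeout_seconds": 5.0,            # unload 5 s after queue empties
        "gpu_selection_policy": "by-free-memory",
        "gpu_simul_load": 1,                   # one model contends per GPU
    },
}
```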