
Optimize Processes for Large Spaces with the Multi-Camera Tracking Workflow


This post is the first in a series on building multi-camera tracking vision AI applications. In this part, we introduce the overall end-to-end workflow, focusing on building and deploying the multi-camera tracking system. The second part will cover fine-tuning AI models with synthetic data to enhance system accuracy.

Large areas like warehouses, factories, stadiums, and airports are typically monitored by hundreds of cameras to improve safety and optimize operations. Tracking objects and measuring activity accurately across these cameras is called multi-camera tracking, and it lets you effectively monitor and manage your spaces. 

For example, retail stores can use multi-camera tracking to understand how customers navigate through the aisles and improve store layout for a better shopping experience. Warehouses can monitor the movement of equipment, material, and people to improve safety, increase delivery speed, and reduce costs. Airports can track the flow of people to enhance security and travel experience. 

However, implementing multi-camera tracking systems can be challenging. 

First, matching subjects across multiple camera feeds from different angles and views requires advanced algorithms and AI models that can take months to train accurately. In particular, ground-truth training datasets are scarce because labeling requires a single person, or at most a small group, to review all the streams from numerous cameras to ensure consistent identification and tracking, which delays AI model training.

Second, real-time multi-camera tracking requires building specialized modules for live data streaming, multi-stream fusion, behavior analytics, and anomaly detection to deliver subsecond latency and high throughput.

Third, scaling to larger spaces like factories or airports requires distributed computing and a cloud-native architecture that can handle thousands of cameras and subjects.

That’s why we’re announcing the new multi-camera tracking reference workflow to speed the development of the next wave of vision AI that measures and helps manage infrastructure and operations over large spaces.

NVIDIA multi-camera tracking

NVIDIA multi-camera tracking is a customizable workflow that gives you a solid starting point so you don't have to begin from scratch, eliminating months of development time.

The workflow also provides a validated path to production. It includes state-of-the-art AI models pretrained on real and synthetic datasets that you can customize for your use case, along with real-time video streaming modules and the following components:

  • Foundation layer: Production-ready capabilities that fuse multi-camera feeds to create global IDs for objects, along with their global and local coordinates. 
  • Analytics layer: Unique object counts and local trajectories. 
  • Visualization and UI: Sample heatmaps, histograms, and pathing that you can further build upon. 

With these components of the workflow, you can code your business logic and build end-to-end vision AI applications to optimize and manage your spaces.

There is no extra cost beyond infrastructure and tool licenses. Plus, with NVIDIA AI Enterprise, you get expert support and the latest product updates for multi-camera tracking workflows.

Getting started with the multi-camera tracking workflow

To get started, see the developer guide for multi-camera tracking and learn how to deploy the reference workflow on your local development machine or in the cloud.

In the following sections, we walk you through the application architecture of the workflow and the step-by-step process for developing, configuring, and deploying it. 

End-to-end workflow for multi-camera tracking

Diagram shows the workflows and apps for simulation and synthetic data generation and multi-camera tracking, plus storage and TAO service for images.
Figure 1. Multi-camera tracking in the Sim2Deploy workflow

The multi-camera tracking reference workflow (Figure 1) takes live or recorded streams from the Media Management microservice and outputs the behavior and global IDs of the objects in the multi-camera view. The behavior can be defined as the location, direction, speed, or trajectory of the object at any given time. 

The object metadata, such as bounding boxes, tracking IDs, and behavior data with timestamps, is stored in an Elasticsearch index. The behavior data is also sorted and stored in a Milvus vector database. A Web UI microservice runs at the end of the workflow, enabling you to visualize the behaviors and click any object to find where it was at any moment (timestamp). The data from Elasticsearch or Milvus is fetched through the Web API microservice.
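
If you want to inspect this metadata directly, Elasticsearch exposes its standard search REST API. The following is a minimal sketch assuming the default Elasticsearch port (9200) and hypothetical index and field names (mdx-behavior, object.id, timestamp), which are placeholders rather than the schema shipped with the workflow; in practice, the Web UI and Web API microservices handle these queries for you.

# Hypothetical sketch: list recent behavior records for global object ID 2.
# The index and field names are assumptions; check your deployment's schema.
curl -s "http://<host_ip>:9200/mdx-behavior/_search" \
    -H "Content-Type: application/json" \
    -d '{"size": 10, "query": {"term": {"object.id": "2"}}, "sort": [{"timestamp": "desc"}]}'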

Video 1. Metropolis multi-camera tracking reference application user interface

In Video 1, the right pane shows the building map overlaid with each object's global ID and its behavior. The camera view on the left shows the current location of the object. In this window, you can query an object by its global ID to see where it was in a given period. For example, if you query the object with global ID 2, you get the metadata related to this object ID across all the cameras shown on the floor map.

Building blocks

The application is built with multiple NVIDIA Metropolis microservices, including Media Management, Perception, Multi-Camera Fusion, Behavior Analytics, Web API, and Web UI.

It also includes third-party microservices, such as Elasticsearch, Kibana, Kafka, and Milvus.

Several NVIDIA Metropolis tools are used as well, including NVStreamer, the Camera Calibration Toolkit, and the Pipe Tuner, all of which appear later in this post.

The Perception microservice ships with a pretrained people detection model, which you can replace or fine-tune for your use case (see the Perception configuration section later in this post).

Recipe from simulation to deployment

In this post, we explore a scalable recipe for developing these advanced AI capabilities, starting from using a digital twin for simulation and synthetic data generation to deploying to the cloud for inference. 

Simulate and train

To build the most efficient and accurate AI workflow, using advanced NVIDIA technologies is crucial. This involves creating 3D digital twins, generating synthetic data, and streamlining model development with tools such as NVIDIA Omniverse for simulation and synthetic data generation and NVIDIA TAO for model fine-tuning (shown in Figure 1).

Together, these tools facilitate robust training, validation, and optimization of AI models, ensuring high performance in real-world applications. To learn how to enhance multi-camera tracking accuracy by fine-tuning AI models with synthetic data, see the second post in this series.

Build and deploy the multi-camera tracking workflow

For the multi-camera tracking application, NVIDIA provides multiple options to build and deploy the app.

Quick deployment with docker-compose

Anyone can get started with a multi-camera tracking application. NVIDIA provides a few sample video streams that you can use by default and which can be configured in the nvstreamer microservice. 

You can then configure these RTSP streams or add real camera RTSP endpoints to VST. Along with these streams, NVIDIA provides sample perception metadata generated with deepstream-app for anyone to check the multi-camera dataflow. You can deploy the entire end-to-end workflow as follows:

$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e up -d --pull always --build --force-recreate

Here, the mdx-foundational.yml file is a docker-compose file containing the basic services, such as Elasticsearch, Kafka, and so on. The mdx-mtmc-app.yml file contains microservices such as Perception, Media Management, Behavior Analytics, and so on.
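
To sanity-check that everything came up, you can list the services started by the same compose files (a general Docker Compose check, not a step from the official guide):

# List the running services and their state for this stack
$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile e2e ps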

To try out a multi-camera application with preexisting perception data, run the docker compose command with the playback profile parameter:

$ docker compose -f foundational/mdx-foundational.yml -f mtmc-app/mdx-mtmc-app.yml --profile playback up -d --pull always --build --force-recreate

Deploy in a production setting in Kubernetes with Helm charts

To deploy the multi-camera tracking application in Kubernetes, first create a Kubernetes server. For more information, see the hardware and software prerequisites. The application resources are in NGC. The rest of this post assumes that you have set up a Kubernetes server and have NGC team access.

Download the deployment package:

ngc registry resource download-version "nfgnkvuikvjm/mdx-v2-0/metropolis-apps-k8s-deployment:<version>-<mmddyyyy>"

This package contains the Helm values.yaml files for foundational services, such as storage and monitoring, as well as the application Helm configs.

Values files are config files for your application, where you can define the image for a particular service, service replica count, service types, ports, ingress, volume, storage, and so on. 

The application helm configs have a values file for each microservice, such as Perception (wdm-deepstream-mtmc-values.yaml) and Multi-Camera Fusion (mtmc-app-override-values.yaml).

For more information, see Values Files.
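
For orientation, an override values file typically follows standard Helm conventions, as in the sketch below. The keys shown are generic illustrations and assumptions, not the actual schema of the Metropolis charts; always start from the values file shipped with each microservice.

# Illustrative override-values.yaml sketch; key names are assumptions,
# not the shipped Metropolis chart schema.
replicaCount: 1
image:
  repository: nvcr.io/<org>/<team>/<microservice>
  tag: "<version>"
service:
  type: NodePort
  port: 8080
resources:
  limits:
    nvidia.com/gpu: 1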

Deploy foundation services

helm install mdx-foundation-sys-svcs --wait https://helm.ngc.nvidia.com/nfgnkvuikvjm/mdx-v2-0/charts/mdx-foundation-sys-svcs-v1.3.tgz --username='$oauthtoken' --password=YOUR_API_KEY -f application-helm-configs/foundational-sys/foundational-sys-monitoring-override-values.yaml

Here, the foundational-sys-monitoring-override-values.yaml file acts as the override values.yaml file, where you can define any updated application settings, such as the password for the Grafana UI.
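
Before deploying the application microservices, it's worth confirming that the foundation pods are healthy. These are standard Kubernetes commands, not Metropolis-specific:

# Check that the foundation service pods reach the Running state
kubectl get pods
# Inspect the logs of any pod that is stuck or crash-looping
kubectl logs <pod-name> --tail=50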

Deploy microservices

The GPU can be shared across microservice pods, with the exception of the Perception microservice. GPU sharing is enabled with NVIDIA_VISIBLE_DEVICES. Each microservice has an override values.yaml file, which can be used to customize any configs. We discuss customization later in this post.
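
As an illustration, GPU assignment can be steered per microservice by setting NVIDIA_VISIBLE_DEVICES in that microservice's override values file. The snippet below is a sketch that assumes the chart exposes a generic env list; the actual key path is defined by each Metropolis chart, so treat the structure as an assumption.

# Sketch only: the real key path depends on the microservice's Helm chart.
env:
  - name: NVIDIA_VISIBLE_DEVICES
    value: "0"   # expose only GPU index 0 to this microservice's container;
                 # pointing several microservices at the same index shares that GPU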

Similar to the foundation services, you can deploy each microservice individually. For example, to deploy the Multi-Camera Fusion microservice, run the following command:

helm install mdx-mtmc-app https://helm.ngc.nvidia.com/nfgnkvuikvjm/mdx-v2-0/charts/mdx-mtmc-app-1.0.37.tgz --username='$oauthtoken' --password=<NGC API KEY> -f application-helm-configs/MTMC/mtmc-app-override-values.yaml

For more information about how to deploy other microservices, see the user guide.
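
After installing the charts, standard Helm commands give a quick overview of what is deployed (a general check, not part of the official instructions):

# List the installed Helm releases and confirm their status is "deployed"
helm list
# Show details for a specific release, for example the Multi-Camera Fusion app
helm status mdx-mtmc-app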

Access multi-camera tracking UI

When the deployment is done, you can access the multi-camera tracking UI at http://<K8s_host_IP>:31080/ui/mtmc/. 
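
If the page does not load, a quick reachability check from a terminal helps separate networking issues from deployment issues (a generic check, not part of the official guide):

# Confirm the NodePort is reachable; a 200 or 30x response means the UI is being served
curl -I http://<K8s_host_IP>:31080/ui/mtmc/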

To effectively track objects across multiple camera views, minimize occlusion, and improve spatial understanding, NVIDIA employs camera calibration in the Multi-Camera Fusion and Behavior Analytics microservices. This involves providing sample video streams along with a calibration.json file and a building map image. The calibration.json contains both image coordinates and global coordinates, which are used to align each camera’s view with a common top-down map view.

Deploy in the cloud with a 1-click deployment script

To deploy Metropolis applications in the cloud, NVIDIA provides a one-click deployment script for multiple cloud service providers (CSPs), including Microsoft Azure, Google Cloud Platform, and Amazon Web Services (AWS). For more information, see Deploy Multi-Camera Tracking Workflows to Public Cloud with NVIDIA Metropolis Microservices (video) and the Cloud Setup topic in the NVIDIA Metropolis User Guide.

Configure for your use case

If you’ve followed all the build and deploy instructions, then you can run and visualize the results from the multi-camera tracking workflow on the sample streams packaged with the release. 

To scale the application to your use case, NVIDIA provides configuration options at each microservice level, along with supporting tools. Add the config-level customization to the override-values.yaml file:

helm install vst-app https://helm.ngc.nvidia.com/rxczgrvsg8nx/<microservice>/charts/<microservice>-<version>.tgz --username='$oauthtoken' --password=<NGC API KEY> -f application-helm-configs/MTMC/override-values.yaml

Configuring cameras with Video Storage Toolkit in the Media Management microservice

In this microservice, you can set configs for your video streams. If you have live cameras, you can use the Video Storage Toolkit (VST). VST provides multiple configuration options that you can set to match your requirements:

vst_config.json:
  notifications:
    enable_notification: true
    use_message_broker: "kafka"
  security:
    use_https: false
    use_http_digest_authentication: false
vst_storage.json:
  total_video_storage_size_MB: 100000

Now you can redeploy the Media Management microservice with these override-values.yaml updates, as shown in the following example.
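
This reuses the generic helm install pattern shown earlier; only the contents of the override values file change. If the release already exists, you may need to uninstall it first or use helm upgrade instead.

helm install vst-app https://helm.ngc.nvidia.com/rxczgrvsg8nx/<microservice>/charts/<microservice>-<version>.tgz --username='$oauthtoken' --password=<NGC API KEY> -f application-helm-configs/MTMC/override-values.yaml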

Integrating fine-tuned models and custom trackers in the Perception microservice

The Perception microservice provides multiple configuration options. For example, the multi-camera tracking application comes with default models for detecting people. If you have a customized model or need to update model parameters, you can update the model config file.

The Perception microservice is based on NVIDIA DeepStream, which provides multiple types of trackers for tracking objects within a single camera view. You can pick any of the trackers and update the configuration accordingly.

A typical Perception microservice pipeline includes a detector and a multi-object tracker, each with numerous parameters listed in their respective configuration files. Manually tuning these parameters to achieve optimal accuracy for different applications is challenging, so NVIDIA provides the Pipe Tuner tool to help you find the best parameters for a particular use case.

Creating calibration files with the Camera Calibration Toolkit

For your newly added camera streams, you must create the calibration.json file to use the Multi-Camera Fusion and Behavior Analytics microservices. For efficient and scalable camera calibration, NVIDIA provides a UI-based camera calibration toolkit.

Monitoring and logging

In the multi-camera tracking application, NVIDIA integrates the Kibana dashboard with foundation services that you can use to monitor and visualize the application. 

Open the Kibana dashboard at http://<host_ip>:5601.

Image shows behavior histograms, the number of unique objects in all camera streams, and other charts.
Figure 2. Kibana dashboard to monitor multi-camera tracking objects over a timestamp

From the top left, you can see that a total of 34 objects were detected in the given input streams. Out of these 34 objects, there are only six unique people. At the bottom of the image are the multi-camera tracking workflow histograms.

Start using the workflow today

The multi-camera tracking reference workflow is now available in developer preview. Get started today with the quickstart guide and follow the step-by-step instructions to download the workflow artifacts. Then deploy the workflow in your environment, on-premises or in the cloud.

To customize and further build on top of the workflow with NVIDIA tools covering the entire vision AI lifecycle, from simulation to fine-tuning and deployment, see the Metropolis Multi-Camera Sim2Deploy quickstart guide.

Give it a try and send us your feedback through the developer forum or your NVIDIA enterprise support team. We can't wait to see what you build with multi-camera tracking to take the safety and utility of physical spaces to the next level!

