Vision Language Models (VLM)
Multimodal models that can reason against image and video inputs and perform descriptive language generation
Specialized Foundation Models
Computer vision models that excel at particular visual perception tasks
![](https://cdn.statically.io/img/build.nvidia.com/_next/image?url=https%3A%2F%2Fassets.ngc.nvidia.com%2Fproducts%2Fapi-catalog%2Fimages%2Focdrnet.jpg&w=3840&q=90)
nvidia /
ocdrnetPREVIEWOCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.
Optical Character Detection
Optical Character Recognition
![](https://cdn.statically.io/img/build.nvidia.com/_next/image?url=https%3A%2F%2Fassets.ngc.nvidia.com%2Fproducts%2Fapi-catalog%2Fimages%2Fvisual-changenet.jpg&w=3840&q=90)
nvidia /
visual-changenetPREVIEWVisual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask
Image Segmentation
NVIDIA NIM
![](https://cdn.statically.io/img/build.nvidia.com/_next/image?url=https%3A%2F%2Fassets.ngc.nvidia.com%2Fproducts%2Fapi-catalog%2Fimages%2Fretail-object-detection.jpg&w=3840&q=90)
nvidia /
retail-object-detectionPREVIEWEfficientDet-based object detection network to detect 100 specific retail objects from an input video.
NVIDIA NIM
Object Detection