Three new tutorial notebooks are out! 🥵 🥵 🥵
(great models under permissive licenses)
Our collection of notebooks keeps growing. In just the last few weeks, we've added three tutorials covering two models: Florence-2 and RT-DETR.
- Florence-2: a lightweight vision-language model open-sourced by Microsoft under the MIT license. The model demonstrates strong zero-shot and fine-tuning capabilities across tasks such as captioning, object detection, grounding, and segmentation. Despite its small size, it achieves results on par with models many times larger, like Kosmos-2.
- RT-DETR: short for "Real-Time DEtection TRansformer", a computer vision model developed by Peking University and Baidu, Inc. In their paper "DETRs Beat YOLOs on Real-time Object Detection", the authors claim that RT-DETR can outperform YOLO models at object detection in both speed and accuracy. The model has been released under the Apache 2.0 license, making it a great option, especially for enterprise projects.
I've written two blog posts about Florence-2: one exploring the model's capabilities across various computer vision tasks, and the other on fine-tuning Florence-2 for custom object detection. I've also published a YouTube tutorial covering this model. The RT-DETR fine-tuning blog post will be out today. Links to all these resources are in the description below. 👇🏻
⮑ 🔗 notebooks repository: https://lnkd.in/deQexCeS
#tutorial #computervision #objectdetection #multimodal #artificialintelligence