Transformers v4.42 includes a new Transformer-based model capable of real-time object detection. See below for more info:
RT-DETR is now supported in Hugging Face Transformers! 🙌

RT-DETR, short for “Real-Time DEtection TRansformer”, is a computer vision model developed at Peking University and Baidu, Inc. capable of real-time object detection. The authors claim better performance than YOLO models in both speed and accuracy. The model comes with an Apache 2.0 license, meaning people can freely use it for commercial applications. 🔥

RT-DETR is a follow-up work of DETR, a model developed by AI at Meta that successfully used Transformers for the first time for object detection. The latter has been in the Transformers library since 2020. After this, lots of improvements have been made to enable faster convergence and inference speed. RT-DETR is an important example of that as it unlocks real-time inference at high accuracy!

Big congrats to Daniel Choi for contributing this model!

* Demo notebooks (fine-tuning + inference): https://lnkd.in/eA_WzsyE
* Demo Space: https://lnkd.in/ewzWTSHA
* Paper: https://lnkd.in/eR3Qg6dm

#ai #artificialintelligence #objectdetection #huggingface #computervision
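For readers who want to try the model before opening the notebooks, here is a minimal inference sketch. It is not taken from the linked notebooks. It assumes Transformers v4.42 or later with `torch` and `Pillow` installed, and it assumes the checkpoint name `PekingU/rtdetr_r50vd` on the Hub. The helper names `detect_objects` and `format_detection` are made up for this illustration.

```python
import sys
from typing import Dict, List

# Assumption: this checkpoint name is not given in the post; swap in your own.
CHECKPOINT = "PekingU/rtdetr_r50vd"


def detect_objects(image_path: str, threshold: float = 0.5) -> List[Dict]:
    """Run RT-DETR on one image and return detections above `threshold`."""
    # Heavy imports live here so the file can be imported without them.
    import torch
    from PIL import Image
    from transformers import RTDetrForObjectDetection, RTDetrImageProcessor

    processor = RTDetrImageProcessor.from_pretrained(CHECKPOINT)
    model = RTDetrForObjectDetection.from_pretrained(CHECKPOINT)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); post-processing expects (height, width).
    target_sizes = torch.tensor([image.size[::-1]])
    result = processor.post_process_object_detection(
        outputs, target_sizes=target_sizes, threshold=threshold
    )[0]

    detections = []
    for score, label, box in zip(result["scores"], result["labels"], result["boxes"]):
        detections.append(
            {
                "label": model.config.id2label[label.item()],
                "score": score.item(),
                "box": box.tolist(),  # [x_min, y_min, x_max, y_max] in pixels
            }
        )
    return detections


def format_detection(det: Dict) -> str:
    """Render one detection as a short human-readable line."""
    x0, y0, x1, y1 = det["box"]
    return f'{det["label"]} ({det["score"]:.2f}) at [{x0:.0f}, {y0:.0f}, {x1:.0f}, {y1:.0f}]'


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rtdetr_demo.py path/to/image.jpg
    for det in detect_objects(sys.argv[1]):
        print(format_detection(det))
```

The first call downloads the weights, so it needs network access. Lowering `threshold` returns more, lower-confidence boxes.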
Interesting. Need to read the paper. YOLO is CNN-based. During the X / Twitter debate between Elon and Yann, one of the points was computer vision without CNNs, which Elon claimed Tesla is doing now. Those cars need very fast inference, which certain CNN architectures are capable of.
YOLOv10 was released recently. I checked the paper, and it doesn't include a comparison with this one. Do we know if it even outperforms YOLOv10? In any case, if the code is open source, it opens a new door for real-time detection. I'll have something to read before going to sleep.
Hey, your ex-colleagues are doing this! https://www.linkedin.com/feed/update/urn:li:activity:7212024869444595713/
RT-DETR is faster than YOLOv8 and has better accuracy. Thanks for the object detection notebook. Do you have any similar examples for segmentation, or could you come up with one?
Great. Well done! And yet, a bit sad. YOLO was the last great bastion of CNN models. It saddens me in a way that this works. I guess now we'll really see them off.
Wow .. faster than YOLO.
Exciting development, Hugging Face! The innovation in real-time object detection keeps raising the bar. Kudos to the team!
So impressive!
Great
RT-DETR targets real-time object detection without giving up accuracy. It pairs a CNN backbone with an efficient hybrid encoder for multiscale feature processing, lets you trade speed for accuracy by changing the number of decoder layers without retraining, and supports CUDA with TensorRT. The authors report that it outperforms other real-time detectors in both speed and accuracy.