Croissant is a high-level format for machine learning datasets that brings together four rich layers.
SpectroChemPy is a framework for processing, analyzing and modeling spectroscopic data for chemistry with Python
A pipeline to make ASR datasets better
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Video Games Data is a project that provides video game related data to explore and analyze for data enthusiasts, data scientists and machine learning practitioners.
An Open-source Deep Learning Framework for Visual Place Recognition
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
An open source NLP library based on MindSpore
⚡FlashRAG: A Python Toolkit for Efficient RAG Research
Papers and datasets for tensor time series.
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.
Registry of data portals, catalogs, and data repositories, including a dataset and catalog description standard for data catalogs
Lightweight web API for visualizing and exploring any dataset - computer vision, speech, text, and tabular - stored on the Hugging Face Hub