Matvey Arye’s Post


Leading the AI / Vector Database effort @ Timescale.

Want to know what I've been working on for the past year? My team has been busy developing vector database indexes for PostgreSQL! The results make me proud and are a testament to what small teams can accomplish.

🔥 Avthar Sewrathan

AI, Developer Products, and Developer Relations

Pgvector is Now Faster than Pinecone (and 75% Cheaper) Thanks to a New Open-Source Extension: Introducing pgvectorscale.

🐘 What is pgvectorscale? Pgvectorscale is an open-source PostgreSQL extension that builds on pgvector, enabling greater performance and scalability (keep reading for the actual numbers). By using pgvector and pgvectorscale, developers can build more scalable AI applications, benefiting from higher-performance embedding search and cost-efficient storage.

📈 How does it perform? On our benchmark of 50 million Cohere embeddings (768 dimensions each), PostgreSQL with pgvector and pgvectorscale achieves 28x lower p95 latency and 16x higher query throughput compared to Pinecone for approximate nearest neighbor queries at 99% recall, all at 75% lower cost when self-hosted on AWS EC2. We also tested it against Pinecone’s p2 high-performance index. See the blog post for full results (spoiler: it’s just as impressive).

🤔 Why did we build pgvectorscale? Our team at Timescale built pgvectorscale to make PostgreSQL a better database for AI and to challenge the notion that PostgreSQL and pgvector are not performant for vector workloads.

⚙️ How does it achieve such good performance? Pgvectorscale brings specialized data structures and algorithms for large-scale vector search and storage to PostgreSQL as an extension, including: (1) StreamingDiskANN, a high-performance, cost-efficient vector search index for pgvector data inspired by research at Microsoft, and (2) Statistical Binary Quantization (SBQ), developed by Timescale’s own researchers to improve upon standard binary quantization techniques. These innovations help PostgreSQL deliver performance comparable, and often superior, to that of specialized vector databases like Pinecone.

👏 Big shoutout to Matvey Arye and John Pruitt, two senior staff engineers at Timescale, who worked on these technical breakthroughs.

🧑‍💻 Sounds exciting! How can I get started? Pgvectorscale is open source under the PostgreSQL license and free to use on any PostgreSQL database. You can find installation instructions on the pgvectorscale GitHub repository (link in comments). It’s also available on any database service in Timescale’s PostgreSQL cloud platform.

Share this post with your network to let them know about pgvectorscale, and comment with your reactions and questions. Let's make PostgreSQL a better database for AI together! See the comments for links to learn more about pgvectorscale and to the pgvectorscale GitHub repo.

#pgvector #pinecone #vectordatabase #benchmark #rag #opensource #postgresql
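
For anyone wondering what the "standard binary quantization" mentioned above looks like, here is a minimal Python sketch. It is not pgvectorscale's code and it does not show SBQ, whose details the post does not describe. It only illustrates the textbook baseline: keep one bit per dimension (is the value above zero?) and compare vectors by Hamming distance. The function names and the toy four-dimensional vectors are made up for illustration. At one bit per dimension, a 768-dimension float32 embedding shrinks from 3,072 bytes to 96.

```python
def binary_quantize(vector, threshold=0.0):
    """Pack one bit per dimension into an int: bit i is 1 if vector[i] > threshold."""
    code = 0
    for i, value in enumerate(vector):
        if value > threshold:
            code |= 1 << i
    return code


def hamming_distance(code_a, code_b):
    """Count the bit positions where the two codes differ."""
    return bin(code_a ^ code_b).count("1")


def nearest_by_hamming(query, vectors):
    """Return the index of the vector whose bit code is closest to the query's."""
    query_code = binary_quantize(query)
    codes = [binary_quantize(v) for v in vectors]
    return min(range(len(vectors)),
               key=lambda i: hamming_distance(query_code, codes[i]))


# Toy data (hypothetical): three stored vectors and one query.
vectors = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.1, 0.8],
    [0.1, 0.6, -0.9, -0.3],
]
query = [0.8, -0.1, 0.5, -0.6]

best = nearest_by_hamming(query, vectors)
print(best, vectors[best])
```

In practice a coarse code like this is typically used to shortlist candidates, which are then re-scored against the full-precision vectors to recover recall.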

