Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate PQ speed ups by reducing distance calculations #124

Open
jkni opened this issue Oct 12, 2023 · 0 comments
Open

Investigate PQ speed ups by reducing distance calculations #124

jkni opened this issue Oct 12, 2023 · 0 comments

Comments

@jkni
Copy link
Collaborator

jkni commented Oct 12, 2023

At this point, profiles of our PQ look like it's almost entirely using distance work. Barring large parameter changes or a paradigm shift in how we quantize, it seems like the next avenue is reducing distance calculations. There are a few approaches out there that focus on this. Several use kd-trees of different types (main tradeoff is how the tree is split, stopping points, and how entangled the clustering is with the construction of the kd-tree). There also exist kd-tree approaches for choosing initial centroids by leaf destiny. Interesting papers in this direction include "Fast K-Means Clustering Via K-D Trees, Sampling, and Parallelism" by Crowell which provides a good survey of the field and "Tree-Based Algorithm for Stable and Efficient Data Clustering" by Aljabbouli, Albizri, and Harfouche.

An alternative approach that might require fewer modifications is presented in "A fast k-means clustering algorithm using cluster center displacement" (Jim Z.C. Lai, Tsung-Jen Huang, Yi-Ching Liaw), which also focuses on reducing distance calculations, but it does not require a specialized data structure.

These approaches are unlikely to move the needle on quality of our quantization and should be considered primarily if we need efficiency improvements on top of our current approach.

@jkni jkni changed the title Investigate PQ speed ups Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
1 participant