[Question] : How can I specify embedding dimension with DocumentSummaryIndex? #14582

Open
1 task done
keonho-kim opened this issue Jul 5, 2024 · 4 comments
Labels
question Further information is requested

Comments

@keonho-kim

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

Hello, guys.

While working with DocumentSummaryIndex, I found that the embed_dim parameter available with VectorStoreIndex is not supported.

In this case, how can I control the embedding dimension? I think it would be troublesome when I'm working with several vector stores whose embedding dimensions differ from that of DocumentSummaryIndex.

Any ideas?

@keonho-kim keonho-kim added the question Further information is requested label Jul 5, 2024
@keonho-kim keonho-kim changed the title How can I specify embedding dimension with DocumentSummaryIndex? Jul 5, 2024

dosubot bot commented Jul 5, 2024

Hello, @keonho-kim! I'm here to help you with any bugs, questions, or contributions you may have. Let's tackle this challenge together!

To specify the embedding dimension with DocumentSummaryIndex, you should use the embed_model parameter in the DocumentSummaryIndexEmbeddingRetriever class. The embed_model should be an instance of BaseEmbedding that supports the desired embedding dimension.

Here is an example of how you can specify the embedding dimension:

from typing import List

from llama_index.core.base.embeddings.base import BaseEmbedding
from llama_index.core.indices.document_summary.base import DocumentSummaryIndex
from llama_index.core.indices.document_summary.retrievers import (
    DocumentSummaryIndexEmbeddingRetriever,
)


# Assuming you have a custom embedding model that supports the desired embedding dimension
class CustomEmbeddingModel(BaseEmbedding):
    # BaseEmbedding is a pydantic model, so the dimension is declared as a field
    embed_dim: int = 128

    def _get_query_embedding(self, query: str) -> List[float]:
        # Return a query embedding of length self.embed_dim
        raise NotImplementedError

    async def _aget_query_embedding(self, query: str) -> List[float]:
        # Async variant; here it simply delegates to the sync method
        return self._get_query_embedding(query)

    def _get_text_embedding(self, text: str) -> List[float]:
        # Return a text embedding of length self.embed_dim
        raise NotImplementedError


# Create an instance of your custom embedding model with the desired dimension
custom_embed_model = CustomEmbeddingModel(embed_dim=128)

# Create an instance of DocumentSummaryIndex
document_summary_index = DocumentSummaryIndex(...)

# Create an instance of DocumentSummaryIndexEmbeddingRetriever with the custom embedding model
retriever = DocumentSummaryIndexEmbeddingRetriever(
    index=document_summary_index,
    embed_model=custom_embed_model,
)

In this example, CustomEmbeddingModel is a placeholder for your actual embedding model that supports the desired embedding dimension. You need to implement the _get_query_embedding, _aget_query_embedding, and _get_text_embedding methods so that they return embeddings of the specified dimension. Then, you can pass this custom embedding model to the DocumentSummaryIndexEmbeddingRetriever to use it for retrieval [1].
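
A quick way to confirm which dimension an embedding model actually produces is to embed a short probe string and measure the length of the vector. The sketch below assumes only that the model exposes a get_text_embedding(text) method, as BaseEmbedding subclasses do. The names probe_embed_dim and FixedDimEmbedding are hypothetical helpers for illustration, not part of LlamaIndex.

```python
from typing import List, Protocol


class SupportsTextEmbedding(Protocol):
    """Anything with a get_text_embedding(text) method (assumed interface)."""

    def get_text_embedding(self, text: str) -> List[float]: ...


def probe_embed_dim(
    embed_model: SupportsTextEmbedding, probe: str = "dimension probe"
) -> int:
    """Embed a short string and return the length of the resulting vector."""
    return len(embed_model.get_text_embedding(probe))


class FixedDimEmbedding:
    """Hypothetical stand-in model that returns zero vectors of a fixed size."""

    def __init__(self, embed_dim: int) -> None:
        self.embed_dim = embed_dim

    def get_text_embedding(self, text: str) -> List[float]:
        return [0.0] * self.embed_dim


if __name__ == "__main__":
    # Swap in a real embedding model here to check its output size.
    print(probe_embed_dim(FixedDimEmbedding(embed_dim=128)))
```

Running the same probe against each embedding model in use makes it easy to see whether the vectors will fit a store that expects a particular size.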


@logan-markewich
Collaborator

Embed dim goes to the vector store, not the index.

Whether you need it or not depends on the vector DB you are using.
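
To illustrate the distinction: the embedding model decides how long each vector is, while a vector store backend that needs a fixed size is told that length up front. The sketch below is a toy in-memory store, not LlamaIndex code. ToyVectorStore and its embed_dim argument are hypothetical and only mimic how some vector databases reject vectors of the wrong size.

```python
from typing import Dict, List


class ToyVectorStore:
    """Hypothetical in-memory store with a fixed vector size."""

    def __init__(self, embed_dim: int) -> None:
        self.embed_dim = embed_dim
        self._vectors: Dict[str, List[float]] = {}

    def add(self, node_id: str, embedding: List[float]) -> None:
        # Reject vectors whose length differs from the declared dimension.
        if len(embedding) != self.embed_dim:
            raise ValueError(
                f"expected {self.embed_dim} dimensions, got {len(embedding)}"
            )
        self._vectors[node_id] = embedding

    def __len__(self) -> int:
        return len(self._vectors)


store = ToyVectorStore(embed_dim=4)
store.add("summary-1", [0.1, 0.2, 0.3, 0.4])  # matches the declared size

try:
    store.add("summary-2", [0.1, 0.2])  # e.g. produced by a different model
except ValueError as err:
    print(f"rejected: {err}")
```

A store without a fixed schema would have no such argument, which is why the setting may or may not be needed depending on the backend.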

@keonho-kim
Author

keonho-kim commented Jul 5, 2024

What I'm confused about is what DocumentSummaryIndex returns.

Is it a docstore? Then what does 'embed_summaries' mean?

Here is what I want to do, but I'm not sure where DocumentSummaryIndex goes. (For now I'm guessing that it is 'docstore'.)

from llama_index.core.ingestion import IngestionPipeline

pipeline = IngestionPipeline(
    transformations=transformations,
    vector_store=self._vector_store,
    docstore=self._docstore,
    docstore_strategy=docstore_strategy,
    **kwargs,
)

@logan-markewich
Collaborator

The document summary index is not related to the code you posted?

2 participants