Add metadata for DatabaseReader #13799

Open
wants to merge 1 commit into
base: main

@cluic cluic commented May 29, 2024

Description

Add Metadata Functionality to DatabaseReader
In the original implementation, DatabaseReader loaded all of the queried data into the document text. However, we don't always need every column in the text; sometimes we only need some fields as metadata. I have therefore updated the load_data method.

With this update, we can load data from the database and specify which fields to include as metadata. Here is a simple example:

from llama_index.readers.database import DatabaseReader

reader = DatabaseReader(uri="sqlite:///D:/AAA/BBB/msg.db?table=msg")

# load data from database
documents = reader.load_data(
    query="SELECT msgid, content, sender, timestamp FROM msg", 
    doc_cols=["content"],
    metadata_cols=["msgid", "sender", "timestamp"], 
)
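
To illustrate the idea behind doc_cols and metadata_cols, here is a minimal sketch of how query rows could be split into document text and metadata. It is not the code from this PR: it uses the standard-library sqlite3 module with an in-memory table, and both `SimpleDocument` (a stand-in for llama_index's Document) and `rows_to_documents` are hypothetical names. The "col: value" text format and the fallback to all columns when doc_cols is omitted are assumptions.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for llama_index's Document."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def rows_to_documents(
    columns: Sequence[str],
    rows: Sequence[Sequence[Any]],
    doc_cols: Optional[List[str]] = None,
    metadata_cols: Optional[List[str]] = None,
) -> List[SimpleDocument]:
    # Assumption: with no doc_cols given, every column goes into the text.
    text_cols = list(doc_cols) if doc_cols else list(columns)
    meta_cols = list(metadata_cols) if metadata_cols else []
    documents = []
    for row in rows:
        record = dict(zip(columns, row))
        text = ", ".join(f"{col}: {record[col]}" for col in text_cols)
        metadata = {col: record[col] for col in meta_cols}
        documents.append(SimpleDocument(text=text, metadata=metadata))
    return documents


# Throwaway in-memory table shaped like the msg table in the example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE msg (msgid INTEGER, content TEXT, sender TEXT, timestamp TEXT)"
)
conn.execute("INSERT INTO msg VALUES (1, 'hello', 'alice', '2024-05-29')")
cursor = conn.execute("SELECT msgid, content, sender, timestamp FROM msg")
columns = [desc[0] for desc in cursor.description]
docs = rows_to_documents(
    columns,
    cursor.fetchall(),
    doc_cols=["content"],
    metadata_cols=["msgid", "sender", "timestamp"],
)
conn.close()
```

In this sketch, only the content column ends up in the document text, while msgid, sender, and timestamp are carried alongside it as a metadata dictionary.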

Fixes # (issue)

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Added new unit/integration tests
  • Added new notebook (that tests end-to-end)
  • I stared at the code and made sure it makes sense

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods
Add metadata for DatabaseReader
dosubot (bot) added the size:S label (this PR changes 10-29 lines, ignoring generated files) on May 29, 2024

cluic commented May 29, 2024

Hi maintainers,

I have updated the DatabaseReader to include metadata functionality. Could you please review my PR and approve the workflows?

Thank you!

@nerdai nerdai left a comment

Thanks @cluic for the contribution. I've left some comments.

@@ -73,16 +73,21 @@ def __init__(
"set of credentials."
)

- def load_data(self, query: str) -> List[Document]:
+ def load_data(self, query: str, metadata: dict=None, doc_cols: list=None, metadata_cols: list=None) -> List[Document]:

from typing import Dict, List

def load_data(
    self,
    query: str,
    metadata: Dict = {},
    doc_cols: List = [],
    metadata_cols: List = []
)

Rather than Optional, I think you can just set defaults as I did above.

Please also use the types that I've indicated above, otherwise mypy might complain
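
A general Python note on the defaults discussed above, offered as background rather than as the reviewer's position: a `{}` or `[]` default is created once, when the function is defined, and shared across calls, so mutating it leaks state from one call to the next. A common alternative is a `None` default annotated as `Optional[...]`, which mypy also accepts. The sketch below uses hypothetical function names (`tag_shared`, `tag_fresh`) to show the difference; it is not code from this PR.

```python
from typing import Any, Dict, Optional


def tag_shared(key: str, metadata: Dict[str, Any] = {}) -> Dict[str, Any]:
    # The default dict is created once and reused on every call.
    metadata[key] = True
    return metadata


def tag_fresh(key: str, metadata: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # A None default lets each call start from its own dict.
    metadata = dict(metadata) if metadata else {}
    metadata[key] = True
    return metadata


first_shared = tag_shared("a")
second_shared = tag_shared("b")  # same dict object as first_shared

first_fresh = tag_fresh("a")
second_fresh = tag_fresh("b")  # independent of first_fresh
```

If the default values are only read and never mutated inside the method, literal defaults behave correctly, although linters such as flake8-bugbear flag them (rule B006).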

nerdai commented May 30, 2024

Please also run make lint and make format, then commit and push the changes.
