Tables in pdf not getting saved into csv file #824

Open
vijayproxima opened this issue Jun 3, 2024 · 0 comments

Hi,
My PDF file has 4 tables (one per region) listing the holidays for the year. Each table has the columns Sr.No, Date, Day and Festival, and each is titled "Region Name Holiday List 2024". However, when I run the following function, no CSV file is created and no pdfdocs.jsonl file is created either; only the data.jsonl file is created.
import time

from llmware.configs import LLMWareConfig
from llmware.library import Library
from llmware.retrieval import Query

# input_data and output_data are folder paths defined elsewhere in the script (not shown)

def parsing_the_pdfs():
    t0 = time.time()
    # Use SQLite as the active database and create a library
    LLMWareConfig().set_active_db("sqlite")

    lib = Library().create_new_library("pdfdocs")
    # Parse and extract all of the contents from these documents
    # by adding the files in the input folder to the library
    parsing_output = lib.add_files(input_folder_path=input_data)

    print("Update: parsing time :", time.time() - t0)
    print("Update: parsing output :", parsing_output)
    # Export all of the content of the library into a jsonl file with metadata
    output1 = lib.export_library_to_jsonl_file(output_data, "data.jsonl")
    # Export all of the tables that match the query
    output2 = Query(lib).export_all_tables(query="Holiday", output_fp=output_data)

    return 0

p = parsing_the_pdfs()
This is the output when I execute the code:
Update: parsing time : 0.0057866573333740234
Update: parsing output : {'docs_added': 0, 'blocks_added': 0, 'images_added': 0, 'pages_added': 0, 'tables_added': 0, 'rejected_files': []}
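
The summary above reports 'docs_added': 0 with an empty 'rejected_files' list and a parsing time of a few milliseconds. One possible reading is that no PDF files were picked up from the input folder at all, in which case there would be no tables to export. The sketch below is a minimal way to check this using only the Python standard library. The helper names find_pdfs and nothing_was_parsed are hypothetical and are not part of llmware; the folder you pass to find_pdfs would be whatever input_data points to.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by path.

    Hypothetical helper (not part of llmware). Returns an empty list
    when the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def nothing_was_parsed(parsing_output):
    """True when the summary dict reports no added docs and no rejected files."""
    no_docs = parsing_output.get("docs_added", 0) == 0
    no_rejects = not parsing_output.get("rejected_files")
    return no_docs and no_rejects


# Summary dict copied from the output shown in this issue
reported = {
    "docs_added": 0, "blocks_added": 0, "images_added": 0,
    "pages_added": 0, "tables_added": 0, "rejected_files": [],
}

if nothing_was_parsed(reported):
    print("No documents were parsed; check the input folder path.")
```

If find_pdfs(input_data) returns an empty list, the path itself is the first thing to look at. If it does return files, the cause lies elsewhere and this check only rules out one possibility.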
