🔬 Proof of Concept of extracting content from PDF files using multiple PDF libraries
Updated Jul 15, 2024 - C#
Read and extract text and other content from PDFs in C# (port of PDFBox)
Data automation and processing tool designed to streamline the extraction and analysis of data from PDF documents using MS Power Automate Desktop and Excel VBA.
Thin C and Rust wrappers over `mutool convert` that extract text from a PDF into an in-memory buffer.
The exploit allows you to convert EXE to files. It's coded 100% from scratch and used by private methods to assure great stability and a long-lasting FUD time. You are able to attach it to all email providers, and nowadays everyone uses Adobe-based Reader or PDF Reader, so it gives a huge chance of success.
This is a Python application that converts non-readable PDF files, such as scanned documents, into readable Word documents. It achieves this by first converting the PDF files into images and then extracting the text from the images to create the Word documents. The application provides a user-friendly interface to do the above task.
Efficient tool for extracting list items from PDFs to CSV and merging CSV files, leveraging Python's powerful libraries.
DocNET is a fast PDF editing and reading library for modern .NET applications
Testing the capabilities of pdfjs
Engage in dynamic conversations with PDFs to extract and comprehend information using locally hosted LLM variants of Ollama by integrating RAG.
Using an open source library, the goal of this program is to transform a PDF into data blocks with metadata usable by any other program
PDFsam, a desktop application to split, merge, mix, rotate PDF files and extract pages
Testing the capabilities of reactpdf
PDF table extraction with Java and Tabula
Simple script for extracting questions, answers and so on from test PDFs (for a subject called TS I have at uni) to a more usable format.
Python library to interact with the https://pdftables.com API
Fix links in PDF files, rewrite links, extract text annotations, remove pages
An Intelligent Assistant that explains the content of a PDF file. Built with ChromaDB and Langchain.