Questions tagged [pypdf]
pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.
1,511
questions
0
votes
0
answers
24
views
Merged .pdf file created in Python is damaged
I'm very new to Python. I wanted to merge multiple PDFs (121 files) in a particular order, since Python merges in alphabetical order. So I created a filelist.txt file while merging all the .pdf ...
1
vote
2
answers
48
views
When using PyPDF2 for Python, how do I transfer data in CSV format to an existing PDF with blank form fields?
I am currently using the PyPDF2 library with Python. My data (originally a Google Form) was downloaded as a CSV file, and I am hoping to copy this data into an existing PDF with ...
0
votes
0
answers
36
views
Is there a function in pypdf to get the page number of a field? (Python)
I'm trying to find an attribute or function that will return the page number/index of a field that I pass as an argument. E.g. get_field_page_number(field_name) -> int
I want to be able to get a ...
0
votes
0
answers
40
views
pypdf: extract_text with extraction_mode="layout" works if the table is on one page but not if the table continues onto a 2nd page
I am using pypdf to extract text with the code below. It works if the table is on one page (closing the table), but if the table extends to another page (partially on one page and the rest ...
0
votes
0
answers
44
views
Python PDF page size
I am trying to get the page sizes of the pages in my PDF. I have tried using both PyPDF2 and pdfminer, and I get the same results from both: 423.024x639.024 for artbox, cropbox, etc, and 459.048x675.048 ...
2
votes
0
answers
44
views
Trying to extract information from PDF files in Google Colab. It just repeats most of the information from the first file in all the others
This is the code:
for file in files.get('files', []):
    # ... (Get file content as before)
    # Extract data from the PDF
    pdf_reader = PyPDF2.PdfReader(BytesIO(file_content))
    page = ...
0
votes
0
answers
30
views
Python pypdf watermark position
I am trying to add a watermark to a PDF using pypdf. I have a watermark.pdf file that has 'Confidential' in a small red font at the very top left of the page. However, when I try to stamp or watermark '...
0
votes
1
answer
127
views
I can't get any PDF uploads to read
The app is supposed to read multiple PDFs but I can't get even a single PDF to work because of this issue. Any help is appreciated.
I received the error:
AttributeError: 'bytes' object has no ...
0
votes
1
answer
91
views
Create a blank page and add text content using PyPDF2: module 'PyPDF2' has no attribute 'pdf'
I am using this method to create a blank page, add text to it, and then append the page to a PDF.
def add_text_to_blank_page(pdf_writer, text):
    # Create a new blank page
    page = PyPDF2._pdf....
0
votes
0
answers
62
views
Add an image to an image field in PDF forms
I have PDF forms with a text field and an image field. How do I add an image to the image field?
For text fields, the pypdf documentation gives great information and I succeeded. But I fail to add an image in the image ...
1
vote
1
answer
92
views
PyPDF does not give me the right image
I am writing a Python program to merge multiple PDFs containing images into one PDF, with the option to select specific pages from the source PDF files, specify the order, and other things.
I'm using PyPDF ...
0
votes
1
answer
117
views
How do I extract tables in the most efficient way?
I have been using pdfplumber so far. Is there any other library, apart from camelot, which uses PyPDF2 and now raises an error saying:
File "C:\Users\USER\AppData\Local\Programs\Python\Python312\...
0
votes
1
answer
109
views
Anaconda 3: nbconvert failed: PdfFileWriter is deprecated and was removed in PyPDF2 3.0.0. Use PdfWriter instead
ANACONDA 3 - Windows 11
Jupyter Lab: File -> Save As -> PDFviaHTML fails with the error below.
Anaconda Prompt command line: jupyter nbconvert --to xxx yyy.ipynb fails with the same error
...
0
votes
0
answers
38
views
How can I add a sub-bookmark to a PDF with pypdf and keep the existing bookmarks, having tried many times?
I have two questions:
I want to add a sub-bookmark to an existing bookmark. I only know how to add a first-level bookmark.
I want to know how to retain the existing bookmarks to avoid deleting an ...
0
votes
0
answers
95
views
Python - Extract certain values from PDFs in a folder
I am using the below code to extract text from hundreds of PDF files in a specific folder:
from pypdf import PdfReader
import os
import glob
path = input("Enter the file path: ")
pattern = ...