All Questions
700 questions
0
votes
1
answer
43
views
pypdf or pikepdf advice needed on bookmarks
I am sorry, but I am unable to understand how to rearrange bookmarks in a PDF document.
I have a PDF document with medical records that was created by importing more and more items from individual ...
0
votes
0
answers
44
views
Checking PDF form checkboxes using Python and pdfrw
Does anyone have experience with checking PDF form checkboxes? Within a Django application, based on annotations of PDF forms, I map and identify checkboxes I want to either check or leave ...
1
vote
0
answers
29
views
Combine Duplicate Fonts When Appending PDF Files - Python
I am trying to combine 450 PDFs into a single PDF. There are only a few unique fonts among all the files. Once all the files are combined, I look at the structure through pdfcrowd.com and I can see the ...
0
votes
1
answer
39
views
Correct format of ICC-based /ColorSpace in PDF
I am generating PDF files on-the-fly. The files contain JPEG images in the Adobe RGB (1998) colourspace, with the profile embedded. The PDF generation toolkit embeds the images correctly, but sets the ...
0
votes
0
answers
52
views
How to add a form field to an existing pdf with python
I've been tasked with creating a Python script that will add a form field to an existing PDF file. It seemed rather straightforward, but I've hit a wall. The form field is being added (apparently not ...
0
votes
0
answers
48
views
How to read and update image form field in PDF, python?
Some pdf templates have images in them.
Example:-
When I read the dynamic form fields in the pdf using.
reader = PdfReader(pdf_path)
fields = reader.get_fields()
for field in fields:
    field_name = ...
1
vote
0
answers
51
views
pypdf: merged PDFs have wrong page attributes
When using the pypdf merge function, I get a PDF file with invisible content. I found out that the coordinates of the page attributes MediaBox and CropBox have some errors. They look like this:
/MediaBox [ 0 0 595 ...
0
votes
0
answers
32
views
Is there a way to scale up an image and rotate text in PyPDF
I am working on a mini project to print tickets on an older printer. I formatted the tickets to fit the printer and print, but now they want the format of the ticket to be different. I need to scale up ...
0
votes
1
answer
85
views
How to merge two pages from a pdf file in Python such that they are one next to the other?
I'm basically trying to grab two pages and put them one next to the other using python since I want to print a booklet on some A4 pages. The issue I'm finding is that the translation of the page in ...
0
votes
1
answer
65
views
How to resize PDF pages to `Letter` using `pypdf`?
I noticed that pypdf doesn't have a PaperSize of Letter (it only has A0-A8 & C4). I have PDFs of various page sizes that I want to standardize and scale to Letter size.
Is this possible with ...
1
vote
2
answers
163
views
How to extract the body of a pdf?
I want to extract the body of a pdf. By body I mean the file format that a pdf parser/reader uses to render the pdf. Any language would work, but if you could tell me how to do it in python or Java, ...
0
votes
0
answers
68
views
Crop PDF then Read Text from PDF
I am trying to avoid reading 'page ... of ...' on a PDF, as it messes up the other data being read.
I thought it would be easiest to just crop the margins out and then read the pdf.
I tried using ...
0
votes
0
answers
100
views
I want to scrape a table from a PDF file using pypdf or tabula into a pandas DataFrame
How can I scrape data using a Python script with the pypdf or tabula library? Specifically, I need to extract Capital names that are listed in a non-table format from the attached PDF. The desired ...
0
votes
0
answers
73
views
How to take a Python Dictionary and Put Data Into Existing PDF?
I have code where I'm taking fields from a PDF, turning them into a list, and then into a dictionary. (I'm not sure if fields->list->dictionary is necessary or if fields->dictionary makes ...
1
vote
2
answers
245
views
When using PyPDF2 for Python, how do I transfer data in CSV format to an existing PDF with blank form fields?
I am currently using the PyPDF2 library with Python. My data, which was originally a Google Form, was downloaded as a CSV file, and I am hoping to copy this data into an existing PDF with ...
1
vote
1
answer
112
views
Is there a function in pypdf to get the page number of a field? (Python)
I'm trying to find an attribute or function that will return the page number/index of a field that I pass as an argument. E.g. get_field_page_number(field_name) -> int
I want to be able to get a ...
0
votes
0
answers
111
views
Python PDF page size
I am trying to get the page sizes of the pages in my PDF. I have tried using both PyPDF2 and pdfminer, and I get the same results from both: 423.024x639.024 for artbox, cropbox, etc, and 459.048x675.048 ...
0
votes
1
answer
289
views
Create a blank page and add text content using PyPDF2: module 'PyPDF2' has no attribute 'pdf'
I am using this method to create a blank page, add text to it, and then append the page to a PDF.
def add_text_to_blank_page(pdf_writer, text):
    # Create a new blank page
    page = PyPDF2._pdf....
0
votes
0
answers
97
views
Add image in Image Field with PDF Forms
I have PDF forms with a text field and an image field. How do I add an image to the image field?
For text fields, the pypdf documentation has great information and I succeeded. But I fail to add an image in the Image ...
1
vote
1
answer
156
views
PyPDF does not give me the right image
I am writing a python program to merge multiple PDFs containing images into one PDF, with the option to select specific pages from PDF source files, specify the order and other things.
I'm using PyPDF ...
0
votes
0
answers
113
views
Python - Extract certain values from PDFs in a folder
I am using the below code to extract text from hundreds of PDF files in a specific folder:
from pypdf import PdfReader
import os
import glob
path = input("Enter the file path: ")
pattern = ...
0
votes
1
answer
445
views
Cleaner way to parse a PDF in Python
Looking to parse PDFs to glean relevant info.
Using pypdf and am able to extract text, but it's a bit of a slog formatting into something usable because it appears the PDFs are formatted and not ...
0
votes
1
answer
70
views
PyPDF2: When I combine pdf files some files are duplicated and others are missing
Sometimes when I try to combine PDF files, a page is duplicated where the next page is supposed to be.
Here is the code I use to combine the pdf files.
pdfFiles = []
for filename in os....
0
votes
0
answers
108
views
How can I insert data into a PDF form with input fields via Python? I am using pypdf
I am trying to automate the input process for my PDF form, which has input fields. I have extracted the form fields and data structures via the following method:
from pypdf import PdfWriter, ...
0
votes
0
answers
45
views
PyPDF 2, Writing New File, Unable To Read New File
I have a very basic PDF file. I am reading that PDF file, updating some of the fields of that file, and then writing a new file name with the code below.
from PyPDF2 import PdfReader, PdfWriter
from ...
0
votes
1
answer
585
views
How can I extract the PDF section/chapter titles with Python?
I want to add the page titles in the PDF to an array with a loop. I have tried many ways so far, but I have not succeeded. How can it be done?
I tried to do it by selecting the first lines on the page, ...
0
votes
0
answers
66
views
Python module to get PDF text coordinates
Is there a Python module that has the possibility of returning contents of a PDF file as a list of bounding boxes, with top and left coordinates and text value? Something like Firefox has would be ...
0
votes
0
answers
21
views
Error merging PDFs with the same name in 2 different folders
I tried this code, but I'm getting this error:
FileNotFoundError: [Errno 2] No such file or directory: './first_folder/same_name.pdf'
from PyPDF2 import PdfMerger #pip install PyPDF2
def merge_pdf(...
1
vote
2
answers
394
views
How do I add a hyperlink to the top of each page in a PDF in Python?
We are posting scanned and OCRed documents on a website and need to add a link to each page so that people who find the pages via a search engine easily get to the parent index of related documents.
I'...
0
votes
0
answers
262
views
Extracting field labels and details from IRS XFA/AcroForm using Python
I am currently working with IRS forms (U.S. Internal Revenue Service), which are in PDF format, specifically XFA or AcroForm. My aim is to extract not only the field names but also the corresponding ...
0
votes
0
answers
118
views
pypdf: how to set restrictions during PDF encryption
When I use PyMuPDF, I can set restrictions based on the criteria I need by using a command. Refer to the example below:
Using PyMuPDF:
perm = (fitz.PDF_PERM_PRINT # permit printing
)
But when I use pypdf, I ...
1
vote
1
answer
106
views
Why does copying text from this PDF give an N-1 Caesar shift?
This is my first post here, but I was absolutely bewildered, and I need to know why it happens:
I was experimenting with pypdf, hoping to use it for a larger text analysis project, and in lieu of any ...
0
votes
0
answers
53
views
Extracting replies to comments in a PDF file and sorting them
I'm working on a project in which I need to extract the comments in a PDF file and sort them based on their issue date and their replies (if there are any). Currently I'm using PdfReader from pypdf ...
0
votes
1
answer
84
views
Keep selected pages from PDF
I have a pandas dataframe, pdf_summary, which is sorted and has 50 unique rows. Each row is a particular combination of file_pages. How could I create a folder and PDF for each file_name?
pdf_path = &...
2
votes
3
answers
217
views
pypdf: arrange pages of different pdfs in a single page as a grid
I have several pdf files 1.pdf, 2.pdf, ..., n.pdf, each with 10 pages, each page being the same size.
I want to create another file summary.pdf
containing only one page, with all the pages of all the ...
-1
votes
2
answers
431
views
Dealing with PDFs containing both tables and non-tabular data using Camelot PDF parser [closed]
I am using the Camelot PDF parsing library to extract data from PDF files, but I am facing an issue when the PDFs contain both tables and non-tabular data. Camelot seems to only extract table data and ...
0
votes
1
answer
205
views
setting initial view with pypdf
I am trying to set the initial view of a PDF to fit the entire page with this code:
from pypdf import PdfWriter
writer = PdfWriter(clone_from="test.pdf")
if writer.viewer_preferences is None:
...
0
votes
1
answer
159
views
How to copy text from a cell on Excel file to a PDF form with python?
I'm developing a Python script that reads a specific cell from an Excel file, say B6, and transcribes it to a text box on a PDF form, let's say the text box is called Formulary unite_3.
Python ...
1
vote
1
answer
566
views
How to draw a vertical line on a PDF in Python?
I am working on a project which requires parsing a PDF hosted online for relevant data. The pdf can be found here. I started by using tabula-py to parse the table.
However, because it is in a flat-...
1
vote
1
answer
451
views
Why am I getting two different sets of coordinates from parsing a pdf file?
So, I am trying to parse a PDF file (30000x2000 points), using Python, that has all kinds of data on it, tables, lines, text, notes, images, etc. The goal: find a certain text string on the pdf and ...
0
votes
0
answers
37
views
pypdf/PyPDF2: Unwanted character substitution in links
I have a script that converts links in PDFs. The old links point to local files and the new links point to URLs. If the script encounters a link that references a location in the same PDF (ex. Index ...
0
votes
1
answer
576
views
Flattened filled PDF form is 'of invalid format' on Android, and shows blank fields in Chrome extension
I'm using pypdf (3.17.4) to fill a fillable PDF and then flatten the fields. The resulting PDF displays correctly in Acrobat Reader, but not on my Samsung S9, and not in the Chrome extension on Windows:
...
1
vote
1
answer
629
views
Extracting Arabic data from PDF using PyPDF2
I wanted to write a Python 3 function to extract data from an Arabic PDF file that has 235 pages and is 13.6 MB in size, focusing on extracting data from pages 51 to 67 inclusive, then filter the extracted ...
2
votes
2
answers
659
views
How to rotate+scale PDF pages around the center with pypdf?
I would like to rotate PDF pages around the center (other than just multiples of 90°) in a PDF document and optionally scale them to fit into the original page.
Here on StackOverflow, I found a few ...
0
votes
1
answer
92
views
Exclude page number from text when extracting from a PDF
I want to exclude the page number of a PDF from the actual text using the pypdf package:
from pypdf import PdfReader
reader = PdfReader("pdf-examples/kurdish-sample-2.pdf")
full_text = "&...
0
votes
0
answers
191
views
Extracting text from a PDF - python
I am new to Python and I am developing a program that takes a PDF file as input and converts it into text. I am using Python 3 and have tried the PyPDF2 and PDFMiner.six packages.
For the first PDF file it ...
0
votes
1
answer
1k
views
Extracting data from PDFs into CSV [duplicate]
I would like to extract all the data available on pages 4 to 605 of this PDF into CSV. Some people kindly suggested that I use pypdf.
I don't know how to use it. The structure of the PDF is complicated. ...
0
votes
3
answers
2k
views
How do I use PyPDF2 to read and display the contents of my PDF when run?
I have a dummy pdf that has words on it. The course I am using to learn uses PyPDF2 on python. Is there a way for PyPDF2 to actually read the words on the pdf rather than give me objects?
This is the ...
2
votes
0
answers
1k
views
PdfReadWarning: Advanced encoding /GBK-EUC-H not implemented yet
PdfReadWarning: Advanced encoding /GBK-EUC-H not implemented yet
Does anyone know how to solve this?
I haven't tried anything because I don't know which directory I should download the missing font to, such as: ...
1
vote
0
answers
285
views
How to make PyPDF2 annotations printable
I'm using the PyPDF2 library to add some annotations to a PDF. Here's the code that I'm using:
from PyPDF2 import PdfReader, PdfWriter
from PyPDF2.generic import AnnotationBuilder
reader = PdfReader(&...