All Questions

0 votes
1 answer
43 views

pypdf or pikepdf advice needed on bookmarks

I'm sorry, but I can't work out how to rearrange bookmarks in a PDF document. I have a PDF document with medical records which was created by importing new items one after another from individual ...
Vladimir Buzalka
0 votes
0 answers
44 views

Checking PDF form checkboxes using Python and pdfrw

Does anyone have experience with checking PDF form checkboxes? The case is that within a Django application, I use the annotations of PDF forms to map and identify the checkboxes I want to either check or leave ...
Robert Soroka
1 vote
0 answers
29 views

Combine Duplicate Fonts When Appending PDF Files - Python

I am trying to combine 450 PDFs into a single PDF. There are only a few unique fonts among all the files. Once all the files are combined, I look at the structure through pdfcrowd.com and I can see the ...
Ryan Schunk
0 votes
1 answer
39 views

Correct format of ICC-based /ColorSpace in PDF

I am generating PDF files on-the-fly. The files contain JPEG images in the Adobe RGB (1998) colourspace, with the profile embedded. The PDF generation toolkit embeds the images correctly, but sets the ...
tobygriffin (reputation 5,421)
0 votes
0 answers
52 views

How to add a form field to an existing pdf with python

I've been tasked with creating a Python script that will add a form field to an existing PDF file. It seemed rather straightforward, but I've hit a wall. The form field is being added (apparently not ...
SeminoleDog
0 votes
0 answers
48 views

How to read and update an image form field in a PDF with Python?

Some PDF templates have images in them. For example, when I read the dynamic form fields in the PDF using reader = PdfReader(pdf_path) fields = reader.get_fields() for field in fields: field_name = ...
Rahul (reputation 1,005)
1 vote
0 answers
51 views

pypdf: merged PDFs have wrong page attributes

When using the pypdf merge function, I get a PDF file with invisible content. I found out that the coordinates of the page attributes MediaBox and CropBox have some errors. They look like this: /MediaBox [ 0 0 595 ...
Roman (reputation 11)
0 votes
0 answers
32 views

Is there a way to scale up an image and rotate text in PyPDF

I'm working on a mini project to print tickets on an older printer. I formatted the tickets to fit the printer and printed them, but now they want the format of the ticket to be different. I need to scale up ...
hutch (reputation 1)
0 votes
1 answer
85 views

How to merge two pages from a pdf file in Python such that they are one next to the other?

I'm basically trying to grab two pages and put them one next to the other using Python, since I want to print a booklet on some A4 pages. The issue I'm finding is that the translation of the page in ...
JGF (reputation 11)
0 votes
1 answer
65 views

How to resize PDF pages to `Letter` using `pypdf`?

I noticed that pypdf doesn't have a PaperSize for Letter (it only has A0-A8 and C4). I have PDFs of various page sizes that I want to standardize and scale to Letter size. Is this possible with ...
ericOnline (reputation 2,008)
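Letter does not need a named PaperSize entry: it is 8.5 x 11 in, i.e. 612 x 792 PDF points (1 pt = 1/72 in). Below is a minimal sketch of the arithmetic, assuming you want uniform scaling (aspect ratio preserved) with the page centred; `fit_to_letter` is a hypothetical helper, not part of pypdf. The resulting values could then be applied with pypdf's Transformation (scale, then translate) and page.add_transformation, with the page boxes set to 612 x 792, but check those calls against your installed version.

```python
# US Letter in PDF user-space units (1 pt = 1/72 inch): 8.5 x 11 in.
LETTER_WIDTH_PT = 8.5 * 72   # 612.0
LETTER_HEIGHT_PT = 11 * 72.0  # 792.0


def fit_to_letter(width: float, height: float) -> tuple[float, float, float]:
    """Return (scale, dx, dy) that fits a width x height page inside Letter.

    The page is scaled uniformly (aspect ratio preserved) and centred, so one
    dimension fills Letter exactly and the other gets equal margins.
    """
    scale = min(LETTER_WIDTH_PT / width, LETTER_HEIGHT_PT / height)
    dx = (LETTER_WIDTH_PT - width * scale) / 2
    dy = (LETTER_HEIGHT_PT - height * scale) / 2
    return scale, dx, dy


# An A4 page (roughly 595 x 842 pt) is limited by its height on Letter.
print(fit_to_letter(595, 842))
```

Scaling by the smaller of the two ratios is what keeps the content undistorted; scaling each axis independently would fill the page but stretch the content.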
1 vote
2 answers
163 views

How to extract the body of a pdf?

I want to extract the body of a pdf. By body I mean the file format that a pdf parser/reader uses to render the pdf. Any language would work, but if you could tell me how to do it in python or Java, ...
somuchsonal
0 votes
0 answers
68 views

Crop PDF then Read Text from PDF

I am trying to avoid reading 'page ... of ...' on a PDF, as it interferes with the other data being read. I thought it would be easiest to just crop the margins out and then read the PDF. I tried using ...
user26689279
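Cropping is one option. A different technique, sketched here under the assumption that the footer comes out of text extraction as a line of its own (this varies between PDFs), is to extract the text normally and drop the lines that match "Page N of M". `strip_page_footers` is a hypothetical helper; the input text could come from, e.g., "\n".join(page.extract_text() for page in reader.pages) with pypdf.

```python
import re

# A line consisting only of "Page N of M" (any case, optional surrounding spaces).
PAGE_FOOTER = re.compile(r"\s*page\s+\d+\s+of\s+\d+\s*", re.IGNORECASE)


def strip_page_footers(text: str) -> str:
    """Drop every line of extracted text that is just a 'Page N of M' footer."""
    return "\n".join(
        line for line in text.splitlines() if not PAGE_FOOTER.fullmatch(line)
    )


print(strip_page_footers("Name: Alice\nPage 1 of 3\nTotal: 42"))
```

Using fullmatch means a sentence such as "see page 2 of the report" is left alone; only lines that are nothing but the footer are removed.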
0 votes
0 answers
100 views

I want to scrape a table from a PDF file into a pandas DataFrame using pypdf or tabula

How can I scrape data using a Python script with the pypdf or tabula library? Specifically, I need to extract Capital names that are listed in a non-table format from the attached PDF. The desired ...
Sanju M (reputation 15)
0 votes
0 answers
73 views

How to take a Python Dictionary and Put Data Into Existing PDF?

I have code where I'm taking fields from a PDF, turning them into a list, and then into a dictionary. (I'm not sure if fields->list->dictionary is necessary or if fields->dictionary makes ...
Felicia (reputation 13)
1 vote
2 answers
245 views

When using PyPDF2 for Python, how do I transfer data in CSV format to an existing PDF with blank form fields?

I am currently using the PyPDF2 library with Python. My data was originally a Google Form, which I downloaded as a CSV file, and I am hoping to copy this data into an existing PDF with ...
Felicia (reputation 13)
1 vote
1 answer
112 views

Is there a function in pypdf to get the page number of a field? (Python)

I'm trying to find an attribute or function that will return the page number/index of a field that I pass as an argument, e.g. get_field_page_number(field_name) -> int. I want to be able to get a ...
mevans_fsi
0 votes
0 answers
111 views

Python PDF page size

I am trying to get the page sizes of the pages in my PDF. I have tried using both PyPDF2 and pdfminer, I get the same results from both - 423.024x639.024 for artbox, cropbox, etc, and 459.048x675.048 ...
calwex718
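PDF page boxes are expressed in points, where 1 pt = 1/72 inch, so the sizes reported above can be sanity-checked by converting them to physical units. A small sketch (the helper names are illustrative, not from any library):

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4


def points_to_inches(pt: float) -> float:
    return pt / POINTS_PER_INCH


def points_to_mm(pt: float) -> float:
    return pt / POINTS_PER_INCH * MM_PER_INCH


def describe(width_pt: float, height_pt: float) -> str:
    """Format a box size given in PDF points as inches and millimetres."""
    return (
        f"{points_to_inches(width_pt):.3f} x {points_to_inches(height_pt):.3f} in "
        f"({points_to_mm(width_pt):.1f} x {points_to_mm(height_pt):.1f} mm)"
    )


print(describe(423.024, 639.024))
print(describe(459.048, 675.048))
```

By this conversion the two reported sizes differ by 36.024 pt (about 0.5 in) in each dimension; which box the larger size belongs to is cut off in the excerpt.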
0 votes
1 answer
289 views

Create a blank page and add text content using PyPDF2: module 'PyPDF2' has no attribute 'pdf'

I am using this method to create a blank page, add text to it, and then append the page to a PDF: def add_text_to_blank_page(pdf_writer, text): # Create a new blank page page = PyPDF2._pdf....
Dhruv (reputation 645)
0 votes
0 answers
97 views

Add image in Image Field with PDF Forms

I have PDF forms with a Text field and an Image field. How do I add an image to the Image field? For text fields, the pypdf documentation shows great information and I succeeded. But I fail to add an image in the Image ...
aideed programmer
1 vote
1 answer
156 views

PyPDF does not give me the right image

I am writing a python program to merge multiple PDFs containing images into one PDF, with the option to select specific pages from PDF source files, specify the order and other things. I'm using PyPDF ...
Andreas Kågedal
0 votes
0 answers
113 views

Python - Extract certain values from PDFs in a folder

I am using the below code to extract text from hundreds of PDF files in a specific folder: from pypdf import PdfReader import os import glob path = input("Enter the file path: ") pattern = ...
Mr Cs (reputation 1)
0 votes
1 answer
445 views

Cleaner way to parse a PDF in Python

Looking to parse PDFs to glean relevant info. Using pypdf and am able to extract text, but it's a bit of a slog formatting into something usable because it appears the PDFs are formatted and not ...
Chris (reputation 1,702)
0 votes
1 answer
70 views

PyPDF2: When I combine pdf files some files are duplicated and others are missing

Sometimes when I try to combine PDF files, a page is duplicated where the next page is supposed to be. Here is the code I use to combine the PDF files. pdfFiles = [] for filename in os....
Reach Miami
0 votes
0 answers
108 views

How can I insert data via Python into a PDF form that has input fields? I am using pypdf

I am trying to automate the input process for my PDF form, which has input fields. I have extracted the form fields and data structures via the following method: from pypdf import PdfWriter, ...
Madhav Dhungana
0 votes
0 answers
45 views

PyPDF2: writing a new file, unable to read the new file

I have a very basic PDF file. I am reading that PDF file, updating some of the fields of that file, and then writing a new file name with the code below. from PyPDF2 import PdfReader, PdfWriter from ...
Josh (reputation 380)
0 votes
1 answer
585 views

How can I extract the PDF section/chapter titles with Python?

I want to add the page titles in the PDF to an array with a loop. I have tried many ways so far, but I have not succeeded. How can it be done? I tried to do it by selecting the first lines on the page, ...
gofQ (reputation 1)
0 votes
0 answers
66 views

Python module to get PDF text coordinates

Is there a Python module that has the possibility of returning contents of a PDF file as a list of bounding boxes, with top and left coordinates and text value? Something like Firefox has would be ...
DoctorEvil
0 votes
0 answers
21 views

Error merging PDFs with the same name in 2 different folders

I tried this code, but I'm getting this error: FileNotFoundError: [Errno 2] No such file or directory: './first_folder/same_name.pdf' from PyPDF2 import PdfMerger #pip install PyPDF2 def merge_pdf(...
Taha Hussein
1 vote
2 answers
394 views

How do I add a hyperlink to the top of each page in a PDF in Python?

We are posting scanned and OCRed documents on a website and need to add a link to each page so that people who find the pages via a search engine easily get to the parent index of related documents. I'...
Mark Olson
0 votes
0 answers
262 views

Extracting field labels and details from IRS XFA/AcroForm using Python

I am currently working with IRS forms (U.S. Internal Revenue Service), which are in PDF format, specifically XFA or AcroForm. My aim is to extract not only the field names but also the corresponding ...
Gopa (reputation 1)
0 votes
0 answers
118 views

pypdf: how to set restrictions during PDF encryption

When I use PyMuPDF, I can set restrictions based on the criteria I need. See the example below. Using PyMuPDF: perm = (fitz.PDF_PERM_PRINT # permit printing ) but when I use pypdf, I ...
daniel (reputation 1)
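In PyMuPDF the restrictions are combined from named flags. In pypdf, recent versions accept a permissions_flag argument to PdfWriter.encrypt (with flag values in pypdf.constants.UserAccessPermissions); check the signature in your installed version. Either way, the flags end up as the signed 32-bit /P entry defined by the PDF specification. The sketch below computes that value directly to show which bits a given set of restrictions corresponds to; the permission names are illustrative labels, not a library API, and it assumes a Standard security handler of revision 3 or later.

```python
# User access permission bits from the PDF specification
# (bit 1 is the least significant bit).
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}

# Bits 7-8 and 13-32 are reserved and must be 1; bits 1-2 must be 0.
RESERVED_ONES = (0b11 << 6) | 0xFFFFF000


def permissions_value(*allowed: str) -> int:
    """Return the signed 32-bit /P value granting only the named permissions."""
    p = RESERVED_ONES
    for name in allowed:
        p |= 1 << (PERMISSION_BITS[name] - 1)
    # /P is stored as a signed 32-bit integer in the encryption dictionary.
    return p - (1 << 32) if p & 0x80000000 else p


print(permissions_value("print"))
```

For example, permissions_value("print") gives -3900 (printing only), and granting every permission gives -4.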
1 vote
1 answer
106 views

Why does copying text from this PDF give an N-1 Caesar shift?

This is my first post here, but I was absolutely bewildered, and I need to know why it happens: I was experimenting with pypdf, hoping to use it for a larger text analysis project, and in lieu of any ...
Don D. Dizzle
0 votes
0 answers
53 views

Extracting replies to comments in a PDF file and sorting them

I'm working on a project in which I need to extract the comments on a PDF file and sort them based on their issuing date and their replies (if there are any). Currently I'm using PdfReader from pypdf ...
Amir (reputation 1)
0 votes
1 answer
84 views

Keep selected pages from PDF

I have a pandas dataframe, pdf_summary, which is sorted and has 50 unique rows. Each row is a particular combination of file_pages. How could I create a folder and PDF for each file_name? pdf_path = &...
asd (reputation 1,309)
2 votes
3 answers
217 views

pypdf: arrange pages of different pdfs in a single page as a grid

I have several pdf files 1.pdf, 2.pdf, ..., n.pdf, each with 10 pages, each page being the same size. I want to create another file summary.pdf containing only one page, with all the pages of all the ...
Tom M. Ragonneau
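The layout itself is plain arithmetic: choose a grid, compute one uniform scale that fits a page into a cell, and offset each page into its cell. Below is a sketch under stated assumptions (all pages the same size, filled row by row from the top-left; `grid_layout` is a hypothetical helper). Each (scale, tx, ty) could then be applied in pypdf by merging the source page onto a blank page with something like Transformation().scale(s, s).translate(tx, ty) and merge_transformed_page; verify these against your pypdf version.

```python
import math


def grid_layout(n_pages: int, page_w: float, page_h: float, cols: int,
                sheet_w: float, sheet_h: float) -> list[tuple[float, float, float]]:
    """Return (scale, tx, ty) for each of n_pages equal-size pages on one sheet.

    Pages are placed row by row starting at the top-left cell. PDF user space
    has its origin at the bottom-left, so row 0 gets the largest y offset.
    """
    rows = math.ceil(n_pages / cols)
    cell_w, cell_h = sheet_w / cols, sheet_h / rows
    # One uniform scale so every page fits inside a cell without distortion.
    scale = min(cell_w / page_w, cell_h / page_h)
    placements = []
    for i in range(n_pages):
        row, col = divmod(i, cols)
        # Centre the scaled page inside its cell.
        tx = col * cell_w + (cell_w - page_w * scale) / 2
        ty = (rows - 1 - row) * cell_h + (cell_h - page_h * scale) / 2
        placements.append((scale, tx, ty))
    return placements


print(grid_layout(4, 100, 100, 2, 200, 200))
```

With 3 files of 10 pages (30 pages in total) and cols=6, for instance, the grid has 5 rows.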
-1 votes
2 answers
431 views

Dealing with PDFs containing both tables and non-tabular data using Camelot PDF parser [closed]

I am using the Camelot PDF parsing library to extract data from PDF files, but I am facing an issue when the PDFs contain both tables and non-tabular data. Camelot seems to only extract table data and ...
Pankaj Jaiswal
0 votes
1 answer
205 views

setting initial view with pypdf

I try to set the initial view of a pdf to fit the entire page with this code: from pypdf import PdfWriter writer = PdfWriter(clone_from="test.pdf") if writer.viewer_preferences is None: ...
Frans van Haandel
0 votes
1 answer
159 views

How to copy text from a cell on Excel file to a PDF form with python?

I'm developing a Python script that reads a specific cell from an Excel file, say B6, and transcribes it to a text box on a PDF form, let's say the text box is called Formulary unite_3. Python ...
L30_P3reZ
1 vote
1 answer
566 views

How to draw a vertical line on a PDF in Python?

I am working on a project which requires parsing a PDF hosted online for relevant data. The pdf can be found here. I started by using tabula-py to parse the table. However, because it is in a flat-...
Noah Pragin
1 vote
1 answer
451 views

Why am I getting two different sets of coordinates from parsing a pdf file?

So, I am trying to parse a PDF file (30000x2000 points), using Python, that has all kinds of data on it, tables, lines, text, notes, images, etc. The goal: find a certain text string on the pdf and ...
Yurii Gul
0 votes
0 answers
37 views

pypdf/PyPDF2: Unwanted character substitution in links

I have a script that converts links in PDFs. The old links point to local files and the new links point to URLs. If the script encounters a link that references a location in the same PDF (ex. Index ...
ZLL (reputation 3)
0 votes
1 answer
576 views

Flattened filled PDF form is 'of invalid format' on Android, and shows blank fields in Chrome extension

I'm using pypdf (3.17.4) to fill a fillable PDF then flatten the fields. The resulting PDF displays correctly in Acrobat Reader, but, not on my Samsung S9, and not in the Chrome extension on Windows: ...
Tom Grundy
1 vote
1 answer
629 views

Extracting Arabic data from PDF using PyPDF2

I wanted to write a Python 3 function to extract data from an Arabic PDF file that has 235 pages and a size of 13.6 MB, focusing on extracting data from pages 51 to 67 inclusive, then filter the extracted ...
Mahmoud Ezz
2 votes
2 answers
659 views

How to rotate+scale PDF pages around the center with pypdf?

I would like to rotate PDF pages around the center (other than just multiples of 90°) in a PDF document and optionally scale them to fit into the original page. Here on StackOverflow, I found a few ...
theozh (reputation 26.1k)
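A PDF rotation matrix turns content around the origin of user space (normally the bottom-left corner), so rotating around the page centre means folding a translation into the matrix, and scaling to fit means shrinking by the ratio between the page and the rotated page's bounding box. A minimal sketch that builds the six-number matrix; `rotate_scale_about_centre` is a hypothetical helper, and it assumes the page's lower-left corner is at (0, 0). The tuple should be usable as the ctm of pypdf's Transformation and applied with page.add_transformation, but confirm that against your pypdf version.

```python
import math


def rotate_scale_about_centre(width: float, height: float, angle_deg: float,
                              fit: bool = True) -> tuple[float, ...]:
    """Return a PDF matrix (a, b, c, d, e, f) rotating a page about its centre.

    PDF maps (x, y) to (a*x + c*y + e, b*x + d*y + f). Positive angles rotate
    counter-clockwise. With fit=True the page is also scaled down so that its
    rotated bounding box fits inside the original width x height.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    scale = 1.0
    if fit:
        # Size of the axis-aligned bounding box of the rotated rectangle.
        bbox_w = abs(width * cos_t) + abs(height * sin_t)
        bbox_h = abs(width * sin_t) + abs(height * cos_t)
        scale = min(width / bbox_w, height / bbox_h)
    cx, cy = width / 2, height / 2
    a, b = scale * cos_t, scale * sin_t
    c, d = -scale * sin_t, scale * cos_t
    # Translation chosen so that the centre (cx, cy) maps to itself.
    e = cx - (a * cx + c * cy)
    f = cy - (b * cx + d * cy)
    return a, b, c, d, e, f


print(rotate_scale_about_centre(595, 842, 30))
```

Because the rotated rectangle stays centred on the page centre, scaling its bounding box down to the page size is enough to keep all four corners on the page.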
0 votes
1 answer
92 views

Exclude page number from text when extracting from a PDF

I want to exclude the page number of a PDF from the actual text using pypdf package from pypdf import PdfReader reader = PdfReader("pdf-examples/kurdish-sample-2.pdf") full_text = "&...
Hama Sabah
0 votes
0 answers
191 views

Extracting text from a PDF - python

I am new to Python and I am developing a program that takes a PDF file as input and converts it into text, I am using Python 3 and tried the PyPDF2 and PDFMiner.six packages. For the first PDF file it ...
Sana'a Al-ahdal
0 votes
1 answer
1k views

Extracting data from PDFs into CSV [duplicate]

I would like to extract all the data from pages 4 to 605 of this PDF into CSV. Some people kindly suggested I use pypdf. I don't know how to use it. The structure of the PDF is complicated. ...
Michael Picazo
0 votes
3 answers
2k views

How do I use PyPDF2 to read and display the contents of my PDF when run?

I have a dummy PDF that has words on it. The course I am using to learn uses PyPDF2 in Python. Is there a way for PyPDF2 to actually read the words on the PDF rather than give me objects? This is the ...
Alex S.
2 votes
0 answers
1k views

PdfReadWarning: Advanced encoding /GBK-EUC-H not implemented yet

PdfReadWarning: Advanced encoding /GBK-EUC-H not implemented yet. Does anyone know how to solve this? I haven't tried anything because I don't know which directory I should download the missing font into, such as: ...
Shuan (reputation 21)
1 vote
0 answers
285 views

How to make PyPDF2 annotations printable

I'm using the PyPDF2 library to add some annotations to a PDF. Here's the code that I'm using: from PyPDF2 import PdfReader, PdfWriter from PyPDF2.generic import AnnotationBuilder reader = PdfReader(&...
rav2001 (reputation 377)
