Newest 'pdf+pypdf' Questions

0 votes

1 answer

43 views

pypdf or pikepdf advice needed on bookmarks

I am sorry, but I am unable to understand how to rearrange bookmarks in a PDF document. I have a PDF document with medical records which was created by repeatedly importing new items from individual ...

Vladimir Buzalka

43

asked Feb 14 at 7:13

0 votes

0 answers

44 views

Checking PDF form checkboxes using Python and pdfrw

Does anyone have experience with checking PDF form checkboxes? Within a Django application, based on annotations of PDF forms, I map and identify checkboxes that I want to either check or leave ...

Robert Soroka

41

asked Jan 31 at 15:12

1 vote

0 answers

29 views

Combine Duplicate Fonts When Appending PDF Files - Python

I am trying to combine 450 PDFs into a single PDF. There are only a few unique fonts among all the files. Once all the files are combined, I look at the structure through pdfcrowd.com and I can see the ...

Ryan Schunk

11

asked Jan 28 at 2:51

0 votes

1 answer

39 views

Correct format of ICC-based /ColorSpace in PDF

I am generating PDF files on-the-fly. The files contain JPEG images in the Adobe RGB (1998) colourspace, with the profile embedded. The PDF generation toolkit embeds the images correctly, but sets the ...

tobygriffin

5,421

asked Jan 27 at 22:54

0 votes

0 answers

52 views

How to add a form field to an existing pdf with python

I've been tasked with creating a Python script that will add a form field to an existing PDF file. It seemed rather straightforward, but I've hit a wall. The form field is being added (apparently not ...

SeminoleDog

79

asked Jan 18 at 18:46

0 votes

0 answers

48 views

How to read and update image form field in PDF, python?

Some pdf templates have images in them. Example:- When I read the dynamic form fields in the pdf using. reader = PdfReader(pdf_path) fields = reader.get_fields() for field in fields: field_name = ...

Rahul

1,005

asked Jan 17 at 7:32

1 vote

0 answers

51 views

pypdf merged PDFs have wrong page attributes

When using the pypdf merge function, I get a PDF file with invisible content. I found out that the coordinates of the page attributes MediaBox and CropBox have some errors. They look like this: /MediaBox [ 0 0 595 ...

Roman

11

asked Dec 29, 2024 at 17:39

0 votes

0 answers

32 views

Is there a way to scale up an image and rotate text in PyPDF

Working on a mini project to print tickets on an older printer, formatted the tickets to fit on the printer and print. But now they want the format of the ticket to be different. I need to scale up ...

hutch

1

asked Sep 17, 2024 at 14:23

0 votes

1 answer

85 views

How to merge two pages from a pdf file in Python such that they are one next to the other?

I'm basically trying to grab two pages and put them one next to the other using python since I want to print a booklet on some A4 pages. The issue I'm finding is that the translation of the page in ...

JGF

11

asked Sep 6, 2024 at 6:44

0 votes

1 answer

65 views

How to resize PDF pages to `Letter` using `pypdf`?

I noticed that pypdf doesn't have a PaperSize of Letter (it only has A0-A8 & C4). I have PDFs of various page sizes that I want to standardize and scale to Letter size. Is this possible with ...

ericOnline

2,008

asked Sep 2, 2024 at 5:21
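
For the Letter question above, the arithmetic can be sketched without any PDF library. This is a minimal sketch, assuming US Letter is 612 x 792 points (8.5 x 11 in at 72 points per inch) and that pages should be scaled uniformly and centered. The helper name `fit_to_letter` is hypothetical and not part of pypdf.

```python
# Hypothetical helper, not part of pypdf: computes how to fit a page into US Letter.
LETTER_WIDTH_PT = 612.0   # 8.5 in * 72 pt/in
LETTER_HEIGHT_PT = 792.0  # 11 in * 72 pt/in


def fit_to_letter(width, height):
    """Return (scale, dx, dy) that fits a width x height page inside Letter.

    scale is a uniform factor that preserves the aspect ratio;
    dx and dy are the offsets (in points) that center the scaled page.
    """
    if width <= 0 or height <= 0:
        raise ValueError("page dimensions must be positive")
    # The smaller ratio guarantees the scaled page fits in both directions.
    scale = min(LETTER_WIDTH_PT / width, LETTER_HEIGHT_PT / height)
    dx = (LETTER_WIDTH_PT - width * scale) / 2
    dy = (LETTER_HEIGHT_PT - height * scale) / 2
    return scale, dx, dy


# Example: an A4 page (595 x 842 pt) is height-limited when fitted to Letter.
scale, dx, dy = fit_to_letter(595, 842)
print(f"scale={scale:.4f} dx={dx:.2f} dy={dy:.2f}")
```

In pypdf, values like these would typically feed a `Transformation` (scale, then translate) applied with `add_transformation`, with the page's media box then set to 612 x 792. That wiring is not shown here and should be checked against the pypdf documentation.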

1 vote

2 answers

163 views

How to extract the body of a pdf?

I want to extract the body of a pdf. By body I mean the file format that a pdf parser/reader uses to render the pdf. Any language would work, but if you could tell me how to do it in python or Java, ...

somuchsonal

21

asked Aug 14, 2024 at 9:46

0 votes

0 answers

68 views

Crop PDF then Read Text from PDF

I am trying to avoid reading 'page ... of ...' on a pdf as it messes up with the other data being read. I thought it would be easiest to just crop the margins out and then read the pdf. I tried using ...

user26689279

11

asked Aug 8, 2024 at 4:01

0 votes

0 answers

100 views

I want to scrape a table from a PDF file into a pandas DataFrame using pypdf or tabula

How can I scrape data using a Python script with the pypdf or tabula library? Specifically, I need to extract Capital names that are listed in a non-table format from the attached PDF. The desired ...

Sanju M

15

asked Jul 23, 2024 at 6:22

0 votes

0 answers

73 views

How to take a Python Dictionary and Put Data Into Existing PDF?

I have code where I'm taking fields from a PDF, turning them into a list, and then into a dictionary. (I'm not sure if fields->list->dictionary is necessary or if fields->dictionary makes ...

Felicia

13

asked Jul 23, 2024 at 2:56

1 vote

2 answers

245 views

When using PyPDF2 for Python, how do I transfer data in CSV format to an existing PDF with blank form fields?

I am currently using the PyPDF2 library with Python. My data (originally a Google Form) was downloaded as a CSV file, and I am hoping to copy this data into an existing PDF with ...

Felicia

13

asked Jul 18, 2024 at 0:18

1 vote

1 answer

112 views

Is there a function in pypdf to get the page number of a field? (Python)

I'm trying to find an attribute or function that will return the page number/index of a field that I pass as an argument. E.g. get_field_page_number(field_name) -> int I want to be able to get a ...

mevans_fsi

21

asked Jul 3, 2024 at 20:45

0 votes

0 answers

111 views

Python PDF page size

I am trying to get the page sizes of the pages in my PDF. I have tried using both PyPDF2 and pdfminer, I get the same results from both - 423.024x639.024 for artbox, cropbox, etc, and 459.048x675.048 ...

calwex718

87

asked Jul 1, 2024 at 18:07

0 votes

1 answer

289 views

Create a blank page and add text content using PyPDF2: module 'PyPDF2' has no attribute 'pdf'

I am using this method to create a blank page, add text to it, and then append the page to a PDF: def add_text_to_blank_page(pdf_writer, text): # Create a new blank page page = PyPDF2._pdf....

Dhruv

645

asked May 29, 2024 at 13:48

0 votes

0 answers

97 views

Add image in Image Field with PDF Forms

I have PDF forms with a text field and an image field. How do I add an image to the image field? For text fields, the pypdf documentation has great information and I succeeded. But I fail to add an image in the Image ...

aideed programmer

1

asked May 29, 2024 at 3:40

1 vote

1 answer

156 views

PyPDF does not give me the right image

I am writing a python program to merge multiple PDFs containing images into one PDF, with the option to select specific pages from PDF source files, specify the order and other things. I'm using PyPDF ...

Andreas Kågedal

11

asked May 20, 2024 at 21:12

0 votes

0 answers

113 views

Python - Extract certain values from PDFs in a folder

I am using the below code to extract text from hundreds of PDF files in a specific folder: from pypdf import PdfReader import os import glob path = input("Enter the file path: ") pattern = ...

Mr Cs

1

asked May 6, 2024 at 15:37

0 votes

1 answer

445 views

Cleaner way to parse a PDF in Python

Looking to parse PDFs to glean relevant info. Using pypdf and am able to extract text, but it's a bit of a slog formatting into something usable because it appears the PDFs are formatted and not ...

Chris

1,702

asked May 1, 2024 at 19:39

0 votes

1 answer

70 views

PyPDF2: When I combine pdf files some files are duplicated and others are missing

Sometimes when I try to combine Pdf files, a page would be duplicated where the next page is supposed to be. Here is the code I use to combine the pdf files. pdfFiles = [] for filename in os....

Reach Miami

1

asked Apr 29, 2024 at 19:33

0 votes

0 answers

108 views

How can I insert data into a PDF form with input fields via Python? I am using pypdf

I am trying to automate the PDF form input process on my PDF form, which has input fields. I have extracted the form fields and data structures via the following method: from pypdf import PdfWriter, ...

Madhav Dhungana

556

asked Apr 29, 2024 at 4:12

0 votes

0 answers

45 views

PyPDF 2, Writing New File, Unable To Read New File

I have a very basic PDF file. I am reading that PDF file, updating some of the fields of that file, and then writing a new file name with the code below. from PyPDF2 import PdfReader, PdfWriter from ...

Josh

380

asked Apr 24, 2024 at 4:35

0 votes

1 answer

585 views

How can I extract the PDF section/chapter titles with Python?

I want to add the page titles in the PDF to an array with a loop. I have tried many ways so far, but I have not succeeded. How can it be done? I tried to do it by selecting the first lines on the page, ...

gofQ

1

asked Apr 19, 2024 at 19:46

0 votes

0 answers

66 views

Python module to get PDF text coordinates

Is there a Python module that has the possibility of returning contents of a PDF file as a list of bounding boxes, with top and left coordinates and text value? Something like Firefox has would be ...

DoctorEvil

473

asked Apr 19, 2024 at 13:01

0 votes

0 answers

21 views

Error merging PDFs with the same name in two different folders

I tried this code, but I'm getting this error: FileNotFoundError: [Errno 2] No such file or directory: './first_folder/same_name.pdf' from PyPDF2 import PdfMerger #pip install PyPDF2 def merge_pdf(...

Taha Hussein

1

asked Apr 9, 2024 at 21:02

1 vote

2 answers

394 views

How do I add a hyperlink to the top of each page in a PDF in Python?

We are posting scanned and OCRed documents on a website and need to add a link to each page so that people who find the pages via a search engine easily get to the parent index of related documents. I'...

Mark Olson

148

asked Mar 17, 2024 at 19:01

0 votes

0 answers

262 views

Extracting field labels and details from IRS XFA/AcroForm using Python

I am currently working with IRS forms (U.S. Internal Revenue Service), which are in PDF format, specifically XFA or AcroForm. My aim is to extract not only the field names but also the corresponding ...

Gopa

1

asked Mar 15, 2024 at 11:30

0 votes

0 answers

118 views

pypdf: how to set restrictions during PDF encryption

When I use PyMuPDF, I can set restrictions based on the criteria I need with a command. Refer to the example below. Using PyMuPDF: perm = (fitz.PDF_PERM_PRINT # permit printing ) But when I use pypdf, I ...

daniel

1

asked Mar 13, 2024 at 5:21

1 vote

1 answer

106 views

Why does copying text from this PDF give an N-1 Caesar shift?

This is my first post here, but I was absolutely bewildered, and I need to know why it happens: I was experimenting with pypdf, hoping to use it for a larger text analysis project, and in lieu of any ...

Don D. Dizzle

11

asked Mar 12, 2024 at 21:49

0 votes

0 answers

53 views

Extracting replies to comments in a PDF file and sorting them

I'm working on a project in which I need to extract the comments in a PDF file and sort them based on their issue date and their replies (if there are any). Currently I'm using PdfReader from pypdf ...

Amir

1

asked Mar 8, 2024 at 8:57

0 votes

1 answer

84 views

Keep selected pages from PDF

I have a pandas dataframe, pdf_summary, which is sorted and has 50 unique rows. Each row is a particular combination of file_pages. How could I create a folder and PDF for each file_name? pdf_path = &...

asd

1,309

asked Mar 7, 2024 at 7:21

2 votes

3 answers

217 views

pypdf: arrange pages of different pdfs in a single page as a grid

I have several pdf files 1.pdf, 2.pdf, ..., n.pdf, each with 10 pages, each page being the same size. I want to create another file summary.pdf containing only one page, with all the pages of all the ...

Tom M. Ragonneau

75

asked Mar 1, 2024 at 9:27

-1 votes

2 answers

431 views

Dealing with PDFs containing both tables and non-tabular data using Camelot PDF parser [closed]

I am using the Camelot PDF parsing library to extract data from PDF files, but I am facing an issue when the PDFs contain both tables and non-tabular data. Camelot seems to only extract table data and ...

Pankaj Jaiswal

91

asked Feb 24, 2024 at 4:58

0 votes

1 answer

205 views

setting initial view with pypdf

I try to set the initial view of a pdf to fit the entire page with this code: from pypdf import PdfWriter writer = PdfWriter(clone_from="test.pdf") if writer.viewer_preferences is None: ...

Frans van Haandel

3

asked Feb 18, 2024 at 16:27

0 votes

1 answer

159 views

How to copy text from a cell in an Excel file to a PDF form with Python?

I'm developing a Python script that reads a specific cell from an Excel file, say B6, and transcribes it to a text box on a PDF form, let's say the text box is called Formulary unite_3. Python ...

L30_P3reZ

11

asked Feb 16, 2024 at 23:07

1 vote

1 answer

566 views

How to draw a vertical line on a PDF in Python?

I am working on a project which requires parsing a PDF hosted online for relevant data. The pdf can be found here. I started by using tabula-py to parse the table. However, because it is in a flat-...

Noah Pragin

35

asked Feb 15, 2024 at 22:16

1 vote

1 answer

451 views

Why am I getting two different sets of coordinates from parsing a pdf file?

So, I am trying to parse a PDF file (30000x2000 points), using Python, that has all kinds of data on it, tables, lines, text, notes, images, etc. The goal: find a certain text string on the pdf and ...

Yurii Gul

13

asked Feb 14, 2024 at 21:11

0 votes

0 answers

37 views

pypdf/PyPDF2: Unwanted character substitution in links

I have a script that converts links in PDFs. The old links point to local files and the new links point to URLs. If the script encounters a link that references a location in the same PDF (ex. Index ...

ZLL

3

asked Feb 13, 2024 at 20:38

0 votes

1 answer

576 views

Flattened filled PDF form is 'of invalid format' on Android, and shows blank fields in Chrome extension

I'm using pypdf (3.17.4) to fill a fillable PDF then flatten the fields. The resulting PDF displays correctly in Acrobat Reader, but, not on my Samsung S9, and not in the Chrome extension on Windows: ...

Tom Grundy

826

asked Jan 22, 2024 at 0:53

1 vote

1 answer

629 views

Extracting Arabic data from PDF using PyPDF2

I wanted to write a Python 3 function to extract data from an Arabic PDF file that has 235 pages and a size of 13.6 MB, focusing on extracting data from pages 51 to 67 inclusive, then filter the extracted ...

Mahmoud Ezz

15

asked Jan 13, 2024 at 11:17

2 votes

2 answers

659 views

How to rotate+scale PDF pages around the center with pypdf?

I would like to rotate PDF pages around the center (other than just multiples of 90°) in a PDF document and optionally scale them to fit into the original page. Here on StackOverflow, I found a few ...

theozh

26.1k

asked Jan 10, 2024 at 10:46

0 votes

1 answer

92 views

Exclude page number from text when extracting from a PDF

I want to exclude the page number of a PDF from the actual text using pypdf package from pypdf import PdfReader reader = PdfReader("pdf-examples/kurdish-sample-2.pdf") full_text = "&...

Hama Sabah

434

asked Jan 6, 2024 at 12:26

0 votes

0 answers

191 views

Extracting text from a PDF - python

I am new to Python and I am developing a program that takes a PDF file as input and converts it into text. I am using Python 3 and tried the PyPDF2 and PDFMiner.six packages. For the first PDF file it ...

Sana'a Al-ahdal

1,780

asked Dec 24, 2023 at 19:41

0 votes

1 answer

1k views

Extracting data from PDFs into CSV [duplicate]

I would like to extract all data available on pages 4 to 605 of this PDF into CSV. Some people kindly suggested that I use pypdf. I don't know how to use it. The structure of the PDF is complicated. ...

Michael Picazo

15

asked Dec 11, 2023 at 16:25

0 votes

3 answers

2k views

How do I use PyPDF2 to read and display the contents of my PDF when run?

I have a dummy pdf that has words on it. The course I am using to learn uses PyPDF2 on python. Is there a way for PyPDF2 to actually read the words on the pdf rather than give me objects? This is the ...

Alex S.

7

asked Dec 10, 2023 at 14:33

2 votes

0 answers

1k views

PdfReadWarning: Advanced encoding /GBK-EUC-H not implemented yet

PdfReadWarning: Advanced encoding /GBK-EUC-H not implemented yet. Does anyone know how to solve this? I haven't tried anything because I don't know which directory I should download the missing font to, such as: ...

Shuan

21

asked Dec 10, 2023 at 13:26

1 vote

0 answers

285 views

How to make PyPDF2 annotations printable

I'm using the PyPDF2 library to add some annotations to a PDF. Here's the code that I'm using: from PyPDF2 import PdfReader, PdfWriter from PyPDF2.generic import AnnotationBuilder reader = PdfReader(&...

rav2001

377

asked Dec 10, 2023 at 4:57
