Newest 'pdf+python' Questions

-1 votes

3 answers

60 views

How to generate a PDF with a grid of images per page?

Our work involves visually inspecting a number of plots together. All plots are the same size. We want to print them on pages to study. Something like an 8.5"x11" paper with 1" margin gives ...

BiGYaN

7,177

asked 16 hours ago
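
The layout arithmetic behind the question above can be sketched without any PDF library. This is only a minimal sketch, assuming US Letter paper with 1-inch margins as in the excerpt; the cell size, the gap, and the names `grid_layout` and `pages_needed` are hypothetical and not taken from the question.

```python
import math


def grid_layout(page_w=8.5, page_h=11.0, margin=1.0,
                cell_w=2.0, cell_h=2.0, gap=0.25):
    """Return (cols, rows, positions) for equal-size cells on one page.

    All lengths are in inches. The cell size and gap defaults are
    illustrative assumptions, not values from the question.
    """
    usable_w = page_w - 2 * margin
    usable_h = page_h - 2 * margin
    # n cells need n*cell + (n-1)*gap <= usable,
    # which rearranges to n <= (usable + gap) / (cell + gap).
    cols = int((usable_w + gap) // (cell_w + gap))
    rows = int((usable_h + gap) // (cell_h + gap))
    # Offsets of each cell's corner, measured from the page corner
    # chosen as origin, listed row by row.
    positions = [
        (margin + c * (cell_w + gap), margin + r * (cell_h + gap))
        for r in range(rows)
        for c in range(cols)
    ]
    return cols, rows, positions


def pages_needed(n_plots, cols, rows):
    """Number of pages required to place n_plots in a cols x rows grid."""
    per_page = cols * rows
    if per_page == 0:
        raise ValueError("cells do not fit inside the margins")
    return math.ceil(n_plots / per_page)
```

The returned offsets could then be passed to whichever drawing library places the images; that step is left out here because the excerpt does not say which one is in use.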

0 votes

0 answers

18 views

How to suppress popup dialog box when converting pdf to docx using pywin32

I'm running a Python script on a Windows laptop to convert some sample PDF files to DOCX. However, for each file, a dialog box pops up that prompts me to click OK when the script tries to convert said ...

ClusterPhuck69

1

asked yesterday

0 votes

1 answer

49 views

How to create a searchable PDF using Python and Selenium?

I want to create a program like FireShot (premium version) to take a webpage on chromedriver and convert it into a pdf. Currently this is the code I came up with: import time import os import glob ...

salt lake

33

asked 2 days ago

0 votes

0 answers

47 views

Page number in PyMuPDF multiprocessing with extract_text

The PyMuPDF documentation states that PyMuPDF does not support running on multiple threads, so they use multiprocessing, and they do this weird thing with segments in the example code: seg_size = int(...

Michał Darowny

414

asked Feb 27 at 9:26
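
The idea of splitting a document into page segments, one per worker process, can be sketched in plain Python. This is only a minimal sketch under stated assumptions: the excerpt truncates the original `seg_size` expression, so the ceiling division below is an assumption rather than the documentation's formula, and `segment_bounds` and `absolute_page` are hypothetical helper names.

```python
import math


def segment_bounds(num_pages, num_workers):
    """Split pages 0..num_pages-1 into contiguous (start, stop) ranges.

    `stop` is exclusive. The ceiling division is an assumed way of
    sizing segments; the question's own formula is truncated.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    seg_size = math.ceil(num_pages / num_workers)
    bounds = []
    for idx in range(num_workers):
        start = idx * seg_size
        stop = min(start + seg_size, num_pages)
        if start < stop:  # skip workers that would receive no pages
            bounds.append((start, stop))
    return bounds


def absolute_page(segment_start, local_index):
    """Map a page index inside a segment back to its document page index."""
    return segment_start + local_index
```

Each worker would receive one `(start, stop)` pair and report results keyed by `absolute_page(start, i)`, so the page number survives the split regardless of which process handled it.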

2 votes

0 answers

55 views

Preserve and Reapply PDF Layout When Editing Text and Images with PyMuPDF and PyQt6

I'm working on a PyQt6-based PDF editor using PyMuPDF (fitz). My goal is to extract all text and images from a PDF while preserving their original positions and dimensions, allow users to edit/move ...

Yousef Hashem

57

asked Feb 19 at 14:43

0 votes

2 answers

169 views

Extract tables from PDF files

I am conducting research on p-hacking, which requires accurately extracting tables from published academic papers. I have downloaded a large number of PDF files for this purpose. So far, I have tried ...

Buoyant Xu

77

asked Feb 16 at 12:41

0 votes

0 answers

21 views

fpdf2 and multi_cell with python

I'm trying to create a PDF document. There are long and short texts. Word wrapping exists only in multi_cell (though it is drawn as a “ladder” by default). Due to the fact that you need to write text ...

Stanislav

3

asked Feb 14 at 13:23

0 votes

1 answer

43 views

pypdf or pikepdf advice needed on bookmarks

I am sorry, but I am unable to understand how to rearrange bookmarks in a PDF document. I have a PDF document with medical records, which was created by repeatedly importing new items from individual ...

Vladimir Buzalka

43

asked Feb 14 at 7:13

0 votes

0 answers

58 views

Printing Thermal Receipt Bills using Python

I am trying to write a small script to print the PDF file generated via ReportLab. Unfortunately, I could not find much information for my requirements. I also tried to convert my PDF into an Image object ...

Knowledge thirst

1

asked Feb 13 at 2:55

0 votes

0 answers

28 views

Integrating Print Settings into a PDF

How can I embed information in a PDF so that the printer knows which tray to use for the paper? There used to be an old tool that unfortunately no longer works for us. It inserted such containers, ...

Patrick

1

asked Feb 12 at 13:30

0 votes

1 answer

52 views

How to Convert a PDF Table with Thousands of Rows into JSON in React

I am working on a project where I need to convert a PDF containing a large table (thousands of rows) into a JSON array of objects. The PDF has a table with headers that should be used as keys in the ...

Manu H N

1

asked Feb 4 at 8:18

0 votes

1 answer

42 views

Combine plots from separate pdfs into one in python

I have created several plots and saved each one separately in its own PDF file using the following line of code: plt.savefig('/path/Plot1.pdf', format='pdf', bbox_inches='tight', dpi=600) Now, I need ...

Programming Noob

1,332

asked Feb 3 at 11:39

0 votes

0 answers

44 views

PDF form checkboxes checking using python and pdfrw

Does anyone have experience with checking PDF form checkboxes? The case is that within a Django application, based on annotations of PDF forms, I map and identify checkboxes I want to either check, or leave ...

Robert Soroka

41

asked Jan 31 at 15:12

0 votes

1 answer

51 views

How can I force multi-line text inside a PDF form field using Python

😊 I'm working on filling out a PDF form programmatically using Python. More specifically, a T3 form from Canada's CRA. The form is a fill-in form. I'm having trouble getting multi-line text to display ...

robis1985

33

asked Jan 31 at 1:19

0 votes

0 answers

68 views

PDF Scraping in Python

I am having trouble scraping certain data from PDF files in Python. There are no console errors, but when the CSV is produced, the columns Owner's First Name - Zip Code are either filled with the ...

user29394340

1

asked Jan 29 at 15:15

0 votes

0 answers

69 views

How to export a Jupyter notebook to PDF after installing TeX Live and including it in the path

I am trying to export my Jupyter notebook to PDF; however, I keep getting this error: [error] If you have not installed xelatex (TeX), you will need to do so before you can export to PDF. For further instructions,...

Kevin

47

asked Jan 25 at 15:13

0 votes

0 answers

31 views

How to insert an image in a blank position in a pdf file

The following script inserts an image into a PDF file using Python (Python AttributeError: 'Page' object has no attribute 'insertImage'). I would like to identify a blank space in the ...

Silvio Júnior

19

asked Jan 22 at 13:05

1 vote

1 answer

51 views

Convert a PDF to a PNG with transparency

My goal is to obtain a PNG file with a transparent background from a PDF file. The convert tool can do the job: $ convert test.pdf test.png $ file test.png test.png: PNG image data, 595 x 842, 8-bit ...

qouify

3,920

asked Jan 19 at 16:20

0 votes

0 answers

52 views

How to add a form field to an existing pdf with python

I've been tasked with creating a Python script that will add a form field to an existing PDF file. Seemed rather straightforward, but I've hit a wall. The form field is being added (apparently not ...

SeminoleDog

79

asked Jan 18 at 18:46

1 vote

1 answer

51 views

How to extract text associated with image from pdf?

I am using pymupdf to extract images from PDF. Code sample is as below. import pymupdf doc = pymupdf.open('sample.pdf') page = doc[0] # get the page image_list = page.get_images() page_index = 0 ...

Neel

21.3k

asked Jan 16 at 17:33

-1 votes

1 answer

37 views

'_io.BytesIO' object has no attribute 'lower'

I encountered this error ('_io.BytesIO' object has no attribute 'lower') while testing downloading and processing a PDF file with an Azure Function app. The code that failed: def download_file_byURL(...

Arc Angel

1

asked Jan 15 at 2:00

1 vote

1 answer

37 views

ReportLab PDF Correctly encodes only some latin-2 characters [duplicate]

I am trying to write a Python program for PDF invoice creation. The text lines I write into a newly generated PDF are in Slovene with characters like č, š, ž, etc. which are found in the latin-2 ...

Jurij Plaskan

31

asked Jan 13 at 13:42

2 votes

1 answer

50 views

Issues Generating Barcode in data:image/png;base64 Format with Custom Size and No Text

I’m working on a Python project where my goal is to generate barcodes in the data:image/png;base64 format, without any human-readable footer text. Additionally, I need to adjust the size (height and ...

Developer Account

43

asked Jan 9 at 17:26

0 votes

2 answers

78 views

Python Script to Fill PDF Form with Character-by-Character Input in Grid Fails

I am working on a Python program to automate filling out PDF forms using PyMuPDF (fitz). I created a basic PDF form where placeholders can be either: Underscores (__________) for text fields, or ...

Thando Hlophe

67

asked Jan 9 at 12:06

1 vote

1 answer

113 views

Playwright python download get temporary file [closed]

I am trying to use Playwright and Python to download some free reports programmatically. With the following code and an existing Chrome debug window, I try to get the report, but I get a kind of temporary file not ...

Scnes de Ouf

11

asked Jan 8 at 14:32

2 votes

2 answers

75 views

Python web scraping - Bulk downloading linked files from the SEC AAER site, 403 Forbidden error

I've been trying to download 300 linked files from the SEC's AAER site. Most of the links are PDFs, but some are websites that I would need to save to PDF instead of just downloading. I'm teaching myself ...

Taylor James

23

asked Jan 7 at 5:29

1 vote

0 answers

51 views

pypdf merged PDFs have wrong page attributes

When using the pypdf merge function, I get a PDF file with invisible content. I found out that the coordinates of the page attributes MediaBox and CropBox have some errors. They look like this: /MediaBox [ 0 0 595 ...

Roman

11

asked Dec 29, 2024 at 17:39

0 votes

0 answers

48 views

How to repair a PDF file that was transmitted with a wrong MIME type

I have a service A (Flask) that transmits a file to service B (Django) using Python's requests library. from typing import TYPE_CHECKING import magic if TYPE_CHECKING: from werkzeug....

Murilo Sitonio

305

asked Dec 28, 2024 at 0:44

0 votes

2 answers

174 views

How to insert a unicode text to PDF using PyMuPDF?

I'm trying to use the PyMuPDF library to insert a Unicode text into a PDF file. I have the following code based on the documentation example: import pymupdf doc = pymupdf.open() page = doc.new_page() ...

paarandika

1,439

asked Dec 22, 2024 at 1:46

1 vote

0 answers

165 views

Open pdf in pdf-js viewer from streamlit app

I have a streamlit app, and I want it to display a pdf in an iframe. My functionality requirements for my pdf viewer/iframe are: I want the pdf to open to a particular (parameterizable) page I want ...

Max Power

8,996

asked Dec 17, 2024 at 4:04

0 votes

0 answers

63 views

<textarea> tag is not rendered properly using CSS with IronPDF

I am attempting to convert an HTML form to a fillable PDF with IronPDF (IronPdf 2024.8.1.3) in Python (3.12.6). The HTML renders appropriately in Chrome. Tags other than the <textarea> tag ...

BalooRM

504

asked Dec 10, 2024 at 17:15

0 votes

1 answer

35 views

How to open PDF in ANSA

I am looking for a way to open a PDF file to view it with Python through the ANSA script editor. Is there any way to go about this? It shows no errors, but it also doesn't open the PDF file. I was wondering ...

Carlos Cuartas

57

asked Dec 9, 2024 at 20:17

0 votes

1 answer

40 views

PyMuPDF - Prevent PDF pages from being auto cropped [closed]

I'm using PyMuPDF to process a PDF and then re-save it, but the resulting file loses the original page orientations and crop boxes. Some pages in the original PDF are larger or differently oriented (e....

axelmukwena

1,059

asked Dec 7, 2024 at 14:51

0 votes

0 answers

92 views

PDF Text Extraction Order Not Matching Visual Layout Despite Correct Coordinates

I am working on extracting text from a PDF using PyMuPDF. However, I am encountering an issue where the extracted text order does not match the visual flow/Layout flow of the PDF. Details of the Issue:...

Phalgun

1

asked Dec 5, 2024 at 9:43

2 votes

2 answers

60 views

How to save a matplotlib figure with automatic height to pdf

I have the following problem: I want to save a figure with a specific width, but auto-determine its height. Let's look at an example: import matplotlib.pyplot as plt import numpy as np fig,ax=plt....

Simon Schey

47

asked Dec 1, 2024 at 9:46

1 vote

1 answer

78 views

Export a Google Sheet to PDF file with Python requests

Recently I have been trying to convert a Google Sheet into a PDF file, by retaining all formatting data. From my previous question, I have gotten a solution to request https://docs.google.com/...

Anish

13

asked Nov 29, 2024 at 17:36

-2 votes

1 answer

268 views

How to save PDF after cropping from each page of PDF using pdfplumber?

I am using a PDF with multiple pages that has a table on top of each page that I want to get rid of. So I am cropping the PDF after the top table. What I don't know is how to combine or save it as 1 ...

ViSa

2,247

asked Nov 25, 2024 at 10:49

0 votes

0 answers

27 views

How to optimize (in Python) the compression of TIFF bitmaps before inserting them in a PDF? Photoshop uses Predictor 2

I have written a tool in Python that reads TIFF images in CMYK or monochrome (gray levels) and assembles them into a PDF. It's using a nice module (mPdf.py) from Didier Stevens and the zlib library to ...

user3425798

83

asked Nov 22, 2024 at 16:41

3 votes

2 answers

251 views

Python request taking too long to get PDF from website

I'm trying to create a single, lightweight Python script to open a website hosting a guaranteed PDF file, download it, and extract its text. I’ve reviewed many posts here and across the internet and ...

R_Student

789

asked Nov 20, 2024 at 0:35

0 votes

0 answers

63 views

Borb text triggers "AssertionError: A Rectangle must have a non-negative width."

I am interested in changing the font of some text in documents. I used the #581-filtering-by-font example and added re-submitting to SimpleFindReplace, and it is triggering the assertion that a ...

user28348887

1

asked Nov 18, 2024 at 22:10

0 votes

1 answer

60 views

How to change color of border and font in an inline text / freetext annotation in a pdf document?

I'm writing a Python script to change the color of the text and borders of inline annotations inserted with Okular into a PDF document. Instead of changing only the text and border, this script seems to change the ...

vqqkomb0

1

asked Nov 16, 2024 at 19:50

0 votes

1 answer

300 views

How to display a pdf page into a Flet container

I'm trying to develop a simple app for displaying each page of a PDF file. I start by adding a container and a button. The PDF file's full path (absolute path + file name) is given to the variable ...

eljamba

385

asked Nov 14, 2024 at 15:23

0 votes

1 answer

111 views

Convert a Google Sheet to PDF file with formatting [duplicate]

I have been trying to convert a Google sheet to a PDF file. (This is the input Google Sheet.) I have browsed the internet and gotten to converting the Google Sheets to HTML using the gspread library, ...

Anish

13

asked Nov 14, 2024 at 12:47

0 votes

1 answer

80 views

Filter pdf with python

I've been trying to find an answer by myself, but sadly I never found one that works the way I want. I have a PDF file which contains multiple different PDF files, and I want to write Python code (with venv) ...

zobomber

11

asked Nov 14, 2024 at 12:35

0 votes

1 answer

52 views

I want to download articles from telegra.ph using links and combine them in one PDF file

I want to download articles from telegra.ph using links to these articles and combine them in one PDF file. However, when I try to do it, I get this error: "requests.exceptions.ConnectionError: ('...

Adilkhan Dilman

1

asked Nov 13, 2024 at 5:32

1 vote

1 answer

138 views

Cleaning parsed data from pdf to csv

I am working on population projections for each district of India. India has not had a census since 2011, hence the use of population projections. My project analyses some variables related to ...

simrpal

29

asked Nov 8, 2024 at 7:39

0 votes

0 answers

85 views

How to create a PIL.Image from PDF image XObjects using pikepdf in Python

I am trying to do lossless PNG compression on images in PDFs using Pillow. Here is some of my code that accesses the image xobjects and tries to use them to create a PIL.Image object import io import ...

eigenVector5

53

asked Nov 6, 2024 at 19:31

0 votes

0 answers

61 views

Trying to display a PDF but it doesn't show

I want to display a PDF file in a Tk window, but the window does not open and no exception is raised: import tkinter as tk from tkPDFViewer import tkPDFViewer as pdf cur_file='PPC_Bach.Eng_.-de-...

Victor Luz

1

asked Nov 5, 2024 at 1:21

0 votes

0 answers

29 views

Using Tabula Py Templates

I would like to use Tabula to extract data with the tabula templates. One template will be for the first page and another template for the rest of the pages. Both templates were generated using Tabula....

user27154911

1

asked Oct 31, 2024 at 11:29

0 votes

0 answers

76 views

Failed capture of amounts in transaction table

I have not been able to extract all the information from a table of debit and credit transactions. The following is the table: I have used several approaches and ideas with regular expressions, but I ...

Oscar CENTENO MORA

21

asked Oct 29, 2024 at 21:15
