All Questions
4,633 questions
-1
votes
3
answers
60
views
How to generate a PDF with a grid of images per page?
Our work involves visually inspecting a number of plots together. All plots are the same size. We want to print them on pages to study. Something like an 8.5"x11" paper with a 1" margin gives ...
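The layout arithmetic behind a question like this can be sketched as follows. This is only a minimal illustration, assuming a page measured in inches with equal margins on all sides; the function name `grid_cells` and the 3x2 default grid are hypothetical and are not taken from the question.

```python
def grid_cells(page_w=8.5, page_h=11.0, margin=1.0, rows=3, cols=2):
    """Return (x, y, width, height) boxes in inches, row by row.

    The origin is assumed to be the top-left corner of the page.
    """
    usable_w = page_w - 2 * margin   # width left after both side margins
    usable_h = page_h - 2 * margin   # height left after top and bottom margins
    cell_w = usable_w / cols
    cell_h = usable_h / rows
    return [
        (margin + c * cell_w, margin + r * cell_h, cell_w, cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


cells = grid_cells()
for box in cells:
    print(box)
```

Each box could then be passed to whichever PDF or plotting library places the images; that step depends on the library and is left out here.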
0
votes
0
answers
18
views
How to suppress popup dialog box when converting pdf to docx using pywin32
I'm running a Python script on a Windows laptop to convert some sample PDF files to docx. However, for each file, a dialog box pops up that prompts me to click OK when the script tries to convert said ...
0
votes
1
answer
49
views
How to create a searchable PDF using Python and Selenium?
I want to create a program like FireShot (premium version) to take a webpage on chromedriver and convert it into a pdf.
Currently this is the code I came up with:
import time
import os
import glob
...
0
votes
0
answers
47
views
Page number in PyMuPDF multiprocessing with extract_text
The PyMuPDF documentation states that PyMuPDF does not support running on multiple threads,
so they use multiprocessing, and they do this weird thing with segments in the example code:
seg_size = int(...
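One common way to keep track of page numbers under multiprocessing is to give each worker a contiguous range of absolute page indices. The sketch below is an assumption about that general idea, not the elided example code; `page_segments` is a hypothetical helper and uses no PyMuPDF calls.

```python
def page_segments(num_pages, workers):
    """Split page indices 0..num_pages-1 into contiguous ranges, one per worker."""
    if num_pages <= 0 or workers <= 0:
        return []
    seg_size = -(-num_pages // workers)  # ceiling division
    return [
        range(start, min(start + seg_size, num_pages))
        for start in range(0, num_pages, seg_size)
    ]


# Each worker receives a range of absolute page indices, so the page number
# of any extracted text is simply the index the worker is iterating over.
for seg in page_segments(10, 4):
    print(list(seg))
```

Because the ranges carry absolute indices, results from separate processes can be sorted back into document order without extra bookkeeping.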
2
votes
0
answers
55
views
Preserve and Reapply PDF Layout When Editing Text and Images with PyMuPDF and PyQt6
I'm working on a PyQt6-based PDF editor using PyMuPDF (fitz).
My goal is to extract all text and images from a PDF while preserving their original positions and dimensions, allow users to edit/move ...
0
votes
2
answers
169
views
Extract tables from PDF files
I am conducting research on p-hacking, which requires accurately extracting tables from published academic papers. I have downloaded a large number of PDF files for this purpose.
So far, I have tried ...
0
votes
0
answers
21
views
fpdf2 and multi_cell with python
I'm trying to create a PDF document. There are long and short texts. Word wrapping exists only in multi_cell (though it is drawn as a “ladder” by default). Due to the fact that you need to write text ...
0
votes
1
answer
43
views
pypdf or pikepdf advice needed on bookmarks
I am sorry, but I am unable to understand how to rearrange bookmarks in a PDF document.
I have a PDF document with medical records, which was created by importing more and more items from individual ...
0
votes
0
answers
58
views
Printing Thermal Receipt Bills using Python
I am trying to write a small script to print the PDF file generated via ReportLab. Unfortunately, I have not found much information for my requirements.
I also tried to convert my PDF into an Image object ...
0
votes
0
answers
28
views
Integrating Print Settings into a PDF
How can I embed information in a PDF so that the printer knows which tray to use for the paper?
There used to be an old tool that unfortunately no longer works for us. It inserted such containers, ...
0
votes
1
answer
52
views
How to Convert a PDF Table with Thousands of Rows into JSON in React
I am working on a project where I need to convert a PDF containing a large table (thousands of rows) into a JSON array of objects. The PDF has a table with headers that should be used as keys in the ...
0
votes
1
answer
42
views
Combine plots from separate pdfs into one in python
I have created several plots and saved each one separately in its own PDF file using the following line of code:
plt.savefig('/path/Plot1.pdf', format='pdf', bbox_inches='tight', dpi=600)
Now, I need ...
0
votes
0
answers
44
views
PDF form checkboxes checking using python and pdfrw
Does anyone have experience with checking PDF form checkboxes? The case is that within a Django application, based on annotations of PDF forms, I map and identify checkboxes I want to either check, or leave ...
0
votes
1
answer
51
views
How can I force multi-line text inside a PDF form field using Python
😊
I'm working on filling out a PDF form programmatically using Python. More specifically, a T3 form from Canada's CRA. The form is a fill-in form.
I'm having trouble getting multi-line text to display ...
0
votes
0
answers
68
views
PDF Scraping in Python
I am having trouble scraping certain data from PDF files in Python. There are no console errors, but when the CSV is produced, the columns Owner's First Name - Zip Code are either filled with the ...
0
votes
0
answers
69
views
How to export a Jupyter notebook to PDF, having installed TeX Live and included it in the path
I am trying to export my Jupyter notebook to PDF; however, I keep getting this error:
[error] If you have not installed xelatex (TeX), you will need to do so before you can export to PDF. For further instructions,...
0
votes
0
answers
31
views
How to insert an image in a blank position in a pdf file
The following script inserts an image into a PDF file using Python (Python AttributeError: 'Page' object has no attribute 'insertImage').
I would like to identify a blank space in the ...
1
vote
1
answer
51
views
Convert a PDF to a PNG with transparency
My goal is to obtain a PNG file with a transparent background from a PDF file.
The convert tool can do the job:
$ convert test.pdf test.png
$ file test.png
test.png: PNG image data, 595 x 842, 8-bit ...
0
votes
0
answers
52
views
How to add a form field to an existing pdf with python
I've been tasked with creating a Python script that will add a form field to an existing PDF file. Seemed rather straightforward, but I've hit a wall. The form field is being added (apparently not ...
1
vote
1
answer
51
views
How to extract text associated with image from pdf?
I am using pymupdf to extract images from PDF. Code sample is as below.
import pymupdf
doc = pymupdf.open('sample.pdf')
page = doc[0] # get the page
image_list = page.get_images()
page_index = 0
...
-1
votes
1
answer
37
views
'_io.BytesIO' object has no attribute 'lower'
Hi, I encountered this error ('_io.BytesIO' object has no attribute 'lower') while testing downloading and processing a PDF file with an Azure Function app. The code that failed:
def download_file_byURL(...
1
vote
1
answer
37
views
ReportLab PDF Correctly encodes only some latin-2 characters [duplicate]
I am trying to write a Python program for PDF invoice creation. The text lines I write into a newly generated PDF are in Slovene with characters like č, š, ž, etc. which are found in the latin-2 ...
2
votes
1
answer
50
views
Issues Generating Barcode in data:image/png;base64 Format with Custom Size and No Text
I’m working on a Python project where my goal is to generate barcodes in the data:image/png;base64 format, without any human-readable footer text. Additionally, I need to adjust the size (height and ...
0
votes
2
answers
78
views
Python Script to Fill PDF Form with Character-by-Character Input in Grid Fails
I am working on a Python program to automate filling out PDF forms using PyMuPDF (fitz). I created a basic PDF form where placeholders can be either:
Underscores (__________) for text fields, or
...
1
vote
1
answer
113
views
Playwright python download get temporary file [closed]
I am trying to use Playwright and Python to programmatically download some free reports.
With the following code and an existing Chrome debug window, I try to get the report, but I get a kind of temporary file not ...
2
votes
2
answers
75
views
Python web scraping - Bulk downloading linked files from the SEC AAER site, 403 Forbidden error
I've been trying to download 300 linked files from SEC's AAER site. Most of the links are PDFs, but some are websites that I would need to save to PDF instead of just downloading. I'm teaching myself ...
1
vote
0
answers
51
views
pypdf merged PDFs have wrong page attributes
When using the pypdf merge function, I get a PDF file with invisible content. I found out that the coordinates of the page attributes MediaBox and CropBox have some errors. They look like this:
/MediaBox [ 0 0 595 ...
0
votes
0
answers
48
views
How to repair a PDF file that was transmitted with a wrong MIME type
I have a service A (Flask) that transmits a file to service B (Django) using Python's requests library.
from typing import TYPE_CHECKING
import magic
if TYPE_CHECKING:
from werkzeug....
0
votes
2
answers
174
views
How to insert a unicode text to PDF using PyMuPDF?
I'm trying to use the PyMuPDF library to insert a Unicode text into a PDF file. I have the following code based on the documentation example:
import pymupdf
doc = pymupdf.open()
page = doc.new_page()
...
1
vote
0
answers
165
views
Open pdf in pdf-js viewer from streamlit app
I have a streamlit app, and I want it to display a pdf in an iframe. My functionality requirements for my pdf viewer/iframe are:
I want the pdf to open to a particular (parameterizable) page
I want ...
0
votes
0
answers
63
views
<textarea> tag is not rendered properly using CSS with IronPDF
I am attempting to convert an HTML form to a fillable PDF with IronPDF (IronPdf 2024.8.1.3) in Python (3.12.6). The HTML renders appropriately in Chrome. Tags other than the <textarea> tag ...
0
votes
1
answer
35
views
How to open PDF in ANSA
I am looking for a way to open a PDF file to view it with python through the ANSA script editor.
Is there any way to go about this? It shows no errors, but it also doesn't open the PDF file.
I was wondering ...
0
votes
1
answer
40
views
PyMuPDF - Prevent PDF pages from being auto cropped [closed]
I'm using PyMuPDF to process a PDF and then re-save it, but the resulting file loses the original page orientations and crop boxes. Some pages in the original PDF are larger or differently oriented (e....
0
votes
0
answers
92
views
PDF Text Extraction Order Not Matching Visual Layout Despite Correct Coordinates
I am working on extracting text from a PDF using PyMuPDF. However, I am encountering an issue where the extracted text order does not match the visual flow/Layout flow of the PDF.
Details of the Issue:...
2
votes
2
answers
60
views
How to save a matplotlib figure with automatic height to pdf
I have the following problem: I want to save a figure with a specific width, but auto-determine its height. Let's look at an example:
import matplotlib.pyplot as plt
import numpy as np
fig,ax=plt....
1
vote
1
answer
78
views
Export a Google Sheet to PDF file with Python requests
Recently I have been trying to convert a Google Sheet into a PDF file while retaining all formatting. From my previous question, I have gotten a solution to request https://docs.google.com/...
-2
votes
1
answer
268
views
How to save PDF after cropping from each page of PDF using pdfplumber?
I am using a PDF with multiple pages that has a table on top of each page that I want to get rid of. So I am cropping the PDF after the top table.
What I don't know is how to combine or save it as 1 ...
0
votes
0
answers
27
views
How to optimize (in Python) the compression of TIFF bitmaps before inserting them in a PDF? Photoshop uses Predictor 2
I have written a tool in Python that reads TIFF images in CMYK or monochrome (gray levels) and assembles them into a PDF. It's using a nice module (mPdf.py) from Didier Stevens and the zlib library to ...
3
votes
2
answers
251
views
Python request taking too long to get PDF from website
I'm trying to create a single, lightweight Python script to open a website hosting a guaranteed PDF file, download it, and extract its text.
I’ve reviewed many posts here and across the internet and ...
0
votes
0
answers
63
views
Borb text triggers "AssertionError: A Rectangle must have a non-negative width."
I am interested in changing the font of some text that is in documents.
I used the #581-filtering-by-font example and added re-submitting to SimpleFindReplace and it is triggering the assertion that a ...
0
votes
1
answer
60
views
How to change color of border and font in an inline text / freetext annotation in a pdf document?
I'm writing a Python script to change the color of text and borders of inline annotations inserted with Okular into a PDF document.
Instead of changing only the text and border, this script seems to change the ...
0
votes
1
answer
300
views
How to display a pdf page into a Flet container
I'm trying to develop a simple app for displaying each page of a PDF file. I start by adding a container and a button. The PDF file's full path (absolute path + file name) is given to the variable ...
0
votes
1
answer
111
views
Convert a Google Sheet to PDF file with formatting [duplicate]
I have been trying to convert a Google sheet to a PDF file. (This is the input Google Sheet.) I have browsed the internet and gotten to converting the Google Sheets to HTML using the gspread library, ...
0
votes
1
answer
80
views
Filter pdf with python
I've been trying to find an answer by myself, but sadly I never found one that works the way I want. I have a PDF file which contains multiple different PDF files, and I want to create Python code (with venv) ...
0
votes
1
answer
52
views
I want to download articles from telegra.ph using links and combine them in one PDF file
I want to download articles from telegra.ph using links to these articles and combine them in one PDF file. However, when I try to do it, I get this error:
"requests.exceptions.ConnectionError: ('...
1
vote
1
answer
138
views
Cleaning parsed data from pdf to csv
I am working on population projections for each district of India. India has not had a census since 2011, hence the use of population projections. My project analyses some variables related to ...
0
votes
0
answers
85
views
How to create a PIL.Image from PDF image XObjects using pikepdf in Python
I am trying to do lossless PNG compression on images in PDFs using Pillow. Here is some of my code that accesses the image XObjects and tries to use them to create a PIL.Image object:
import io
import ...
0
votes
0
answers
61
views
Trying to display a PDF, but it doesn't show
I want to display a PDF file in a tk window, but the window does not open and no exception is raised
import tkinter as tk
from tkPDFViewer import tkPDFViewer as pdf
cur_file='PPC_Bach.Eng_.-de-...
0
votes
0
answers
29
views
Using Tabula Py Templates
I would like to use Tabula to extract data with the tabula templates. One template will be for the first page and another template for the rest of the pages. Both templates were generated using Tabula....
0
votes
0
answers
76
views
Failed capture of amounts in transaction table
I have not been able to extract all the information from a table of debit and credit transactions.
The following is the table:
I have used several approaches and ideas with regular expressions, but I ...