
All Questions

-1 votes
3 answers
60 views

How to generate a PDF with a grid of images per page?

Our work involves visually inspecting a number of plots together. All plots are the same size. We want to print them on pages to study. Something like an 8.5"x11" page with a 1" margin gives ...
BiGYaN
  • 7,177
0 votes
0 answers
18 views

How to suppress popup dialog box when converting pdf to docx using pywin32

I'm running a Python script on a Windows laptop to convert some sample PDF files to docx. However, for each file, a dialog box pops up that prompts me to click OK when the script tries to convert said ...
ClusterPhuck69
0 votes
1 answer
49 views

How to create a searchable PDF using Python and Selenium?

I want to create a program like FireShot (premium version) to take a webpage on chromedriver and convert it into a pdf. Currently this is the code I came up with: import time import os import glob ...
salt lake
0 votes
0 answers
47 views

Page number in PyMuPDF multiprocessing with extract_text

The PyMuPDF documentation states that PyMuPDF does not support running on multiple threads, so they use multiprocessing, and they do this weird thing with segments in the example code: seg_size = int(...
Michał Darowny
2 votes
0 answers
55 views

Preserve and Reapply PDF Layout When Editing Text and Images with PyMuPDF and PyQt6

I'm working on a PyQt6-based PDF editor using PyMuPDF (fitz). My goal is to extract all text and images from a PDF while preserving their original positions and dimensions, allow users to edit/move ...
Yousef Hashem
0 votes
2 answers
169 views

Extract tables from PDF files

I am conducting research on p-hacking, which requires accurately extracting tables from published academic papers. I have downloaded a large number of PDF files for this purpose. So far, I have tried ...
Buoyant Xu
0 votes
0 answers
21 views

fpdf2 and multi_cell with python

I'm trying to create a PDF document. There are long and short texts. Word wrapping exists only in multi_cell (though it is drawn as a “ladder” by default). Due to the fact that you need to write text ...
Stanislav
0 votes
1 answer
43 views

pypdf or pikepdf advice needed on bookmarks

I am sorry, but I am unable to understand how to rearrange bookmarks in a PDF document. I have a PDF document with medical records which was created by repeatedly importing new items from individual ...
Vladimir Buzalka
0 votes
0 answers
58 views

Printing Thermal Receipt Bills using Python

I am trying to write a small script to print the PDF file generated via ReportLab. Unfortunately, I have not found much guidance for my requirements. I also tried to convert my PDF into an Image object ...
Knowledge thirst
0 votes
0 answers
28 views

Integrating Print Settings into a PDF

How can I embed information in a PDF so that the printer knows which tray to use for the paper? There used to be an old tool that unfortunately no longer works for us. It inserted such containers, ...
Patrick
0 votes
1 answer
52 views

How to Convert a PDF Table with Thousands of Rows into JSON in React

I am working on a project where I need to convert a PDF containing a large table (thousands of rows) into a JSON array of objects. The PDF has a table with headers that should be used as keys in the ...
Manu H N
0 votes
1 answer
42 views

Combine plots from separate pdfs into one in python

I have created several plots and saved each one separately in its own PDF file using the following line of code: plt.savefig('/path/Plot1.pdf', format='pdf', bbox_inches='tight', dpi=600) Now, I need ...
Programming Noob
0 votes
0 answers
44 views

PDF form checkboxes checking using python and pdfrw

Does anyone have experience with checking PDF form checkboxes? Within a Django application, based on annotations of PDF forms, I map and identify checkboxes I want to either check, or leave ...
Robert Soroka
0 votes
1 answer
51 views

How can I force multi-line text inside a PDF form field using Python

😊 I'm working on filling out a PDF form programmatically using Python. More specifically, a T3 form from Canada's CRA. The form is a fill-in form. I'm having trouble getting multi-line text to display ...
robis1985
0 votes
0 answers
68 views

PDF Scraping in Python

I am having trouble scraping certain data from PDF files in Python. There are no console errors, but when the CSV is produced, the columns Owner's First Name - Zip Code are either filled with the ...
user29394340
0 votes
0 answers
69 views

How to export a Jupyter notebook to PDF, having installed TeX Live and included it in the path

I am trying to export my Jupyter notebook to PDF; however, I keep getting this error: [error] If you have not installed xelatex (TeX), you will need to do so before you can export to PDF. For further instructions,...
Kevin
  • 47
0 votes
0 answers
31 views

How to insert an image in a blank position in a pdf file

The following script inserts an image into a PDF file using Python (Python AttributeError: 'Page' object has no attribute 'insertImage'). I would like to identify a blank space in the ...
Silvio Júnior
1 vote
1 answer
51 views

Convert a PDF to a PNG with transparency

My goal is to obtain a PNG file with a transparent background from a PDF file. The convert tool can do the job: $ convert test.pdf test.png $ file test.png test.png: PNG image data, 595 x 842, 8-bit ...
qouify
  • 3,920
0 votes
0 answers
52 views

How to add a form field to an existing pdf with python

I've been tasked with creating a Python script that will add a form field to an existing PDF file. It seemed rather straightforward, but I've hit a wall. The form field is being added (apparently not ...
SeminoleDog
1 vote
1 answer
51 views

How to extract text associated with image from pdf?

I am using pymupdf to extract images from PDF. Code sample is as below. import pymupdf doc = pymupdf.open('sample.pdf') page = doc[0] # get the page image_list = page.get_images() page_index = 0 ...
Neel
  • 21.3k
-1 votes
1 answer
37 views

'_io.BytesIO' object has no attribute 'lower'

I encountered this error ('_io.BytesIO' object has no attribute 'lower') while testing downloading and processing a PDF file with an Azure Function App. The code that failed: def download_file_byURL(...
Arc Angel
1 vote
1 answer
37 views

ReportLab PDF correctly encodes only some Latin-2 characters [duplicate]

I am trying to write a Python program for PDF invoice creation. The text lines I write into a newly generated PDF are in Slovene with characters like č, š, ž, etc. which are found in the Latin-2 ...
Jurij Plaskan
2 votes
1 answer
50 views

Issues Generating Barcode in data:image/png;base64 Format with Custom Size and No Text

I’m working on a Python project where my goal is to generate barcodes in the data:image/png;base64 format, without any human-readable footer text. Additionally, I need to adjust the size (height and ...
Developer Account
0 votes
2 answers
78 views

Python Script to Fill PDF Form with Character-by-Character Input in Grid Fails

I am working on a Python program to automate filling out PDF forms using PyMuPDF (fitz). I created a basic PDF form where placeholders can be either: Underscores (__________) for text fields, or ...
Thando Hlophe
1 vote
1 answer
113 views

Playwright python download get temporary file [closed]

I am trying to use Playwright and Python to download some free reports programmatically. With the following code and an existing Chrome debug window, I try to get the report but I get a kind of temporary file not ...
Scnes de Ouf
2 votes
2 answers
75 views

Python web scraping - Bulk downloading linked files from the SEC AAER site, 403 Forbidden error

I've been trying to download 300 linked files from SEC's AAER site. Most of the links are PDFs, but some are websites that I would need to save to PDF instead of just downloading. I'm teaching myself ...
Taylor James
1 vote
0 answers
51 views

pypdf merged PDFs have wrong page attributes

When using the pypdf merge function, I get a PDF file with invisible content. I found out that the coordinates of the page attributes MediaBox and CropBox have some errors. They look like this: /MediaBox [ 0 0 595 ...
Roman
  • 11
0 votes
0 answers
48 views

How to repair a PDF file that was transmitted with a wrong MIME type

I have a service A (flask) that transmits a file to service B (Django) using python's requests library. from typing import TYPE_CHECKING import magic if TYPE_CHECKING: from werkzeug....
Murilo Sitonio
0 votes
2 answers
174 views

How to insert Unicode text into a PDF using PyMuPDF?

I'm trying to use the PyMuPDF library to insert a Unicode text into a PDF file. I have the following code based on the documentation example: import pymupdf doc = pymupdf.open() page = doc.new_page() ...
paarandika
  • 1,439
1 vote
0 answers
165 views

Open pdf in pdf-js viewer from streamlit app

I have a streamlit app, and I want it to display a pdf in an iframe. My functionality requirements for my pdf viewer/iframe are: I want the pdf to open to a particular (parameterizable) page I want ...
Max Power
  • 8,996
0 votes
0 answers
63 views

<textarea> tag is not rendered properly using CSS with IronPDF

I am attempting to convert an HTML form to a fillable PDF with IronPDF (IronPdf 2024.8.1.3) in Python (3.12.6). The HTML renders appropriately in Chrome. Tags other than the <textarea> tag ...
BalooRM
  • 504
0 votes
1 answer
35 views

How to open PDF in ANSA

I am looking for a way to open a PDF file to view it with Python through the ANSA script editor. Is there any way to go about this? It shows no errors, but it also doesn't open the PDF file. I was wondering ...
Carlos Cuartas
0 votes
1 answer
40 views

PyMuPDF - Prevent PDF pages from being auto cropped [closed]

I'm using PyMuPDF to process a PDF and then re-save it, but the resulting file loses the original page orientations and crop boxes. Some pages in the original PDF are larger or differently oriented (e....
axelmukwena
  • 1,059
0 votes
0 answers
92 views

PDF Text Extraction Order Not Matching Visual Layout Despite Correct Coordinates

I am working on extracting text from a PDF using PyMuPDF. However, I am encountering an issue where the extracted text order does not match the visual flow/Layout flow of the PDF. Details of the Issue:...
Phalgun
2 votes
2 answers
60 views

How to save a matplotlib figure with automatic height to pdf

I have the following problem: I want to save a figure with a specific width, but auto-determine its height. Let's look at an example: import matplotlib.pyplot as plt import numpy as np fig,ax=plt....
Simon Schey
1 vote
1 answer
78 views

Export a Google Sheet to PDF file with Python requests

Recently I have been trying to convert a Google Sheet into a PDF file, by retaining all formatting data. From my previous question, I have gotten a solution to request https://docs.google.com/...
Anish
  • 13
-2 votes
1 answer
268 views

How to save PDF after cropping from each page of PDF using pdfplumber?

I am using a PDF with multiple pages that has a table on top of each page that I want to get rid of. So I am cropping the PDF after the top table. What I don't know is how to combine or save it as 1 ...
ViSa
  • 2,247
0 votes
0 answers
27 views

How to optimize (in Python) the compression of TIFF bitmaps before inserting them in a PDF? Photoshop uses Predictor 2

I have written a tool in Python that reads TIFF images in CMYK or monochrome (gray levels) and assembles them into a PDF. It's using a nice module (mPdf.py) from Didier Stevens and the zlib library to ...
user3425798
3 votes
2 answers
251 views

Python request taking too long to get PDF from website

I'm trying to create a single, lightweight Python script to open a website hosting a guaranteed PDF file, download it, and extract its text. I’ve reviewed many posts here and across the internet and ...
R_Student
  • 789
0 votes
0 answers
63 views

Borb text triggers "AssertionError: A Rectangle must have a non-negative width."

I am interested in changing the font of some text in documents. I used the #581-filtering-by-font example and added re-submitting to SimpleFindReplace and it is triggering the assertion that a ...
user28348887
0 votes
1 answer
60 views

How to change color of border and font in an inline text / freetext annotation in a pdf document?

I'm writing a Python script to change the color of text and borders of inline annotations inserted with Okular into a PDF document. Instead of changing only the text and border, this script seems to change the ...
vqqkomb0
0 votes
1 answer
300 views

How to display a PDF page in a Flet container

I'm trying to develop a simple app for displaying each page of a PDF file. I start by adding a container and a button. The PDF file's full path (absolute path + file name) is given to the variable ...
eljamba
  • 385
0 votes
1 answer
111 views

Convert a Google Sheet to PDF file with formatting [duplicate]

I have been trying to convert a Google sheet to a PDF file. (This is the input Google Sheet.) I have browsed the internet and gotten to converting the Google Sheets to HTML using the gspread library, ...
Anish
  • 13
0 votes
1 answer
80 views

Filter pdf with python

I've been trying to find an answer by myself, but sadly I never found one that works the way I want. I have a PDF file which contains multiple different PDF files, and I want to write Python code (with venv) ...
zobomber
0 votes
1 answer
52 views

I want to download articles from telegra.ph using links and combine them into one PDF file

I want to download articles from telegra.ph using links to these articles and combine them into one PDF file. However, when I try to do it, I get this error: "requests.exceptions.ConnectionError: ('...
Adilkhan Dilman
1 vote
1 answer
138 views

Cleaning parsed data from pdf to csv

I am working on population projections for each district of India. India has not had a census since 2011, hence the use of population projections. My project analyses some variables related to ...
simrpal
  • 29
0 votes
0 answers
85 views

How to create a PIL.Image from PDF image XObjects using pikepdf in Python

I am trying to do lossless PNG compression on images in PDFs using Pillow. Here is some of my code that accesses the image xobjects and tries to use them to create a PIL.Image object import io import ...
eigenVector5
0 votes
0 answers
61 views

Trying to display a PDF but it doesn't show

I want to display a PDF file in a tk window, but the window is not opening and no Exception is raised. import tkinter as tk from tkPDFViewer import tkPDFViewer as pdf cur_file='PPC_Bach.Eng_.-de-...
Victor Luz
0 votes
0 answers
29 views

Using Tabula Py Templates

I would like to use Tabula to extract data with the tabula templates. One template will be for the first page and another template for the rest of the pages. Both templates were generated using Tabula....
user27154911
0 votes
0 answers
76 views

Failed capture of amounts in transaction table

I have not been able to extract all the information from a table of debit and credit transactions. The following is the table: I have used several approaches and ideas with regular expressions, but I ...
Oscar CENTENO MORA
