How to use Pymupdf library to read page in Python In Python

The PIL (Python Imaging Library), along with the PyMuPDF library, will be used for PDF processing in this article. To install the PyMuPDF library, run the following command in the command processor of the operating system:

pip install pymupdf

Note: This PyMuPDF library is imported by using the following command.

import fitz

Reading a page from a pdf file requires loading it and then displaying the contents of only one of its pages. This essentially makes that one-page equivalent of an image. Therefore, the page from the pdf file would be read and displayed as an image. 

The following example demonstrates the above process:

Python3




import fitz
from PIL import Image
 
# Path of the PDF file
input_file = r"test.pdf"
 
# Opening the PDF file and creating a handle for it
file_handle = fitz.open(input_file)
 
# The page no. denoted by the index would be loaded
# The index within the square brackets is the page number
page = file_handle[0]
 
# Obtaining the pixelmap of the page
page_img = page.get_pixmap()
 
# Saving the pixelmap into a png image file
page_img.save('PDF_page.png')
 
# Reading the PNG image file using pillow
img = Image.open('PDF_page.png')
 
# Displaying the png image file using an image viewer
img.show()


Output:

 

Explanation:

Firstly the pdf file is opened, and its file handle is stored. Then the first page of the pdf (at index 0) is loaded using list indexing. This page’s pixel map (pixel array) is obtained using the get_pixmap function, and the resultant pixel map is saved in a variable. Then this pixel map is saved as a png image file. Then this png file is opened using the open function present in the Image module of PIL. In the end, the image is displayed using the show function. 

Note: The first open function is used to open a pdf file, and the later one is used to open the png image file. The functions belong to different libraries and are used for different purposes. 

Read a Particular Page from a PDF File in Python

Document processing is one of the most common use cases for the Python programming language. This allows the language to process many files, such as database files, multimedia files and encrypted files, to name a few. This article will teach you how to read a particular page from a PDF (Portable Document Format) file in Python.

Similar Reads

Method 1: Using Pymupdf library to read page in Python

The PIL (Python Imaging Library), along with the PyMuPDF library, will be used for PDF processing in this article. To install the PyMuPDF library, run the following command in the command processor of the operating system:...

Method 2: Reading a particular page from a PDF using PyPDF2

...

Contact Us