How to Read a PDF File in Python


If you need to read the text of a PDF (Portable Document Format) file in your Python code, you can use any of the following third-party libraries:

Option 1 – Using PyPDF2

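# Recent PyPDF2 releases; older ones used PdfFileReader, getPage(0) and extractText()
from PyPDF2 import PdfReader

# Open in binary mode; the with block closes the file afterwards
with open('your_document.pdf', 'rb') as temp:
    reader = PdfReader(temp)
    first_page = reader.pages[0]  # pages are zero-indexed
    print(first_page.extract_text())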

Option 2 – Using pdfplumber

import pdfplumber
# The module name is all lowercase; the with block closes the file afterwards
with pdfplumber.open("your_document.pdf") as temp:
  first_page = temp.pages[0]
  print(first_page.extract_text())
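
The examples above only read the first page. To read the whole document, one possible approach is to loop over every page and join the results. The sketch below assumes each page object has an extract_text() method, as pdfplumber pages do (recent PyPDF2 page objects do as well). The name extract_all_text is a hypothetical helper used for illustration here and is not part of either library.

```python
def extract_all_text(pages, separator="\n"):
    """Join the text of every page in `pages` into one string.

    `pages` is any iterable of objects that have an extract_text() method.
    """
    chunks = []
    for page in pages:
        # extract_text() may return None or "" for a page with no text layer
        text = page.extract_text() or ""
        chunks.append(text)
    return separator.join(chunks)
```

With pdfplumber, this could be called as extract_all_text(temp.pages) inside the with block, while the file is still open.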

Option 3 – Using textract

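import textract
# process() returns the extracted text as bytes; method='pdfminer' selects the pdfminer backend
PDF_read = textract.process('document_path.pdf', method='pdfminer')
print(PDF_read.decode('utf-8'))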