Extracting Text from Image using pytesseract module

Hey!! In this post you will learn about a wonderful Python module, pytesseract. We will use it to extract text from an image. It will be great fun, so let's start coding.

Logic and Explanation

As pytesseract is not a built-in Python module, our first step is to install it using pip. Open a terminal, type the following command and hit Enter. Pytesseract will be installed on your computer in a few seconds.

pip install pytesseract
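If you want to confirm that the install worked before writing any OCR code, a small check like the one below can help. This is only a sketch: the helper name is_installed is made up for this post, and it relies on nothing but Python's standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be imported in the current environment."""
    # is_installed is a hypothetical helper name used only for this example
    return importlib.util.find_spec(module_name) is not None


# pytesseract is the wrapper; PIL is the import name of the Pillow package
for name in ("pytesseract", "PIL"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either module is reported as missing, run the pip command again in the same Python environment you use to run your program.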

Now we can code. The first thing to do is import pytesseract. We will also import Image from PIL (Python Imaging Library, provided by the Pillow package), as we have to load an image.

import pytesseract as ptess
from PIL import Image

Now we will load the image whose text we want to extract.

img = Image.open("text.png")

After this, we just have to call the image_to_string function of the pytesseract module and print the result in the terminal.

text = ptess.image_to_string(img)
print(text)

Now we run our program and get the error below.😱😱

pytesseract.TesseractNotFoundError

On checking the pytesseract documentation on PyPI, we learn that we also have to install Google Tesseract OCR and then set the "tesseract_cmd" variable to the path of the Tesseract executable.

So, let's do this. First open the Tesseract GitHub repository, scroll down to "Installing Tesseract" and click on the link to install Tesseract via a pre-built package. It will take you to another GitHub page; there, look for Windows if you are working on the Windows operating system, and you will find an option "Windows - Tesseract at UB Mannheim". Click on it and it will take you to another GitHub repo where you will find the links to the 32-bit/64-bit installers.

Install Tesseract OCR on your computer. It's an easy task: just press Next, agree and install. Once it is installed, let's move back to our code.

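Now we have to tell pytesseract where the Tesseract executable is by adding the line below. Put it after the imports and before the call to image_to_string, and replace YOUR_USER with your own user name (adjust the path if you installed Tesseract somewhere else).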

ptess.pytesseract.tesseract_cmd = r'C:/Users/YOUR_USER/AppData/Local/Programs/Tesseract-OCR/tesseract.exe'
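If you are not sure where Tesseract ended up, a lookup like the one below may save some hunting. It is a rough sketch, not part of pytesseract: find_tesseract is a made-up helper name, and the fallback list simply reuses the path from this post. It first checks whether tesseract is already on your PATH and then tries the fallback locations.

```python
import shutil
from pathlib import Path


def find_tesseract(fallback_paths):
    """Return the first usable tesseract path, or None if none is found."""
    # find_tesseract is a hypothetical helper name used only for this example
    on_path = shutil.which("tesseract")  # looks through the PATH directories
    if on_path:
        return on_path
    for candidate in fallback_paths:
        if Path(candidate).is_file():
            return str(candidate)
    return None


# Assumed install location taken from this post; replace YOUR_USER first
candidates = [
    r"C:/Users/YOUR_USER/AppData/Local/Programs/Tesseract-OCR/tesseract.exe",
]
print(find_tesseract(candidates))
```

If this prints a path, you can assign that value to ptess.pytesseract.tesseract_cmd instead of typing the path by hand. If it prints None, Tesseract is probably not installed yet or lives in a different folder.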

Now save the program and run it again.

BOOM!! You will find the text present in the image "text.png" printed in your terminal.

Hope you liked the post. If you have any doubts, you can ask in the comments below. Your appreciation and suggestions are also welcome in the comments.

Bhavik Agarwal
