Extracting Text from an Image Using the pytesseract Module
Hey!! In this post you will learn about a wonderful Python module, pytesseract. We will use it to extract text from an image. It will be great fun, so let's start coding.
Logic and Explanation
As pytesseract is not a built-in Python module, our first step is to install it using pip. Open a terminal, type the following command and press Enter. pytesseract will be installed on your computer in a few seconds.
pip install pytesseract
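If you want to confirm that the install worked before going further, here is a small sketch that checks whether the modules can be found. The helper name missing_packages is made up for this post; it only uses importlib from the standard library.

```python
import importlib.util


def missing_packages(names=("pytesseract", "PIL")):
    """Return the module names from `names` that Python cannot find."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("pytesseract and PIL are both available.")
```

An empty list means both modules can be imported and you are ready for the next step.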
Now we can code. The first thing to do is import pytesseract. We will also import Image from PIL (Python Imaging Library, provided by the Pillow package), as we have to load an image.
import pytesseract as ptess
from PIL import Image
Now we will load the image whose text we want to extract.
img = Image.open("text.png")
After this we just have to call the image_to_string function of the pytesseract module and print the result in our terminal.
text = ptess.image_to_string(img)
print(text)
Now we run our program, and we get the error below. 😱😱
pytesseract.TesseractNotFoundError
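If you would rather see a friendly message than a traceback, the error can be caught. This is only a sketch: the function name extract_text and the message text are made up for this post, while TesseractNotFoundError and image_to_string come from pytesseract itself.

```python
def extract_text(image_path):
    """Return the text found in the image, or None if Tesseract is missing."""
    # Imported here so that this file can be loaded even before
    # pytesseract and Pillow have been installed with pip.
    import pytesseract as ptess
    from PIL import Image

    try:
        # The with-block closes the image file once we are done with it.
        with Image.open(image_path) as img:
            return ptess.image_to_string(img)
    except ptess.TesseractNotFoundError:
        print("Tesseract OCR is not installed or pytesseract cannot find it.")
        return None
```

Calling extract_text("text.png") then either returns the extracted text, or prints the message and returns None.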
On checking the documentation of pytesseract on PyPI, we learn that we have to install Google Tesseract OCR and then set the "tesseract_cmd" variable to the path of the Tesseract executable.
So, let's do this. First open the Tesseract OCR GitHub repository (linked from the pytesseract page on PyPI), scroll down to "Installing Tesseract" and click on the link to install Tesseract via a pre-built package. It will take you to another GitHub page. If you are working on the Windows operating system, search there for Windows and you will find an option "Windows - Tesseract at UB Mannheim". Click on it and it will take you to another GitHub repo where you will find the links to the 32-bit/64-bit installers.
Install Tesseract OCR on your computer. It's an easy task: just press Next, agree and install. After it is installed, let's go back to our code.
Now we have to tell pytesseract where the Tesseract executable is. This is not an environment variable but a setting inside pytesseract, and we set it with the line below. Add it after the imports and before the call to image_to_string, and replace the path with the location where Tesseract was installed on your computer.
ptess.pytesseract.tesseract_cmd = r'C:/Users/YOUR_USER/AppData/Local/Programs/Tesseract-OCR/tesseract.exe'
Now save the program and run it again. This time the text from the image should be printed in the terminal.
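The install path is different on every computer, so instead of typing it by hand you can let Python look for it. The sketch below is one possible way to do that: find_tesseract and CANDIDATES are names made up for this post, and the candidate paths are only guesses that you should change to match your own machine.

```python
import os
import shutil


def find_tesseract(name="tesseract", candidates=()):
    """Return a path to the Tesseract executable, or None if not found."""
    # First ask the operating system: this works when the install
    # folder is already on PATH.
    found = shutil.which(name)
    if found:
        return found
    # Otherwise try each candidate location in order.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Hypothetical candidate locations; adjust them to your own machine.
CANDIDATES = [
    r"C:/Users/YOUR_USER/AppData/Local/Programs/Tesseract-OCR/tesseract.exe",
    r"C:/Program Files/Tesseract-OCR/tesseract.exe",
]

path = find_tesseract(candidates=CANDIDATES)
if path is None:
    print("Tesseract was not found; install it or add its path to CANDIDATES.")
else:
    print("Tesseract found at:", path)
    # In the program from this post you would then write:
    # ptess.pytesseract.tesseract_cmd = path
```

Checking PATH first means the same program also works on a computer where Tesseract's folder is already on PATH, without touching the list of paths.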
If you have any doubts, feel free to ask in the comments below.
Stay connected with us for more such courses and projects.