Comparative summary of OCR open source projects
Optical Character Recognition (OCR) is the process of analyzing images of text material to extract the text and layout information. In short, the text in an image is recognized and returned as text.

OCR is a mature field. Most individuals and enterprises simply use third-party services. Many large companies, including Baidu, Alibaba Cloud, and Tencent, provide convenient APIs with good recognition speed and accuracy. The only drawback is that API calls are billed, although the cost stays very low for individuals and enterprises with a low call volume.

Given my company's current situation, however, we have several reasons to use open source instead.

There are many open source OCR projects. My company happens to need this kind of functionality, so I did some simple research and recorded it here.

If anything in this survey is inaccurate, please point it out.

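Tesseract is an open source OCR engine, originally developed at HP and later sponsored by Google. The engine itself is written in C++; from Python it is usually called through a wrapper such as pytesseract.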

Given the company's current situation, I did not investigate this project further.

PaddleOCR is Baidu's open source OCR software for Chinese text recognition.

EasyOCR is an OCR library written in Python, which is used to recognize characters in images and output them as text, and supports more than 80 languages.
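As a rough illustration of "recognize characters in an image and output them as text", here is a minimal sketch of how EasyOCR is typically called. It assumes `easyocr` is installed and that `readtext` returns `(bounding box, text, confidence)` tuples, which is its default output. The helper `filter_text`, the wrapper `recognize`, and the 0.5 confidence threshold are hypothetical names and values chosen for this example, not part of EasyOCR.

```python
from typing import List, Sequence, Tuple

# One EasyOCR detection: (bounding box corners, recognized text, confidence).
Detection = Tuple[Sequence[Sequence[float]], str, float]


def filter_text(detections: Sequence[Detection], min_confidence: float = 0.5) -> List[str]:
    """Hypothetical helper: keep the text of detections at or above min_confidence."""
    return [text for _bbox, text, conf in detections if conf >= min_confidence]


def recognize(image_path: str, languages: Sequence[str] = ("ch_sim", "en")) -> List[str]:
    """Hypothetical wrapper: run EasyOCR on one image, return confident text strings."""
    import easyocr  # imported lazily; requires `pip install easyocr`

    # Building the Reader loads the models (and downloads them on first use),
    # so a real application would create it once and reuse it.
    reader = easyocr.Reader(list(languages), gpu=False)
    return filter_text(reader.readtext(image_path))
```

Calling `recognize("invoice.png")` (the file name is a placeholder) would return a list of strings, one per detected text region that passes the threshold.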

chineseocr

chineseocr_lite

TrWebOCR

cnocr

Based on the comparison above, and given the company's current situation and earlier goals, I have provisionally chosen cnocr, the simplest option, for study and internal use. Because cnocr is only a Python package and cannot be called through an API, I also built a supplementary project, hn_ocr.
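To make the idea of exposing a Python-only OCR package through a callable interface concrete, here is a minimal sketch using only the standard library's `http.server`. It is not the hn_ocr implementation, whose design is not described here. The names `ocr_response`, `make_handler`, `serve`, and `make_cnocr_recognizer` are hypothetical, and the use of HTTP with a raw image body is an assumption. The cnocr call assumes a version where `CnOcr().ocr(path)` returns a list of dicts with a `"text"` key; older versions return a different structure, so check it against the installed release.

```python
import json
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List

# A recognizer takes raw image bytes and returns the recognized text lines.
Recognizer = Callable[[bytes], List[str]]


def make_cnocr_recognizer() -> Recognizer:
    """Assumed cnocr usage (v2-style output); verify against your version."""
    from cnocr import CnOcr  # imported lazily; requires `pip install cnocr`

    ocr = CnOcr()  # load the model once and reuse it across requests

    def recognize(image_bytes: bytes) -> List[str]:
        # Write the upload to a temporary file and pass its path to cnocr.
        with tempfile.NamedTemporaryFile() as tmp:
            tmp.write(image_bytes)
            tmp.flush()
            results = ocr.ocr(tmp.name)
        return [item["text"] for item in results]  # assumes dicts with a "text" key

    return recognize


def ocr_response(recognize: Recognizer, image_bytes: bytes) -> bytes:
    """Run the recognizer and encode its lines as a UTF-8 JSON body."""
    return json.dumps({"lines": recognize(image_bytes)}, ensure_ascii=False).encode("utf-8")


def make_handler(recognize: Recognizer):
    class OcrHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # The request body is treated as the raw image file.
            length = int(self.headers.get("Content-Length", 0))
            body = ocr_response(recognize, self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return OcrHandler


def serve(recognize: Recognizer, host: str = "127.0.0.1", port: int = 8000) -> None:
    """Block forever, answering POST requests with recognized text."""
    HTTPServer((host, port), make_handler(recognize)).serve_forever()
```

Under these assumptions, `serve(make_cnocr_recognizer())` would start the service, and a client could POST an image file to `http://127.0.0.1:8000/` to receive a JSON object with a `lines` array. Passing the recognizer in as a plain function keeps the HTTP layer independent of cnocr, so another engine could be swapped in without touching the handler.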

The project is currently on GitHub; you are welcome to learn from it and help improve it.