OCRopus
OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multilingual capabilities. The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-1990s and deployed by the US Census Bureau, and novel high-performance layout analysis methods. OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.
http://code.google.com/p/ocropus/
Related Products
GOCR
GOCR is an OCR (Optical Character Recognition) program developed under the GNU General Public License. It converts scanned images of text back to text files. Joerg Schulenburg started the program and now leads a team of developers.
Tesseract-ocr
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. The source code will read a binary, grey or color image and output text. A built-in TIFF reader handles uncompressed TIFF images, and libtiff can be added to read compressed images.
Tessnet2
An open source OCR assembly for .NET 2.0 that uses the Tesseract engine.
JavaOCR
Java OCR is an Optical Character Recognition algorithm based on a mean squared recognizer. This tool also includes utilities to trace and extract characters.
Pyocrhelper - Python script to make OCR with ocropus easier for end users and/or developers
What is pyocrhelper? Ocropus is high-quality OCR software that accepts image files as input and outputs HTML text files. Ocropus is quite good at what it does. pyocrhelper is a Python class that makes interacting with Ocropus easier for an end user or a developer by taking care of all the steps that have to be completed before Ocropus can be used: determining the file type of the input file; if the input is, for example, a PDF, converting the PDF to images; converting the input image(s) to an Ocropus input image; and running Ocropus on the input image(s).
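The steps listed above can be sketched roughly as follows. This is a minimal illustration only, not pyocrhelper's actual code or API: the function names detect_filetype and plan_steps and the step labels are hypothetical, and the sketch only decides which steps apply to a file; it does not call Ocropus or any converter. File types are recognized from the well-known leading bytes of PDF, PNG, JPEG and TIFF files.

```python
# Leading bytes ("magic numbers") that identify common input formats.
SIGNATURES = [
    (b"%PDF", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]


def detect_filetype(header):
    """Return a format name for the leading bytes of a file, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def plan_steps(header):
    """List, in order, the preparation steps for one input file.

    The step labels are hypothetical and only mirror the description above.
    """
    kind = detect_filetype(header)
    if kind == "unknown":
        raise ValueError("unsupported input file")
    steps = []
    if kind == "pdf":
        # PDFs need an extra conversion before any image handling.
        steps.append("convert pdf to images")
    steps.append("convert image(s) to Ocropus input image")
    steps.append("run Ocropus")
    return steps
```

In practice the header would come from reading the first few bytes of the input file, for example with open(path, "rb").read(8).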
Tesseract-ocr - An OCR Engine that was developed at HP Labs between 1985 and 1995... and now at Goog
Background: The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google and is probably one of the most accurate open source OCR engines available. Combined with the Leptonica Image Processing Library it can read a wide variety of image formats and convert them to text in over 40 languages. Important Download Information: The language data files are separate from
VietOCR
Provides optical character recognition (OCR) solutions for the Vietnamese language.
ImageMagick
ImageMagick is a software suite to create, edit, and compose bitmap images. It can read, convert and write images in a variety of formats (over 100) including DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, PostScript, SVG, and TIFF. Use ImageMagick to translate, flip, mirror, rotate, scale, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.
Uyghurocr - OCR project to recognize the Uyghur language
Background: The uyghurocr project is trying to build open source OCR software to recognize the Uyghur language. uyghurocr will start from current open source OCR projects, Tesseract OCR and ocropus, by enhancing their Arabic (right-to-left language) support and user interface. I will document here, step by step, everything needed to make Tesseract support the Uyghur language. Later I may create a nice user interface and more features, and even integrate it with the ocropus project. Since most of the audience
Nepali-ocr - Nepali OCR
Nepali OCR: A version of Nepali OCR is being developed in Visual Studio 2003 (.NET version 1.1). Currently tesseract-ocr can also be used for Nepali characters. You can download the language files from the download section. Please put the language files in /usr/share/tessdata/ on Linux. If you are using Windows, place the tessdata folder alongside the executable tesseract.exe. Tesseract OCR Homepage: http://code.google.com/p/tesseract-ocr Groups: http://groups.google.com/group/tesseract-ocr OCRopusHomepa
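The placement rule for the language files described above can be written down as a small helper. This is only a sketch under stated assumptions: the function name tessdata_dir and its parameters are hypothetical and not part of Tesseract or Nepali OCR, and it only computes the target folder (the Linux path from the text, or a tessdata folder next to tesseract.exe on Windows) without copying anything.

```python
from pathlib import PurePosixPath, PureWindowsPath


def tessdata_dir(platform, tesseract_exe=None):
    """Return the folder where the language files should be placed.

    platform      -- a platform string such as sys.platform ("linux", "win32")
    tesseract_exe -- path to tesseract.exe; required on Windows only
    """
    if platform.startswith("linux"):
        # Location given in the description above.
        return PurePosixPath("/usr/share/tessdata")
    if platform.startswith("win"):
        if tesseract_exe is None:
            raise ValueError("the path to tesseract.exe is needed on Windows")
        # The tessdata folder sits next to the executable.
        return PureWindowsPath(tesseract_exe).parent / "tessdata"
    raise ValueError("no placement rule stated for platform: " + platform)
```

For example, tessdata_dir(sys.platform) could be used on Linux, while on Windows the caller would also pass the location of tesseract.exe.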