OCRopus

OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-1990s and deployed by the US Census Bureau, and novel high-performance layout analysis methods. OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.



http://code.google.com/p/ocropus/






Related Products

GOCR

GOCR is an OCR (Optical Character Recognition) program, developed under the GNU General Public License. It converts scanned images of text back to text files. Joerg Schulenburg started the program, and now leads a team of developers.


Tesseract-ocr

The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. The source code will read a binary, grey or color image and output text. A TIFF reader is built in that will read uncompressed TIFF images, or libtiff can be added to read compressed images.


Tessnet2

A .NET 2.0 open source OCR assembly using the Tesseract engine.


JavaOCR

Java OCR is an Optical Character Recognition algorithm based on a mean squared recognizer. This tool also includes utilities to trace and extract characters.


Pyocrhelper - Python script to make OCR with ocropus easier for end users and/or developers

What is pyocrhelper? Ocropus is high-quality OCR software which accepts image files as input and outputs HTML text files. Ocropus is quite good at what it does. pyocrhelper is a Python class which makes interacting with Ocropus easier for an end user or a developer by taking care of all the steps which have to be taken before Ocropus can be used: determine the file type of the input file; if it is (for example) a PDF, convert the PDF to images; convert the input image(s) to an Ocropus input image; run Ocropus on the input image(s).
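The sequence of steps described above can be sketched roughly as follows. This is only an illustration and not pyocrhelper's actual code: the names `detect_filetype` and `plan_steps` are hypothetical, the file type check assumes a PDF can be recognized by its leading `%PDF-` bytes, and the sketch only lists the steps rather than invoking any converter or Ocropus itself.

```python
from typing import List

# Assumption: PDF files begin with the bytes "%PDF-".
PDF_MAGIC = b"%PDF-"


def detect_filetype(header: bytes) -> str:
    """Classify input by its leading bytes: 'pdf', else treated as 'image'."""
    return "pdf" if header.startswith(PDF_MAGIC) else "image"


def plan_steps(header: bytes) -> List[str]:
    """Return the ordered preparation steps for one input file (hypothetical helper)."""
    steps = ["determine filetype of input file"]
    if detect_filetype(header) == "pdf":
        # PDFs need an extra rasterization step before OCR.
        steps.append("convert pdf to images")
    steps.append("convert input image(s) to Ocropus input image")
    steps.append("run Ocropus on the input image(s)")
    return steps
```

For example, `plan_steps(b"%PDF-1.4")` includes the PDF conversion step, while any other header skips it and goes straight to image conversion.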


Tesseract-ocr - An OCR Engine that was developed at HP Labs between 1985 and 1995... and now at Goog

Background: The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google and is probably one of the most accurate open source OCR engines available. Combined with the Leptonica Image Processing Library it can read a wide variety of image formats and convert them to text in over 40 languages. Important Download Information: The language data files are separate from


VietOCR

Provides optical character recognition (OCR) solutions for the Vietnamese language.


ImageMagick

ImageMagick is a software suite to create, edit, and compose bitmap images. It can read, convert and write images in a variety of formats (over 100) including DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, PostScript, SVG, and TIFF. Use ImageMagick to translate, flip, mirror, rotate, scale, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.


Uyghurocr - OCR project to recognize Uyghur language

Background: The uyghurocr project aims to build open source OCR software that recognizes the Uyghur language. uyghurocr will start from existing open source OCR projects, Tesseract OCR and ocropus, by enhancing their Arabic (right-to-left language) support and user interface. I will document here, step by step, everything needed to make Tesseract support the Uyghur language. Later I may create a nicer user interface and more features, and perhaps integrate it with the ocropus project. Since most of the audience


Nepali-ocr - Nepali OCR

Nepali OCR: A version of Nepali OCR is being developed in Visual Studio 2003 (.NET version 1.1). Currently tesseract-ocr can also be used for Nepali characters. You can download the language files from the download section. Please put the language files in /usr/share/tessdata/ on Linux. If you are using Windows, place the tessdata folder alongside the executable tesseract.exe. Tesseract OCR Homepage: http://code.google.com/p/tesseract-ocr Groups: http://groups.google.com/group/tesseract-ocr OCRopusHomepa

