OCR



ClockWork
7th September 2006, 09:18 AM
Hi guys,

Have been fiddling around with Optical Character Recognition software for Mac OS X. Do any of you recall OmniPage for Mac OS 9.x ? Often when one purchased a scanner, you'd get OmniPage LE (Limited Edition) thrown in, and it did a pretty good job at reading printed characters off a page and saving over to MS Word.

It wasn't 100% perfect, yet it was good enough.

So now I've played with it in both Panther and Tiger. I used what came with my scanner, which is ABBYY FineReader. What a pile of dingo's kidneys! The corrections one would need to make after ABBYY has read a page would take about as long as typing the page by hand.
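One rough way to put a number on "the amount of corrections" is the character error rate: the edit distance between the OCR output and a hand-typed reference, divided by the length of the reference. The sketch below is only an illustration; the function names are made up for this example and nothing here comes from FineReader or OmniPage.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text, reference_text):
    """Corrections needed per reference character (hypothetical helper)."""
    if not reference_text:
        raise ValueError("reference_text must not be empty")
    return edit_distance(ocr_text, reference_text) / len(reference_text)
```

For example, `character_error_rate("he11o", "hello")` counts two substitutions over five characters. A rate anywhere near 1.0 would mean you may as well retype the page.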

ABBYY FineReader is a cruel, lame joke of OCR software. Not only can it not read English characters correctly, it's a two-stage process. First you have to scan the page in as a high-res image, and then FineReader reads that image - so it can't read the page and just convert straight to a word processor.
(After that, it comes out looking a little like ASCII garbage...)

The only other alternative for OS X is the updated OmniPage - OmniPage Professional X (http://www.citysoftware.com.au/Browse/57519462185d433b9d84c30d1c81eb98001ItemDetail.aspx ) - but check out the price (http://www.citysoftware.com.au/Browse/57519462185d433b9d84c30d1c81eb98001ItemDetail.aspx )!

What a blow out! $838! It's OCR - not some upgrade to InDesign.

If we hadn't been so dim back when any of us were using OmniPage LE in Mac OS 9.x and had actually bought the OS 9 Limited Edition, we could take the OS X upgrade to OmniPage Pro that Nuance offers at $227.00 - yet naturally it's impossible to beam into the future to realize the price would go through the roof like that.

Does anyone know of a kinder alternative?

cheers,

ClockWork

Quamen
7th September 2006, 09:35 AM
Google loves you. Google wants you to have it for free. I'm just not sure how much work you (or someone else) will need to do before the engine is usable.
http://google-code-updates.blogspot.com/2006/08/announcing-tesseract-ocr.html
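For anyone who does get the engine built, a minimal sketch of driving it from a script might look like the following. It assumes a `tesseract` binary on the PATH that accepts the `tesseract <image> <outputbase> -l <lang>` form and writes its result to `<outputbase>.txt`; the two function names are invented for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, language="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <language>."""
    return ["tesseract", str(image_path), str(output_base), "-l", language]


def ocr_image(image_path, output_base, language="eng"):
    """Run Tesseract on an already-scanned image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, language),
                   check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

It is still a two-stage job: scan the page to an image first (say `page.tif`), then call `ocr_image("page.tif", "page")` and paste the text into a word processor.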

natakim
8th September 2006, 05:24 AM
hey clockers, cheers from the netherlands :D

have you tried Readiris? i've used version 9 and it was ok, fairly good catch rate.

a quick google shows they are up to version 11 now, ohh time to update ;)

http://www.irislink.com/c2-73/Readiris-Pro-11-for-Mac-OCR-software.aspx

cheers

ClockWork
9th September 2006, 05:23 AM
Thanks Natakim,

Readiris looks pricey too - yet worth keeping in mind over the insane OmniPage Pro X. I think I've figured the cheapest solution. Run ye olde OmniPage LE from an old iMac G3 in Mac OS 9.1 with MS Office 2001 or 98 on board as well.

(Strange that the new OCRs all seem to require a two-stage process - scanning text into PDF and then reading the PDF - kinda one step back from Mac OS 9).

Have fun whittling away your Euros!

cheers,

cw

JimWOz
9th September 2006, 07:27 AM
ClockWork, another vote for ReadIris here.
I've had good results on reasonably clean fax copies, provided there are no handwritten markups on them. Those really screw things up. You need to erase them with something like GraphicConverter first.
If you put a clean PDF (made from a Word doc, say) into ReadIris, it's nearly perfect.
Acrobat will decipher a page of clean text too, but if there is much in the way of page layout, graphics, tables etc., it just returns a full-page image when it converts it to a Word doc.
ReadIris collects all the bits of layout as separate frames and interprets each one separately - you can even deselect some if you don't want them converted.
So whilst the layout it reproduces might not be perfect, in terms of columns and tables, you have all the text interpreted, and are able to reformat it yourself.

rhb
9th September 2006, 03:41 PM
I use ReadIris 7, which came free with a magazine CD (I forget which) a couple of years ago. Quite happy with it.
VueScan can do OCR, but it's not very accurate.

g5agogo
9th September 2006, 06:20 PM
Just an FYI that OmniPage SE used to (edit: and probably still does) come with Canon MP multifunction printers.

Does a good job on clean scans and on mixed text & graphics, but I find you have to assign each area manually if they are mixed as the auto-assign feature seems to get a bit confused.


Cheers.