Selv wrote:
>Quick question - do you have access to OCR (optical character recognition)
>software? If not, I would certainly be willing to take the TIFF files, run
>them through the OCR I have, and send them back. They would then have to
>be heavily edited, as OCR tends to be about 90-95% accurate in general
>(I've found it to be less accurate for vowels with accents and umlauts,
>though), but at least it's a bit easier than typing in the whole book.
Please take a look at where the Runeberg archives are.
You will see that they are looking for proofreaders to check OCR ASCII files
against graphic images. I downloaded a few of them and proofread a couple
of pages of the so-called "raw" OCR output (ASCII). To my surprise I couldn't
find a single error! (It was Rydberg's Fädernas Gudasaga that I looked at.)
The reason I did not continue is that it was a hell of a lot of
work just to download all the files (2 files for each page) and organize
them on my disk. You need a scripting language for such big jobs (500+ pages).
Then you can leave the computer on and let it do the job while you
eat, sleep, go to the cinema or whatever. You guys who volunteer to publish
a dictionary on the web are courageous indeed! There was one particular
server in Norway that hosted quite a large collection of typed-up sagas
and edda poems. But when I tried to "rectify" some of them for private
use, I had to give up, because there were so many typos that after I
was through with a page, the whole page was full of red ink. I wondered
then whether it wouldn't be easier just to type everything up from scratch!
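
For what it's worth, the kind of download script I mean above might look
roughly like this in Python. It is only a sketch: the base URL, the
four-digit page names and the .tif/.txt extensions are made up for
illustration and are not the archive's actual layout, so they would have
to be adapted to whatever the real file names turn out to be.

```python
import urllib.request
from pathlib import Path

# Assumption: every page has one image file and one raw OCR text file.
# These extensions and the zero-padded naming are hypothetical.
EXTENSIONS = (".tif", ".txt")


def plan_downloads(base_url, page_count, dest_dir):
    """Return a list of (url, local_path) pairs, two per page."""
    jobs = []
    for page in range(1, page_count + 1):
        stem = f"{page:04d}"
        for ext in EXTENSIONS:
            url = f"{base_url.rstrip('/')}/{stem}{ext}"
            # Keep images and text in separate subdirectories on disk.
            local = Path(dest_dir) / ext.lstrip(".") / f"{stem}{ext}"
            jobs.append((url, local))
    return jobs


def fetch_all(jobs):
    """Download every planned file that is not already on disk."""
    for url, local in jobs:
        if local.exists():
            continue  # lets an interrupted run be restarted
        local.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(local))
```

Something like fetch_all(plan_downloads(base_url, 500, "book")) would then
run unattended. Skipping files that already exist means the job can be
restarted after a dropped connection without fetching everything again.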
Best regards