Copying text from a PDF (Was: Re: Reply message format)

On Mon, 12 Jul 2004 07:30:51 -0400, Peter T. Daniels
<grammatim@...> wrote:

> I guess you're just so much more sophisticated than I am, but I can't
> copy and paste text out of a .pdf. You'll just have to look it up for
> yourself.

Just to be sure of what I suspected, I opened the PDF file of the Preface
to the Unicode Standard, and copied some text from the screen. I used the
Text Selector tool in Acrobat Reader 6 on an "x86" machine running
Win 98 SE.

Copied text:
"This book, The Unicode Standard, Version 4.0, is the authoritative source
of information on the Unicode character encoding standard."

Some text in a PDF is in image form, with no data to permit simple
recognition of the underlying text bytes (there aren't any). Such text
would require OCR to convert it into conventional text-file format.
However, most text in a PDF can be copied to the Clipboard (or the
Mac/Linux equivalent) about as easily as any other text. (It might not
have been possible in older versions of the Acrobat Reader; I don't
recall.)

If someone wants to copy text from a PDF (probably keeping copyright in
mind), and is having difficulty, it might be a good idea to ask for help
from someone who knows how. It is not difficult.

OCR: Optical Character Recognition -- the process of scanning an image,
interpreting the image as text, and providing that text as a text file.
Like converting speech to text, it is neither simple nor easy to do well.

--
Nicholas Bodley /*|*\ Waltham, Mass.
Opera 7.5 (Build 3778), using M2