I'm reading a PDF copy of Jerome H. Friedman's paper "Data Mining and Statistics: What's the Connection?" using Google Chrome.
It contains an amusing quote that I want to copy and paste to my blog.
I used the mouse to select the text of the quote and pressed CTRL + C to copy the text. The document looks like this:

When I paste the text into Notepad, Stack Overflow, or anywhere else, the result is Wingdings-like gibberish:
➣✍❺❼⑤➭✸❸❊➁❥❸❊⑥▼❽❾❸✘➎✳❸❾②❘➊➥❸❊❸❊⑥❦⑨❘②③✇▲➆ ②❥⑤⑩⑨❘②❥⑤⑩❽❾⑤⑩✇➄⑥▼⑨❏✇➄⑥▼❺➌❽❾❻➀➍♣➂⑦❶❼②❥❸❊➁❷⑨❥❽❾⑤❸❊⑥✗②❥⑤⑩⑨❘②③⑨✘⑤⑥☎②❥➇⑦⑤⑩⑨ ➔❸❊➅⑩❺➌⑨❹❸❊❸❊➍P⑨①②❥❻ ➎✳❸❏②❥➇▼✇▲②➟➊❚➇⑦❸❊⑥✆✇P⑨❘②③✇▲②❥⑤⑩⑨❘②❥⑤⑩❽❾⑤⑩✇➄⑥❦➇▼✇➀⑨↔✇➄⑥❦⑤⑩❺❼❸✶✇♣➇⑦❸❷❻➀➁↔⑨❹➇⑦❸❷➊❚➁❥⑤②❥❸✶⑨ ✇❨➂▼✇➄➂✳❸❊➁✶Þ⑦✇♣❽❾❻➀➍♣➂⑦❶❼②❥❸❊➁➟⑨❥❽❾⑤❸❊⑥✗②❥⑤⑩⑨❘②↔⑨❘②③✇➄➁❹②③⑨❚✇♣❽❾❻➀➍♣➂▼✇➄⑥☛➧➀➏
The text should instead look like this:
A difference between statisticians and computer scientists in this field seems to be that when a statistician has an idea he or she writes a paper; a computer scientist starts a company.
I had to type that text out manually. This is feasible for such a small quote, but how do I actually copy what I see?
Is it something unusual about the PDF, the browser, the plugin, or some combination of the three?
Answer
The most reliable way to do this is OCR.
As a quick-and-dirty alternative, you can open the PDF in Google Quick View from the search result for your link, then choose View > Plain HTML.
The HTML version still contains some garbled text and is hard to read, but a large amount of the text is correct and can be copied. Search works there, so you can use it to locate the passage you want and copy it without any garbled text.
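If you want to check programmatically whether a copied passage came out garbled, here is a minimal Python sketch. It assumes the gibberish looks like the sample in the question, where the pasted characters come from the Unicode Dingbats and Enclosed Alphanumerics blocks. The function names and the 0.3 threshold are my own choices for illustration, not part of any tool mentioned here.

```python
def garbled_ratio(text: str) -> float:
    """Return the share of non-space characters that fall in symbol blocks."""
    # Assumption: a broken font mapping shows up as characters from
    # Enclosed Alphanumerics (U+2460-U+24FF) or Dingbats (U+2700-U+27BF).
    suspicious_ranges = ((0x2460, 0x24FF), (0x2700, 0x27BF))
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(
        1
        for c in chars
        if any(lo <= ord(c) <= hi for lo, hi in suspicious_ranges)
    )
    return hits / len(chars)


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Flag text whose share of symbol characters exceeds the threshold."""
    # The threshold is an arbitrary illustrative value; tune it for your text.
    return garbled_ratio(text) > threshold
```

A PDF with a different broken font could map its text to other code points, so treat this as a heuristic for this kind of output only.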
Detailed example: open the PDF in Quick View from the search result.

Then choose View > Plain HTML.

On Google's HTML version, you can search and select the equivalent text like this:

Pasting into Notepad produces this output:
A difference between sta-tisticians and computer scientists in this field seems tobe that when a statistician has an idea he or she writesa paper; a computer scientist starts a company.
Not exactly as displayed, but close enough that you can work with it.
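If you copy more than a sentence or two, the leftover line-break hyphens (like "sta-tisticians" above) can be cleaned up with a short script. This is only a sketch: `rejoin_hyphenated` and its `vocabulary` argument are hypothetical names, and you have to supply the list of known words yourself. It does not repair the missing spaces ("tobe", "writesa"), which still need a manual pass.

```python
import re


def rejoin_hyphenated(text: str, vocabulary: set[str]) -> str:
    """Drop a hyphen inside a word when the joined form is a known word."""

    def fix(match: re.Match) -> str:
        joined = match.group(1) + match.group(2)
        # Only join when the result is in the supplied vocabulary, so that
        # genuinely hyphenated words are left untouched.
        if joined.lower() in vocabulary:
            return joined
        return match.group(0)

    return re.sub(r"\b([A-Za-z]+)-([A-Za-z]+)\b", fix, text)


if __name__ == "__main__":
    pasted = "A difference between sta-tisticians and computer scientists"
    print(rejoin_hyphenated(pasted, {"statisticians"}))
```

Passing the vocabulary in explicitly keeps the function from guessing: a hyphenated word that is not in your list stays exactly as it was pasted.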