Thursday, 4 April 2019

google chrome - How can I copy this quote from PDF?




Possible Duplicate:
PDF has garbled text when copy pasting



I'm reading a PDF copy of Jerome H. Friedman's paper "Data Mining and Statistics: What's the Connection?" using Google Chrome.


It contains an amusing quote that I want to copy and paste to my blog.


I used the mouse to select the text of the quote and pressed CTRL + C to copy the text. The document looks like this:


A highlighted quote from Jerome's paper.


When I paste the text into Notepad, Stack Overflow, or anywhere else, the result is Wingdings-like gibberish:



➣✍❺❼⑤➭✸❸❊➁❥❸❊⑥▼❽❾❸✘➎✳❸❾②❘➊➥❸❊❸❊⑥❦⑨❘②③✇▲➆ ②❥⑤⑩⑨❘②❥⑤⑩❽❾⑤⑩✇➄⑥▼⑨❏✇➄⑥▼❺➌❽❾❻➀➍♣➂⑦❶❼②❥❸❊➁❷⑨❥❽❾⑤❸❊⑥✗②❥⑤⑩⑨❘②③⑨✘⑤⑥☎②❥➇⑦⑤⑩⑨ ➔❸❊➅⑩❺➌⑨❹❸❊❸❊➍P⑨①②❥❻ ➎✳❸❏②❥➇▼✇▲②➟➊❚➇⑦❸❊⑥✆✇P⑨❘②③✇▲②❥⑤⑩⑨❘②❥⑤⑩❽❾⑤⑩✇➄⑥❦➇▼✇➀⑨↔✇➄⑥❦⑤⑩❺❼❸✶✇♣➇⑦❸❷❻➀➁↔⑨❹➇⑦❸❷➊❚➁❥⑤②❥❸✶⑨ ✇❨➂▼✇➄➂✳❸❊➁✶Þ⑦✇♣❽❾❻➀➍♣➂⑦❶❼②❥❸❊➁➟⑨❥❽❾⑤❸❊⑥✗②❥⑤⑩⑨❘②↔⑨❘②③✇➄➁❹②③⑨❚✇♣❽❾❻➀➍♣➂▼✇➄⑥☛➧➀➏



The text should instead look like this:



A difference between statisticians and computer scientists in this field seems to be that when a statistician has an idea he or she writes a paper; a computer scientist starts a company.



I had to type that text out manually. This is feasible for such a small quote, but how do I actually copy what I see?


Is it something unusual about the PDF, the browser, the plugin, or some combination of the three?



Answer



The most reliable way to do this is to use OCR.


As a quick and dirty alternative, you can open the document in Google Quick View from the search result for its URL, then choose View > Plain HTML in Quick View.


The HTML version still contains some garbled text and is hard to read, but a large amount of the text is correct and can be copied. Search works in this view, so you can use it to locate the target text and copy it without any garbled characters.
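If you want to check a pasted passage programmatically before using it, one rough heuristic is to measure how much of it consists of symbol characters rather than letters. The sketch below is only an illustration: the function names and the 0.5 threshold are assumptions made for this example, not part of Chrome, Google Quick View, or any PDF tool.

```python
import unicodedata


def symbol_ratio(text):
    """Return the fraction of non-whitespace characters that are symbols.

    Dingbats and circled digits, like those in the garbled paste, fall into
    the Unicode categories S* (symbols) and No (other numbers).
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    symbols = 0
    for c in chars:
        category = unicodedata.category(c)
        if category.startswith("S") or category == "No":
            symbols += 1
    return symbols / len(chars)


def looks_garbled(text, threshold=0.5):
    """Heuristic check; the threshold is an arbitrary assumption."""
    return symbol_ratio(text) > threshold


if __name__ == "__main__":
    print(looks_garbled("➣✍❺❼⑤➭✸❸❊➁❥❸❊⑥▼❽❾❸"))
    print(looks_garbled("a computer scientist starts a company."))
```

A check like this only flags the symptom. It cannot recover the original text, which is why OCR or the HTML view is still needed.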




Detailed example:
The Google search results for the URL include a Quick View link.


Then choose the Plain HTML option from the View menu.


The Quick View has an option to view the document as HTML.


On Google's HTML version, you can search and select the equivalent text like this:


Search the HTML version to find and select the relevant quote.


Pasting into Notepad produces this output:

A difference between sta-tisticians and computer scientists in this field seems tobe that when a statistician has an idea he or she writesa paper; a computer scientist starts a company.



Not exactly as displayed, but close enough that you can work with it.
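The remaining defects in that output are a line-break hyphen ("sta-tisticians") and words fused where a line break was dropped ("tobe", "writesa"). A minimal sketch of how they could be cleaned up is shown below, assuming you can supply a list of known words. The `repair_pasted_text` function and its `vocabulary` parameter are hypothetical names invented for this example, and the approach is a simple dictionary lookup rather than anything the browser or Quick View provides.

```python
import re


def repair_pasted_text(text, vocabulary):
    """Repair two artifacts of text copied from a converted PDF.

    1. Hyphens left over from line-break hyphenation ("sta-tisticians").
    2. Two words fused where a line break was dropped ("tobe").

    `vocabulary` is a caller-supplied collection of known words (an
    assumption of this sketch); tokens are changed only when the repaired
    form is found in it.
    """
    vocab = {word.lower() for word in vocabulary}

    def fix_token(match):
        token = match.group(0)
        if token.lower() in vocab:
            return token  # already a known word
        if "-" in token:
            joined = token.replace("-", "")
            # Drop the hyphen only if that yields a known word.
            return joined if joined.lower() in vocab else token
        # Try every split point; accept the first one giving two known words.
        for i in range(1, len(token)):
            left, right = token[:i], token[i:]
            if left.lower() in vocab and right.lower() in vocab:
                return left + " " + right
        return token

    # A token is a run of letters, optionally with internal hyphens.
    return re.sub(r"[A-Za-z]+(?:-[A-Za-z]+)*", fix_token, text)


if __name__ == "__main__":
    pasted = "seems tobe that when a sta-tistician has an idea he or she writesa paper"
    words = ["seems", "to", "be", "that", "when", "a", "statistician",
             "has", "an", "idea", "he", "or", "she", "writes", "paper"]
    print(repair_pasted_text(pasted, words))
```

Because unknown tokens are left untouched, genuinely hyphenated words survive unless their joined form happens to be in the vocabulary. With a very large word list, fused-word splitting can produce false positives, so the result should still be proofread.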

