Saturday 21 September 2019

Count the number of words in a PDF file


How can I get the word count of a PDF file? I think most of the PDF files whose total word count I want have an embedded text layer, so I don't need OCR.


The task arose from searching for scientific papers of a known size, e.g. 15000 words. Most modern papers are published in PDF format.



Answer



Quick Answer:


pdftotext myfile.pdf - | wc -w

Long Answer:


If you are on Unix, you can use pdftotext to convert the PDF to a text file, and then count the words in the generated file with:


wc -w converted-pdf.txt

to get the word count.


Also, see the comment by frabjous: basically, you can do it in one step by sending the output to stdout (the - argument) and piping it to wc, instead of writing to a temporary file:


pdftotext myfile.pdf - | wc -w
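
For the use case in the question (checking several papers against a known length), the same pipeline can be wrapped in a small loop. This is only a sketch: it assumes pdftotext is installed and on the PATH, and the function name pdf_wc is made up for illustration.

```shell
#!/bin/sh
# pdf_wc: hypothetical helper that prints "<word count> <file>" for each PDF given.
pdf_wc() {
    for f in "$@"; do
        # "-" tells pdftotext to write the extracted text to stdout,
        # which is piped straight into wc instead of a temporary file.
        n=$(pdftotext "$f" - | wc -w)
        # $((n)) strips the leading spaces some wc implementations print.
        printf '%s %s\n' "$((n))" "$f"
    done
}
```

Something like pdf_wc *.pdf | sort -n would then list the files from shortest to longest. The count is only as good as the text layer, so headers, footers and references are included.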
