How can I get the word count of a PDF file? I think most of the PDF files I want to count have an embedded text layer, so I don't need OCR.
The task arose from searching for scientific papers of a known length, e.g. 15,000 words. Most modern papers are published in PDF format.
Answer
Quick Answer:
pdftotext myfile.pdf - | wc -w
Long Answer:
If on Unix, you can use pdftotext to convert the PDF to a text file:
pdftotext myfile.pdf converted-pdf.txt
and then count the words in the generated file:
wc -w converted-pdf.txt
Also, see the comment by frabjous: you can do it in one step by piping to stdout instead of writing to a temporary file:
pdftotext myfile.pdf - | wc -w
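Since the goal in the question is to find papers of a known length, the one-liner could be wrapped in a small loop over several files. This is only a minimal sketch: it assumes pdftotext is installed and on PATH, and the function name pdf_word_counts is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "<word count> <file>" for each PDF argument.
# Assumes pdftotext is available on PATH.
pdf_word_counts() {
    for pdf in "$@"; do
        # "-" makes pdftotext write the extracted text to stdout;
        # tr strips the padding that some wc implementations add.
        count=$(pdftotext "$pdf" - | wc -w | tr -d ' ')
        printf '%s %s\n' "$count" "$pdf"
    done
}
```

Calling pdf_word_counts *.pdf | sort -n would then list the files from shortest to longest. The count covers all the text pdftotext extracts, including references, captions, and page headers, so it will likely be somewhat higher than the length of the body text alone.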