Return to site

Java pdf to text converter

broken image
broken image
broken image

Correctly got the big capitals at start of sections, e.g. Junk that was hidden in the PDF did not get output. My second choice is ebook-convert.Īdobe: left in FF for page breaks, left in page numbers, hasn't converted headings/paragraphs to single lines, but it has fixed hyphens. I've been comparing the output side-by-side. (I am pre-processing for text analysis experiments, not as a reader, but I think my first and second choice would be the same.) As a fan of open source (and automation) I hate to say this, but the best results I just got (on quite a large, complex PDF) were to open it in Adobe Reader, then choose File|Save As Text.

broken image