How can I get the content of a PDF file line by line in Python? I have searched Stack Overflow but could not find a good answer. Notes: pyPdf gives an assertion error; if possible, I would like a solution that uses slate and pdfminer.
Run pdfminer's pdf2txt.py from the command line: python /path/to/pdf2txt.py -o text.txt /path/to/yourpdf.pdf
You can then open the text file it produces and loop over it with: for line in file
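A minimal sketch of that loop, assuming pdf2txt.py wrote UTF-8 text to text.txt as in the command above. The helper name read_lines is made up for illustration.

```python
def read_lines(path, encoding="utf-8"):
    """Yield each line of the text file at `path` without its trailing newline."""
    with open(path, encoding=encoding) as f:
        # Iterating over the file object reads one line at a time,
        # so the whole file is never held in memory at once.
        for line in f:
            yield line.rstrip("\n")


# Usage (assumes text.txt was produced by the pdf2txt.py command):
# for line in read_lines("text.txt"):
#     print(line)
```

Lines come back in the order pdf2txt.py wrote them, which follows its layout analysis and may not match the visual line order of every PDF.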
If you want to be more efficient, you would have to modify pdf2txt.py so that outfp
is a Python StringIO object, which would avoid writing a file and then reading it back.
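The in-memory idea can also be sketched without editing pdf2txt.py, assuming a recent release of the pdfminer.six fork is installed. Its pdfminer.high_level.extract_text_to_fp function writes to any file-like object, so an io.StringIO can play the role of outfp. The helper names iter_lines and pdf_lines are hypothetical.

```python
import io


def iter_lines(buffer):
    """Yield lines from a text buffer, without trailing newlines."""
    buffer.seek(0)  # rewind, since the writer leaves the position at the end
    for line in buffer:
        yield line.rstrip("\n")


def pdf_lines(pdf_path):
    """Extract the text of a PDF into memory and yield it line by line."""
    # Assumption: pdfminer.six is installed. The import is kept local so
    # this module still loads when it is not.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    outfp = io.StringIO()  # in-memory stand-in for the output file
    with open(pdf_path, "rb") as inf:
        # LAParams() turns on layout analysis, which groups characters
        # into lines; without it the text is not split into lines.
        extract_text_to_fp(inf, outfp, laparams=LAParams())
    yield from iter_lines(outfp)


# Usage (the path is a placeholder):
# for line in pdf_lines("/path/to/yourpdf.pdf"):
#     print(line)
```

This keeps the whole extracted text in memory, which is fine for typical documents but may be a poor trade for very large PDFs, where the temporary text file is the safer choice.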