Document Recovery with OCR and NLP

The aim is to digitize and recover text from documents that are in poor condition.


The approach is to extract text from the documents using OCR, then use Natural Language Processing (NLP) to fill in missing words and correct mistakes in the OCR output.


  • Perform document layout segmentation, i.e. identify which parts of the document are headings, images, body text, etc.
  • Try out handwriting recognition.
  • Use OCR or handwriting recognition to extract the text from the image.
  • Fill in the missing details using an NLP model that infers the missing text from the surrounding context.
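
The last step above can be illustrated with a small sketch. It is not the project's actual method: it swaps the NLP model for a toy bigram counter plus fuzzy string matching from the Python standard library, so that it runs without any OCR engine or trained model. The `[MISSING]` marker, the reference corpus, and all function names are assumptions made for illustration only.

```python
import difflib
from collections import Counter, defaultdict

GAP = "[MISSING]"  # hypothetical marker for a word the OCR step could not read


def train_bigrams(corpus):
    """Count adjacent word pairs in a list of clean reference sentences."""
    following = defaultdict(Counter)  # word -> counts of words seen right after it
    preceding = defaultdict(Counter)  # word -> counts of words seen right before it
    for sentence in corpus:
        words = sentence.lower().split()
        for left, right in zip(words, words[1:]):
            following[left][right] += 1
            preceding[right][left] += 1
    return following, preceding


def fill_gap(tokens, i, following, preceding):
    """Pick the word that most often sits between the gap's two neighbours."""
    scores = Counter()
    if i > 0:
        scores.update(following[tokens[i - 1].lower()])
    if i + 1 < len(tokens):
        scores.update(preceding[tokens[i + 1].lower()])
    # Leave the marker in place when the context gives no evidence.
    return scores.most_common(1)[0][0] if scores else GAP


def repair(text, following, preceding, vocab):
    """Correct near-miss spellings first, then fill gaps from context."""
    tokens = text.split()
    candidates = sorted(vocab)  # sorted for deterministic matching
    for i, tok in enumerate(tokens):
        if tok != GAP and tok.lower() not in vocab:
            match = difflib.get_close_matches(tok.lower(), candidates, n=1, cutoff=0.75)
            if match:
                tokens[i] = match[0]
    for i, tok in enumerate(tokens):
        if tok == GAP:
            tokens[i] = fill_gap(tokens, i, following, preceding)
    return " ".join(tokens)


# Tiny made-up reference corpus; a real system would use far more text.
corpus = [
    "the old document was scanned at the library",
    "the document was damaged by water",
    "the old letter was scanned yesterday",
]
following, preceding = train_bigrams(corpus)
vocab = {word for sentence in corpus for word in sentence.lower().split()}

print(repair("the old [MISSING] was scaned at the library", following, preceding, vocab))
# prints: the old document was scanned at the library
```

Spelling is corrected before gaps are filled so that the gap's neighbours are already clean words when they are used as context. A pretrained masked language model would likely fill gaps far better than bigram counts; the sketch only shows the shape of the step.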


