Document Recovery with OCR and NLP

Goal:

To digitize documents that are in poor condition and recover their text.

Objective:

To extract text from documents using OCR, then use Natural Language Processing (NLP) to fill in missing words and correct mistakes in the OCR output.

Method:

  • Segment the document layout, i.e. identify which parts of the document are headings, images, body text, etc.
  • Try out handwriting recognition.
  • Use OCR or handwriting recognition to extract the text from the image.
  • Fill in the missing details by using an NLP model that understands the surrounding context.
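
To make the last step more concrete, here is a minimal sketch of post-OCR recovery. It is not the model this project would actually use: a toy bigram count stands in for the NLP model, and fuzzy matching from Python's standard `difflib` stands in for OCR error correction. The reference corpus, the `___` gap marker, and all function names are assumptions made for illustration. The OCR call itself (for instance `pytesseract.image_to_string`) is left out so the sketch runs with the standard library alone.

```python
import difflib
from collections import Counter, defaultdict

# Hypothetical reference text; a real project would use a large domain corpus.
CORPUS = (
    "the old document was stored in the archive . "
    "the old letter was stored in the library . "
    "the document was scanned in the archive ."
)

# Assumed marker for a word the OCR step could not read at all.
GAP = "___"


def build_bigrams(tokens):
    """Count, for every word, which words follow it in the corpus."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following


def correct_token(token, vocabulary):
    """Snap an OCR token to the closest known word, if one is close enough."""
    if token in vocabulary:
        return token
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=0.7)
    return matches[0] if matches else token


def fill_gap(prev_word, next_word, following):
    """Guess a missing word from its left and right neighbours."""
    candidates = following.get(prev_word)
    if not candidates:
        return GAP  # no context available, leave the gap in place

    def score(word):
        # Prefer words that were also seen before next_word,
        # then fall back to how often they follow prev_word.
        right = following[word][next_word] if next_word else 0
        return (right, candidates[word])

    return max(candidates, key=score)


def recover(ocr_text, corpus=CORPUS):
    """Correct misread words, then fill gaps using bigram context."""
    corpus_tokens = corpus.split()
    vocabulary = sorted(set(corpus_tokens))
    following = build_bigrams(corpus_tokens)

    tokens = [
        t if t == GAP else correct_token(t.lower(), vocabulary)
        for t in ocr_text.split()
    ]
    for i, tok in enumerate(tokens):
        if tok == GAP:
            prev_word = tokens[i - 1] if i > 0 else None
            next_word = tokens[i + 1] if i + 1 < len(tokens) else None
            tokens[i] = fill_gap(prev_word, next_word, following)
    return " ".join(tokens)


print(recover("the old docurnent was ___ in the archlve"))
# the old document was stored in the archive
```

In this example the misread words `docurnent` and `archlve` are snapped to their nearest vocabulary entries, and the gap is filled with `stored` because it is the word most often seen between `was` and `in` in the reference text. A pretrained masked language model would be a more realistic choice for the gap-filling part; the bigram table just keeps the sketch small.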
