Document Recovery with OCR and NLP
Goal:
To digitize and recover text from documents that are in poor physical condition.
Objective:
To extract text from documents using OCR, then use Natural Language Processing (NLP) to fill in missing words and correct mistakes in the OCR output.
Method:
- Perform document layout segmentation, i.e. identify which parts of the document are headings, images, body text, etc.
- Try out handwriting recognition.
- Use OCR or handwriting recognition to extract the text from the image.
- Fill in the missing details by using an NLP model that infers the missing text from the surrounding context.
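
The last step can be illustrated with a small sketch. It is not the project's actual model: it assumes the OCR stage has already produced a list of tokens, that unreadable words are marked with a hypothetical "<gap>" placeholder, and that a tiny hand-written corpus stands in for a real language model. The names CORPUS, GAP, fix_token, fill_gap and recover, and the 0.6 similarity cutoff, are all illustrative choices. Misread words are snapped to the closest known word with Python's difflib, and gaps are filled with the word that most often appears between the two neighbouring words in the corpus.

```python
import difflib
from collections import Counter, defaultdict

# Toy reference corpus (an assumption for illustration only; a real system
# would use a large corpus or a pretrained language model instead).
CORPUS = (
    "the old document was stored in the archive . "
    "the old letter was stored in the library . "
    "the document was scanned in the archive ."
).split()
VOCAB = sorted(set(CORPUS))

# Hypothetical marker that the OCR stage emits for an unreadable word.
GAP = "<gap>"

# Bigram counts: following[w][x] = how often x directly follows w.
following = defaultdict(Counter)
for prev, nxt in zip(CORPUS, CORPUS[1:]):
    following[prev][nxt] += 1


def fix_token(token):
    """Replace a misread token with the closest vocabulary word, if any."""
    if token == GAP or token in VOCAB:
        return token
    matches = difflib.get_close_matches(token, VOCAB, n=1, cutoff=0.6)
    return matches[0] if matches else token


def fill_gap(prev_word, next_word):
    """Pick the word seen most often between prev_word and next_word."""
    best, best_score = GAP, 0
    for cand, count_after_prev in following.get(prev_word, {}).items():
        if next_word is None:
            count_before_next = 1  # gap at the end: use left context only
        else:
            count_before_next = following.get(cand, Counter())[next_word]
        score = count_after_prev * count_before_next
        if score > best_score:
            best, best_score = cand, score
    return best  # stays GAP when the context gives no evidence


def recover(tokens):
    """Correct OCR mistakes first, then fill gaps from the context."""
    fixed = [fix_token(t) for t in tokens]
    out = []
    for i, tok in enumerate(fixed):
        if tok == GAP:
            prev_word = out[-1] if out else None
            next_word = fixed[i + 1] if i + 1 < len(fixed) else None
            tok = fill_gap(prev_word, next_word)
        out.append(tok)
    return out


if __name__ == "__main__":
    ocr_tokens = ["the", "o1d", "docurnent", "was", GAP, "in", "the", "archive"]
    print(" ".join(recover(ocr_tokens)))
```

On this toy input, "o1d" and "docurnent" are corrected to "old" and "document", and the gap is filled with "stored". A gap is left untouched when the corpus offers no word that fits both neighbours, which seems safer for document recovery than inventing text.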