Optical Character Recognition with PyTesseract
In week five of "Ten Weeks of Plugins", a series dedicated to building FiftyOne plugins, we cover Optical Character Recognition (OCR) and Keyword Search. The PyTesseract OCR plugin uses the Tesseract OCR engine to recognize text in the samples of a dataset, while the Keyword Search plugin lets users search within the labels that the OCR plugin generates. Together, the two plugins make it possible to search documents such as pages of old books, handwritten notes, or resumes by their textual content.
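As a rough illustration of the idea, and not the plugins' actual code, the sketch below pairs a word-level OCR pass with a case-insensitive keyword lookup. The names `TextBlock`, `ocr_blocks`, and `keyword_search` are hypothetical, as is the choice to store OCR output as a dictionary mapping sample ids to lists of blocks. The OCR step assumes `pytesseract`, Pillow, and the Tesseract binary are installed.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """One recognized word and its pixel box (hypothetical container)."""
    text: str
    box: tuple  # (left, top, width, height)


def ocr_blocks(image_path):
    """Run Tesseract on one image and return its non-empty word blocks."""
    # Imported lazily so keyword_search stays usable without Tesseract.
    import pytesseract
    from PIL import Image
    from pytesseract import Output

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=Output.DICT
    )
    blocks = []
    for text, left, top, width, height in zip(
        data["text"], data["left"], data["top"], data["width"], data["height"]
    ):
        if text.strip():  # Tesseract emits empty entries for layout rows
            blocks.append(TextBlock(text, (left, top, width, height)))
    return blocks


def keyword_search(samples, keyword):
    """Return ids of samples with a block containing keyword (case-insensitive).

    `samples` maps a sample id to its list of TextBlock objects.
    """
    needle = keyword.lower()
    return [
        sample_id
        for sample_id, blocks in samples.items()
        if any(needle in block.text.lower() for block in blocks)
    ]
```

In practice one might build `samples` by calling `ocr_blocks` once per image path and then query it with something like `keyword_search(samples, "resume")`.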
Company
Voxel51
Date published
Sept. 21, 2023
Author(s)
Jacob Marks
Word count
2148
Language
English
Hacker News points
None found.