I want to start a personal project where I scan and OCR old books, then index the resulting markdown. The book in question covers ALL of Romania’s roads as of 1974. It has tables, maps, and all sorts of other interesting historical data points.
I already have some data engineering background: I’m a software engineer and I’ve built a project that helps with RAG, search and indexing of markdown files (even very big ones). My problem is the OCR part. Any tips?
Originally posted by u/alexlazar98 on Reddit.com/r/datahoarder