unstatusthequo
Yes, this is great. I looked into stitching these together but always figured it would be a huge undertaking. Consider looking at TensorFlow for OCR, which should be much better and maybe faster.
namanyayg
nice work on this! i've been looking for something like this to manage my own docs.

one thing that caught my eye was the mention of 'proper stemming support' - can you elaborate on how you're handling stemming? are you using a specific library or rolling your own implementation? also, have you considered adding any sort of faceting/search filtering to the results?

compressedgas
Do the search results have document page numbers?