MIDS Capstone Project Fall 2023

Second Sight

Our project was motivated by the strong need for more affordable solutions for those living with visual impairment. In the US alone, 20 million people are affected by visual impairments that make it difficult to read and process text, and 27% of them live below the poverty level. While text-reading tools exist on the market, many come with high price tags.

Our application - Second Sight - aims to provide a mobile text-reading experience for those with vision impairment who are seeking more affordable care. We have designed an application that combines modern advances in optical character recognition and text-to-speech to deliver an app that is safe and free for all to use.

Our app targets users with presbyopia, a condition that makes it hard for middle-aged and older adults to see things up close. Users can download our app from the Apple App Store, activate it with voice-enabled prompts, and use the simple interface to take photos of text. The text is then read aloud to them. If there is any uncertainty in the readout, a warning recommends that the user exercise caution.
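As an illustration of that warning step, here is a minimal sketch of confidence-threshold logic, assuming the OCR decoder exposes per-token probabilities; the function name and the 0.85 threshold are hypothetical, not taken from the project:

    def should_warn(token_confidences: list[float], threshold: float = 0.85) -> bool:
        """Flag an OCR readout as uncertain.

        token_confidences: per-token probabilities from the OCR decoder
        (a hypothetical interface). The 0.85 threshold is illustrative.
        """
        if not token_confidences:
            return True  # no decoded output is the most uncertain case
        average = sum(token_confidences) / len(token_confidences)
        return average < threshold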

Our project leverages TrOCR, a pre-trained transformer that is highly regarded in the OCR space. We tested both the typed (printed) and handwritten versions of the TrOCR model, along with fine-tuned variants, on typed and handwritten datasets. The typed-text model worked well out of the box, achieving 93% accuracy on our test dataset. However, accurately identifying handwriting is a more difficult task. The handwritten TrOCR model needed to be fine-tuned and then run through a spell-check pipeline to achieve 72% accuracy. We evaluated our models using Character Error Rate (CER), letter accuracy, and word accuracy.
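For reference, here is a minimal sketch of running the pre-trained printed-text TrOCR checkpoint with the Hugging Face transformers library, along with a small CER helper; the image filename is a placeholder, and this illustrates the general approach rather than our exact pipeline:

    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # Load the pre-trained printed-text checkpoint; the handwritten variant
    # ("microsoft/trocr-base-handwritten") is swapped in for handwriting.
    processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
    model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

    # "receipt.jpg" is a placeholder path for a cropped line of text.
    image = Image.open("receipt.jpg").convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

    def cer(reference: str, hypothesis: str) -> float:
        """Character Error Rate: Levenshtein edit distance / reference length."""
        m, n = len(reference), len(hypothesis)
        row = list(range(n + 1))
        for i in range(1, m + 1):
            prev, row[0] = row[0], i
            for j in range(1, n + 1):
                cur = row[j]
                row[j] = min(row[j] + 1,      # deletion
                             row[j - 1] + 1,  # insertion
                             prev + (reference[i - 1] != hypothesis[j - 1]))  # substitution
                prev = cur
        return row[n] / max(m, 1)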

Throughout our project we learned that creating a single model to handle both typed and handwritten text would be a challenge, and a full solution may require two separate models that are applied seamlessly depending on whether the text appears to be typed or handwritten (a sketch of this routing appears below). We also realized that our commitment to keeping all processing on the phone, so that no internet connection is required and the app can remain free, limits how large, and therefore how complex, our models can be. Finally, we found that while pre-trained transformers were a great starting point and performed well out of the box on typed text, they needed to be fine-tuned for our use cases. Finding good datasets for fine-tuning was difficult; in the end, we created our own dataset, which was arduous but improved performance.
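A sketch of that two-model routing, assuming a lightweight handwritten-vs-typed classifier is available; the classifier and both model handles here are illustrative placeholders, not our shipped code:

    def read_text(image, classify_script, printed_ocr, handwritten_ocr):
        """Route an image to the printed or handwritten OCR model.

        classify_script is assumed to return "handwritten" or "printed";
        all three callables are hypothetical stand-ins.
        """
        if classify_script(image) == "handwritten":
            return handwritten_ocr(image)
        return printed_ocr(image)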

As we move beyond the MVP stage, we know we need to keep improving the generalizability of our model. We plan to continue end-to-end testing, ensuring our models work in challenging edge cases such as curved text and currency detection. We'd also like to add non-English languages to the model and explore ways to reduce its size so we have greater flexibility in how we deploy it. As for the application, we want to launch to a broader community of users to do more testing and collect more user feedback.

At Second Sight, we are on a mission to bring autonomy and safety to those struggling with visual impairment by providing accessible technology solutions that assist with the recognition and understanding of text. By developing a product that is accessible to all, we aim to democratize text recognition technologies and aid people worldwide who live with vision impairment or weakened vision.

Last updated:

December 12, 2023