
# IReadIt

A bot to automatically transcribe memes and other text-as-graphics posted to social media into text.

## Why this matters

People with visual impairments are able to read digital text by using 'screen reader' software, which scans the text written to the computer screen and reads it aloud. But screen reader software cannot cope with pictures of text, because the digital text isn't there.

If no-one posted 'memes', or screen-shots, or photographs of text, to social media, people with visual impairments could take part in the conversation exactly as well as everyone else. In this sense, the Internet, while liberating for all of us, is especially liberating for people with visual impairments, because it's a domain where their impairment could and should be minimised.

But people persist in posting 'memes', screen-shots, photographs of passages in books they are reading, emojis, and other images without explanatory text, and so this potential area of liberation becomes yet another area of frustration and humiliation.

The IReadIt bot is an attempt to mitigate this. The idea is, if there's a tweet with a picture in it, and you think that picture has text in it (if you have a serious visual impairment, there's no way you can know for certain), you can respond to the tweet with a tweet saying

> @IReadItBot, please read this for me

Or, actually, just

> @IReadItBot

and the bot will attempt to read it. It's a lot less than perfect, so the main takeaway is:

You wouldn't deliberately park in a disabled bay; you wouldn't deliberately block a wheelchair ramp. Don't post text-as-graphics to the Internet without a full transcription: it's lazy, selfish, disrespectful, boorish and unkind. Just don't do it, ever, OK?

## Specification

The specification is being developed on the wiki. Feel free to participate.

## Technology and limitations

Basically the bot has three main components:

  1. The Tesseract optical character recognition library, without which this would not be possible at all;
  2. A web service wrapper, written in Clojure, which uses Tess4J to invoke Tesseract on images posted to it (a minimal sketch of this kind of call appears after this list);
  3. A Twitter bot, also written in Clojure, which listens for tweets mentioning it and responds with a transcription, generated by Tesseract, of the text in the image on the tweet being replied to, if any.
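To make the second component concrete, here is a minimal sketch, assuming the Tess4J library is on the classpath and an English `tessdata` set is installed in Tesseract's default location, of invoking Tesseract through Tess4J from Clojure. It is an illustration, not the project's actual source.

```clojure
;; Minimal sketch of calling Tesseract via Tess4J (illustrative only).
(ns ireadit.ocr-sketch
  (:import [net.sourceforge.tess4j Tesseract]))

(defn ocr-image
  "Return the text Tesseract recognises in the image file at `path`.
   Assumes English trained data; call .setDatapath if tessdata is
   installed in a non-default location."
  [path]
  (let [tess (doto (Tesseract.)
               (.setLanguage "eng"))]
    (.doOCR tess (java.io.File. path))))

;; (ocr-image "/tmp/meme.png")
;; => "whatever text Tesseract found in the image\n"
```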

## It's not perfect!

### Things which can be improved

  1. Currently, the bot scans only the first image on a tweet (issue).
  2. Currently, the bot posts only the first 240 characters read from the image (issue); a sketch of splitting a long transcription into a thread follows this list.
  3. Currently, the bot makes no attempt to guess what language the text is likely to be in, but assumes English (issue).
  4. Currently, the bot is quite fragile, and often crashes (issue).
  5. It's possible that the text could be improved by filtering it through spelling and grammar checkers (issue).
  6. It currently only works for Twitter. Getting it working for other social networks could be done, but it's not something I'm currently planning.

All these should be fixed, but they will each take time and I thought it would be better to get something working soon than wait for perfection.
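On the second point, splitting a long transcription into a thread is mostly a matter of breaking the text into tweet-sized chunks at word boundaries. Here is a minimal, hypothetical sketch of that idea (the function name and the 240-character limit are illustrative, not the bot's actual code):

```clojure
(ns ireadit.thread-sketch
  (:require [clojure.string :as str]))

(defn split-into-tweets
  "Split `text` into chunks of at most `limit` characters, breaking on
   whitespace; any single word longer than `limit` is emitted as its own
   over-long chunk."
  ([text] (split-into-tweets text 240))
  ([text limit]
   (loop [words   (str/split (str/trim text) #"\s+")
          current ""
          chunks  []]
     (cond
       (empty? words)
       (if (str/blank? current) chunks (conj chunks current))

       (str/blank? current)
       (recur (rest words) (first words) chunks)

       (<= (+ (count current) 1 (count (first words))) limit)
       (recur (rest words) (str current " " (first words)) chunks)

       :else
       (recur words "" (conj chunks current))))))

;; (split-into-tweets "a very long transcription ..." 240)
;; => ["first chunk of up to 240 characters ..." "second chunk ..."]
```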

### Limitations of Tesseract

Tesseract does a pretty good job of scanning text, but optical character recognition is inherently quite a hard problem.

  1. It doesn't cope at all well with multi-column text, or text laid out in separate areas, for example boxouts.
  2. It doesn't cope well with text on busy backgrounds, which, for example, many memes have.

These are limitations which I can't really address; they're outside the scope of this project.

### Limitations of Tess4J

The Tess4J interface layer often crashes hard, blowing away the JVM and consequently the whole bot. This could be addressed by running Tesseract in a separate process and communicating with it over a port. Using J Taylor's Go implementation of a Tesseract web service might improve things, because in my testing it has seemed much more reliable, but some configuration issues would need to be sorted out first.
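One way to get that isolation without changing OCR engine is simply to run Tesseract in a child process. The sketch below illustrates the separate-process idea (it is not anything the bot currently does) by shelling out to the `tesseract` command-line tool, which must be installed on the host; a native crash in the child then cannot bring down the JVM.

```clojure
(ns ireadit.ocr-process-sketch
  (:require [clojure.java.shell :refer [sh]]))

(defn ocr-via-cli
  "Run the `tesseract` CLI on the image at `path` in a child process,
   returning the recognised text, or nil if the process fails.
   Passing `stdout` as the output base name makes tesseract write its
   result to standard output."
  [path]
  (let [{:keys [exit out]} (sh "tesseract" path "stdout" "-l" "eng")]
    (when (zero? exit) out)))
```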

Alternatively, someone could put some work into making Tess4J more stable. The problem seems to be in Tess4J rather than in Tesseract itself, so work here might benefit not only this project but others.

## Things which cost

Running a web service isn't free. Optical character recognition is quite compute intensive; if this service becomes popular, it will cost real money to run, in server rental; it will also need someone to look after it, and there's a limit to what I can do in 'spare' time.

So if this turns out to be something which is genuinely useful to a lot of people, then it will need either to be sponsored or crowdfunded.

## How you can help

I'll be very happy to accept pull requests, particularly if they address issues which are already raised on the issue tracker.

## Licence and copyright

Copyright © 2019 Simon Brooke. Licensed under the GNU General Public License, version 2.0 or (at your option) any later version. If you wish to incorporate parts of this project into another open source project which uses a less restrictive license, please contact me; I'm open to dual licensing it.