Google is Using OCR to Expand its Reach

Written by: Christine Buske on Monday, November 3rd, 2008
Posted to: Google

Google has grown from being just a search engine into a developer of products and applications. Because Google is still a search engine at heart and loves indexing things, it is not surprising that it would sooner or later index scanned documents. If you upload a scanned document to the Internet, Google will now not only find it but also read its contents using Optical Character Recognition (OCR) technology.

Google says the technology allows it to “convert a picture (of a thousand words) into a thousand words — words that can be searched and indexed, so that these valuable documents are more easily found.”

In the same blog post, Google states that:

This is a small but important step forward in our mission of making all the world’s information accessible and useful.

Being able to ‘read’ a scanned document has proven to be far more difficult than it seems. Documents are often creased, smudged, and stained. Beyond that, Google explains there are added complications, such as determining whether a circle is a zero or the letter “O.” The technology isn’t exactly new, though. For example, Evernote already reads and files your scanned notes.
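To make the zero-versus-“O” problem concrete, here is a minimal Python sketch of one way an OCR post-processing step could guess the right character from its neighbors. It is only an illustration, not Google's method: the function names `resolve_token` and `resolve_text` are hypothetical, and the majority-vote rule is an assumption made for this example.

```python
# Illustrative heuristic only; not how Google's OCR works.
# Characters an OCR engine can easily confuse with one another.
AMBIGUOUS = set("O0o")


def resolve_token(token: str) -> str:
    """Guess whether ambiguous circles in one token are zeros or letter O's."""
    # Look only at the characters we are sure about.
    others = [c for c in token if c not in AMBIGUOUS]
    digits = sum(c.isdigit() for c in others)
    letters = sum(c.isalpha() for c in others)

    if digits > letters:
        # Mostly digits around it: treat every circle as a zero.
        return "".join("0" if c in "Oo" else c for c in token)
    if letters > digits:
        # Mostly letters around it: treat zeros as the letter O,
        # matching the case of the surrounding letters.
        all_upper = all(c.isupper() for c in others if c.isalpha())
        letter_o = "O" if all_upper else "o"
        return "".join(letter_o if c == "0" else c for c in token)
    # No clear majority: leave the token untouched.
    return token


def resolve_text(text: str) -> str:
    """Apply the heuristic to each whitespace-separated token."""
    return " ".join(resolve_token(t) for t in text.split())


if __name__ == "__main__":
    print(resolve_text("N0vember 3rd, 2OO8"))
```

A real system would likely lean on dictionaries, language models, and the shape of the glyph itself. Even this toy rule shows why context matters: a circle standing alone gives the heuristic nothing to go on, so it is left unchanged.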

Google had already worked on indexing PDF files, but scanned documents used to pose more of a challenge, until now. To see how well it works, consider the following example:

Here’s how a human would normally see a scanned document:

But since the text in the scan is actually one big image, search engines usually see nothing at all, like this:

But with Google’s new technology they see and index this:

Andy alludes to the huge potential behind this seemingly small step for Google: imagine if Google developed a service for companies to scan and archive their important documents.

What do you think Google will use this technology for? Just for the search engine, or do you believe they will find a way to capitalize on it?
