Why are faxes so hard for OCR systems to read?

Posted on: January 31, 2020

Summary: Optical Character Recognition (OCR) systems are typically excellent at reading structured, machine-printed forms. These forms have fixed field locations and are typically printed by a computer, which makes for crisp characters in predictable positions. Reading faxes has been, and continues to be, difficult for most OCR document capture products in the industry. This increases the volume of forms that must be keyed in manually through exception processes, slowing the work and increasing costs. This article looks at why faxes are a challenge for OCR and at some approaches to solving the problem so that faxes can be turned into data more effectively.

Is Fax Still a Problem?

Millions of faxes are sent and received each day across the healthcare, insurance and financial services industries. While it may be tough for Millennials to believe, especially given the use of the internet and mobile devices, fax remains a common form of document sharing. I started a project in 2019 to determine the volume and cost of faxing in the U.S. After some work on this I determined that no solid stats exist, but the volume is measured in billions of faxes per year. Costs seemed to settle around $1-2 per sent fax and $5-15 per incoming fax. Clearly billions of dollars are spent on fax. Most of that cost is manual labor: creating and sending the fax on the sender’s side, and receipt, data entry and manual processing on the recipient’s side.

Can You Automate Faxes?

Many users of fax systems think that most faxing is automated. That’s partly true. Fax servers emerged in the 1990s to make sending a fax simpler. While traditional fax machines still exist and are still used for sending, the volume has shifted to fax servers. In my unscientific survey of faxing in 2019, the best information I could find was that over 60% of outbound faxes are sent via server. That’s great for the sender, but not so great for the recipient. Recipients have the real challenge, because automation technologies like Optical Character Recognition (OCR) have struggled to “read” faxes and turn the fax image into data.

OCR looks at the image of the faxed document and attempts to turn the image into data. Here is a simple example. A fax of an invoice comes in. At the top of the document is an image of the vendor name “Acme”. OCR examines that group of pixels and interprets it as the characters A, C, M and E. OCR then tells the system the vendor name is Acme, as if someone had entered the data via keyboard.
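
To make the “group of pixels in, characters out” idea concrete, here is a minimal sketch of template matching, one of the simplest ways to recognize characters. It is a toy and not how any commercial OCR engine works: the 3x5 font, the fixed character spacing, and the names TEMPLATES, render and recognize are all assumptions made up for this illustration.

```python
# Toy 3x5 glyph templates ('#' = ink, '.' = paper).
# Hypothetical font, invented for this illustration only.
TEMPLATES = {
    "A": [".#.", "#.#", "###", "#.#", "#.#"],
    "C": ["###", "#..", "#..", "#..", "###"],
    "M": ["#.#", "###", "###", "#.#", "#.#"],
    "E": ["###", "#..", "###", "#..", "###"],
}
GLYPH_W, GAP = 3, 1  # assumed fixed glyph width and spacing


def render(text):
    """Draw text as a bitmap (list of row strings), one blank column between glyphs."""
    return [("." * GAP).join(TEMPLATES[ch][r] for ch in text) for r in range(5)]


def split_glyphs(rows):
    """Cut a one-line bitmap into fixed-width character cells."""
    step = GLYPH_W + GAP
    count = (len(rows[0]) + GAP) // step
    return [[row[i * step: i * step + GLYPH_W] for row in rows] for i in range(count)]


def match_score(glyph, template):
    """Number of pixels on which the glyph and the template agree."""
    return sum(g == t for g_row, t_row in zip(glyph, template) for g, t in zip(g_row, t_row))


def recognize(rows):
    """Label each cell with the best-matching template character."""
    return "".join(
        max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))
        for glyph in split_glyphs(rows)
    )


image = render("ACME")          # stands in for the pixels at the top of the invoice
vendor_name = recognize(image)  # the data a person would otherwise have typed
print(vendor_name)              # prints ACME
```

Because each cell is scored by how many pixels agree, a single stray pixel usually does not change the answer, but the more pixels a fax corrupts, the closer the wrong templates get to the right one.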

The Challenges with Faxes

There are a number of challenges here, most of them with the image of the incoming fax. If you have ever seen a fax in the wild, you have probably experienced a hard-to-read one. The general rule has been that if you are having trouble reading it, OCR software will have an even harder time. Incoming fax images have one or more of the following problems:

  1. Quality: Faxes are often sent at a lower resolution than a document you might print on a printer. Resolution is typically measured in Dots Per Inch (DPI). The standard setting on most fax machines is roughly 200 x 100 DPI (about 204 dots per inch across the page by 98 lines per inch down it). This results in a low quality document. Higher quality fax modes are available, but are often not used.
  2. Noise: Faxes are susceptible to noise. Noise consists of specks, dots, lines and other visual markings on the page that were not intended by the sender but come through to the receiver. There are many causes for this, and it can make it really difficult for OCR to do its job.
  3. Distortion: Fax images can be stretched or shrunk in the faxing process. They can also be distorted in a variety of other ways, making them difficult for humans or machines to read.
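
The three problems above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a production cleanup pipeline: the page is a tiny list of 0/1 pixels, standard fax mode is taken at its nominal resolution of about 204 x 98 DPI, 300 DPI is an assumed comparison point for a printed or scanned page, and the function names are invented for this example.

```python
def text_height_px(point_size, dpi_y):
    """Quality: pixel rows available for text of a given point size (72 points = 1 inch)."""
    return point_size / 72 * dpi_y


def despeckle(img):
    """Noise: clear ink pixels that have no inked neighbour in the 8 surrounding cells."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] == 1:
                block = sum(
                    img[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                )
                if block == 1:  # only the pixel itself is inked
                    out[y][x] = 0
    return out


def square_pixels(img, dpi_x=204, dpi_y=98):
    """Distortion: repeat each row so vertical resolution roughly matches horizontal."""
    factor = round(dpi_x / dpi_y)  # 2 for the assumed standard-mode values
    return [row[:] for row in img for _ in range(factor)]


# Quality: 10-point text gets roughly 14 pixel rows on a standard-mode fax,
# versus roughly 42 at the assumed 300 DPI.
fax_rows = text_height_px(10, 98)
print_rows = text_height_px(10, 300)

# Noise and distortion on a tiny made-up page (1 = ink, 0 = paper).
page = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],  # isolated speck
    [0, 0, 0, 0],
    [0, 0, 1, 1],  # part of a real stroke
]
cleaned = despeckle(page)          # speck removed, stroke kept
restored = square_pixels(cleaned)  # 4 rows become 8
```

Row doubling only addresses the squashed look that standard-mode faxes get when their pixels are treated as square. Skew and uneven stretching need other corrections that this sketch does not attempt.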

The Solution: OCR + AI

With all these challenges, how do you read a low quality, noisy and distorted fax? The answer is that OCR has been given a significant boost over the last several years through the use of Artificial Intelligence. In modern OCR systems, AI can improve the fax image by running a number of image processing algorithms with different settings and finding the combination that makes the image most readable for OCR. We are finding that AI can dramatically improve the ability of OCR to read faxes.
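
The idea of trying combinations of settings and keeping the best one can be sketched as a small search. To be clear about what this is: a plain exhaustive grid search over two made-up settings (a binarization threshold and a despeckle switch), not the AI technique used in any particular product. The scoring function here compares against a known clean reference purely so the example is self-contained; a real system would not have that reference and would more likely score each candidate by the OCR engine’s own confidence. All names are assumptions for this illustration.

```python
from itertools import product


def binarize(gray, threshold):
    """Turn 0-255 grayscale into 1 (ink) / 0 (paper); darker than threshold is ink."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def despeckle(img):
    """Clear ink pixels that have no inked neighbour in the 8 surrounding cells."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            block = sum(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
            if img[y][x] == 1 and block == 1:
                out[y][x] = 0
    return out


def preprocess(gray, threshold, remove_specks):
    img = binarize(gray, threshold)
    return despeckle(img) if remove_specks else img


def best_settings(gray, score_fn, thresholds=(64, 128, 192), speck_options=(False, True)):
    """Try every combination of settings and return the highest-scoring one."""
    candidates = product(thresholds, speck_options)
    return max(candidates, key=lambda s: score_fn(preprocess(gray, *s)))


def agreement_with(reference):
    """Stand-in scorer: count pixels that match a known clean image."""
    def score(img):
        return sum(a == b for r1, r2 in zip(img, reference) for a, b in zip(r1, r2))
    return score


# Made-up fax patch: dark ink (30), one faded ink pixel (100),
# a grey smudge next to the ink (160) and an isolated dark speck (40).
gray = [
    [230, 230, 230, 230, 230],
    [230,  30,  30, 160, 230],
    [230,  30, 100, 230, 230],
    [230, 230, 230, 230,  40],
]
clean = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
chosen = best_settings(gray, agreement_with(clean))
print(chosen)  # prints (128, True)
```

On this patch a low threshold loses the faded ink, a high one turns the smudge into ink, and skipping the despeckle step leaves the speck, so the middle threshold with despeckling wins. Real faxes have far more settings to tune, which is where a smarter search than this brute-force loop becomes useful.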

Interested in learning more? We have an upcoming webinar on the topic.

Webinar – Turning Faxes into Data Using AI and OCR – An Overview

On Wednesday, March 4, 2020 at 1:00pm EDT, BRYJ will be hosting a webinar on turning faxes into data with AI. This 40-minute webinar is oriented to organizations that are looking to reduce costs, improve customer experience and speed up key processes by turning faxes and other documents into data with AI and OCR. The webinar is intended for both business and technical users, and will provide an understanding of what AI and OCR are and how to apply the technologies in 2020.

You can Learn More and Register Here: https://www.bryjinc.com/bryj-events/