Skip to the content.

Doc Transformers

Document processing using transformers. This is still in developmental phase, currently supports only extraction of form data i.e (key - value pairs)

pip install -q doc-transformers

Pre-requisites

Please install the following seperately

sudo apt install tesseract-ocr
pip install -q detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html

Implementation

# loads the pretrained dataset also 
from doc_transformers import form_parser

# loads the image
image = form_parser.load_image(input_path_image)

# gets the bounding boxes, predictions, extracted words and image processed
bbox, preds, words, image = form_parser.process_image(image)

# returns image and extracted key-value pairs along with title as the output
im, df = form_parser.visualize_image(bbox, preds, words, image)

# process and returns k-v pairs by concatenating relevant strings.
df_main = form_parser.process_form(df)

Results

Input & Output

Table

LABEL TEXT
title CREDIT CARD VOUCHER ANY RESTAURANT
title ANYWHERE
key DATE:
value 02/02/2014
key TIME:
value 11:11
key CARD
key TYPE:
value MC
key ACCT:
value XXXX XXXX XXXX
value 1111
key TRANS
key KEY:
value HYU8789798234
key AUTH
key CODE:
value 12345
key EXP
key DATE:
value XX/XX
key CHECK:
value 1111
key TABLE:
value 11/11
key SERVER:
value 34
value MONIKA
key Subtotal:
value $1969
value .69
key Gratuity: Total: