Doc Transformers

Document processing using transformers. This is still in developmental phase, currently supports only extraction of form data i.e (key - value pairs)

pip install -q doc-transformers

Pre-requisites

Please install the following seperately

sudo apt install tesseract-ocr
pip install -q detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html

Implementation

# loads the pretrained dataset also 
from doc_transformers import form_parser

# loads the image
image = form_parser.load_image(input_path_image)

# gets the bounding boxes, predictions, extracted words and image processed
bbox, preds, words, image = form_parser.process_image(image)

# returns image and extracted key-value pairs along with title as the output
im, df = form_parser.visualize_image(bbox, preds, words, image)

# process and returns k-v pairs by concatenating relevant strings.
df_main = form_parser.process_form(df)

Results

Input & Output

Table

After saving to csv the result looks like the following

LABEL	TEXT
title	CREDIT CARD VOUCHER ANY RESTAURANT
title	ANYWHERE
key	DATE:
value	02/02/2014
key	TIME:
value	11:11
key	CARD
key	TYPE:
value	MC
key	ACCT:
value	XXXX XXXX XXXX
value	1111
key	TRANS
key	KEY:
value	HYU8789798234
key	AUTH
key	CODE:
value	12345
key	EXP
key	DATE:
value	XX/XX
key	CHECK:
value	1111
key	TABLE:
value	11/11
key	SERVER:
value	34
value	MONIKA
key	Subtotal:
value	$1969
value	.69
key	Gratuity: Total:

Please note that this is still in development phase and will be improved in the near future