DocMatcher: Document Image Dewarping via Structural and Textual Line Matching

by Felix Hertlein, Alexander Naumann, York Sure-Vetter.
Venue    Notes   

Abstract

Despite the ongoing efforts to digitalize the world, there is still a vast amount of paper documents that need to be processed. Document image dewarping is a crucial step in the digitization process, as it aims to remove the distortions induced by challenging environment settings and document sheet deformations often encountered when using smartphone cameras for image capture. With better dewarping results, subsequent document analysis tasks, such as text recognition, information extraction, and classification, can be performed more accurately. Recently, deep learning-based methods were combined with knowledge about the expected document structure, also known as a template, at inference time to improve the dewarping results. While this approach has shown promising results, its utilization of the template information can be further improved. Our contributions in this work are threefold: (1) we propose a novel document image dewarping approach that leverages the prior knowledge about the document structure effectively by detecting and matching lines from the warped and the template domain, and (2) we introduce a novel evaluation metric called matched normalized character error rate (mnCER) to overcome the limitations of existing metrics in evaluating the dewarping process. (3) Finally, we evaluate our approach on the Inv3DReal dataset and show that our approach outperforms the state-of-the-art methods in terms of visual and text-based metrics. Our approach improves upon the state-of-the-art methods by 32.6% in Local Distortion and 40.2% in mnCER. Our code and models are available at https://github.com/FelixHertlein/doc-matcher.