Open World Classification of Printed Invoices

  • Enrico Sorio, Alberto Bartoli, Giorgio Davanzo, Eric Medvet
  • 10th ACM Symposium on Document Engineering (DocEng), 2010, Manchester (United Kingdom)


A key step in understanding printed documents is classifying them according to the nature of the information they contain and their layout. In this work we consider a dynamic scenario in which document classes are not known a priori and new classes can appear at any time. This open-world setting is both realistic and highly challenging. We use an SVM-based classifier that relies only on image-level features, together with a nearest-neighbor approach for detecting new classes. We assess our proposal on a real-world dataset of 562 invoices belonging to 68 different classes. These documents were digitized after being handled in a corporate environment, so they are quite noisy: for example, they carry large stamps and handwritten signatures in unfortunate positions. The experimental results are highly promising.
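The new-class detection step can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors, the Euclidean distance, the threshold value, and the function names are all assumptions made for illustration, and the nearest neighbor's label stands in for the SVM classifier that the abstract describes. The idea shown is only that a document whose nearest known neighbor is too far away is flagged as belonging to a new class.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(features, training_set):
    """Return (distance, label) of the closest labeled example."""
    return min((euclidean(features, f), label) for f, label in training_set)

def classify_open_world(features, training_set, threshold):
    """Return a known class label, or None when the document looks like a new class.

    The threshold is a hypothetical parameter. In the known-class branch a real
    system would call the trained SVM; here the neighbor's label is a stand-in.
    """
    distance, label = nearest_neighbor(features, training_set)
    if distance > threshold:
        return None  # too far from every known document: treat as a new class
    return label

# Toy image-level features (made up for this example) with hypothetical class names.
training_set = [
    ((0.0, 0.0), "vendor_a"),
    ((0.1, 0.0), "vendor_a"),
    ((5.0, 5.0), "vendor_b"),
]

known = classify_open_world((0.05, 0.1), training_set, threshold=1.0)
unknown = classify_open_world((20.0, 20.0), training_set, threshold=1.0)
```

With this toy data, the first query lies close to the `vendor_a` examples and receives that label, while the second is far from every known document and is reported as a new class (`None`). In practice the threshold would have to be tuned on held-out data.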
