October, 2022

The goal of CeADAR (University College Dublin) in the InterQ project is to digitalise the input generated by factory operators during production. As Ireland’s national centre for Applied AI, CeADAR specialises in providing companies with innovative solutions in AI, Machine Learning and Data Analytics. CeADAR has recently been designated the EU AI Digital Innovation Hub in Ireland.

Until now, the companies collaborating in InterQ have collected handwritten data on production quality and stored it physically, without exploiting the information. For production quality management, these annotated observations are a valuable source of data for detecting wear, since product measurements may drift out of tolerance.

CeADAR's technology solution relies on Computer Vision processing techniques and Deep Learning classification to digitalise these data. Computer Vision handles identifying the documents, associating them with their templates, and segmenting the text. The individual characters are then recognised by a residual neural network (ResNet) classifier trained on an extended MNIST dataset containing the digits 0 to 9 and the 26 capital letters of the Latin alphabet. This type of neural network has shown high classification accuracy on digits as well as on other datasets, such as clothing images.
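To illustrate what "residual" means here, the following is a minimal sketch of a single residual block in plain Python. It is not CeADAR's model: real ResNets use convolutional layers and normalisation, and the function names and fully connected weights below are assumptions chosen only to show the skip connection, where the block's input is added back to its transformed output.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in vector]


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Toy residual block: relu(x + W2 * relu(W1 * x)).

    The skip connection is the `x +` term: the block only has to learn
    a correction to its input rather than a full transformation.
    Assumes w1 and w2 are square matrices matching len(x).
    """
    hidden = relu(matvec(w1, x))
    transformed = matvec(w2, hidden)
    return relu([xi + ti for xi, ti in zip(x, transformed)])


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With all-zero weights the block passes its input straight through.
    print(residual_block([1.0, 2.0], zeros, zeros))
```

With all-zero weights the block behaves as an identity mapping, which is the property that makes very deep residual networks easier to train.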

This InterQ task solution will be deployed as a WebApp hosted by CeADAR, where interested companies can submit new templates for either fully automatic or supervised processing. The WebApp creates a PDF from the submitted Excel file, which can be printed in the desired format and is also stored on the server together with all the information related to this process. The PDF will contain a QR code that identifies the original document once a filled-in copy is scanned and submitted to the same WebApp, either as an image or as a PDF.

Once the scanned document is identified, the position of each cell is computed, and the cell contents are extracted and sent to the ResNet classifier, which computes a character and the probability of correct recognition. This result is then used to fill an XLSX copy of the original template, which the WebApp returns with all the recognised data transcribed, along with visual indicators of the reliability of each character prediction, for convenient supervision of the digitalised document.
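The step from classifier output to a character, a probability and a visual indicator could look roughly like the sketch below. It assumes the classifier emits one raw score per class for the 36 characters (digits 0 to 9 followed by A to Z) and that the probability is obtained with a softmax. The class ordering, the function names, the colour labels and the 0.95 / 0.70 thresholds are all hypothetical and not taken from the InterQ implementation.

```python
import math
import string

# Assumed class order: ten digits followed by the 26 capital letters.
CLASSES = string.digits + string.ascii_uppercase


def softmax(scores):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def decode(scores):
    """Return the most likely character and its probability."""
    if len(scores) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} scores, got {len(scores)}")
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASSES[best], probabilities[best]


def reliability(probability, high=0.95, low=0.70):
    """Map a probability to a hypothetical traffic-light indicator."""
    if probability >= high:
        return "green"
    if probability >= low:
        return "amber"
    return "red"


if __name__ == "__main__":
    scores = [0.0] * len(CLASSES)
    scores[10] = 10.0  # strong evidence for the class at index 10
    character, probability = decode(scores)
    print(character, round(probability, 4), reliability(probability))
```

A reviewer would then only need to check the cells flagged "amber" or "red", which is the kind of supervised workflow the reliability indicators are meant to support.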

Check out the following video, in which the technology is tested in the Wind Energy use case: