Journal of Chongqing University of Technology (Natural Science), 2024, Vol. 38, Issue (1): 150-159.

• Information and computer science •

Hybrid similarity metric for instrument quotation spreadsheet structure recognition

  

  • Online:2024-02-07 Published:2024-02-07

Abstract: For instrumentation companies, responding quickly and automatically to users’ requests for quotation, and thereby realizing unmanned quotation, is of great practical value. However, the bill of materials spreadsheets supplied by different users follow no unified, standardized format, so the quotation spreadsheets these companies receive are semi-structured and difficult for an unmanned quotation system to analyze. The key to building an unmanned quotation system is the accurate, automated extraction of meter parameters, which presupposes a correct understanding of the spreadsheet structure. With the goal of building such a system, this paper studies structure recognition for instrument quotation spreadsheets and proposes hybrid similarity metrics for table structure recognition (HSMTSR). Combining Levenshtein distance, the Dice coefficient, and cell type similarity (TySim), the approach identifies spreadsheet structure from similarities computed over cell and row data. In addition, a flowmeter spreadsheet dataset (FSDS), comprising 714 spreadsheets with 8,574 rows of data, is built for analyzing the structure of meter quotation spreadsheets. Practical application shows that the method accurately and efficiently identifies multiple complex structures in instrument quotation spreadsheets and achieves superior results on several evaluation metrics.
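To make the idea of a hybrid cell and row similarity concrete, the following is a minimal Python sketch of how the three measures named in the abstract might be combined. It is not the paper's formulation: the abstract does not give the definition of TySim, the weighting scheme, or the row comparison rule, so the three-way type classification in `cell_type`, the weights `(0.4, 0.4, 0.2)`, the character-bigram Dice variant, and the position-aligned row average are all assumptions made for illustration, and every function name is hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the standard two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_sim(a: str, b: str) -> float:
    """Edit distance normalised to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def dice_sim(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (an assumed variant)."""
    grams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    grams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:  # both strings are shorter than two characters
        return 1.0 if a == b else 0.0
    overlap = sum((grams_a & grams_b).values())
    return 2.0 * overlap / total


def cell_type(value: str) -> str:
    """Assumed coarse cell typing: empty, number, or text."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        float(text)
        return "number"
    except ValueError:
        return "text"


def type_sim(a: str, b: str) -> float:
    """Stand-in for TySim: 1.0 when the coarse types match, else 0.0."""
    return 1.0 if cell_type(a) == cell_type(b) else 0.0


def hybrid_cell_sim(a: str, b: str, weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted sum of the three measures; the weights are illustrative."""
    w_lev, w_dice, w_type = weights
    return (w_lev * levenshtein_sim(a, b)
            + w_dice * dice_sim(a, b)
            + w_type * type_sim(a, b))


def row_sim(row_a, row_b) -> float:
    """Mean hybrid similarity of position-aligned cells; missing cells are empty."""
    width = max(len(row_a), len(row_b))
    if width == 0:
        return 1.0
    padded_a = list(row_a) + [""] * (width - len(row_a))
    padded_b = list(row_b) + [""] * (width - len(row_b))
    return sum(hybrid_cell_sim(x, y) for x, y in zip(padded_a, padded_b)) / width


# Hypothetical rows: one header row and two data rows.
header = ["Tag", "Medium", "Flow rate"]
data_1 = ["FT-101", "Water", "50"]
data_2 = ["FT-102", "Steam", "80"]
print(row_sim(data_1, data_2), row_sim(header, data_1))
```

Under these assumptions, two data rows score higher against each other than a header row scores against a data row, which is the kind of signal a structure recognizer could use to separate header regions from data regions.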

CLC Number: 

  • TP391