Journal of Chongqing University of Technology (Natural Science) ›› 2024, Vol. 38 ›› Issue (1): 150-159.

• Information and Computer Science •

Structure recognition of instrument quotation spreadsheets using hybrid similarity metrics

Xu Chuanyun, Ma Yingli, Li Gang, Shu Tao, Li Xingguang

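  1. Liangjiang College of Artificial Intelligence, Chongqing University of Technology; College of Computer and Information Science, Chongqing Normal University
  • Publication date: 2024-02-07 Release date: 2024-02-07
  • About the authors: Xu Chuanyun, male, Ph.D., professor, works mainly on machine learning and its applications, Email: xcy@cqnu.edu.cn; corresponding author Li Gang, male, Ph.D., professor, works mainly on intelligent information processing and machine vision, Email: ligang@cqut.edu.cn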

Hybrid similarity metric for instrument quotation spreadsheet structure recognition

  • Online:2024-02-07 Published:2024-02-07

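Abstract (Chinese): For instrument manufacturers, responding to customers' quotation requests quickly, efficiently, and automatically, that is, unmanned quotation, is of great importance. However, the bills of materials supplied by different customers follow no uniform format, so manufacturers receive only semi-structured quotation spreadsheets that an unmanned quotation system struggles to analyze and understand. The key to building such a system is accurate automatic extraction of instrument parameters, and that extraction presupposes a correct understanding of the table structure. With the goal of building an unmanned quotation system, this paper therefore studies structure recognition for instrument quotation spreadsheets and proposes hybrid similarity metrics for table structure recognition (HSMTSR). The method combines Levenshtein distance, the Dice coefficient, and cell type similarity (TySim), and parses and recognizes the table structure from the similarity of cells and of row data. A flowmeter spreadsheet dataset (FSDS), comprising 714 spreadsheets and 8,574 rows of data, is also built to study the structure of instrument quotation spreadsheets. Practical application shows that the method accurately and efficiently recognizes instrument quotation spreadsheets with a variety of complex structures and achieves good results on several evaluation metrics.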

Keywords: spreadsheet, structure recognition, similarity metric, type similarity, instrument quotation

Abstract: For instrumentation companies, responding quickly and efficiently to users' requests for quotation without human intervention, i.e. unmanned quotation, is of great significance. However, the bill-of-materials spreadsheets provided by different users follow no unified, standardized format, so instrumentation companies receive only semi-structured quotation spreadsheets, which are difficult for an unmanned quotation system to analyze and understand. The key to building an unmanned quotation system is the accurate, automatic extraction of instrument parameters, which presupposes a correct understanding of the spreadsheet structure. Therefore, with the goal of building an unmanned quotation system, this paper studies the structure recognition of instrument quotation spreadsheets and proposes hybrid similarity metrics for table structure recognition (HSMTSR). Combining Levenshtein distance, the Dice coefficient, and cell type similarity (TySim), the approach parses and identifies spreadsheet structure from the similarity of cells and of row data. In addition, a flowmeter spreadsheet dataset (FSDS), comprising 714 spreadsheets with 8,574 rows of data, is built to analyze the structure of instrument quotation spreadsheets. Practical application shows that the method accurately and efficiently identifies instrument quotation spreadsheets with a variety of complex structures and achieves good results on several evaluation metrics.
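
To make the idea of a hybrid similarity concrete, the following is a minimal Python sketch of how the three measures named in the abstract could be combined for cells and rows. It is an illustration under stated assumptions, not the HSMTSR method itself: the abstract does not give the paper's formulas, so the bigram-based Dice coefficient, the three cell types (`empty`, `number`, `text`), the equal weights, and all function names here are hypothetical choices.

```python
from typing import Sequence, Set, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the standard two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_sim(a: str, b: str) -> float:
    """Distance normalised into [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def _bigrams(s: str) -> Set[str]:
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_sim(a: str, b: str) -> float:
    """Dice coefficient over character-bigram sets (an assumed tokenisation)."""
    x, y = _bigrams(a), _bigrams(b)
    if not x and not y:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def cell_type(value: str) -> str:
    """Hypothetical three-way cell typing: empty, number, or text."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        float(text)
        return "number"
    except ValueError:
        return "text"


def type_sim(a: str, b: str) -> float:
    """Stand-in for TySim: 1.0 when both cells share a type, else 0.0."""
    return 1.0 if cell_type(a) == cell_type(b) else 0.0


def hybrid_cell_sim(a: str, b: str,
                    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted sum of the three measures; equal weights are an assumption."""
    w_lev, w_dice, w_type = weights
    return (w_lev * levenshtein_sim(a, b)
            + w_dice * dice_sim(a, b)
            + w_type * type_sim(a, b))


def row_sim(row_a: Sequence[str], row_b: Sequence[str]) -> float:
    """Mean hybrid similarity of column-aligned cells in two rows."""
    pairs = list(zip(row_a, row_b))
    if not pairs:
        return 0.0
    return sum(hybrid_cell_sim(a, b) for a, b in pairs) / len(pairs)


header = ["Item", "Name", "Diameter", "Qty"]
data_1 = ["1", "Flowmeter", "DN50", "2"]
data_2 = ["2", "Flowmeter", "DN80", "4"]
print(row_sim(data_1, data_2), row_sim(header, data_1))
```

In this toy example two data rows score higher against each other than a header row scores against a data row, which is the kind of signal a structure recognizer could use to separate header regions from data regions. How the paper actually weights the measures and turns similarities into structure labels is not stated in the abstract.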

CLC number:

  • TP391