Journal of Computer Applications, 2015, Vol. 35, Issue (7): 1945-1949. DOI: 10.11772/j.issn.1001-9081.2015.07.1945

• Artificial Intelligence •

Entity recognition of clothing commodity attributes

ZHOU Xiang1, LI Shaobo1,2, YANG Guanci2

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610041, China;
  2. Key Laboratory of Advanced Manufacturing Technology, Ministry of Education (Guizhou University), Guiyang Guizhou 550003, China
  • Received: 2015-02-04 Revised: 2015-04-04 Online: 2015-07-10 Published: 2015-07-17
  • Corresponding author: ZHOU Xiang (born 1989), male, from Yiyang, Henan; master's student; main research interest: natural language processing; codant@163.com
  • About the authors: LI Shaobo (born 1973), male, from Yueyang, Hunan; professor, doctoral supervisor, Ph.D.; main research interests: computational intelligence and intelligent systems. YANG Guanci (born 1983), male, from Jiahe, Hunan; associate professor, Ph.D.; main research interests: computational intelligence and intelligent systems.
  • Supported by:

    National Key Technology R&D Program of China (2012BAF12B14); National Natural Science Foundation of China (51475097).

Abstract:

To recognize commodity attribute entities in the titles of clothing commodities, a hybrid method combining entity boundary detection rules with a Conditional Random Field (CRF) was proposed. First, hidden entity hint characters were mined with a statistical method. Second, three statistical word-formation indicators and their implications were explained at character granularity. Third, entity boundary detection rules were designed based on the statistical word-formation indicators and the hint characters. Finally, a method for determining the thresholds in the rules was given based on empirical risk minimization. In comparison experiments with a character-tagging CRF model, the overall precision, recall, and F1 score increased by 1.61%, 2.54%, and 2.08%, respectively, which verified the effectiveness of the entity boundary detection rules. The proposed method can be used in tasks such as e-commerce Information Retrieval (IR), e-commerce Information Extraction (IE), and query intent recognition.
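The threshold-selection step can be illustrated with a minimal sketch of empirical risk minimization under a 0-1 loss. The abstract does not specify the score, the loss, or the direction of the rule, so everything here is an assumption for illustration: the function name `erm_threshold` is hypothetical, the score is imagined as some cohesion measure between adjacent characters, and the rule "predict a boundary when the score is below the threshold" is one plausible choice, not necessarily the authors' design.

```python
from typing import Sequence, Tuple


def erm_threshold(scores: Sequence[float],
                  is_boundary: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, empirical_risk) minimizing the 0-1 loss of the
    hypothetical rule 'predict a boundary when score < threshold'."""
    if len(scores) != len(is_boundary) or not scores:
        raise ValueError("need equally sized, non-empty inputs")

    # Only thresholds at observed scores can change the predictions.
    candidates = sorted(set(scores))
    # One extra candidate above every score, so that 'all boundaries'
    # is also a reachable decision.
    candidates.append(candidates[-1] + 1.0)

    best_t, best_risk = candidates[0], float("inf")
    for t in candidates:
        # Count labelled examples the rule gets wrong at this threshold.
        errors = sum((s < t) != y for s, y in zip(scores, is_boundary))
        risk = errors / len(scores)
        if risk < best_risk:  # keep the smallest threshold on ties
            best_t, best_risk = t, risk
    return best_t, best_risk


# Toy labelled data (invented for illustration, not from the paper).
toy_scores = [0.2, 0.4, 1.5, 2.0, 0.9, 1.1]
toy_labels = [True, True, False, False, True, False]
threshold, risk = erm_threshold(toy_scores, toy_labels)
```

On this toy data the low-scoring positions are exactly the labelled boundaries, so a threshold that separates them perfectly exists. With real data the minimum empirical risk would generally be nonzero, and the chosen threshold would typically be checked on held-out titles.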

Key words: Named Entity Recognition (NER), clothing commodity, Conditional Random Field (CRF), e-commerce

CLC number: