Journal of Computer Applications, 2015, Vol. 35, Issue 7: 1945-1949. DOI: 10.11772/j.issn.1001-9081.2015.07.1945

Entity recognition of clothing commodity attributes

ZHOU Xiang1, LI Shaobo1,2, YANG Guanci2   

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610041, China;
  2. Key Laboratory of Advanced Manufacturing Technology, Ministry of Education (Guizhou University), Guiyang Guizhou 550003, China
  • Received: 2015-02-04 Revised: 2015-04-04 Online: 2015-07-10 Published: 2015-07-17

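  • Corresponding author: ZHOU Xiang (born 1989), male, from Yiyang, Henan, M.S. candidate; main research interest: natural language processing. codant@163.com
  • About the authors: LI Shaobo (born 1973), male, from Yueyang, Hunan, professor, doctoral supervisor, Ph.D.; main research interests: computational intelligence, intelligent systems. YANG Guanci (born 1983), male, from Jiahe, Hunan, associate professor, Ph.D.; main research interests: computational intelligence, intelligent systems.
  • Supported by:

    National Key Technology R&D Program of China (2012BAF12B14); National Natural Science Foundation of China (51475097).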

Abstract:

To recognize commodity attribute entities in clothing commodity titles, a hybrid method combining Conditional Random Field (CRF) with entity boundary detection rules was proposed. Firstly, hidden entity hint character information was mined with a statistical method; secondly, three statistical word-formation indicators and their implications were interpreted at character granularity; thirdly, entity boundary detection rules were designed based on the entity hint characters and the statistical word-formation indicators; finally, a method for determining the threshold values in the rules was given based on empirical risk minimization. In comparison experiments with a character-tagging CRF model, the overall precision, recall and F1 score increased by 1.61%, 2.54% and 2.08% respectively, which validated the effectiveness of the entity boundary detection rules. The proposed method can be used in tasks such as e-commerce Information Retrieval (IR), e-commerce Information Extraction (IE) and query intention identification.

Key words: Named Entity Recognition (NER), clothing commodity, Conditional Random Field (CRF), e-commerce
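
As an illustration of the last step in the abstract, the following is a minimal sketch of choosing a rule threshold by empirical risk minimization. It is not the authors' implementation: the rule form (`score >= threshold`), the 0-1 loss, the function names and the toy data are all assumptions made for this example, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def empirical_risk(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Fraction of samples misclassified by the hypothetical rule `score >= threshold`."""
    errors = sum((s >= threshold) != bool(y) for s, y in zip(scores, labels))
    return errors / len(scores)


def pick_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, risk) minimizing the empirical 0-1 risk.

    Candidates are the observed scores plus +inf (the rule that never fires).
    Ties are resolved in favor of the smallest candidate.
    """
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    best = min(candidates, key=lambda t: (empirical_risk(scores, labels, t), t))
    return best, empirical_risk(scores, labels, best)


# Toy, made-up data: an indicator score per candidate boundary and
# a 0/1 label saying whether it is a true entity boundary.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
threshold, risk = pick_threshold(scores, labels)
print(threshold, risk)
```

Restricting the candidates to observed scores is sufficient here because the empirical risk only changes when the threshold crosses a sample value.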
