Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (9): 2566-2570. DOI: 10.11772/j.issn.1001-9081.2014.09.2566

• Artificial Intelligence •

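Real-time bidword triggering incorporating advertiser behavior

XIE Zhongqian1, CHANG Xiao2, JI Donghong1

  1. School of Computer, Wuhan University, Wuhan 430072, China;
  2. Baidu Online Network Technology (Beijing) Company Limited, Beijing 100085, China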
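  • Received: 2014-04-02  Revised: 2014-05-04  Online: 2014-09-01  Published: 2014-09-30
  • Corresponding author: XIE Zhongqian
  • About the authors:
    XIE Zhongqian (born 1989), male, from Heze, Shandong; master's student; CCF member; research interests: natural language processing, data mining;
    CHANG Xiao (born 1982), male, from Siping, Jilin; Ph.D. candidate; research interests: computational advertising, data mining;
    JI Donghong (born 1967), male, from Wuhan, Hubei; professor; CCF member; research interests: natural language processing.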
  • Supported by:

    Key Program of the National Natural Science Foundation of China

Real-time advertising trigger with advertiser behavioral analysis

XIE Zhongqian1,CHANG Xiao2,JI Donghong1   

  1. School of Computer, Wuhan University, Wuhan Hubei 430072, China
    2. Baidu Online Network Technology (Beijing) Company Limited, Beijing 100085, China
  • Received:2014-04-02 Revised:2014-05-04 Online:2014-09-01 Published:2014-09-30
  • Contact: XIE Zhongqian

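Abstract (translated from the Chinese):

When a search engine triggers advertisements, the relevance between auction words (Bidwords) and user queries (Queries) must be computed in real time, so dynamic Term weighting in the advertising context and the assessment of a phrase's commercial value become issues that the relevance computation has to address. To this end, advertiser behavior is introduced and combined with the Continuous Bag-Of-Words (CBOW) model to propose ADPCB, a phrase relevance computation method for the advertising context. First, the vector of each Term in a phrase is obtained with the CBOW model. Next, advertiser behavior is analyzed to build a global weighting tree over phrases, and the phrase structure is analyzed to obtain dynamic Term weights. Finally, the Term weights and vectors are linearly combined to produce the vector representation of the phrase, which is used to measure the relevance between Bidword and Query. In experiments using Word2vec on 10000 labeled Query-Bidword pairs (positive-to-negative ratio 1:1), ADPCB outperformed TF-IDF combined with the CBOW model; at an accuracy of 0.70, ADPCB achieved higher recall than Latent Dirichlet Allocation (LDA), BM25 and TF-IDF. The results show that ADPCB improves the relevance between triggered Bidwords and Queries, and that it can quantify the commercial-value attribute of the Terms in a phrase, reducing the number of advertisements triggered by Queries of low commercial value. The method is applicable to real-time computation scenarios.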

Abstract:

When a search engine triggers advertisements, the relevance between the auction word (Bidword) and the user's query (Query) must be calculated in real time. Dynamic Term weighting in the advertising context and the assessment of a phrase's commercial value must both be considered in this relevance calculation. To address these problems, a phrase relevance calculation approach named ADPCB was proposed, based on advertiser behavioral analysis and the Continuous Bag-Of-Words (CBOW) model. Firstly, the vector of each Term was obtained with CBOW. Secondly, advertiser behavior was analyzed to construct a global weighting tree over phrases, and the phrase structure was analyzed to obtain dynamic Term weights. Finally, the distributed representation of a phrase, produced as a linear combination of Term weights and Term vectors, was applied to measuring the relevance between Bidword and Query. Experiments were conducted with Word2vec on 10000 Query-Bidword pairs with editorial judgments (positive-to-negative ratio 1:1). ADPCB performed better than Term Frequency-Inverse Document Frequency (TF-IDF) combined with CBOW; when the accuracy was 0.70, ADPCB achieved higher recall than Latent Dirichlet Allocation (LDA), BM25 (Best Match 25) and TF-IDF. The experimental results and analysis show that ADPCB can recognize the commercial value of a phrase and thereby reduce the number of advertisements triggered by Queries of low commercial value, and that it can be used in real-time calculation scenarios.
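The final step described in the abstract, combining Term weights and Term vectors linearly into a phrase vector and then comparing the Bidword and Query vectors, might be sketched as follows. This is only an illustration under stated assumptions: the three-dimensional vectors and the weights are invented toy values (in the paper the vectors come from CBOW and the weights from the global weighting tree, neither of which is reproduced here), the function names are hypothetical, and cosine similarity is assumed as the relevance measure because the abstract does not name one.

```python
import math


def phrase_vector(terms, term_vectors, term_weights):
    """Return the weighted linear combination of the Term vectors of a phrase.

    Terms without a known vector are skipped; Terms without a known weight
    contribute with weight 0.0.
    """
    dim = len(next(iter(term_vectors.values())))
    combined = [0.0] * dim
    for term in terms:
        vector = term_vectors.get(term)
        if vector is None:
            continue
        weight = term_weights.get(term, 0.0)
        for i, value in enumerate(vector):
            combined[i] += weight * value
    return combined


def cosine_similarity(a, b):
    """Cosine similarity, assumed here as the relevance measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero phrase vector carries no signal
    return dot / (norm_a * norm_b)


# Toy data (assumption): stand-ins for CBOW vectors and tree-derived weights.
TERM_VECTORS = {
    "cheap": [0.1, 0.9, 0.0],
    "flights": [0.9, 0.1, 0.2],
    "tickets": [0.8, 0.2, 0.3],
    "airline": [0.85, 0.05, 0.25],
}
TERM_WEIGHTS = {"cheap": 0.2, "flights": 0.8, "tickets": 0.6, "airline": 0.7}

query_vec = phrase_vector(["cheap", "flights"], TERM_VECTORS, TERM_WEIGHTS)
bidword_vec = phrase_vector(["airline", "tickets"], TERM_VECTORS, TERM_WEIGHTS)
score = cosine_similarity(query_vec, bidword_vec)
print(f"relevance score: {score:.3f}")
```

Because each phrase is reduced to a single fixed-length vector, scoring a Query against a Bidword costs one dot product and two norms, which is consistent with the abstract's claim that the method suits real-time computation.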

CLC Number: