Automatic identification of Uyghur domain term in Web text
ZHONG Jun 1, TIAN Sheng-wei 2, YU Long 3
1. College of Information Science and Technology, Xinjiang University, Urumqi Xinjiang 830046, China
2. College of Software Design, Xinjiang University, Urumqi Xinjiang 830046, China
3. Network Center, Xinjiang University, Urumqi Xinjiang 830046, China
Abstract: Uyghur domain terms are difficult to obtain, and expanding a domain term set by hand is laborious and inefficient. This paper therefore uses a Conditional Random Field (CRF) to identify Uyghur domain terms in Web text, and then expands the identified terms using conjunction words and the Mutual Information (MI) between words, based on term co-occurrence. Experiments on the collected Web texts show that, for short Uyghur domain terms, the algorithm achieves a precision of 97.59% and a recall of 93.38%; for long Uyghur domain terms, it achieves a precision of 55.72%.
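The abstract does not give the MI formula used in the paper. As a minimal sketch, assuming the standard pointwise mutual information PMI(x, y) = log2(p(x, y) / (p(x) p(y))) over adjacent word pairs, the idea of scoring co-occurring words could look like the following. The function name `pmi_scores` and the placeholder tokens are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Score each adjacent word pair with pointwise mutual information.

    `sentences` is a list of token lists. This is an illustrative
    assumption about how MI between words might be computed, not the
    paper's actual procedure.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        # Adjacent pairs stand in for "co-occurrence" in this sketch.
        pair_counts.update(zip(tokens, tokens[1:]))

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (left, right), count in pair_counts.items():
        p_pair = count / total_pairs
        p_left = word_counts[left] / total_words
        p_right = word_counts[right] / total_words
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores


# Placeholder tokens; a real run would use segmented Uyghur Web text.
corpus = [["a", "b"], ["a", "b"], ["c", "d"], ["a", "d"]]
scores = pmi_scores(corpus)
```

Under this assumption, a pair that co-occurs more often than its individual word frequencies predict gets a higher score, so a candidate long term could be kept only when its component words score above a chosen threshold.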
ZHONG Jun, TIAN Sheng-wei, YU Long. Automatic identification of Uyghur domain term in Web text [J]. Journal of Computer Applications, 2012, 32(02): 407-410.