Personal title and career attributes extraction based on distant supervision and pattern matching
YU Dong, LIU Chunhua, TIAN Yue
Journal of Computer Applications
2016, 36 (2): 455-459.
DOI: 10.11772/j.issn.1001-9081.2016.02.0455
Focusing on the extraction of title and career attributes for a specific person from unstructured text, a method based on distant supervision and pattern matching was proposed. Features of personal attributes were described from two aspects: string patterns and dependency patterns. Title and career attributes were extracted in two stages. First, both distant supervision and human-annotated knowledge were used to build a high-coverage pattern base to discover and extract a candidate attribute set. Then, the literal connections among multiple attributes and the dependency relations between the specific person and the candidate attributes were used to design a filtering rule set. Tests on the CLP-2014 PAE shared task show that the F-score of the proposed method reaches 55.37%, which is significantly higher than the best result of the evaluation (F-measure 34.38%), and that it also outperforms a supervised Conditional Random Field (CRF) sequence tagging method, which achieves an F-measure of 43.79%. The experimental results show that, with a filtering process, the proposed method can mine and extract title and career attributes from unstructured documents with a high coverage rate.
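The two-stage idea (high-coverage pattern extraction followed by rule-based filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: the two string patterns, the function names `extract_candidates` and `filter_candidates`, the example sentence, and the "keep the longest of literally nested candidates" rule are all hypothetical, chosen only to show the shape of the pipeline. The dependency-pattern features and dependency-based filtering described in the abstract would require a syntactic parser and are omitted here.

```python
import re

# Hypothetical string patterns (assumption, not from the paper).
# "{p}" is replaced by the target person's name; group 1 is the candidate attribute.
PATTERNS = [
    r"{p}, (?:the |an? )?([a-z ]+?) (?:of|at) ",
    r"{p} (?:serves|served|works|worked) as (?:the |an? )?([a-z ]+?)(?: of| at|[.,])",
]

def extract_candidates(text, person):
    """Stage 1: apply every pattern and collect all candidate attributes."""
    candidates = set()
    for pattern in PATTERNS:
        regex = re.compile(pattern.format(p=re.escape(person)))
        for match in regex.finditer(text):
            candidates.add(match.group(1).strip())
    return candidates

def filter_candidates(candidates):
    """Stage 2 (hypothetical rule): drop a candidate that is literally
    contained in another, longer candidate, keeping the longer one."""
    return sorted(
        c for c in candidates
        if not any(c != other and c in other for other in candidates)
    )

# Invented example text, for illustration only.
TEXT = (
    "Li Ming, professor of physics at Tsinghua University, gave a talk. "
    "Li Ming served as associate professor at Peking University."
)

if __name__ == "__main__":
    found = extract_candidates(TEXT, "Li Ming")
    print(filter_candidates(found))
```

In this sketch the first stage deliberately over-generates and the second stage prunes, which mirrors the division of labor the abstract describes; a real filtering rule set would be considerably richer than the single containment rule used here.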