Identifying non-elliptical entity mentions in a coordinated NP with ellipses

Authors
Chae, Jeongmin; Jung, Younghee; Lee, Taemin; Jung, Soonyoung; Huh, Chan; Kim, Gilhan; Kim, Hyeoncheol; Oh, Heungbum
Issue Date
Feb-2014
Publisher
ACADEMIC PRESS INC ELSEVIER SCIENCE
Keywords
Ellipsis resolution; Named entity recognition; Text mining
Citation
JOURNAL OF BIOMEDICAL INFORMATICS, v.47, pp.139 - 152
Indexed
SCIE
SCOPUS
Journal Title
JOURNAL OF BIOMEDICAL INFORMATICS
Volume
47
Start Page
139
End Page
152
URI
https://scholar.korea.ac.kr/handle/2021.sw.korea/99425
DOI
10.1016/j.jbi.2013.10.002
ISSN
1532-0464
Abstract
Named entities in the biomedical domain are often written as a Noun Phrase (NP) containing a coordinating conjunction such as 'and' or 'or'. In addition, words repeated across the named entity mentions are frequently omitted. As a result, it is often difficult to identify the individual named entities. Although various Named Entity Recognition (NER) methods have tried to solve this problem, they can only deal with relatively simple elliptical patterns in coordinated NPs. We propose a new NER method for identifying non-elliptical entity mentions with simple or complex ellipses using linguistic rules and an entity mention dictionary. The GENIA and CRAFT corpora were used to evaluate the performance of the proposed system. The GENIA corpus was used to evaluate the performance of the system according to the quality of the dictionary. The GENIA corpus comprises 3434 non-elliptical entity mentions in 1585 coordinated NPs with ellipses. The system achieved 92.11% precision, 95.20% recall, and 93.63% F-score in identifying non-elliptical entity mentions in coordinated NPs. The accuracy of the system in resolving simple and complex ellipses was 94.54% and 91.95%, respectively. The CRAFT corpus was used to evaluate the performance of the system under realistic conditions. The system achieved 78.47% precision, 67.10% recall, and 72.34% F-score in coordinated NPs. The performance evaluations show that the system efficiently solves the problem caused by ellipses and improves NER performance. The algorithm is implemented in PHP and the code can be downloaded from https://code.google.com/p/medtextmining/. © 2013 Published by Elsevier Inc.
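The reported F-scores can be checked against the reported precision and recall. The sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_score` is hypothetical and is not taken from the authors' PHP code.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1 (beta = 1).
    Inputs and output are percentages.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (percent).
genia = f_score(92.11, 95.20)
craft = f_score(78.47, 67.10)

print(f"GENIA F-score: {genia:.2f}")  # prints "GENIA F-score: 93.63"
print(f"CRAFT F-score: {craft:.2f}")  # prints "CRAFT F-score: 72.34"
```

Both values match the figures in the abstract to two decimal places, which is consistent with the F1 assumption.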
Appears in Collections
Graduate School > Department of Computer Science and Engineering > 1. Journal Articles
