A DATA-DRIVEN TEXT SIMILARITY MEASURE BASED ON CLASSIFICATION ALGORITHMS

Cho, Su Gon; Kim, Seoung Bum

Authors: Cho, Su Gon; Kim, Seoung Bum

Issue Date: 2017

Publisher: UNIV CINCINNATI INDUSTRIAL ENGINEERING

Keywords: classification; sentence-term matrix; text similarity measure; text mining

Citation: INTERNATIONAL JOURNAL OF INDUSTRIAL ENGINEERING-THEORY APPLICATIONS AND PRACTICE, v.24, no.3, pp.328 - 339

Indexed: SCIE; SCOPUS

Journal Title: INTERNATIONAL JOURNAL OF INDUSTRIAL ENGINEERING-THEORY APPLICATIONS AND PRACTICE

Volume: 24

Number: 3

Start Page: 328

End Page: 339

URI: https://scholar.korea.ac.kr/handle/2021.sw.korea/86287

ISSN: 1072-4761

Abstract: Measuring text similarity is fundamental to many text mining applications. This paper proposes a new method, based on classification algorithms, for measuring the similarity between two texts. Specifically, a sentence-term matrix that records the frequency of terms occurring in a collection of sentences is constructed and used to measure the classification accuracy between the two texts. The idea rests on the observation that similar texts are difficult to distinguish from each other, which should lead to low classification accuracy between them. In comparative experiments against several widely used text similarity measures on real data from the Machine Learning Repository at the University of California, Irvine, the proposed method outperformed the existing similarity measures across the entire range of term selection filters.
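The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier, the sentence splitter, or the validation scheme, so the multinomial naive Bayes model, the regex-based splitting, the leave-one-out evaluation, and every function name below are assumptions chosen only to keep the example self-contained.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on ., ! and ? (an assumption; the paper may differ).
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())


def sentence_term_matrix(sentences):
    # One row per sentence, one column per term, entries are term counts.
    vocab = sorted({t for s in sentences for t in tokenize(s)})
    index = {t: j for j, t in enumerate(vocab)}
    rows = []
    for s in sentences:
        row = [0] * len(vocab)
        for term, count in Counter(tokenize(s)).items():
            row[index[term]] = count
        rows.append(row)
    return rows, vocab


def nb_predict(train_rows, train_labels, row):
    # Multinomial naive Bayes with Laplace smoothing (hypothetical choice).
    n_terms = len(row)
    best_label, best_score = None, -math.inf
    for label in sorted(set(train_labels)):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        totals = [sum(col) for col in zip(*members)]
        denom = sum(totals) + n_terms
        score = math.log(len(members) / len(train_rows))
        score += sum(c * math.log((totals[j] + 1) / denom)
                     for j, c in enumerate(row) if c)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def classification_accuracy(text_a, text_b):
    # Label each sentence by its source text, then estimate how well a
    # classifier separates the two sources with leave-one-out validation.
    sents_a, sents_b = split_sentences(text_a), split_sentences(text_b)
    rows, _ = sentence_term_matrix(sents_a + sents_b)
    labels = [0] * len(sents_a) + [1] * len(sents_b)
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if nb_predict(train_rows, train_labels, rows[i]) == labels[i]:
            correct += 1
    return correct / len(rows)


cats = "The cat sat on the mat. The cat chased the mouse. A cat likes warm milk."
finance = ("Stock prices fell sharply today. Investors sold shares of bank stock. "
           "Market indexes closed lower today.")

print("cats vs finance:", classification_accuracy(cats, finance))
print("cats vs cats:   ", classification_accuracy(cats, cats))
```

Lower accuracy suggests the two texts are harder to tell apart, and therefore more similar. How accuracy is converted into a similarity score is left open here, since the abstract does not specify it. Note also that with leave-one-out validation, a held-out sentence whose exact copy remains in the other text tends to be misclassified, so near-duplicate texts can score below chance in this sketch.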




