Automatic Stop Word Generation for Mining Software Artifact Using Topic Model with Pointwise Mutual Information

Full metadata record
DC Field: Value
dc.contributor.author: Lee, Jung-Been
dc.contributor.author: Lee, Taek
dc.contributor.author: In, Hoh Peter
dc.date.accessioned: 2021-09-01T07:45:51Z
dc.date.available: 2021-09-01T07:45:51Z
dc.date.created: 2021-06-18
dc.date.issued: 2019-09
dc.identifier.issn: 1745-1361
dc.identifier.uri: https://scholar.korea.ac.kr/handle/2021.sw.korea/63057
dc.description.abstract: Mining software artifacts is a useful way to understand the source code of software projects. Topic modeling in particular has been widely used to discover meaningful information from software artifacts. However, software artifacts are unstructured and contain a mix of textual types within the natural text. These software artifact characteristics worsen the performance of topic modeling. Among several natural language preprocessing tasks, removing stop words to reduce meaningless and uninteresting terms is an efficient way to improve the quality of topic models. Although many approaches are used to generate effective stop words, the lists are outdated or too general to apply to mining software artifacts. In addition, the performance of the topic model is sensitive to the datasets used in the training for each approach. To resolve these problems, we propose an automatic stop word generation approach for topic models of software artifacts. By measuring topic coherence among words in the topic using Pointwise Mutual Information (PMI), we added words with a low PMI score to our stop words list for every topic modeling loop. Through our experiment, we proved that our stop words list results in a higher performance of the topic model than lists from other approaches.
dc.language: English
dc.language.iso: en
dc.publisher: IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
dc.subject: LATENT DIRICHLET ALLOCATION
dc.title: Automatic Stop Word Generation for Mining Software Artifact Using Topic Model with Pointwise Mutual Information
dc.type: Article
dc.contributor.affiliatedAuthor: In, Hoh Peter
dc.identifier.doi: 10.1587/transinf.2018EDP7390
dc.identifier.scopusid: 2-s2.0-85071942456
dc.identifier.wosid: 000484013400022
dc.identifier.bibliographicCitation: IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, v.E102D, no.9, pp.1761 - 1772
dc.relation.isPartOf: IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
dc.citation.title: IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
dc.citation.volume: E102D
dc.citation.number: 9
dc.citation.startPage: 1761
dc.citation.endPage: 1772
dc.type.rims: ART
dc.type.docType: Article
dc.description.journalClass: 1
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Computer Science
dc.relation.journalWebOfScienceCategory: Computer Science, Information Systems
dc.relation.journalWebOfScienceCategory: Computer Science, Software Engineering
dc.subject.keywordPlus: LATENT DIRICHLET ALLOCATION
dc.subject.keywordAuthor: text mining
dc.subject.keywordAuthor: software artifact
dc.subject.keywordAuthor: stop words
dc.subject.keywordAuthor: topic modeling
dc.subject.keywordAuthor: Pointwise Mutual Information (PMI)
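
The abstract describes scoring the words of a topic with Pointwise Mutual Information and moving low-scoring words onto a stop word list. The record does not give the exact formulation, so the following is only a minimal sketch under stated assumptions, not the authors' implementation: PMI is estimated from document-level co-occurrence, each word's score is its average PMI against the other words of the same topic, and the names `pmi_scores`, `low_pmi_words`, `threshold`, and `eps` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(topic_words, documents, eps=1e-12):
    """Average PMI of each topic word against the other topic words.

    Assumption: probabilities are estimated as the fraction of documents
    containing a word (or both words of a pair). `eps` only guards log(0).
    """
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each topic word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in documents:
        tokens = set(doc)
        present = [w for w in topic_words if w in tokens]
        word_df.update(present)
        pair_df.update(frozenset(p) for p in combinations(present, 2))

    scores = {}
    for w in topic_words:
        total = 0.0
        for other in topic_words:
            if other == w:
                continue
            p_joint = pair_df[frozenset((w, other))] / n_docs
            p_w = word_df[w] / n_docs
            p_o = word_df[other] / n_docs
            total += math.log((p_joint + eps) / (p_w * p_o + eps))
        scores[w] = total / (len(topic_words) - 1)
    return scores


def low_pmi_words(topic_words, documents, threshold=0.0):
    """Topic words whose average PMI falls below a (hypothetical) threshold."""
    scores = pmi_scores(topic_words, documents)
    return [w for w in topic_words if scores[w] < threshold]


# Toy corpus: each document is a list of tokens.
docs = [
    ["parser", "token", "fix"],
    ["parser", "token"],
    ["socket", "fix"],
    ["socket", "timeout"],
]
topic = ["parser", "token", "fix"]
stop_word_candidates = low_pmi_words(topic, docs, threshold=0.1)
```

In a loop like the one the abstract mentions, the candidates would be added to the stop word list and the topic model retrained on the filtered corpus; the stopping rule and the threshold are left open here because the record does not state them.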
Files in This Item
There are no files associated with this item.
Appears in Collections
Graduate School > Department of Computer Science and Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.