Detailed Information

Supervised Paragraph Vector: Distributed Representations of Words, Documents and Class Labels

Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Park, Eunjeong L. | - |
| dc.contributor.author | Cho, Sungzoon | - |
| dc.contributor.author | Kang, Pilsung | - |
| dc.date.accessioned | 2021-09-01T22:46:46Z | - |
| dc.date.available | 2021-09-01T22:46:46Z | - |
| dc.date.created | 2021-06-19 | - |
| dc.date.issued | 2019 | - |
| dc.identifier.issn | 2169-3536 | - |
| dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/68939 | - |
| dc.description.abstract | While the traditional method of deriving representations for documents was bag-of-words, it suffered from high dimensionality and sparsity. Recently, many methods to obtain lower-dimensional and densely distributed representations have been proposed. Paragraph Vector is one such algorithm, which extends the word2vec algorithm by considering the paragraph as an additional word. However, it generates a single representation for all tasks, while different tasks may require different representations. In this paper, we propose Supervised Paragraph Vector, a task-specific variant of Paragraph Vector for situations where class labels exist. Essentially, Supervised Paragraph Vector uses class labels along with words and documents and obtains corresponding representations with respect to the particular classification task. To demonstrate the benefits of the proposed algorithm, three performance criteria are used: interpretability, discriminative power, and computational efficiency. To test interpretability, we find words that are close to and far from class vectors and demonstrate that such words are closely related to the corresponding class. We also use principal component analysis to visualize all words, documents, and class labels at the same time and show that our method effectively displays the related words and documents for each class label. To evaluate discriminative power and computational efficiency, we perform document classification on four commonly used datasets with various classifiers and achieve classification accuracies comparable to bag-of-words and Paragraph Vector. | - |
| dc.language | English | - |
| dc.language.iso | en | - |
| dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
| dc.subject | CLASSIFICATION | - |
| dc.title | Supervised Paragraph Vector: Distributed Representations of Words, Documents and Class Labels | - |
| dc.type | Article | - |
| dc.contributor.affiliatedAuthor | Kang, Pilsung | - |
| dc.identifier.doi | 10.1109/ACCESS.2019.2901933 | - |
| dc.identifier.scopusid | 2-s2.0-85063262447 | - |
| dc.identifier.wosid | 000461869900016 | - |
| dc.identifier.bibliographicCitation | IEEE ACCESS, v.7, pp.29051 - 29064 | - |
| dc.relation.isPartOf | IEEE ACCESS | - |
| dc.citation.title | IEEE ACCESS | - |
| dc.citation.volume | 7 | - |
| dc.citation.startPage | 29051 | - |
| dc.citation.endPage | 29064 | - |
| dc.type.rims | ART | - |
| dc.type.docType | Article | - |
| dc.description.journalClass | 1 | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalResearchArea | Engineering | - |
| dc.relation.journalResearchArea | Telecommunications | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
| dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
| dc.relation.journalWebOfScienceCategory | Telecommunications | - |
| dc.subject.keywordPlus | CLASSIFICATION | - |
| dc.subject.keywordAuthor | Class label | - |
| dc.subject.keywordAuthor | distributed representations | - |
| dc.subject.keywordAuthor | representation learning | - |
| dc.subject.keywordAuthor | document embedding | - |
| dc.subject.keywordAuthor | word embedding | - |
Appears in Collections
College of Engineering > School of Industrial and Management Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kang, Pil sung
College of Engineering (School of Industrial and Management Engineering)