Comparing LDA with pLSI as a Dimensionality Reduction Method in Document Clustering

File: LNCS4938_13.pdf
Size: 232.25 kB
Format: Adobe PDF

Title: Comparing LDA with pLSI as a Dimensionality Reduction Method in Document Clustering
Authors: Masada, Tomonari / Kiyasu, Senya / Miyahara, Sueharu
Issue Date: Mar-2008
Publisher: Springer
Citation: Lecture Notes in Computer Science, 4938, pp.13-26; 2008
Abstract: In this paper, we compare latent Dirichlet allocation (LDA) with probabilistic latent semantic indexing (pLSI) as a dimensionality reduction method and investigate their effectiveness in document clustering using real-world document sets. To cluster documents, we use a method based on a multinomial mixture, which is known as an efficient framework for text mining. Clustering results are evaluated by the F-measure, i.e., the harmonic mean of precision and recall. We use Japanese and Korean Web articles for evaluation and regard the category assigned to each Web article as the ground truth for evaluating clustering results. Our experiment shows that dimensionality reduction via LDA and pLSI yields document clusters of almost the same quality as those obtained from the original feature vectors. Therefore, we can reduce the vector dimension without degrading cluster quality. Further, both LDA and pLSI are more effective than random projection, the baseline method in our experiment. However, our experiment shows no meaningful difference between LDA and pLSI. This result suggests that LDA does not replace pLSI, at least for dimensionality reduction in document clustering.
Description: The original publication is available at www.springerlink.com / Large-scale Knowledge Resources: Construction and Application - Third International Conference on Large-scale Knowledge Resources, LKR 2008, Tokyo, Japan, March 3-5, 2008, Proceedings
URI: http://hdl.handle.net/10069/16305
ISBN: 978-3-540-78158-5
ISSN: 0302-9743
DOI: 10.1007/978-3-540-78159-2_2
Type: Journal Article
Text Version: author
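
The abstract states that clustering results are scored by the F-measure, the harmonic mean of precision and recall, with each article's assigned category as ground truth. It does not say how precision and recall are computed for a clustering, so the following is only a minimal sketch under one common assumption: both are counted over pairs of documents. The function name `pairwise_f_measure` and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_f_measure(true_labels, cluster_ids):
    """Pairwise F-measure of a clustering against ground-truth categories.

    Assumption (not from the paper): precision and recall are counted over
    document pairs. A pair is a true positive when both documents share a
    category and were also placed in the same cluster.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_category = true_labels[i] == true_labels[j]
        same_cluster = cluster_ids[i] == cluster_ids[j]
        if same_cluster and same_category:
            tp += 1
        elif same_cluster:
            fp += 1  # clustered together, but categories differ
        elif same_category:
            fn += 1  # same category, but split across clusters

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example: four articles, two categories, one article misplaced.
categories = ["sports", "sports", "politics", "politics"]
clusters = [0, 0, 0, 1]
score = pairwise_f_measure(categories, clusters)
```

With these toy labels the pair counts are 1 true positive, 2 false positives, and 1 false negative, giving precision 1/3, recall 1/2, and an F-measure of 0.4. A per-cluster or per-category definition of precision and recall would give different numbers, so the paper itself should be consulted for the exact protocol.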
Appears in Collections: Articles in academic journal
