

Semi-supervised bibliographic element segmentation with latent permutations


File: LNCS7008_60.pdf (Adobe PDF, 480.11 kB)

Title: Semi-supervised bibliographic element segmentation with latent permutations
Authors: Masada, Tomonari / Takasu, Atsuhiro / Shibata, Yuichiro / Oguri, Kiyoshi
Issue Date: 2011
Publisher: Springer-Verlag
Citation: Lecture Notes in Computer Science, vol. 7008, pp. 60–69, 2011
Abstract: This paper proposes a semi-supervised method for bibliographic element segmentation. Our input data is a large-scale set of bibliographic references, each given as an unsegmented sequence of word tokens. The problem is to segment each reference into bibliographic elements, e.g., authors, title, journal, and pages. We solve this problem with an LDA-like topic model by assigning each word token to a topic so that word tokens assigned to the same topic refer to the same bibliographic element. Topic assignments should satisfy the contiguity constraint, i.e., the constraint that word tokens assigned to the same topic must be contiguous. In our preceding work [8], we therefore proposed a topic model based on the one devised by Chen et al. [3]. This model extends LDA and realizes unsupervised topic assignments that satisfy the contiguity constraint. The main contribution of this paper is a semi-supervised learning method for this model. We assume that at most one third of the word tokens are already labeled, and that a few percent of these labels may be incorrect. The experiment showed that our semi-supervised learning improved on unsupervised learning by a large margin and achieved a segmentation accuracy of over 90%.
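The contiguity constraint described in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' model or inference code. The helper names (`is_contiguous`, `assignment_from_permutation`, `token_accuracy`) are hypothetical. Topics are labeled with element names purely for readability, whereas in the model they are latent indices. The token-level accuracy is an assumed stand-in for the paper's segmentation accuracy measure. Roughly following the latent-permutation idea of Chen et al. [3], an assignment is built by laying out topics in permutation order, each with a token count, which satisfies contiguity by construction.

```python
from typing import Dict, Hashable, List, Sequence

_START = object()  # sentinel meaning "no topic seen yet"


def is_contiguous(assignment: Sequence[Hashable]) -> bool:
    """Return True if each topic occupies one unbroken run of tokens."""
    finished = set()  # topics whose run has already ended
    prev = _START
    for topic in assignment:
        if topic == prev:
            continue  # still inside the current run
        if topic in finished:
            return False  # topic reappears after its run ended
        if prev is not _START:
            finished.add(prev)
        prev = topic
    return True


def assignment_from_permutation(
    permutation: Sequence[Hashable], counts: Dict[Hashable, int]
) -> List[Hashable]:
    """Lay out topics in permutation order, each repeated counts[topic] times.

    Each topic is emitted as a single block, so the result is contiguous
    whenever the permutation contains no repeated topic.
    """
    assignment: List[Hashable] = []
    for topic in permutation:
        assignment.extend([topic] * counts.get(topic, 0))
    return assignment


def token_accuracy(predicted: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Fraction of token positions whose predicted label matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("sequences must have equal length")
    if not gold:
        return 1.0
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


if __name__ == "__main__":
    perm = ["author", "title", "journal", "pages"]
    counts = {"author": 3, "title": 5, "journal": 2, "pages": 1}
    predicted = assignment_from_permutation(perm, counts)
    gold = ["author"] * 3 + ["title"] * 4 + ["journal"] * 3 + ["pages"]
    print(is_contiguous(predicted))                      # True
    print(is_contiguous(["author", "title", "author"]))  # False
    print(round(token_accuracy(predicted, gold), 3))     # 0.909
```

In this toy example the predicted layout places one gold journal token inside the title block, so 10 of 11 tokens are correct. The sketch does not model how partially labeled, possibly noisy tokens enter inference; that is the subject of the paper itself.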
Description: 13th International Conference on Asia-Pacific Digital Libraries (ICADL 2011), Beijing, 24–27 October 2011
URI: http://hdl.handle.net/10069/26677
ISSN: 0302-9743
Rights: © 2011 Springer-Verlag. / The original publication is available at www.springerlink.com
Type: Conference Paper
Text Version: author
Appears in Collections: Conference Paper



