1. 
有川, 節夫 ; Arikawa, Setsuo


2. 
Arimura, Hiroki; 有村, 博紀; Abe, Junichiro ... [et al.]
This paper describes applications of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over phrases, called proximity phrase association patterns, and consider the problem of finding the patterns that optimize a given statistical measure within the whole class of patterns in a large collection of unstructured texts. For this class of patterns, we develop fast and robust text mining algorithms based on techniques from computational geometry and string matching. Finally, we apply the developed text mining algorithms in experiments on interactive document browsing in a large text database and on keyword discovery from Web bases.
3. 
Minami, Toshiro; 南, 俊朗; Kurita, Hidekazu ... [et al.]
This paper proposes a new approach to solving the data-input bottleneck for library catalog data, or metadata. These data have traditionally been provided on paper cards arranged in wooden boxes. Much effort has gone into digitizing the cards to make the data machine-readable. Despite this effort, only a small amount of data has been digitized so far because the input is done manually. We address this problem by using catalog card images digitized by high-speed scanners. This approach has several advantages: (1) electronic catalog data can be handled with remarkably reduced time and cost; (2) it enables seamless integration of image-based and keyword-based searches; and (3) it speeds up the input of the catalog data itself.
4. 
Asai, Tatsuya; 浅井, 達哉; Arimura, Hiroki ... [et al.]
In this paper, we study an online data mining problem for streams of semistructured data such as XML data. Modeling semistructured data and patterns as labeled ordered trees, we present an online algorithm, StreamT, that receives fragments of unseen, possibly infinite semistructured data in document order through a data stream and can return the current set of frequent patterns immediately on request at any time. A crucial part of our algorithm is the incremental maintenance of the occurrences of possibly frequent patterns using a tree sweeping technique. We also give modifications of the algorithm for other online mining models. We present theoretical and empirical analyses to evaluate the performance of the algorithm.
5. 
Arikawa, Setsuo ; 有川, 節夫


6. 
Arikawa, Setsuo ; 有川, 節夫


7. 
Arikawa, Setsuo ; 有川, 節夫


8. 
Baba, Kensuke; Shinohara, Ayumi; Takeda, Masayuki ... [et al.]
Atallah et al. [2] introduced a randomized algorithm for string matching with mismatches, which uses the fast Fourier transform (FFT) to compute convolutions. It estimates the score vector of matches between a text string and a pattern string, that is, the vector obtained when the pattern is slid along the text and the number of matches is counted at each position. This paper simplifies the algorithm and gives an exact analysis of the variance of the estimator.
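To make the notion of a score vector concrete, the following is a minimal sketch that computes it exactly by direct counting. It is not the randomized FFT-based estimator of the paper; it only illustrates the quantity that estimator approximates. The function names `score_vector` and `score_vector_by_symbol` are hypothetical. The second function shows the standard observation that the score is a sum of per-symbol correlations of 0/1 indicator vectors, which is the form on which convolution-based methods operate.

```python
def score_vector(text: str, pattern: str) -> list[int]:
    """Return c, where c[i] counts positions j with text[i + j] == pattern[j]."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    return [
        sum(1 for j in range(m) if text[i + j] == pattern[j])
        for i in range(n - m + 1)
    ]


def score_vector_by_symbol(text: str, pattern: str) -> list[int]:
    """Same vector, computed as a sum of per-symbol indicator correlations."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    scores = [0] * (n - m + 1)
    for symbol in set(pattern):
        # 0/1 indicator vectors for this symbol in the text and the pattern.
        t = [1 if ch == symbol else 0 for ch in text]
        p = [1 if ch == symbol else 0 for ch in pattern]
        # Direct correlation; an FFT could replace this inner computation.
        for i in range(n - m + 1):
            scores[i] += sum(t[i + j] * p[j] for j in range(m))
    return scores


print(score_vector("abracadabra", "abr"))  # [3, 0, 0, 1, 0, 1, 0, 3, 0]
```

Both functions take time proportional to the product of the text and pattern lengths (the second also multiplies by the number of distinct pattern symbols), which is the cost that FFT-based approaches aim to reduce.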
9. 
Arikawa, Setsuo ; 有川, 節夫


10. 
Arikawa, Setsuo ; 有川, 節夫
One-way sequential search systems based on pattern matching machines are described. The power of the systems is evaluated from the viewpoint of formal language theory. Their applicability to medical information processing is briefly discussed.
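As an illustration of a one-way sequential search, the following is a minimal sketch of a pattern matching machine in the style of Aho and Corasick. The abstract does not specify the construction used, so the choice of this machine, the function names `build_machine` and `search`, and the sample keywords are all assumptions. The point is only that the text is read once, left to right, with no backtracking over the input.

```python
from collections import deque


def build_machine(keywords):
    """Build goto, failure, and output tables for a set of keywords."""
    goto, fail, output = [{}], [0], [[]]
    for word in keywords:
        if not word:
            continue  # ignore empty keywords
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)
    # Breadth-first traversal to fill in failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search(text, keywords):
    """Scan text once, left to right, returning (start, keyword) pairs."""
    goto, fail, output = build_machine(keywords)
    state, hits = 0, []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            hits.append((pos - len(word) + 1, word))
    return hits


print(search("ushers", ["he", "she", "his", "hers"]))
```

Each input character is consumed exactly once, so the scan can run over a sequential medium such as a stream or tape without re-reading earlier text.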
