1.
Article
Kyushu Univ. Production
Text Data Mining: Discovery of Important Keywords in the Cyberspace
Arimura, Hiroki; 有村, 博紀; Abe, Junichiro ... [et al.]
Publication info: Digital Libraries: Research and Practice. pp. 220-226, 2000. IEEE
Abstract: This paper describes applications of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over phrases, called proximity phrase association patterns, and consider the problem of finding the patterns that optimize a given statistical measure within the whole class of patterns in a large collection of unstructured texts. For this class of patterns, we develop fast and robust text mining algorithms based on techniques in computational geometry and string matching. Finally, we successfully apply the developed text mining algorithms to experiments on interactive document browsing in a large text database and keyword discovery from Web bases.
2.
Article
Kyushu Univ. Production
Putting old data into new system: Web-based catalog card image searching
Minami, Toshiro; 南, 俊朗; Kurita, Hidekazu ... [et al.]
Publication info: Digital Libraries: Research and Practice. pp. 141-148, 2000. IEEE
Abstract: This paper proposes a new approach to solving the data-input bottleneck problem for library catalog data, or metadata. The data have been provided on paper cards arranged in wooden boxes. Considerable effort has gone into digitizing them to make these data machine-readable. However, despite such efforts, only a small amount of data has been digitized so far because the input is done manually. We solve this problem by using catalog card images digitized by high-speed scanners. This approach has several advantages: (1) electronic catalog data can be handled with markedly reduced time and cost; (2) it enables the seamless integration of image-based and keyword-based searches; and (3) it speeds up the input of the catalog data itself.
3.
Article
Kyushu Univ. Production
Online algorithms for mining semi-structured data stream
Asai, Tatsuya; 浅井, 達哉; Arimura, Hiroki ... [et al.]
Publication info: Proceedings of 2002 IEEE International Conference on Data Mining. pp. 27-34, 2002. IEEE
Abstract: In this paper, we study an online data mining problem from streams of semi-structured data such as XML data. Modeling semi-structured data and patterns as labeled ordered trees, we present an online algorithm StreamT that receives fragments of unseen, possibly infinite semi-structured data in the document order through a data stream, and can return the current set of frequent patterns immediately on request at any time. A crucial part of our algorithm is the incremental maintenance of the occurrences of possibly frequent patterns using a tree sweeping technique. We also give modifications of the algorithm for other online mining models. We present theoretical and empirical analyses to evaluate the performance of the algorithm.
4.
Article
Kyushu Univ. Production
ELEMENTARY FORMAL SYSTEMS AND FORMAL LANGUAGES-SIMPLE FORMAL SYSTEMS
Arikawa, Setsuo; 有川, 節夫
Publication info: Memoirs of the Faculty of Science, Kyushu University. Series A, Mathematics. 24, (1), pp. 47-75, 1970-03. Faculty of Science, Kyushu University
5.
Article
Kyushu Univ. Production
ON THE LENGTH FUNCTIONS OF LANGUAGES RECOGNIZABLE BY LINEAR BOUNDED AUTOMATA
Arikawa, Setsuo; 有川, 節夫
Publication info: Memoirs of the Faculty of Science, Kyushu University. Series A, Mathematics. 23, (1), pp. 12-27, 1969-03. Faculty of Science, Kyushu University
6.
Article
Kyushu Univ. Production
ON SOME PROPERTIES OF LENGTH-GROWING FUNCTIONS ON TWO-WAY PUSHDOWN AUTOMATA
Arikawa, Setsuo; 有川, 節夫
Publication info: Memoirs of the Faculty of Science, Kyushu University. Series A, Mathematics. 22, (2), pp. 110-127, 1968-10. Faculty of Science, Kyushu University
7.
Article
Kyushu Univ. Production
A Note on Randomized Algorithm for String Matching with Mismatches
Baba, Kensuke; Shinohara, Ayumi; Takeda, Masayuki ... [et al.]
Publication info: Proceedings of the Prague Stringology Conference. 2002, pp. 9-17, 2002-09. Czech Technical University
Abstract: Atallah et al. [2] introduced a randomized algorithm for string matching with mismatches, which uses the fast Fourier transform (FFT) to compute convolutions. It estimates the score vector of matches between a text string and a pattern string, that is, the vector obtained when the pattern is slid along the text and the number of matches is counted for each position. This paper simplifies the algorithm and gives an exact analysis of the variance of the estimator.
8.
Article
Kyushu Univ. Production
ON REGULAR SEPARATION OF LANGUAGES
Arikawa, Setsuo; 有川, 節夫
Publication info: Bulletin of Mathematical Statistics. 16, (1/2), pp. 83-94, 1974-03. Research Association of Statistical Sciences
9.
Article
Kyushu Univ. Production
ONE-WAY SEQUENTIAL SEARCH SYSTEMS AND THEIR POWERS
Arikawa, Setsuo; 有川, 節夫
Publication info: Bulletin of Mathematical Statistics. 19, (3/4), pp. 69-85, 1981-03. Research Association of Statistical Sciences
Abstract: One-way sequential search systems based on pattern matching machines are described. The powers of the systems are evaluated from the viewpoint of formal language theory. Their applicability to medical information processing is briefly discussed.
10.
Article
Kyushu Univ. Production
Pattern Matching Machines for Japanese Texts
Shinohara, Takeshi; Arikawa, Setsuo; 篠原, 武 ... [et al.]
Publication info: RIFIS Research Report. 110, 1986-03. Research Institute of Fundamental Information Science, Kyushu University
Abstract: Unlike texts in European languages, texts in Japanese use many characters, namely the Japanese kana alphabet and Chinese kanji characters. For that reason, Japanese characters are represented by 2-byte codes in most computer systems. In many cases, the usual 1-byte characters are used together with 2-byte characters. In this paper, we discuss pattern matching algorithms for Japanese texts, in which 1-byte characters and 2-byte characters are mixed. We have already succeeded in realizing run-time efficient pattern matching machines for texts of 1-byte characters by dividing character codes. We show that the method of dividing character codes is also applicable to pattern matching machines for Japanese texts.