1.
Journal article
Kyushu Univ. Production (Kyushu University research output)
Automatic Wrapper Generation for Multilingual Web Resources
Yamada, Yasuhiro; 山田, 泰寛; Ikeda, Daisuke ... [et al.]
Publication info: Lecture Notes in Computer Science. 2534, pp. 107-113, 2002-11. Springer
Abstract: We present a wrapper generation system that extracts the contents of semi-structured documents containing instances of a record. Wrappers are generated automatically using general assumptions on the structure of instances. The system outputs a set of pairs of left and right delimiters surrounding instances of a field. In addition to the input documents, our system also receives a set of symbols with which a delimiter must begin or end. Our system treats semi-structured documents simply as strings, so it does not depend on markup or natural languages. It does not require any training examples that show where instances are. We show experimental results on both static and dynamic pages gathered from 13 Web sites, marked up in HTML or XML, and written in four natural languages. In addition to the usual contents, the generated wrappers extract useful information hidden in comments or tags, which other wrapper generation algorithms ignore. Some generated delimiters contain whitespace or multibyte characters.
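The abstract above describes a wrapper as a set of left and right delimiter pairs surrounding the instances of a field. The following is a minimal sketch of how such a pair might be applied to a document treated as a plain string. It does not show how the system generates the delimiters; the function name `extract_field` and the sample document are hypothetical and not taken from the paper.

```python
def extract_field(document, left, right):
    """Return every substring enclosed by the delimiters `left` and `right`.

    The document is scanned as a plain string, left to right, so the
    function is independent of markup and natural language.
    """
    if not left or not right:
        raise ValueError("delimiters must be non-empty")
    values = []
    pos = 0
    while True:
        start = document.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = document.find(right, start)
        if end == -1:
            break
        values.append(document[start:end])
        pos = end + len(right)
    return values


# Hypothetical page with two records, each with a name and an age field.
page = "<li><b>Ann</b> 42</li><li><b>Bob</b> 37</li>"
names = extract_field(page, "<b>", "</b>")
ages = extract_field(page, "</b> ", "</li>")
```

Note that the second delimiter pair contains a space, in line with the abstract's remark that generated delimiters may contain whitespace.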
2.
Journal article
Kyushu Univ. Production (Kyushu University research output)
Eliminating Useless Parts in Semi-structured Documents Using Alternation Counts
Ikeda, Daisuke; 池田, 大輔; Yamada, Yasuhiro ... [et al.]
Publication info: Lecture Notes in Computer Science. 2226, pp. 113-127, 2001-11. Springer
Abstract: We propose a preprocessing method for Web mining which, given semi-structured documents with the same structure and style, distinguishes useless parts from non-useless parts in each document without any knowledge of the documents. It is based on the simple idea that any n-gram is useless if it appears frequently. To decide an appropriate pair of length n and frequency, we introduce a new statistical measure, the alternation count: the number of alternations between useless parts and non-useless parts. Given news articles written in English or Japanese, mixed with some non-articles, the algorithm eliminates frequent n-grams used for the structure and style of articles and extracts the news contents and headlines with more than 97% accuracy if the articles are collected from the same site. Even if the input articles are collected from different sites, the algorithm extracts the contents of articles from these sites with at least 95% accuracy. Thus, the algorithm does not depend on the language, is robust to noise, and is applicable to multiple formats.
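The idea in the abstract above (frequent n-grams are useless, and the alternation count is the number of switches between useless and non-useless parts) can be sketched as follows. This is one possible reading, not the paper's algorithm: the abstract does not give the exact definitions, so marking a character as useless when any frequent n-gram covers it is an assumption, as are the function names and the parameter `min_freq`. The procedure for choosing the length and frequency from the alternation counts is not shown.

```python
from collections import Counter


def ngram_counts(docs, n):
    """Count every n-gram (substring of length n) over all documents."""
    counts = Counter()
    for doc in docs:
        for i in range(len(doc) - n + 1):
            counts[doc[i:i + n]] += 1
    return counts


def useless_mask(doc, counts, n, min_freq):
    """Mark each character covered by an n-gram seen at least min_freq times.

    Assumption: a character is useless if any frequent n-gram covers it.
    """
    mask = [False] * len(doc)
    for i in range(len(doc) - n + 1):
        if counts[doc[i:i + n]] >= min_freq:
            for j in range(i, i + n):
                mask[j] = True
    return mask


def alternation_count(mask):
    """Number of switches between useless and non-useless positions."""
    return sum(1 for a, b in zip(mask, mask[1:]) if a != b)


# Hypothetical input: two documents sharing the same template.
docs = ["<p>cat</p>", "<p>dog</p>"]
counts = ngram_counts(docs, 3)
mask = useless_mask(docs[0], counts, 3, 2)
content = "".join(c for c, useless in zip(docs[0], mask) if not useless)
```

In this toy input the shared tags are marked useless, the remaining content of the first document is `cat`, and its alternation count is 2.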
3.
Journal article
Kyushu Univ. Production (Kyushu University research output)
マッシュアップを簡単に実現するメタCGIとそのアーキテクチャ (A meta-CGI for light-weight implementations of Mashups and its architecture)
森, 雅生; Mori, Masao; 中藤, 哲也 ... [et al.]
Publication info: 2007-05-31.
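Abstract: Mashups, which recompose multiple different contents and services into new web services and deploy them, are attracting attention. They are implemented using specialized techniques such as XML data handling, Perl, PHP, and JavaScript. This paper proposes an architecture that makes it easy to run, on the server side, mashups that focus on searching web services.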
4.
Journal article
Kyushu Univ. Production (Kyushu University research output)
マッシュアップ・リソースとマッシュアップ・グルー (Mashup Resources and Mashup Glue)
森, 雅生; 中藤, 哲也; 廣川, 佐千男 ... [et al.]
Publication info: WebDB forum. 2008, 2008-12-01.
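Abstract: In recent years, many practical and interesting mashups have appeared, for example services that combine search results for shops or stations with map information. However, few mashups are developed with the aim of making practical work more efficient, such as combining local data with services on the Web, applying simple processing and summarizing the results, or letting practitioners develop such processing programs by trial and error. This paper first proposes a programming style that views a mashup from two perspectives: the target (mashup resource) and the combination method (mashup glue). Mashup resources are APIs, search sites, local CSV files, and the like, and are treated as machines with specified input and output types. Mashup glue consists of the simple connection of outputs to inputs, together with filter functions such as merge, sort, CGI link, and various graph displays. In addition, we built a mashup development environment that can be used in a browser.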
5.
Other
Kyushu Univ. Production (Kyushu University research output)
Links and Cycles in Web Databases - presentation slides
Mori, Masao; Nakatoh, Tetsuya; Hirokawa, Sachio ... [et al.]
Publication info: Workshop on Semantic Web Applications and Perspectives. 4, 2007-12-19.
6.
Journal article
Kyushu Univ. Production (Kyushu University research output)
FFTを用いた近似文字列照合のスコア計算のための最適な写像 (An Optimal Mapping for Score of String Matching with FFT)
Nakatoh, Tetsuya; Baba, Kensuke; Mori, Masao ... [et al.]
Publication info: 日本データベース学会論文誌. 6, (3), pp. 25-28, 2007-12-21. 日本データベース学会
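Abstract: The string matching problem, which is to find a given pattern in a text, has a wide range of applications, such as Web information retrieval and searching for specific patterns in DNA sequences. Approximate string matching that allows only substitutions as edits to the pattern is called string matching with mismatches; it has wider applications than exact string matching and is also harder. Several fast randomized algorithms that use the fast Fourier transform (FFT) have been proposed for this problem. Depending on how they generate the mappings from characters to numbers, they differ in the total number of mappings and in the variance of the estimate of the solution. The algorithm proposed in this paper uses the theoretical minimum total number of mappings, and the variance of its estimate is also small.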
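To illustrate the role of the mapping from characters to numbers, here is a minimal sketch of the well-known deterministic baseline: one 0/1 indicator mapping per distinct pattern symbol, with each correlation computed by FFT. This is not the randomized algorithm or the optimal mapping proposed in the paper. All function names are hypothetical, and the simple recursive radix-2 FFT is chosen only to keep the example self-contained. The text is assumed to be at least as long as the pattern.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.

    With invert=True the result is the unnormalized inverse transform.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def correlate(x, y):
    """Return c[i] = sum_j x[i + j] * y[j] for every alignment i."""
    size = 1
    while size < len(x) + len(y):
        size *= 2
    fx = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    fy = fft([complex(v) for v in reversed(y)] + [0j] * (size - len(y)))
    conv = fft([p * q for p, q in zip(fx, fy)], invert=True)
    m = len(y)
    # Divide by size to normalize the inverse FFT; inputs are integers,
    # so rounding removes floating-point error.
    return [round((conv[i + m - 1] / size).real) for i in range(len(x) - m + 1)]


def match_scores(text, pattern):
    """Number of matching positions of pattern at each alignment in text."""
    scores = [0] * (len(text) - len(pattern) + 1)
    for ch in set(pattern):
        # Indicator mapping: the symbol ch maps to 1, everything else to 0.
        t = [1 if c == ch else 0 for c in text]
        p = [1 if c == ch else 0 for c in pattern]
        for i, v in enumerate(correlate(t, p)):
            scores[i] += v
    return scores


scores = match_scores("abcabd", "abd")
mismatches = [len("abd") - s for s in scores]
```

This baseline needs as many mappings (and hence FFT-based correlations) as there are distinct symbols in the pattern, which is the cost that randomized mappings aim to reduce, at the price of obtaining an estimate rather than an exact score.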
7.
Journal article
Kyushu Univ. Production (Kyushu University research output)
Coloring for Pattern Detection
Yamada, Yasuhiro; Hirokawa, Sachio; 山田, 泰寛 ... [et al.]
Publication info: DOI Technical Report. 233, 2008-03. 九州大学大学院システム情報科学研究院情報理学部門
Abstract: This paper studies pattern detection for Web documents with the same type of contents, using pattern languages. The pattern detection problem for such documents is to find a descriptive regular pattern of the input strings such that successive variables do not appear in the pattern. Our approach is to find a set of substrings of the input strings instead of detecting a pattern of the strings directly. The set is called a component set. It divides each input string into colored regions and noncolored regions. This paper proposes an algorithm to generate a pattern from the two kinds of regions. Under this approach, the pattern detection problem is reduced to the problem of finding a component set.
8.
Journal article
Kyushu Univ. Production (Kyushu University research output)
COMBINATORY LOGIC AND $\lambda$-CALCULUS FOR CLASSICAL LOGIC
Baba, Kensuke; Kameyama, Yukiyoshi; Hirokawa, Sachio ... [et al.]
Publication info: Bulletin of informatics and cybernetics. 32, (2), pp. 105-122, 2000-12. 統計科学研究会
Abstract: Since Griffin's work in 1990, classical logic has been an attractive target for extracting computational content. However, the classical principle used in Griffin's type system is the double-negation-elimination rule, which prevents one from analyzing the intuitionistic part and the purely classical part separately. By formulating a calculus with the combinators $\mathrm{J}$ (for the elimination rule of falsehood) and $\mathrm{P}$ (for Peirce's formula, which concerns purely classical reasoning), we can separate these two parts. This paper studies the $\lambda\mathrm{PJ}$ calculus with the $\mathrm{P}$ and $\mathrm{J}$ combinators and the $\lambda\mathrm{C}$ calculus with the $\mathrm{C}$ combinator (for the double-negation-elimination rule). We also propose two $\lambda$-calculi which correspond to $\lambda\mathrm{PJ}$ and $\lambda\mathrm{C}$. We give four classes of reduction rules for each calculus, and systematically study their relationship by simulating reduction rules in one calculus by the corresponding ones in the other. It is shown that, by restricting the type of $\mathrm{P}$, simulation succeeds for several choices of reduction rules, but that simulating the full calculus $\lambda\mathrm{PJ}$ in $\lambda\mathrm{C}$ succeeds only for one class. Some programming examples in our calculi, such as encodings of conjunction and disjunction, are also given.
9.
Journal article
Kyushu Univ. Production (Kyushu University research output)
A Template Discovery Algorithm by Substring Amplification
Ikeda, Daisuke; 池田, 大輔; Yamada, Yasuhiro ... [et al.]
Publication info: DOI Technical Report. 220, 2003-12. 九州大学大学院システム情報科学研究院情報理学部門
Abstract: In this paper, we consider the problem of finding a set of substrings common to given strings. We define this as the template discovery problem: given a set of strings generated by some fixed but unknown pattern, find the constant parts of the pattern. A pattern is a string over constant and variable symbols. It generates strings by replacing variables with constant strings. We assume that the frequency distribution of the replaced strings follows a power-law distribution. Although the longest common subsequence problem, one of the best-known common part discovery problems, is well known to be NP-complete, we show that the template discovery problem can be solved in linear time with high probability. This complexity is achieved thanks to the following contributions: a reformulation of the problem, the use of a set of substrings to express a string, and counting all occurrences $F(f)$ with frequency $f$ instead of just the frequency $f$. We demonstrate the effectiveness of the proposed algorithm using data on the Web. Moreover, we show that it is robust to noise and remains effective even when the input strings are generated by a union of patterns or by a pattern with the iteration operation.
10.
Journal article
Kyushu Univ. Production (Kyushu University research output)
A note on the regularity of fuzzy languages
Hirokawa, Sachio; Miyano, Satoru; 廣川, 佐千男 ... [et al.]
Publication info: 九州大学理学部紀要 : Series A, Mathematics. 32, (1), pp. 61-66, 1978-03-25. 九州大学理学部