Improving OCR for Historical Documents by Modeling Image Distortion - 九大コレクション

<Book (part)>
Improving OCR for Historical Documents by Modeling Image Distortion

Creator	Name: Maekawa, Keiya (マエカワ, ケイヤ); Affiliation: Kyushu University
	Author identifier: K000191; Name: Tomiura, Yoichi (冨浦, 洋一; トミウラ, ヨウイチ); Affiliation: Kyushu University
	Author identifier: K006833; Name: Fukuda, Satoshi (福田, 悟志; フクダ, サトシ); Affiliation: Kyushu University
	Author identifier: K003977; Name: Ishita, Emi (石田, 栄美; イシタ, エミ); Affiliation: Kyushu University
	Name: Uchiyama, Hideaki (内山, 英昭; ウチヤマ, ヒデアキ); Affiliation: Kyushu University
Language	English
Publisher	Springer Nature
Date of issue	2019-10-29
Source title	Digital Libraries at the Crossroads of Digital Information for the Future
Volume	11853
Start page	312
End page	316
Conference	Name: International Conference on Asian Digital Libraries (ICADL); Edition: 21 (2019); Location: Kuala Lumpur; Country: Malaysia
Publication type	Accepted Manuscript
Access rights	open access
Related DOI	Variant version of: https://doi.org/10.1007/978-3-030-34058-2_31
Related URI	https://fim.uitm.edu.my/icadl2019/
Related information	Digital Libraries at the Crossroads of Digital Information for the Future
Related information	Lecture Notes in Computer Science
Abstract	Archives hold printed historical documents, many of which have deteriorated. It is difficult to extract text from such images without errors using optical character recognition (OCR). This problem reduces the accuracy of information retrieval. Therefore, it is necessary to improve the performance of OCR for images of deteriorated documents. One approach is to convert images of deteriorated documents to clear images, to make it easier for an OCR system to recognize text. To perform this conversion using a neural network, data is needed to train it. It is hard to prepare training data consisting of pairs of a deteriorated image and an image from which deterioration has been removed; however, it is easy to prepare training data consisting of pairs of a clear image and an image created by adding noise to it. In this study, PDFs of historical documents were collected and converted to text and JPEG images. Noise was added to the JPEG images to create a dataset in which the images had noise similar to that of the actual printed documents. U-Net, a type of neural network, was trained using this dataset. The performance of OCR for an image with noise in the test data was compared with the performance of OCR for an image generated from it by the trained U-Net. An improvement in the OCR recognition rate was confirmed.
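
The data-preparation idea in the abstract (pair each clear image with a copy that has had noise added, then score OCR output against the known text) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the record does not state the noise model or the evaluation metric, so salt-and-pepper noise and a `difflib` similarity ratio stand in for them, and the function names `add_salt_and_pepper`, `make_training_pair`, and `recognition_rate` are hypothetical.

```python
import random
from difflib import SequenceMatcher


def add_salt_and_pepper(image, amount=0.05, seed=None):
    """Return a noisy copy of a grayscale image (rows of 0-255 ints).

    Each pixel is independently replaced by black (0) or white (255)
    with probability `amount`. This noise model is an assumption, not
    the one used in the study.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            if rng.random() < amount:
                new_row.append(rng.choice((0, 255)))
            else:
                new_row.append(pixel)
        noisy.append(new_row)
    return noisy


def make_training_pair(clean_image, amount=0.05, seed=None):
    """Pair a synthetic noisy input with its clean target image."""
    return add_salt_and_pepper(clean_image, amount, seed), clean_image


def recognition_rate(ocr_text, reference_text):
    """Similarity in [0, 1] between OCR output and the reference text.

    A stand-in metric (difflib ratio); the study's metric is not given here.
    """
    return SequenceMatcher(None, ocr_text, reference_text).ratio()


# Tiny 4x4 "clear page": white background with a dark stroke.
clean = [
    [255, 255, 255, 255],
    [255, 0, 0, 255],
    [255, 0, 0, 255],
    [255, 255, 255, 255],
]
noisy, target = make_training_pair(clean, amount=0.25, seed=0)
```

In a setup like this, a denoising network would be trained to map `noisy` to `target`, and `recognition_rate` would be computed once for OCR on the noisy image and once for OCR on the network's output, so that the two scores can be compared.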

Full-text file

File	File type	Size	Views	Description
2927456	pdf	329 KB	361

Details

Record ID	2927456
Related URI	https://fim.uitm.edu.my/icadl2019/
Related ISBN	9783030340575
Related ISSN	1611-3349
Subject	OCR Error
	Information Retrieval
	Historical Document Image
Note	The study is based on a poster presentation that won the Best Poster Award at the International Conference on Asian Digital Libraries 2019.
Registration date	2020.05.12
Update date	2021.06.16
