DEBORA: Digital AccEss to BOoks of the RenAissance

Cited by: 29
Authors
Le Bourgeois, F. [1]
Emptoz, H. [1]
Affiliation
[1] Inst Natl Sci Appl, LIRIS, F-69621 Villeurbanne, France
Keywords
historical document digitization; document image analysis; document image compression; file format; digital libraries;
DOI
10.1007/s10032-006-0030-0
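Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligent science and technology];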
Abstract
DEBORA (Digital AccEss to BOoks of the RenAissance) is a multidisciplinary European project that aims to digitize rare sixteenth-century books and thereby make them more accessible. End-users, librarians, historians, researchers in book history, and computer scientists participated in the development of remote and collaborative access to digitized Renaissance books, which is necessary because digital libraries in image mode are poorly accessible through the Internet. Three factors currently limit remote access to digital libraries: the size of the image files to be stored, the lack of a standard exchange file format suitable for progressive transmission, and limited querying possibilities. To improve accessibility, historical documents must be digitized and retro-converted to extract a detailed description of the image contents suited to users' needs. Specialists of the Renaissance have described the metadata generally required by end-users and the ideal functionalities of the digital library. The retro-conversion of historical documents is a complex process that includes image capture, metadata extraction, image storage and indexing, automatic conversion into a reusable electronic form, publication on the Internet, and data compression for faster remote access. The steps of this process cannot be developed independently. DEBORA proposes a global approach to retro-conversion, from digitization to the final functionalities of the digital library, centered on users' needs. The retro-conversion process is mainly based on a document image analysis system that simultaneously extracts the metadata and compresses the images. We also propose a file format that describes compressed books as heterogeneous data (images, text, links, annotations, physical layout, and logical structure) and is suitable for progressive transmission, editing, and annotation. DEBORA is an exploratory project that aims to demonstrate the feasibility of these concepts by developing prototypes tested by end-users.
Pages: 193-221
Page count: 29