ALPH-REGIM Database

 

ALPH-REGIM Database Description:

The ALPH-REGIM database contains more than 5,000 text images, and most of the prototypes are available online. This Arabic/Latin, printed/handwritten script database consists of: (1) typeset texts in both Latin and Arabic script, and (2) handwritten texts in both Latin and Arabic script. The typeset Arabic text is set in more than ten Arabic typefaces and forms the second part of this database, labelled the Arabic Font Base (AFB). The typeset Latin text is set in eight typefaces (Arial, Bookman, Century, Comic, Courier, Impact, Modern, Times New Roman). The handwritten texts were produced by 52 writers and scanned as black-and-white images. The ALPH-REGIM database was developed within two frameworks:

  1. Script and language identification is a key step for multilingual OCR and multilingual information retrieval (IR). An automatic script identification system is useful to (i) develop a generalized OCR system that can recognize the different languages present in text image zones, (ii) run only the specific OCR developed for each typeset or handwritten script alphabet [9], and (iii) classify documents and index the growing parts of digital libraries so that data retrieval is facilitated.

  2. Arabic font recognition (AFR) is a necessary process (i) to improve Arabic OCR (AOCR) performance and (ii) to produce re-editable text.

 

Download:

Arabic Font Base (zip file - 132 MB)

Arabic Handwritten Script (zip file - 7 MB)

Arabic Printed Script (zip file - 64 MB)

Base Manuscrit Arab (zip file - 12 MB)

Base Manuscrit Latin (zip file - 10 MB)

Latin Handwritten Script (zip file - 19 MB)

Latin Printed Script (zip file - 8 MB)

Latin Font (zip file - 19 MB)

 

For more details, contact Sami Ben Moussa. E-mail: sami.benmoussa[at]ieee.org