Technological Aspects

The Technologies behind this Project

1. Scanning

The scans of the newspapers on this website were made from one of two sources: the paper original or microfilm. Every effort is made to use the best available copy, judged by both its quality and the completeness of the surviving run of editions. This is no easy task, since historical newspapers undergo constant wear and disintegration, and collections are almost always partial. In this sense, scanning archival material in general, and historical newspapers in particular, is part of an important mission to save information that might otherwise be lost forever. The decision whether to photograph a newspaper page or to scan it from microfilm is based on the condition of the material and the integrity of the collection, with preference given to the paper source wherever possible.

All the materials appearing on the site were scanned or photographed in the digitization center of the National Library, expressly for this project. Newspapers scanned from microfilm appear in black and white (binary) or grayscale. Where the material was printed on colored paper, we tried our best to photograph it in color in order to remain as faithful as possible to the original.

The materials included in the project were scanned or photographed at a resolution of 300 DPI (400 DPI for some of the material).
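To give a sense of what these resolutions mean in practice, the short Python sketch below converts a physical page size into pixel dimensions at a given DPI. The function name and the 40 × 55 cm page size are illustrative assumptions, not figures taken from the project.

```python
CM_PER_INCH = 2.54

def pixels_for_page(width_cm: float, height_cm: float, dpi: int) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page captured at the given DPI."""
    width_px = round(width_cm / CM_PER_INCH * dpi)
    height_px = round(height_cm / CM_PER_INCH * dpi)
    return (width_px, height_px)

if __name__ == "__main__":
    # Hypothetical broadsheet page of 40 x 55 cm (an assumption for illustration).
    print(pixels_for_page(40, 55, 300))  # (4724, 6496)
    print(pixels_for_page(40, 55, 400))  # (6299, 8661)
```

Moving from 300 to 400 DPI increases each dimension by a third, so the image holds roughly 1.8 times as many pixels.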

Materials originating in newsprint were scanned or photographed using a PENTAX Z645 camera, a PANASONIC KV S5055C scanner, and a CANON 5d camera.

Materials originating in microfilm were scanned on an Eclipse microfilm scanner by Nextscan. Following scanning or photography, all material underwent careful quality assurance (QA) and, where needed, minimal graphic processing.


2. Accessibility Platform and Manual Typing of Titles and Names of Authors

In many online historical newspaper projects, all the scanned material undergoes Optical Character Recognition (OCR), a technology that converts images of printed text into machine-readable text. After many trials with various OCR programs, we decided not to use this technology in this project. The reason is that the results of Arabic-language OCR are poor (a 20–30% success rate), which is inadequate for performing text searches. To circumvent this obstacle, we chose a system that incorporates manual typing of article headlines and author names. This method, while not optimal in comparison with full OCR, has the advantage of making the digital archive at least partially searchable.

Manual typing of titles and authors’ names presented us with a dilemma: what to do when a word is printed with a “mistake” that may in fact not be a mistake at all, but a different spelling convention that was acceptable at the time. We decided not to correct any “mistake,” but to type exactly what was printed in the original. This, of course, affects the search results. If you do not find what you are looking for, we recommend trying alternative spellings of the search word, especially for words not originating in Arabic (for example, ايتالية instead of ايطالية).
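For readers who want to try spelling variants systematically, the following Python sketch generates alternative spellings of a search word by swapping letters that are sometimes used interchangeably in transliterated words. The letter groups and the function name are illustrative assumptions only; they are not a list used by this archive or by its search system.

```python
from itertools import product

# Illustrative groups of interchangeable letters (an assumption, not an
# official list): each letter in a group may stand in for the others.
INTERCHANGEABLE = [
    ("ت", "ط"),
    ("س", "ص"),
    ("ك", "ق"),
]

def spelling_variants(word: str) -> set[str]:
    """Return every spelling obtained by swapping interchangeable letters."""
    lookup = {ch: group for group in INTERCHANGEABLE for ch in group}
    # For each character, list the letters that could appear in its place.
    options = [lookup.get(ch, (ch,)) for ch in word]
    return {"".join(combo) for combo in product(*options)}

if __name__ == "__main__":
    for variant in sorted(spelling_variants("ايتالية")):
        print(variant)
```

Each variant can then be entered as a separate search term. Note that the number of variants doubles with every swappable letter in the word, so the groups are best kept small.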

After scanning, the image files are combined into digital sheets using Olive Software technology (manual typing is also currently being carried out). The resulting sheets are loaded onto the OLIVE APA system, which provides easy access to the newspapers on computers and mobile devices, with browsing and advanced search options (for help and information about performing searches, please see the User Guide).