    Appending additional OCR language dictionaries

    The language dictionaries provided within the installation package are:

    ara (Arabic)
    deu (German)
    eng (English)
    fra (French)
    heb (Hebrew)
    ita (Italian)
    nld (Dutch; Flemish)
    por (Portuguese)
    spa (Spanish; Castilian)
    vie (Vietnamese)

    The OCR engine is not restricted to these languages and can recognize many more.
    If the language you need is not in the list above, download the complete OCR language pack.
    It includes more than 120 languages and is available at http://www.gdpicture.com/download/tesseract_ocr_4x_language_pack.zip
    You can also try other language files provided by the Tesseract team: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#updated-data-files-for-version-400-september-15-2017

    This 326 MB archive contains most of the currently supported languages. We strongly recommend using these dictionary files.

    Once the download is complete, extract the archive contents into the folder where your OCR dictionaries are already installed.
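
    The extraction step can also be scripted. The following is a minimal Python sketch, not part of the product: the function name extract_language_pack is hypothetical, and it assumes the dictionary files sit at the root of the archive and use Tesseract's .traineddata file extension. It extracts the archive into the dictionaries folder and returns the language codes found there afterwards.

```python
import zipfile
from pathlib import Path


def extract_language_pack(archive_path, dictionaries_dir):
    """Extract a downloaded language pack into the OCR dictionaries folder.

    Returns the sorted language codes present in the folder afterwards.
    Assumption: each dictionary is a file named <code>.traineddata located
    at the root of the archive.
    """
    target = Path(dictionaries_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Unpack everything next to the dictionaries that are already installed.
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)

    # Derive the language codes from the dictionary file names.
    return sorted(p.stem for p in target.glob("*.traineddata"))
```

    A call such as extract_language_pack("tesseract_ocr_4x_language_pack.zip", dictionaries_dir), where dictionaries_dir is the path to your own dictionaries folder, then returns the list of codes you can pass to the OCR engine.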

     

    To look up the language name for a language code, visit this page: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#updated-data-files-for-version-400-september-15-2017
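
    For the dictionaries bundled with the installation package, the same lookup can be kept locally. The sketch below is only an illustration: the names BUNDLED_LANGUAGES and language_name are hypothetical, and the table covers just the ten languages listed at the top of this topic. Codes from the complete pack would have to be added from the page linked above.

```python
# Codes and names of the dictionaries bundled with the installation package,
# copied from the list at the top of this topic.
BUNDLED_LANGUAGES = {
    "ara": "Arabic",
    "deu": "German",
    "eng": "English",
    "fra": "French",
    "heb": "Hebrew",
    "ita": "Italian",
    "nld": "Dutch; Flemish",
    "por": "Portuguese",
    "spa": "Spanish; Castilian",
    "vie": "Vietnamese",
}


def language_name(code):
    """Return the name for a bundled language code, or None if it is unknown."""
    return BUNDLED_LANGUAGES.get(code.strip().lower())
```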

     

    If for any reason you want to use the previous language data files (which do not use the LSTM engine), you can download the complete pack from this link: http://www.gdpicture.com/download/tesseract_ocr_304_language_pack.zip