Install tesseract for C++ on Windows 10

I am having trouble installing Tesseract for C++ development on Windows 10.

Can anyone provide a guide to get:
1. Leptonica (required by tesseract) lib and includes
2. Tesseract lib and includes
3. Link both to project (e.g. Visual Studio)

so that the example from https://github.com/tesseract-ocr/tesseract/wiki/APIExample works:

#include <cstdio>   // fprintf, printf
#include <cstdlib>  // exit
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
    char *outText;

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    // Open input image with leptonica library
    Pix *image = pixRead("/usr/src/tesseract/testing/phototest.tif");
    api->SetImage(image);
    // Get OCR result
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used object and release memory
    api->End();
    delete api;
    delete[] outText;
    pixDestroy(&image);

    return 0;
}
Nianiabi answered 2/6, 2018 at 16:30

I spent a couple of days trying to link the Tesseract library to my C++ project in Visual Studio 2019, and I finally managed to do it. None of the threads I found, nor even the official Tesseract documentation, has a full list of instructions.

I'll list what I did; hopefully it will help someone. I don't claim it's the optimal way.

  1. The official Tesseract documentation has some basic tips in its "Windows" section. I installed sw and cppan, but I guess that wasn't necessary. The main thing is to install vcpkg. It requires Git, so I installed that first. Then:

    > cd c:\tools (I installed it in c:\tools; you may choose any directory)

    > git clone https://github.com/microsoft/vcpkg

    > .\vcpkg\bootstrap-vcpkg.bat

    > .\vcpkg\vcpkg install tesseract:x64-windows-static (I used x64 version)

    > .\vcpkg\vcpkg integrate install

At this point everything should work, they said: headers should be found and libs should be linked. But neither worked for me.

  1. Change project configuration to Release x64 (or Release x86 if you installed x86 tesseract).

  2. To include headers: go to Project properties -> C/C++ -> General. Set Additional Include Directories to C:\tools\vcpkg\installed\x64-windows-static\include (or wherever you installed vcpkg).

  3. To link libraries: Project properties -> Linker -> General. Set Additional Library Directories to C:\tools\vcpkg\installed\x64-windows-static\lib

  4. Project properties -> C/C++ -> Code Generation. Set Runtime Library to Multi-threaded (/MT). Otherwise I got errors like "runtime mismatch static vs DLL".

  5. The Tesseract lib couldn't link to its dependencies, so I added every lib that had been installed in C:\tools\vcpkg\installed\x64-windows-static\lib. Project properties -> Linker -> Input. I set Additional Dependencies to archive.lib;bz2.lib;charset.lib;gif.lib;iconv.lib;jpeg.lib;leptonica-1.80.0.lib;libcrypto.lib;libpng16.lib;libssl.lib;libwebpmux.lib;libxml2.lib;lz4.lib;lzma.lib;lzo2.lib;openjp2.lib;tesseract41.lib;tiff.lib;tiffxx.lib;turbojpeg.lib;webp.lib;webpdecoder.lib;webpdemux.lib;xxhash.lib;zlib.lib;zstd_static.lib;%(AdditionalDependencies)
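
Typing that list by hand is easy to get wrong. As a rough sketch (not something the steps above require), a small C++17 helper could build the same semicolon-separated value from whatever .lib files are actually in the directory. The names collectLibNames and toAdditionalDependencies are made up for this illustration; they are not part of vcpkg, Visual Studio, or Tesseract.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns the names of all *.lib files that sit
// directly inside libDir, sorted alphabetically.
std::vector<std::string> collectLibNames(const fs::path& libDir) {
    std::vector<std::string> names;
    for (const auto& entry : fs::directory_iterator(libDir)) {
        if (entry.is_regular_file() && entry.path().extension() == ".lib") {
            names.push_back(entry.path().filename().string());
        }
    }
    std::sort(names.begin(), names.end());
    return names;
}

// Hypothetical helper: joins the names with ';' and appends the
// %(AdditionalDependencies) macro so inherited dependencies are kept.
std::string toAdditionalDependencies(const std::vector<std::string>& names) {
    std::string value;
    for (const auto& name : names) {
        value += name + ";";
    }
    return value + "%(AdditionalDependencies)";
}
```

Calling toAdditionalDependencies(collectLibNames(...)) on your own vcpkg lib directory should give a string you can paste into Linker -> Input -> Additional Dependencies.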

And after that it finally compiled and launched.

But... api->Init returned -1. To work with Tesseract you need a tessdata directory containing the .traineddata files for the languages you need.

  1. Download tessdata. I got it from the official docs. BTW, tessdata_fast worked better than tessdata_best for my purposes :) So I downloaded the single "eng" file and saved it as C:\tools\TesseractData\tessdata\eng.traineddata.

  2. Then I added an environment variable TESSDATA_PREFIX with the value C:\tools\TesseractData\tessdata. I also added C:\tools\TesseractData to the Path variable (just in case).
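
If Init still fails, it may help to confirm that the process really sees the variable and the file. Below is a minimal C++17 sketch of such a check. It assumes, as in the setup above, that TESSDATA_PREFIX points directly at the folder holding the .traineddata files. The function names trainedDataPath and findTrainedData are invented for this example and are not Tesseract API.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: builds <tessdataDir>/<lang>.traineddata.
fs::path trainedDataPath(const fs::path& tessdataDir, const std::string& lang) {
    return tessdataDir / (lang + ".traineddata");
}

// Hypothetical helper: returns the language file path if TESSDATA_PREFIX is
// set and the file exists, otherwise std::nullopt.
// Note: MSVC may warn that std::getenv is unsafe; _dupenv_s is its alternative.
std::optional<fs::path> findTrainedData(const std::string& lang) {
    const char* prefix = std::getenv("TESSDATA_PREFIX");
    if (prefix == nullptr) {
        return std::nullopt;  // variable not visible to this process
    }
    const fs::path candidate = trainedDataPath(prefix, lang);
    if (!fs::is_regular_file(candidate)) {
        return std::nullopt;  // variable set, but the file is missing
    }
    return candidate;
}
```

Calling findTrainedData("eng") before api->Init and printing a message when it returns std::nullopt tells you whether the problem is the environment variable or the file. Remember that Visual Studio only picks up a new environment variable after it is restarted.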

And after all this it is finally working for me.

Optometrist answered 1/2, 2021 at 23:27

Install vcpkg (Microsoft's package manager for installing open-source libraries on Windows) and run a PowerShell command like .\vcpkg install tesseract:x64-windows-static. Dependency libraries such as Leptonica will be installed automatically. Tesseract can then be integrated into your VS project automatically with .\vcpkg integrate install.

Threaten answered 3/6, 2018 at 0:46

Additionally, I found that you also have to install lzo2.lib with ./vcpkg install lzo:x64-windows-static, and then pull in lzo2.lib as described by @Nick.

Some of the libraries listed above are no longer needed with the latest versions of Tesseract. VS19 will complain if you simply copy the list; cross-check it against your lib directory and remove the entries that are no longer there.

For example, tiffxx.lib, xxhash.lib and some others.
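
The cross-check can be sketched in a few lines of C++. This is only an illustration under my own assumptions: splitDependencies and missingLibs are hypothetical names, and the "installed" list is whatever file names you find in your own vcpkg lib directory.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a ';'-separated Additional Dependencies value,
// skipping empty entries and MSBuild macros such as %(AdditionalDependencies).
std::vector<std::string> splitDependencies(const std::string& value) {
    std::vector<std::string> entries;
    std::istringstream stream(value);
    std::string entry;
    while (std::getline(stream, entry, ';')) {
        if (!entry.empty() && entry.front() != '%') {
            entries.push_back(entry);
        }
    }
    return entries;
}

// Hypothetical helper: returns the requested libraries that do not appear
// in the list of installed library file names.
std::vector<std::string> missingLibs(const std::vector<std::string>& requested,
                                     const std::vector<std::string>& installed) {
    std::vector<std::string> missing;
    for (const auto& lib : requested) {
        if (std::find(installed.begin(), installed.end(), lib) == installed.end()) {
            missing.push_back(lib);
        }
    }
    return missing;
}
```

Whatever missingLibs returns is the set of entries to delete from Additional Dependencies before building.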

Amygdala answered 9/4, 2022 at 17:9

In an MSYS2 shell, run the command:
pacman -S mingw-w64-{i686,x86_64}-tesseract-ocr

Roustabout answered 22/6, 2023 at 6:23

I also had to add crypt32.lib as an additional library to link. The good news is that it is already included in Windows.

Joist answered 25/8 at 1:1