Tesseract - change language file location

I am making an AIR project that needs some OCR capabilities, so I decided to use Tesseract (for now I am trying to get it working on Windows).

My problem is that I cannot change the location of the language file: Tesseract always looks in its installation directory (Program Files (x86)\Tesseract-OCR\tessdata\mylang.traineddata).

Is there a way to configure Tesseract to look for this file where I specify, for example in the same folder as tesseract.exe? I don't want to (or perhaps even can't) install an application with the AIR installer. I've tried it with version 3.0 and the latest SVN version.

Thanks

Boyhood answered 5/8, 2011 at 3:9 Comment(0)

I solved the problem by modifying the Tesseract source code (I'm using SVN revision 597). As nguyenq said, Tesseract looks for the data at the path set by the TESSDATA_PREFIX environment variable. If the variable is not set, it does some trickery I don't understand :). So if anyone needs a portable version of Tesseract (one that does not depend on a Tesseract installation), edit mainblk.cpp around line 60. This is my version:

// remove the stuff that Tesseract does to find the installation path
/* if (!getenv("TESSDATA_PREFIX")) {
#ifdef TESSDATA_PREFIX
#define _STR(a) #a
#define _XSTR(a) _STR(a)
    datadir = _XSTR(TESSDATA_PREFIX);
#undef _XSTR
#undef _STR
#else
    if (argv0 != NULL) {
      if (getpath(argv0, dll_module_name, datadir) < 0)
#ifdef __UNIX__
        CANTOPENFILE.error("main", ABORT, "%s to get path", argv0);
#else
        NO_PATH.error("main", DBG, NULL);
#endif
    } else {
      datadir = "./";
    }
#endif
  } else {
    datadir = getenv("TESSDATA_PREFIX");
  }*/
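  datadir = "./"; // relative path: the executable's folder when Tesseract is started from there

Now you can pack the language files in the tessdata directory next to the Tesseract executable. Note that "./" is resolved against the current working directory, so this only works when Tesseract is launched from its own folder.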

Boyhood answered 5/8, 2011 at 3:9 Comment(0)

Yes, you can, by setting the TESSDATA_PREFIX environment variable, e.g.:

export TESSDATA_PREFIX=/usr/local/share/

Note that the directory path must end in a /.
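If the variable should not be changed system-wide, it can be set only inside the process that starts Tesseract, since child processes inherit their parent's environment. The following is a minimal sketch of that idea, not something taken from Tesseract itself: the helper names with_trailing_slash and set_tessdata_prefix are hypothetical, and falling back to "./" for an empty path is an assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: make sure the directory ends in a path separator,
// as required for TESSDATA_PREFIX. An empty input falls back to "./"
// (an assumption made for this sketch).
std::string with_trailing_slash(std::string dir) {
    if (dir.empty()) return "./";
    const char last = dir.back();
    if (last != '/' && last != '\\') dir.push_back('/');
    return dir;
}

// Hypothetical helper: set TESSDATA_PREFIX for the current process only.
// Processes started afterwards from this one inherit the value; the
// user's or system's persistent environment is not modified.
bool set_tessdata_prefix(const std::string& dir) {
    const std::string value = with_trailing_slash(dir);
#ifdef _WIN32
    return _putenv_s("TESSDATA_PREFIX", value.c_str()) == 0;
#else
    return setenv("TESSDATA_PREFIX", value.c_str(), 1) == 0;  // 1 = overwrite
#endif
}
```

Calling set_tessdata_prefix("/usr/local/share") before launching Tesseract would give the child process the same value as the export line above, without touching the settings of any other program.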

Barbusse answered 5/8, 2011 at 16:56 Comment(1)
Thanks, I guess I will need to tweak the source code then. It's not a very elegant solution to modify an environment variable at each run of the program (just to make sure that the user has not set this variable since the last run, for example by installing Tesseract). - Boyhood

I suggest you don't handle the tessdata path through TESSDATA_PREFIX. You can specify the tessdata path when initializing Tesseract instead. If you use tesseract.exe on the command line, use the following syntax:

tesseract.exe --tessdata-dir tessdataPath image.png output -l eng

If you use tesseract::TessBaseAPI, pass the path to Init() as follows:

api->Init(tessdataPath, language);  // e.g. api->Init("C:", "eng")
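
To keep the data next to the executable, as the question asks, the path can be computed at run time and then handed to either option above. This is only a sketch under stated assumptions: tessdata_dir_next_to is a hypothetical helper, it relies on the caller passing the path the program was started with (for example argv[0]), and it does plain string handling rather than asking the operating system for the real executable location.

```cpp
#include <string>

// Hypothetical helper: derive "<folder of the executable>/tessdata" from
// the path the program was started with (for example argv[0]).
std::string tessdata_dir_next_to(const std::string& exe_path) {
    // Accept both Unix and Windows separators.
    const std::string::size_type sep = exe_path.find_last_of("/\\");
    // No separator means exe_path carries no folder information, so fall
    // back to the current directory (an assumption; a program found via
    // PATH would need an OS-specific lookup instead).
    const std::string folder =
        (sep == std::string::npos) ? "." : exe_path.substr(0, sep);
    return folder + "/tessdata";
}
```

The resulting string could be passed as tessdataPath to --tessdata-dir or to api->Init(). Whether Tesseract expects the tessdata directory itself or its parent has differed between versions, so check the documentation of the version you ship before relying on this.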
Bevis answered 28/5, 2017 at 1:24 Comment(0)
