How to make tesseract-ocr read from coordinates on a screen?
Asked Answered
K

2

6

I have been trying to look for an example of how to make a class/function that would attempt to read text from a the screen at specified coordinates.

Something simple that would use bitblt to capture the specified section of the screen and run tesseract on it. All done in memory without having to create image files to disk.

Tesseract seems to have really poor API and requires a TIF image of all things, as far as I can see it can't even be made to accept a bitmap memory image without extensive delving into its code.

Any help would be appreciated, an actual example would be ideal.

Kaylenekayley answered 7/4, 2014 at 22:23 Comment(0)
B
6

http://i.imgur.com/HaJ2zOI.png enter image description here

Read on/view the below to see how to use Tesseract-OCR with images from memory..

#include <iostream>
#include <vector>
#include <stdexcept>
#include <fstream>
#include <memory>
#include <cstring>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

#if defined _WIN32 || defined _WIN64
#include <windows.h>
#endif

class Image
{
    private:
        std::vector<std::uint8_t> Pixels;
        std::uint32_t width, height;
        std::uint16_t BitsPerPixel;

        void Flip(void* In, void* Out, int width, int height, unsigned int Bpp);

    public:
        #if defined _WIN32 || defined _WIN64
        explicit Image(HDC DC, int X, int Y, int Width, int Height);
        #endif

        inline std::uint16_t GetBitsPerPixel() {return this->BitsPerPixel;}
        inline std::uint16_t GetBytesPerPixel() {return this->BitsPerPixel / 8;}
        inline std::uint16_t GetBytesPerScanLine() {return (this->BitsPerPixel / 8) * this->width;}
        inline int GetWidth() const {return this->width;}
        inline int GetHeight() const {return this->height;}
        inline const std::uint8_t* GetPixels() {return this->Pixels.data();}
};

void Image::Flip(void* In, void* Out, int width, int height, unsigned int Bpp)
{
   unsigned long Chunk = (Bpp > 24 ? width * 4 : width * 3 + width % 4);
   unsigned char* Destination = static_cast<unsigned char*>(Out);
   unsigned char* Source = static_cast<unsigned char*>(In) + Chunk * (height - 1);

   while(Source != In)
   {
      std::memcpy(Destination, Source, Chunk);
      Destination += Chunk;
      Source -= Chunk;
   }
}

#if defined _WIN32 || defined _WIN64
Image::Image(HDC DC, int X, int Y, int Width, int Height) : Pixels(), width(Width), height(Height), BitsPerPixel(32)
{
    BITMAP Bmp = {0};
    HBITMAP hBmp = reinterpret_cast<HBITMAP>(GetCurrentObject(DC, OBJ_BITMAP));

    if (GetObject(hBmp, sizeof(BITMAP), &Bmp) == 0)
        throw std::runtime_error("BITMAP DC NOT FOUND.");

    RECT area = {X, Y, X + Width, Y + Height};
    HWND Window = WindowFromDC(DC);
    GetClientRect(Window, &area);

    HDC MemDC = GetDC(nullptr);
    HDC SDC = CreateCompatibleDC(MemDC);
    HBITMAP hSBmp = CreateCompatibleBitmap(MemDC, width, height);
    DeleteObject(SelectObject(SDC, hSBmp));

    BitBlt(SDC, 0, 0, width, height, DC, X, Y, SRCCOPY);
    unsigned int data_size = ((width * BitsPerPixel + 31) / 32) * 4 * height;
    std::vector<std::uint8_t> Data(data_size);
    this->Pixels.resize(data_size);

    BITMAPINFO Info = {sizeof(BITMAPINFOHEADER), static_cast<long>(width), static_cast<long>(height), 1, BitsPerPixel, BI_RGB, data_size, 0, 0, 0, 0};
    GetDIBits(SDC, hSBmp, 0, height, &Data[0], &Info, DIB_RGB_COLORS);
    this->Flip(&Data[0], &Pixels[0], width, height, BitsPerPixel);

    DeleteDC(SDC);
    DeleteObject(hSBmp);
    ReleaseDC(nullptr, MemDC);
}
#endif

int main()
{
    #if defined _WIN32 || defined _WIN64
    HWND SomeWindowHandle = GetDesktopWindow();
    HDC DC = GetDC(SomeWindowHandle);

    Image Img = Image(DC, 0, 0, 200, 200); //screenshot of 0, 0, 200, 200..

    ReleaseDC(SomeWindowHandle, DC);
    #else
    Image Img = Image(some_pixel_pointer, 200, 200); //pointer to pixels..
    #endif

    std::unique_ptr<tesseract::TessBaseAPI> tesseract_ptr(new tesseract::TessBaseAPI());

    tesseract_ptr->Init("/tesseract/tessdata', 'eng");
    tesseract_ptr->SetImage(Img.GetPixels(), Img.GetWidth(), Img.GetHeight(), Img.GetBytesPerPixel(), Img.GetBytesPerScanLine()); //Fixed this line..

    std::unique_ptr<char[]> utf8_text_ptr(tesseract_ptr->GetUTF8Text());

    std::cout<<utf8_text_ptr.get()<<"\n";

    return 0;
}
Bradley answered 7/4, 2014 at 23:0 Comment(7)
Oh man.. I made an error in the post.. I fixed it. It's: Img.GetBytesPerScanLine() Not Img.GetWidth * Img.GetBytesPerScanLine. That was my fault for copying the code from the Image class to the main due to laziness.. I just had time to test it and it works. If it still gives you problems do: SetImage(Img.GetPixels(), Img.GetWidth(), Img.GetHeight(), 4, Integer(Img.GetWidth() * 4)). That's how I debugged it. Shouldn't have any problems though! The above code should work as is..Bradley
Believe me, I understand! After all, the library alone is frustrating to get working. It really is a pain. Glad it works though =)Bradley
It printed fine for me. I've added a Save option to the Image class. You should now be able to save the image and see what your screenshot looks like. This will let you see what you're passing to tesseract-ocr.Bradley
If the screenshot came out fine and everything else is fine, most likely tesseract does not recognize the characters or some other option needs setting.. It's been a while since I actually used tesseract (other than for testing this post).. Seeing as the code above works and that tesseract is getting the right images, the only thing I can think of is that you might have to use black text on a white background? Not too sure but that's what I tested it with. Anyway, I've got school so I'll check back in the morning and see how things go with you. I'll try to come up with something.Bradley
If anything, you can make a thread explaining your problem that tesseract is not recognizing characters and see if anyone else has an idea until I get back from school to help debug it..Bradley
It does take a bit of debugging to get right. The guys at villavu (a pascal scripting site) use it pretty much everyday without problems iirc.. I'll test it on Firefox when I awake and I'll let you know how it goes. For now just try it on something like notepad and see if it recognizes your text. I'll add loading from file to the image class so you can test it on those.Bradley
Well I added the ability to flip the image now.. Bitmaps by nature are stored upside down in memory. This should now flip it up-right. Should work now.Bradley
U
2

You can do it like this on windows.

#include <tesseract/capi.h>
#include <windows.h>

void ReadFromScreen(RECT rc)
{
    HWND hWndDesktop = GetDesktopWindow();
    HDC hDC = GetDC(hWndDesktop);

#define BITS_PER_PIXEL   32
#define BYTES_PER_PIXEL  (BITS_PER_PIXEL / 8)
    int nWidth = rc.right - rc.left;
    int nHeight = rc.bottom - rc.top;
    BITMAPINFO bi;
    memset(&bi, 0, sizeof(bi));
    bi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bi.bmiHeader.biWidth = nWidth;
    bi.bmiHeader.biHeight = -nHeight;
    bi.bmiHeader.biPlanes = 1;
    bi.bmiHeader.biBitCount = BITS_PER_PIXEL;
    bi.bmiHeader.biCompression = BI_RGB;

    void* pixels;
    HBITMAP hBitmap = ::CreateDIBSection(0, &bi, DIB_RGB_COLORS, &pixels, NULL, 0);
    HDC hMemDC = CreateCompatibleDC(NULL);
    SelectObject(hMemDC, hBitmap);
    BitBlt(hMemDC, 0, 0, nWidth, nHeight, hDC, rc.left, rc.top, SRCCOPY);
    int nDataSize = nWidth * nHeight * BYTES_PER_PIXEL;
    TessBaseAPISetImage(pTessBaseAPI, (const unsigned char*)pixels, nWidth, nHeight, BYTES_PER_PIXEL, BYTES_PER_PIXEL * nWidth);
    if (TessBaseAPIRecognize(pTessBaseAPI, NULL) != 0)
    {
        return;
    }
    char* szText = TessBaseAPIGetUTF8Text(pTessBaseAPI);
    // Todo something with szText

    TessDeleteText(szText);
    DeleteObject(hBitmap);
    DeleteDC(hMemDC);
}
Unstopped answered 9/6, 2018 at 3:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.