Efficient TIFF tile extraction in C++
Z

5

8

I am working with large TIFF images, about 1 GB each and around 20000 x 20000 pixels. I need to extract several tiles (of about 300x300 pixels) from random positions in these images.

I tried the following solutions:

  • Libtiff (the only low-level library I could find) offers TIFFReadScanline(), but that means reading around 19700 unnecessary pixels for every line of the tile.

  • I implemented my own TIFF reader, which extracts a tile from the image without reading any unnecessary pixels. I expected it to be faster, but doing a seekg for every line of the tile makes it very slow. I also tried reading all the lines of the file that contain my tile into a buffer and then extracting the tile from the buffer, but the results are more or less the same.

I'd like to receive suggestions that would improve my tile extraction tool!

Everything is welcome: a more efficient library I could use, tips about C/C++ I/O, a higher-level strategy for my needs, etc.

Regards, Juan

Zeeland answered 30/10, 2009 at 17:25 Comment(6)
Can we assume they are uncompressed? - Mercer
And that the data is organized in scanlines? - Rikki
Yes, the data is uncompressed and organized in the most traditional way: line1 line2 line3 ... - Zeeland
What does LibTiff report for TIFFGetField(tif, TIFFTAG_TILEWIDTH, &tileWidth) and TIFFGetField(tif, TIFFTAG_TILELENGTH, &tileLength)? - Crescentia
The data is organized neither in strips nor in tiles. Calling TIFFGetField with TIFFTAG_TILEWIDTH/TIFFTAG_TILELENGTH doesn't change the value of the variables I pass in. - Zeeland
It has to be either strips or tiles. - Hellenist
C
3

[Major edit 14 Jan 10]

I was a bit confused by your mention of tiles, when the tiff is not tiled.

I do use tiled/pyramidal TIFF images. I've created those with VIPS:

vips im_vips2tiff source_image output_image.tif:none,tile:256x256,pyramid

I think you can create a tiled file without the pyramid with:

vips im_vips2tiff source_image output_image.tif:none,tile:256x256,flat

You may want to experiment with tile size. Then you can read using TIFFReadEncodedTile.
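To get a feel for how tile size interacts with a roughly 300x300 region, the following sketch works out which tiles of a tiled image intersect a requested region. It only does the index arithmetic and does not call libtiff; each listed tile would then be fetched with TIFFReadEncodedTile. The names `TileCoord`, `tilesCoveringRegion`, and `tileIndex` are hypothetical, and the row-major numbering assumes a single image plane with contiguous samples.

```cpp
#include <cstdint>
#include <vector>

// Column and row of a tile within the tile grid (not pixel coordinates).
struct TileCoord {
    std::uint32_t col;
    std::uint32_t row;
};

// Lists, in row-major order, the tiles that intersect the pixel region
// [x0, x0 + w) x [y0, y0 + h). Assumes w, h, tileW and tileH are all > 0.
std::vector<TileCoord> tilesCoveringRegion(std::uint32_t x0, std::uint32_t y0,
                                           std::uint32_t w, std::uint32_t h,
                                           std::uint32_t tileW, std::uint32_t tileH) {
    const std::uint32_t firstCol = x0 / tileW;
    const std::uint32_t lastCol = (x0 + w - 1) / tileW;
    const std::uint32_t firstRow = y0 / tileH;
    const std::uint32_t lastRow = (y0 + h - 1) / tileH;
    std::vector<TileCoord> tiles;
    for (std::uint32_t row = firstRow; row <= lastRow; ++row) {
        for (std::uint32_t col = firstCol; col <= lastCol; ++col) {
            tiles.push_back(TileCoord{col, row});
        }
    }
    return tiles;
}

// Row-major tile number for an image that is imageW pixels wide.
// Assumption: single plane, tiles numbered left to right, top to bottom.
std::uint32_t tileIndex(TileCoord t, std::uint32_t imageW, std::uint32_t tileW) {
    const std::uint32_t tilesAcross = (imageW + tileW - 1) / tileW; // ceiling division
    return t.row * tilesAcross + t.col;
}
```

With 256x256 tiles, a 300x300 region touches between 4 and 9 tiles depending on its alignment, so larger tiles mean fewer reads but more unused pixels per read.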

Multi-resolution storage using pyramidal TIFFs is much faster if you need to zoom in/out. You may also want to use it to show a coarse image almost immediately, followed by the detailed picture.

After switching to (appropriately sized) tiled storage (which will bring you MASSIVE performance improvements for random access!), your bottleneck will be disk I/O. Reading a file is much faster when it is done sequentially. Here, memory-mapping the file (mmap) may be the solution.

Some useful links:

  • VIPS

  • IIPImage

  • LibTiff.NET

  • stackoverflow

VIPS is an image handling library that can do much more than just read and write. It has its own, very efficient internal format and good documentation of its algorithms. For one, it decouples processing from the filesystem, thereby allowing tiles to be cached.

IIPImage is a multi-zoom webserver/browser library. I found its documentation a very good source of information on multi-resolution imaging (like Google Maps).

The other solution on this page, using mmap, is efficient only for 'small' files. I've often hit the 32-bit boundaries. Generally, allocating a 1 GByte chunk of memory will fail on a 32-bit OS (even with 4 GBytes of RAM installed) because even virtual memory gets fragmented after one or two application runs. Still, there is sufficient memory to cache parts or the whole of the image. More memory = more performance.

Claudeclaudel answered 4/1, 2010 at 20:41 Comment(1)
TIFFs can be tiled or have strips of data. This is the only way to manage the creation and reading of large images. - Borodino
G
2

Just mmap your file.

http://www.kernel.org/doc/man-pages/online/pages/man2/mmap.2.html

Grice answered 30/10, 2009 at 22:9 Comment(2)
I'm currently testing this option. Thanks for your reply. - Zeeland
Interesting on 64-bit operating systems. Large TIFF files easily go past 32-bit boundaries. On my XP machine I have problems reading bitmaps of 400 MByte and above because of 'virtual memory' fragmentation. That is: I can't find a 400 MByte chunk of consecutive memory space, even with 2 GByte of free (!) RAM. - Claudeclaudel
Z
2

Thanks everyone for the replies.

Actually, a change in the way the tiles were required allowed me to extract them from the files on the hard disk sequentially instead of at random positions. This let me load a part of the file into RAM and extract the tiles from there.

The efficiency gain was huge. Otherwise, if you need random access to a file, mmap is a good option.

Regards, Juan

Zeeland answered 21/4, 2010 at 8:7 Comment(0)
I
0

I did something similar to this to handle an arbitrarily large Targa (TGA) format file. What made it simple for that kind of file is that the image is not compressed: you can calculate the position of any arbitrary pixel within the image and reach it with a simple seek. You might consider the Targa format if you have the option to specify the image encoding.
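As an illustration of that calculation (a sketch, not the code I actually used), the offset of a pixel in an uncompressed TGA file can be derived from a few header fields. `TgaInfo` and `tgaPixelOffset` are hypothetical names; the sketch assumes the header has already been parsed, the image is uncompressed, and pixels within a row are stored left to right.

```cpp
#include <cstdint>

// The fields of the 18-byte TGA header that matter for locating a pixel.
struct TgaInfo {
    std::uint8_t idLength;          // header byte 0: length of the image ID field
    std::uint8_t colorMapType;      // header byte 1: 1 if a color map is present
    std::uint16_t colorMapLength;   // header bytes 5-6: number of color map entries
    std::uint8_t colorMapEntryBits; // header byte 7: bits per color map entry
    std::uint16_t width;            // header bytes 12-13
    std::uint16_t height;           // header bytes 14-15
    std::uint8_t pixelBits;         // header byte 16: bits per pixel (8, 16, 24, 32)
    bool topToBottom;               // bit 5 of header byte 17
};

// Byte offset of pixel (x, y), with y counted from the top of the displayed image.
std::uint64_t tgaPixelOffset(const TgaInfo& t, std::uint32_t x, std::uint32_t y) {
    const std::uint64_t headerBytes = 18;
    const std::uint64_t colorMapBytes = (t.colorMapType == 1)
        ? static_cast<std::uint64_t>(t.colorMapLength) * ((t.colorMapEntryBits + 7) / 8)
        : 0;
    const std::uint64_t bytesPerPixel = (t.pixelBits + 7) / 8;
    // By default TGA stores the bottom row first, so the row order may need flipping.
    const std::uint64_t storedRow = t.topToBottom ? y : (t.height - 1 - y);
    return headerBytes + t.idLength + colorMapBytes +
           (storedRow * t.width + x) * bytesPerPixel;
}
```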

If not, keep in mind that there are many varieties of the TIFF format. You probably want to use a library whose authors have already gone through the pain of supporting all the different variants.

Isobar answered 4/1, 2010 at 20:51 Comment(0)
D
-1

Did you get a specific error message? Depending on how you used that command line, you could have been stepping on your own file.

If that wasn't the issue, try using ImageMagick instead of VIPS, if that's an option.

Doubleganger answered 8/11, 2010 at 20:26 Comment(0)
