How to retrieve the header size using e.g. a BioFormats ChannelSeparator

Hi all,
Is there any way to read the size of a file's header, in bytes, using e.g. Bio-Formats’ ChannelSeparator?
Thanks,
Albert

@albertcardona To my knowledge, in general, Bio-Formats cannot do that. Not all formats have the concept of a “header” with a given number of bytes. For example, for TIFF, what if the IFDs are interleaved with pixel data, versus all prepended before pixels, versus all appended after pixels? How many bytes would you count?
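To illustrate the TIFF point, here is a minimal Python sketch (standard library only, not Bio-Formats code) that follows the first IFD of a classic TIFF to its StripOffsets tag (tag 273). The helper names `first_strip_offset` and `tiny_tiff` are made up for this example, and the two synthetic byte strings contain only the one tag the parser needs, so they are not complete, valid TIFF files. The sketch assumes a classic (non-BigTIFF), strip-based file.

```python
import struct

def first_strip_offset(data: bytes) -> int:
    """Offset of the first pixel strip named by the first IFD of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}[data[:2]]           # byte-order mark
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic 43)")
    (n_entries,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(n_entries):
        start = ifd_offset + 2 + 12 * i                   # each IFD entry is 12 bytes
        tag, ftype, count = struct.unpack(order + "HHI", data[start:start + 8])
        if tag != 273:                                    # 273 = StripOffsets
            continue
        fmt = {3: "H", 4: "I"}[ftype]                     # SHORT or LONG
        size = struct.calcsize(order + fmt)
        value_pos = start + 8                             # value stored inline...
        if count * size > 4:                              # ...unless it does not fit in 4 bytes
            (value_pos,) = struct.unpack(order + "I", data[start + 8:start + 12])
        (offset,) = struct.unpack(order + fmt, data[value_pos:value_pos + size])
        return offset
    raise ValueError("no StripOffsets tag (tiled TIFF?)")

def tiny_tiff(pixels_first: bool) -> bytes:
    """Synthetic, incomplete little-endian TIFF: header, one IFD entry, 4 pixel bytes."""
    pixels = bytes(4)
    ifd_size = 2 + 12 + 4                                 # entry count + one entry + next-IFD pointer
    if pixels_first:
        pixel_offset, ifd_offset = 8, 8 + len(pixels)
    else:
        ifd_offset, pixel_offset = 8, 8 + ifd_size
    ifd = (struct.pack("<H", 1)
           + struct.pack("<HHII", 273, 4, 1, pixel_offset)
           + struct.pack("<I", 0))
    header = b"II" + struct.pack("<HI", 42, ifd_offset)
    return header + (pixels + ifd if pixels_first else ifd + pixels)

offset_pixels_first = first_strip_offset(tiny_tiff(pixels_first=True))
offset_ifd_first = first_strip_offset(tiny_tiff(pixels_first=False))
```

In the first layout the pixels start right after the 8-byte header and the IFD comes after them; in the second the IFD sits between the header and the pixels. The same image data ends up at different offsets, which is why "header size" is not well defined for TIFF.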

What you could do, I think, is create your own RandomAccessInputStream, map it using Location.mapFile("myRAIS", myRAIS) with some string constant, call setId("myRAIS"), and then call myRAIS.getFilePointer() to see what the offset is. I think that offset is likely (but not guaranteed) to be the one just after the header.

Why do you need this information? What are you trying to achieve?

Hi @ctrueden, thanks for taking the time.

The goal is to find out, without loading a giant file, the offset to the first byte of the first image. Yes, I understand that some image formats can have both an offset and a separator (hence ImageJ’s “Raw” command asks for both), and even a trailer with more metadata at the end. What I’m interested in is an efficient way to map each 2D plane to a short[] array that I can wrap with an ArrayImg from ImgLib2, without the overhead of parsing the full header, the trailer and more. When accessing millions of files, an overhead of 100 ms per file adds up fast. Given that all files are in the same format, I’d parse the metadata from a single file, and then apply the same offset, image dimensions and pixel type to all the others. A bit like what’s shown here in my Fiji python tutorial, except that the approximate way of computing the header size in bytes (file size minus the size of the images) works only when there isn’t a trailer, which TIFF files can have.
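As a rough sketch of that approach in plain Python (standard library only; in Fiji the same idea would use Java's RandomAccessFile and a short[] array), the header size is estimated as the file size minus the pixel bytes, and each plane is then read directly at its computed offset. The function names `estimate_header_size` and `read_plane` are hypothetical, and the demo file is synthetic: a 16-byte dummy header followed by two 3x2 planes of little-endian unsigned 16-bit pixels, with no trailer and no compression.

```python
import os
import struct
import sys
import tempfile
from array import array

def estimate_header_size(path, width, height, n_planes, bytes_per_pixel=2):
    """File size minus pixel bytes; only valid if nothing follows the pixels."""
    return os.path.getsize(path) - width * height * n_planes * bytes_per_pixel

def read_plane(path, header_size, width, height, plane_index=0, big_endian=False):
    """Read one 2D plane of unsigned 16-bit pixels without parsing any metadata."""
    plane = array("H")                                    # 16-bit unsigned items
    with open(path, "rb") as f:
        f.seek(header_size + plane_index * width * height * plane.itemsize)
        plane.fromfile(f, width * height)                 # raises EOFError if the file is too short
    if big_endian != (sys.byteorder == "big"):
        plane.byteswap()                                  # convert file order to native order
    return plane

# Synthetic demo file: 16-byte header, then pixel values 0..11 as two 3x2 planes.
with tempfile.NamedTemporaryFile(suffix=".raw", delete=False) as tmp:
    tmp.write(b"\x00" * 16)
    tmp.write(struct.pack("<12H", *range(12)))
    demo_path = tmp.name

header = estimate_header_size(demo_path, width=3, height=2, n_planes=2)
second_plane = read_plane(demo_path, header, width=3, height=2, plane_index=1)
os.remove(demo_path)
```

With a trailer present, `estimate_header_size` would overestimate the offset by the trailer's length, which is exactly the limitation described above.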

The bioformats javadoc doesn’t seem to have a RandomAccessInputStream. Did you mean the RandomAccessInputStream from OME? I guess so–it has a getFilePointer method.

The one detail I am not getting is the setId: on which object? Neither RandomAccessInputStream nor Location have it. And the ChannelSeparator would do it on its own? And the RandomAccessInputStream was created with the filepath already?

The setId call would be on the ChannelSeparator, using the mapped input stream (see https://docs.openmicroscopy.org/bio-formats/5.9.0/developers/in-memory.html).

Though, as suggested, I would certainly not guarantee that setId will leave the stream at the end of the header; the behaviour here will differ for each format. Which format specifically are you using? I can take a look and confirm whether it will work for that particular format. I would also point out that even if the files are all very similar, that does not necessarily mean the offsets will be the same for each.

Hi @dgault, thanks, but the ChannelSeparator cannot be passed as an argument to Location.mapFile.

I am looking for a generic solution. If one doesn’t exist, so be it–I was hoping bio-formats could have one, as it abstracts away the image file format.

I’m afraid that, at the moment, it’s not something Bio-Formats can provide a generic solution for.