Is there any way to read the size, in bytes, of a file's header, using e.g. Bio-Formats' ChannelSeparator?
@albertcardona To my knowledge, in general, Bio-Formats cannot do that. Not all formats have the concept of a “header” with a given number of bytes. For example, for TIFF, what if the IFDs are interleaved with pixel data, versus all prepended before pixels, versus all appended after pixels? How many bytes would you count?
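To make the TIFF point concrete, here is a minimal sketch in plain Python (no Bio-Formats) that walks the IFD chain of a classic TIFF and reports where the first strip of each plane starts. The function name `tiff_plane_offsets` is hypothetical, and the sketch assumes a non-BigTIFF, strip-based file held in memory as bytes; tiled TIFFs store their offsets under a different tag and are not handled.

```python
import struct

STRIP_OFFSETS = 273  # TIFF tag holding the byte offset of each strip of pixel data
TYPE_SIZES = {3: ("H", 2), 4: ("I", 4)}  # SHORT and LONG, the types used for tag 273


def tiff_plane_offsets(data):
    """Return the offset of the first strip of every IFD in a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}[bytes(data[:2])]  # byte-order mark
    magic, ifd_offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses 43)")
    offsets = []
    while ifd_offset != 0:  # a next-IFD offset of 0 ends the chain
        (n_entries,) = struct.unpack_from(order + "H", data, ifd_offset)
        for i in range(n_entries):
            entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, typ, count = struct.unpack_from(order + "HHI", data, entry)
            if tag != STRIP_OFFSETS:
                continue
            code, size = TYPE_SIZES[typ]
            if count * size <= 4:
                pos = entry + 8  # values that fit in 4 bytes are stored inline
            else:
                (pos,) = struct.unpack_from(order + "I", data, entry + 8)
            offsets.append(struct.unpack_from(order + code, data, pos)[0])
        (ifd_offset,) = struct.unpack_from(order + "I", data, ifd_offset + 2 + 12 * n_entries)
    return offsets
```

With the IFD written before the pixels, the reported offset comes after the IFD; with the IFD appended after the pixels, the first strip can start right after the 8-byte file header. That is why a single "header size" is not well defined for TIFF.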
What you could do, I think, would be to create your own RandomAccessInputStream, map it using Location.mapFile("myRAIS", myRAIS) with some string constant, then call setId("myRAIS") and then call myRAIS.getFilePointer() to see what the offset is. I think that offset is likely, but not guaranteed, to be the one just after the header.
Why do you need this information? What are you trying to achieve?
Hi @ctrueden, thanks for taking the time.
The goal is to find out, without loading a giant file, the offset to the first byte of the first image. Yes, I understand that some image formats can have both an offset and a separator (hence ImageJ’s “Raw” command asks for both), and even a trailer with more metadata at the end. What I’m interested in is an efficient way to map each 2D plane to a short array that I can wrap with an ArrayImg from ImgLib2, without the overhead of parsing the full header, trailer and more. When accessing millions of files, an overhead of 100 ms per file adds up fast. Given that all files are the same format, I’d parse the metadata from a single file, and then apply the offset, image dimensions and type to all others. A bit like what’s shown here in my Fiji python tutorial, except that the approximate way of computing the header size in bytes (file size minus size of images) works only when there isn’t a trailer, which TIFF files can have.
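As a rough sketch of that idea in plain Python (not the Fiji tutorial's code): once the offset and dimensions are known from one file, a plane can be read directly without any header parsing. The names `read_plane_uint16` and `estimate_header_size` are hypothetical, and the sketch assumes uncompressed, contiguous 16-bit pixels.

```python
import array
import os
import sys


def read_plane_uint16(path, offset, width, height, little_endian=True):
    """Read one width x height plane of 16-bit pixels at a known byte offset."""
    pixels = array.array("H")  # unsigned 16-bit on mainstream platforms
    with open(path, "rb") as f:
        f.seek(offset)  # skip the header without parsing it
        pixels.fromfile(f, width * height)  # raises EOFError if the file is too short
    if little_endian != (sys.byteorder == "little"):
        pixels.byteswap()  # file byte order differs from the machine's
    return pixels


def estimate_header_size(path, width, height, n_planes, bytes_per_pixel):
    """File size minus pixel data; only valid when nothing follows the pixels."""
    return os.path.getsize(path) - width * height * n_planes * bytes_per_pixel
```

The estimate is exactly the approximation described above, so it inherits the same limitation: any trailer is silently counted as header.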
The one detail I am not getting is the setId: on which object? Neither
Location have it. And the ChannelSeparator would do it on its own? And the RandomAccessInputStream was created with the filepath already?
The setId call would be on the ChannelSeparator, using the input stream (see https://docs.openmicroscopy.org/bio-formats/5.9.0/developers/in-memory.html)
Though, as suggested, I would certainly not guarantee that setId will leave the stream at the end of the header, and the behaviour here will differ for each format. Which format specifically are you using? I can take a look and confirm whether it will work for that particular format. I would also point out that even if the files are all very similar, the offsets will not necessarily be the same for each.
Hi @dgault, thanks, but the ChannelSeparator cannot be given as argument to the
I am looking for a generic solution. If one doesn’t exist, so be it. I was hoping Bio-Formats could have one, as it abstracts away the image file format.
I’m afraid that, at the moment, it’s not something Bio-Formats can provide a generic solution for.