Bioformats Macro Extensions - Reading Arbitrary Metadata

Hi all,

I’m trying to access some non-typical OME metadata using the Bio-Formats macro extension functions. Abberior obf files have both dataset elements, e.g. <Dataset ID="Dataset:22" Name="nuc_topRight">, and folder elements, e.g. <Folder ID="Folder:7" Name="nuc_topRight">, that represent the hierarchy in the way Lightbox displays the images.

I see from other posts here that I can access the Key Value pairs that are listed mostly in the latter part of the metadata, and also my guess is that the series number steps through the Dataset ID, but I can’t seem to get these non-standard tags from the metadata. One of the first things I’m trying to do is access the Name attribute in the Folder tag. It would be cool to access any element, tag and/or attribute if possible.

I’m using this basic macro structure to try to figure out the accepted strings:

Ext.setId(path);
Ext.getMetadataValue("attemptedStringsWentHere", value);
print("metadata value: " + value);

Is there a way to get arbitrary elements of the metadata structure using a macro, or should I open up NetBeans and bang my head on some Java? :slight_smile:

Thanks in advance for your time.
Neil

The macro commands you are using look correct. For original metadata you will need to know the exact String to use for the calls. The easiest way is probably to open the image with Bio-Formats normally and select Image > Show Info. This will give you the complete list of the metadata values stored and an idea of the correct string to use.

Hi @dgault, thanks for the speedy reply.

From your response, I take it that only the details in Info are available via this macro extension. Please let me know if that’s not the case.

As I need to access all of the ome xml metadata, I’m working on a plugin instead. These files (Abberior .obf) don’t appear to have much in the Info, but the ome xml is extensive. What delineates items that are pulled into Info from those that are only in the ome xml? I guess what I’m really asking is: what could Abberior do to make more of the metadata available in Info?

Thanks
Neil

The OME-XML which is displayed will contain values that are part of the OME-Model (Current Data Model overview — OME Data Model and File Formats 6.0.1 documentation). Additional values which don’t fit in the model, usually specific to a particular format, get stored as annotations. These are the values that appear in the Show Info window and are accessed via Ext.getMetadataValue.

Are there metadata values missing which aren’t populated in either the Show Info or the OME-XML windows? If so, it may be that the Bio-Formats reader has not parsed the values. We have a rather limited number of obf samples, so it may be metadata which wasn’t present in those original samples or has been added in a newer version of the format.
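
To make that split concrete, here is a minimal Python sketch (standard library only) that pulls the key/value annotations out of an OME-XML string. It assumes the original metadata is stored as OriginalMetadata elements with Key and Value children inside XMLAnnotation, which is the layout Bio-Formats usually writes; the sample XML, the key and value text, and the function names `local` and `original_metadata` are made up for illustration and are not part of any Bio-Formats API.

```python
import xml.etree.ElementTree as ET

# Made-up sample; real files carry many more elements and attributes.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="Overview 1"/>
  <StructuredAnnotations>
    <XMLAnnotation ID="Annotation:0" Namespace="openmicroscopy.org/OriginalMetadata">
      <Value>
        <OriginalMetadata>
          <Key>Example key</Key>
          <Value>Example value</Value>
        </OriginalMetadata>
      </Value>
    </XMLAnnotation>
  </StructuredAnnotations>
</OME>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def original_metadata(omexml):
    """Return {key: value} for every OriginalMetadata pair in the OME-XML."""
    root = ET.fromstring(omexml.encode("utf-8"))
    pairs = {}
    for elem in root.iter():
        if local(elem.tag) != "OriginalMetadata":
            continue
        key = value = None
        for child in elem:
            if local(child.tag) == "Key":
                key = child.text
            elif local(child.tag) == "Value":
                value = child.text
        if key is not None:
            pairs[key] = value
    return pairs


print(original_metadata(SAMPLE))
```

Anything that lives in the model itself (Image, Dataset, Folder and so on) would not show up in this dictionary, which matches the behaviour described above for Show Info.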

Thanks for the data model structure; that’s cool :slight_smile:

I wish the info did have all of that. I’ve attached the info txt for the first image in the series from a ~20 MB .obf file. I’ve also attached the ome-xml that Fiji displays, together with the ome-xml that I pull using python-bioformats.

Info for Ab4C_02.obf - Overview 1-STAR RED.txt (1.4 KB)
test_obf_omemeta_fiji_import_version.xml (257.8 KB)
test_obf_omemeta_python_bf_version.xml (796.1 KB)

Let me know if you need the obf file.

On a side note, it takes about 15s to open the obf to get the thumbnails, and then opening after that is sufficiently quick and related to the file size.

Thanks
Neil

Thanks Neil, so you are able to access all the metadata you need from the OME-XML using python ok?

Yes, I can get at things via python-bioformats, and that’s the option I’ve been using for some applications.

Right now I need to open Abberior files in Fiji, so it’s either macros or Java plugins for me. As you can see, when you open the obf files they’re a bit of a mess, and the names in Lightbox (Abberior easy software version) don’t match those in the list shown in the Bio-Formats thumbnails window (I think due to the overlap of <Folder> and <Dataset> tags).

I’ve switched track to write a plugin to pull the metadata and collate things based on a number of matching criteria, together with using the <Folder> elements. I think it’s the only way to get users of the core here to open obf files without getting mixed up.
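
As a rough sketch of the collating idea (Python standard library only, not the plugin itself), the snippet below maps ID to Name for the <Folder> and <Dataset> elements and lists the names they share. The sample XML is made up: only Dataset:22 and Folder:7 come from the tags quoted earlier, the Folder:8 entry is invented, and the helper name `names_by_id` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample built around the two tags quoted at the top of the thread.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Dataset ID="Dataset:22" Name="nuc_topRight"/>
  <Folder ID="Folder:7" Name="nuc_topRight"/>
  <Folder ID="Folder:8" Name="made_up_name"/>
</OME>"""


def names_by_id(omexml, element):
    """Map ID -> Name for every element with the given local tag name."""
    root = ET.fromstring(omexml.encode("utf-8"))
    found = {}
    for elem in root.iter():
        # ElementTree tags look like '{namespace}Folder'; compare the local part only
        if elem.tag.rsplit("}", 1)[-1] == element:
            found[elem.get("ID")] = elem.get("Name")
    return found


folders = names_by_id(SAMPLE, "Folder")
datasets = names_by_id(SAMPLE, "Dataset")
# Names that appear on both a Folder and a Dataset: a starting point for collating
shared = sorted(set(folders.values()) & set(datasets.values()))
print(folders)
print(datasets)
print(shared)
```

Matching on the Name attribute alone is only one possible criterion; a real file may need further checks before two entries are treated as the same thing.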

While I have you distracted on my journey, I’ve gotten to this part of my java code and I’m getting stuck. I’m a part time coder, and I find the layers of readers and classes around these particular stages in the code a little opaque.

        ServiceFactory factory = new ServiceFactory();
        OMEXMLService service = factory.getInstance(OMEXMLService.class);
        IMetadata omexml = service.createOMEXMLMetadata();
        
        ClassList<IFormatReader> cl = new ClassList<>(IFormatReader.class);
        cl.addClass(OMEXMLReader.class);
        ImageReader reader = new ImageReader(cl);
        
        reader.setId(fPath.toString());
        Object metaObj = reader.getMetadataStoreRoot();

Am I on the right track? I’m getting errors on the reader.setId part… But not sure I’m barking up the right tree. Would you be able to advise, and/or point me in the direction of a simple example that pulls the raw xml?

Thanks in advance for your time.

Turns out I was able to distill the key points of bioformats/FormatReaderTest.java at 25645389e076a7bd0011e04c4dd8982c0f0614ed · ome/bioformats · GitHub to get a basic set of code to read the xml as a string:

        OMEXMLService omexmlService = null;
        ServiceFactory factory = new ServiceFactory();
        omexmlService = factory.getInstance(OMEXMLService.class);

        MetadataStore omexmlStore = null;
        omexmlStore = omexmlService.createOMEXMLMetadata();
        
        IFormatReader reader = null;
        reader = new ImageReader();
        reader.setId(this.fPath.toString());
        
        MetadataStore readerMetaStore = reader.getMetadataStore();
        MetadataRetrieve retrieve = omexmlService.asRetrieve(readerMetaStore);
        this.omexml = omexmlService.getOMEXML(retrieve);

Alas, it just gives me the first line of the xml, and I need to get all of it.

I also get the following list of debug messages/errors in the output: bf_debug_output-041321.txt (9.1 KB)

I don’t want to duplicate posts, but I do want to link the two threads so others see both: it appears to hang on the last step, [main] INFO loci.common.xml.XMLTools - Validating OME-XML. See Abberior files slow to open - #4 by s.besson

Is there a way to bypass the plethora of readers and, importantly, bypass the validation? I just want the xml as is.

Thanks
Neil

The debug output you posted looks fine; there is nothing there that shows any errors. If you know you are only opening obf files, then you can replace ImageReader with OBFReader, and that will prevent it from having to work out which format is being used.

The getOMEXML call looks to be correct, so it’s odd that you only have the first line. Rather than parsing through the text of the XML, it might be easier to retrieve the values you need from the Metadata object. For example, to retrieve all of the folder names:

OBFReader reader = new OBFReader();
IMetadata omeMeta = MetadataTools.createOMEXMLMetadata();
reader.setMetadataStore(omeMeta);
reader.setId(this.fPath.toString());

int folderCount = omeMeta.getFolderCount();
String[] folderNames = new String[folderCount];
for (int i = 0; i < folderCount; i++) {
  folderNames[i] = omeMeta.getFolderName(i);
}

If it is of help, there are some Java examples here: bio-formats-examples/src/main/java at master · ome/bio-formats-examples · GitHub
Or you can use the same API calls in a jython script if that is more convenient. You can see some image conversion examples here: bio-formats-examples/src/main/macros/jython at ngffWorkshop · dgault/bio-formats-examples · GitHub

At the minute we don’t have a way to turn off the validation step for OBF, but we are looking into possible workarounds or solutions to resolve the long delay.

Hi @dgault, this is wonderful, thanks.

I didn’t know there were functions for all the different aspects of the metadata like that, very cool. Side note: I wish the API docs had a separate link for each page, not just the one link; do you know if that’s an option? All the ‘get’ options in the IMetadata object are in the 6.6.1 API Javadoc.

Also, thanks for the example links. jython is something I need to spend a little more time with.

I’ll likely use the brute force way for now, detailed below, but would love to get that long hang out of the omexml validation step.

Thanks for your time; this has been a fun journey into some different code and metadata.

Neil

I found that the omexml is stored in utf-8 bytes at the end of the obf file using this surprisingly fast python script (yet to check if .msr files keep it in the same way):

import os

fname = r"filepathgoeshere"
marker = b"<?xml"

xml = ""
with open(fname, "rb") as f:
    fsize = f.seek(0, os.SEEK_END)
    # step backwards one byte at a time from the end of the file
    for i in range(len(marker), fsize + 1):
        f.seek(-i, os.SEEK_END)
        # compare raw bytes; decoding arbitrary binary data can raise UnicodeDecodeError
        if f.read(len(marker)) == marker:
            f.seek(-i, os.SEEK_END)
            xml = f.read().decode("utf-8")
            print(f"{marker!r} found {i} bytes from the end of the file")
            break
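
A variation on the same idea, as a hedged sketch: read only the tail of the file once and use bytes.rfind to locate the last "<?xml" marker, which avoids one seek per byte. The function name `read_trailing_xml` and the `tail_bytes` default of 2 MiB are assumptions for illustration (the 2 MiB figure mirrors the window used in the Java version further down), and the demo file is made up rather than a real obf file.

```python
import os
import tempfile


def read_trailing_xml(path, tail_bytes=2 * 1024 * 1024, marker=b"<?xml"):
    """Return the text from the last `marker` in the file tail to the end of the file, or ''."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    start = tail.rfind(marker)  # index of the last occurrence in the tail, or -1
    if start == -1:
        return ""
    return tail[start:].decode("utf-8")


# Demo on a made-up file: a few binary bytes followed by a tiny XML document.
with tempfile.NamedTemporaryFile(delete=False, suffix=".obf") as tmp:
    tmp.write(b"\x00\xff\x10binary-image-data" + b'<?xml version="1.0"?><OME/>')
    demo_path = tmp.name

demo_xml = read_trailing_xml(demo_path)
os.remove(demo_path)
print(demo_xml)
```

If the XML block were ever larger than `tail_bytes`, the marker would fall outside the window and the function would return an empty string, so the window size would need to be raised for such files.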

It took me a couple of tries to get Java to be faster, and the version below beats the Python by quite a bit:

    // Imports needed at the top of the class file:
    // import java.io.IOException;
    // import java.nio.ByteBuffer;
    // import java.nio.MappedByteBuffer;
    // import java.nio.channels.FileChannel;
    // import java.nio.charset.StandardCharsets;
    // import java.nio.file.Paths;
    // import java.nio.file.StandardOpenOption;

    public void pullOMEXMLRaw() throws IOException {

        // try-with-resources closes the channel on every exit path
        // TODO - check location of xml within .msr files
        try (FileChannel channel = FileChannel.open(Paths.get(this.fPath.toString()), StandardOpenOption.READ)) {

            // memory-map at most the last 2 MiB (2,097,152 bytes) of the obf file
            long fileSize = channel.size();
            long mapSize = Math.min(fileSize, 1L << 21);
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, fileSize - mapSize, mapSize);

            // pack the first 4 chars of the xml declaration into a big-endian int
            int xmlStartInt = ByteBuffer.wrap("<?xm".getBytes(StandardCharsets.UTF_8)).getInt();

            // slide a 4-byte window over the buffer and compare it to xmlStartInt
            int offset = -1;
            for (int i = 0; i <= mapSize - 4; i++) {
                if (buffer.getInt(i) == xmlStartInt) {
                    offset = i;
                    break;
                }
            }
            if (offset == -1) {
                this.omexml = "";
                return;
            }

            // copy everything from the start of the declaration to the end of the file
            byte[] xmlBytes = new byte[(int) (mapSize - offset)];
            buffer.position(offset); // start at the '<' itself, not one byte before it
            buffer.get(xmlBytes);

            this.omexml = decodeUTF8(xmlBytes);
        }
    }

    String decodeUTF8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

For the MetadataRetrieve API docs, it is probably better to check out MetadataRetrieve (OME XML library 6.2.2 API), as it gives more detail on the parameters and types.