Structure of ilastik hdf5 Feature export results tables

@ilastik_team and @k-dominik I am looking for documentation on the format/layout of the ilastik HDF5 Feature Export Table. During object classification, you can choose to export your features as HDF5 or CSV. I did a test round with CSV export, which produced a nice CSV:

object_id timestep labelimage_oid User Label
0 0 1 0

I then decided to use the HDF5 format for my large batch processing run. I don’t have any experience with this format, but figured it would be easier to work with if I had a lot of output data. Now I think I made a mistake.

I tried inspecting the HDF5 output (from about 40 images, each > 0.5 GB) with Python like so:

import h5py as h
h5path = "../example_h5_data_output/nav16.h5"
f = h.File(h5path, 'r')
f.keys()
# <KeysViewHDF5 ['images', 'table']>

images is an HDF5 group (http://docs.h5py.org/en/latest/high/group.html#group), and table is an HDF5 dataset (http://docs.h5py.org/en/latest/high/dataset.html#dataset). Neither has any metadata associated with it (f['table'].attrs.keys() is empty, and f['images'].keys() returns a view of integers as strings), and indexing the table just gives unlabeled tuples:

f['table'][0]
# (0, 0, 1, b'0', b'nav16', 0.76, 0.24, 0.8555037,...) # length of 196
len(f['table'])
# 821

How do I interpret this data? How is the “Feature Export” table organized for HDF5 export? Is there any way to access a header field so I can interpret it as I would with a CSV?

Hi @Nick_George,

The output table is stored as a NumPy structured array. You can see all the “column names” with

f["table"].dtype.names

All the columns can be conveniently accessed by name, e.g. to get the Object Area of all objects in all timesteps:

f["table"]["Object Area"]
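In case it helps, here is a minimal sketch of how the column names can serve as the header you would get from a CSV. It assumes only that the file contains a dataset called table holding a structured array, as described above. The helper name read_feature_table and the tiny stand-in file (with made-up columns and values) are hypothetical; they are there only so the snippet runs without a real ilastik export.

```python
import os
import tempfile

import h5py
import numpy as np


def read_feature_table(h5_path, dataset="table"):
    """Return the export table as an in-memory NumPy structured array.

    Hypothetical helper: assumes the table lives in a dataset named "table".
    """
    with h5py.File(h5_path, "r") as f:
        return f[dataset][()]  # [()] reads the whole dataset into memory


# Build a tiny stand-in file so the sketch is self-contained.
# Column names and values below are invented for illustration only.
demo_dtype = np.dtype([
    ("object_id", "<i8"),
    ("timestep", "<i8"),
    ("User Label", "S16"),
    ("Object Area", "<f4"),
])
demo_rows = np.array(
    [(0, 0, b"0", 120.0), (1, 0, b"1", 64.0)],
    dtype=demo_dtype,
)
demo_path = os.path.join(tempfile.mkdtemp(), "demo_table.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("table", data=demo_rows)

table = read_feature_table(demo_path)

column_names = table.dtype.names   # plays the role of the CSV header
areas = table["Object Area"]       # one column across all rows

# Pair the header with a single row to get a labeled record.
first_row = dict(zip(column_names, table[0].tolist()))
```

With a real export you would skip the stand-in part and pass your own file path (such as the nav16.h5 file from the question) to read_feature_table. Note that string columns come back as bytes, like the b'0' and b'nav16' values in the row printed above.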

I’ll make sure to add this to our docs. Thank you for pointing it out to us!

awesome, thanks @k-dominik! Happy to contribute a sample analysis to the docs once I’ve worked through it.