How to trim the fat off ilastik `.ilp` files to ship with headless operation?

Dear @ilastik_team,

I have developed a workflow that uses ilastik in headless mode. I would now like to provide this workflow (together with the classifier) to a user.

Unfortunately, the .ilp file is very large: when creating the project, I placed very sparse labels across about 10 volumes and even included a few volumes where I didn’t label anything.

Is there a simple way to trim the .ilp file down to only the features where there are actual annotations?
Or is the best way to reduce the size to create a “lean classifier” from scratch (include just a few volumes and hope to get a similarly good classifier by setting good labels)?

Hi @VolkerH,

the .ilp file is just an HDF5 file with a different file extension, so any tool that can edit HDF5 files can help you in this case. I don’t think it’s possible to really delete parts of an HDF5 file, so you’d have to copy the relevant groups/datasets to a new one.
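The copy-to-a-new-file idea can be sketched with `h5py` roughly as follows. This is only a minimal sketch: the function names are made up for illustration, and the group path `Input Data/local_data` in the usage note is an assumption about where embedded data might live, so inspect your own file first. Whether ilastik still accepts the trimmed project has not been verified here.

```python
import h5py


def _copy_filtered(src, dst, skip):
    """Recursively copy src into dst, leaving out every path listed in skip."""
    # Keep the attributes of the group (or file root) itself.
    for key, value in src.attrs.items():
        dst.attrs[key] = value
    for name, item in src.items():
        path = item.name.lstrip("/")
        if path in skip:
            continue  # drop this group/dataset entirely
        is_parent_of_skipped = any(s.startswith(path + "/") for s in skip)
        if isinstance(item, h5py.Group) and is_parent_of_skipped:
            # Something below this group is skipped, so descend manually.
            _copy_filtered(item, dst.create_group(name), skip)
        else:
            # Nothing to skip below: let HDF5 copy the whole object.
            src.copy(item, dst, name=name)


def copy_project(src_path, dst_path, skip):
    """Write a copy of the HDF5 file at src_path to dst_path without `skip` paths."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        _copy_filtered(src, dst, set(skip))
```

A call could look like `copy_project("full.ilp", "lean.ilp", skip={"Input Data/local_data"})`, where both file names and the skipped path are placeholders. Work on a copy and keep the original project file.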

In general, are you using Pixel Classification?
On what OS are you?

The features should not be saved to the project file, and annotations are usually tiny.
One thing I could imagine has happened: you either imported stacks (which are then saved to the project file), or changed the location of the training files to “copied to project file”. What do you see in the Data Selection applet under “Location”?
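To check from the outside what is taking up the space, one option is to list the datasets in the .ilp file by their on-disk size. This is a small sketch using `h5py`; the function name `dataset_sizes` is made up for illustration, and the dataset names you see will depend on your project.

```python
import h5py


def dataset_sizes(path):
    """Return (dataset name, bytes stored on disk) pairs, largest first."""
    sizes = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file.
        if isinstance(obj, h5py.Dataset):
            sizes.append((name, obj.id.get_storage_size()))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return sorted(sizes, key=lambda entry: entry[1], reverse=True)
```

Printing the first few entries of `dataset_sizes("MyProject.ilp")` (the file name is a placeholder) should show whether embedded image data dominates the file.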

Cheers
Dominik

Thanks for your help.
Ah, good to know that the .ilp is just an HDF5 file.
I’m developing this on Windows but the idea is to actually deploy this to Mac or Linux.
I think I may have copied the training files to the project file. Will check later.

Hi again @k-dominik,

this is related to my original question. The .ilp file is nice and small if I don’t embed the training images. However, I now need to distribute the training images with the .ilp file if I want to hand the classifier to someone.

I also get the impression from the console messages that during headless operation, the first thing ilastik does is recalculate the features from the training images and rebuild the random forest from the labels in the .ilp file. Is there a way to generate a model for headless operation that contains just the random forest classifier and the list of features that are used?

Could you maybe paste the log message that makes you come to this conclusion? This would be a bug!

If you just want to apply the classifier to new data in headless mode, there is no need for the original data to be there. The data only needs to be there if people are going to use the GUI.
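For reference, applying a trained project to new data in headless mode could be scripted along these lines. This is a hedged sketch: the helper names are made up, the launcher path depends on the OS and installation (`./run_ilastik.sh` is the Linux launcher and is only an assumed default here), and the `--export_source` value assumes a Pixel Classification project, so check the flags against the ilastik headless documentation for your version.

```python
import subprocess


def headless_command(project, inputs, launcher="./run_ilastik.sh",
                     export_source="Probabilities"):
    """Build the argument list for a headless ilastik run on new input files."""
    return [
        launcher,
        "--headless",
        f"--project={project}",
        f"--export_source={export_source}",
        *inputs,
    ]


def run_headless(project, inputs, **kwargs):
    """Run ilastik headless and raise CalledProcessError if it exits non-zero."""
    subprocess.run(headless_command(project, inputs, **kwargs), check=True)
```

Only the .ilp file and the new input files are passed in; the original training volumes do not appear anywhere in the command.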