Learning ImgLib2 development

To learn ImgLib2 plugin development I started on a plugin that converts a IntegerType image into a dense ranked version of itself. It can be found here: https://github.com/rharkes/Dense-Rank-plugin/blob/master/src/main/java/nl/nki/imagej/Dense_Rank_plugin.java
This is only my second attempt at trying to work with ImgLib2. I like it a lot, but I am not confident that I find the best way of building a plugin. Would someone maybe have a quick look at my code and see if there are things I could have done better? Any advise would be appreciated! For example, would it have been better to program an OPS instead of what I did here?

As a next step I would like to use this ranked image to calculate and subtract a temporal sliding median. Preferably running in parallel over each pixel. With ImageJ1 I made a threadarray depending on Runtime.getRuntime().availableProcessors(); and gave each thread an AtomicInteger that corresponded with the pixel it needed to work on. Can I still do that with ImgLib2, or is this method obsolete now and does ImgLib2 have smarter ways of doing parallel computing?

I use Eclipse as IDE.


So I tried, but it is 2.5x as slow compared to my old plugin. The code is referenced below, but this is what the old version does:

  1. copy everything frame by frame from the imagestack to a java short array in the order {t-x-y}
    a) Use ImageStack.getImageArray()
    b) Check which pixelvalues exists for use in the next step
  2. Dense rank the array
  3. Subtract temporal median background in parallel
  4. copy everything back frame by frame from the short array to the imagestack
    a) uses ImageStack.setPixels

The new version:

  1. Dense rank (as in the post above)
  2. Subtract temporal median
    a) Same parallel system as in the old version
    b) Two RandomAccess pointers per pixel for the start and end of the window that move in time.

By using ImgLib2 everything looks much more elegant. But what could be going on that makes it slower?

Old: https://github.com/rharkes/Temporal-Median-Background-Subtraction/tree/master/src/main/java
New: https://github.com/rharkes/Temporal-Median-Background-Subtraction/tree/imglib2_develop/src/main/java/

Hey @rharkes,

welcome to the ImageJ2 world. Your example code looks pretty professional :slight_smile:

The last link you posted is not functional. It’s this file, right?

Thus some general thoughts:

  • You are using an iterator to go through pixels. Using a cursor is more popular and makes the code a bit slimmer.
  • I would recommend creating Java packages. For example a package named like nl.nkiavl.rharkes might make sense. If you don’t create packages, you might get issues later when building more complex software or when shipping the plugins to other colleagues. Furthermore, it is recommended to make the group ID in the pom.xml identical to the first part of the package. However, package names are not allowed with minusses. Thus, I would also recommend renaming the group-ID.
  • I didn’t know about the statusService. Thus, I learned something. Thanks for this! :slight_smile:
  • Just for curiosity: Why do you have your own pow() method? Is it different from Math.pow()?
  • Regarding the parallelisation, some people work on that at the moment: There is some ImageJ2/SciJava magic on the way. You find example code here. Maybe, @maarzt can tell us if this is already part of the Fiji distribution?

I hope that helps!



Hi @haesleinhuepf,
Thank you for the compliment and the helpfull advise.

  • I updated the pom.xml and put my code in packages.
  • The custom pow() method was because I was hesitant to go to an external library for something so simple, but I will use the Math.pow() in the future.
  • Cursor indeed makes the loop very slim. Love it!

The new version of the plugin has the advantage of not duplicating the entire dataset in memory. Unfortunately it takes 2.5x as long. I think it is mostly spend moving randomAccess around.


I would try to forward this question to @tpietzsch - maybe he can tell us, what makes that code slower than the ImageJ1 counter part. I bet there is a way of speeding things up. :wink:


I hope @tpietzsch will have some time to look at it because indeed, l bet there is a way to get it faster. I duplicated imagedata to a java array the ImageJ1 way and with a cursor. The ImageJ1 way was about 3x faster. But duplicating all data is not something someone should do in general I expect. Working on the data directly should be the fastest method.

1 Like

@rharkes In your imglib copy example, you treat Z (dim[2]) as fastest-moving dimension:

That’s probably a mistake because the ImageJ1 version doesn’t do that?

Anyway, in this case, extracting the position out of the Cursor to compute an index is what is slow. Instead, you should just iterate in flat order and write to the output array consecutively. Something like this would be more idiomatic:

		final int pixels = (int)Intervals.numElements(img);
		final short data[] = new short[pixels];
		statusService.showStatus("loading data");
		int p = 0;
		for (UnsignedShortType s : Views.flatIterable(img)) {
			data[p++] = s.getShort();
			if ((p % 2000) == 0) {
				statusService.showProgress(p, pixels);
		statusService.showStatus(1, 1, "FINISHED");

(not tested.)


Some ways to improve the imglib2 version:
Do not create new RandomAccess for every pixel.

Do not do complicated index computations and position-setting for every pixel:

Instead, try to rewrite subtractMedian such that you pass in a RandomAccess (instead of i) that is already correctly positioned in x, y and only move it around in z. (as a first approximation to making it nicer, then maybe go on and try to highlight the 1D nature of the operation, e.g., abstract the dimension along which the 1D operation goes.)

Instead of randA1.move(1, 2) you can us randA1.fwd(2) which might be slightly faster.

I would split the output range between threads beforehand instead of competing for AtomicInteger every pixel. (But the same goes for the IJ1 version)

I would also not create new arrays tempres, hist for every pixel but try to reuse them.
This is the same as with creating RandomAccesses above.
Maybe easiest to make the subtractMedian into a functor object with reusable RA and arrays as members. If you want to go all-out imglibby, I would make this object implement Positionable so that you can just position it at x,y coordinates and then run() to do the z line from there, etc. (but this is not related to performance, just “nice”…)


Thank you @tpietzsch ! This makes the copyexample even a little (10%) faster than the old method. I will try to rewrite the subtractMedian to accept a RandomAccess that contains only the z component of the pixel. Should I use Views.offsetInterval to create the RandomAccess?

1 Like

First tried to implement the new copy method into the old ImageJ1 macro, but I missed the fact that the array that is created in the new method is ordered xyt instead of txy as in the old method. This slows down the algorithm (change from this to this) and makes the total runtime 50% longer.

1 Like

@tpietzsch: Could you maybe have a look at the new version. I wanted to go all-out imglibby ofcourse. :blush: So I think I implemented Positionable [1], and now have only one RandomAccess. However, without any parallel computing it is still very slow. Here something must be changed for sure. You suggested to split up the task between the threads of the processor?

@rharkes looks good! That is more or less what I had in mind.

Did you try to parallelize it the same way as before (like this https://github.com/rharkes/Temporal-Median-Background-Subtraction/blob/e1ea1184770798aeed3c667405f8c51399b961c1/src/main/java/nl/nkiavl/rharkes/TemporalMedian.java#L88-L106)?

Give every new Thread() it’s own subtractMedian (I would rename this class to SubtractMedian, it’s very uncommon to start a class name lower-case) and reuse it within the thread. How does this compare to before?

1 Like

@tpietzsch Thank you. I parallelized it. However, on my test-file (256x256x28746, uint16) with a window of 501 frames and an offset of 1000 it takes 2m40. The previous version takes 1m00.

1 Like

@rharkes Hmm, I don’t see any big saving opportunities immediately. I would expect some overhead, but more than factor 2 seems a bit too much. But I’m not sure… Could you provide an easy way to run the old vs new version from an IDE? Then I could play around a bit myself…

I tried, but found it difficult because I need both scijava 16.1 and 26, but must choose only one in my pom.xml

Could you give me instructions on how to set up a benchmark then? (Which branches of which project(s) and what to run?)

Hi @tpietzsch. I made two releases that with corresponding sourcecode and .jar files. A 2.2 version in the old environment and a 3.2 version using ImgLib2. I run them using a big file (256x256x26746 uint16) but you can generate any dataset that is about 30k frames in time with random numbers between 0 and for example 3000.
I run the mediansubtraction using a 501 window and 1000 offset.

@rharkes I finally had a chance to play with it for a little while.

The most important point first:
You have a bug (in all versions) in the median computation. The aux is initialized incorrectly.
It should be fixed like this:

I made a 256x256x26746 uint16 image with random values between 0 and 3000.
Then ran TemporalMedian with a 501 window and 3000 offset.

The 2.2 version takes 55 seconds on my computer.
(Spoiler: We’ll take that down to 27 seconds with the changes below.)

I started to imglib2-ify that. It’s neither cleaned up nor made into a plugin yet, but shows some ideas for encapsulating parts of the algorithm and where imglib2 stuff makes things easier.
This version https://github.com/tpietzsch/Temporal-Median-Background-Subtraction/blob/66956617ecf6eecc77e2a4c09767ed523d3face3/src/main/java/imglib/Playground.java
takes 35 seconds on my computer, so ~60% of original.
I verified that it produces the same result (after fixing the bug in the 2.2 version.

This helper class

computes a running median, using histogram code adapted from the old version and a ring buffer to keep track of history of added values (for removing oldest from histogram).
I think it is very helpful to encapsulate the median like this for keeping the main plugin cleaner.

In contrast to the 2.2 version, the input image is not copied to make a ranked version.
This is done on-the-fly by making a converted virtual View of the input image:

RankMap.build iterates once over image to build rank[]/unrank[] arrays.
ranked uses that to provide a virtual ranked view of the input image.

Because the median filter keeps track of history internally, it is also safe to just do the median subtraction in place on the input image.

What I would try if I had more time with this:

  1. How much does it impact performance when using int instead of short? If this is negligible, I would prefer that. I would be surprised if the current code wouldn’t break when the input actually goes over the (signed) short range. Using ints would be an easy fix.

  2. For parallelisation, try processing multiple adjacent pixels in batch. I suspect this is more cache-friendly. As a very preliminary experiment, this version https://github.com/tpietzsch/Temporal-Median-Background-Subtraction/blob/66956617ecf6eecc77e2a4c09767ed523d3face3/src/main/java/imglib/Playground2.java
    processes image lines in batch (takes runtime further down to 27 seconds, ~50% of original). But I wouldn’t do whole lines, this was just an experiment.

I hope that gives you some hints how to proceed…

  1. Make it applicable to other (integer) types (in particular UnsignedByteType)
  2. Make it applicable along other dimensions than Z. Deal with images with dimensionality != 3, e.g., it should be possible to apply it along Z, also if your image is XYCT or XYCZT.

@tpietzsch Wonderfull! Thank you for this. I hope to have some time this week to implement your suggestions. Already I’m very happy that it is indeed possible to speed it up.

The error was indeed present in the old version, when I made the ImgLib2 version I also found it when writing the test cases and fixed it in that version, but never updated the old version.
With respect to the short->int, I checked it a week ago and saw no loss or gain in performance when I changed it. At the beginning of this plugin (copy the whole image to java-primitive array) I couldn’t afford 2x more memory usage, but now that everything happens pixel-by-pixel it is no longer a problem.