Imglib2 Split image into chunks for multi-threaded processing

Hello all,

I’ve been looking to speed up my plugin using the multi-threaded processing tools in imglib2. I noticed that the Erosion class uses the method SimpleMultiThreading.divideIntoChunks(), which I’m assuming is meant to efficiently divide your image into chunks for you, but this method is listed as deprecated and hasn’t been updated in some time. The new multithreading TaskExecutor class has an example containing the line:

List< RandomAccessibleInterval< IntType > > chunks = splitImageIntoChunks( image, numTasks );

However, I can find no method splitImageIntoChunks() (or similar) in the library. Is this something that we have to implement ourselves now? It’s fine if that’s the case; I just don’t want to reinvent the wheel if it’s already been done.

Andrew

The class SimpleMultiThreading is deprecated.

You should be able to achieve multithreaded processing on chunks using LoopBuilder:

LoopBuilder.setImages(yourImage).multiThreaded().forEachPixel(t -> doSomethingWith(t));

There’s also forEachChunk(), but I don’t have any example code for its usage. Maybe others (@maarzt, @tpietzsch) can help here.

Thanks, I’ll start looking at LoopBuilder. I think I would have to use forEachChunk() for my purposes.

Hi @amccall & @imagejan,

I wrote LoopBuilder to make it simple to write pixelwise operations on imglib2 images. If that’s what you want to do, then LoopBuilder also conveniently supports multi-threading. I have started writing an introduction. I’m not yet sure where to put it, so for now I will post it here:

LoopBuilder

LoopBuilder provides a simple way to apply a pixelwise operation to one or more images. Let’s assume we want to calculate the pixelwise sum of two images A and B and store the result in an image S. Using imglib2 cursors this would look like:

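// Required imports:
import net.imglib2.Cursor;
import net.imglib2.RandomAccessibleInterval;
import net.imglib2.img.array.ArrayImgs;
import net.imglib2.type.numeric.real.FloatType;
import net.imglib2.view.Views;

// Input and output images must be given as RandomAccessibleInterval<...>
RandomAccessibleInterval<FloatType> imageA = ...
RandomAccessibleInterval<FloatType> imageB = ...
RandomAccessibleInterval<FloatType> imageS = ArrayImgs.floats(...);

// calculate sum:
// flatIterable gives all three images the same iteration order,
// so the three cursors stay aligned pixel by pixel
Cursor<FloatType> cursorA = Views.flatIterable(imageA).cursor();
Cursor<FloatType> cursorB = Views.flatIterable(imageB).cursor();
Cursor<FloatType> cursorS = Views.flatIterable(imageS).cursor();
while (cursorA.hasNext()) {
    FloatType pixelA = cursorA.next();
    FloatType pixelB = cursorB.next();
    FloatType pixelS = cursorS.next();
    pixelS.set(pixelA.getRealFloat() + pixelB.getRealFloat());
}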

With LoopBuilder this operation can be written in a few lines:

// calculate sum:
LoopBuilder.setImages(imageA, imageB, imageS).forEachPixel(
    (pixelA, pixelB, pixelS) -> {
        pixelS.set(pixelA.getRealFloat() + pixelB.getRealFloat());
    }
);

That’s simpler and often even faster than the loop using cursors. LoopBuilder will use either random accesses or cursors, depending on what promises to be faster for the given images. Additionally, several measures are taken to make the loop more amenable to just-in-time compilation.

LoopBuilder can do more than just adding two images. If you want to calculate imageR = imageA * imageB - imageC * imageD, here you go:

LoopBuilder.setImages(imageR, imageA, imageB, imageC, imageD).forEachPixel(
    (r, a, b, c, d) -> {
        r.setReal(a.getRealFloat() * b.getRealFloat() -
                c.getRealFloat() * d.getRealFloat());
    }
);

Multi-Threading

If your images are big, then multi-threading might speed up the operation. And it’s super easy if you use LoopBuilder. Let’s calculate the sum using multiple threads. The only thing you need to do is write .multiThreaded() before the call to forEachPixel(...).

// calculate sum, using multiple threads:
LoopBuilder.setImages(imageA, imageB, imageS).multiThreaded().forEachPixel(
                                              ///////////////
    (pixelA, pixelB, pixelS) -> {
        pixelS.set(pixelA.getRealFloat() + pixelB.getRealFloat());
    }
);

On a CPU with four cores, this should run up to about four times faster. (Your image needs to be big enough.) Internally, LoopBuilder does the following: it splits the images into chunks, and the chunks are then distributed to a pool of threads. But be careful, this only works reliably if your operation (in this case “pixelS = pixelA + pixelB”) is thread safe.
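To illustrate the "split into chunks, hand the chunks to a thread pool" idea without imglib2, here is a minimal plain-Java sketch on float arrays. It is only an analogy and not LoopBuilder's actual implementation; the class name ChunkedAddSketch and the way the index range is divided are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Plain-Java analogy only; NOT how LoopBuilder is implemented internally. */
final class ChunkedAddSketch {

    /** Writes a[i] + b[i] into s[i], processing the index range in numChunks chunks. */
    static void add(float[] a, float[] b, float[] s, int numChunks) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Math.max(1, Runtime.getRuntime().availableProcessors()));
        try {
            List<Future<?>> futures = new ArrayList<>();
            int n = s.length;
            for (int c = 0; c < numChunks; c++) {
                // Chunk c covers the half-open index range [start, end).
                final int start = (int) ((long) n * c / numChunks);
                final int end = (int) ((long) n * (c + 1) / numChunks);
                Runnable task = () -> {
                    for (int i = start; i < end; i++) {
                        s[i] = a[i] + b[i]; // every index is written by exactly one task
                    }
                };
                futures.add(pool.submit(task));
            }
            for (Future<?> f : futures) {
                f.get(); // wait until every chunk has been processed
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {10f, 20f, 30f, 40f, 50f};
        float[] s = new float[a.length];
        add(a, b, s, 3);
        System.out.println(java.util.Arrays.toString(s));
    }
}
```

The operation is thread safe here because the chunks do not overlap: no two tasks ever write to the same output element.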

Let’s take a look at an example where the operation is not thread safe. Let’s calculate the sum of the squares of all the pixels of an image.

FloatType squaredSum = new FloatType(0);
LoopBuilder.setImages(image).forEachPixel(
   pixel -> {
      // every pixel updates the same shared variable
      squaredSum.setReal(squaredSum.getRealFloat() + Math.pow(pixel.getRealFloat(), 2));
   }
);

Just adding .multiThreaded() here can give a wrong result, because the operation updates a single shared variable, squaredSum. Multiple threads updating one variable without coordination is bad: updates from different threads overwrite each other. You will get wrong results. Wrong results are bad. Bad is not good. The solution is to use one squaredSum variable per chunk. (Each chunk is always processed by only one thread.) This leaves us with a list of squared sums, one individual sum per chunk. To get the total squared sum over the complete image, we need to add up all the individual squared sums.

// For each chunk calculate an individual squared sum.
List<FloatType> listOfSquaredSums = LoopBuilder.setImages(image).multiThreaded().forEachChunk( chunk -> {
   FloatType squaredSum = new FloatType(0); // one accumulator per chunk
   chunk.forEachPixel( pixel -> {
      squaredSum.setReal(squaredSum.getRealFloat() + Math.pow(pixel.getRealFloat(), 2));
   } );
   return squaredSum;
});

// Calculate the sum over all the individual squared sums.
FloatType totalSquaredSum = new FloatType(0);
for(FloatType squaredSum : listOfSquaredSums) {
   totalSquaredSum.add(squaredSum);
}
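
The same "one accumulator per chunk, then combine" pattern can be sketched in plain Java, which may help to see why it is thread safe. This is an analogy on a float array under my own assumptions (class name ChunkedSquaredSumSketch, chunking by index range), not imglib2 code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Plain-Java analogy of the per-chunk accumulation; not imglib2 code. */
final class ChunkedSquaredSumSketch {

    /** Returns the sum of pixels[i]^2, computed as one partial sum per chunk. */
    static double squaredSum(float[] pixels, int numChunks) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Math.max(1, Runtime.getRuntime().availableProcessors()));
        try {
            List<Future<Double>> partials = new ArrayList<>();
            int n = pixels.length;
            for (int c = 0; c < numChunks; c++) {
                final int start = (int) ((long) n * c / numChunks);
                final int end = (int) ((long) n * (c + 1) / numChunks);
                Callable<Double> task = () -> {
                    double sum = 0; // local to this chunk, never shared between threads
                    for (int i = start; i < end; i++) {
                        sum += (double) pixels[i] * pixels[i];
                    }
                    return sum;
                };
                partials.add(pool.submit(task));
            }
            // Combine the partial results in the calling thread only.
            double total = 0;
            for (Future<Double> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(squaredSum(new float[] {1f, 2f, 3f, 4f}, 3));
    }
}
```

No lock is needed anywhere: each task only touches its own local variable, and the final addition happens in a single thread after all tasks have finished.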

Thanks @maarzt for these illustrative examples!

Just for my understanding: would it also suffice to use synchronized for the common variable here?

Something like this:

FloatType sum = new FloatType(0);
synchronized (sum) {
    LoopBuilder.setImages(image).multiThreaded().forEachPixel(
        pixel -> {
            sum.add( pixel );
        }
    );
}

(Of course this would only make sense if there are more costly computations inside the loop…)

@maarzt I just started with these LoopBuilder examples in the ImageJ tutorials; I didn’t see this post before. What do you think about adding your notes and more examples, including the synchronized example, to this PR?

Thank you @maarzt and @imagejan, this was very helpful and should help speed up my plugin significantly.

Hello @imagejan,

Thank you for pointing out synchronized. But the synchronized block in the example was wrongly placed. It only synchronizes the call to LoopBuilder. LoopBuilder will still internally start multiple threads and distribute the operation among them. Those threads are not affected by the synchronized statement, so you still get the wrong result.

The synchronized block needs to go inside the operation to take effect. This means that the lock is now acquired once per pixel, which makes this usually fast operation very slow.

FloatType sum = new FloatType(0);
LoopBuilder.setImages(image).multiThreaded().forEachPixel(
    pixel -> {
        synchronized (sum) { // works but VERY SLOW
            sum.add( pixel );
        }
    }
);

Here is a detailed example & benchmark: https://gist.github.com/maarzt/eddc7901572e2b17baa5b71547f439cf
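
Independent of the gist, the effect of the lock placement can be reproduced in plain Java threads. The sketch below is an illustration under my own assumptions (the class SynchronizedPlacementSketch and its method are made up for this post and are not part of imglib2): the lock is acquired inside the work that each thread performs, so every update of the shared counter is protected.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of where the synchronized block has to go. */
final class SynchronizedPlacementSketch {

    /** Each of numThreads threads adds 1 to a shared counter perThread times. */
    static long countWithLockInsideOperation(int numThreads, int perThread) {
        final long[] sum = {0};          // shared between all threads
        final Object lock = new Object();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            Thread thread = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // The lock is taken by the worker thread itself, once per update.
                    // A synchronized block around the code that STARTS the threads
                    // would only be held by the calling thread and would not
                    // protect this line.
                    synchronized (lock) {
                        sum[0]++;
                    }
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // after join, the thread's updates are visible here
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithLockInsideOperation(4, 10_000));
    }
}
```

With the lock inside the loop the count is always numThreads * perThread, but every single increment pays for acquiring the lock, which is why this gets slow when the protected operation itself is cheap.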

Thanks for the correction. Cool that you also created a benchmark!

As sum.add is called in a synchronized way inside the forEachPixel loop, I’d expect very poor performance, since it is effectively single-threaded processing now, plus the overhead for synchronization.

If you replaced the simple pixel access inside the add() call with some heavy computation, it might still be worthwhile, no? Or is the approach with forEachChunk and subsequent list processing always to be preferred?

@imagejan That’s a very good question!
When using multi-threaded LoopBuilder, should one use forEachPixel or forEachChunk?

forEachPixel is simpler, and therefore usually the better choice. You can use it if you have

  • a fast thread-safe operation.
  • a heavy per pixel computation, that has a small part that needs to be synchronized.

ALWAYS make sure to MEASURE YOUR EXECUTION TIME, to see if you actually get faster using multi-threading.
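
A very rough way to measure is sketched below in plain Java. The helper TimingSketch.bestMillis is a made-up name for this post, and the numbers of warm-up and measured runs are arbitrary assumptions; for serious measurements a benchmarking harness such as JMH is the more reliable choice.

```java
/** Minimal timing helper; a rough sketch, not a replacement for a real benchmark. */
final class TimingSketch {

    /**
     * Runs the task warmups times without measuring (to give the JIT compiler
     * a chance to optimize it), then returns the best wall-clock time in
     * milliseconds out of runs measured executions. Expects runs >= 1.
     */
    static double bestMillis(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1e6;
    }

    public static void main(String[] args) {
        double[] data = new double[1_000_000];
        // Stand-in workload; replace it with the single-threaded and the
        // multi-threaded version of your own loop and compare the two times.
        Runnable work = () -> {
            for (int i = 0; i < data.length; i++) {
                data[i] = Math.sqrt(i);
            }
        };
        System.out.println("best of 5 runs: " + bestMillis(work, 3, 5) + " ms");
    }
}
```

Measure the single-threaded and the multi-threaded variant on the same image, and only keep the multi-threaded one if it is actually faster.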

forEachChunk, on the other hand, is more complicated. You should use it if you have:

  • a fast per pixel operation that needs some resources. A resource is, for example, a temporary variable that would otherwise be created for each pixel. Here is a small example: swap the content of two FloatType images:
LoopBuilder.setImages(imageA, imageB).multiThreaded().forEachChunk( chunk -> {
    FloatType tmp = new FloatType(); // create the required resource, once per chunk
    chunk.forEachPixel((a, b) -> {
        tmp.set(a);   // operation using the resource
        a.set(b);
        b.set(tmp);
    });
    // Optional: free your resource if needed.
    return null; // Optional: return some results.
});
  • Another use case is to quickly accumulate a result over all pixels, as shown in the comment above.