Ops on large files



Dear forum,

Consider the following code that do pseudo flat field correction:

public <T extends RealType<T>> Dataset doPseudoFlatFieldCorrection(Dataset input, double gaussianFilterSize) {
		Dataset dataset = input.duplicate();

		// Get Gaussian filtered image and use it as a background
		status.showStatus("Apply Gaussian filter to estimate the background.");
		Dataset background = this.applyGaussianFilter(dataset, gaussianFilterSize);

		// Convert to 32 bits
		IterableInterval<FloatType> out = (IterableInterval<FloatType>) ops.run("convert.float32",
		IterableInterval<FloatType> original = (IterableInterval<FloatType>) ops.run("convert.float32",
		IterableInterval<FloatType> backgroundFloat = (IterableInterval<FloatType>) ops.run("convert.float32",

		// Do subtraction
		status.showStatus("Subtract image to background.");
		IterableInterval<FloatType> out2 = (IterableInterval<FloatType>) ops.create().img(out);
		ops.math().subtract(out2, original, backgroundFloat);

		// Clip intensities to input image type
		Img<T> out3 = (Img<T>) ops.create().img(dataset.getImgPlus());
		RealTypeConverter op2 = (RealTypeConverter) ops.op("convert.clip", dataset.getImgPlus().firstElement(),
		ops.convert().imageType(out3, out2, op2);

		Img<T> out4 = out3;
		if (normalizeIntensity) {
			out4 = (Img<T>) ops.create().img(out3);
			RealTypeConverter scaleOp = (RealTypeConverter) ops.op("convert.normalizeScale", out4.firstElement(),
			ops.convert().imageType(out4, out3, scaleOp);

		return matchRAIToDataset(out4, dataset);

While it works well for small to regular size images, it fails for big one (2.8 GB for example). Instead of buying more RAM I would prefer to optimize the code.

Currently, the code does the following:

  • Create a copy of the Dataset and apply a Gaussian filter on that copy.
  • Convert the input to 32bit.
  • Subtract the copy Gaussian filtered and the original input Dataset.
  • Clip the result image.
  • Optional: rescale intensities.

I was naively thinking that I could just iterate over all the frames one by one and do the Gaussian filter + subtraction individually for each frame to save some memory.

Would that make sense to you? Or do you see a better way to optimize it?


Hi @hadim

Depending on the modality of the data, and the nature of the artifact you are correcting for, doing your processing in 2D could make sense.

If you are trying to correct for 2D field variation than the 2D approach may actually work better.

Do you need to do the processing with floats?? Have you tried just using the original pixel format of your images (assuming it’s 8 or 16 bits, this could save some memory). Can you do some of the ops in place?? You may be able to avoid creating extra copies.


I don’t have any Z stacks only Time and Channel.

I am converting to 32 bit because of the subtraction but I guess I can try without (I also remember doing that for op matching type issues… but I might be wrong).

Thank you for the hints, I’ll try to take them into account.


Does Dataset.duplicate() or similar have the possibility to specify an ImgFactory?
In this case, you could just use a DiskCachedCellImgFactory (from imglib2-cache) and most probably it would just work without modifications on arbitrarily large stuff. If this is an option, there is a repository with examples of how to use imglib2-cache: https://github.com/imglib/imglib2-cache-examples


It looks very interesting @tpietzsch. I’ll try to test it and let you know.


Hi @hadim,

just a short thought rather on the methodological part than the coding-related one.
Potentially, you want to implement the pseudo-flat field correction exactly like you did it here (then just ignore my thought) but if you consider dividing your image with the background image created by the gaussian blur and finally multiply each pixel with the total mean value of the background, you might be able to skip the intensity clipping.

I am not sure if this saves some calculation intensive resources (especially in the light with the problem you had using the method on big images), but it also might reduce the chance of any value misassignement during the clipping process.

Anyway, it also needs at least one 32-bit image as the result of the division. Potentially, then you could keep the original image and the background in their original bit-depth and also save some memory and only retrieve the corrected image as a 32-bit one. I don’t know if the ops.math part would do this automatically that way(?)


Thanks for the input @biovoxxel but I don’t do any division after this processing. But it’s a good thing to know and I’ll keep it in mind for the future.


Dear @hadim,

sorry for being late to the party :wink:

You can do the subtraction inplace, to save the memory for at least one copy:

// ...
UnaryInplaceOp<? super IterableInterval<FloatType>, IterableInterval<FloatType>> subtract =
		Inplaces.unary(ops, Ops.Math.Subtract.class, original, backgroundFloat);
// ...

In addition, couldn’t you write the clipped and/or normalized image directly into the input Dataset without creating additional objects? That is, don’t create out3 until you want to normalize.



Hi @hadim,

In relation to this, I don’t know where your input image comes from, but if you’ve opened it with SCIFIO’s DatasetIOService you can control which kind of ImgFactory it uses by calling the open() method with the SCIFIOconfig parameter. As a side note, afaik SCIFIO should automatically open large images as cell images. The end user needs to have SCIFIO enabled from Options > ImageJ2 though.

Best regards,