How to write compressed output images using RegisterVirtualStackSlices?

Hi,

Is anyone aware of a way to change the compression setting of the output files produced by the RegisterVirtualStackSlices plugin? I’m running the plugin via a Jython script (through the ImageJ-MATLAB interface, although that shouldn’t be relevant for the problem at hand). The uncompressed TIFF output images take up a lot of disk space, so I was hoping to use some form of lossless compression (e.g. PackBits or LZW).
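For reference, I’m driving the plugin roughly like this (an abridged sketch with placeholder paths and parameter values, following the scripting example in the plugin documentation):

```python
# Jython, run inside Fiji (through ImageJ-MATLAB in my case)
from register_virtual_stack import Register_Virtual_Stack_MT

source_dir = "/path/to/source/"       # placeholder paths
target_dir = "/path/to/registered/"
transf_dir = "/path/to/transforms/"
reference_name = "section_001.tif"

p = Register_Virtual_Stack_MT.Param()
p.sift.maxOctaveSize = 1024   # the "maximum image size" in the dialog
p.minInlierRatio = 0.05       # the "inlier ratio"
# 0=TRANSLATION, 1=RIGID, 2=SIMILARITY, 3=AFFINE, 4=ELASTIC, 5=MOVING_LEAST_SQUARES
p.registrationModelIndex = 3
# 0=TRANSLATION, 1=RIGID, 2=SIMILARITY, 3=AFFINE
p.featuresModelIndex = 3

use_shrinking_constraint = False
# "exec" is a static method of the plugin class; recent Jython versions accept
# it directly (use getattr(Register_Virtual_Stack_MT, "exec") if yours doesn't)
Register_Virtual_Stack_MT.exec(source_dir, target_dir, transf_dir,
                               reference_name, p, use_shrinking_constraint)
```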

Cheers,
Kimmo

Hello @Kimmo_Kartasalo,

I’m afraid the plugin does not have an option for compressed output files right now. You will have to convert the files yourself with a macro or script.
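For example, something along these lines should do it (an untested Jython sketch; it relies on the Bio-Formats Exporter bundled with Fiji, and the paths are placeholders):

```python
# Jython sketch: re-save the plugin's uncompressed TIFFs with LZW compression
# using the Bio-Formats Exporter that ships with Fiji
import os
from ij import IJ

src = "/path/to/registered/"   # placeholder paths
dst = "/path/to/compressed/"

for name in sorted(os.listdir(src)):
    if not name.lower().endswith((".tif", ".tiff")):
        continue
    imp = IJ.openImage(os.path.join(src, name))
    if imp is None:
        continue
    # Bio-Formats Exporter supports lossless LZW (and zlib) compression
    IJ.run(imp, "Bio-Formats Exporter",
           "save=[" + os.path.join(dst, name) + "] compression=LZW")
    imp.close()
```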

ignacio

Thanks @iarganda! It’s not a big deal for me; I just thought I’d ask to make sure I hadn’t missed anything in the API.

I’m actually doing some parameter sweeping with the plugin (running registrations -> computing quality metrics with some external code -> deleting the registered images -> changing plugin parameters -> repeating the whole thing), and it seems that the disk I/O with the uncompressed images is a bit of a bottleneck. Compressing the files afterwards with a macro or script of course wouldn’t really help in this scenario, since I’d have to read the uncompressed files into memory anyway. But as I said, I can live with that, no worries.

Kimmo

OK! I could add an option for compressed file writing at some point, but that won’t help if you need to load the images into memory, of course.

Hi again,

I guess it would actually help, since the bottleneck in my case seems to be hard drive I/O rather than a shortage of RAM or processing power. So, if I’m not mistaken, it would be helpful to be able to compress the images on the fly and write them to disk in a compressed format to begin with, even though this requires extra computation for compression and decompression. With the memory part I meant that I wouldn’t get any benefit from a workflow like: write uncompressed TIFFs to disk -> read uncompressed TIFFs into memory -> compress -> write compressed TIFFs to disk, which is what converting the plugin’s output files with an external macro or script would amount to. That would still include the bottleneck of writing/reading the uncompressed data to/from disk. But I’m fine with the uncompressed files for now, so no pressure!

Cheers,
Kimmo

Actually, after looking into this a bit more, I’m starting to doubt that disk access is the bottleneck in my analysis. I’m wondering if something is going wrong with the parallelization instead.

I’m currently running RegisterVirtualStackSlices in Fiji (ImageJ 1.51) via the ImageJ-MATLAB interface (v0.7.1) on MATLAB R2016b. I’m using an entire node of a cluster (2 x 12-core Intel Xeon E5-2680 v3 with hyperthreading disabled, 128 GB of memory). Until now, I had been running the exact same thing on a desktop workstation (1 x 8-core Intel Xeon E5-1660 v3 with hyperthreading enabled, 64 GB of memory). I’m not using headless mode due to some compatibility issues in the ImageJ-MATLAB interface, so I’m using Xming to access the GUI on the cluster from the Windows 7 desktop machine. Fiji is set to 24 parallel threads on the cluster, as shown under Options > Memory & Threads, and to 16 threads on the desktop (due to the hyperthreading).

Much to my surprise, the runtime is almost identical on the desktop and on the cluster node, even though the thread counts are 16 vs. 24 and the physical core counts are 8 vs. 24. RAM is not an issue on either machine; less than 50 % is in use. At first I thought disk access on the cluster was the bottleneck, since the overall CPU load is below 10 %, with 1-2 cores at 100 % and the others idling. However, I tried using a RAM-resident virtual disk at /dev/shm instead of the hard disk, and it had no effect. I have now noticed that the CPU load on the desktop is similar to that on the cluster node: a bit over 10 %, with 1-2 cores close to 100 % and the others pretty much idle. Moreover, my TIFF images are not that big (around 900x1400 RGB at 1-2 MB, or 1800x2800 at 5-6 MB, with 260 sections/images), so I doubt the data transfer could be the bottleneck even with the RAM-resident virtual disk. In addition, MATLAB’s image registration functions use around 100 % CPU with the same images on both machines, so it can’t be the disk, right?

So, is there some kind of inherent limitation in the parallelization of RegisterVirtualStackSlices, or in the underlying SIFT feature extraction plugin or bUnwarpJ, that prevents the cluster node from taking advantage of the larger number of cores (and even the desktop from using all 8 cores)? Or should I somehow specify the number of cores for the plugin, apart from the general Fiji “parallel threads” setting? Since this is pairwise image registration, where each pair of consecutive sections could be registered separately in parallel, shouldn’t it scale well with more CPUs?
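For reference, the only thread-related setting I have touched is the global one; as far as I can tell, Prefs.getThreads/Prefs.setThreads are the scripted equivalent of the Memory & Threads dialog:

```python
# Jython: read and set ImageJ's global "parallel threads" setting
from ij import Prefs

print("ImageJ parallel threads: %d" % Prefs.getThreads())
Prefs.setThreads(24)  # same effect as changing Options > Memory & Threads
```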

Many thanks for any ideas,
Kimmo

It all depends on the options you selected. What kind of registration are you using? The execution is at its most multi-threaded when you select the “Shrinkage constraint” option, because no reference image is needed then.

Okay, I think I see where you’re heading with that. I’ve been using both the basic affine, feature-only registration and the bUnwarpJ-based elastic registration. I haven’t used the shrinkage constraint, because it’s not implemented for the elastic model (at least according to the documentation at http://imagej.net/Register_Virtual_Stack_Slices), and for the feature-based registration I just needed an implementation of the “classical” SIFT + robust model fitting algorithm as a baseline method, and thus did not use the extra constraint.

My initial assumption was that the registration would have been implemented such that each pair of images is registered separately (in parallel), followed by composition of the resulting pairwise transformations (in serial) and application of each transformation to the corresponding image (in parallel). Based on your comment and the way RVSS outputs the registered images during the run, I guess the actual implementation is like this: register image i+1 with image i (i being the reference), apply the resulting transformation to image i+1, register image i+2 to the transformed image i+1, and so on. The whole thing then has to run serially, unless the stack is split into smaller substacks, or at least into two parallel runs going from the reference towards the two ends of the stack. On the other hand, there is no need to compose the transformations, which can be expensive and has to be done serially. I would think that, at least when a large number of CPUs is available, doing the registrations in parallel would outweigh the cost of composing the transformations, but of course that also depends on image size etc.

How does the shrinkage constraint actually work? Can this option be used just to enable the more multi-threaded computation approach outlined above for the affine registration, without really applying any actual regularization, e.g. by setting all three weights (shear, scaling, isotropy) to 1? For the elastic case, I guess there’s nothing I can do to improve the situation?
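To make the question concrete, I mean something like the following (an untested sketch, reusing the scripting-style call from my first post):

```python
# Sketch: same call as before, but with the shrinkage constraint enabled;
# the reference image should then be irrelevant, if I understood correctly
from register_virtual_stack import Register_Virtual_Stack_MT

p = Register_Virtual_Stack_MT.Param()
p.featuresModelIndex = 3      # 3 = AFFINE feature model
p.registrationModelIndex = 3  # 3 = AFFINE registration model

use_shrinking_constraint = True
Register_Virtual_Stack_MT.exec("/path/to/source/", "/path/to/registered/",
                               "/path/to/transforms/", "section_001.tif",
                               p, use_shrinking_constraint)
```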

[quote=“Kimmo_Kartasalo, post:8, topic:4011”]
it’s not implemented for the elastic model (at least according to the documentation)
[/quote]

That’s correct.

[quote=“Kimmo_Kartasalo, post:8, topic:4011”]
My initial assumption was that the registration would have been implemented such that each pair of images is registered separately (in parallel)
[/quote]

That is only true if you select the “Shrinkage constraint” option. Otherwise the registration has to be done from a reference image, and therefore the process is sequential.

[quote=“Kimmo_Kartasalo, post:8, topic:4011”]
Based on your comment and the way RVSS outputs the registered images during the run, I guess the actual implementation is like this: register image i+1 with image i (i being the reference), apply the resulting transformation to image i+1, register image i+2 to the transformed image i+1, and so on. The whole thing then has to run serially, unless the stack is split into smaller substacks, or at least into two parallel runs going from the reference towards the two ends of the stack.
[/quote]

Exactly, that’s how it is done.

[quote=“Kimmo_Kartasalo, post:8, topic:4011”]
How does the shrinkage constraint actually work? Can this option be used just to enable the more multi-threaded computation approach outlined above for the affine registration, without really applying any actual regularization, e.g. by setting all three weights (shear, scaling, isotropy) to 1?
[/quote]

When selecting the “shrinkage constraint” option, this is what happens:

  1. All the SIFT correspondences are extracted and matched in parallel for each pair of consecutive images.
  2. All inliers are registered rigidly (or by translation, if the user selected it) into a common space (sequential, but fast).
  3. The whole system of inliers is relaxed based on the desired transform: the relaxation iteratively takes the inliers of a given slice n and fits them to the inliers of slices n-1 and n+1 (also sequential, but very fast); see the toy sketch below.
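
In case it helps the intuition, here is a toy, translation-only caricature of that relaxation loop (this is not the plugin’s actual code, which fits the user-selected transform model to the matched inliers):

```python
# Toy, translation-only illustration of the relaxation in step 3
# (NOT the plugin's actual code)

def centroid(points):
    n = float(len(points))
    return (sum(x for x, y in points) / n, sum(y for x, y in points) / n)

def relax(slices, iterations=100):
    """slices[n] is the list of (x, y) inlier coordinates of section n."""
    for _ in range(iterations):
        for n in range(1, len(slices) - 1):
            cx, cy = centroid(slices[n])
            # target: the centroid of the neighbouring sections' inliers
            tx, ty = centroid(slices[n - 1] + slices[n + 1])
            dx, dy = tx - cx, ty - cy
            # nudge section n towards its neighbours
            slices[n] = [(x + dx, y + dy) for x, y in slices[n]]
    return slices
```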

Here is the method where this process is implemented.

If you really need elastic registration, then you can use the Elastic Alignment and Montage plugin. It is much more reliable than bUnwarpJ for large sequences of images.

Thanks for your time and the comprehensive answer! That makes it quite clear; no need for me to worry about hidden issues with the computation cluster or anything like that.
