Big Data stitching

Hi All,

I have a bit of an issue stitching lots of volumes together. I have tried the Fiji Stitching plugin and BigStitcher, but my files are just too big! As an example, say I have five 3D volumes, each about 50 GB. They are laid out in a column: x1, x2, x3, x4 and x5.

x1 overlaps x2 by 20%
x2 overlaps x3 by 20%
x3 overlaps x4 by 20%
and x4 overlaps x5 by 20%.

I can see that scikit-image allows for feature detection, but I can't really see any library for image stitching with overlap. A 2D example would be stitching multiple images into a panorama. The issue then becomes memory: I definitely don't have the memory to hold these volumes. So is there a way to do this with memmap in Python, with scikit-image, or with Fiji? Or are the files beyond the reach of the current libraries I can find? Is there a way to compute the homography on just the overlapping chunks and then blend the images into a larger array or volume?
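On the memmap question: `numpy.memmap` can open a headerless raw volume so that only the slices you index are read from disk, which is enough to pull out just the overlapping regions. A minimal sketch, assuming each tile is a single raw file stored in (z, y, x) order with a known shape and dtype, and that neighbouring tiles are offset along x with a nominal 20% overlap; `open_raw` and `overlap_slabs` are hypothetical helpers, not part of any library.

```python
import os

import numpy as np


def open_raw(path, shape, dtype):
    """Open a headerless raw volume (z, y, x) as a read-only memory map.

    Nothing is loaded up front; only the bytes you index are read from disk.
    """
    dtype = np.dtype(dtype)
    expected = int(np.prod(shape)) * dtype.itemsize
    actual = os.path.getsize(path)
    if actual != expected:
        # A wrong shape or dtype is the usual failure mode with raw files.
        raise ValueError(f"{path}: {actual} bytes on disk, expected {expected}")
    return np.memmap(path, dtype=dtype, mode="r", shape=tuple(shape))


def overlap_slabs(left, right, fraction=0.2, axis=2):
    """Views of the nominal overlap between two neighbouring tiles.

    Assumes `right` follows `left` along `axis`, so the overlap is the last
    `fraction` of `left` and the first `fraction` of `right`.
    """
    n = int(round(left.shape[axis] * fraction))
    left_idx = [slice(None)] * left.ndim
    right_idx = [slice(None)] * right.ndim
    left_idx[axis] = slice(left.shape[axis] - n, None)
    right_idx[axis] = slice(0, n)
    return left[tuple(left_idx)], right[tuple(right_idx)]
```

For example, `overlap_slabs(open_raw("x1.raw", shape, np.uint16), open_raw("x2.raw", shape, np.uint16))` returns two views of the shared region without loading the rest of either file; the file names, shape and dtype here are placeholders. Those views are still large, so they would typically be downsampled or cropped further before registration.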

I have been working on processing and then segmentation on smaller ROIs, but this registration problem has me in a bind. Any thoughts?

Shaz

I have looked at the docs for a few libraries, but most of them cover registering images onto each other for multimodal data.

Have you tried TeraStitcher? Maybe that can help.

Hi @Sh4zKh4n,

What happened exactly when you tried BigStitcher?
I’ve seen it successfully used for sets of images that are larger than 5 x 50GB images.

John

Are your images 3D volumes or 2D panoramas?

Hi all, thanks for the quick replies. Sorry about the delay; I was flying back home from visiting family for Christmas, so I've been a bit distracted with jet lag!

@mendel

I came across this and am looking at it as a possibility. I think I will have to convert my raw files to HDF5 for TeraStitcher.

@bogovicj
In BigStitcher, I used the file names for the tile positions. If I downsized the files significantly and used the virtual file option, the stitcher ran but failed to do any registration or stitching. When I tried slightly less downsizing, it would begin by opening (virtually) the first two files but would then get stuck on the third file every time. It would hold memory but not do anything, maxing out my RAM (12 GB available to Fiji). With a massive amount of downsizing, the RAM wasn't particularly taxed. You have made me think, though: the downsized run should have worked. I assumed (from the file naming) that I have 5 columns with the positions (x1, y1), (x2, y1), (x3, y1), (x4, y1), (x5, y1).

The downsized run should at least have registered and stitched (a bad version), so its failure may suggest that the columns are wrong. I will try it as rows instead, with the positions (x1, y1), (x1, y2), (x1, y3), (x1, y4), (x1, y5). This will give a rough answer to whether the orientation is correct.
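To test the columns-versus-rows hypothesis without renaming files, one option is to write tile configurations with approximate positions for both layouts. A sketch, assuming the Fiji Grid/Collection stitching `TileConfiguration.txt` layout (a `dim = 3` line, then one `file; ; (x, y, z)` line per tile, positions in pixels) and that your stitcher can import such a file; the function name, file names and tile size are illustrative, so compare the output against a configuration file exported by your own Fiji install before relying on it.

```python
def tile_configuration(names, tile_size, overlap=0.2, axis=0):
    """Return TileConfiguration text for tiles laid out along one axis.

    tile_size: (x, y, z) tile extent in pixels.
    axis: 0 places the tiles along x (columns), 1 along y (rows).
    Neighbouring tiles share `overlap` of their extent along that axis.
    """
    step = tile_size[axis] * (1.0 - overlap)
    lines = [
        "# Define the number of dimensions we are working on",
        "dim = 3",
        "",
        "# Define the image coordinates",
    ]
    for i, name in enumerate(names):
        pos = [0.0, 0.0, 0.0]
        pos[axis] = i * step
        lines.append(f"{name}; ; ({pos[0]:.1f}, {pos[1]:.1f}, {pos[2]:.1f})")
    return "\n".join(lines) + "\n"


# Hypothetical usage (placeholder names and sizes): write both layouts
# and try each one in the stitcher.
# names = [f"x{i}.tif" for i in range(1, 6)]
# with open("TileConfiguration_columns.txt", "w") as f:
#     f.write(tile_configuration(names, (2050, 2050, 2000), axis=0))
# with open("TileConfiguration_rows.txt", "w") as f:
#     f.write(tile_configuration(names, (2050, 2050, 2000), axis=1))
```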

@axtimwalde
My images are 3D volumes from XCT. The original file format is raw, so if I work in Python, I tend to convert to HDF5 or Zarr. For the stitching, I have been converting the raw files to TIFF using Fiji. I wonder if the conversion process could be an issue. The #bigstitcher plugin doesn't read raw files, and the docs advise converting to TIFF.

I wonder if this is something the #python image processing community should look into? #itk appears to have the capability, but the docs are missing #python examples/use cases. #elastix seems aimed more at registration, not at registration plus stitching into something larger than the original fixed image. #scikit-image has a paper where they show feature-based registration and then use enblend to stitch the two images together. I have seen some notebook examples of 2D panorama stitching with #scikit-image, but not for 3D volumes. For my use case, this also needs to spill onto disk. It would be interesting to hear some thoughts on that for the future…

Oh, but I also need to work out how I can stitch this now, and we can discuss future cases…

@Sh4zKh4n

This is something I am interested in working on for scikit-image, but I can tell you that we are a very long way from this (we don’t even have a concept of “spilling to disk” right now), so I suggest that you focus your efforts on working with existing FOSS such as BigStitcher for now.

Hi, you do not need to convert to HDF5 to use TeraStitcher; it can process common image formats.


@jni

Completely understand. I was wondering if this might be developed in the future. Glad it's on the radar, even if it's a long way out. Thanks for the input.

@mendel

My images aren't in any of the formats TeraStitcher takes as standard input. They are in .raw format: no header, just an array of numbers. According to the documentation and the software, there appear to be 5 supported input formats. I'm on my phone right now; when I get back to the laptop, I'll send the list. I hope that makes sense?

You can try converting your image to other formats, such as TIFF or JPG.

@mendel I tried converting the volumes to TIFF. It didn't work, which is why I thought of HDF5 instead.

You can try converting to the IMS format, which is also HDF5 but with a particular structure.
Can you load a sub-portion of the image matrix into memory using Python or MATLAB? If so, you can write it into an HDF5 file chunk by chunk. Without compression, this would be quite fast.

That way you can also open it in Imaris.
Or you can open it in Imaris and save it as an IMS or TIFF file.
You can also try Imaris File Converter, which is free.
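The chunk-by-chunk HDF5 conversion suggested above can be sketched with `numpy.memmap` and `h5py`, assuming a headerless raw file in (z, y, x) order with a known shape and dtype. This writes a plain HDF5 dataset (the name `volume` and the helper name are arbitrary); it is not the Imaris .ims layout or the BigDataViewer HDF5/XML layout, both of which expect their own group structure.

```python
import h5py
import numpy as np


def raw_to_hdf5(raw_path, h5_path, shape, dtype, slab=64, dataset="volume"):
    """Copy a headerless (z, y, x) raw volume into an HDF5 dataset slab by slab.

    At most `slab` z-slices are held in memory at any time.
    """
    src = np.memmap(raw_path, dtype=dtype, mode="r", shape=tuple(shape))
    with h5py.File(h5_path, "w") as f:
        # Chunked but uncompressed storage: writing stays fast, and later
        # readers can pull out sub-volumes without reading the whole file.
        dst = f.create_dataset(
            dataset,
            shape=tuple(shape),
            dtype=src.dtype,
            chunks=(min(slab, shape[0]), min(128, shape[1]), min(128, shape[2])),
        )
        for z0 in range(0, shape[0], slab):
            z1 = min(z0 + slab, shape[0])
            dst[z0:z1] = src[z0:z1]  # only this slab is read from the raw file
    return h5_path
```

The `slab` size trades memory for fewer write calls; with 2050 × 2050 16-bit slices, 64 slices is roughly half a gigabyte.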

There are two issues to address: registration and rendering the stitch. For the first, you may well be able to break it down into a simpler problem: registering two volumes, since you know the general layout of your images.

Registering two images would be simplest if you had features to register, because then memory would become a non-issue. If not, you may still be able to use downscaling to get an initial fit, and then use sub-volumes to more finely adjust that registration.
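One way to read the downscale-then-refine idea for a pair of volumes, sketched with phase correlation (a technique chosen here for illustration; the post does not name one). It assumes the two inputs are the overlapping regions of neighbouring tiles, already cut to the same shape, and that the misalignment is a pure translation. Plain decimation (`[::factor]`) is used for brevity; smoothing or block-averaging first would be more robust to aliasing. All function names are hypothetical.

```python
import numpy as np


def phase_correlation(fixed, moving):
    """Integer shift s such that fixed ~= np.roll(moving, s) (phase correlation)."""
    cross = np.fft.fftn(fixed) * np.conj(np.fft.fftn(moving))
    cross /= np.abs(cross) + 1e-12  # keep only the phase
    corr = np.fft.ifftn(cross).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    shape = np.array(corr.shape)
    wrap = peak > shape // 2
    peak[wrap] -= shape[wrap]  # map to signed shifts
    return peak


def refine(fixed, moving, coarse, size=32):
    """Refine a shift estimate on a full-resolution cube from the centre.

    The cube taken from `moving` is offset by `coarse`, so only the residual
    is measured; assumes the offset cube still lies inside `moving`.
    """
    lo = np.array(fixed.shape) // 2 - size // 2
    f_sl = tuple(slice(a, a + size) for a in lo)
    m_sl = tuple(slice(a, a + size) for a in lo - coarse)
    return coarse + phase_correlation(np.asarray(fixed[f_sl]), np.asarray(moving[m_sl]))


def coarse_to_fine(fixed, moving, factor=4, size=32):
    """Estimate the shift on decimated volumes, then refine at full resolution."""
    coarse = phase_correlation(
        np.asarray(fixed[::factor, ::factor, ::factor]),
        np.asarray(moving[::factor, ::factor, ::factor]),
    ) * factor
    return refine(fixed, moving, coarse, size)
```

The returned shift follows the convention `fixed ≈ np.roll(moving, s)`, in voxels. If `fixed` and `moving` are `numpy.memmap` views of the overlap slabs, much less than the full slab is read: the decimated samples and one refinement cube.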

With the two-volume-registration problem solved, you can co-align all your volumes (it’s not ideal, because they’re not all co-registered at the same time, but may be good enough for your needs). Then, you can proceed to the rendering step, for which I believe enough solutions exist.
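Co-aligning the column from pairwise results amounts to accumulating offsets along the chain. A sketch, assuming the tiles form a single chain (x1 to x5), that each pairwise result is an integer translation correction to the nominal step between neighbours, and (z, y, x) ordering; the function names are hypothetical. Because each pairwise error is simply added on, this does not distribute errors the way a global optimisation would.

```python
import numpy as np


def global_positions(nominal_step, pairwise_corrections):
    """Chain pairwise results into absolute (z, y, x) tile positions.

    nominal_step: expected offset between neighbours, e.g. (0, 0, 0.8 * width)
        for a 20% overlap along x.
    pairwise_corrections: one measured correction per neighbouring pair
        (tile i to tile i + 1), added to the nominal step.
    """
    steps = np.asarray(nominal_step) + np.asarray(pairwise_corrections)
    origin = np.zeros((1, steps.shape[1]), dtype=steps.dtype)
    positions = np.vstack([origin, np.cumsum(steps, axis=0)])
    return positions - positions.min(axis=0)  # make every coordinate >= 0


def canvas_shape(positions, tile_shape):
    """Shape of the output volume that holds every tile at its position."""
    return tuple(int(v) for v in positions.max(axis=0) + np.asarray(tile_shape))
```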

In the Python world, the registration story is a bit all over the place. There’s some work in DiPy (https://dipy.org/documentation/1.1.0./reference/) and in ITK (which, by the way, is now easily pip-installable). As far as I know, the ITK team has done some work on making lazy access of data more accessible, so they may have a pre-existing solution—I will ask Matt McCormick to take a look at your post.

So, yes, sorry—no “canned” solution that I am aware of.

So I went back to BigStitcher. TeraStitcher wouldn't pick up my TIFF files, so before converting everything to HDF5, I tried BigStitcher… and it loaded! I followed the instructions again, and defining the new tasks worked. I had to align everything manually, as there was way too much data for it to search through to find the right match, but it loaded. The only problem now is that the fusion step keeps being a pain. I tried saving the data to HDF5, but the blending function was a pain. I probably should have chunked the data, but it took a while to do all of that work. I'll see what the outcome is tomorrow. The final file is about 142 GB. There was a lot of overlap, much more than I thought I had, so I'm glad there's a reduction in data size! I might come back to you all if I have any issues. I'm a bit worried about the fusion process, so I might use the chunk function in BigStitcher to make life easier.


@stefanv thanks for the note! Yes, we have been working hard on this problem for quite some time.

The code is here:

This works for 2D or 3D stitching problems, and the implementation works out of core; we have stitched volumes that are hundreds of gigabytes. Like many stitching implementations, it is phase correlation-based. There are a number of strategies implemented for speed and robustness, e.g. intelligent handling of the overlap, false correlation peak suppression, merging with a distance-map based blending, etc. It builds on the work of @StephanPreibisch @axtimwalde @tomancak to provide global optimization with outlier detection. Sub-pixel accuracy is supported, and there is work in progress to implement the current best-known subpixel interpolation method. The FIJI stitching tile configuration file format is used for input and output, so it can be used in conjunction with FIJI stitching tools. The registration and merging steps can be executed independently, which enables manual intervention and reproduction. For out-of-core processing of large datasets, MetaImage and an HDF5 format output are supported, but we are looking towards N5/Zarr support in the future. TIFF or many other file formats can be used as inputs.
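For readers unfamiliar with distance-map blending: each tile contributes to a fused voxel with a weight that grows with distance from the tile's border, and the result is the weighted average. A simplified in-memory sketch, assuming integer tile positions and using the distance to the nearest face of the tile box as the distance map; the helper names are hypothetical, and this illustrates the idea only, not the itk-montage implementation. An out-of-core version would apply the same arithmetic one output chunk at a time.

```python
import numpy as np


def edge_distance_weights(shape):
    """Weight per voxel = distance (in voxels) to the nearest face of the tile box."""
    w = None
    for axis, n in enumerate(shape):
        i = np.arange(n)
        d = np.minimum(i + 1, n - i).astype(np.float64)  # 1 at the faces
        view = [1] * len(shape)
        view[axis] = n
        d = d.reshape(view)
        w = d if w is None else np.minimum(w, d)
    return np.broadcast_to(w, shape)


def fuse(tiles, positions, canvas_shape):
    """Distance-weighted average of tiles placed at integer `positions`."""
    acc = np.zeros(canvas_shape, dtype=np.float64)
    wsum = np.zeros(canvas_shape, dtype=np.float64)
    for tile, pos in zip(tiles, positions):
        sl = tuple(slice(p, p + n) for p, n in zip(pos, tile.shape))
        w = edge_distance_weights(tile.shape)
        acc[sl] += w * tile
        wsum[sl] += w
    out = np.zeros(canvas_shape, dtype=np.float64)
    np.divide(acc, wsum, out=out, where=wsum > 0)  # voxels with no tile stay 0
    return out
```

Because the weights fall to their minimum at each tile's border, the seam fades gradually instead of jumping from one tile's intensity to the other's.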

I will work on a few Jupyter notebooks that demonstrate example usage.

each volume is say 50GB

Unless your system has a lot of RAM, it will likely run out of memory. But it is unlikely that the entire 50 GB subvolumes are required for accurate registration. So downsampled or cropped versions of the subvolumes could be used for registration, and then the full volumes could be merged at full resolution. However, more work is currently required to test, support and document this.
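A downsampled copy for registration can be produced without loading a whole tile. A sketch, assuming a (z, y, x) `numpy.memmap` (or any array that supports slicing) and block averaging by an integer factor; the helper name is hypothetical. Shifts measured on the result must be multiplied by `factor` before they are applied at full resolution.

```python
import numpy as np


def block_mean_downsample(vol, factor):
    """Downsample a (z, y, x) array by averaging factor**3 blocks, one z-slab at a time.

    Only `factor` z-slices are read per iteration. Trailing voxels that do not
    fill a whole block are dropped.
    """
    nz, ny, nx = (s // factor for s in vol.shape)
    out = np.empty((nz, ny, nx), dtype=np.float32)
    for k in range(nz):
        slab = np.asarray(
            vol[k * factor:(k + 1) * factor, :ny * factor, :nx * factor],
            dtype=np.float32,
        )
        # Split each axis into (blocks, factor) and average over the factor axes.
        out[k] = slab.reshape(factor, ny, factor, nx, factor).mean(axis=(0, 2, 4))
    return out
```

With a factor of 4, a 50 GB 16-bit tile becomes a float32 volume of roughly 1.6 GB, since the 64-fold reduction in voxels is partly offset by doubling the bytes per voxel.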

#itk appears to have the capability but the docs are missing the #python examples/use case. #elastix seems much more towards registration and not registration and stitching beyond the size of the original fixed image.

Yes, the itk-montage package is more appropriate for this use case. For elastix, we now have a Python package, itk-elastix. Here are example notebooks.


@thewtex Well, that's annoying: I've got my data stitched in #bigdatastitcher using #fiji, but I am interested in the ITK montage method. No, I don't have a lot of RAM, unfortunately. I was kind of hoping that a method was available in Python, but if Fiji works for now, that's fine.

I had thought that, since there were implementations of transformations in #scikit-image, and a paper I read finished with enblend, there was a ready-to-use option. Looking forward to having a look at your Jupyter notebooks! The dataset I am working on actually has more to register, but at different resolutions. I'm concentrating on this before I even play with those. It would be good to have a go in that area as well.

For everyone else, thank you for the advice. It's been a difficult couple of weeks trying to work on this while my boiler is down (no hot water or heating for 2 weeks now). My wife and children are in a hotel while I work in the freezing cold house!

It's taken a few attempts to get the #bigdatastitcher plugin to work, because the plugin is built for light microscopy and some of the steps weren't applicable. I learned the hard way when I completed some of them and got very weird results. I muddled through, and the data blending is under way now. Everything looks good apart from the massive amount of time it takes to copy to disk.

Hi all, so I have got the images to stitch together and performed all the steps suggested by the BigStitcher info, but I am getting lots of weird repetitive artefacts. It appears to be because the volumes are cylinders and the blending is creating high and low pixel values. I have tried reading up and even tried different parameters, but to no avail.

Any thoughts on why the blending is going so badly here? I saved the file as a TIFF, as saving took significantly less time than HDF5. I think I am going to have to split the file up into lots of smaller files and stitch them together if I don't get anywhere tomorrow. It's a shame.

Edit:
Initially, when I ran this with smooth blending, the overlapping areas had increased values. So it appears that, because the volumes are not perfectly square, the stitching is averaging image 1's pixel intensity with 0 wherever image 2 has no data. Would appreciate any thoughts.
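One way to avoid averaging real intensities with the zeros outside a cylindrical sample is to give those voxels zero weight, so each output voxel is averaged only over tiles that actually contain data there. A minimal in-memory sketch, assuming the valid region of each tile is the largest cylinder (axis along z) inscribed in its box, as for a circular reconstruction field of view, and integer tile positions; the helper names are hypothetical, and this is a standalone illustration rather than a description of how BigStitcher fuses. A threshold on intensity (for example `tile > 0`) could serve as the mask instead.

```python
import numpy as np


def cylinder_mask(shape):
    """True inside the largest cylinder (axis along z) that fits the (z, y, x) box."""
    _, ny, nx = shape
    y, x = np.ogrid[:ny, :nx]
    cy, cx = (ny - 1) / 2, (nx - 1) / 2
    r = min(ny, nx) / 2
    disk = (y - cy) ** 2 + (x - cx) ** 2 <= r ** 2
    return np.broadcast_to(disk, shape)


def fuse_valid_only(tiles, masks, positions, canvas_shape):
    """Average overlapping tiles using only voxels flagged valid in each tile's mask."""
    acc = np.zeros(canvas_shape, dtype=np.float64)
    count = np.zeros(canvas_shape, dtype=np.float64)
    for tile, mask, pos in zip(tiles, masks, positions):
        sl = tuple(slice(p, p + n) for p, n in zip(pos, tile.shape))
        acc[sl] += np.where(mask, tile, 0)
        count[sl] += mask  # invalid voxels add nothing to the denominator
    out = np.zeros(canvas_shape, dtype=np.float64)
    np.divide(acc, count, out=out, where=count > 0)
    return out
```

In the overlap, a voxel inside tile 1's cylinder but outside tile 2's keeps tile 1's value instead of being halved.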

When I look at it, I think the abrupt line is because the volume is not a nice square and the overlap area is approx. 20%, which is causing these artefacts. Is there a way to reduce the blending volume in the plugin? I could always trim the data; the processing might be faster. I can understand if someone suggests splitting it up further and working on each subvolume. The issue I see is that there are small inclusions between those wedges that I'm interested in, and overall they are not particularly numerous in some of the samples. That could mean that when it comes to segmentation via histogram intensity, the peaks will become smaller and lose definition. If that makes sense.

So, I’m hoping my pathway through this will help someone at some point! This is where I’ve gotten to.

1) #bigstitcher works fine, but because my volumes are cylinders stored in rectangular (x × y) arrays, the regions outside the cylinder are treated as image data. This causes an average of 2 pixels where there is only 1 real one, halving the intensity in those regions.
2) So I tried to batch the process in Fiji, but unfortunately I kept hitting a memory wall when trying to crop the virtual file. I've tried to find out if there's a way to do it without splitting the files and then reattaching them, but to no avail.
3) I have written a simple script in Python with #dask. The process works really nicely, and saving one cropped file took 4.5 mins on average. My files are also smaller.
4) I tried to use BigStitcher to load the HDF5 files. It looks like that's a no-no. What I did do was modify an XML file to point to the files, with all the info on each file. This loads the dataset with the shape of the file. Unfortunately, opening BigDataViewer crashed because the library doesn't read the file… I had hoped, but I'm putting this route down. Maybe one to try another day.
5) This morning, I'm going to go back to Python and dask and script something to save the files as TIFF, then reload them into BigStitcher. I've never got PyImageJ to work well for me, so I will have to go back to the app. Hopefully it will work!
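The dask cropping step in item 3 might look roughly like the following. A sketch, assuming headerless raw input in (z, y, x) order with a known shape and dtype; the function name, chunk sizes and dataset path are illustrative. `name=False` skips dask's default hashing of the input when naming the array, and the result can be read back with `h5py.File(path, "r")`, which opens it read-only so it cannot be changed by accident.

```python
import dask.array as da
import numpy as np


def crop_raw_to_hdf5(raw_path, shape, dtype, crop, h5_path,
                     dataset="/volume", chunks=(64, 256, 256)):
    """Lazily crop a headerless (z, y, x) raw volume and stream it into HDF5.

    `crop` is a tuple of slices. Nothing is read until `to_hdf5` runs, and
    then the data is processed chunk by chunk.
    """
    src = np.memmap(raw_path, dtype=dtype, mode="r", shape=tuple(shape))
    # name=False: use a random array name instead of hashing the input.
    vol = da.from_array(src, chunks=chunks, name=False)
    da.to_hdf5(h5_path, dataset, vol[crop])  # requires h5py
    return h5_path
```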

I'll keep anyone interested updated. (I'll also share the terrible scripts I've written.)

So I don't know what I did, but somehow I saved the HDF5 files in a way that I couldn't access them (or I might not have added 'r' in the file load command and changed the underlying data shape… oops!)

So, going back, I have used my Python script with #scikit-image and #dask.array to:

  1. repeat the cropping and sizing of the datasets (much easier with the chunk sizes).
  2. review a middle slice (if the sample is misoriented, this shows whether the crop fits the whole image, so I can adjust it and repeat individually).
  3. save the cropped output files directly in TIFF format using the #scikit-image imsave method. The process is much slower than saving to HDF5, but at least it saves, and I should be able to open Fiji with the dimensions and then input each crop. (I'm not a great coder, so no fancy loops… just manual coding here…) Saving approx. 20 GB cropped from a 40 GB file takes roughly 17-18 mins per file as TIFF rather than HDF5. At least it is saving the file in a format that #bigstitcher can read.
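The middle-slice check in step 2 can also propose the crop automatically. A sketch, assuming the sample is brighter than the background so a single intensity threshold separates them; the function name, threshold and margin are illustrative. A tilted sample may extend beyond the middle slice's bounding box at other depths, so checking a few more slices (first, middle, last) before cropping would be safer.

```python
import numpy as np


def crop_from_middle_slice(vol, threshold, margin=10):
    """Propose (y, x) crop slices from the middle z-slice of `vol`.

    Voxels above `threshold` are treated as sample; the crop is their bounding
    box grown by `margin` pixels and clipped to the image.
    """
    mid = np.asarray(vol[vol.shape[0] // 2])
    ys, xs = np.nonzero(mid > threshold)
    if ys.size == 0:
        raise ValueError("no voxels above threshold in the middle slice")
    y0 = max(int(ys.min()) - margin, 0)
    y1 = min(int(ys.max()) + margin + 1, mid.shape[0])
    x0 = max(int(xs.min()) - margin, 0)
    x1 = min(int(xs.max()) + margin + 1, mid.shape[1])
    return slice(y0, y1), slice(x0, x1)
```

The returned slices can be used directly as `vol[:, ys, xs]`, which works on a memmap or a dask array without loading the volume.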

I should point out that all of this is on my laptop, with 16 GB RAM and a 2.8 GHz processor (though it appears it's been overclocked and is running at 3.4 GHz…)

which isn't too bad for cropping and cutting 10367 slices at 2050x2050 (x*y)!
