@Sh4zKh4n I recommend that you provide a complete failing example as code (not screenshots) that people can run on their own machine. StackOverflow has a good article on this.
It makes it much easier for people to debug your use case, rather than having to reason about the problem just from reading about it, which is hard, or reconstruct it themselves.
In your case, I agree with Matt Rocklin that the ndimage median filter appears to be the failing function. Unfortunately, ndimage simply does not perform well with large kernel sizes. For example:
```python
import numpy as np
from scipy import ndimage as ndi

def median3d(image):
    return ndi.median_filter(image, size=25)

res = median3d(
    np.random.randint(0, 256, size=(285, 245, 310)).astype(np.uint8)
)
```
i.e. running your code for just one chunk is taking forever on my machine (3min and counting), while using “only” 2GB of RAM. There is an open issue in SciPy about this excessive RAM usage, but it is really quite a difficult one to fix, which is why it remains open 5 years later.
Now, the thing I totally don’t understand is why `dask_image.ndfilters.median_filter` is working at all for you, since literally all it is doing is wrapping `ndimage.median_filter` in a `map_overlap` call. You can see the source code and find that it boils down to a `map_blocks` call just like yours. I can also reproduce that behaviour as follows:
```python
import dask.array as da
from dask.diagnostics import ProgressBar
from dask_image.ndfilters import median_filter

pbar = ProgressBar()
pbar.register()

arr = da.random.randint(
    0, 256, size=(10260, 1225, 1550),
    chunks=(285, 245, 310),
)
smoothed = median_filter(arr, size=25)
smoothed.compute()
```

```
[#                                       ] | 3% Completed |  3min 36.8s
```
But it does eventually start using too much RAM and spilling to disk, which brings my computer to a standstill…
On that I really must say I am stumped.
Things I would try next in your situation:
- Try using the dask.distributed scheduler with some local workers. By default, dask.array uses the threaded scheduler which, as I understand it, will request as much RAM as the OS will give it; the OS will grant more memory than it physically has and start swapping to disk, which is very inefficient. Distributed gives you fine control over the resources you allocate to each worker. We saw that a single chunk takes about 2GB of RAM to process, so you should be fine with e.g. 3 workers with 5GB each (it’s good to have a bit of headroom!).
- We have an open PR for 3D rank filters in scikit-image that should not suffer from ndimage’s excessive RAM usage. The PR needs review but, as I understand it, it’s ready to go, so you should be able to build that branch locally and try it. It should be much faster than ndimage for large kernels such as the one you are using.
Again, I totally don’t understand why the dask_image `median_filter` works at all, given that I can’t process even a single chunk with the SciPy version. That is something to discuss in the dask_image issue, so I’ll add this comment there.