It’s really interesting to see how people use dask in the imaging context!
Agreed! To be honest I don’t think we have quite converged on “best practices”, but there are some interesting patterns emerging!
For me this has the advantage that I can run the same code entirely without dask by changing that single line to
But the problem is that this requires changing code in many places: every function call. I prefer solutions that only require a one-line change. I used to worry about making dask an optional dependency, but I think it is now mature enough that I am comfortable depending on it and using its built-in configuration to control execution (for example, which scheduler runs the computation).
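For example, dask lets you pick the scheduler through its configuration rather than by editing each call. A minimal sketch, assuming a toy `dask.array` workload in place of a real imaging pipeline: `dask.config.set(scheduler="synchronous")` runs every task in the current thread, which is handy for debugging. Called once at the top of a script (without `with`), it applies globally; the context-manager form below scopes it to one block.

```python
import dask
import dask.array as da

# Hypothetical toy workload standing in for an imaging pipeline.
x = da.ones((1000, 1000), chunks=(250, 250))
total = (x * 2).sum()

# One line controls execution: run all tasks serially in this thread,
# so pdb and print statements behave as in plain NumPy code.
with dask.config.set(scheduler="synchronous"):
    result_sync = total.compute()

# Outside the block, dask falls back to its default scheduler
# (a thread pool for dask arrays).
result_default = total.compute()

print(result_sync, result_default)
```

The function calls themselves never change; only the scheduler setting does.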
I know that if you move large data between workers and local process
For me, in this case, it was a matter of using dask-jobqueue with the cluster. Clusters have always demanded significant cognitive overhead, because you had to estimate up front how many resources you needed. Usually this meant requesting far more than necessary (say, 10x), and consequently getting stuck in the queue for a long time. dask-jobqueue changes the game because it submits jobs for you, and those jobs can be small (say, 4 cores for 15 min), so they just sneak in whenever there is space available. Once a 15 min job finishes, dask-jobqueue spins up another to replace it. And so on. It’s really quite magical.
And, as mentioned above, with a two-line change I can replace the SLURMCluster with a LocalCluster to test that my code runs locally on a subset of the data. This makes debugging large jobs much easier. (There are of course additional issues when running on a cluster; see this discussion for example, though don’t miss the resolution in this comment. But the bulk of the work can be done locally with minimal transition pain to the cluster.)
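Here is a rough sketch of that pattern, assuming a hypothetical `make_cluster` helper and placeholder resource values (`memory`, `walltime`, `maximum_jobs`) that you would adjust for your site. The SLURM branch requests small, short jobs and uses adaptive scaling so that finished jobs get replaced; the `--lifetime` worker options are one way to let workers retire cleanly before the walltime kills them (the keyword is `worker_extra_args` in recent dask-jobqueue releases and `extra` in older ones). Only the local branch is exercised below.

```python
import dask.array as da
from dask.distributed import Client, LocalCluster


def make_cluster(use_slurm: bool):
    """Hypothetical helper: flip use_slurm to move between laptop and HPC."""
    if use_slurm:
        # Imported lazily so the local path does not need dask-jobqueue.
        from dask_jobqueue import SLURMCluster

        # Small, short jobs (4 cores, 15 min) that fit into gaps in the queue.
        # memory/walltime are placeholders; the lifetime flags ask workers to
        # shut down gracefully a little before the walltime expires.
        cluster = SLURMCluster(
            cores=4,
            memory="16GB",
            walltime="00:15:00",
            worker_extra_args=["--lifetime", "14m", "--lifetime-stagger", "1m"],
        )
        # Adaptive scaling: submit jobs as work appears, replace jobs that end.
        cluster.adapt(minimum_jobs=1, maximum_jobs=20)
    else:
        # Same downstream code on a workstation, for testing on a data subset.
        cluster = LocalCluster(n_workers=2, threads_per_worker=1, processes=False)
    return cluster


cluster = make_cluster(use_slurm=False)
client = Client(cluster)

# Toy computation; with the client active, .compute() runs on the cluster.
x = da.ones((2000, 2000), chunks=(500, 500))
mean_value = x.mean().compute()

client.close()
cluster.close()
```

Everything after `make_cluster` is identical in both settings, which is what keeps the move to the cluster cheap.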
So, in summary, it was not about whether or not I was moving data, but about using the advanced schedulers that are now available with dask.distributed. I expect that things will continue to get better on this front. The distributed schedulers also come with a lot of diagnostics, though tbh I have used those less. But I expect they would be great for identifying performance issues.
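As a small taste of those diagnostics, here is a sketch assuming a local in-process cluster and a toy array computation. `client.dashboard_link` gives the URL of the live dashboard, and `get_task_stream` records which worker ran each task and when. (`performance_report` can also save a standalone HTML report, but it needs bokeh installed, so it is left out here.)

```python
import dask.array as da
from dask.distributed import Client, LocalCluster, get_task_stream

cluster = LocalCluster(n_workers=2, threads_per_worker=1, processes=False)
client = Client(cluster)

# URL of the live dashboard (task stream, memory per worker, progress).
print(client.dashboard_link)

x = da.ones((2000, 2000), chunks=(500, 500))

# Record a timeline of every task executed inside the block.
with get_task_stream() as ts:
    result = (x + x.T).mean().compute()

# Each record is a dict describing one task (its key, worker, timings).
print(len(ts.data), "tasks recorded")

client.close()
cluster.close()
```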