Parallelizing Fiji headless

I have some Jython Fiji scripts I’m currently running locally on a large analysis workstation. Most of the code is pretty well locked down, but there are a few configurable parameters that need to be tuned run-to-run, since the wet-lab side is currently NOT locked and we often need to check a few parameter sets before we find the one that matches the setup-of-the-day.

I’d like to be able to move to the cloud to run headless in parallel. Does anyone (@ctrueden?) have advice on how best to do this? I was hypothesizing the best thing might be to make everything into PyImageJ notebooks and then try using IPyParallel, but I wasn’t sure it would play nicely so I wondered if anyone had experience with that (or any other) solution.

(I’m not in theory opposed to making the scripts pure Java, but as I’m much more comfortable in Python I’d ideally like to stay in that space; if it makes the overall process much easier though, definitely open to Java solutions)

We approached it using snakemake (https://snakemake.readthedocs.io/en/stable/) as the workflow manager, driving Fiji inside a virtual framebuffer (Xvfb) using BeanShell scripts:


You don’t need snakemake, but it is nice for managing large workflows; you could also create these jobs with shell scripts or any other interface that submits jobs. I think using Python scripts instead of BeanShell scripts is also doable. There is some code for this here:

The key is to create chunks of jobs, each consisting of one of these Fiji calls wrapped in xvfb-run plus the arguments specific to that job:

/sw/bin/xvfb-run -a /sw/bin/python ./project_stacks.py argument

then unpack the arguments in the script:

sys.argv[1]
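
For anyone new to this pattern, a minimal (hypothetical) sketch of the two halves might look like the following; the file names, wildcards, and analysis step are placeholders for whatever your own pipeline expects. The script side just reads its parameters from sys.argv:

# project_stacks.py -- hypothetical sketch of the script side
import sys

input_stack = sys.argv[1]   # the "argument" from the xvfb-run call above
output_path = sys.argv[2]   # e.g. where to write the result

# ... open input_stack, run the locked-down analysis, save to output_path ...

and a corresponding Snakemake rule then generates one such call per sample:

rule project_stack:
    input:
        "stacks/{sample}.tif"
    output:
        "projections/{sample}.tif"
    shell:
        "/sw/bin/xvfb-run -a /sw/bin/python ./project_stacks.py {input} {output}"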

Perhaps relevant: at the most recent Fiji learnathon, Pavel Moravec presented SciJava parallel with a nice presentation and examples.


Great question. It might be nice to have a wiki page on imagej.net discussing all the approaches people are using, highlighting the costs and benefits of each. But right now, as far as I know, that documentation does not exist. So for now I’ll just list the ways I know about, off the top of my head:

Headless scripting from the CLI. You can run the scripts headless as described here:

On a compute cluster, each node then executes one or more scripts in this fashion (a minimal script sketch follows the pros and cons below).

  • Pro: Easy to implement as bash scripts or whatnot; standard well-established infrastructure.
  • Con: Each startup of Fiji pings for updates by default. If you launch 10000 Fijis, you’ll hit sites.imagej.net 10000 times, and soft DoS the server. This has been happening more and more lately as people are running more and bigger cluster jobs like this. When it gets really bad I end up blocking the IP(s), and those machines can’t talk to imagej.net anymore.
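
To make that concrete, here is a minimal, untested sketch of a Jython script using SciJava script parameters, which the headless launcher fills in from the command line; the parameter names, paths, and the Gaussian blur step are all made up:

#@ String input_path
#@ String output_path
#@ Float sigma

# hypothetical headless Jython script (process_one.py); each cluster job runs e.g.
#   ./ImageJ-linux64 --ij2 --headless --run process_one.py 'input_path="/data/a.tif",output_path="/out/a.tif",sigma=2.0'
from ij import IJ

imp = IJ.openImage(input_path)
IJ.run(imp, "Gaussian Blur...", "sigma=" + str(sigma))
IJ.saveAs(imp, "Tiff", output_path)
imp.close()

Each job in the cluster submission then only differs in the parameter string it passes to --run.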

KNIME Server. Workflows using the KNIME Image Processing nodes can be run via KNIME Server. I am not sure exactly what the characteristics are as processing scales up; I defer to @gab1one, @dietzc, or another KNIME team member to comment on that.

  • Pro: Some people really like developing workflows graphically. And KNIME has tons of advantages like easy inspection of intermediate results, reproducible execution, etc.
  • Con: This is a different paradigm from Fiji Jython scripts and would require adapting or redoing the workflow, so it is probably not a solution for you here @bcimini. But I wanted to mention it as a way forward more generally.
  • Con: While KNIME Analytics Platform is FOSS, KNIME Server is a paid feature.

SciJava parallel. As @bnorthan mentioned, a team at IT4I in Ostrava is developing SciJava parallel, a layer on top of SciJava that offers a generic way of distributed programming on top of clusters, without any assumptions about the specific cluster hardware or architecture. If I understand it correctly: support for each kind of cluster is offered as a SciJava “paradigm” plugin, with user code not needing to care about which paradigm plugin is used. I believe the main goal is to make it easy to execute ImageJ/Fiji functionality from within the UI itself. Just click some button like “Run in parallel” and the framework does the rest. I am not sure how far the current development reality is from that ideal, though. I defer to Jan Kožusznik and others on that team regarding technical details.

Container-based execution. Using a tool like Docker or Singularity, run Fiji headless in a container. If you have some container-based workflow tool or orchestration mechanism, these units become more convenient to work with. ImJoy (a partner on this forum) works that way, as does Galaxy, as does Zeiss’s APEER platform. I am not sure how easy it is to scale up ImJoy-based computations; I defer to @oeway on that. And I defer to @sebi06 and @rkirmse regarding APEER. In general, I personally have not dived into these approaches yet, so am ill-equipped to comment on how practical they are to use from a pragmatic or scientific perspective.

ImageJ Server. You can run ImageJ as a server using the ImageJ Server project:

Aside: Funnily enough, we originally developed this approach for CellProfiler after visiting with @agoodman at Broad in 2016, but at this point I think that PyImageJ will be a better way forward for CellProfiler-ImageJ integration—at least if you want the tools running on the same machine.

The ImageJ Server is a pretty general thing, but one use case would be to run one server instance on each cluster node, and then farm out jobs to them from a controller node. IIUC, this is one approach the scijava-parallel project has used in practice to run ImageJ operations in parallel on a cluster.

  • Pro: Run ImageJ on a separate machine. Offers scalability and process isolation.
  • Con: ImageJ Server’s security is not currently robust or up-to-date. It is exploitable remotely if the server machine accepts connections indiscriminately.
  • Con: ImageJ Server’s REST API is pretty minimal right now. This is both good and bad—one downside is that client-side Python code to run commands is not super elegant. But it’s powerful.

The ImageJ Server is also useful for same-machine interprocess communication. For example, I know @kephale has used it to execute ImageJ scripts from Atom, without needing to launch a new ImageJ instance every time.
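
For a flavor of the client side, something along these lines should work against a running imagej-server instance; the base URL and endpoint paths below are from my memory of the imagej-server README, so double-check them against the version you deploy:

import requests

BASE = "http://localhost:8080"  # assumed default imagej-server address

# list the modules (commands, scripts, ops) the server exposes
modules = requests.get(BASE + "/modules").json()
print(len(modules), "modules available")

# a module is executed by POSTing its inputs as JSON to /modules/<id>;
# the ID and inputs here are placeholders -- pick a real one from the listing above
# result = requests.post(BASE + "/modules/<id>", json={"param": "value"}).json()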

I’m probably forgetting other approaches people have used. In practice, launching Fiji headless from the command line is probably your best bet here, since you already have a Jython script. Combined with non-ImageJ-specific cluster job infrastructure such as @schmiedc describes above, you should be able to get “embarrassingly parallel” things going.


I believe there is a simple approach for embarrassingly parallel jobs, if you are willing to toss together a super-minimal Java project:

  1. Make your Maven project and add your dependencies (including the relevant scripting libraries, e.g. scripting-jython for Jython)
  2. Have your main() function instantiate an ImageJ instance
  3. Use the ScriptService to run your Jython script with parameters of your choosing (perhaps extracted from main()'s arguments)

If you are running in a homogeneous cluster environment then you don’t need imagej-launcher, so this solution should be fine.

Pros:

  • It also won’t ping the ImageJ update server (I believe?)
  • No rewriting necessary
  • No containerization
  • No need to introduce potential security risks

Cons:

  • It won’t be natural to transition to complex parallelization, but it doesn’t sound like you are worried about that for this use case

~Kyle


Thanks all! You’ve given me a lot to ponder and tinker with, I promise to report back with what I end up doing!


If you’re using a cluster queue manager like Slurm or Torque, you can queue up scripts that will each run independently. If your script only needs 1 (or a few) cores out of a big multi-core node then the queue manager can handle all the heavy lifting. Something like sbatch --cpus-per-task=1 ....

Adding containerization on top of that (I prefer Singularity for isolation, and so do most HPC admins) further isolates everything so you wouldn’t need to worry about the ImageJ instances talking to one another. I wonder if a Singularity “service” would do the trick by relying on Fiji’s single instance listener so you only have to initialize Fiji once, i.e.:

singularity instance start fiji.sif instance1
singularity exec instance://instance1 ImageJ-linux64
singularity exec instance://instance1 ImageJ-linux64 --run script.py

Another trick I’ve used (with or without a queue manager) is nested scripts. Have one script loop through the parameters you want to test, replacing variables in a template and spitting out a folder of new scripts for the parameter sweep.
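
To illustrate that template trick, a small (made-up) generator might look like this; the template file, placeholder tokens, and parameter grid are all hypothetical:

# make_sweep.py -- hypothetical generator for a parameter sweep: reads a template
# script containing {{sigma}} / {{method}} placeholders and writes one script per combination
import itertools
import os

template = open("template_script.py").read()
if not os.path.isdir("sweep"):
    os.makedirs("sweep")

sigmas = [1.0, 2.0, 4.0]
methods = ["Otsu", "Yen"]

for i, (sigma, method) in enumerate(itertools.product(sigmas, methods)):
    script = template.replace("{{sigma}}", str(sigma)).replace("{{method}}", method)
    with open(os.path.join("sweep", "job_%02d.py" % i), "w") as f:
        f.write(script)

# each generated sweep/job_NN.py can then be queued as its own headless Fiji job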

How does one go about disabling the update ping?

I think this was fixed here:

and will be available with the next release of imagej-updater.


Thanks for bringing up this topic and sorry for the late response.

As an ImJoy core developer, I will comment mostly from our perspective. ImJoy is frontend-centric software; it can work with many different types of backends running remotely, and all you need to do is build a corresponding plugin to connect to one.

For running remotely, with the Jupyter-Engine-Manager plugin (shipped by default with https://imjoy.io), we can connect to a remote or local Jupyter or BinderHub server and use it as a computational backend, e.g. running pyimagej (a pyimagej plugin demo) via free servers on MyBinder.

I think pyimagej + IPyParallel would be worth trying, but you may also want to take a look at Dask, especially if you are running on a cluster.
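
In case a rough shape helps, here is an untested sketch of that combination; it assumes an IPyParallel cluster is already running (e.g. via ipcluster start -n 4), the init call below may differ between pyimagej versions, and the analysis function and parameter list are placeholders:

# sketch only: farm a parameter sweep out to IPyParallel engines
import ipyparallel as ipp

rc = ipp.Client()
view = rc.load_balanced_view()

def run_one(sigma):
    # for brevity this initializes Fiji per call; in practice you would
    # initialize once per engine and reuse it
    import imagej
    ij = imagej.init('sc.fiji:fiji', headless=True)
    # ... run your Jython/ImageJ analysis with this sigma and return a summary ...
    return sigma

results = view.map_sync(run_one, [1.0, 2.0, 4.0])
print(results)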

ImJoy can be used in addition to that, to provide a user interface for selecting parameters, selecting remote files via a remote file dialog, visualizing results, etc.


I promised to update, so here I am back again. All of these suggestions were great, but since we’ve had good success running Dockerized image analysis in CellProfiler with our Distributed-CellProfiler project, I ended up deciding to create a companion workflow called Distributed-FIJI, since you all have helpfully already provided a Fiji Docker image!

In general, you provide a pointer to a script that can run headless in Fiji, an output file location, and any other parameters your script takes at the command line that aren’t hard-coded (such as input file location, number of time points, whatever); you can also specify your machine size in terms of CPUs, memory, and disk. Distributed-FIJI spins up the infrastructure you need, monitors job progress (and provides logs), then cleans everything up when the job has been marked as successfully completed.

The documentation is still pretty light, but in terms of AWS account setup, etc., a lot of it is identical to the Distributed-CellProfiler documentation, so if you’re interested in getting started with this right away, that’s a good place to start.

Please feel free to explore, ask questions, etc!
