GUI for job submission to a cluster?

gui
cellprofiler
cluster
createbatchfiles

#1

I’ve configured a small multi-node cluster using Torque/PBS on Ubuntu 16.04 LTS. Batch analysis works well with job submission from the command line. Now I’d like to add some kind of (web?) GUI for users who are not familiar with, or are scared of, the command line. Is BatchProfiler the way to go? How do I go about deploying such a solution? From what I can see on GitHub, it’s a set of Python scripts. Is it a working website that I can put into an Apache directory?

Apologies for possibly naive questions but I have no experience in configuring web services and the documentation for BatchProfiler is rather scarce. Are there any screenshots of BatchProfiler in action to see what it can do? Googling for images shows CP authors :wink:


#2

I think BatchProfiler is no longer a working website, sorry.

Speaking as a biologist for whom running stuff on the cluster was VERY scary… if you use CreateBatchFiles inside your pipeline so that they just have to submit one line of cellprofiler -p WholeBatch.h5 blah blah , you learn to get over that fear pretty quickly :wink: .
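For anyone wondering what that one line can look like on a Torque/PBS cluster, here is a minimal sketch, not a tested recipe. It assumes the batch file is called WholeBatch.h5 as above, and that the headless flags `-c`, `-r`, `-p`, `-f` and `-l` behave as in the CellProfiler version you have installed. The job name, the resource request and the helper names `cellprofiler_command` and `pbs_script` are made up for illustration.

```python
import shlex


def cellprofiler_command(batch_file, first, last):
    """Build a headless CellProfiler command for one range of image sets."""
    return [
        "cellprofiler",
        "-c",              # run without the GUI
        "-r",              # run the pipeline on startup
        "-p", batch_file,  # batch file written by CreateBatchFiles
        "-f", str(first),  # first image set to process
        "-l", str(last),   # last image set to process
    ]


def pbs_script(job_name, command):
    """Wrap a command in a minimal Torque/PBS job script (resources are assumptions)."""
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        "#PBS -l nodes=1:ppn=1",
        "cd $PBS_O_WORKDIR",
        shlex.join(command),
        "",
    ])


cmd = cellprofiler_command("WholeBatch.h5", 1, 200)
script = pbs_script("cp_batch_1", cmd)
print(script)
```

Saving the printed text to a file and passing it to `qsub` would submit that one batch. A small wrapper like this is also something a simple GUI could call behind the scenes.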

I think a GUI is a really nice thing to have though!


#3

I’ve never used it, but Novartis recently released Jenkins with CellProfiler, which looks interesting.
paper
github


#4

@Swarchal Thanks for the info, this looks very promising!

@bcimini That’s reassuring, but in my experience my colleagues don’t respond well to the command line :wink: Don’t you guys use a GUI for cluster submissions at the Broad Institute? It would be great to have batch submission right from CP, just like you can do in Matlab.


#5

Indeed, at the Broad we used to use BatchProfiler (which was never intended to be super-helpful to the general public; it was only shared as an example of how we set things up custom to the Broad cluster). But it broke when our cluster changed this year, and was finicky with respect to maintenance, so we moved to Amazon Web Services to do all our processing. Distributed-CP is now what we use (and also provide publicly in case it’s helpful to others).


#6

I’ll give Distributed-CP a try, but I would also love to hear about your experience with AWS in terms of cost, upload/download speed, etc. Perhaps something to share on the CP blog?

My lab is actually heavily weighing pros and cons of expanding our computing/storage facilities vs AWS. My main concern about AWS is data transfer. We generate lots of time course data where a single experiment is 5-20 GB, with a stack for a single FOV between 200 and 600 MB. Browsing such data on the cloud, e.g. being able to quickly scan through a time lapse movie, might not be as smooth as if it was sitting locally.


#7

I can highly recommend the screeningBee workflow manager “BeeWM”. It’s not published yet, but it is working very well and has been used to process thousands of plates. It currently supports DRMAA clusters, SGE, UGE, and LSF. Adding support for other clusters is not super hard, and I could probably help.

BeeWM supports not only CellProfiler, it can execute any kind of executable. It works very well with CellProfiler though, and that is one of our core use cases.

It can fully batch CellProfiler jobs on the cluster, i.e. you can start from a pipeline and images only, and it will process all images. You can also automatically split into batches of image sets, like 200 images per job or whatever. Furthermore, you can create more complex workflows, with multiple (possibly chained) jobs. As an example, we execute shading correction on all images first, then we execute CellProfiler on the shading-corrected images. If a job in the queue fails (multiple times), subsequent jobs are aborted. Of course it supports email notification to admins and users, depending on success or error. It also supports automatic retries in case of errors. It understands the concept of multiple cluster queues with different runtimes (i.e. for shorter and longer jobs). And it has many other helpful goodies. For example, it can copy your data to a cluster scratch before execution, and copy results back afterwards. It’s programmable via a REST API. And it’s fully open source.
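To make the batching and chaining ideas concrete, here is a rough Python sketch. It is not BeeWM code and does not use the BeeWM API. The names `split_batches` and `run_chain`, the default of three attempts and the "collect_results" step are all made up for illustration.

```python
def split_batches(n_image_sets, batch_size):
    """Return 1-based inclusive (first, last) ranges covering all image sets."""
    return [
        (start, min(start + batch_size - 1, n_image_sets))
        for start in range(1, n_image_sets + 1, batch_size)
    ]


def run_chain(steps, max_attempts=3):
    """Run steps in order; retry a failing step, abort the rest if it keeps failing.

    `steps` is a list of (name, callable) pairs; a callable signals failure by raising.
    Returns a dict mapping each step name to "ok", "failed", or "aborted".
    """
    status = {}
    failed = False
    for name, step in steps:
        if failed:
            status[name] = "aborted"  # an earlier step failed for good
            continue
        status[name] = "failed"
        for _ in range(max_attempts):
            try:
                step()
                status[name] = "ok"
                break
            except Exception:
                pass  # try again until attempts run out
        failed = status[name] == "failed"
    return status


def always_fails():
    raise RuntimeError("simulated job failure")


batches = split_batches(450, 200)
status = run_chain([
    ("shading_correction", lambda: None),
    ("cellprofiler", always_fails),
    ("collect_results", lambda: None),  # hypothetical follow-up step
])
print(batches)
print(status)
```

In a real setup each step would submit a cluster job and wait for its exit status instead of calling a local function, but the retry and abort logic would have the same shape.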

Let me know if you’re interested and/or need help. Download from https://www.screeningbee.org/

Disclaimer: I’m one of the main authors of BeeWM :wink: