Running CP in batch mode

Hi,

(i)
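In the old Batch_data.mat, one could extract the total number of image sets with:

from scipy.io import loadmat
# struct_as_record=False and squeeze_me=True let MATLAB struct fields be read as attributes
batch_info = loadmat("%(datadir)s/Batch_data.mat" % locals(), struct_as_record=False, squeeze_me=True)
num_sets = batch_info["handles"].Current.NumberOfImageSets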

However, in the new Batch_data.mat the main variable seems to be “Settings” rather than “handles”, and I cannot find “NumberOfImageSets”.
Could you please tell me where this is “hidden”?!
Or is there some other way? In fact, I am not sure whether Batch_data.mat is really needed (see ii).
:wink:

(ii)
Another question: when running CellProfiler in headless mode, must the input pipeline be Batch_data.mat, or can it also be the corresponding pipeline .cp file?
What is the difference? Some first tests on our cluster seemed to indicate that running CP with the -p option pointing to a normal .cp pipeline works.

(iii) Regarding the output files:

  • Is it possible to also give the *.csv output files unique names during the batch analysis, e.g. depending on the image sets they contain?

Thanks!
Tischi

[quote=“tischer”]could you please tell me where this is “hidden” ?!
or some other way…in fact i am not sure whether the Batch_data.mat is really needed (see ii)
[/quote]

NumberOfImageSets is no longer stored in the batch file. It was only needed so that a submission script could determine the total number of image sets and split the run correctly into batches, but this is now done internally.
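
To illustrate that splitting step, here is a minimal sketch (not CellProfiler's own code) that divides a run of `num_image_sets` image sets into consecutive batches of at most `batch_size` sets each. Both names and the helper `batch_ranges` are hypothetical, and image sets are assumed to be numbered from 1.

```python
def batch_ranges(num_image_sets, batch_size):
    """Return (first, last) image-set numbers, 1-based and inclusive, per batch.

    Hypothetical helper: a submission script could launch one cluster job
    for each returned range.
    """
    if num_image_sets < 0 or batch_size < 1:
        raise ValueError("need num_image_sets >= 0 and batch_size >= 1")
    ranges = []
    for first in range(1, num_image_sets + 1, batch_size):
        # The last batch may be shorter than batch_size
        last = min(first + batch_size - 1, num_image_sets)
        ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    # 10 image sets in batches of 4 -> three jobs
    print(batch_ranges(10, 4))  # [(1, 4), (5, 8), (9, 10)]
```

The point is that the total count is only needed to compute these ranges, so whatever produces the image set list can also produce the batches.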

The purpose of CreateBatchFiles is to perform path substitution when your computer and the cluster see the same files under different paths. The batch file contains the pipeline with these substitutions applied. But if the input/output folders have the same paths locally and on the cluster, nothing prevents you from using the original pipeline file.
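
To make the path substitution concrete, a minimal sketch follows. It is not the CreateBatchFiles implementation; `substitute_path`, `local_root` and `cluster_root` are hypothetical names for a helper and for the folder prefix as seen from your own computer and from the cluster, and paths are treated as POSIX-style strings.

```python
def substitute_path(path, local_root, cluster_root):
    """Rewrite a local path so it points at the same file on the cluster.

    Only paths under local_root are changed; anything else is returned as-is.
    """
    local_root = local_root.rstrip("/")
    cluster_root = cluster_root.rstrip("/")
    # Require a separator after the prefix so /data/img2 is not matched by /data/img
    if path == local_root or path.startswith(local_root + "/"):
        return cluster_root + path[len(local_root):]
    return path


if __name__ == "__main__":
    print(substitute_path("/Volumes/imaging/plate1/img.tif",
                          "/Volumes/imaging", "/cluster/imaging"))
    # /cluster/imaging/plate1/img.tif
```

If both roots are identical, the function returns every path unchanged, which is the case where the original pipeline file can be used directly.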

If you use metadata collection and grouping in LoadImages/LoadData, you can use metadata substitution to produce multiple .csv output files. For example, you can generate a separate file, named after the input folder, for each folder being analyzed. See my response to your other metadata question for more details on metadata specification, and the help for LoadImages/LoadData for metadata grouping.
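
A minimal sketch of the idea behind metadata substitution in an output file name: each `\g<Field>` token in a template is replaced by the current group's metadata value. The token syntax here is modeled on Python's regular-expression group notation and is an assumption; check the module help for the exact syntax CellProfiler expects. `fill_template` and the field names are hypothetical.

```python
import re

# Matches tokens such as \g<Folder> or \g<Plate>
TOKEN = re.compile(r"\\g<([A-Za-z_][A-Za-z0-9_]*)>")


def fill_template(template, metadata):
    """Replace each \\g<Field> token in template with metadata[Field].

    Raises KeyError if the template names a field missing from metadata.
    """
    return TOKEN.sub(lambda m: str(metadata[m.group(1)]), template)


if __name__ == "__main__":
    # One metadata dictionary per group, e.g. one per input folder
    groups = [{"Folder": "plate01"}, {"Folder": "plate02"}]
    for group in groups:
        print(fill_template(r"\g<Folder>_Image.csv", group))
    # plate01_Image.csv
    # plate02_Image.csv
```

Because the file name is computed per group, each group of image sets writes to its own .csv file.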

Regards,
-Mark

Hi Mark,

Thanks for your answer.

What do you mean by:

“NumberOfImageSets is no longer retained in the batch file. The only reason it was needed was to determine the total number of batches so it could be split correctly into batches by a submission script, but this is now done internally.”

The script that distributes the jobs on the cluster must somehow know how many jobs there are in total, right?
My problem is that at the moment I don’t know the easiest way to get this information and pass it to my script.

Should I write an additional Python script that scans the input folder to figure out the number of images?
That feels suboptimal to me, because one would have to re-implement part of the LoadImages module.
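
The folder-scanning idea could be sketched as follows. It is only an approximation: it counts files whose names match a regular expression and groups them into image sets by a shared key, roughly like matching several channels per site, but it does not reproduce the actual LoadImages matching rules. `count_image_sets`, the pattern and the `site`/`channel` group names are hypothetical.

```python
import os
import re
import tempfile


def count_image_sets(folder, pattern, channels):
    """Count complete image sets among the files in folder.

    pattern is a regular expression with two named groups (hypothetical
    convention): 'site' identifies the image set and 'channel' the image type.
    A site counts as one image set only if every entry of channels is present.
    """
    regex = re.compile(pattern)
    seen = {}  # site -> set of channels found for that site
    for name in os.listdir(folder):
        match = regex.fullmatch(name)
        if match:
            seen.setdefault(match.group("site"), set()).add(match.group("channel"))
    required = set(channels)
    return sum(1 for found in seen.values() if required <= found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name in ["s1_DAPI.tif", "s1_GFP.tif", "s2_DAPI.tif", "s2_GFP.tif", "s3_DAPI.tif"]:
            open(os.path.join(folder, name), "w").close()
        pattern = r"(?P<site>s\d+)_(?P<channel>DAPI|GFP)\.tif"
        print(count_image_sets(folder, pattern, ["DAPI", "GFP"]))  # 2 (s3 lacks GFP)
```

Keeping such a script in sync with the pipeline's own file-matching settings is the weak point of this approach.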

Tischi

This is actually what our scripts do in an indirect way, and it is what I meant by “internally”.

I don’t know whether you are using (or have access to) the developer version of CP 2.0 from our SVN repository, but if you do, take a look at NewBatch.py in the BatchProfiler folder, in particular these two lines (around line 130):

pipeline.load(batch_file)
image_set_list = pipeline.prepare_run(None)

Essentially, this (a) loads the batch file as a pipeline and (b) executes the prepare_run function of each module. For LoadImages, prepare_run generates the image list. This list is then handed to other functions in NewBatch.py to build the batches for submission.
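
The mechanism described above can be illustrated with a toy model. This is not CellProfiler's API: `ToyPipeline`, `ToyLoadImages` and `ToyMeasure` are hypothetical stand-ins showing how a pipeline can call `prepare_run` on every module before any image is processed, let the loading module build the image set list, and return that list so its length can drive batching.

```python
class ToyLoadImages:
    """Hypothetical loader: turns a list of file names into image sets."""

    def __init__(self, file_names):
        self.file_names = file_names

    def prepare_run(self, image_set_list):
        # The loading module is the one that populates the image set list
        for name in self.file_names:
            image_set_list.append({"FileName": name})


class ToyMeasure:
    """Hypothetical analysis module with nothing to prepare."""

    def prepare_run(self, image_set_list):
        pass


class ToyPipeline:
    def __init__(self, modules):
        self.modules = modules

    def prepare_run(self):
        """Call prepare_run on every module in order; return the image set list."""
        image_set_list = []
        for module in self.modules:
            module.prepare_run(image_set_list)
        return image_set_list


if __name__ == "__main__":
    pipeline = ToyPipeline([ToyLoadImages(["a.tif", "b.tif", "c.tif"]), ToyMeasure()])
    image_set_list = pipeline.prepare_run()
    print(len(image_set_list))  # 3: the total a submission script needs
```

No images are read during this step, which is why it is cheap enough to run once on the submitting machine.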

Regards,
-Mark

Hi Mark,

Thanks, it works! (I was on holiday, hence the late response.)

The only issue I have is that this requires an X display to be set, even though one is not really needed.

Can this be changed?

Cheers,
Tischi

I might need to defer to one of our software engineers on this question, but we use Xvfb (a virtual X server) on our Unix servers for this purpose. You can look at the python-2.6.sh file in the SVN repository for more details on how we interface with it.
-Mark
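
A minimal sketch of the Xvfb approach, written as a Python wrapper rather than a shell script; it does not reproduce the contents of python-2.6.sh. It assumes the `Xvfb` binary is installed and that display `:99` is free, both of which you may need to adjust on your cluster. The helper names are hypothetical, and the command-building parts are separate so they can be checked without starting a server.

```python
import os
import subprocess
import time


def xvfb_command(display=":99", screen="1024x768x24"):
    """Build the command line that starts a virtual X server on display."""
    return ["Xvfb", display, "-screen", "0", screen]


def env_with_display(display=":99", base=None):
    """Return a copy of the environment (os.environ by default) with DISPLAY set."""
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    return env


def run_with_virtual_display(args, display=":99"):
    """Start Xvfb, run args with DISPLAY pointing at it, then stop the server.

    Returns the exit code of the command in args.
    """
    server = subprocess.Popen(xvfb_command(display))
    try:
        time.sleep(1)  # crude wait for the X server to come up
        return subprocess.call(args, env=env_with_display(display))
    finally:
        server.terminate()
        server.wait()
```

You would pass the headless CellProfiler command you already use on the cluster as `args`; the virtual server is shut down even if that command fails.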