CellProfiler NamesAndTypes Performance Question

Hi guys,

I have a question related to the NamesAndTypes module.

When I load a single CZI file (200 frames, 2 channels, 16 MB) and use NamesAndTypes to assign the channels based on the metadata, the pipeline runs really slowly. The output of CP shows:

Worker 1: Thu Feb 6 13:50:42 2014: Image # 200, module NamesAndTypes # 3: 5.77 secs

When I split the CZI into two channels and save those as single images, I end up with 400 single TIFFs. The same pipeline (adapted to read those TIFFs) now runs much faster. One line of the output is:

Worker 1: Thu Feb 6 14:11:14 2014: Image # 200, module NamesAndTypes # 3: 0.20 secs
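To put the two log lines side by side, here is a rough sketch that pulls the module time out of each line and extrapolates it over all 200 image sets. It assumes every image set takes about as long as the one quoted, which the log excerpts alone do not confirm; the helper name `module_seconds` is made up for this illustration.

```python
import re

# Pattern for a worker log line in the form quoted in this post.
LOG_LINE = re.compile(
    r"Image # (?P<image>\d+), module (?P<module>\w+) # (?P<index>\d+): "
    r"(?P<secs>\d+(?:\.\d+)?) secs"
)


def module_seconds(line):
    """Return (module name, seconds) from one log line, or None if it does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    return match.group("module"), float(match.group("secs"))


czi_line = "Worker 1: Thu Feb 6 13:50:42 2014: Image # 200, module NamesAndTypes # 3: 5.77 secs"
tiff_line = "Worker 1: Thu Feb 6 14:11:14 2014: Image # 200, module NamesAndTypes # 3: 0.20 secs"

IMAGE_SETS = 200  # frames in the CZI, taken from the post

for label, line in (("CZI", czi_line), ("TIFF", tiff_line)):
    module, secs = module_seconds(line)
    # Assumption: every image set costs about the same as the quoted one.
    total = secs * IMAGE_SETS
    print("%s: %s takes %.2f s per image set, roughly %.0f s overall" % (label, module, secs, total))
```

Under that assumption NamesAndTypes alone would account for roughly 19 minutes with the single CZI, against well under a minute with the split TIFFs.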

My question is, what can I do to speed up the first way of processing the data? The splitting and saving is something I actually want to get rid of. But if the analysis itself is much slower afterwards, reading the data as a single file might be the wrong approach. Is there a reason why NamesAndTypes is much slower for the single (large) file? Is it reading in the whole thing every cycle?

Thanks for the help!

Sebastian

I’m sorry, Sebastian. There’s a defect in the caching of the open file, and CellProfiler is reparsing all 16 MB of the file on each image set. You’re only parsing two images instead of 200 when you break it into pieces. I’ve entered an issue into our tracking system for this:
github.com/CellProfiler/CellPro … ssues/1048
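For readers who want to see the difference between reparsing and caching, here is a minimal sketch. It is not CellProfiler's code and not the fix tracked in the issue; `CountingParser`, `CachedReader`, `run_uncached` and `run_cached` are hypothetical names, and the "file" is a fake in-memory structure that only counts how often it gets parsed.

```python
class CountingParser:
    """Stand-in for an expensive container parser (hypothetical, not the real reader)."""

    def __init__(self, n_frames=200, n_channels=2):
        self.n_frames = n_frames
        self.n_channels = n_channels
        self.parse_calls = 0

    def parse(self, path):
        # Each call represents walking the whole multi-frame file once.
        self.parse_calls += 1
        planes = [(f, c) for f in range(self.n_frames) for c in range(self.n_channels)]
        return {"path": path, "planes": planes}


class CachedReader:
    """Keep the parsed file in memory so later image sets only look up their planes."""

    def __init__(self, parser):
        self._parser = parser
        self._cache = {}

    def get_plane(self, path, frame, channel):
        if path not in self._cache:
            self._cache[path] = self._parser.parse(path)
        parsed = self._cache[path]
        return parsed["planes"][frame * self._parser.n_channels + channel]


def run_uncached(parser, path):
    """Reparse the whole file once per image set, as in the defect described above."""
    for frame in range(parser.n_frames):
        parsed = parser.parse(path)
        for channel in range(parser.n_channels):
            parsed["planes"][frame * parser.n_channels + channel]


def run_cached(parser, path):
    """Parse once, then serve every image set from the cached result."""
    reader = CachedReader(parser)
    for frame in range(parser.n_frames):
        for channel in range(parser.n_channels):
            reader.get_plane(path, frame, channel)


slow = CountingParser()
run_uncached(slow, "example.czi")
fast = CountingParser()
run_cached(fast, "example.czi")
print("parses without cache:", slow.parse_calls)
print("parses with cache:", fast.parse_calls)
```

With 200 image sets the uncached loop parses the file 200 times and the cached one parses it once, which is the kind of gap the per-module timings point to.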

I’ll try to get this fixed straight away, but if it is blocking you, you might consider reverting to CellProfiler 2.0 which should handle this better.

Dear Lee,

Thanks for the update. It is fine for me to wait for the fix, since it is not really blocking me currently. But in the (not so far) future people will really appreciate the fact that CellProfiler can easily read in CZI files directly. This makes it easy for someone to set up a complete workflow, since CP can be controlled from the command line. The plan is to start CP directly from the ZEN software via its Python macros.
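As a rough idea of what such a macro could do, here is a sketch that launches CellProfiler headless from Python. It assumes the standard command-line switches (`-c` headless, `-r` run, `-p` pipeline, `-i` input folder, `-o` output folder); the executable location, the file paths and both function names are placeholders, and whether `subprocess` is available inside the ZEN macro environment is also an assumption.

```python
import subprocess


def build_cellprofiler_command(executable, pipeline, input_dir, output_dir):
    """Assemble the argument list for a headless CellProfiler run."""
    return [
        executable,
        "-c",            # run without the GUI
        "-r",            # run the pipeline on startup
        "-p", pipeline,  # pipeline or project file
        "-i", input_dir,
        "-o", output_dir,
    ]


def run_cellprofiler(executable, pipeline, input_dir, output_dir):
    """Start CellProfiler and return its exit code (0 usually means success)."""
    command = build_cellprofiler_command(executable, pipeline, input_dir, output_dir)
    return subprocess.call(command)


# Placeholder paths for illustration only; adjust to the local installation.
example_command = build_cellprofiler_command(
    "CellProfiler.exe",
    "Ratio_CZI_One_File.cpproj",
    "C:/data/input",
    "C:/data/output",
)
print(" ".join(example_command))
```

Passing the arguments as a list rather than one string avoids quoting problems when folder names contain spaces.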

Cheers, Sebastian

Hi Sebastian, it took a bit of time to get around to this, but I’ve implemented a strategy that should work pretty well for large image stacks. It should be available in the trunk build (cellprofiler.org/cgi-bin/trunk_build.cgi) for all platforms in about an hour. I’d appreciate it if you could try it on your cases and let me know if it is sufficient or if you find any problems.

–Lee

Hi Lee,

I was really busy and could not test your fix earlier, but today I did it … :smile:.

Now both pipelines run more or less equally fast. In both cases it took CP 7 min to run the complete pipeline. But I have the feeling that the processing of a single TIFF is now slower than it used to be. Or do you think 7 min of processing time is fine? The pipelines are attached.

The single TIFFs are 40 kB each (400 files) and the CZI is 16 MB. My machine runs Windows 7, 64-bit, 8 GB, with a quad-core CPU.

Cheers,

Sebi
Ratio_CZI_One_File.cpproj (239 KB)
Ratio_CZI.cpproj (614 KB)

Would you be able to zip up the TIFs and the CZI and post them, so we can test?
-Mark

Lee’s caching strategy is included in the 2.1.1 release, which should address this.
-Mark