CellProfiler memory issue? Linux Server

Hiya,
I am trying to run my pipeline on a Linux Ubuntu 16.04 server with 120 GB RAM and 32 workers to analyze ~38 GB of images (each image about 900 MB) from different treatments, and to create a CPA file so I can further analyse the results in CellProfiler Analyst.

At the beginning I tried with 32 workers, but after a while CP got stuck/quit.
I decreased to 16 workers. Initially CP indicated it would take 6h46, but the next day it was only at ~3000/8000 image sets and indicated it would take 55h, and at some point it quit…
I have been trying for a while now: I changed the temp folder to one on my data drive (for the intermediate hdf5 files) with 190 GB free, changed the number of workers, etc., and am at my wit’s end!

Should I do a smaller subset of images? Though that would prevent me from analyzing them side-by-side with CPA… (though there are surely database tricks??)
Any help would be useful!

Pipeline does:

  • UNET-based nuclei segmentation
  • nuclei measurements (all + zernike)
  • nuclei tracking
  • spot in nuclei detection
  • spot measurements (all of them)
  • classification using rulesets (that include zernike features hence present in measurements)
  • overlay outlines
  • save different images: nuclei + spots, classes, tracking ID
  • export to CSV
  • export to database

The entire pipeline works on a smaller subset of test images, but I get some warnings at startup and during analysis (see below).

tagging @bcimini @DStirling @anyonewhocouldhelp

A couple of warnings I get:

(cellprofiler:22678): Gdk-WARNING **: gdk_window_set_icon_list: icons too large
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/wx-3.0-gtk2/wx/lib/buttons.py", line 148, in GetDefaultAttributes
    return wx.Button.GetClassDefaultAttributes()
  File "/usr/lib/python2.7/dist-packages/wx-3.0-gtk2/wx/_controls.py", line 247, in GetClassDefaultAttributes
    return _controls_.Button_GetClassDefaultAttributes(*args, **kwargs)
wx._core.PyAssertionError: C++ assertion "Assert failure" failed at ../src/gtk/window.cpp(4096) in GTKSendPaintEvents(): unsupported background style
/home/dolivierhub/CellProfiler/cellprofiler/gui/moduleview.py:2091: wxPyDeprecationWarning: Call to deprecated item. Use `SetInitialSize`
  grid.SetBestFittingSize(v.min_size)
/home/dolivierhub/CellProfiler/cellprofiler/utilities/hdf5_dict.py:539: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int64 == np.dtype(int).type`.
  np.issubdtype(hdf5_type, int) or
/home/dolivierhub/CellProfiler/cellprofiler/utilities/hdf5_dict.py:541: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  hdf5_type_is_float = np.issubdtype(hdf5_type, float)

STARTUP errors:

dolivierhub@ficoides:~$ cellprofiler
/home/dolivierhub/.local/lib/python2.7/site-packages/requests/__init__.py:83: RequestsDependencyWarning: Old version of cryptography ([1, 2, 3]) may cause slowdown.
  warnings.warn(warning, RequestsDependencyWarning)
Gtk-Message: Failed to load module "gail"
Gtk-Message: Failed to load module "atk-bridge"
Gtk-Message: Failed to load module "gail"
Gtk-Message: Failed to load module "atk-bridge"
Could not load runimagej
Traceback (most recent call last):
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 325, in add_module
    m = __import__(mod, globals(), locals(), ['__all__'], 0)
  File "/home/dolivierhub/CellProfiler-plugins/runimagej.py", line 11, in <module>
    import imagej
  File "/home/dolivierhub/.local/lib/python2.7/site-packages/imagej/__init__.py", line 1, in <module>
    from .imagej import *
  File "/home/dolivierhub/.local/lib/python2.7/site-packages/imagej/imagej.py", line 10, in <module>
    import jnius_config
ImportError: No module named jnius_config
Could not load DoGNet
Traceback (most recent call last):
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 325, in add_module
    m = __import__(mod, globals(), locals(), ['__all__'], 0)
  File "/home/dolivierhub/CellProfiler-plugins/DoGNet.py", line 6, in <module>
    import dognet
ImportError: No module named dognet
Could not load transformfilters
Traceback (most recent call last):
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 326, in add_module
    cp_module = find_cpmodule(m)
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 313, in find_cpmodule
    raise ValueError("Could not find cellprofiler.module.Module class in %s" % m.__file__)
ValueError: Could not find cellprofiler.module.Module class in /home/dolivierhub/CellProfiler-plugins/transformfilters.pyc
Could not load save_16bit_pngs
Traceback (most recent call last):
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 325, in add_module
    m = __import__(mod, globals(), locals(), ['__all__'], 0)
  File "/home/dolivierhub/CellProfiler-plugins/save_16bit_pngs.py", line 11, in <module>
    import imageio
ImportError: No module named imageio
Could not load enhancedmeasuretexture
Traceback (most recent call last):
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 325, in add_module
    m = __import__(mod, globals(), locals(), ['__all__'], 0)
  File "/home/dolivierhub/CellProfiler-plugins/enhancedmeasuretexture.py", line 124, in <module>
    import calculatemoments as cpmoments
  File "/home/dolivierhub/CellProfiler-plugins/calculatemoments.py", line 38, in <module>
    import cellprofiler.cpimage as cpi
ImportError: No module named cpimage
Could not load calculatemoments
Traceback (most recent call last):
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 325, in add_module
    m = __import__(mod, globals(), locals(), ['__all__'], 0)
  File "/home/dolivierhub/CellProfiler-plugins/calculatemoments.py", line 38, in <module>
    import cellprofiler.cpimage as cpi
ImportError: No module named cpimage
Could not load rescale_mode_percentile
Traceback (most recent call last):
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 325, in add_module
    m = __import__(mod, globals(), locals(), ['__all__'], 0)
  File "/home/dolivierhub/CellProfiler-plugins/rescale_mode_percentile.py", line 11, in <module>
    import statistics
ImportError: No module named statistics
Could not load identifylinearobjects
Traceback (most recent call last):
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 325, in add_module
    m = __import__(mod, globals(), locals(), ['__all__'], 0)
  File "/home/dolivierhub/CellProfiler-plugins/identifylinearobjects.py", line 14, in <module>
    import cellprofiler.cpmodule as cpm
ImportError: No module named cpmodule
Could not load transform
Traceback (most recent call last):
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 325, in add_module
    m = __import__(mod, globals(), locals(), ['__all__'], 0)
  File "/home/dolivierhub/CellProfiler-plugins/transform.py", line 31, in <module>
    import cellprofiler.cpimage as cpi
ImportError: No module named cpimage
Using TensorFlow backend.
Could not load calculatehistogram
Traceback (most recent call last):
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 325, in add_module
    m = __import__(mod, globals(), locals(), ['__all__'], 0)
  File "/home/dolivierhub/CellProfiler-plugins/calculatehistogram.py", line 18, in <module>
    import cellprofiler.cpimage as cpi
ImportError: No module named cpimage
Could not load measureimagefocus
Traceback (most recent call last):
  File "/home/dolivierhub/CellProfiler/cellprofiler/modules/__init__.py", line 325, in add_module
    m = __import__(mod, globals(), locals(), ['__all__'], 0)
  File "/home/dolivierhub/CellProfiler-plugins/measureimagefocus.py", line 8, in <module>
    import microscopeimagequality.miq
  File "/home/dolivierhub/.local/lib/python2.7/site-packages/microscopeimagequality/miq.py", line 14, in <module>
    import tensorflow.contrib.slim
  File "/home/dolivierhub/.local/lib/python2.7/site-packages/tensorflow/contrib/__init__.py", line 47, in <module>
    from tensorflow.contrib import image
  File "/home/dolivierhub/.local/lib/python2.7/site-packages/tensorflow/contrib/image/__init__.py", line 70, in <module>
    from tensorflow.contrib.image.python.ops.single_image_random_dot_stereograms import single_image_random_dot_stereograms
  File "/home/dolivierhub/.local/lib/python2.7/site-packages/tensorflow/contrib/image/python/ops/single_image_random_dot_stereograms.py", line 27, in <module>
    "_single_image_random_dot_stereograms.so"))
  File "/home/dolivierhub/.local/lib/python2.7/site-packages/tensorflow/contrib/util/loader.py", line 56, in load_op_library
    ret = load_library.load_op_library(path)
  File "/home/dolivierhub/.local/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 73, in load_op_library
    exec(wrappers, module.__dict__)
  File "<string>", line 27
    def single_image_random_dot_stereograms(depth_values, hidden_surface_removal=True, convergence_dots_size=8, dots_per_inch=72, eye_separation=2,5, mu=0,333299994, normalize=True, normalize_max=-100, normalize_min=100, border_level=0, number_colors=256, output_image_shape=[1024, 768, 1], output_data_window=[1022, 757], name=None):
                                                                                                                                                   ^
SyntaxError: invalid syntax
/home/dolivierhub/CellProfiler-plugins/cellstar/utils/debug_util.py:38: UserWarning: 
This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

The backend was *originally* set to 'WXAgg' by the following code:
  File "/home/dolivierhub/.local/bin/cellprofiler", line 11, in <module>
    load_entry_point('CellProfiler', 'console_scripts', 'cellprofiler')()
  File "/home/dolivierhub/CellProfiler/cellprofiler/__main__.py", line 162, in main
    app = cellprofiler.gui.app.App(0, workspace_path=workspace_path, pipeline_path=pipeline_path)
  File "/home/dolivierhub/CellProfiler/cellprofiler/gui/app.py", line 55, in __init__
    super(App, self).__init__(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/wx-3.0-gtk2/wx/_core.py", line 8628, in __init__
    self._BootstrapApp()
  File "/usr/lib/python2.7/dist-packages/wx-3.0-gtk2/wx/_core.py", line 8196, in _BootstrapApp
    return _core_.PyApp__BootstrapApp(*args, **kwargs)
  File "/home/dolivierhub/CellProfiler/cellprofiler/gui/app.py", line 58, in OnInit
    import cellprofiler.gui.cpframe
  File "/home/dolivierhub/CellProfiler/cellprofiler/gui/cpframe.py", line 7, in <module>
    import cellprofiler.gui.figure
  File "/home/dolivierhub/CellProfiler/cellprofiler/gui/figure.py", line 20, in <module>
    import matplotlib.backends.backend_wxagg
  File "/home/dolivierhub/.local/lib/python2.7/site-packages/matplotlib/backends/__init__.py", line 17, in <module>
    line for line in traceback.format_stack()


  matplotlib.use('Agg')
could not load these modules: runimagej,DoGNet,transformfilters,save_16bit_pngs,enhancedmeasuretexture,calculatemoments,rescale_mode_percentile,identifylinearobjects,transform,calculatehistogram,measureimagefocus

(cellprofiler:22678): Gdk-WARNING **: gdk_window_set_icon_list: icons too large



At the beginning:
[image]

After 20h:
[image]

Resources (sometimes not released even after CP is no longer running):
[image]

I definitely do not recommend trying to analyze such a large batch of images all at once in the GUI. You’d be better off running in smaller batches (ideally headless, but in the GUI if you really don’t want to run headless) and then concatenating all your data files at the end into one larger database.
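
For the "concatenating all your data files" step, here is a minimal sketch in Python 3, assuming each batch wrote a CSV with the same columns. The function name `merge_csvs` and the file names in the usage note are hypothetical, not part of CellProfiler. It only stacks rows: it does not renumber ImageNumber or ObjectNumber, so it assumes those values are already unique across batches.

```python
import csv


def merge_csvs(csv_paths, merged_path):
    """Concatenate CSV files that share one header; return the number of data rows written."""
    header = None
    rows_written = 0
    with open(merged_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in csv_paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip completely empty files
                if header is None:
                    # First non-empty file defines the header, written once.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    # Refuse to mix files with different columns.
                    raise ValueError("Header mismatch in %s" % path)
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Something like `merge_csvs(["batch_0001/Nuclei.csv", "batch_0002/Nuclei.csv"], "Nuclei_all.csv")` would then give one table per object type, which could be imported into a single database afterwards.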

thanks! guess i’ll be learning how to unify databases!

as a rule of thumb, how many files can be run in headless mode? ish…

could i run all of them headless or would smaller batches be required too?

cheers

debbi

could i run all of them headless or would smaller batches be required too?

Running headless runs one image at a time, so you’d presumably want to batch them. We nearly always batch down to the smallest possible batch (in your case, one timelapse movie) to make it maximally parallel, but you could break it into 8, 10, 16, or however many batches.
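
As a rough sketch of the batching idea in Python 3: the helper below splits N image sets into fixed-size ranges and builds one headless command per range, using CellProfiler's `-c` (no GUI), `-r` (run the pipeline), `-p`, `-i`, `-o`, `-f` (first image set) and `-l` (last image set) options. The function names, paths and batch size are made up for illustration. Fixed-size ranges are a simplification: for tracked timelapse data the boundaries would need to fall on movie boundaries instead.

```python
import os


def batch_ranges(n_image_sets, batch_size):
    """Return inclusive, 1-based (first, last) ranges covering image sets 1..n_image_sets."""
    if n_image_sets < 0 or batch_size < 1:
        raise ValueError("need n_image_sets >= 0 and batch_size >= 1")
    return [
        (first, min(first + batch_size - 1, n_image_sets))
        for first in range(1, n_image_sets + 1, batch_size)
    ]


def headless_commands(pipeline, input_dir, output_root, n_image_sets, batch_size):
    """Build one CellProfiler command line (as an argument list) per batch."""
    commands = []
    ranges = batch_ranges(n_image_sets, batch_size)
    for index, (first, last) in enumerate(ranges, start=1):
        # Each batch writes to its own folder so outputs do not overwrite each other.
        output_dir = os.path.join(output_root, "batch_%04d" % index)
        commands.append([
            "cellprofiler", "-c", "-r",
            "-p", pipeline,
            "-i", input_dir,
            "-o", output_dir,
            "-f", str(first),
            "-l", str(last),
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical file names; 8000 image sets in batches of 1000.
    for cmd in headless_commands("pipeline.cppipe", "images", "out", 8000, 1000):
        print(" ".join(cmd))
```

The printed lines can be run one after another, or several at a time if memory allows.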

How many you can run at a time depends on your access to CPUs and memory, and your pipeline’s requirements for each, so I can’t give truly general recommendations; since you said 16 workers at a time did not bork your available memory, but 32 did, I’d say your magic number is between 16 and 32 for your setup.
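
To put rough numbers on that, here is a back-of-the-envelope estimate in Python 3: the number of parallel workers is capped both by the CPU count and by how many per-worker memory peaks fit in RAM. The per-worker figure and the reserve are assumptions you would have to measure for your own pipeline (for example by watching one worker in `top`); the values below are placeholders, not measurements.

```python
def max_workers(total_ram_gb, per_worker_gb, cpu_count, reserve_gb=8.0):
    """Estimate how many workers fit, limited by both memory and CPUs."""
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    # Leave some RAM for the OS and other processes.
    usable_gb = total_ram_gb - reserve_gb
    by_memory = int(usable_gb // per_worker_gb)
    # Never return fewer than one worker.
    return max(1, min(cpu_count, by_memory))


if __name__ == "__main__":
    # Placeholder: 120 GB RAM, 32 CPUs, an assumed peak of 6 GB per worker.
    print(max_workers(120, 6, 32))
```

If the measured peak per worker turns out much higher than expected, the memory limit rather than the CPU count decides how many workers can run.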

It didn’t work with 32 workers. It turns out 4 workers and 3 files (i.e. ~3 GB) works; I’m now trying 6 files with the same amount of workers.
However, I get this error more frequently:

File "/home/dolivierhub/CellProfiler/cellprofiler/gui/moduleview.py", line 4180, in GetAttr
    attr.IncRef()  # OH so bogus, don't refcount = bus error
  File "/usr/lib/python2.7/dist-packages/wx-3.0-gtk2/wx/_core.py", line 852, in IncRef
    return _core_.RefCounter_IncRef(*args, **kwargs)
wx._core.PyAssertionError: C++ assertion "Assert failure" failed at ../src/gtk/window.cpp(4096) in GTKSendPaintEvents(): unsupported background style

Any idea what that error means? It also takes longer to process an image at that stage, though RAM is only at 72 out of 125 GB available.
Is there a log file somewhere so we can check what went wrong when it crashes? I couldn’t find one!

If anyone has the patience to explain how to run headless, I would be grateful.
Cheers
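
On the headless and log-file questions, one possible approach, sketched in Python 3: launch CellProfiler without the GUI (`-c`), tell it to run the pipeline (`-r`), and redirect everything it prints into a file you choose, so there is something to read after a crash. I am not assuming CellProfiler writes a log file by itself; `run_logged` is a hypothetical helper and the file names are placeholders.

```python
import subprocess


def run_logged(command, log_path):
    """Run a command, write its stdout and stderr to log_path, return the exit code."""
    with open(log_path, "w") as log_file:
        completed = subprocess.run(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge errors into the same log
        )
    return completed.returncode


if __name__ == "__main__":
    # Placeholder paths; -p pipeline file, -i input folder, -o output folder.
    exit_code = run_logged(
        ["cellprofiler", "-c", "-r",
         "-p", "pipeline.cppipe", "-i", "images", "-o", "out"],
        "cellprofiler_run.log",
    )
    print("CellProfiler exited with code", exit_code)
```

A non-zero exit code together with the last lines of the log is usually the most useful thing to post when asking what went wrong.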