Cell Painting: Error when trying to generate per well profiles

cellprofiler-analyst

#1

Hello all,

I am attempting to generate per well profiles from a Cell Painting experiment. The per object data is in an SQLite database.
I am following this protocol for the script: https://images.nature.com/full/nature-assets/nprot/journal/v11/n9/extref/nprot.2016.105-S1.pdf

I have very limited experience with scripting in general and am completely unfamiliar with Python. I’m working in a Windows environment. So far I have installed all the packages, created the cellpntg environment, and installed CellProfiler-Analyst as a git clone. I think the error I am running into has to do with the export PYTHONPATH=pwd line. Since this is Windows, I changed it to set PYTHONPATH=cd, but nothing seems to happen after I type it in.

When I go to the next step to create the cache within the code file I get the following error: C:\pathstring\Miniconda3\envs\cellpntg\python.exe: No module named cpa.profiling found.

Any help would be appreciated!


#2

Hi,

I believe all of this has to be done in Python 2, not Python 3; can you try downloading Miniconda2 and see if that helps?


#3

I’ve tried this and it still gives the same error. I’m pretty sure the issue is setting the PYTHONPATH variable.

Would the simplest workaround be to make a Linux partition?


#4

I made a very dumb pathing error with where I was placing my code folder :confounded:
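For anyone else who hits the "No module named cpa.profiling" message, here is a rough sketch of a check that may help. It assumes the CellProfiler-Analyst clone contains a cpa/profiling folder and that the clone's top-level directory is what should be on PYTHONPATH. The helper names find_package_dir and pythonpath_entries are made up for this example. Note that in the Windows command prompt, set PYTHONPATH=cd stores the literal text "cd"; the current directory is written %cd%.

```python
import os
import sys


def pythonpath_entries(environ=None):
    """Split PYTHONPATH into its individual directories (empty entries dropped)."""
    environ = os.environ if environ is None else environ
    return [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]


def find_package_dir(package, search_paths):
    """Return the first directory in search_paths that contains the dotted
    package as a folder (e.g. cpa/profiling), or None if none does."""
    relative = os.path.join(*package.split("."))
    for base in search_paths:
        if base and os.path.isdir(os.path.join(base, relative)):
            return base
    return None


if __name__ == "__main__":
    entries = pythonpath_entries()
    print("PYTHONPATH entries: %s" % entries)
    # None here means Python cannot see the package from any listed directory.
    print("cpa.profiling found under: %s"
          % find_package_dir("cpa.profiling", entries + sys.path))
```

If the second line prints None, the directory that holds the cpa folder is not on the path, which would be consistent with a misplaced code folder.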

The cache generation step seemed to work, but at the end I received the following message:

“Not performing normalization because not predicate was specified.”

I went on to the next step and received the following error:

Traceback (most recent call last):
  File "/home/ipavli2/miniconda2/envs/cellpntg/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/ipavli2/miniconda2/envs/cellpntg/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/media/ipavli2/OS/Users/pavlinovia/CellProfiler-Analyst/cpa/profiling/profile_mean.py", line 153, in <module>
    full_group_header=options.full_group_header)
  File "/media/ipavli2/OS/Users/pavlinovia/CellProfiler-Analyst/cpa/profiling/profile_mean.py", line 100, in profile_mean
    variables = normalization(cache).colnames
  File "/media/ipavli2/OS/Users/pavlinovia/CellProfiler-Analyst/cpa/profiling/normalization.py", line 66, in colnames
    for col, keep in zip(self.cache.colnames, self._colmask)
  File "/media/ipavli2/OS/Users/pavlinovia/CellProfiler-Analyst/cpa/profiling/normalization.py", line 55, in _colmask
    self._cached_colmask = np_load(self._colmask_filename)
  File "/media/ipavli2/OS/Users/pavlinovia/CellProfiler-Analyst/cpa/profiling/normalization.py", line 18, in np_load
    with open(filename, 'rb') as f:
IOError: [Errno 2] No such file or directory: '…/input/cache/robust_std/colmask.npy'

I checked the cache and there were 3456 archives but no file called robust_std. I’m guessing I’m missing something from the previous step with regards to normalization.


#5

Never mind, I had just skipped a step!

In the normalization step I am getting the following error:

WARNING:root:PROPERTIES WARNING (channels_per_image): No value(s) specified. CPA will assume 1 channel per image.
INFO:root:PROPERTIES: Using default image_buffer_size=1
INFO:root:PROPERTIES: Using default tile_buffer_size=1
WARNING:root:PROPERTIES WARNING (well_format): Field was not defined, using default format of "A01".
INFO:root:[MainThread] Connecting to the database…
INFO:root:[MainThread] SQLite file: …/input/CellPainting.db
Traceback (most recent call last):
  File "/home/ipavli2/miniconda2/envs/cellpntg/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/ipavli2/miniconda2/envs/cellpntg/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/media/ipavli2/OS/Users/pavlinovia/CellProfiler-Analyst/cpa/profiling/normalization.py", line 266, in <module>
    normalizer._create_cache(predicate)
  File "/media/ipavli2/OS/Users/pavlinovia/CellProfiler-Analyst/cpa/profiling/normalization.py", line 81, in _create_cache
    self._create_cache_params(predicate, resume)
  File "/media/ipavli2/OS/Users/pavlinovia/CellProfiler-Analyst/cpa/profiling/normalization.py", line 141, in _create_cache_params
    controls = self._get_controls(predicate)
  File "/media/ipavli2/OS/Users/pavlinovia/CellProfiler-Analyst/cpa/profiling/normalization.py", line 90, in _get_controls
    cpa.properties.image_table, predicate)):
  File "/media/ipavli2/OS/Users/pavlinovia/CellProfiler-Analyst/cpa/dbconnect.py", line 64, in fn
    return f(db, *args, **kwargs)
  File "/media/ipavli2/OS/Users/pavlinovia/CellProfiler-Analyst/cpa/dbconnect.py", line 520, in execute
    '\nSecond exception was: %s'%(connID, query, e, e2))
cpa.dbconnect.DBException: ERROR: Database query failed for connection "MainThread" and failed to reconnect
Query was: "select distinct Image_Metadata_Plate, ImageNumber from Per_Image where "
First exception was: near " ": syntax error
Second exception was: ERROR: Database query failed for connection "MainThread"
Query was: "select distinct Image_Metadata_Plate, ImageNumber from Per_Image where "
Exception was: near " ": syntax error


#6

Hi,

I’m double checking with my team, but based on that error snippet I’m guessing that the workflow may not be compatible with SQLite, only with MySQL (the query syntaxes are slightly different). I’m sorry, I had no idea; I’ve never seen anyone try to run it with SQLite before! Unfortunately, this sentence from the Bray et al. paper makes me think it’s likely to be true:

However, the scripts provided to generate per-well profiles from the extracted features are MySQL-only; see the Equipment section for a link to the Python scripts used in this protocol.

Internally, we don’t typically use that paper’s workflow anymore; we use an R package called cytominer that our team has been working on. A vignette on how to use cytominer to do the same aggregation and normalization steps as the Bray et al. workflow, but on an SQLite database, can be found here. Otherwise, your alternatives would be to use MySQL instead of SQLite (something your IT department may be able to help you set up) or to adjust the CPA code to use SQLite instead of MySQL syntax. Since you’ve said you’re not very familiar with coding, I don’t recommend the latter.

If I find out that someone sees a different problem in the error you’ve posted so far, I’ll be sure to let you know!


#7

Thanks for the quick reply!

I think another problem will be that I don’t have my controls defined in the metadata in the .properties file. I will try to set that and see if it corrects the issue.

Will definitely look into both of those options (MySQL and cytominer).


#8

If you wanted to test with absolute certainty whether SQLite was the cause, you could download the sample images, illumination correction functions, and analysis pipeline from the paper, set the output to ExportToDatabase->SQLite, and then try to run the workflow on the resulting file (which should have all the specifications necessary to work). Defining your controls is definitely also going to be necessary, though!


#9

So it ended up working with the SQLite database, I just needed to define my controls. Or at least I think it worked.
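In case it helps someone later, here is a small sketch of why the earlier query failed. The query in the log ends right after "where", which suggests the predicate that selects control images was empty. The example uses Python's built-in sqlite3 module with an in-memory table. The Image_Metadata_Role column and the control_images helper are hypothetical names made up for this illustration; they are not CPA's actual code or schema.

```python
import sqlite3


def control_images(conn, predicate):
    """Run a query shaped like the one in the log; the predicate is appended as-is."""
    query = ("select distinct Image_Metadata_Plate, ImageNumber "
             "from Per_Image where " + predicate)
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
# Image_Metadata_Role is an assumed column marking which images are controls.
conn.execute("create table Per_Image ("
             "ImageNumber integer, Image_Metadata_Plate text, Image_Metadata_Role text)")
conn.executemany("insert into Per_Image values (?, ?, ?)",
                 [(1, "P1", "control"), (2, "P1", "treated"), (3, "P2", "control")])

# With no predicate, the SQL stops after "where" and SQLite rejects it.
try:
    control_images(conn, "")
    empty_predicate_failed = False
except sqlite3.OperationalError:
    empty_predicate_failed = True

# With a predicate that identifies the controls, the same query runs.
controls = sorted(control_images(conn, "Image_Metadata_Role = 'control'"))
print(empty_predicate_failed, controls)
```

The point is only that the failure comes from the missing condition after "where" and not from SQLite itself, which matches what happened once the controls were defined.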

I did run into another issue, when running the normalization step it gave me this error:

TypeError: The numpy boolean negative, the - operator, is not supported, use the ~ operator or the logical_not function instead.

I fixed this by reinstalling numpy as an older version (1.12.1).
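For reference, a minimal sketch of what that TypeError is about, assuming a newer numpy than 1.12.1 is installed. Newer releases refuse the unary minus on a boolean array and ask for ~ or logical_not, as the message says. The array below is made-up example data, not anything from the CPA code.

```python
import numpy as np

mask = np.array([True, False, True])

# Unary minus on a boolean array raises TypeError on newer numpy releases;
# per this thread, 1.12.1 still accepted it.
try:
    negated = -mask
except TypeError as err:
    negated = None
    print("unary minus failed: %s" % err)

# Both of these invert a boolean mask and are the suggested replacements.
inverted = ~mask
also_inverted = np.logical_not(mask)
print(inverted.tolist())
```

So the alternative to downgrading would be replacing the - with ~ wherever the code negates a boolean mask, though I have not tried that in the CPA scripts.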

This seemed to fix everything. However, at the end of the profile generation it gave me this message:

FutureWarning: comparison to None will result in an elementwise object comparison in the future.
rowmask = [(l != None) and all(~np.isnan(l)) for l in data]

I think this is due to the fact that I left all the other wells defined as null, but I don’t have any idea about what it means!


#10

The Future Warning is related to https://stackoverflow.com/questions/33954216/comparison-to-none-will-result-in-an-elementwise-object
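As a rough sketch of what the warning refers to: comparing a numpy array to None with != is what triggers it, while an identity check with "is not None" does not. The data list below is invented for illustration (one complete row, one missing row, one row containing NaN) and the line is a variation on the rowmask expression in the warning, not the actual CPA source.

```python
import numpy as np

# Hypothetical per-well rows: a complete profile, a missing one, one with NaN.
data = [np.array([1.0, 2.0]), None, np.array([np.nan, 3.0])]

# "is not None" tests identity, so numpy never attempts an elementwise comparison.
rowmask = [(row is not None) and bool(np.all(~np.isnan(row))) for row in data]
print(rowmask)
```

Rows that are None or contain NaN end up masked out, which is the same intent as the original line.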

Are you able to generate profiles now?

As Beth pointed out, I’d encourage you to try out https://github.com/cytomining/cytominer for profiling analysis.


#11

Yep, it generated the profiles csv!

And I definitely will, but as with all of this it’s slow going for me :laughing: