ExportToDatabase sometimes producing an "empty" database

I’m using CellProfiler to make some measurements and the ExportToDatabase module to export the results (as an SQLite database) for use with CellProfiler Analyst. I’ve found that the ExportToDatabase module will produce an “empty” (for lack of a better word) database if I check either or both of:

“Calculate the per-image median values of object measurements?”
“Calculate the per-image standard deviation values of object measurements?”

By “empty” (see github.com/CellProfiler/CellPro … /issues/38) I mean that if I then open the database with CellProfiler Analyst using the generated *.properties file and try to fetch objects in the Classifier, I receive the following error:

[code]An error occurred in the program:
ValueError: empty range for randrange() (1,1, 0)

Traceback (most recent call last):
File "classifier.pyc", line 581, in OnFetch
File "datamodel.pyc", line 119, in GetRandomObjects
File "datamodel.pyc", line 97, in GetRandomObject
File "random.pyc", line 241, in randint
File "random.pyc", line 217, in randrange[/code]

A ValueError is also raised if I use any of the other tools that are based on per-object measurements (ScatterPlot, Histogram, etc.). If both of those options are set to “No” then I get a database that I can use, and no ValueErrors are raised.
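
For anyone wanting to confirm the same symptom outside of CellProfiler Analyst, here is a minimal sketch that lists the row count of every table in the exported SQLite file. It only uses Python's standard sqlite3 module and reads the table names from the file itself, so it makes no assumption about what ExportToDatabase calls its tables. The function name is my own, not part of CellProfiler.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in an SQLite file.

    Hypothetical helper for illustration; not part of CellProfiler or CPA.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every table stored in the database file.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names still work.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                "SELECT COUNT(*) FROM " + quoted).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Call it with the path to the database file that ExportToDatabase wrote. A per-object table that reports zero rows would be consistent with the randrange error above, since the Classifier then has no objects to sample from.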

I’ve attached the pipeline file and two example images. I’m using the 2.1.2 64-bit Windows trunk build (r20150327144625) on Windows 8.1. I don’t need CellProfiler to calculate the per-image values, but in working with the program I spent a good amount of time tracking down why the exported databases didn’t contain any information so I figured I would share. :smiley:
ExtractFeatures.cpproj (654 KB)



Thank you for reporting this. And for debugging it!
I have added a Github issue here: github.com/CellProfiler/CellPro … ssues/1340

Thanks,
David

Hi David,

Thanks for filing a bug report about the issue I ran into. The thread on GitHub has helped me narrow down what I think the problem may be, as I’ve since run into the same issue again but in a different context.

I’m working with a different CP pipeline that extracts measurements for use with CPA. I’ve since upgraded CP to the latest trunk build (r20150706201038) and receive the following error when I run my pipeline:

[code]Traceback (most recent call last):
File "cellprofiler\pipeline.pyc", line 2127, in prepare_run
File "cellprofiler\modules\exporttodatabase.pyc", line 1649, in prepare_run
File "cellprofiler\modules\exporttodatabase.pyc", line 2198, in create_database_tables
File "cellprofiler\modules\exporttodatabase.pyc", line 262, in execute
OperationalError: too many columns in result set[/code]

From what I can gather, this error is basically telling me that I’m trying to export too many measurements into a SQLite database and running into the 2000 column limit (https://www.sqlite.org/limits.html) as Lee pointed out on GitHub. Would this be correct?

If so, is there a way to find out the total number of measurements I’m making so I can find an appropriate way to load them into CPA? I’ve thought about compiling sqlite3 with an increased column limit but that seems like a sub-optimal solution. I’m guessing maybe using a MySQL database instead could be another way to increase the limit but I’d rather figure out how many measurements I’m making before going through the hoops of setting that up.

Thanks in advance!
Jon

Hi Jon,

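Yes, indeed: that error means you are running into SQLite's column limit.

As for finding out the total number of measurements: hmm, not directly, afaik. And yes, I think the best solution going forward would be to switch databases (MySQL is what we use at the Broad).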
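
One indirect route, offered as a sketch rather than a supported workflow: if a cut-down version of the pipeline (say, with some measurement modules disabled) does export successfully, you can count the columns of each table in that SQLite file and see how close you are to the limit. The code below uses only Python's standard sqlite3 module. The helper names are made up for illustration, and the default of 2000 is SQLite's documented default SQLITE_MAX_COLUMN, which a custom build may have changed.

```python
import sqlite3

# Default SQLITE_MAX_COLUMN according to sqlite.org/limits.html;
# a custom-compiled SQLite may use a different value.
SQLITE_DEFAULT_MAX_COLUMNS = 2000


def table_column_counts(db_path):
    """Return {table_name: column_count} for every table in an SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'
            # PRAGMA table_info returns one row per column of the table.
            info = conn.execute(
                "PRAGMA table_info(" + quoted + ")").fetchall()
            counts[name] = len(info)
        return counts
    finally:
        conn.close()


def tables_near_limit(counts, limit=SQLITE_DEFAULT_MAX_COLUMNS, fraction=0.9):
    """Return the sorted names of tables at or above fraction * limit columns."""
    return sorted(name for name, n in counts.items() if n >= fraction * limit)
```

Running table_column_counts on the reduced export and then re-enabling modules one at a time should show which measurements contribute the most columns.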

I would start by looking at your ExportToDatabase settings. Do you have/need per-well means/medians/std? These will double (or more) your measurements. It seems silly (though I have done this in the past!), but you can also shorten the names of your images and objects, at least as a test. If you are not already doing so, export to multiple object tables, so as to spread out the object measurement columns.
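
To make the "double (or more)" point concrete, here is a rough back-of-envelope sketch. It assumes that each enabled aggregate (mean, median, standard deviation) adds one column per object measurement to the aggregated table; that is my reading of how the aggregate options behave, not something checked against the ExportToDatabase source. The measurement counts are invented for the example.

```python
def estimate_aggregate_columns(n_image_measurements, n_object_measurements,
                               mean=True, median=False, stdev=False):
    """Estimate the column count of a table holding aggregated measurements.

    Assumption: every enabled aggregate contributes one column per
    object measurement, on top of the image-level measurements.
    """
    n_aggregates = sum([mean, median, stdev])  # True counts as 1
    return n_image_measurements + n_object_measurements * n_aggregates


# Hypothetical pipeline: 100 image measurements, 800 object measurements.
print(estimate_aggregate_columns(100, 800))                            # 900
print(estimate_aggregate_columns(100, 800, median=True, stdev=True))   # 2500
```

Under these assumed numbers, the mean alone stays well below 2000 columns, while turning on all three aggregates would push the table past SQLite's default limit.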

Let us know how it goes.
David