CP3.0: error in ExportToDatabase during startup: too many columns on analysis_Per_Object

I have a pipeline (based on CellPainting) that works fine in CP2.2, but when I try to run it in CP3.0 I get an error message: “Encoundtered unrecoverable error in ExportToDatabase during startup: too many columns on analysis_Per_Object”.

I am using a SQLite database.

Appreciate you locating these errors! Can you post the pipeline (2.2 version and 3.0 version) and images needed to run it?

OK, here are two pipelines and a minimal dataset: http://cb.uu.se/~petter/downloads/CP3Debug/cellpaintinganalysis.zip


The error was in these lines:

Failed to prepare run for module ExportToDatabase
Traceback (most recent call last):
File “/Users/…/CellProfiler/distribution/macos/CellProfiler/cellprofiler/pipeline.py”, line 2085, in prepare_run
if ((not module.prepare_run(workspace)) or
File “/Users/…/CellProfiler/distribution/macos/CellProfiler/cellprofiler/modules/exporttodatabase.py”, line 1836, in prepare_run
self.create_database_tables(self.cursor, workspace)
File “/Users/…/CellProfiler/distribution/macos/CellProfiler/cellprofiler/modules/exporttodatabase.py”, line 2371, in create_database_tables
execute(cursor, statement, return_result=False)
File “/Users/…/CellProfiler/distribution/macos/CellProfiler/cellprofiler/modules/exporttodatabase.py”, line 283, in execute
OperationalError: too many columns on analysis_Per_Object

While waiting for others’ help, you can try to bypass the error by removing one of the scale at “MeasureTexture” . At the moment it measures both scale 3 and 5. Skip either of them solves the issue.


It also worked to skip some of the other measurements.

Reported as a bug here. Thanks Petter!

Are you still working on solving the initial problem and not only giving a friendly warning?

My understanding is that we can’t totally fix the issue since it’s an inherent limitation of sqlite; I can bring it up at our weekly software meeting though to see if we can come up with any better options (choosing which measurements to export or figuring out something else).

How can it be an inherent limitation of sqlite if it works in CP2.2?

How can it be an inherent limitation of sqlite if it works in CP2.2?

Because the issue isn’t really that 3.0 can’t handle as many columns as 2.2, it’s that the 3.0 version of the pipeline makes so many more measurements that you run up against the inherent SQLite column limit.

Your original 2.2 pipeline creates as table that’s ~920 columns wide (and I didn’t have the metadata CSV loaded, so probably 950 when all is said and done). There are 420 texture columns in total, so about 210 from each scale.

In 2.2 you were only measuring texture at one set of angles, an option that you don’t have in 3.0- 3.0 measures all of them. The 3.0 version of your pipeline with only one texture scale would be ~1350 columns wide with the metadata included, 780 of which are texture. So a 3.0 pipeline with 2 texture scales puts us at ~2130 columns, which is above the magic limit of 2000.

The good news is that since these 2130 columns are coming from 3 separate objects, you can use the “Single object view” rather than the “Single object table” mode in ExportToDatabase- from the help:

Single object view creates a single database view to contain the object measurements. A view is a virtual database table which can be used to package together multiple per-object tables into a single structure that is accessed just like a regular table. Choose Single object view if you want to combine multiple objects but using Single object table would produce a table that hits the database size limitations.

I do agree though that it’s likely other people will run into this (or might start running into issues if Texture gives them more than 2000 measurements for any individual object) so I’ll bring up with the team at the software meeting today based on those numbers whether or not we want to revisit the idea of going back to allowing you to choose angles and/or if we want to make major changes to how the Export modules work (which we do have planned for the long run, just not the short run).

OK, thanks for the explanation of the differences between 2.2 and 3.0.

I would also assume that it is rather likely that other people also will run into this type of problem.

We have been working with ExportToDatabase a lot lately and have a series of improvements in mind. I have opened this issue to try to collect some feedback and see if anyone else has any particular concerns they would like to see addressed:

Specifically relating to this thread, while it may appear practical this huge number of columns is neither practically scalable or beneficial from a usability perspective when using a database. We have run into tons of problems with column counts. As pipeline complexity increases this is only going to get worse to the point that even these very flexible database engines such as MySQL and SQLite won’t be able to store analytical outputs even when the per-object table strategy is used. This is already the case if you make extensive use of relationships.