Error loading training set

Hi,

I made a pipeline using CellProfiler. I save the cell objects in a MySQL database and then I use CellProfiler Analyst.
Everything works fine. I can save a training set, but when I want to reopen it in a new session I get the following error.

Does anyone have an idea about it?
I can open the training set as a CSV file. For information, I have 4 classes.

Thanks,
Solène

Original exception was:
Traceback (most recent call last):
File “CellProfiler-Analyst.py”, line 258, in launch_classifier
classifier = Classifier(parent=self, properties=self.properties)
File “/export/home1/users/dfccdn/weill/testCellpro/CellProfiler-Analyst/cpa/classifier.py”, line 379, in __init__
self.LoadTrainingSet(p.training_set)
File “/export/home1/users/dfccdn/weill/testCellpro/CellProfiler-Analyst/cpa/classifier.py”, line 1160, in LoadTrainingSet
self.trainingSet = TrainingSet(p, filename, labels_only=False)
File “/export/home1/users/dfccdn/weill/testCellpro/CellProfiler-Analyst/cpa/trainingset.py”, line 29, in __init__
self.Load(filename, labels_only=labels_only)
File “/export/home1/users/dfccdn/weill/testCellpro/CellProfiler-Analyst/cpa/trainingset.py”, line 134, in Load
self.Create(labelDict.keys(), labelDict.values(), labels_only=labels_only)
File “/export/home1/users/dfccdn/weill/testCellpro/CellProfiler-Analyst/cpa/trainingset.py”, line 91, in Create
self.values += [get_data(k) for k in keyList]
File “/export/home1/users/dfccdn/weill/testCellpro/CellProfiler-Analyst/cpa/trainingset.py”, line 86, in get_data
d = self.cache.get_object_data(k)
File “/export/home1/users/dfccdn/weill/testCellpro/CellProfiler-Analyst/cpa/trainingset.py”, line 324, in get_object_data
self.data[key] = db.GetCellData(key)
File “/export/home1/users/dfccdn/weill/testCellpro/CellProfiler-Analyst/cpa/dbconnect.py”, line 1143, in GetCellData
data = self.execute(query, silent=True)
File “/export/home1/users/dfccdn/weill/testCellpro/CellProfiler-Analyst/cpa/dbconnect.py”, line 64, in fn
return f(db, *args, **kwargs)
File “/export/home1/users/dfccdn/weill/testCellpro/CellProfiler-Analyst/cpa/dbconnect.py”, line 529, in execute
‘\nSecond exception was: %s’%(connID, query, e, e2))
cpa.dbconnect.DBException: ERROR: Database query failed for connection “MainThread” and failed to reconnect
Query was: “SELECT * FROM Per_Object WHERE ()”
First exception was: (1064, “You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘)’ at line 1”)
Second exception was: ERROR: Database query failed for connection “MainThread”
Query was: “SELECT * FROM Per_Object WHERE ()”
Exception was: (1064, “You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘)’ at line 1”)

Hi,

I am having the same problem today! I cannot open my training sets in CPA and get a very similar error. I can also open the CSV files and see the info. I hope it’s OK that I am posting here, but I assume we are dealing with the same issue. I only have 2 classes.

Any help is appreciated.

Thanks,
Katherin

An error occurred in the program:
TypeError: ‘NoneType’ object has no attribute ‘__getitem__’

Traceback (most recent call last):
File “/Applications/CellProfiler Analyst.app/Contents/Resources/lib/python2.7/cpa/classifier.py”, line 1123, in OnLoadTrainingSet
self.LoadTrainingSetCSV(filename)
File “/Applications/CellProfiler Analyst.app/Contents/Resources/lib/python2.7/cpa/classifier.py”, line 1178, in LoadTrainingSetCSV
self.trainingSet = TrainingSet(p, filename, labels_only=False, csv=True)
File “/Applications/CellProfiler Analyst.app/Contents/Resources/lib/python2.7/cpa/trainingset.py”, line 27, in __init__
self.LoadCSV(filename, labels_only=labels_only)
File “/Applications/CellProfiler Analyst.app/Contents/Resources/lib/python2.7/cpa/trainingset.py”, line 157, in LoadCSV
self.Create(labelDict.keys(), labelDict.values(), labels_only=labels_only)
File “/Applications/CellProfiler Analyst.app/Contents/Resources/lib/python2.7/cpa/trainingset.py”, line 91, in Create
self.values += [get_data(k) for k in keyList]
File “/Applications/CellProfiler Analyst.app/Contents/Resources/lib/python2.7/cpa/trainingset.py”, line 86, in get_data
d = self.cache.get_object_data(k)
File “/Applications/CellProfiler Analyst.app/Contents/Resources/lib/python2.7/cpa/trainingset.py”, line 325, in get_object_data
return self.data[key][self.col_indices]

Hi,
I am trying to figure this out and I have a question:
did you rename your classes?

Hi Solene,

I just checked my training set to make sure that I did not rename the classes. The cells are listed as “positive” and “negative”, as per default. The class table name my properties file uses is BIC_TA, which is defined using the ExportToDatabase tool in CellProfiler. Could this be the culprit? I was successfully using the same training set every day for a week, so I am not sure what to troubleshoot. When I start generating the spreadsheets in CellProfiler, I do get the attached message and I select “yes”, although I don’t know exactly what it means, in case that matters. Really appreciate the help!

Katherin

Hi Katherin,
The message is asking whether it should overwrite your database. So each time you run your pipeline and select yes, it erases everything you have in your database (the tables and the data inside them) and recreates new tables with the output data.
So if you changed your pipeline at some point during the week, maybe your objects now have different keys, and that’s why you have a problem reloading the training set.
Did you try to load your BIC_Per_Object table in CellProfiler Analyst to check the data?
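
One way to test this idea is to compare the object keys stored in the training set with the rows that currently exist in the per-object table. Below is a minimal sketch, not CPA's own code. It assumes the keys are (ImageNumber, ObjectNumber) pairs and uses an in-memory SQLite table in place of the real MySQL database; the helper name `missing_keys` and the example keys are hypothetical.

```python
import sqlite3


def missing_keys(conn, table, keys):
    """Return the (ImageNumber, ObjectNumber) keys that have no row in `table`.

    `table` is interpolated into the SQL text, so only pass a trusted name.
    """
    cur = conn.cursor()
    missing = []
    for image_number, object_number in keys:
        cur.execute(
            "SELECT 1 FROM {} WHERE ImageNumber = ? AND ObjectNumber = ? "
            "LIMIT 1".format(table),
            (image_number, object_number),
        )
        if cur.fetchone() is None:
            missing.append((image_number, object_number))
    return missing


# Toy in-memory database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Per_Object (ImageNumber INTEGER, ObjectNumber INTEGER)")
conn.executemany("INSERT INTO Per_Object VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])

# Hypothetical keys as they might be read from a saved training set.
training_keys = [(1, 1), (2, 1), (5, 3)]
print(missing_keys(conn, "Per_Object", training_keys))
```

If the list that comes back is not empty, the training set refers to objects that are no longer in the table, which would fit a database that was overwritten by a later run.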

Solène

Hi,

I can load the per_object and per_image data and create various box plots.
Last week, I did change the pipeline in CP: I filtered the images so that only a subset was used for analysis. Does this affect the image number that is given to each of my images? Since the image number is used as a key to define the training set, maybe CPA cannot find the correct images?
I am currently rerunning the pipeline using all images. The data set is large so it will take a couple of hours for me to test.
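
To illustrate the image-number question, here is a small sketch under the assumption that image numbers are handed out sequentially, starting at 1, to whichever images a run actually processes. The file names and the helper `number_images` are made up for the example.

```python
def number_images(filenames):
    """Assign 1-based sequential image numbers in processing order (assumption)."""
    return {name: i for i, name in enumerate(filenames, start=1)}


all_images = ["t0.tif", "t1.tif", "t2.tif", "t3.tif"]
# Filtered run: one image is excluded before analysis.
subset = [name for name in all_images if name != "t1.tif"]

full_run = number_images(all_images)
filtered_run = number_images(subset)

# Images whose number differs between the two runs: (full run, filtered run).
changed = {
    name: (full_run[name], filtered_run[name])
    for name in subset
    if full_run[name] != filtered_run[name]
}
print(changed)
```

Under that assumption, every image after the excluded one gets a different number, so keys saved from the full run would point at the wrong images, or at none, in the filtered database.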

Thanks
Katherin

Hi Solene,

Rerunning the pipeline using the full image set fixed the problem, so I must have overwritten info in the database last week. Thanks for helping me figure that out.
So now I was able to score all cells and create a table that groups my data by image or well. Is it possible to get a per-object table of how each cell scored, or am I confined to these population data? I am using time-lapse data and would like to see when cells transition from class one to class two.

Thanks,
Katherin

Katherin,
If you fill in the “Class table” field (either in your CP pipeline or by editing your properties file), it’ll give you per-object classification.

Solène,
Are you trying to open the training set in the experiment in which it was created, or in a new experiment?

I am doing it in the experiment in which it was created. I was able to open it again using a previous version of CPA, so I don’t really know what happened.

Hmm, that’s really odd! Is there any other debugging information you can give me? Does it happen all the time, only in certain situations, etc.?

I’ve asked our CPA expert to look in on this thread, hopefully she’ll have some ideas as to what might have gone wrong!

Hi,

No, I don’t have anything else for debugging. I needed to change the pipeline, so I cannot even reproduce the error. But while using CellProfiler I found another unexpected behaviour: I can save a training set with a number as a class name, but I am not able to reopen it because class names need to be of type string. Also, if I then want to score all the objects and get the class for each object, I have to give the class a name (number or string) shorter than 3 characters, because in the database the classification table has two columns of different types in which the same information is saved (class and class_number).

I don’t know if that’s clear, or if I am the only one having this problem.

Anyway, thanks for asking your CPA expert.

Solène

Hello everyone,
Is it possible to use the same training set for two experiments processed by the same pipeline?

thanks