Crash on Computing cross validation

Hi CP developers,

I am getting the error “TypeError: 'NoneType' object is not iterable”. From the error message, it seems that the cross-validation computation is trying to iterate over a variable that is None rather than the collection of values it expects. I have not seen this specific error before, but the cross-validation function does complain about a missing logo file quite often. Has anyone seen this bug, and how do I fix it? Thanks!

Hrm, just guessing here, but it may be that there are NULL values in one or more of your measurement columns. I’m not sure if the cross-validation is robust against them.

CPA has a SQL query maker tool under the “advanced” menu that’ll let you check for this. If you’re not familiar with SQL, then you can send me your classifier rules and the name of your per-object table and I should be able to give you a query to try.

Hi Adam,

I am experiencing this cross validation crash and the previously described “IndexError: list index out of range” with the new version of CPA. Why is this happening since I used the same version of CPA to correctly classify a PR stain just last week?

Where is the “IndexError: …” previously described? I don’t know what you’re referring to.

Classification and cross-validation follow two different code paths, so one could throw an error while the other works fine.

In my experience, if CPA was working fine and then fails on the same task with new data, the first place to look is the data.

I was referring to a past post titled “IndexError in CPA”. I have this issue it seems. The exact error message is:

An error occurred in the program:
IndexError: list index out of range

Traceback (most recent call last):
  File "classifier.pyc", line 1035, in ScoreAll
  File "fastgentleboosting.pyc", line 151, in PerImageCounts
  File "multiclasssql.pyc", line 214, in PerImageCounts
  File "multiclasssql.pyc", line 181, in do_by_steps
  File "multiclasssql.pyc", line 164, in where_clauses

OK, after some searching I found the post you’re referring to and it looks like the problem was due to NULL values in their database.

This brings us back to my first question which is: Where are the NULL values in your database? If we know where they are, then we can begin to understand the problem.

Hi Adam,

I am still getting this error with one of the image sets using an identical pipeline. If I post the output data table, could you look at it to see why I get an “IndexError: list index out of range”? Could I maybe have too many primary objects per image? I have been able to score individual images, but the “Check Progress” and “Score All” buttons lead to a crash.

Why don’t you send me your pipeline instead. I specifically want to see your ExportToDatabase settings.
CPA isn’t capable of handling multiple primary objects, so the issue is most likely that the database that’s being generated isn’t compatible with CPA.

Hi Adam,

Here is the pipeline that is giving me errors. I can run the pipeline with one set of images but not another. The images have been corrected for uneven illumination via an illumination transfer function acquired from a Fluor-Ref slide. I recently noticed that some of the pixels on the CCD are stuck so maybe the divide operation on the Illumapply module yields a NaN field for pixels that are stuck in the illumination correction image. This is my WAG. I could post the image if you think it would help.
7-26-11_10x_PR_pipe sqlite output.cp (7.78 KB)

Adam,

Here is the illumination calculate and apply pipe that is run before the measurement pipe.
IllumCorrSaveImagePipe.cp (6.49 KB)

Ah, looking at your pipeline now, I see that you aren’t identifying more than one primary object type, so that looks pretty good. When you mentioned having too many primary objects before, I misunderstood and thought you meant you had multiple kinds of primary objects.

As for the theory on the illumination correction producing NaNs, I don’t think that’s it, but if it comes to that I’ll have one of our image assay developers help you out. For now I just want to find out what’s up with your db. If it’s small, then you could just send me the sqlite file and I could look at it myself.

Again, you can use CPA to inspect your tables by using the SQL query maker tool under the “advanced” menu or by simply opening the TableViewer tool and going file>load table from database. You can then scroll down your measurement columns and see if there are NULL values and where they are. It shouldn’t be hard since it looks like you’re only measuring intensity. If your database is very large, then you could just run this query for each of your measurement columns:
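A NULL check of this kind can be sketched in Python with the standard sqlite3 module. This is only an illustration under stated assumptions: the table name `per_object` and its columns are placeholders, not the names from the database in this thread, and the helper `count_nulls` is hypothetical rather than part of CPA.

```python
import sqlite3

def count_nulls(conn, table):
    """Return {column name: number of NULL rows} for every column of `table`.

    Assumes `table` is a simple identifier (letters, digits, underscores).
    """
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    counts = {}
    for col in columns:
        # Count the rows in which this column holds no value.
        query = f'SELECT COUNT(*) FROM "{table}" WHERE "{col}" IS NULL'
        counts[col] = conn.execute(query).fetchone()[0]
    return counts

# Tiny in-memory table standing in for a real per-object table (placeholder data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE per_object (ImageNumber INTEGER, ObjectNumber INTEGER, Intensity REAL)"
)
conn.executemany(
    "INSERT INTO per_object VALUES (?, ?, ?)",
    [(1, 1, 0.5), (1, 2, None), (2, 1, 1.1444e-05)],
)

null_counts = count_nulls(conn, "per_object")
print(null_counts)
```

To run it against a real database, replace `":memory:"` with the path to the sqlite file and pass the actual per-object table name. Any column with a non-zero count is a candidate for the problem.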

If there’s anything that’s not clear there, let me know and I’ll try to walk you through.

I do not see NULL values in the table using the table viewer. However, I do see a lot of values in scientific notation, such as 1E-06. Could CPA be converting scientific notation fields to NaNs since it is expecting the database to be populated with floating-point fields? I attached the measurement table to the post for you to review.
my_table_containing_null_errors.rar (3.2 MB)

That’s a possibility. Try running these commands in CPA’s SQL Query Maker:

SELECT IGFBP2Cells_Intensity_LowerQuartileIntensity_OrigBlue FROM Expt_PRtiter_per_object WHERE ImageNumber=1 AND ObjectNumber=14;
SELECT IGFBP2Cells_Intensity_LowerQuartileIntensity_OrigBlue / 2 FROM Expt_PRtiter_per_object WHERE ImageNumber=1 AND ObjectNumber=14;

At least this way we’ll know what your database thinks they are. Still, it’s odd because if the values were accidentally stored as strings or something then classifier would automatically ignore them.

Also, can you send me as much output as you can from CPA up until the crash?

I used the SQL statement “SELECT IGFBP2Cells_Intensity_LowerQuartileIntensity_OrigBlue FROM Expt_IGFBP2Per_Object WHERE ImageNumber=1 AND ObjectNumber=14;” to select the field, since the Expt_PRtiter_per_object table was not in the database. It returned 1.1444e-05 as the query result. Does that confirm that CPA is expecting float values but returns fields in scientific notation? If this is the problem, could values in scientific notation be rounded to 0, since they appear to be digitization error?
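On the notation question: scientific notation is only a way of printing a floating-point number, not a separate datatype. A minimal Python sketch using the value reported above (the variable names are illustrative, and this says nothing about how CPA itself reads the column):

```python
import math

# The query result as it was displayed, parsed as an ordinary float.
value = float("1.1444e-05")

is_float = isinstance(value, float)   # it is a regular float
is_nan = math.isnan(value)            # and it is not NaN
same_number = value == 0.000011444    # same number, written without an exponent
fixed = f"{value:.9f}"                # fixed-point rendering of the same value

print(is_float, is_nan, same_number, fixed)
```

A value shown as 1.1444e-05 is therefore a small but valid float, and it would not need to be rounded to 0 for numerical code to handle it.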

Hi Adam,

I am not sure if this bug is related to the classifier bug but I cannot select a cell’s datapoint in the scatterplot window to bring up the corresponding boxed image of the selected cell. I click on the star next to the cell’s data table but it just brings up the image that the cell came from but without marking the cell’s location in the image. Do you think this is related to the classifier bug? I can get it to mark cells correctly with the PRpipeline/image pair that does not crash so it probably is related to this bug.

Try:

 SELECT IGFBP2Cells_Intensity_LowerQuartileIntensity_OrigBlue / 2 FROM Expt_IGFBP2Per_Object WHERE ImageNumber=1 AND ObjectNumber=14;

… I want to know if it can do math on the column… if it can’t then it’s probably a string.
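One caveat, assuming the database is SQLite as in this thread: SQLite will also do arithmetic on text that looks like a number, so the division test alone may not expose a string column. The built-in `typeof()` function reports the storage class of each value directly. A minimal sketch with a hypothetical two-column table (`demo`, `as_real` and `as_text` are made-up names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No declared column types, so each value keeps the storage class it was inserted with.
conn.execute("CREATE TABLE demo (as_real, as_text)")
conn.execute("INSERT INTO demo VALUES (1.1444e-05, '1.1444e-05')")

# typeof() distinguishes a real number from text that merely looks like one.
types = conn.execute("SELECT typeof(as_real), typeof(as_text) FROM demo").fetchone()

# Division works on both columns, because SQLite converts numeric-looking text.
halves = conn.execute("SELECT as_real / 2, as_text / 2 FROM demo").fetchone()

print(types)
print(halves)
```

The same `typeof(column)` expression can be used in CPA’s SQL Query Maker in place of the division to see how a measurement column is actually stored.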

When you start up CPA, it should tell you what the location of the sqlite db is that it’s using. Since it looks like your db is small, you could just send that to me and I could take a look at it myself.

For your second question:

Did you click the lasso tool at the top of the scatterplot frame? This is a fairly recent addition.

How did you get to the data table? Did you just launch it from CPA and then open the per-object table from there? I attached an image of what I get when I double click a ‘*’ cell in the per-object datatable, you can see that there’s a little white box drawn where the center of the selected object is. If your images are large, it may be that you have to scroll to see the box. Careful not to click on the image though, or it will select the nearest cell to where you click.

Oh, re-reading your reply it sounds like you’re actually looking at the enrichments table… is that right? If that’s the case then each row is going to be per-image (default) anyway.

I don’t think so, unless I misunderstood something.

The SQL statement returns a value of 0.0 when the LowerQuartileIntensity_OrigBlue field is divided by 2. So, it appears that the field can be operated on as a floating-point value. The scatterplot problem was apparently an issue with my eyes not being able to see the white box. Sorry about any confusion. I included the specific error and traceback message listed in the classifier window when cross-validation testing does not work. I also attached the measurement database so you can see if you can find the field causing the problems.

An error occurred in the program:
TypeError: 'NoneType' object is not iterable

Traceback (most recent call last):
  File "classifier.pyc", line 843, in OnCheckProgress
  File "fastgentleboosting.pyc", line 62, in CheckProgress
IGFBP2_2class.rar (3.77 MB)

Are you sure that’s the whole traceback? Line 62 of fastgentleboosting by itself wouldn’t cause the error that you pasted.

Also, I looked at your db and all the types look fine. Somewhat concerning however is the fact that there are only two images… this is almost definitely a case that we’ve never seen before. I’ll try copying our example database and deleting all but 2 rows from it to see how it fares.

I have removed the illumination correction part of the image analysis and reduced the number of images processed to eliminate sources of uncertainty in the troubleshooting process. I will try including more images to test whether this is a special case that only occurs with two images.

Please double check the traceback too. Not knowing exactly what code is generating the error is making it really difficult to hypothesize why it’s occurring.