ExportToDatabase SQL/CSV bug

Hi CP developers,

I have been trying to speed up my CP processing jobs by running analyses in 4 or 8 core image batches. Once I started this endeavor I found that CP has significant issues by not gracefully waiting until another parallel write process is done before trying to write to the sqlite DB (which could easily be fixed in a latter release). So, I then tried to use the MYSQL/CSV option in the ExportToDatabase module. It fixes the headless CP write to sqlite bug but creates another issue. It appears to me that the setup.SQL file contains only the last image batch to finish not all batches that were processed by CP. It seems that this bug could easily be fixed in the code base by updating the setup.SQL file after completing each of the batches. Is that what is supposed to occur? I can manually add each image batch range but as you can see from the multicore batch script that I recently posted I would like to streamline the analysis process so it does not take as much configuration time to get the data in a format that the classifier can use. If anyone has suggestions about helping me work around this bug I would greatly appreciate it. Also, please tell me that the SQL/CSV measurement table option can be saved as a sqlite.db file within CPA so that it does not take another 2-5 minutes to convert the SQL measurement files to sqlite in the classifier when the SQL/CSV data is opened at a latter date.

Buggy SETUP.SQL only has one image batch entry:

LOAD DATA LOCAL INFILE 'SQL_3262_Distant_MMR_1_28_Image.CSV’ REPLACE INTO TABLE MMR_3262_Distant_Per_Image
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ‘"’ ESCAPED BY ‘\’;

LOAD DATA LOCAL INFILE 'SQL_3262_Distant_MMR_1_28_Object.CSV’ REPLACE INTO TABLE MMR_3262_Distant_Per_Object
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ‘"’ ESCAPED BY ‘\’;

Desired SETUP.SQL with four image batch entries:

LOAD DATA LOCAL INFILE 'SQL_3262_Distant_MMR_1_28_Image.CSV’ REPLACE INTO TABLE MMR_3262_Distant_Per_Image
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ‘"’ ESCAPED BY ‘\’;

LOAD DATA LOCAL INFILE 'SQL_3262_Distant_MMR_1_28_Object.CSV’ REPLACE INTO TABLE MMR_3262_Distant_Per_Object
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ‘"’ ESCAPED BY ‘\’;

LOAD DATA LOCAL INFILE 'SQL_3262_Distant_MMR_29_52_Image.CSV’ REPLACE INTO TABLE MMR_3262_Distant_Per_Image
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ‘"’ ESCAPED BY ‘\’;

LOAD DATA LOCAL INFILE 'SQL_3262_Distant_MMR_29_52_Object.CSV’ REPLACE INTO TABLE MMR_3262_Distant_Per_Object
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ‘"’ ESCAPED BY ‘\’;

LOAD DATA LOCAL INFILE 'SQL_3262_Distant_MMR_53_76_Image.CSV’ REPLACE INTO TABLE MMR_3262_Distant_Per_Image
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ‘"’ ESCAPED BY ‘\’;

LOAD DATA LOCAL INFILE 'SQL_3262_Distant_MMR_53_76_Object.CSV’ REPLACE INTO TABLE MMR_3262_Distant_Per_Object
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ‘"’ ESCAPED BY ‘\’;

LOAD DATA LOCAL INFILE 'SQL_3262_Distant_MMR_77_100_Image.CSV’ REPLACE INTO TABLE MMR_3262_Distant_Per_Image
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ‘"’ ESCAPED BY ‘\’;

LOAD DATA LOCAL INFILE 'SQL_3262_Distant_MMR_77_100_Object.CSV’ REPLACE INTO TABLE MMR_3262_Distant_Per_Object
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ‘"’ ESCAPED BY ‘\’;

I wrote a script to merge all of the object and image measurement tables into one image and one object CSV file but it does not appear to work correctly when CPA classifer tries to convert them to a sqlite format. The error message is listed below. I think this error would ideally be fixed in the ExportToDatabase module so that all image batches are appended to the SETUP.SQL file correctly.

An error occurred in the program:
ProgrammingError: Incorrect number of bindings supplied. The current statement uses 440, and there are 1 supplied.

Traceback (most recent call last):
File “classifier.pyc”, line 573, in OnFetch
File “datamodel.pyc”, line 125, in GetRandomObjects
File “datamodel.pyc”, line 91, in _if_empty_populate
File “datamodel.pyc”, line 59, in PopulateModel
File “dbconnect.pyc”, line 616, in GetAllImageKeys
File “dbconnect.pyc”, line 64, in fn
File “dbconnect.pyc”, line 479, in execute
File “dbconnect.pyc”, line 432, in connect

Our solution thus far has been to parse the .sql file, acquire a list of all the image/object CSV’s and write a new .sql file that contains the db table definitions and the LOAD DATA statements for all the CSVs. You approach seems similar to this.

You’ve probably noticed by now that if CPA is pointed to a pair of per-image/per-object CSVs, it will create an SQLite db for use in a temp directory. Whether this temp folder is cleaned out depends on your system, but you can always grab the db and place it in a more permanent location.
-Mark

Mark pointed me to this error that you’re seeing and I have to confess that I’ve never seen anything like it before.

Looking through the code though, it appears that you somehow lost the connection to SQLite while populating the database. What it does is retries the previous query, and if it fails again, it tries to reconnect.

I can’t say that I’ve ever seen SQLite disconnect before, so this is a bit strange to me… maybe there’s some strangeness in your CSV file that’s blowing it up. What I would suggest is to modify your setup.SQL file instead of attempting to merge all the csv files into one. CPA will read all the files individually if this file is properly written (I’ll have Mark send a valid example.) Your example above might work just fine. Give it a shot… if it works, then you could write a script to do this instead (I think Mark is going to send your ours).

Lastly, I wanted to check if you’re writing thumbnails to the csv files… I’m not even sure if CP will let you, but if it does, it probably does it wrong and will blow things up.

Derek, I can certainly send this along. however, it’s written in Python. Are you fluent in this language?

Aside from that, our procedure which sketched out above, is similar to yours with the exception that rather than concatenating the image/obj files together, it writes a series of LOAD DATA statements to the setup.sql file, one for each file. This has worked well for us for a while now, so I wonder if modifying your script accordingly might work things out?
-Mark