Unwanted random image analysis

Hi,

I’ve set up a pipeline that works perfectly for the type of image analysis that I want to do, which is part of an HTS (or HCS).

However, one very simple step goes rather wrong: the order in which CP chooses to retrieve the images from the subfolders that I put them in.

For instance, if I have subfolders named as Plate0, Plate1, Plate2,…Plate10, 11, … 20, and so on, CP will open the folders in the following order: 0,1,10,11,12,13,14,15,16,17,18,19,2,20,21…29,3,30 and so on (instead of 0,1,2,3,4,5,6…).

So as you can see, this doesn’t make much sense (at least to me). Is there a way to solve this problem?

Additionally, once the data is exported to a spreadsheet, I cannot find a way to get a column indicating which subfolder each analyzed image was taken from, which would somewhat solve the first problem.

It would be great if you could help me on this one!

Thank you very much in advance!

And congratulations on the whole CellProfiler project, it is simply awesome!

The reason you are seeing this is that computers typically sort names in lexical (character-by-character) order rather than numeric order, so “10” comes before “2” because the character “1” sorts before “2”.

The solution is to ensure that your folders are in numeric order from the computer’s standpoint. The easiest way to do this is to left-pad the digits with zeros, e.g., 00, 01, 02,…,09,10,11,…
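To make the difference concrete, here is a small Python sketch (not CellProfiler code; the folder list and the helper names natural_key and zero_pad are made up for illustration). It shows the lexical order described above, a "natural" sort key that compares digit runs as numbers, and the zero-padding fix:

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so 'Plate10' sorts after 'Plate2'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]

def zero_pad(name, width=2):
    """Left-pad every run of digits in the name with zeros to the given width."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), name)

# Hypothetical folder names, following the Plate0, Plate1, ... scheme.
folders = ["Plate0", "Plate1", "Plate10", "Plate11", "Plate2", "Plate20", "Plate3"]

lexical = sorted(folders)                       # plain string comparison
natural = sorted(folders, key=natural_key)      # digit runs compared as numbers
padded = sorted(zero_pad(f) for f in folders)   # padded names sort correctly as text

print(lexical)
print(natural)
print(padded)
```

The padded names sort correctly even with a plain text sort, which is why renaming the folders works regardless of how the program orders them.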

The pathname is exported in the spreadsheet under a column with the name Pathname_; the subfolder is contained therein. Another approach is to use regular expressions in LoadImages in the following way:

  • Under “Extract metadata from where?” select “Path”
  • The new setting that appears underneath is capable of extracting the subfolder name. If you want just the number, change it to .*[\\/](?P<Date>.*)[\\/]Plate(?P<Run>.*)$

This information will be included as a column named Metadata_Run in the exported spreadsheet.
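If you want to check a path expression before putting it into LoadImages, you can try it in plain Python, since the named-group syntax is the same. This is only a sketch: the example path is hypothetical, and the pattern assumes the separator is written as a character class matching either a backslash or a forward slash.

```python
import re

# Date folder, then a "Plate<number>" folder at the end of the path.
# [\\/] matches either a backslash or a forward slash.
PATH_PATTERN = re.compile(r".*[\\/](?P<Date>.*)[\\/]Plate(?P<Run>.*)$")

def extract_path_metadata(path):
    """Return the named groups as a dict, or None if the path does not match."""
    match = PATH_PATTERN.match(path)
    return match.groupdict() if match else None

# Hypothetical example path, not taken from a real experiment.
example = r"D:\screens\280512\Plate12"
metadata = extract_path_metadata(example)
print(metadata)
```

Each named group corresponds to one Metadata_ column in the export, so Run here is what would end up in Metadata_Run.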

Regards,
-Mark

Hi Mark,

Thank you very much for your reply.

Unfortunately, given that the naming of the plates won’t be only numerical, I cannot rely only on changing the original names by adding zeros, although it might work in a small-scale experimental setting.

Regarding the Pathname, in my hands it doesn’t work like you describe it. The result is a column indicating the name of the primary folder where the subfolders are, but not the names of these subfolders, which is what I really need.

I also tried the option of extracting metadata, which gave me the same results, so no luck there either.

Most probably I am making a silly mistake, but I cannot see it. This is the line I am trying to use to get the subfolder name in a column (choosing “Extract metadata from where?” > “Path”):

And this is the “Path” where the folder (CP test) and subfolders (Plate1) are:

In the “Regular expression editor” I don’t see any error. In the ExportToSpreadsheet module I select Metadata > Date > Plate > Subfolder

And in the end the headers of the Spreadsheet don’t match with the folder names, and it also doesn’t go as far as the folders containing the images.

Metadata_Date | Metadata_Plate | Metadata_Subfolder | PathName_Worms_RAW
Jordi | CP test | 280512 | D:\Jordi\280512\CP test

As you can see it doesn’t match and I cannot find a way to get it to work…

I hope you can tell me how to solve this problem, because otherwise I find it very hard to set up a batch analysis, which is crucial for me.

Thank you very much!

Best regards,

Jordi

Actually this is the same problem that I am having:

viewtopic.php?f=18&t=1388&p=5529&hilit=extract+metadata#p5529

Is there a stable and more “user friendly” solution?

Thanks!

Jordi

We are currently working on a new user interface that will handle image loading and metadata extraction in a more robust way. However, it is still in development and will be for a few more months. Stay tuned…

Regards,
-Mark

[quote=“puigvert”]And in the end the headers of the Spreadsheet don’t match with the folder names, and it also doesn’t go as far as the folders containing the images.
I hope you can indicate me how to solve this problem because otherwise I find it very hard to set up a batch analysis, which is crucial for me.[/quote]

Is the option that David posted feasible for you? See “Metadata extraction from path”.

Granted, this approach would only work if the input subfolders are all nested to the same depth for all images, but it might be in your best interest to do that anyway if possible…
-Mark

Hi Mark,

I’ve tried the approach that David posted but it doesn’t work for me. The first problem is the Text-Regular Expressions in LoadImages. All my images have different names (e.g. Well A1, Well A2, etc), with only part of the name similar (e.g. GFP), so I am not sure if I can use this trick here.

And then again when I try to write a path using RegExp, somehow I cannot get it to find the final folders. This is what I am trying:

.\/](?P.)\/](?P.)\/](?P.)\/](?P.*)$

But when executing the pipeline, it doesn’t go past the first Subfolder.

Most likely I am doing something wrong, but I don’t know what… :smile:

Thanks!

Cheers,

Jordi

[quote=“puigvert”]This is what I am trying:

.\/](?P.)\/](?P.)\/](?P.)\/](?P.*)$[/quote]

I think part of the problem is that the “.*” regexp pattern matches any run of characters, matching not just the folder names but also the slashes that separate them. Since you are using “.*” for all the folders, the regexp is not specific enough to narrow down the part of the path you want.

To fix this, I think you will need to use “(?P<Date>[0-9]{6})” in order to specify the date as a 6-digit number. That way, assuming there are no other 6-digit numbered folders, the regexp should match Subfolder1 and Subfolder2 correctly based on the Date folder name.
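Here is a rough Python sketch of the difference (not CellProfiler code). The two paths are assumptions pieced together from the folder names mentioned in this thread, and the separators are written as [\\/] so that either slash type matches:

```python
import re

# Loose version: every folder is ".*", so groups are assigned purely by position
# counted from the end of the path.
LOOSE = re.compile(
    r".*[\\/](?P<Date>.*)[\\/](?P<Subfolder1>.*)[\\/](?P<Subfolder2>.*)$")

# Stricter version: the date must be exactly six digits, which pins down
# where in the path the remaining groups start.
STRICT = re.compile(
    r".*[\\/](?P<Date>[0-9]{6})[\\/](?P<Subfolder1>.*)[\\/](?P<Subfolder2>.*)$")

def groups(pattern, path):
    """Return the named groups for a path, or None when the pattern does not match."""
    match = pattern.match(path)
    return match.groupdict() if match else None

# Assumed example paths: one stops at "CP test", one goes down to the plate folder.
shallow = r"D:\Jordi\280512\CP test"
deep = r"D:\Jordi\280512\CP test\Plate1"

loose_shallow = groups(LOOSE, shallow)    # Date comes out as "Jordi"
loose_deep = groups(LOOSE, deep)          # happens to work at this depth
strict_shallow = groups(STRICT, shallow)  # no match: nothing below "CP test"
strict_deep = groups(STRICT, deep)        # Date is the six-digit folder

for label, result in [("loose/shallow", loose_shallow), ("loose/deep", loose_deep),
                      ("strict/shallow", strict_shallow), ("strict/deep", strict_deep)]:
    print(label, result)
```

With the loose pattern the group values shift whenever the folder depth changes, whereas the strict pattern either finds the six-digit date folder or fails to match at all.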

Regards,
-Mark

Hi Mark,

Once again, thanks for the reply.

Using the more specific expression for the Date worked, but unfortunately “Subfolder 2” is still not recognized. And if I try to add yet another level, it just doesn’t work; it indicates that it doesn’t match. Should I also somehow modify the regexp for finding the subfolders where the images are located? I tried quite a few combinations, but I haven’t managed to make it work.

By the way, regarding the SaveImages module, I am currently using it and it works fine BUT :wink: even if I check the box at “Create subfolder in the Output folder”, which would be ideal for me because it would create as many sub-folders as analyzed “Plates”, it doesn’t do it. Instead it overwrites the file, unless I choose the “Sequential numbers” option to construct the filename, which is what I am currently doing, but it is not perfect for my batch analysis.

Once again, thank you very much!

Cheers,

Jordi

I’m a bit perplexed as to why the regexp is working only in some cases but not in all, unless (a) the folder nomenclature is different from folder to folder; or (b) it is trying to analyze images from folders above Subfolder2. In either case, I would expect the regexp to fail since the path no longer follows the pattern given. Could you post some actual paths that you are using, including an example of one where it fails?

I would suggest using the metadata substitution approach in CellProfiler; see Help > Using CellProfiler > Using Metadata in CellProfiler, under the “Use of metadata-specific module settings” section. In brief, provided you are successfully able to extract the folder metadata, you can select “Default Output sub-folder” as the location in SaveImages, then specify a metadata tag in the sub-folder pathname. This folder will be named with the actual metadata text for the current cycle, and created if it doesn’t exist already.
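The idea of substituting metadata into a folder name can be sketched in plain Python. This is only an illustration of the concept, not how SaveImages is implemented: the pattern, the example path, and the function name output_subfolder are assumptions, and the \g<Name> template form used here is Python's own group-reference syntax (check the help section mentioned above for the exact tag syntax CellProfiler expects).

```python
import os
import re

# Assumed layout: ...<sep><six-digit date><sep><plate folder>
PATH_PATTERN = re.compile(r".*[\\/](?P<Date>[0-9]{6})[\\/](?P<Plate>[^\\/]+)$")

def output_subfolder(image_folder, template, output_root):
    """Fill a template such as r"\\g<Plate>" with metadata taken from the input path."""
    match = PATH_PATTERN.match(image_folder)
    if match is None:
        raise ValueError("path does not match the metadata pattern: %r" % image_folder)
    # match.expand() replaces each \g<Name> in the template with that group's text.
    return os.path.join(output_root, match.expand(template))

# Hypothetical input folder and output root.
per_plate = output_subfolder(r"D:\Jordi\280512\Plate1", r"\g<Plate>", "output")
per_date_plate = output_subfolder(r"D:\Jordi\280512\Plate1", r"\g<Date>_\g<Plate>", "output")
print(per_plate)
print(per_date_plate)
# Calling os.makedirs(per_plate, exist_ok=True) would then create the folder if missing.
```

Because the folder name is rebuilt from the metadata on every cycle, images from different plates end up in different folders instead of overwriting each other.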

Regards,
-Mark

Hi Mark,

Thanks for your suggestions, I will work on them as soon as I can.

For now I am having a bit of a timing problem… I started the analysis using the option to run multiple pipelines. Everything is set up properly and the pipelines have run one after the other, but at the beginning the estimated running time was 30 hours, and after 3 days of continuous analysis it is still not done and now indicates 76 hours instead…

I don’t have the chance to use clustered computers, but I am using a rather fast computer.

Is there any known reason why CP would slow down over time?

Thanks!

Cheers,

Jordi

Hi Jordi,

Re: analysis slowdown - I assume that the windows are closed during the run (i.e., module “eyes” are closed)? Did the run ever finish?

The progress time shown is an estimate; it will vary depending on the actual image features encountered, e.g., if you are doing object measurements, an image with lots of objects will take longer than one with a few. If you are exporting the measurements using ExportToSpreadsheet, there are columns in the per-image table indicating the execution time for each module. You could take a look at those and see if there’s anything unusual.
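If the per-image table is large, a short script can summarize those columns. The sketch below is an assumption-laden example: it assumes the timing columns start with “ExecutionTime_” (check the header row of your own export), and the sample data is invented. It compares the mean time in the first and second half of the run, which would show whether a module gets slower as the run progresses.

```python
import csv
import io
from collections import defaultdict

def execution_times(csv_file, prefix="ExecutionTime_"):
    """Collect the values of every column whose header starts with the prefix."""
    times = defaultdict(list)
    for row in csv.DictReader(csv_file):
        for column, value in row.items():
            if column and column.startswith(prefix) and value:
                times[column].append(float(value))
    return dict(times)

def half_means(values):
    """Mean of the first half and of the second half of a list of timings."""
    middle = len(values) // 2
    first, second = values[:middle], values[middle:]
    mean = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return mean(first), mean(second)

# Invented sample standing in for an exported per-image CSV file.
sample = io.StringIO(
    "ImageNumber,ExecutionTime_01LoadImages,ExecutionTime_02IdentifyPrimaryObjects\n"
    "1,0.5,2.0\n"
    "2,0.5,2.0\n"
    "3,0.5,4.0\n"
    "4,0.5,6.0\n"
)

times = execution_times(sample)
summary = {column: half_means(values) for column, values in times.items()}
for column, (early, late) in summary.items():
    print(column, "first half:", early, "second half:", late)
```

For a real export, open the per-image file with open(filename, newline="") and pass the file object in place of the sample.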

Regards,
-Mark