Metadata extraction from path

Hello

Given the structure:
/root/project/batch_1/plate_1/measurement_1/file_1.tif
/root/project/batch_1/plate_1/measurement_1/file_2.tif

/root/project/batch_1/plate_2/measurement_1/file_1.tif

I observed that when the input path to CellProfiler is, for instance,
/root/project/batch_1/
and I want to extract metadata from both the path and the file name,
CP2 passes only the input path (/root/project/batch_1/) and the file name (file_1.tif).

My question is: how can I extract the information between the input path and the file name (…/plate_1/measurement_1/…)?
The current workaround is to process each plate individually, but I’m looking for a solution that spares us from having to separate the image folders.

Felix

Hi Felix,

If you are using LoadImages to load the images, you can extract this information using the metadata extraction functionality there.

  • Select “Path” under “Extract metadata from where?”

  • Adjust the regular expression in “Type the regular expression…” to obtain the desired metadata. If you are not familiar with them, regular expressions are essentially a means for text pattern matching; the help next to this setting has some useful information on how to use them.

  • Also, by clicking the magnifying glass tool, you can bring up a window which displays a sample pathname as well as the regular expression, color-coded to highlight the matching text. You can alter this expression to suit your needs and save it to the pipeline.

In your case, I believe the following regular expression would do the trick of extracting the plate and measurement to the “Plate” and “Measurement” metadata, respectively:
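The expression itself is not shown in this post. As a rough sketch only, assuming Python-style named groups of the form `(?P<Name>...)` and a hypothetical pattern made up from the folder names above (it is not the expression from the original reply), the extraction of “Plate” and “Measurement” could be tried out like this:

```python
import re

# Hypothetical pattern, not the one from the original post: it captures the
# text after "plate_" and "measurement_" as the named groups Plate and
# Measurement, accepting either "/" or "\" as the folder separator.
PATH_PATTERN = re.compile(
    r"plate_(?P<Plate>[^/\\]+)[/\\]measurement_(?P<Measurement>[^/\\]+)"
)

def extract_path_metadata(path):
    """Return a dict of the named groups found in path, or {} if no match."""
    match = PATH_PATTERN.search(path)
    return match.groupdict() if match else {}

metadata = extract_path_metadata("/root/project/batch_1/plate_2/measurement_1")
print(metadata)
```

If the text handed to the pattern stops at /root/project/batch_1/, the function returns an empty dict, since there is nothing for the groups to match.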

Regards
-Mark

I think my explanations were not clear enough. The metadata extraction itself works fine. We use it a lot and are very happy about this feature. But the problem lies in a detail. Let’s assume the structure:
/root/project/batch_1/plate_1/measurement_1/file_1.tif
/root/project/batch_1/plate_1/measurement_1/file_2.tif

/root/project/batch_1/plate_2/measurement_1/file_1.tif

When I use the LoadImages module and tell it to analyze sub-directories, we have the following situation in the first iteration:
CP input directory: "/root/project/batch_1/"
Sub-directories: "/plate_1/measurement_1/"
File name: “file_1.tif”

Now I observed that the sub-directories are not taken into account when extracting metadata from the path, nor when extracting from the file name (which would be odd anyway). So the input for the path regexp is:
"/root/project/batch_1/"
But I need to extract information from the sub-directories too. So I would like the input to be the whole path without the file name, like:
"/root/project/batch_1/plate_1/measurement_1"

Now the first question would be: are you aware of this?
And is there a way to include the sub-directories in the regexp input?
By the way, wouldn’t it be easier to just take the entire path to the file and use one single regexp for the metadata extraction?

Cheers
Felix

I’m not able to replicate this behavior using the ExampleTrackObjects pipeline + images from our examples which is meant to demonstrate just this behavior. For example, the Default Input Folder for the images for this example on my machine is:

and when I use the regexp tool for the path, I get the following for the test text:

C:\Trunk\ExampleImages\ExampleTrackObjects\Sequence1 (Sequence1 is the first sub-folder containing the images)
This is the behavior that you would expect, correct?

We have found that splitting the path and file regexp apart is more robust for non-uniform folder structures and easier for people to understand if they only want to do one of the two regexps.
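To illustrate the point about keeping the two regexps apart, here is a minimal Python sketch. The patterns, the group names (`Batch`, `Plate`, `Measurement`, `FileIndex`) and the helper `extract_metadata` are hypothetical and only mirror the folder layout from the question; this is not CellProfiler's internal code.

```python
import posixpath
import re

# Hypothetical patterns for illustration; adjust them to the real naming scheme.
PATH_RE = re.compile(
    r"batch_(?P<Batch>\d+)/plate_(?P<Plate>\d+)/measurement_(?P<Measurement>\d+)$"
)
FILE_RE = re.compile(r"^file_(?P<FileIndex>\d+)\.tif$")

def extract_metadata(full_path):
    """Apply the path pattern to the folder and the file pattern to the name."""
    folder, filename = posixpath.split(full_path)
    metadata = {}
    for pattern, text in ((PATH_RE, folder), (FILE_RE, filename)):
        match = pattern.search(text)
        if match:
            # Each pattern contributes its groups independently of the other.
            metadata.update(match.groupdict())
    return metadata

regular = extract_metadata("/root/project/batch_1/plate_1/measurement_1/file_2.tif")
odd_name = extract_metadata("/root/project/batch_1/plate_2/measurement_1/notes.tif")
print(regular)
print(odd_name)
```

In this sketch a file whose name does not follow the expected scheme still receives the folder metadata, whereas one combined expression over the whole path would fail to match at all.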

Regards,
-Mark

Taking another look, I now see what you are referring to. If the sub-folders are more than one level deep, it doesn’t show them. A bug report has now been filed for this.
-Mark

Hi Felix,

I have gotten this to work, though it is not “user-friendly” yet. Note that the magnifying glass icon helper-tool will not show the correct extraction with this root+subfolder path structure yet, but don’t let that dissuade you.

Basically, in LoadImages:

(1) Use “Text-Regular Expressions” as the File Selection Method (not sure if this is necessary, but it works for me)
(2) The “Subfolder path” will look throughout your whole path (root CP Input dir + so-called ‘subfolders’).

So try something like this for your metadata:

Keep your filename extraction regexp pattern the same as whatever you have now (assuming it works!)

For the subfolder pattern, try something like:
.\/](?P.)\/]batch_(?P.)\/]plate_(?P.)\/]measurement_(?P.*).tif$
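
The group names in the pattern above appear to have been lost in formatting. As a sketch only, assuming hypothetical group names (`Project`, `Batch`, `Plate`, `Measurement`) and assuming the separator was meant to be the character class `[\\/]`, which matches both slash styles, a pattern of that shape can be checked in plain Python before pasting it into LoadImages:

```python
import re

# Assumed reconstruction: the group names are hypothetical, and the trailing
# file-extension part of the garbled pattern is left out because this pattern
# is applied to the folder path only.
SUBFOLDER_PATTERN = re.compile(
    r".*[\\/](?P<Project>.*)[\\/]batch_(?P<Batch>.*)"
    r"[\\/]plate_(?P<Plate>.*)[\\/]measurement_(?P<Measurement>.*)"
)

sample_paths = (
    "/root/project/batch_1/plate_1/measurement_1",
    r"C:\root\project\batch_1\plate_2\measurement_1",
)

results = []
for path in sample_paths:
    match = SUBFOLDER_PATTERN.match(path)
    results.append(match.groupdict() if match else None)
    print(path, "->", results[-1])
```

Both the POSIX-style and the Windows-style sample path match here, which is the reason for using a separator class instead of a literal slash.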

-David