CellProfiler metadata import issue

Hi,

I have run into an unexpected problem. My pipeline worked well with version 4.0.4; now, with new images, I built a new metadata file (.csv) in the same way as before and installed the new version, 4.0.7. The issue is that I can no longer extract the metadata from the file. The file has only two variables: file_name, to match against FileLocation, and biological_group, to define groups in the Groups block.
I spent two hours looking for typos and couldn’t find any. Running as Admin did not solve the issue either. Finally, I thought of looking at the command line, and it says:

 Exception in thread "Thread-0" java.io.IOException: Key, "?file_name", is missing from CSV header.
     at org.cellprofiler.imageset.ImportedMetadataExtractor.<init>(ImportedMetadataExtractor.java:111)

What is odd to me is that the variables are detected (I can select them in the Metadata block), but the values are not read: Groups shows “nan”. So I changed the grouping from a name (“control” or “treatment”) to a number (1 or 2), and the problem persisted. What is even stranger is that file_name is matched with FileLocation without issue, which, to me, means that the file is being read and that the entries in file_name are there and free of typos.

What do you think I might be doing wrong here?

Challenges

  • What stops you from proceeding?
    The Metadata block is not reading the metadata from a .csv (as it did before)
  • What have you tried already?
    It worked in version 4.0.4; in 4.0.7 I tested the same pipeline and the same files and it did not work
  • Have you found any related forum topics? If so, cross-link them.
    No, at least not yet.

Thank you in advance.

Cheers,
Leonardo

Hi @lsilva.m,

If you can share your metadata csv, pipeline, and a sample image set, we can take a look to see if we can troubleshoot the error. Sounds frustrating, sorry about that!

Pearl

Hi @pearl-ryder

Thank you for your help.

OK, here it goes (I think everything you need is in there).

I am running on Windows 19042.804, and in the meantime I have installed the new release, 4.1.3.

sample_metadata_issue_CellProfiller.zip (2.6 MB)

I can’t be certain, because I’m not sitting at your computer: I can’t load the files with the exact same paths that you have, so I HAVE to edit your CSV in order to load it on my machine, which means I can’t do a literal test. But I suspect the issue may be a missing colon (:) in the file paths; if you compare your cpproj on the left to your CSV on the right, that is certainly a difference.
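
A quick way to rule out this kind of problem is to scan the CSV before loading it. Below is a minimal Python sketch, not anything CellProfiler itself provides. It assumes the path column is called file_name (as in the error message) and that every entry should begin with file:///. It also checks for a byte-order mark at the start of the header, on the unconfirmed guess that the ? in "?file_name" could be one (some spreadsheet programs add it when saving as UTF-8). The sample paths are made up.

```python
import csv
import io

BOM = "\ufeff"  # byte-order mark, invisible in most editors


def check_metadata_csv(text, url_column="file_name"):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    if text.startswith(BOM):
        problems.append("header starts with a byte-order mark (BOM)")
        text = text[len(BOM):]  # strip it so the remaining checks can run
    reader = csv.DictReader(io.StringIO(text))
    if url_column not in (reader.fieldnames or []):
        problems.append(f"column {url_column!r} not found in header")
        return problems
    # Data rows start on line 2 (assumes no multi-line quoted fields).
    for line_no, row in enumerate(reader, start=2):
        value = row[url_column] or ""
        if not value.startswith("file:///"):
            problems.append(
                f"line {line_no}: {value!r} does not start with 'file:///'"
            )
    return problems


# Hypothetical sample: a BOM in the header and a missing colon on line 2.
sample = (
    "\ufefffile_name,biological_group\n"
    "file///C:/data/img01.tif,control\n"
    "file:///C:/data/img02.tif,treatment\n"
)
for problem in check_metadata_csv(sample):
    print(problem)
```

To run it on a real file, read the text with open(path, encoding="utf-8", newline="") rather than "utf-8-sig", since the latter silently strips the mark the check is looking for.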


First things first, though: do you NEED to be using Groups, and is using Groups the only reason you are loading this metadata? I don’t see anything obvious in your pipeline that requires Groups: nothing that’s saved only on the last cycle, for example, no object tracking, etc. It’s possible you are making yourself a bunch of extra work that you don’t need.

One thing you could do, if you DO decide you need it, in order to make sure your CSV is always correct is the following (you could do this every time you have a new data set, in theory):

  1. Load your image set in Images
  2. Turn Metadata and Groups off, and make sure NamesAndTypes is giving you the sets you expect
  3. Use CellProfiler’s File → Export → ImageSetListing to export a CSV that has all of your images, already grouped into sets
  4. Add the “URL” columns to the sheet that is already set up the way you like it
    OR
    4A) Just add your metadata columns to the sheet CellProfiler exported, and then use LoadData in your pipeline instead of the 4 input modules.
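
Option 4A can be scripted so it is repeatable for every new data set. This is only a sketch in Python: the column names Image_FileName_DNA and Image_PathName_DNA and the file names are hypothetical, so check the header of the sheet CellProfiler actually exports. The Metadata_ prefix on the new column is what LoadData uses to recognize metadata.

```python
import csv
import io


def add_metadata_column(exported_csv, groups_by_file, file_column, new_column):
    """Append one metadata column to an exported image-set sheet.

    exported_csv   -- text of the CSV exported from CellProfiler
    groups_by_file -- dict mapping file name -> group label
    file_column    -- name of the column holding the file name (assumed)
    new_column     -- name of the column to add, e.g. 'Metadata_...'
    """
    reader = csv.DictReader(io.StringIO(exported_csv))
    fieldnames = list(reader.fieldnames) + [new_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        name = row[file_column]
        if name not in groups_by_file:
            # Fail loudly instead of writing an empty group.
            raise KeyError(f"no group recorded for {name!r}")
        row[new_column] = groups_by_file[name]
        writer.writerow(row)
    return out.getvalue()


# Hypothetical exported sheet and group assignment.
exported = (
    "Image_FileName_DNA,Image_PathName_DNA\n"
    "img01.tif,C:/data\n"
    "img02.tif,C:/data\n"
)
groups = {"img01.tif": "control", "img02.tif": "treatment"}
merged = add_metadata_column(
    exported, groups, "Image_FileName_DNA", "Metadata_biological_group"
)
print(merged)
```

Raising an error on an unmatched file name is a deliberate choice here: a silent blank would show up later as a missing group.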


Hi @bcimini, thank you for your answer.

I want to keep the groups associated with the observations in the final table, because I use that file to analyze the data with R; that is the main reason. To me it seems more correct to associate each object with its image and consequently with its group, just to avoid errors in later processing. Adding the groups in R could be a source of error: if for some reason some lines in the table are deleted, I will lose the link to the group and risk analyzing the data with the wrong grouping.

The typo you pointed out was introduced when I was putting together the files to send here; the original file was correct, with file:///. And I found my error: the path of the files had changed. Which leads me to another question/problem: when the path has a folder with spaces, I had issues loading the files. Is there any reported issue on this matter? In my specific case the path is file:///.../OneDrive - foo foo2/

Thank you

Totally understand the desire to have the metadata associated as early as possible! I was more just wondering whether you needed it specifically for CellProfiler’s Groups module, which I don’t think your pipeline needs, but I can definitely see why you want to include this. For what it is worth you could use a similar table in R to do a join, without worrying about doing it in CellProfiler (and without needing to worry about missing lines etc as long as your join columns were created thoughtfully), but it’s definitely nice to have the whole thing together right from the start!
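
To make “created thoughtfully” concrete, here is a rough sketch of a join that refuses to proceed if the key is ambiguous or if any measurement row would end up without a group. It is written in Python rather than R, and the key name FileName and the example rows are hypothetical; the same two checks apply to a join in any language.

```python
from collections import Counter


def join_groups(measurements, groups, key="FileName"):
    """Attach group columns to measurement rows, validating the join key.

    measurements -- list of dicts, one per measured object
    groups       -- list of dicts, one per image, holding the group label
    key          -- column shared by both tables (hypothetical name)
    """
    # The key must identify exactly one row of the group table.
    counts = Counter(row[key] for row in groups)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate keys in group table: {duplicates}")
    lookup = {row[key]: row for row in groups}
    # Every measurement must find its group.
    missing = sorted({row[key] for row in measurements} - set(lookup))
    if missing:
        raise ValueError(f"measurements without a group: {missing}")
    return [{**row, **lookup[row[key]]} for row in measurements]


# Hypothetical tables: two objects from one image, one from another.
measurements = [
    {"FileName": "img01.tif", "Area": 120},
    {"FileName": "img01.tif", "Area": 95},
    {"FileName": "img02.tif", "Area": 143},
]
groups = [
    {"FileName": "img01.tif", "biological_group": "control"},
    {"FileName": "img02.tif", "biological_group": "treatment"},
]
joined = join_groups(measurements, groups)
```

Because the join is keyed on the file name rather than on row position, deleting lines from the measurement table does not shift any group onto the wrong object.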

I’m not aware of any open issue on trouble if there are spaces and/or dashes in the folder name (there was one in CellProfiler 3.0.0, but none I’m aware of since). Can you clarify: is it an issue loading the image files, loading the CSV, associating the values from the CSV with the extracted metadata, etc.?
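
One way to narrow down which step fails is to take the file URLs from the CSV, turn them back into plain paths, and check whether those files exist on disk. Below is a minimal Python sketch under a few assumptions: the folder and file names are hypothetical, and whether CellProfiler expects the spaces written literally or percent-encoded as %20 is not something I can confirm here, so the decoder accepts both.

```python
import os
from urllib.parse import quote, unquote, urlsplit


def path_to_file_url(path):
    """Build a file:/// URL from a Windows-style path, encoding spaces."""
    forward = path.replace("\\", "/")
    # Keep '/' and the drive colon as they are; spaces become %20.
    return "file:///" + quote(forward, safe="/:")


def file_url_to_path(url):
    """Turn a file:/// URL back into a plain path (spaces decoded)."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError(f"not a file URL: {url!r}")
    return unquote(parts.path).lstrip("/")


def missing_files(urls, exists=os.path.exists):
    """Return the URLs whose decoded path is not found on disk."""
    return [u for u in urls if not exists(file_url_to_path(u))]


# Hypothetical path containing spaces and a dash.
example_path = "C:/data/OneDrive - foo foo2/img01.tif"
example_url = path_to_file_url(example_path)
round_trip = file_url_to_path(example_url)
```

If missing_files returns an empty list for the URLs in the CSV, the paths themselves are fine and the problem is more likely in how the CSV values are matched to the extracted metadata.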

Exactly, I considered that option, but I think it is more prone to errors once other colleagues use the file; in the end, it seems like an easy task for CellProfiler.

I have now tested with the new version and the problem seems to be fixed. The problem was in the loading of the files (.ziv): I got an error saying that the files could not be found in the path. The path had spaces in it; I was using my institution’s OneDrive, whose name has spaces, so it looked like this: C:/.../OneDrive - Universidade do Algarve/....
Thankfully, it now works without issues.

Thank you for your help.