Bulk import bug

Hello OMERO team,

I have stumbled upon what looks like a strange bug. It involves targeting specific Project/Dataset pairs when two Datasets share the same name, and the unexpected result of the import. I will do my best to explain it here using made-up data for clarity, but I am happy to share the exact values, Blitz log, etc. if you would find that helpful.

Suppose I have two projects with some datasets structured as follows:

FirstProject (ID: 101)
—DatasetFOO (ID: 201)
—DatasetBAR (ID: 202)
SecondProject (ID: 102)
—DatasetFOO (ID: 203)
—DatasetBAZ (ID: 204)

Note that there are two different datasets called “DatasetFOO”, one in each project.

Now I try to use bulk import, with a files.tsv structure like this:

Project:name:FirstProject/Dataset:name:DatasetFOO file1.tif
Project:name:SecondProject/Dataset:name:DatasetFOO file2.tif

I expect to see the following:

FirstProject (ID: 101)
—DatasetFOO (ID: 201)
------file1.tif
—DatasetBAR (ID: 202)

SecondProject (ID: 102)
—DatasetFOO (ID: 203)
------file2.tif
—DatasetBAZ (ID: 204)

But what I actually see is this:

FirstProject (ID: 101)
—DatasetFOO (ID: 201)
—DatasetFOO (ID: 203)
------file1.tif
------file2.tif
—DatasetBAR (ID: 202)
SecondProject (ID: 102)
—DatasetFOO (ID: 203)
------file1.tif
------file2.tif
—DatasetBAZ (ID: 204)

In other words, both files go into the same DatasetFOO (in this case ID: 203), even though the two targets name different Projects. Weirdly, DatasetFOO (203) also ends up linked to both Projects.

Is this the expected behavior of bulk import? Would it be possible to change it so that the search for a Dataset by name is limited to the given Project?
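
To make the request concrete, here is a toy Python model of the two lookup strategies, using the fake IDs from above. It is not OMERO's actual code: the `Dataset` class and both function names are hypothetical, and picking the highest ID among same-named datasets is only an assumption chosen to match what I observed (both files landing in 203).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    id: int
    name: str
    project: str


# The fake hierarchy from this post.
DATASETS = [
    Dataset(201, "DatasetFOO", "FirstProject"),
    Dataset(202, "DatasetBAR", "FirstProject"),
    Dataset(203, "DatasetFOO", "SecondProject"),
    Dataset(204, "DatasetBAZ", "SecondProject"),
]


def find_by_name_only(name):
    """Look up a dataset by name alone, ignoring the project.

    Assumption: among same-named datasets the highest ID wins.
    """
    matches = [d for d in DATASETS if d.name == name]
    return max(matches, key=lambda d: d.id)


def find_within_project(project, name):
    """Look up a dataset by name, restricted to the named project."""
    matches = [d for d in DATASETS if d.name == name and d.project == project]
    return max(matches, key=lambda d: d.id)


# Name-only lookup sends both targets to the same dataset.
print(find_by_name_only("DatasetFOO").id)
# Project-scoped lookup keeps the two targets apart.
print(find_within_project("FirstProject", "DatasetFOO").id)
print(find_within_project("SecondProject", "DatasetFOO").id)
```

The first function reproduces the symptom I am seeing, while the second is the behavior I expected from a `Project:name:…/Dataset:name:…` target.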

Indeed, this workflow leads to a rather unexpected result. We’ve opened a GitHub issue on the respective repository for it: https://github.com/ome/omero-blitz/issues/91
Regards,
Dominik
