How to open the bulk annotation file

Hi dear community,

I downloaded a bulk annotation file from the IDR website and I'm wondering how I can open it.
It doesn't seem to be a text file; is it in HDF5 format? An example script in Python would be perfect. Thank you.

Added the IDR tag to place it in that forum.

Thank you!

I tried the h5py package, and it seems I can get data out of the file (see the session below).

My follow-up question: if I have the gene ID ENSG00000115977, how can I get the images and thumbnails associated with this gene ID, limited to this study? Thanks a lot in advance.

>>> import h5py
>>> import numpy as np
>>> 
>>> hf = h5py.File('bulk_annotations', 'r')
>>> a = hf.get('/OME')
>>> cd = np.array(a.get('Measurements'))
>>> cd.shape
(26880,)
>>> cd[0:10]
array([(2697, b'305', b'1', 605030, b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'empty well', b'', b'dapi: DNA;vsvg-cfp: CFP-tsO45G ;pm-647: cell surface tsO45G', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'0305-01--2007-04-03', b'a1'),
       (2697, b'305', b'2', 604715, b'Homo sapiens', b'NCBITaxon', b'NCBITaxon_9606', b'HeLa', b'EFO', b'EFO_0001185', b'SI00289037', b'UAACUCUCUCUAUUAAGGGtt', b'CCCUUAAUAGAGAGAGUUAtt', b'NCBI35, Ensembl release 27, Dec 2004', b'ENSG00000109576', b'AADAT', b'', b'Ensembl release 61 gene symbol', b'GRCh37, Ensembl release 61, Feb 2011, Gene Symbols from Ensembl release 61 or 81', b'', b'', b'dapi: DNA;vsvg-cfp: CFP-tsO45G ;pm-647: cell surface tsO45G', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'0305-01--2007-04-03', b'a2'),
       (2697, b'305', b'3', 604933, b'Homo sapiens', b'NCBITaxon', b'NCBITaxon_9606', b'HeLa', b'EFO', b'EFO_0001185', b'SI02224586', b'UUCAUCAGGUUUACCACCUgg', b'AGGUGGUAAACCUGAUGAAtt', b'NCBI35, Ensembl release 27, Dec 2004', b'ENSG00000115977', b'AAK1', b'', b'Ensembl release 81 gene symbol added by IDR curator', b'GRCh37, Ensembl release 61, Feb 2011, Gene Symbols from Ensembl release 61 or 81', b'', b'', b'dapi: DNA;vsvg-cfp: CFP-tsO45G ;pm-647: cell surface tsO45G', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'0305-01--2007-04-03', b'a3'),
       (2697, b'305', b'4', 604821, b'Homo sapiens', b'NCBITaxon', b'NCBITaxon_9606', b'HeLa', b'EFO', b'EFO_0001185', b'SI00289219', b'UUCCGUAUGGAAUCAUCGGga', b'CCGAUGAUUCCAUACGGAAtt', b'NCBI35, Ensembl release 27, Dec 2004', b'ENSG00000183044', b'ABAT', b'', b'Ensembl release 61 gene symbol', b'GRCh37, Ensembl release 61, Feb 2011, Gene Symbols from Ensembl release 61 or 81', b'', b'', b'dapi: DNA;vsvg-cfp: CFP-tsO45G ;pm-647: cell surface tsO45G', b'', b'4-aminobutyrate aminotransferase', b'2.4305', b'yes', b'yes', b'multiple replicates of reagent', b'strong inhibition of secretion', b'strong decrease in rate of protein secretion', b'CMPO_0000319', b'0305-01--2007-04-03', b'a4'),
       (2697, b'305', b'5', 604961, b'Homo sapiens', b'NCBITaxon', b'NCBITaxon_9606', b'HeLa', b'EFO', b'EFO_0001185', b'SI00078085', b'UACUGUUCGUUGUACAUCCag', b'GGAUGUACAACGAACAGUAtt', b'NCBI35, Ensembl release 27, Dec 2004', b'ENSG00000165029', b'ABCA1', b'', b'Ensembl release 61 gene symbol', b'GRCh37, Ensembl release 61, Feb 2011, Gene Symbols from Ensembl release 61 or 81', b'', b'', b'dapi: DNA;vsvg-cfp: CFP-tsO45G ;pm-647: cell surface tsO45G', b'', b'ATP-binding cassette, sub-family A (ABC1), member 1', b'1.3921', b'yes', b'yes', b'multiple replicates of reagent', b'strong inhibition of secretion', b'strong decrease in rate of protein secretion', b'CMPO_0000319', b'0305-01--2007-04-03', b'a5'),
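As a script, the session above can be wrapped into two small helpers: one that reads the whole /OME/Measurements table, and one that keeps the rows in which any byte-string column equals a given value such as a gene ID. This is a minimal sketch, assuming (as the output above suggests) that the text columns come back as fixed-length bytes. It searches every such column because the exact name of the gene ID column is not shown here; check table.dtype.names on your copy.

```python
import numpy as np


def load_bulk_table(path):
    """Read the OMERO.table stored at /OME/Measurements into a NumPy structured array."""
    import h5py  # imported here so rows_with_value() can be used without h5py installed

    with h5py.File(path, "r") as hf:
        # [()] reads the whole compound dataset into memory at once
        return hf["/OME/Measurements"][()]


def rows_with_value(table, value):
    """Return the rows in which any fixed-length bytes column equals `value`."""
    needle = value.encode("utf-8")
    mask = np.zeros(len(table), dtype=bool)
    for name in table.dtype.names:
        if table.dtype[name].kind == "S":  # skip numeric columns such as IDs
            mask |= table[name] == needle
    return table[mask]
```

Usage would be something like rows_with_value(load_bulk_table('bulk_annotations'), 'ENSG00000115977'). Among the rows printed above, that matches the a3 row, whose fourth value (604933) is the Well ID that comes up later in this thread.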

Hi,

It’s probably easiest to describe how to do this in the web UI first. Then you can look at the HTTP GET requests that provide the same results as JSON data.

Start with the “Gene” link at the top of the webclient, which takes you to http://idr.openmicroscopy.org/mapr/gene/?experimenter=-1

Here you can enter your search term in the box above the left panel and click on the single item in the drop-down menu. Alternatively, you can put the Gene ID in the URL and go directly to http://idr.openmicroscopy.org/mapr/gene/?value=ENSG00000115977

By expanding the tree on the left, you can see all the studies that have images linked to this gene, with the number of matching images in brackets. There are 2 screens for idr0009-simpson-secretion. The bulk annotation file you downloaded is linked to idr0009-simpson-secretion/screenB. You can expand that node to see 10 plates, each with a single Image linked to that Gene.
Expanding each Plate will show the Image thumbnail in the centre panel and selecting the Image in the centre panel or the tree will load the Image data in the right panel.

To do this programmatically, you can use the browser developer tools to inspect the JSON data loaded by the webclient, which is provided by the mapr app: https://github.com/ome/omero-mapr.

The studies that have Images linked to your gene of interest can be retrieved with:
http://idr.openmicroscopy.org/mapr/api/gene/?id=ENSG00000115977

You’ll see under the ‘screens’ list that, for example, idr0013-neumann-mitocheck/screenB has ID: 1302.
To see the Plates under Screen:1302 that have Images linked to your gene, you can use http://idr.openmicroscopy.org/mapr/api/gene/plates/?value=ENSG00000115977&id=1302

You can see that the first Plate has ID: 5662. To see the Images in that Plate, use:
http://idr.openmicroscopy.org/mapr/api/gene/images/?value=ENSG00000115977&node=plate&id=5662

The first image in the list has ID: 2862526
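The three mapr requests above can be chained in a script. Below is a minimal sketch using only the Python standard library. It assumes the JSON responses hold lists under the keys 'screens', 'plates' and 'images', and that each item has an 'id' field; those key names are inferred from the description above, so check a real response in the browser first. The screens argument is a hypothetical convenience for restricting the walk to one study, e.g. screen 803 for idr0009-simpson-secretion/screenB.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://idr.openmicroscopy.org"


def mapr_url(path, **params):
    """Build a mapr API URL; keyword order is kept in the query string."""
    return f"{BASE}/mapr/api/{path}?{urlencode(params)}"


def get_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def ids(data, key):
    """IDs of the items listed under data[key] (key names are assumptions)."""
    return [item["id"] for item in data.get(key, [])]


def images_for_gene(gene_id, screens=None, fetch=get_json):
    """Walk gene -> screens -> plates -> images; return {screen_id: [image_id, ...]}."""
    result = {}
    for screen_id in ids(fetch(mapr_url("gene/", id=gene_id)), "screens"):
        if screens is not None and screen_id not in screens:
            continue  # skip studies we were not asked about
        plates = fetch(mapr_url("gene/plates/", value=gene_id, id=screen_id))
        image_ids = []
        for plate_id in ids(plates, "plates"):
            images = fetch(mapr_url("gene/images/", value=gene_id, node="plate", id=plate_id))
            image_ids.extend(ids(images, "images"))
        result[screen_id] = image_ids
    return result
```

For example, images_for_gene("ENSG00000115977", screens={803}) would only follow screenB of idr0009. The fetch parameter is there so the walking logic can be tried offline with canned responses.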

The webclient loads batches of thumbnails as JSON data. E.g. http://idr.openmicroscopy.org/webclient/get_thumbnails/?id=2862526
But you can also load thumbnails as JPEGs, e.g. http://idr.openmicroscopy.org/webclient/render_thumbnail/2862526/
The default size is 96 pixels on the longest side.
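Fetching thumbnails for a list of Image IDs could look like the sketch below. It builds the two URLs mentioned above and saves the JPEG from render_thumbnail to disk. Passing several IDs to get_thumbnails as repeated id= parameters is an assumption about how the webclient batches them (check the Network tab), and the "thumbnails" output folder is an arbitrary choice.

```python
from pathlib import Path
from urllib.request import urlopen

BASE = "http://idr.openmicroscopy.org"


def thumbnail_url(image_id):
    """JPEG thumbnail URL (96 px on the longest side by default)."""
    return f"{BASE}/webclient/render_thumbnail/{image_id}/"


def batch_thumbnails_url(image_ids):
    """JSON thumbnails URL for several images (repeated id= is an assumption)."""
    return f"{BASE}/webclient/get_thumbnails/?" + "&".join(f"id={i}" for i in image_ids)


def save_thumbnail(image_id, out_dir="thumbnails"):
    """Download one thumbnail to out_dir/<image_id>.jpg and return the path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{image_id}.jpg"
    with urlopen(thumbnail_url(image_id)) as resp:
        path.write_bytes(resp.read())
    return path
```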

Hope that helps,

Will.

Hi Will @will-moore,

Thank you so much for your reply, it helps a lot!

I had partly figured it out from the web UI, but I was trying to do it in batch for all the genes. My ultimate goal is to map genomic coordinates to images and thumbnails, using the gene ID as a bridge. So I want to get the images/thumbnails associated with each gene ID (within one study, idr0009 for example), and then I will use a custom script to convert the gene IDs to chr1:xxxx-xxxx coordinates for my project.

Using idr0009 screenB as an example, here is what I tried: I downloaded the original annotation CSV from GitHub, then downloaded the bulk annotation file mentioned in my first post, and converted the bulk annotation file to a CSV file using a custom Python script. I found that the Plate and Well columns are different, as shown in the screenshot below:

So why are they different? Maybe there is a way to get the image ID/thumbnail ID(s) for each gene ID in the table (for each row) based on the Plate and Well IDs?

Thank you again!

Update: it seems that with this link https://idr.openmicroscopy.org/webgateway/well/604933/children/ I can get the image ID (1313684), and then get the thumbnail using http://idr.openmicroscopy.org/webclient/get_thumbnails/?id=1313684 and the image using http://idr.openmicroscopy.org/webclient/render_image/1313684/. I'm not sure whether I am doing this correctly, though…
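Applied to every row, that approach amounts to one webgateway request per Well ID. Below is a minimal sketch, assuming the children endpoint returns a JSON list of image objects that each carry an "id" (as the 1313684 result above suggests), and that you have already extracted (gene ID, Well ID) pairs from the bulk annotation file. Rows without a gene ID, such as the "empty well" rows, are skipped.

```python
import json
from urllib.request import urlopen

BASE = "https://idr.openmicroscopy.org"


def well_children_url(well_id):
    return f"{BASE}/webgateway/well/{well_id}/children/"


def render_image_url(image_id):
    return f"{BASE}/webclient/render_image/{image_id}/"


def get_json(url):
    with urlopen(url) as resp:
        return json.load(resp)


def images_by_gene(gene_well_pairs, fetch=get_json):
    """Map each gene ID to the Image IDs found in its Well(s)."""
    result = {}
    for gene_id, well_id in gene_well_pairs:
        if not gene_id:  # e.g. 'empty well' rows carry no gene ID
            continue
        # assumption: the response is a list of image objects with an "id" key
        children = fetch(well_children_url(well_id))
        result.setdefault(gene_id, []).extend(img["id"] for img in children)
    return result
```

With 26880 rows in this table, it may be kind to the server to pause briefly between requests.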

Hi,

The CSV format allows authors to curate the data before it is imported into the IDR, when we know the names of Plates and Wells but NOT their IDs.
Following import of the images into the IDR OMERO server, we can then use the CSV to generate the bulk annotation file by assigning Plate, Well and Image IDs to each row of the CSV.

This process is carried out by the omero metadata populate command: https://github.com/ome/omero-metadata#populate
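Conceptually, the name-to-ID step is a join on the (Plate, Well) names. The sketch below only illustrates that idea and is not how omero metadata populate is implemented: the well_ids lookup is a hypothetical input (in practice those IDs come from the OMERO server after import), and the column names Plate and Well are assumed to match your CSV header. The rows printed earlier from the bulk file appear to carry both the plate/well names and the Well ID, so such a lookup could also be built from the bulk table to line up the two files.

```python
import csv
import io


def assign_well_ids(csv_text, well_ids):
    """Add a 'Well ID' to each CSV row by looking up its (Plate, Well) names.

    well_ids maps (plate_name, well_name) -> Well ID. Rows whose names are not
    found get None, which makes mismatched naming easy to spot.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["Plate"], row["Well"])
        rows.append({**row, "Well ID": well_ids.get(key)})
    return rows
```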

We use bulk annotations (HDF5 tables known as OMERO.tables) to store tabular metadata that doesn’t fit in the OMERO relational database.
Since the metadata is different for each Screen in IDR, using a separate OMERO.table bulk annotation on each screen allows us to customise the table for that screen.
However, it means that you can’t do a query across the whole of IDR to retrieve data from the bulk annotation OMERO.tables. You have to download or open each bulk annotation table in turn.
But if you only want to work with one or two Screens then that may work well for you.
The alternative is to use the OMERO.mapr queries I mentioned in my previous reply.

I think that https://idr.openmicroscopy.org/webgateway/well/604933/children/ will give you what you want in quite a concise format. How did you find that? It is not used by the webclient itself, and it is not as well tested as:
http://idr.openmicroscopy.org/api/v0/m/wells/604933/
That API gives you JSON data in the OME model format (described at https://docs.openmicroscopy.org/omero/5.6.1/developers/json-api.html), so it is a bit more verbose but contains more data.

Either is fine to use.
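If you use the JSON API, the Image IDs sit inside the Well's WellSamples. Below is a minimal parsing sketch, assuming the response wraps the Well in a top-level "data" object and uses OME-model keys ("WellSamples", "Image", "@id") as described in the JSON API docs. Please check it against a real response for Well 604933 before relying on it.

```python
import json
from urllib.request import urlopen

BASE = "http://idr.openmicroscopy.org"


def well_json_url(well_id):
    return f"{BASE}/api/v0/m/wells/{well_id}/"


def image_ids_from_well(payload):
    """Image IDs of every WellSample in a /api/v0/m/wells/<id>/ response."""
    well = payload["data"]
    return [ws["Image"]["@id"] for ws in well.get("WellSamples", []) if "Image" in ws]


def fetch_well_image_ids(well_id):
    """Download the Well JSON and return its Image IDs."""
    with urlopen(well_json_url(well_id)) as resp:
        return image_ids_from_well(json.load(resp))
```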

You can also get the OMERO.table data as HTML: http://idr.openmicroscopy.org/webclient/omero_table/14209181/
or as json with pagination: e.g.
http://idr.openmicroscopy.org/webclient/omero_table/14209181/json/?offset=1000&limit=1000
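Paging through the whole table with offset and limit might look like the sketch below. It assumes each JSON page lists its rows under data.rows (inspect one page first to confirm), and it simply stops at the first page that comes back shorter than limit. The fetch parameter exists so the paging logic can be tried without network access.

```python
import json
from urllib.request import urlopen

BASE = "http://idr.openmicroscopy.org"


def table_page_url(file_id, offset, limit):
    return f"{BASE}/webclient/omero_table/{file_id}/json/?offset={offset}&limit={limit}"


def get_json(url):
    with urlopen(url) as resp:
        return json.load(resp)


def iter_table_rows(file_id, limit=1000, fetch=get_json):
    """Yield every row of an OMERO.table, one page of `limit` rows at a time."""
    offset = 0
    while True:
        page = fetch(table_page_url(file_id, offset, limit))
        rows = page["data"]["rows"]  # assumed response layout
        yield from rows
        if len(rows) < limit:  # a short (or empty) page means we reached the end
            return
        offset += limit
```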

Regards,

Will.

Thank you Will @will-moore,

How did you come up with the ID 14209181 in the last section? In the bulk annotation file the plate ID is 2697, but it seems that ID isn't used anywhere?

Thank you for pointing out the API address, it’s very useful. I found the URL https://idr.openmicroscopy.org/webgateway/well/604933/children/ in the web developer tools (Network tab)…

The Screen ‘idr0009-simpson-secretion/screenB’ http://idr.openmicroscopy.org/webclient/?show=screen-803 is annotated with the bulk annotation file from your first post (the download link is under “Attachments” in the right panel; see screenshot).

[Screenshot 2020-07-14 at 22.44.12]

The Annotation ID is 6889873 but the ID of the “OriginalFile” is 14209181.
If you click on the ‘eye’ icon, you’ll open the HTML table in a new window.

I originally tracked down the Screen via the Well ID in your 2nd post, see:
http://idr.openmicroscopy.org/webclient/?show=well-605030

Regards,
Will

Thank you Will @will-moore for your patience and explanation!
