OMERO storage reports

Hi all and @OMETeam !

Since we have to report storage per group for our OMERO server, I wanted to know if there is a smart way of doing that from the command line.

We have a cron job that regularly saves the amount of data per user, but this doesn’t really help when users belong to multiple groups.

Is there any way to get the amount of data per group, and if needed per user within that group?

Also, is there a command to show the default group of a user? /bin/omero user list shows the IDs of all of a user's groups, but there is no way to tell which one is the default.

Thanks a lot !

The repository management guide is probably what you want, e.g., omero fs usage --report ExperimenterGroup:123 for the group with ID 123. (omero group info reports group IDs.) It’s probably best not to attempt checking ExperimenterGroup:* all at once on a large server as the query runs rather more slowly than we might wish.


I am not aware of a command that shows users’ default group; that seems an omission in omero user listgroups. The OMERO.blitz API does make the information available, but using getDefaultGroup would require writing a little script.
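For illustration, such a script might look roughly like the sketch below. It assumes omero-py is installed and that the account used is allowed to list experimenters; the function names default_group_names and main, and the port 4064, are choices made for this sketch rather than anything prescribed by OMERO. getDefaultGroup can raise an error for an account with no default group, which this sketch does not handle.

```python
def default_group_names(admin_service):
    """Map each user's login name to the name of their default group.

    `admin_service` is expected to behave like the OMERO IAdmin service,
    i.e. to offer lookupExperimenters() and getDefaultGroup(id).
    """
    names = {}
    for experimenter in admin_service.lookupExperimenters():
        user_id = experimenter.getId().getValue()
        group = admin_service.getDefaultGroup(user_id)
        names[experimenter.getOmeName().getValue()] = group.getName().getValue()
    return names


def main(host, username, password, port=4064):
    """Connect, collect the default groups, and always close the session."""
    # Imported here so the helper above can be used without omero-py.
    from omero.gateway import BlitzGateway

    conn = BlitzGateway(username, password, host=host, port=port)
    if not conn.connect():
        raise RuntimeError("Could not connect to the OMERO server")
    try:
        return default_group_names(conn.getAdminService())
    finally:
        conn.close()
```

Calling main("omero.example.org", "root", "secret") would return a dictionary such as {"user": "group"}; the host and credentials here are placeholders.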

@lguerard,

as a workaround to find the default group of a user, you can use HQL:

omero hql "select e.id, e.omeName, g.name from Experimenter e join e.groupExperimenterMap m join m.parent g where index(m) = 0"

~Josh


Thanks a lot, this seems to be doing what we want !

omero fs usage seems to take quite some time even for an individual group, so I don’t know which is better: making a loop or using ExperimenterGroup:*. Do you have any ideas about this?

Thanks as well! Can HQL get the amount of data per group? That would remove our need to know the default group.

Thanks again to both :slight_smile:

Definitely the former: use a loop.
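A loop of that kind could be sketched in Python as below. It only wraps the omero fs usage --report ExperimenterGroup:ID command already mentioned in this thread and assumes an active omero login session in the same shell environment. The helper names usage_command and report_groups are made up for this sketch, and the group IDs have to be supplied by you.

```python
import subprocess


def usage_command(group_id, omero="omero"):
    """Build the command line for one group's usage report."""
    return [omero, "fs", "usage", "--report", f"ExperimenterGroup:{group_id}"]


def report_groups(group_ids, omero="omero", run=subprocess.run):
    """Run one usage report per group, one after the other.

    Returns a dict mapping group ID to the raw text the CLI printed.
    Running sequentially avoids piling several heavy queries on the server.
    """
    reports = {}
    for group_id in group_ids:
        result = run(
            usage_command(group_id, omero),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one group fails
        )
        reports[group_id] = result.stdout
    return reports
```

For example, report_groups([3, 5, 123]) would run three reports in turn; the output is kept as plain text because the report layout is not parsed here.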

HQL can get some information easily, like the size of the originally uploaded files in the managed repository. Would that suffice? omero fs usage includes extra disk usage such as pyramids and thumbnails; its thoroughness is part of why it’s a heavier-duty operation. For a simpler approach, perhaps something like this can help:

omero hql --all "SELECT details.group.name, details.owner.omeName, SUM(size) FROM OriginalFile WHERE size > 0 GROUP BY details.group.name, details.owner.omeName ORDER BY details.group.name, details.owner.omeName"

It won’t be perfect but may be a useful guide.

Thanks! I’m still waiting for the first command to finish to get an idea of the results, and would then create a loop over all groups.

The problem with that is that our IT charges for storage, so all files count, including pyramids and thumbnails. So I guess omero fs usage is the way to go.

The HQL query was super fast and gave good results, thanks. I’ll just need to find a way to write all results to a file, not only 25 at a time, if I go that way.

Thanks again ! :slight_smile:

Should it take more than 4 hours for a single group ? :confused:

If you’ve canceled any previous runs from the command line, they might all still be trying to run in different server threads, which would slow things down. But, yes, with a lot of data it can take a very long time, hence the GitHub issue linked above, so you might want to go with that quicker HQL-based alternative. Or, if there are far more users than groups, looping over the users instead of the groups might make it workable.

For what it’s worth, you can see that it’s at least making progress if you temporarily adjust the server’s logback.xml line for ome.services.graphs from INFO to DEBUG, wait a minute, then watch Blitz-0.log. Change it back afterwards to avoid all the log clutter.


Thanks for your answer !

In the end, I used a cron job we already have, which reports per-user file storage, together with the HQL query from @joshmoore to get the users' default groups, and wrote a small Python script to merge everything together.
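For anyone wanting to do the same, a merge along these lines could look like the sketch below. It is not the actual script: the function name storage_per_group and the two input dictionaries (bytes per user, default group per user) are assumptions about how the cron output and the query results might be loaded. Note that it attributes all of a user's data to their default group, so it is only an approximation when users store data in several groups.

```python
from collections import defaultdict


def storage_per_group(bytes_per_user, default_group, fallback="unknown"):
    """Sum per-user storage into per-group totals.

    bytes_per_user: dict of login name -> bytes used (hypothetical cron output)
    default_group:  dict of login name -> default group name
    Users without a known default group are counted under `fallback`.
    """
    totals = defaultdict(int)
    for user, size in bytes_per_user.items():
        totals[default_group.get(user, fallback)] += size
    return dict(totals)


if __name__ == "__main__":
    # Made-up numbers purely to show the call.
    usage = {"alice": 1000, "bob": 250, "carol": 40}
    groups = {"alice": "lab_a", "bob": "lab_a"}
    for group, size in sorted(storage_per_group(usage, groups).items()):
        print(f"{group}\t{size}")
```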

Thanks both for your help ! :slight_smile:
