Clean up OMERO repository and database

I don’t suppose the server’s master.out or master.err logfiles contain anything of interest for the time of that original stalled import?

Hi @mtbc,

I checked master.out and master.log, but there’s nothing special there for the time of interest.

Correct me if I’m wrong, but I don’t believe it’s a network issue or an aborted import. Yesterday I downloaded the file (the original file that failed to import) from the server’s managed repository to my local machine and then imported it again. Obviously it was fully transferred to the server originally, otherwise I wouldn’t have been able to download it and import it yesterday, no?

Thanks for the delete command. May I ask what exactly omero delete Fileset:<n> does? And while we’re at it, what does cleanse.py do? In particular, I’d like to know:

  • Does omero delete Fileset:<n> also delete data in the database that references that fileset?
  • I had a closer look at cleanse.py, but I can’t get beyond https://github.com/ome/omero-py/blob/0546d34066a42aaf3db07de0f6a8557a78892dfc/src/omero/util/cleanse.py#L275, because I don’t know what repo is.
    Does cleanse.py delete only empty subdirectories? Or does it also check whether filesets exist in the database and delete the ones that exist on the file system but not in the DB? From an earlier answer by @joshmoore I assumed the latter, but I can’t find this in the code; maybe you can point me to the right location.

Thank you all for your help,
Bene

Import is a complex multi-stage process. For example, if the client completed the upload but then could not successfully invoke OMERO.blitz’s verifyUpload, I suspect one would end up with things as you found them.

omero delete Fileset:<n> deletes data from both the database and the filesystem because, in normal OMERO operation, the two should remain consistent. cleanse’s original motivation was to correct inconsistencies, for example by working through the managed repository and deleting from the filesystem any files that have no corresponding database entry. (This used to be more of an issue when OMERO.server also ran on Windows, where files were sometimes locked and could not be deleted on the first try.)

cleanse may also do some other things, such as handling pyramids in the pixels directory; @joshmoore may remember more. Admittedly, its removal of empty directories is not about consistency: it was just an easy thing to add later on, not that people usually run short of inodes.
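To make the consistency check described above concrete, here is a minimal sketch (not cleanse.py’s actual code) of the idea: walk a repository directory, report files whose relative paths are absent from a set of paths known to the database, and report directories that contain no entries at all. The function name find_orphans and its known_paths input are hypothetical; in practice that set would come from the database. The sketch only reports and never deletes anything.

```python
import os
from pathlib import Path


def find_orphans(repo_root, known_paths):
    """Return (orphan_files, empty_dirs) under repo_root, as sorted relative paths.

    known_paths is a hypothetical set of file paths, relative to repo_root,
    that the database knows about. Nothing is deleted; this only reports.
    """
    root = Path(repo_root)
    orphan_files = []
    empty_dirs = []
    # Walk bottom-up so that a directory's children are visited before it.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        current = Path(dirpath)
        for name in filenames:
            rel = (current / name).relative_to(root).as_posix()
            if rel not in known_paths:
                orphan_files.append(rel)
        # Only directories with no entries at all count as empty here,
        # since nothing has actually been removed during the walk.
        if current != root and not os.listdir(current):
            empty_dirs.append(current.relative_to(root).as_posix())
    return sorted(orphan_files), sorted(empty_dirs)
```

A real clean-up tool would also have to be careful about files that are still being uploaded, which is one reason to start with a report rather than a deletion.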

repo is a property of OriginalFile instances that identifies the OMERO repository in which each file is rooted. Perhaps an example is most illustrative; this query lists the files rooted in the managed repository:

SELECT path || name FROM originalfile
  WHERE mimetype <> 'Directory' AND repo IN
    (SELECT hash FROM originalfile
       WHERE mimetype = 'Repository' AND name = 'ManagedRepository');

which assumes that your omero.managed.dir setting names a directory called ManagedRepository, as in the default /OMERO/ManagedRepository.
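To illustrate how repo works, here is a toy sketch that runs the query above against an in-memory SQLite table. The table is a hypothetical, heavily simplified stand-in for OMERO’s originalfile table (only the columns the query uses), and all rows and values are made up. It shows the linkage the query relies on: a repository is itself an originalfile row whose hash identifies it, and files rooted in that repository carry the same value in their repo column.

```python
import sqlite3

# Hypothetical, simplified stand-in for originalfile: only the columns the
# query uses are modelled; this is not OMERO's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE originalfile ("
    " id INTEGER PRIMARY KEY, path TEXT, name TEXT,"
    " mimetype TEXT, hash TEXT, repo TEXT)"
)
rows = [
    # Repository rows: their hash is what other rows reference via repo.
    (1, "/OMERO/", "ManagedRepository", "Repository", "uuid-managed", None),
    (2, "/OMERO/", "OtherRepository", "Repository", "uuid-other", None),
    # A directory and a file rooted in the managed repository.
    (3, "user_1/2014-01/", "run1", "Directory", None, "uuid-managed"),
    (4, "user_1/2014-01/run1/", "a.lif", "application/octet-stream", "h4", "uuid-managed"),
    # A file rooted in a different repository.
    (5, "", "b.txt", "text/plain", "h5", "uuid-other"),
    # A file with no repo set at all.
    (6, "/", "c.txt", "text/plain", "h6", None),
]
conn.executemany("INSERT INTO originalfile VALUES (?, ?, ?, ?, ?, ?)", rows)

# The same query as above (SQLite also supports || and <>).
query = """
SELECT path || name FROM originalfile
  WHERE mimetype <> 'Directory' AND repo IN
    (SELECT hash FROM originalfile
       WHERE mimetype = 'Repository' AND name = 'ManagedRepository')
"""
managed = [row[0] for row in conn.execute(query)]
print(managed)  # prints ['user_1/2014-01/run1/a.lif']
```

Rows whose repo is NULL never match the IN test, and the directory row is excluded by the mimetype condition, so only the file in the managed repository is listed.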

Thanks, @mtbc, for the clarifications.

Hi @bene.schmid,

Sorry for missing all the “fun”. Trying to summarize:

  • You’ve found a number of filesets that are not deleted but have no images, and you want to clean them up.
  • However, you’re also concerned about the larger question of how they came to be, right?

Can you provide a list of all the filesets that are in this state? Are they from the same user? Of the same file format? Imported at roughly the same time?

~J.

Hi @joshmoore,

I must say that I simply wasn’t aware that filesets without images could exist at all. So I think there are two points:

  • I’d like to delete at least the old ones, since obviously nobody misses them and they occupy storage space unnecessarily.
  • More importantly, if these filesets exist because of an import problem, it basically means that users lose the corresponding images (of course they should notice if something is missing, and strangely I never got any complaints about this).

So yes, I’d definitely like to know why this happens, and whether I should check daily for new filesets of this kind and then warn users about them. I re-ran omero fs sets --without-images --limit -1, and it seems this has happened irregularly ever since we started running OMERO.server (in 2014). Different users are affected, and different file formats (Leica, Zeiss, LaVision Biotec, …). I’ll upload the whole output.

For now, I’ll check daily for new filesets of this kind and try to find out from the corresponding users whether anything unusual happened, whether they never actually saw the data in OMERO, or whether they deleted it (and it did not get fully cleaned up).
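For the daily check, a minimal sketch could compare each day’s output of the omero fs sets command above with the previous day’s and report lines that are new. The function names and the state file are hypothetical, and the sketch assumes the OMERO CLI is on PATH with a session already logged in. Because no particular column layout of the output is assumed, lines are compared as plain text; if the table includes a running row number, pass a normalize function that strips it, otherwise every line after a new entry would look new.

```python
import subprocess
from pathlib import Path

# Command taken from this thread; assumes the OMERO CLI is on PATH and logged in.
FS_SETS_CMD = ["omero", "fs", "sets", "--without-images", "--limit", "-1"]


def run_fs_sets():
    """Run the CLI command and return its standard output as text."""
    result = subprocess.run(FS_SETS_CMD, capture_output=True, text=True, check=True)
    return result.stdout


def new_lines(previous, current, normalize=lambda line: line.strip()):
    """Return non-blank lines of `current` whose normalized form is not in `previous`.

    Lines keep their original order. `normalize` lets callers drop columns
    (such as a running row number) that change between runs.
    """
    seen = {normalize(line) for line in previous.splitlines()}
    return [
        line
        for line in current.splitlines()
        if line.strip() and normalize(line) not in seen
    ]


def check_daily(state_file):
    """Hypothetical daily job: diff against the last saved output, then save the new one."""
    state = Path(state_file)
    previous = state.read_text() if state.exists() else ""
    current = run_fs_sets()
    state.write_text(current)
    return new_lines(previous, current)
```

Run once a day (for example from cron), check_daily would return only the entries that appeared since the previous run, which keeps the list to follow up with users short.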

Thanks again,
Bene

filesets_without_images.txt (124.0 KB)


As far as I know, that should only ever happen on failed imports.

Makes sense if there’s no further debugging you want to do with them. Deleting the Fileset: object as Mark described should make the data disappear. (A separate issue will/would be having filesets without images show up in the cleanse output.)
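If the old filesets are to be removed in bulk, a minimal sketch could generate one omero delete Fileset:<id> command per ID and run them only when explicitly asked. The function names are hypothetical, the IDs must be supplied by you (for example, picked from the fs sets output), and an already logged-in CLI session with sufficient permissions is assumed. Printing the commands first gives a chance to review the list, since deletion removes data from both the database and the filesystem.

```python
import shlex
import subprocess


def delete_commands(fileset_ids):
    """Build one `omero delete Fileset:<id>` command per fileset ID."""
    return [["omero", "delete", f"Fileset:{int(fid)}"] for fid in fileset_ids]


def delete_filesets(fileset_ids, execute=False):
    """Print each command; only run it when execute=True.

    Assumes the OMERO CLI is on PATH and already logged in.
    check=True stops at the first failing deletion.
    """
    for cmd in delete_commands(fileset_ids):
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)
```

Issuing one command per fileset keeps failures isolated, so a single problematic fileset does not block the others from being reviewed.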

That seems extreme, and I’ve never heard of anyone needing to do that, but you clearly have a lot of these appearing.

Sounds great. Keep us posted.