Symlinks to avoid duplication of raw data

Hello

While I was talking with @pwalczysko, he pointed out that my creativity has led me into the cracks between the intended use cases :upside_down_face:

In our application we end up having the same slide-scanner images multiple times with different annotations and image descriptions (the perks of a bi-lingual university).
So after importing the data once for one language, a script runs through the project-dataset-image tree, retrieves each image path, and imports the images again with --transfer ln_s into a second project.

Now, on reading the documentation again, it says:

With in-place import, the data either resides completely outside of OMERO or is shared with other users […]

Now Petr pointed out to me that deleting the image that was imported with ln_s would lead to the single copy being deleted, given that it is the sole link and OMERO assumes that the data lies outside of the ManagedRepository.
Here is a suggestion on how to make this more explicit in the documentation.

Now, to fix this, I see the following work coming up:

  1. Copy the data to a safe location in the file system outside the ManagedRepository.
  2. In-place import for the FR and DE versions.
  3. Copy all the annotations and descriptions from the originals to the newly imported datasets.
  4. Clean up.

Petr also mentioned that there might be a simpler fix for this. @joshmoore could you tell me more about this?

Felix


Thanks! I’ve opened https://github.com/ome/ome-documentation/pull/1983

In my case, I had done something similar which led to a chain of symlinks from image A to image B to image C. I wanted to get rid of the extra symlink and have B and C both link to A. (I failed to find that script in a few minutes, but I can search more if you are interested.)
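This is not the script I mentioned, just a minimal sketch of the idea, assuming GNU coreutils (`readlink -f`, `ln -sfn`). The function name `flatten_link` is made up for illustration. It repoints one symlink directly at the real file at the end of its chain:

```shell
#!/usr/bin/env bash
# Hypothetical helper: repoint a symlink at the final target of its chain,
# so that C -> B -> A becomes C -> A. Assumes GNU coreutils.
flatten_link() {
    local link="$1" target
    [ -L "$link" ] || { echo "not a symlink: $link" >&2; return 1; }
    # Resolve every hop in the chain to the final, canonical path.
    target="$(readlink -f -- "$link")" || return 1
    # readlink -f can succeed even if the last path component is missing,
    # so check explicitly that the chain does not dangle.
    [ -e "$target" ] || { echo "dangling chain: $link" >&2; return 1; }
    # -s: symbolic, -f: replace the existing link,
    # -n: do not follow the existing link if it points at a directory.
    ln -sfn -- "$target" "$link"
}
```

Running it on a copy of the tree first is probably wise, since it rewrites links in place.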

In your case, you want to replace the symlink with a hard-link. That way:

  • if either image is deleted, the other remains
  • if both images are deleted, the data is gone
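To see that behaviour outside of OMERO, here is a small sketch that works in a throwaway temp directory. The function name `hardlink_demo` and the file names are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical demo of hard-link semantics in a temporary directory.
hardlink_demo() {
    local dir
    dir="$(mktemp -d)" || return 1
    echo "pixel data" > "$dir/original"
    # A hard link is a second name for the same inode, not a pointer
    # to the first name.
    ln "$dir/original" "$dir/second"
    if [ "$dir/original" -ef "$dir/second" ]; then
        echo "same inode"
    fi
    # Removing one name leaves the data reachable through the other.
    rm "$dir/original"
    cat "$dir/second"
    # Removing the last name frees the data.
    rm -r "$dir"
}

hardlink_demo
```

This prints `same inode` followed by `pixel data`. One thing to keep in mind is that hard links only work within a single filesystem, so both paths have to live on the same mount.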

Happy to help you build on this, but something like https://superuser.com/a/560610 is likely what you want:

$ head fix.sh Makefile
==> fix.sh <==
#!/usr/bin/env bash

symlink="$1"
original="$(readlink -m "${symlink}")"
echo ln -f "$original" "$symlink"

==> Makefile <==
b/c: b a
	ln -s $(PWD)/a $(PWD)/b/c

a:
	date > a

b:
	mkdir -p b

run:

$ make clean a b/c run
rm -rf a b
date > a
mkdir -p b
ln -s /tmp/test/a /tmp/test/b/c
# Manually run the following lines AFTER CHECKING THEM!:
ln -f /tmp/test/a ./b/c

$ ln -f /tmp/test/a ./b/c

$ make test
cat a
Fri Jun  7 11:45:03 BST 2019
cat b/c
Fri Jun  7 11:45:03 BST 2019
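If you have a whole tree of symlinks, the same dry-run idea could be sketched as below. This is an assumption-laden sketch rather than a tested OMERO tool: `print_hardlink_commands` is a made-up name, and it assumes bash plus GNU coreutils (`readlink -f`, `stat -c`). It only prints the `ln` commands, so you can check them before running anything:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (never run) the ln commands that would replace
# each symlink under a directory with a hard link to its target.
# Assumes bash and GNU coreutils (readlink -f, stat -c).
print_hardlink_commands() {
    local root="$1" link target
    find "$root" -type l -print0 |
    while IFS= read -r -d '' link; do
        target="$(readlink -f -- "$link")"
        # Dangling links and links to directories cannot become hard links.
        if [ ! -f "$target" ]; then
            echo "# skipped (target is not a regular file): $link"
            continue
        fi
        # Hard links cannot cross filesystems, so compare device numbers
        # of the target and of the directory holding the link.
        if [ "$(stat -c %d -- "$target")" != \
             "$(stat -c %d -- "$(dirname -- "$link")")" ]; then
            echo "# skipped (different filesystem): $link"
            continue
        fi
        # %q quotes the paths so each printed line is safe to paste.
        printf 'ln -f %q %q\n' "$target" "$link"
    done
}
```

For the example above, `print_hardlink_commands ./b` should print a single `ln -f` line for `b/c`, which you can then run by hand after checking it.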

~J.
