Hello CP team,
I have a few questions about handling massive numbers of images. As we start to ramp up our screens we are running into all kinds of shortages, so we were wondering whether anyone has worked out workflow protocols/software, or knows of commercial tools, to handle the following systematically: transferring images from the capture workstation to a server, backing them up, retrieving them for processing, automatically deleting intermediate processing steps, permanently storing the final high-quality images once processed, etc.
We now generate:
- 80 raw images of 25 MB each per well, thus 25 MB x 80 x 96 wells ~200 GB/plate.
- 96 stitched and compressed well images (5 MB total per well), ~0.5 GB/plate; anything bigger and CP chokes. Is this ~5 MB/well ceiling a limitation of our processing workstation, or a CP limitation?
- 96 x 2 processed (outlined, straightened) images, ~1 GB/plate.
The plan is to screen 240,000 wells (~2,500 plates), i.e. ~500 TB.
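For reference, the arithmetic behind those totals, as a throwaway script (the constants are just the numbers quoted above, nothing CP-specific):

```python
# Back-of-the-envelope storage budget for the screen described above.
RAW_IMAGES_PER_WELL = 80
RAW_IMAGE_MB = 25
WELLS_PER_PLATE = 96
STITCHED_MB_PER_WELL = 5      # compressed stitched well image
PROCESSED_GB_PER_PLATE = 1    # outlined + straightened images
TOTAL_WELLS = 240_000

plates = TOTAL_WELLS / WELLS_PER_PLATE
raw_gb = RAW_IMAGES_PER_WELL * RAW_IMAGE_MB * WELLS_PER_PLATE / 1000
stitched_gb = STITCHED_MB_PER_WELL * WELLS_PER_PLATE / 1000
total_tb = plates * (raw_gb + stitched_gb + PROCESSED_GB_PER_PLATE) / 1000

print(f"{plates:.0f} plates, {raw_gb:.0f} GB raw/plate, {total_tb:.0f} TB total")
# -> 2500 plates, 192 GB raw/plate, 484 TB total (hence the ~500 TB estimate)
```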
For now we are doing everything very manually, which is time-consuming and risky.
- capture with Nikon Elements onto an 8 TB local SSD
- stitch using Elements. Do you know of fast stitching tools that could run on a high-performance compute server (i.e. not running Windows or OS X)?
- ‘immediately after’ (depending on the student’s promptness) we transfer the tiles and stitched images to the institutional 1 PB server, but this has to be done with Elements’ backup function (one plate at a time) so that we can repopulate the Elements HCA visualization tools if needed. The server’s contents are backed up to the cloud daily.
- when we start processing with CP we normally need to transfer the images back to a processing computer. This generates CP databases whose paths are tied to that computer, so sharing the results of the analysis is challenging. We would prefer to analyze from any personal computer, without moving images back and forth, and to keep the results centralized so that every team member can open the database in CPA for training and scoring (since different people may be looking at different phenotypes). A sketch of what we have been contemplating follows this list.
- transfer back all processed images to the server.
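To make the path problem concrete, this is the kind of fix we have been contemplating: rewriting the absolute-path columns of the CP-exported database so they point at the shared server mount rather than at one workstation. A rough sketch, assuming a SQLite export and CP's usual Image_PathName_* column naming; the file name, both path prefixes, and the table-name pattern are placeholders for our setup:

```python
import sqlite3

DB = "DefaultDB.db"                 # CP ExportToDatabase output (name is a placeholder)
OLD = r"C:\Users\student\images"    # local prefix baked in at analysis time (placeholder)
NEW = "/server/screens"             # same images as seen from the shared mount (placeholder)

conn = sqlite3.connect(DB)
cur = conn.cursor()

# ExportToDatabase writes one per-image table (e.g. MyExpt_Per_Image) with an
# Image_PathName_<channel> column per channel; find the table and those columns.
(table,) = cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table' AND name LIKE '%Per_Image'"
).fetchone()
path_cols = [row[1] for row in cur.execute(f"PRAGMA table_info({table})")
             if row[1].startswith("Image_PathName_")]

# Swap the local prefix for the server prefix in every path column, so any
# machine that mounts the server can open the images from CPA.
for col in path_cols:
    cur.execute(f"UPDATE {table} SET {col} = REPLACE({col}, ?, ?)", (OLD, NEW))
conn.commit()
conn.close()
print(f"remapped {len(path_cols)} path column(s) in {table}")
```

Is this a sane approach, or is there a supported way to keep one shared database? Mounting the server at the same path on every machine would avoid the rewrite entirely, but we cannot guarantee that across everyone's computers.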
Keeping track of all this processing and movement of data is difficult, and we fear that as the volume increases we will start losing data, or losing track of it.
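The bookkeeping we have been considering is a checksum manifest written at capture time and re-verified after every transfer, so a lost or corrupted plate is at least detected. A rough sketch in Python (all paths are hypothetical):

```python
import hashlib
import json
from pathlib import Path

def plate_manifest(plate_dir: Path) -> dict:
    """Map each image file (path relative to the plate folder) to its SHA-256."""
    return {
        str(p.relative_to(plate_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(plate_dir.rglob("*.tif"))
    }

def verify(plate_dir: Path, manifest_file: Path) -> list:
    """Return the files that are missing or changed since the manifest was written."""
    expected = json.loads(manifest_file.read_text())
    actual = plate_manifest(plate_dir)
    return [f for f, digest in expected.items() if actual.get(f) != digest]

# At capture time, on the acquisition workstation:
src = Path("D:/screens/plate_001")                 # hypothetical local plate folder
(src / "manifest.json").write_text(json.dumps(plate_manifest(src), indent=2))

# After each transfer, from any machine that mounts the destination:
dst = Path("/server/screens/plate_001")            # hypothetical server copy
bad = verify(dst, dst / "manifest.json")
print("plate OK" if not bad else f"{len(bad)} files missing or corrupted")
```

rsync --checksum does part of this during a single copy, but a manifest that travels with the plate can be re-checked at every hop (Elements SSD, server, cloud).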
So, do you have suggestions or tools for improving this workflow?