Optimal configuration of PC hardware for handling large data sets

Hello everybody in this forum,
We are about to configure an image-processing workstation that will run BigDataViewer, the MaMuT plugin, RACE, Imaris, etc. How are MaMuT (and the RACE software from the Keller lab at Janelia) written? In particular: how much of the work has been parallelized? Do they rely on CUDA (GPU arrays), a fast CPU cluster, or both? Do they require, or would they benefit from, high-end dual CPUs? What are the physical memory requirements for smooth operation with larger data sets (1 TB or more)? What is the data cache, and what is its optimal size? And, obviously, more questions remain. Have any of you tested these tools and identified the bottlenecks? Best regards, Vitaly

Hi Vitaly.

I can only answer for MaMuT. MaMuT focuses mainly on interactive browsing, curating and editing by the user, so my comments relate to the factors that preserve good responsiveness even when facing large datasets.

MaMuT is based on the BigDataViewer for visualization. Large images are resaved into a specialized file format that allows quick access from the hard drive while you browse the data set. It would therefore be logical to recommend storing the image data file on a fast drive, such as a modern SSD. However, in my experience I could achieve satisfactory repainting even over the network or from a USB hard drive.
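The point about resaving is that a block-wise layout lets the viewer pull only the small region it is currently displaying, instead of loading the whole volume. A minimal sketch of that idea, using a plain memory-mapped raw file rather than BDV's actual XML/HDF5 container (the file name, shapes and dtype here are made up for illustration):

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for a resaved volume: BDV really uses a chunked,
# multi-resolution HDF5 container, but the benefit is the same -- only the
# blocks you touch are read from disk.
shape = (128, 128, 128)
path = os.path.join(tempfile.mkdtemp(), "volume.raw")
np.arange(np.prod(shape), dtype=np.uint16).reshape(shape).tofile(path)

# Memory-map the file: nothing is loaded into RAM yet.
vol = np.memmap(path, dtype=np.uint16, mode="r", shape=shape)

# Pull one 32^3 block into RAM, as a viewer would when repainting a view.
block = np.array(vol[0:32, 0:32, 0:32])
print(block.shape)   # (32, 32, 32)
print(block.nbytes)  # 65536 bytes, vs ~4 MB for the whole volume
```

The same access pattern over a network share or a USB drive explains why browsing can stay responsive even on slow storage: each repaint only needs a handful of small blocks, not the full data set.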

MaMuT does not rely on the GPU at all (as of today, July 2017). Everything is software-based, so you do not need a specialized graphics card; such a card will only bring marginal improvements related to 2D painting.

Now I would tend to say that, thanks to the BDV, memory requirements are also modest (the BDV streams the image data from the hard drive), but storing the annotations in memory is another story. Annotations (tracks, etc.) use the TrackMate API, which consumes a lot of RAM once the annotations get large (beyond 1 million cells, beware). Having plenty of RAM is a good idea. Modern workstations come with 64 GB, which is something I would like to have.
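A rough back-of-envelope helps put the "1 million cells" warning in perspective. The per-spot cost below is an assumption for illustration only (the real TrackMate figure depends on object headers, coordinates, links and feature maps and varies by JVM):

```python
# Hypothetical estimate: assume each annotated spot costs on the order of
# ~1 kB of JVM heap. The real per-spot cost in TrackMate varies.
per_spot_bytes = 1024      # assumed average cost per spot (illustrative)
n_spots = 1_000_000        # "beyond 1 million cells, beware"

total_gb = per_spot_bytes * n_spots / 1024**3
print(f"{total_gb:.2f} GB")  # ~0.95 GB for the spots alone
```

Even under this optimistic assumption, the spots alone approach a gigabyte, before counting the image cache, the GUI, and JVM overhead, which is why a workstation with a generous amount of RAM (and a correspondingly raised Java heap limit) is a sensible choice.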

That’s all. I hope other users will be able to comment on their experience and advice too.

Hi Jean-Yves, thank you very much for your reply. In relation to object (cell) tracking within a live embryo, what is the timeframe (minutes? hours?) in MaMuT for tracking, let's say, 3-4 cells or all cells over 24-48 hrs within a large 3D data set (1 TB+), sampled, say, every 30-90 seconds? Regards, Vitaly

The MaMuT paper might help: http://www.biorxiv.org/content/early/2017/02/28/112623