I want to build a dataset for a custom deep learning (DL) model. I have about 1290 co-culture images (TIFF), and I want to create ground truth data by annotating the cells.
I would like to know the best way to approach this, given that there are thousands of cells per image, and which tools I should use.
I'm doing this because I've already used a CellProfiler (CP) pipeline, and it gives inflated counts for leukemic cells due to under-segmentation.
Is deep learning required for your work, and how comfortable are you with Python?
You can do this in various ways, depending on your skills and how much error you can accept.
1- CellProfiler - the most user-friendly option, but you are likely not using the right segmentation settings
2- StarDist - can be used through ImageJ or directly with Python; very good when cells overlap but have a strongly circular shape (this would be my recommendation for your work)
3- Cellpose - works well for non-circular cells, but it is also a deep learning model
Thank you for your reply
I'm a computer engineering undergrad and I'm pretty comfortable with Python. I'm working on this project from the perspective of a DL enthusiast.
I used CellProfiler to get my baseline results. Could you please share a bit more about how I can improve the segmentation options?
I saw the StarDist tutorial thanks to @Research_Associate. It seems like a great option; I'll try running one of their pretrained models for fluorescence images to see the results on my images.
However, I would really like to build something from scratch, and I'm finding it hard to decide how to create the ground truth for this. Is there a fast way of getting images annotated, and which tool is best for annotating images with thousands of cells per image (like the one I shared above)?
Please let me know if you have any suggestions.
Thanks in advance
The only answer to this is "manually." You can use plenty of other algorithms to find nuclei, but if they were working accurately, you wouldn't need a deep learning model; and if they aren't working accurately, you will be building their inaccuracies into your model.
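Whichever tool you annotate in, most training pipelines (StarDist included) expect the ground truth as an instance label mask: an integer array the same size as the image, with 0 for background and a unique positive ID per cell. A toy sketch with numpy, using a hypothetical 6x6 "image":

```python
import numpy as np

# Toy instance label mask: 0 = background, each cell gets a unique integer ID.
# Real masks come out of your annotation tool at the image's full resolution.
mask = np.zeros((6, 6), dtype=np.uint16)
mask[0:2, 0:2] = 1      # cell 1
mask[3:5, 1:4] = 2      # cell 2
mask[4:6, 4:6] = 3      # cell 3

n_cells = mask.max()                   # IDs are consecutive here
sizes = np.bincount(mask.ravel())[1:]  # pixel area per cell, skipping background

print(n_cells)   # 3
print(sizes)     # [4 6 4]
```

With thousands of cells per image, you generally don't annotate whole images; you crop small representative regions and annotate those fully, which is what the pretrained models were trained on as well.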
There are suggestions for programs to use for annotating both in the StarDist presentation and elsewhere.
which leads to:
The ability to use a prebuilt script and pipeline for training seems rather nice, but it isn't really using Python for most parts, except maybe the training itself.
I can't really tell from that mask picture, but you can definitely improve the results by training your own model. The StarDist presentation on YouTube through NEUBIAS Academy goes through adding a small subset of your own nuclei to one of the datasets used to generate the models. I think there were also some discussions of roughly how much of your own data you should add to the training. It probably goes without saying, but you would want to focus on annotating areas where the pretrained model was making mistakes.
It’s a lot of video to go through, but it may be worth it if you really want to dig into improving the model.
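The mixing step itself is simple bookkeeping. A rough sketch, assuming you already have two lists of (image, mask) pairs loaded: `public` from a public nuclei dataset and `own` from your hand-annotated crops (both names and the string entries below are placeholders, not real data):

```python
import random

# Hypothetical placeholders for already-loaded (image, mask) pairs:
public = [(f"public_img_{i}", f"public_mask_{i}") for i in range(100)]
own = [(f"my_img_{i}", f"my_mask_{i}") for i in range(10)]  # crops where the
                                                           # pretrained model failed

random.seed(0)

# Combine the public set with your own annotations, then shuffle so
# your images are spread evenly through the training batches.
combined = public + own
random.shuffle(combined)

# Hold out roughly 15% for validation, as the StarDist examples do.
n_val = max(1, int(0.15 * len(combined)))
val, train = combined[:n_val], combined[n_val:]

print(len(train), len(val))   # 94 16
```

The exact public-to-own ratio is the part discussed in the video; the point is that even a small number of your own annotated crops, targeted at the failure cases, shifts the model toward your data.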
And I just noticed I had linked this before. Still, it should answer most of your questions if you go through it.