Hello, I’m working on a machine learning project that involves labelling a collection of images (micrographs), more specifically the metallurgical phases I see in them. My question is: is there some kind of rule behind this process? In other words, how many labels per image should I include? Obviously it depends on how many phases there are, but how many per phase would be a good start? Right now, I’m drawing squares and assigning a label to each of them with a Fiji plugin I found online, and I do that for different parts of the image that are actually the same phase but look slightly different. Thank you in advance for your insight.
We would need more details on what you’re trying to achieve. Do you want to automatically identify and label the different phases each image contains, or simply label each image with the list of phases it contains (without caring where they are in the image)? Which image processing and machine learning methods are you going to use?
Generally, building a training set is not so much about how many examples you have per image as about how well the training set covers the diversity present in the data. Typically, the size of the training set depends on the complexity of the task, i.e. how hard it is to differentiate the classes given the input data and the learning method used. Without knowledge of your problem, all I can say is: get as much data as possible.
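One rough way to check coverage while you annotate is to tally, for each phase, how many labelled squares you have and how many different images they come from. The sketch below is only an illustration: the function name `coverage_report`, the `(image_id, phase)` input format, the phase names, and the thresholds `min_patches` and `min_images` are all assumptions, not a rule and not something your Fiji plugin necessarily exports in this form.

```python
from collections import Counter, defaultdict


def coverage_report(annotations, min_patches=20, min_images=3):
    """Summarise labelled squares per phase.

    annotations: iterable of (image_id, phase_label) pairs, one per square.
    min_patches / min_images: arbitrary placeholder thresholds (assumptions).
    """
    patches_per_phase = Counter()
    images_per_phase = defaultdict(set)

    for image_id, phase in annotations:
        patches_per_phase[phase] += 1          # total squares for this phase
        images_per_phase[phase].add(image_id)  # distinct images showing it

    report = {}
    for phase, n_patches in patches_per_phase.items():
        n_images = len(images_per_phase[phase])
        report[phase] = {
            "patches": n_patches,
            "images": n_images,
            # Flag phases with few squares or squares from too few images.
            "under_covered": n_patches < min_patches or n_images < min_images,
        }
    return report


# Illustrative data only; phase names and image ids are made up.
example = [
    ("img_01", "ferrite"), ("img_01", "ferrite"), ("img_01", "pearlite"),
    ("img_02", "ferrite"), ("img_02", "martensite"),
]
report = coverage_report(example, min_patches=2, min_images=2)
for phase, stats in sorted(report.items()):
    print(phase, stats)
```

A phase that is flagged here is a hint to go and label more varied examples of it, ideally from other images. The counts say nothing about how different those examples look, so they complement rather than replace a visual check of the diversity.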