I had a question that I have been getting conflicting answers on and wanted to know what you all think:
When training an image classifier, is it important for all images to have similar orientation or should we put images in random orientation and let the machine learning figure out the pattern?
Thank you in advance
Let me give a brief answer to your question in a general context,
not specific to ImageJ or other image.sc toolkits.
If the images you will be classifying have a natural orientation – for
example, faces in passport photos or numbers in postal codes – then
randomizing their orientations will be detrimental and make the training
process harder, as you will be hiding some useful information from the classifier.
If the images you will be classifying are not naturally oriented,
even if the objects in the images can be naturally oriented – say
fish swimming in a pond, viewed from above, where a fish could
equally likely be swimming in any direction, although the head-to-tail
axis of a fish could be deemed to define its “natural” orientation – you
have two choices:
Choice 1: Use some other method, presumably simpler and cheaper,
to pre-orient the images – say, rotate all images of the
individual fish so that head-to-tail of the fish goes from top
to bottom of the image – and, of course, apply this same
preprocessing to both the images used for training and the
images to be classified. This presents a “smaller,” easier
problem to the classifier being trained because, in effect,
the preprocessing step is providing some additional useful
information to the classifier being trained.
Choice 2: Feed the images to the classifier being trained without
pre-orienting them. This presents a more difficult problem
to the classifier, but, of course, you don’t have to build this
pre-orienting system that we assumed we had in choice 1.
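As a rough sketch of what a pre-orienting step from choice 1 might look like: the moment-based angle estimate and the quarter-turn snap below are my own illustrative choices, not anything prescribed in this thread, and a real pipeline would rotate by the exact angle rather than snapping to 90°.

```python
import numpy as np

def preorient(img, mask):
    """Coarse pre-orientation sketch: if the object's principal axis is
    closer to horizontal, rotate the image a quarter turn so the axis
    runs top-to-bottom. (Illustrative only; an exact-angle rotation
    would need interpolation, e.g. scipy.ndimage.rotate.)"""
    ys, xs = np.nonzero(mask)
    xbar, ybar = xs.mean(), ys.mean()
    mu20 = ((xs - xbar) ** 2).mean()   # spread along x (columns)
    mu02 = ((ys - ybar) ** 2).mean()   # spread along y (rows)
    mu11 = ((xs - xbar) * (ys - ybar)).mean()
    # angle of the major axis relative to the x-axis
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    if abs(np.degrees(theta)) < 45:    # axis closer to horizontal
        return np.rot90(img)           # quarter turn -> axis now vertical
    return img

bar = np.zeros((9, 9))
bar[4, 1:8] = 1.0                      # a horizontal "fish"
upright = preorient(bar, bar > 0)      # now vertical
```

The same `preorient` call would be applied to both the training images and the images to be classified, as the text above stresses.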
When training classifiers, especially “deep learning” classifiers,
a constant challenge is having enough (labelled) training data.
If you’re in case 2 where you need to train your classifier to
classify un-pre-oriented images, but your training set is not as
large as you really need it to be, it is often helpful to “augment”
your training set by adding to it randomly reoriented copies of
your training images.
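As a concrete sketch of that augmentation idea (the function and parameter names are mine; quarter-turn rotations and mirroring are used here only because they need no interpolation):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(images, copies_per_image=3):
    # Add randomly reoriented copies of each training image.
    # np.rot90 quarter turns avoid interpolation artifacts; arbitrary
    # angles would need e.g. scipy.ndimage.rotate instead.
    out = list(images)
    for img in images:
        for _ in range(copies_per_image):
            copy = np.rot90(img, k=rng.integers(1, 4))  # 1-3 quarter turns
            if rng.integers(2):                         # random mirror too
                copy = np.fliplr(copy)
            out.append(copy)
    return out

imgs = [np.arange(16).reshape(4, 4) for _ in range(100)]
augmented = augment(imgs)   # 100 originals + 300 reoriented copies
```

Each reoriented copy keeps the class label of its source image.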
Thank you for your detailed response. Regarding your last paragraph, are you saying that if we are dealing with a small sample of images to be classified, then we can presumably double the size by copying and reorienting the images?
If you need rotation-invariant classification, then this holds true in the first place.
However, what is much more important is the relation between the number of required independent samples and the number of free parameters of your classifier. Because the latter can be on the order of 10^4 to 10^7, you need at least a comparable number of sample images per class …
That’s why Google, Instagram, etc. are in a favourable position.
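To put rough numbers on that, here is a back-of-the-envelope parameter count for a small, entirely hypothetical CNN (the architecture is my own illustration, not anything from this thread):

```python
# Parameter counts for a made-up small CNN: two conv layers followed by
# one dense layer. Per conv layer: k*k*c_in*c_out weights + c_out biases;
# per dense layer: n_in*n_out weights + n_out biases.
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

total = (conv_params(3, 1, 32)            # 3x3 conv, 1 -> 32 channels
         + conv_params(3, 32, 64)         # 3x3 conv, 32 -> 64 channels
         + dense_params(7 * 7 * 64, 10))  # dense layer on pooled 7x7 maps
print(total)  # 50186 -- already ~5 * 10^4 free parameters
```

Even this toy network lands at the low end of the 10^4 to 10^7 range mentioned above; modern architectures are far larger.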
If I understand your question correctly, yes, you can increase
the size of your training set by augmenting it with modified
(reoriented) copies of training-set images. I ask some clarifying
questions, and offer some comments, below.
We could probably give you more relevant and helpful answers
if you added some concrete information about what you are
trying to do. What kind of images are you trying to classify?
What classes do you have? How large and diverse are your
data sets? Can you post some sample images? And so on.
You used the “deep-learning” tag on your original post, so I
assume that you are intending to build a neural-network
classifier. Is this the case?
If so, you are (almost certainly) doing “supervised learning.”
That means you have pre-classified, “ground-truth” training
data that in your case I imagine consists of a set of images that
have been annotated with class labels. Then you have (or will
have in the future) the data you will run through your classifier
so that your automated system will determine to which classes
they belong. (Is this the correct context?)
Going back to my view-from-above pond-fish example, your
classes would then be something like “sunfish,” “catfish,” “bass,”
and so on. In one image the fish might be swimming from
right to left, in another, from bottom to top, and in a third,
diagonally toward the southeast.
Even if you are not asking your classifier to determine the direction
(orientation), it is reasonable to believe that deep in the bowels of
your black-box neural network, some “sub-detector” determines
the orientation of your fish, so as, say, to help find the head of your
fish, and uses that information to help distinguish between, say, a
catfish and a sunny.
Let’s say that your (labelled) training set consists of 20 images
each of five different species (for 100 training images total).
You could augment your training set by adding to it nine reoriented
copies of each image to get a total of 1000.
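Sketching that bookkeeping (a hedged example; the helper below uses quarter turns plus mirroring, which yield only eight distinct poses, so one of the nine copies repeats its source — arbitrary-angle rotation would avoid that):

```python
import numpy as np

def augment_with_labels(images, labels, n_copies=9):
    # Pair each reoriented copy with the label of its source image,
    # so 100 labelled images become 100 * (1 + n_copies) = 1000.
    aug_imgs, aug_labels = list(images), list(labels)
    for img, lab in zip(images, labels):
        for i in range(n_copies):
            copy = np.rot90(img, k=i % 4)   # 0-3 quarter turns
            if i >= 4:
                copy = np.fliplr(copy)      # mirrored poses
            aug_imgs.append(copy)
            aug_labels.append(lab)          # copy keeps the species label
    return aug_imgs, aug_labels

imgs = [np.zeros((8, 8)) for _ in range(100)]
labels = [i % 5 for i in range(100)]        # five "species", 20 images each
big_imgs, big_labels = augment_with_labels(imgs, labels)
```

After augmentation each species still contributes the same share of the set: 20 originals plus 180 copies per class.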
Let’s be clear about what this can and cannot do. You still only
have twenty images each to teach your classifier the difference
between, say, a catfish and a sunny. So, from this perspective,
you should think of your training set as 100 images to teach the
classifier about different fish.
But we believe that the trained classifier has somewhere inside
it an orientation detector that provides useful internal input to
the overall classifier. You really are giving this internal sub-detector
1000 images to train on.
(There are a lot of experiments in the image-classification
research literature showing that this kind of data augmentation really
does help – and that it can help quite a lot.)
Let’s say that you only have enough computing power to train
your classifier on 1000 images. You are much better off using
1000 independent images – more independent information about
the difference between a catfish and a sunny – than using 100
independent images, copied and reoriented ten times over.
But you are also better off (and many experiments demonstrate
this) using 100 independent images that are augmented to get a
1000-image training set, than training with just the un-augmented
100-image training set.
There’s a lot of alchemy in this sort of machine learning. The more
detail you share about what you are doing, the more helpful the
answers you are likely to receive. And nothing against image.sc,
but you also might get more insightful answers on some machine
learning forum. (Your topic isn’t off topic here, but you’ll generally
get the best answers here to specific questions about the tools
under the image.sc umbrella.)