Ordering of max size, scale jitter, and cropping operations

I’m trying to understand exactly how the image augmentation parameters interact with the maximum and minimum input sizes when training a DLC model.

Specifically, what is the order in which the cropping, scaling, and max size check are conducted? I have been assuming that it goes:

  1. Scale frame using “global_scale”
  2. Crop around points using “minsize” and “(bottom/top)height” & “(left/right)width”
  3. Scale the cropped frame using “scale_jitter_(lo/hi)” params
  4. Check to make sure the scaled frame size does not exceed “max_input_size” squared

However, I have been encountering issues in which training freezes partway through, with GPU usage maxed out. This makes me wonder if a too-large frame has snuck through. Does the size check perhaps occur later in the sequence?

On the basis of pose_defaultdataset.py, it seems I was wrong about the ordering:

  1. First, the max_input_size check is applied to the size the initial image would have if it were scaled.
  2. Then, the image is cropped around a random joint.
  3. Finally, the post-cropping image is scaled.

This seems potentially counterproductive, since it prevents the model from training on small crops of a larger image. Have I misunderstood how the processing is occurring?
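
To make the difference concrete, here is a minimal sketch of the two orderings that tracks image sizes only. It is not the code from pose_defaultdataset.py: the function names, the single combined scale factor, and the fixed crop size are all assumptions made for illustration, and the real crop logic around a random joint is more involved.

```python
def clamp_crop(img_h, img_w, crop_h, crop_w):
    """Crop size cannot exceed the frame (hypothetical simplification)."""
    return min(crop_h, img_h), min(crop_w, img_w)


def check_then_crop(img_h, img_w, crop_h, crop_w, scale, max_input_size):
    """Ordering I think I see: size check on the full scaled frame, then crop, then scale."""
    if (img_h * scale) * (img_w * scale) > max_input_size ** 2:
        return None  # sample rejected before any cropping happens
    h, w = clamp_crop(img_h, img_w, crop_h, crop_w)
    return round(h * scale), round(w * scale)


def crop_then_check(img_h, img_w, crop_h, crop_w, scale, max_input_size):
    """Ordering I had assumed: crop, scale, then size check on the result."""
    h, w = clamp_crop(img_h, img_w, crop_h, crop_w)
    sh, sw = round(h * scale), round(w * scale)
    if sh * sw > max_input_size ** 2:
        return None  # only an oversized final input is rejected
    return sh, sw


# A 2000 x 2000 frame, a 400 x 400 crop, no rescaling, max_input_size = 1500
rejected = check_then_crop(2000, 2000, 400, 400, 1.0, 1500)
accepted = crop_then_check(2000, 2000, 400, 400, 1.0, 1500)
```

With these made-up numbers, the first ordering rejects the frame outright even though the crop that would actually reach the network is only 400 x 400, while the second ordering accepts it.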

Yes, ideally the order should be changed. However, there is a minsize for the cropping (by default +/- 100), so rescaling ranges >64/200 should never cause problems. With which parameter settings do you get the freezing, and are you sure it’s the augmentation?

I had been seeing freezing with quite large upper bounds on scale jitter (e.g. 10x), which I was using to check the effects of very large perturbations on generalization error. I am not sure that it was the augmentation, as I am now having trouble reproducing it; when I try again I get a much more understandable OOM error.
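
For a rough sense of scale, here is some back-of-the-envelope arithmetic, assuming a 200 x 200 crop (the +/- 100 minsize mentioned above) and a scale applied after cropping. The helper names are hypothetical and nothing here calls DLC itself.

```python
def scaled_size(h, w, scale):
    """Size of an h x w crop after rescaling by `scale`."""
    return round(h * scale), round(w * scale)


def pixel_ratio(h, w, scale):
    """How many times more pixels the scaled crop has than the original crop."""
    sh, sw = scaled_size(h, w, scale)
    return (sh * sw) / (h * w)


crop_h, crop_w = 200, 200  # assumed crop from a +/- 100 minsize
for scale in (0.5, 1.0, 2.0, 10.0):
    sh, sw = scaled_size(crop_h, crop_w, scale)
    ratio = pixel_ratio(crop_h, crop_w, scale)
    print(f"scale {scale}: {sh} x {sw} pixels, {ratio:.2f}x the crop's pixel count")
```

Since the pixel count grows with the square of the scale factor, a 10x upper bound turns even the smallest crop into a 2000 x 2000 input with 100 times as many pixels, which would be consistent with the OOM error.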