Training incomplete - stopping without error message

The training seems to freeze after a while, without displaying any error message. The furthest it has reached is iteration 43000 (out of 1300000). When restarting the process, it never gets to the same iteration.

Moreover, when trying to restart the training from a previously saved snapshot, the iteration count restarts at 0 and the previously saved snapshots are overwritten.

Please see here: