I don't think that's a clear case of overfitting. You could have used a subset of the original data for training and the rest for validation, and it would have generalised pretty well to the held-out part.
It doesn't generalise when the US ship is in Chinese waters, but that's because the system was never "learning" to recognise ships in the first place. It was keying on whatever else in the data happened to track the labels.
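
To make the distinction concrete, here is a minimal sketch on synthetic data. It is not the original system, and everything in it is an assumption for illustration: the two made-up features (`ship`, a weak noisy signal, and `background`, a strong signal that only tracks the label in the collected data), the noise levels, and the plain least-squares classifier. The idea is that a random train/validation split of the collected data looks fine, while a set where the background no longer matches the label falls apart.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, background_matches_label):
    """Synthetic stand-in: label 1 = US ship, 0 = other (hypothetical setup)."""
    y = rng.integers(0, 2, size=n)
    sign = 2 * y - 1  # map {0, 1} -> {-1, +1}
    # Weak, noisy signal that genuinely depends on the ship itself.
    ship = 0.3 * sign + rng.normal(0.0, 1.0, size=n)
    # Strong signal from the surroundings; flipped when the ship is
    # photographed in the "wrong" waters.
    bg_sign = sign if background_matches_label else -sign
    background = 2.0 * bg_sign + rng.normal(0.0, 0.5, size=n)
    X = np.column_stack([ship, background, np.ones(n)])  # last column = bias
    return X, y

def fit(X, y):
    # Least-squares linear classifier on +/-1 targets.
    w, *_ = np.linalg.lstsq(X, 2.0 * y - 1.0, rcond=None)
    return w

def accuracy(w, X, y):
    return float(np.mean((X @ w > 0) == (y == 1)))

# Collected data: background always matches the label.
X, y = make_data(2000, background_matches_label=True)
X_train, y_train = X[:1500], y[:1500]
X_val, y_val = X[1500:], y[1500:]

# Shifted data: same ships, opposite waters.
X_shift, y_shift = make_data(500, background_matches_label=False)

w = fit(X_train, y_train)
val_acc = accuracy(w, X_val, y_val)
shift_acc = accuracy(w, X_shift, y_shift)

print(f"weights (ship, background, bias): {w}")
print(f"held-out validation accuracy:     {val_acc:.3f}")
print(f"accuracy with flipped background: {shift_acc:.3f}")
```

In this toy setup the validation score stays high, so the usual overfitting check passes. The failure only shows up on the flipped set, which is a problem with what the collected data lets the model learn, not with how tightly it fit the training portion.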