It's not my field, but I'd be very surprised if models were not also calibrated by extrapolating earlier known years and comparing to later known years.

You can really only judge models on data that was not yet available when they were created.
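
For what it's worth, here is a minimal sketch of what I mean by extrapolating earlier years and comparing to later ones. Everything in it is an assumption for illustration: the function names `fit_line` and `hindcast_error` are made up, the numbers are synthetic, and a straight-line fit is only a stand-in for whatever a real model does.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def hindcast_error(years, values, cutoff):
    """Fit on years <= cutoff, then score the extrapolation on years > cutoff.

    Returns the mean absolute error on the held-back later years.
    """
    train = [(y, v) for y, v in zip(years, values) if y <= cutoff]
    test = [(y, v) for y, v in zip(years, values) if y > cutoff]
    if len(train) < 2 or not test:
        raise ValueError("need at least two training years and one test year")
    a, b = fit_line([y for y, _ in train], [v for _, v in train])
    errors = [abs((a + b * y) - v) for y, v in test]
    return sum(errors) / len(errors)


# Synthetic, made-up series: an exact linear trend, purely for illustration.
years = list(range(2000, 2010))
values = [2.0 * y + 1.0 for y in years]

# The model only "sees" 2000-2005; 2006-2009 play the role of the future.
error = hindcast_error(years, values, cutoff=2005)
print(error)
```

The point of the cutoff is that the later years never touch the fit, so the error says something about extrapolation rather than about how well the curve hugs data it was tuned on. It is still weaker than a true test, because whoever builds the model has usually seen the later years too.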

