I think everyone will agree that using 4 bytes to store the size a much better idea than potentially enumerating millions of nodes just to count them, in the process completely blowing the cache and any optimizations the processors may do.
It's four goddamn bytes. Or eight if you really want more than 4 billion elements.
Most STL implementations have always done this. This is one reason why it's better to use the empty() check for an empty container instead of size() == 0 for all container types.
Also, linked lists are notoriously crap for cache coherency / processor pre-fetching and branch prediction by processors anyway, given that there's no guarantee where each node will be allocated - the only way to do that is pull them off a slab allocator or something, in which case you might as well use a vector or deque anyway...
That may make 'splice' slower. See http://home.roadrunner.com/~hinnant/On_list_size.html (I think that could be solved by having list iterators carry a field pointing to their container, but haven't given it much thought. Corrections welcome)
Having list iterators carry a field pointing to their container wouldn't help you determine the number of objects between two iterators when you splice, which is what you need in order to maintain the object count.
Storing an index in the iterator would not be a solution either. That would mean updating the index in all the existing iterators when you insert an object in the list.
Just make it the native size based on architecture. 4-byte int for 32-bit, 8-byte for 64-bit. It's impossible to store more than 4 billion elements on a 32-bit machine, you run out of memory first.
It's four goddamn bytes. Or eight if you really want more than 4 billion elements.