
Array indices (and struct/type sizes) should arguably be unsigned, so I'd say it's a lot more common than you imply.



I used to argue this, until I learned that Ada not only allows enum-indexing into arrays (handled by the compiler), but also allows non-zero-based indexing.

Example #1:

    -- You just index into this using 100 .. 200 and let the compiler handle it.
    type Offsetted_Array is array (Positive range 100 .. 200) of Integer;
Example #2:

    -- Indexing using an enumeration (it's really just a statically sized map)

    -- An enumeration.
    type c_lflag_t is (ISIG, ICANON, XCase);  -- ... etc.

    -- Create an array which maps into a single 32-bit integer.
    type Local_Flags is array (c_lflag_t) of Boolean
        with Pack, Size => 32;


Yes, Ada is pretty flexible in this regard, but I'm not sure how useful this actually is.


It's actually super useful, especially since you effectively get a statically sized map. You can also iterate over enums and step forward ('Succ) or backward ('Pred), or jump to 'First or 'Last. And you can return VLA-style arrays, which means fewer "allocate just to return" problems (GNAT uses a second per-thread stack, allocated ahead of time).
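Roughly what that looks like in compilable form (a sketch with invented names, not code from any real project):

    with Ada.Text_IO; use Ada.Text_IO;

    procedure Enum_Map_Demo is
       type Day is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
       type Hours is array (Day) of Natural;  -- a statically sized map

       Worked : constant Hours := (Sat | Sun => 0, others => 8);

       -- Unconstrained ("VLA-style") result type: no heap allocation
       -- needed just to return it.
       type Int_Array is array (Positive range <>) of Integer;

       function Squares (N : Natural) return Int_Array is
       begin
          return Result : Int_Array (1 .. N) do
             for I in Result'Range loop
                Result (I) := I * I;
             end loop;
          end return;
       end Squares;

       S : constant Int_Array := Squares (4);
    begin
       for D in Day loop  -- iterate the enumeration directly
          Put_Line (Day'Image (D) & " =>" & Natural'Image (Worked (D)));
       end loop;
       Put_Line (Day'Image (Day'Succ (Mon)));  -- TUE
       Put_Line (Day'Image (Day'Last));        -- SUN
       Put_Line (Integer'Image (S (4)));       -- 16
    end Enum_Map_Demo;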


What I meant was: how useful is non-zero-based indexing in general? The utility of indexing by enum is clear, as you say.


I've only used it a few times, but IIRC it was for contiguous ranges of grouped values (error codes coming from C code, I think) anchored to the middle of a larger range, e.g. an enum going from 0 .. N where only values 10 .. 30 formed a specific logical set and I didn't care about the rest. It was nice that Ada did all the range checks for me automatically and I didn't have to remember to subtract to compute the correct array index.
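A sketch of that pattern (hypothetical codes; the checks come for free):

    -- Only codes 10 .. 30 form the logical set we care about.
    subtype Interesting_Code is Integer range 10 .. 30;
    type Code_Description is array (Interesting_Code) of Character;

    Descriptions : Code_Description := (others => '?');
    -- Descriptions (C) for any C outside 10 .. 30 raises
    -- Constraint_Error; no subtracting 10 to get a zero-based index.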

The most common thing I've seen it used for is that most arrays (and containers) in Ada are written as 1 .. N, but if you're sharing index information with C code, you want 0 .. N-1 indexing.
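For example (a sketch, assuming Interfaces.C is with'd):

    N : constant := 16;
    -- Zero-based, so index values cross the C boundary unchanged.
    type C_View is array (0 .. N - 1) of Interfaces.C.int;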


And exactly how is silent wraparound useful or even sane for that use case? You just proved the point of the person you responded to.


Wrapping is more sensible than negative indices.


It is still dogshit though. The reasonable behaviour would be an error.


And you can raise the error if that index is actually out of bounds. I don't see why the wrapping specifically is the problem here; the only unsafety is the indexing operation itself.
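Ada's modular types make that split concrete: the arithmetic wraps by definition, but the indexing stays range-checked. A sketch:

    procedure Checked_Wrap is
       type U8 is mod 2**8;
       Data : array (U8 range 0 .. 99) of Integer := (others => 0);
       I    : U8 := 90;
    begin
       I := I + 200;   -- wraps silently: (90 + 200) mod 256 = 34
       Data (I) := 1;  -- 34 is in bounds, fine
       I := I + 100;   -- 134, outside 0 .. 99
       Data (I) := 2;  -- the indexing raises Constraint_Error
    end Checked_Wrap;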


Sure, Rust for example will let you do that (although in debug builds it will panic unless you explicitly said this is what you intended). However, from a correctness point of view, it is extremely unlikely that things[n] is doing what you intended if n wrapped.

Most likely you thought you were harmlessly increasing n; after all, it's an unsigned integer and you added something to it. But when it wrapped, adding something to it made it decrease dramatically, and you probably didn't consider that.

This can be exploited by bad guys: where you expected a value like 10 or maybe 420, they provide a huge number; you do some arithmetic with it, and the offset wraps around to the very start of your data structure. Now it's inside the bounds, but not at all where you expected.
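Sketched with Ada's modular arithmetic standing in for C's unsigned (values invented): the wrapped index lands back in bounds, so no bounds check can catch it; it just points at the wrong place.

    procedure Wrapped_In_Bounds is
       type U32 is mod 2**32;
       Buffer : array (U32 range 0 .. 1023) of Character :=
          (others => ' ');
       Base   : constant U32 := 512;
       Evil   : constant U32 := U32'Last - 511;  -- attacker-supplied size
       Index  : constant U32 := Base + Evil;     -- wraps to 0
    begin
       -- In bounds, so no Constraint_Error, yet we're writing at the
       -- start of the structure, nowhere near where Base plus a small
       -- size was supposed to land.
       Buffer (Index) := 'X';
    end Wrapped_In_Bounds;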

This is why people talk about "hoping you get a segfault" in languages like C++: the alternative is much worse.

If you need to care about this (fiddling with files somebody provided, e.g. by uploading or emailing them to you, is an obvious place this comes up in web services), you should use WUFFS for it. You can't make this mistake in WUFFS.


I agree that domain-specific ranged types as found in Ada are close to ideal. Unbounded integers or naturals are second best. Wrapping and checked arithmetic are distant thirds, but I don't think either is intrinsically superior to the other in terms of safety. It depends on the program's specific design IMO, but if we're talking about a C-like language where checked arithmetic is not common, I still think it's clear that indexing should be unsigned. Not the approach I'd take in a new language of course.
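The kind of domain-specific ranged type meant here is a one-liner in Ada:

    subtype Percent is Integer range 0 .. 100;
    P : Percent := 50;
    -- P := P + 60; would raise Constraint_Error on the assignment:
    -- the failure fires at the violation, not at some later indexing.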

The pointer arithmetic you describe is the real source of most unsafety. The reason most C/C++ programmers prefer segfaults is that such arithmetic lacks bounds checking.

Thanks for the reference to WUFFS though, looks cool.



