I don't quite understand the argument "0-based being easier for pointer arithmetic is nonsense because the language doesn't have pointers".
Whether or not the language presents the concept of "pointer" to the user is independent of whether or not it uses pointers internally. And if it exposes arrays as a concept, it has to implement them somehow.
The simplest possible implementation of arrays is having a start address and putting all elements next to each other in RAM. To get the address of a particular item, this layout naturally leads to the formula "base address + index * element size", with "index" being 0-based. If you want to expose other indexing schemes in your language, you'll have to add more logic to convert the user-visible index back to 0-based before you can obtain the address.
All of this is completely independent of the fact whether your language exposes pointers to the user or not.
Even the yacht story sort of hints at this:
> To keep people from having their jobs cut short by yacht-racing, Richards designed the language to compile as fast as possible. One optimization was setting arrays to start at 0.
If all indexing schemes were equal, why would this change even be an optimisation in the first place?
The "language" that has a 0-based indexing scheme is assembly. An HLL with 1-based counting that compiles to assembly will introduce additional computational overhead for the translation of the index.
If 1-based indexing were used in assembly, then "mov 1(%ebx),%eax" would be the equivalent of "mov (%ebx),%eax".
Another option for users is to "sacrifice" the 0th entry of an array. Depending on the size (as in sizeof) of the entry, it can be worth it.
A benefit is that you can then use 0 as a sentinel value; for instance, if you have a find() routine that can fail, it can just return 0 instead of e.g. -1 (which can introduce minor issues).
In my experience, though, I am so used to 0-based indexing that switching schemes can cause stupid off-by-one bugs. I guess that's the main reason behind complaints about Lua. It's not that "natural" arrays are thought of as bad, but mixing both schemes (often C and Lua) is error-prone.
It's basically only x86 among modern ISAs that lets you do base + literal + register * literal; aarch64 only gives you base + register shifted by a literal, and I believe RISC-V is similar: https://gcc.godbolt.org/z/dz768768z
Out-of-order execution doesn't necessarily save you if you end up with a hard dependency on the value of the read (and even if it did, the little cores on ARM SoCs are in-order). This is a pretty obvious candidate for macro-op fusion, but I'm not sure whether this actually happens (and if it happens on ARM little cores, etc.)
If it is not on the hot path, it is likely free, but not guaranteed. If it is on the hot path then it is wasting a whole cycle. And of course in highly ALU-dependent code it is another instruction, so a fraction of a clock.
What do you mean, it wastes a whole cycle? It may indeed have worse performance due to blowing the instruction cache, but I don't see why out-of-order execution would be slower on the hot path. I doubt there would be many hot paths without any dependence on memory fetches outside specific benchmarks; the memory loads will take significantly more time even if they hit cache.
What would memset(arr, 0, sizeof(arr)) do in that case? I can see myriad problems with having arr itself point outside the memory range allocated to arr[min..max].
For most cases you could get away with some index transformation logic in the compiler (but probably your language would need special types for indices, so I do agree with you that it is not worth the trouble).
> An HLL with 1-based counting that compiles to assembly will introduce additional computational overhead for the translation of the index
Not necessarily so for translation of an index that is evaluated at run time. PL/I (back in the 1960s) compiled arrays into 'dope vectors'. Each array's dope vector included the highest and lowest valid subscripts for each dimension of the array and the address where the element with all-zero subscripts would be found, if it existed.
Not sure if assembly/machine code is that relevant. If one-based indexing were more prevalent, the LEA instruction on x86 would just subtract one during execution.
u/blutomcat's point clearly (I think) was that instruction sets didn't do what you suggest, so zero-based index is (was and still is) what you'd do if you wrote in assembly (which was not uncommon!), and on any given platform, assembly was the main language 50-60 years ago. It follows that it's easier to carry that over to higher level programming languages.
Whether that's what actually happened in the cases of HLLs that adopted zero-based arrays or not, I don't know. But today, to me, zero-based array indexing feels very natural, and the idea that zero-based arrays being simpler in assembly carrying over to HLLs seems at the very least plausible.
If you did this, you'd subtract one once at array creation. Or subtract whatever the offset is (really, you'd subtract 1 * data type size). Under the hood, using C syntax instead of assembly:
int foo[n]; // what the user wrote
// what happens behind the scenes, after a fashion
int* foo = malloc(n * sizeof(int)); // or sp - n * sizeof(int) if stack allocated; subtraction since stacks usually grow "down"
foo = foo - 1;
All references into foo will now work just fine so long as they are within the [1,n] range (same issues as with 0-based there since C doesn't carry size information for checking array bounds access). This adds one extra instruction per allocation (which includes allocation on the stack) for all non-0 offsets, but then all access will have the same cost whether 0-based, 1-based, or arbitrary-based. That's a non-zero cost, but it's not exorbitant since you'll be accessing much more often than allocating (and if it's reversed, something weird is happening).
Since as you say, C doesn't carry size information, you would actually need to do it each time a pointer is created from an object. In C you can have arrays of arrays (not pointers) or arrays in structs, where there is no pointer on creation:
// 11 arrays are created here
int foo[10][10];
It doesn't make sense to create pointers for the 10 inner arrays, so the subtraction would presumably happen when referring to each inner array.
In any case, this is more work, both for the computer and for the human. It's analogous to trying to figure out "which year of which century is X in?" (2022 is the 22nd year of the 21st century):
year(x) = ((x - 1) % 100) + 1
century(x) = floor((x - 1) / 100) + 1
The above formulae work for the 1-based convention, because they do the necessary 1 subtraction/addition in the right places. If we instead counted from 0, things would be a lot simpler (2021 would be year 21 of century 20):
year(x) = x%100
century(x) = floor(x/100)
Unfortunately, the Romans started counting years before anyone knew about 0. I think this is basically why we have a convention of counting from 1: 0 simply wasn't discovered until relatively recently in human history. Earlier number systems such as Greek/Roman numerals don't have a way of representing it.
Indeed. End exclusion is advantageous because we don't need to add or subtract 1 when constructing or interpreting ranges. Forgetting to add or subtract 1 is basically why off-by-one errors exist (and doing add rather than subtract or vice versa is probably why off-by-two errors exist).
`a..(a+n)` is an n-length range, not an (n + 1)-length range (oh look, an "add 1" operation!).
And an empty range is denoted by `a..a`, not `a..(a-1)` (oh look, a "subtract 1" operation!).
[0, 1, 2] would be how I would refer to my 3 apples. Each apple is identified by the number of apples that precede it.
I don't think it's inherently more natural to start from 1, just conventional. Disregarding history/convention, I think it would be more natural to use the lowest available natural number.
Back in Roman times, the lowest natural number that people were aware of was "1", so obviously they started counting from that number.
Our understanding of and use of mathematics has evolved since then, and accordingly there are fields such as computer science and combinatorics where there are clear advantages to starting from the smallest number (zero). In virtually all other cases, the reason "1" seems more natural is because that's the way it's been done historically.
It seems that when labelling those apples using a 1-based count, the logic is basically: each apple is identified by the number of apples that precede it, plus 1. The reason for the "plus 1" is that that was your starting number, but it could have easily been 2 or 3. If you instead start from 0, you can omit the "plus X" logic, just as I omitted the "+ 1" and "- 1" logic in my year/century formulae when moving from 1-based to 0-based counting.
The Romans weren't the only people who knew how to count back then... And how many of the other (presumably arbitrary) choices for smallest number were zero? And why did it take over one thousand years to come up with ordinal "zeroth" after we knew about cardinal "zero"?
> It seems that when labeling ... number ... that precede it, plus one.
I don't think so. It's the number that you have counted once you've counted that one.
How do you write zero in Roman numerals? Answer: you can't. Even though they used their number system for adding amounts of money, they hadn't figured out "cardinal zero" yet.
It was introduced into western mathematics through Fibonacci in the 1200s at the same time that Hindu-Arabic numerals were adopted, which use "0" as a placeholder (compare this to earlier Greek numerals which work similarly to the system we use today but without placeholders and using different sets of symbols for the different places—and of course no way of representing zero).
That's what I'm talking about (though I guess the "thousand years" estimate was slightly high). Why did it take hundreds of years after the introduction of zero to get "the zeroth element of a sequence", which is quite a new usage, and even now is confined to specialized fields?
The well known “Numerical Recipes in C 2nd ed.” talks about how the first edition promoted exactly this - offset the pointer after allocation and you can use any indexing origin you want. And they promoted 1-indexing and wrote all the algorithms as such, before switching everything to 0-indexing for the 2nd edition.
I agree with you the subtraction instruction is not exorbitant, especially considering malloc is much more expensive and always has been.
Integer add/subtract never really took a long time. Slower than today’s sub-nanosecond, and it depends on the computer, but there were CPUs in the mid 1960s doing millions of instructions per second. They weren’t crazy slow, they were just crazy expensive.
A million, perhaps. But you're talking about adding one instruction to every single data access. Plus your code bloats up with all those extra dec instructions. BCPL and C were created for departmental and lab computers like the PDP-9.
Note, for example that the mighty 6502 succeeded in the market because its designers had heard from customers that the $100 price for the 6800 was too much, so they removed some instructions and addressing modes to get the die small enough that it could be sold for $20.
And even if it were an instruction for every access, memory access is much more expensive than an add (at least nowadays -- I don't know if that's always been true).
Similarly, I like to think of it as: the add/subtract instruction is very small compared to malloc (and maybe could be included in a custom allocator for free). So the one extra instruction will always be somewhere between relatively minor and outright negligible.
Yeah, today memory access on a cache miss is really expensive. I think memory was normally running at a different clock rate than the CPU even 50-60 years ago, so it has always been something where you can't touch memory every instruction. The latency of a cache miss wasn't nearly as bad in the past. When the data cache was invented, a main memory access was like 4 times slower than register access. Today, main memory access can be more than a hundred times slower, sometimes approaching a thousand times slower, which is why we always have multi-level caches now. And memory latency is still getting worse. If someone figured out how to make memory faster without increasing the cost or the energy consumption, they'd be rich! ;)
The "differently wired circuit" would be an extra stage of logic computing a carry all the way from lsb to msb (an ALU outside the ALU?) and would contribute a fair bit of extra time. Easier to just use the ALU to do it, which means inserting an extra instruction, also a time waster.
If you're already doing base-index addressing (e.g., [SI + DI]), you're already using the ALU to compute your memory address, and it's not _that_ much extra wiring to have the ALU support either a constant power-of-two displacement or even a three-way addition; a constant displacement for a basic ripple adder is an XOR and an AND NOT, e.g. The x86 family does three-way addition (and a shift) to support "[ESI + EBX * scale + displacement]".
It's not just the extra wiring, but the extra time; each additional computation layer in an integrated circuit introduces delays that would make the operation noticeably slower.
Well, maybe, if the specific use case of 1-based arrays was all that got supported.
It'd be a pretty huge waste of processor design space to do that, though: if an HLL wants to support n-based arrays for n != 0 it can just store the base pointer with the initial index offset already applied.
Early 8-bit processors had a pretty poor set of addressing modes: you couldn't even expect [R1 + R2] let alone [R1 + R2 * scale + displacement], so random access into an array would be a multi-instruction task. By the time of the 80386 base scaled index addressing with displacement was in the CPU core, but it's not _free_ - there's an additional byte in the instruction encoding for the displacement, so even if the CPU computes the displacement with no additional clock cycles you'll have a latency cost for fetching the byte.
Defining index as the element-wise offset is a precise, well-formed definition/name.
It's also a definition commonly used in math, though not always. It depends a lot on the area of mathematics and even the cultural context.
The only reason we sometimes feel 0-indexing is wrong, IMHO, is that the English language describes entities in a sequence as 1st, 2nd, 3rd, etc.
But I wouldn't be surprised if there is some human language somewhere which does it differently. Like something which roughly maps to English as "head, head+1, head+2, ...".
Depends which country your building is in, and more!
My apartment building is 1,2… but my mall is G,M,1,2… (same country different architects). For many years I lived in Europe where it’s usually G,1,2… (where G can also be E or Fsz or whatever) and when I was in the US I had to remember that 1 is G, though L,2,3… is also common.
City people live with index ambiguity all the time and our brains manage just fine.
Actually, "rés-do-chão" would be the more common term. Still... in Portugal, the "first floor" is implicitly never the ground floor (but the floor above it), so 0-based indexing is respected (as you suggest).
On elevators you don’t see “R/C” unless the elevator is very old, and the same for office building directories or shopping centres. In almost all cases it’s the number zero. OTOH, in countries like Germany you will see the “E” for Erdgeschoss everywhere.
> Which floor is the first floor in your building?
0.5 ... yep that won't work
In some buildings built around 1900 where I live, the "ground" floor is not on the ground but around half a floor above it, while the cellar is only half below ground. Because of this, the cellar has above-ground windows, allowing legal cellar apartments (oversimplified: in Germany, apartments require windows which are above ground).
This creates a lot of confusion.
By convention this 0.5 floor still counts as ground floor with the floor number 0. But people confuse it all the time for the first floor. Especially if in some cases you do have a few utility rooms exactly at ground floor.
Now things get worse if you live in a cellar apartment in such house: By convention floor -1.
Because nearly everyone will think you mistyped, as normally you legally can't have apartments below the ground floor.
Another fun index ambiguity is for apartments built on a mountainside/cliff.
In this case, depending on which side of the house you look at, the ground floor might be at, above, or below ground level. Often the floor level with the main entrance counts as the ground floor, but sometimes that is not the case, and it is e.g. the lowest floor not below ground, or the highest floor level with the ground.
So yes, index/floor-level ambiguity is everywhere, and much worse than just whether the ground floor is 0 or 1 ;)
Not only that but which floor is the 13th (in the West) or 14th (in China)? Often those numbers are skipped because "luck", "feng shui", or other superstitious nonsense.
Do you count the machine floors in high rise buildings?
In AU, the ground floor is effectively "zero", the first floor is the one above ground. Depending on the design of the building, there might be a "mezzanine" floor between ground and 1st.
Basements usually follow the same pattern, so B1 is the first floor below ground etc.
In Belgium (the French-speaking part of the country, like in France) we call the ground floor (where you enter the building) "rez-de-chaussée" (or "rez" for short), which is the equivalent of 0, and then the first floor, numbered 1, is on top of it and you use an elevator or stairs to reach it.
I've just come back from the vacation in the US, and my experience was: when the hotel has G or L for the ground floor, it still has 2 for the floor directly above it. "G or L followed by 1" never happened to me.
It's the consistent inconsistencies that cause it to be less confusing. It would be really weird to go from someplace very orderly to another that is haphazard.
> Defining index as the element wise offset is a precise well formed definition/name.
But it's less precise than offset, and obviously introduces confusion. Whoever originated the term "0-indexed" should have just suggested a better word rather than keeping the inaccurate one (proven by the fact that you're currently looking for a way to remove some confusion around the currently chosen word) and prefixing it with a digit, which is itself a confusing thing to do to an English word, thus adding to the confusion while subtracting none.
You can say the same for 1-indexed, because both existed from the get-go.
Also, it's not less precise, as there are many different kinds of offsets too. First you have element-wise, byte-wise, and similar offsets; then, offsets are inherently relative to something, and that's not always the first element. It could be an offset from the back, and I have even seen an offset from one element before the first in some very unusual use-cases.
That so many people in this thread argue about the higher-level language (missing the point) shows that few people have ever found their way down to the underlying machine code.
Which is sad, because all code is still executed as machine instructions, even when the developer does not see it or does not want to care.
We're 20 years past the point where you can expect everyone in tech to trace every instruction down to machine instructions. Higher level languages abstract away the need for it, and for the most part, we can rely on the authors of those languages to make many of the decisions that impact performance. The rest of us learn about these details on posts like this. It's not a sad fact, in fact it's probably one of the most useful things that's happened in tech. It frees people to think about other problems.
I barely touch assembly in my day to day work, but I do understand on a fairly deep level how a computer works, which I feel is extremely important and very frequently influences how I write high-level code.
Certainly one can bang out code their entire career without ever having a clue how machine code works, but I really wouldn't advise it. At worst it leads to total ignorance, and at best you accumulate a disconnected set of "best practices" as inscrutable lore handed down from on high.
When I did web dev, it was more important to know how the WEB BROWSER works, not how machine code works.
It's not useful to know how the browser allocates memory or garbage-collects it; it's more useful to know that it has WebSockets. Don't get the idea that things you know are relevant to other people.
You go as deep as you need to. My day job involves writing tons of Javascript and there is zero need for me to dive into the underlying processes until something goes wrong. So far the lowest I’ve ever needed to go was the node source.
It only stays relevant if the JavaScript is too slow. In the case of removing two elements from an array of length 8, it's not too slow to remove them one by one since, again, it runs in microseconds
If you're not reasoning about how your code translates to logic gates, you're not a competent programmer. /s
Abstractions are good, and while we should be careful not to completely trust a brand new abstraction, high level languages are established enough that we don't need to worry about it.
Not sure if you were around 20 years ago, but you sure couldn't expect everyone in tech to understand machine instructions as well as the layers above, all the way up to corporate politics.
If this were the primary reason, it seems odd that some of the earliest popular languages used 1-based indexing in an era with much slower CPUs (<= 1 MIPS) and less available memory to store instruction code. If the authors of FORTRAN were comfortable with 1-indexing on the IBM 704 (40 KIPS) in 1957, it couldn't have been prohibitive. At the same time, as computer use cases became more interactive there may have been a greater focus on performance, but it is likely that the semantics of the high-level language were at least as important.
“machine code” isn't really the zero point that's special nowadays.
The code the programmer wrote is compiled to something such as LLVM IR, LLVM IR is further compiled by LLVM to assembly, this is further compiled by an assembler into machine code, and then the c.p.u. further compiles this to its internal code as it executes it. “machine code” really is no more special in this chain of events than, say, LLVM IR.
> ... then the c.p.u. further compiles this to its internal code as it executes it. “machine code” really is no more special in this chain of events than, say, LLVM IR.
I think this is the core point. Even though you are mechanically feeding machine code into the CPU, modern CPUs deeply transform your program before (or even while) running it, with potentially deep performance implications. With modern CPUs, the only choice we truly have is which compiler we trust to optimize our program.
It's slightly different - machine code is special because it's a visible API between our programming work and the CPU silicon. Microcode is a private implementation detail which you usually can't affect as a software developer. Machine code is something you are directly creating by writing things, though.
Except if the CPU is microcoded, those instructions might not mean what you think they do, and you need something like Intel's VTune to actually understand what the CPU is doing with them.
It has, and if you manage to know everything on those pages, for every single processor on the market, while taking into consideration motherboard architectures, and OS workloads you're our hero.
We are way beyond MS-DOS days and Michael Abrash's books.
It was just the most basic example. Indices are connected with math all the time. Consider implementing circular queues with 1-based arrays. Any arguments about what's _possible_ miss the point. Arguments about what _feels_ natural purport to know the human mind, which I find generally suspect. Mostly I think of computers as tools and am interested in getting the most functionality out as smoothly as possible, not as slaves whose job it is to make my life worry-free.
> To get the address of a particular item, this layout naturally leads to the formula "base address + index * element size", with "index" being 0-based. If you want to expose other indexing schemes in your language, you'll have to add more logic to convert the user-visible index back to 0-based before you can obtain the address.
The easy way to do that is by shifting the base address. In pseudo-C (I think that computing the b pointer is non-conforming, even if the code never tries to access b[0], but compilers can do this without problems):
int a[100]; // array of 100 integers, zero-based
int * b = a - 1; // array of 100 integers, one-based
Things only get costly when you want to check array bounds or when you have multi-dimensional arrays. There also may be non-standard architectures where this kind of stuff isn’t possible.
In this case, b is a pointer that does not point to a valid address. You can't memset(b, 0, 100), or do any of the other regular pointer things with b.
It feels like you're introducing a million edge cases.
This is undefined behaviour, as a pointer is only allowed to point to valid addresses (as well as the index of the last element + 1, but it can't be dereferenced).
I think, in theory, it would work regardless of the starting address. As long as you don't try to access the invalid address (which you wouldn't assuming that it's starting in the index 1, you would always be accessing the first valid address)
“In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, _provided_they_exist”
[…]
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; _otherwise,_the_behavior_is_undefined.
In this case, the minus-one-th element doesn’t exist, so the expression
int * b = a - 1;
triggers undefined behavior.
I think some compilers use this in practice to produce faster code (that, often, will not do what the programmer expects it to do). Start reading at https://stackoverflow.com/questions/56360316/c-standard-rega... if you’re sure they don’t. I expect that will change your opinion.
That is a description of the commonly agreed upon definition of the C abstract machine and language semantics. You could simply define the language another way with regards to this behaviour.
Not that I would want to do it, I think zero-based addressing is not very taxing for the convenience of being closer to how we think of memory addressing.
Amusing but probably irrelevant: the sailing yacht racing handicapping calculations mentioned are almost certainly under the Performance Handicap Racing Fleet (PHRF) system[0]. This system is "zero based" in its own way. A sailboat's performance is assessed and handicapped against a standard yacht with a zero rating. Your time to complete the race course is adjusted by your handicap to determine your final standing. This lets you race slower and faster types of boats in a "fair" way.
Implementation matters for performance, but even beyond that, the interface matters for users. 0-based offsets are convenient for users doing math on indexes.
A pointer is just one kind of array-like indexing scheme. Good pointery languages will distinguish Address from Offset from Integer.
I like to treat the naturals as starting at zero, but not for any mathematical reason. I just think that since it's easy to write ℤ+, it's more useful for ℕ to refer to something slightly different than it would be for ℕ to be a synonym for ℤ+ while anyone wanting to include zero in their set was forced to spell out "nonnegative integers".
A dichotomy between "ℤ+" on the one hand versus "ℕ" on the other is just plain more convenient than a dichotomy between "ℤ+" and "ℕ" on the one hand versus "ℤ\ℤ-" on the other.
It is because you mean two different things by "natural numbers", even though the same words are used for things that are not the same.
I prefer to start at zero; I think it is more useful in general, and makes more sense mathematically for many (although not all) purposes, and in many systems you will find objects and operations that have these properties too.
If there were any justice in the world, they'd start at 2, as a handicap. They might even make it out of the bottom of the NL East if that were the case.
Other important day-to-day tools like rulers, clocks, and speedometers also start at zero. So it's not exactly "bending humans to microprocessors' weird alien ways".
It all makes perfect sense in the context of measuring.
The analogy is slightly off though. The first position of an array occupies 1 unit of memory. 0 is not a possible measurement in the world of arrays.
Maybe a null terminated string could have length 0 if you don't count the terminator. But that 0 is a property of the "string" abstraction. The actual "array" would still be length 1.
> An array of length 0 starts at 0 and ends with 0
Nope. The moment you "start" somewhere you occupy 1 unit of memory. Thus no longer an "array of length 0".
There is no such thing as an array of length 0. It absolutely does not exist. You cannot write source code to represent it.
i.e. assuming we are talking about the pure data structure of an array. Some languages may have some abstraction built on top of arrays (e.g. C strings) that can have length 0, but these are not "arrays", and if they are defined they still occupy 1 unit of memory for the terminator.
C++ requires every value to take at least one byte of memory, but this isn't true of all languages. Additionally, C++20 adds [[no_unique_address]], which allows zero-sized types.
iopq points out Rust as a language that does allow zero length arrays in a sibling comment.
While ISO C does not allow zero length arrays, GNU C does. There’s also flexible array members, which can be length 0.
Sure, a language can define some abstraction that says "hey, I'm a length-0 array!". But this is more like a statement of "hey, I don't exist yet, but if I do exist in the future I'll be of type array and at least length 1".
The programming languages that I use most often can all represent empty arrays just fine. Of course, indexing into them is an error. But defining them is commonplace, and not being able to define them would make a lot of my code a lot more complicated.
With those other things zero is zero, but with arrays the zero index is the first item.
The disconnect really starts when you get the size of an array and then have to subtract one from it to get the last index.
To take your ruler example and apply array logic to it: say you are measuring a 2cm thing. If the ruler followed the index, it would show 1cm (the last index), and you would have to add 1 to get the size/length.
Having the index not align with the size is a pain that results in a lot of +/-1 code that a compiler could just as easily have handled.
It's a good idea to call the 0th element of an array the zeroth element, else you encourage off-by-one errors. The size of the array, the "count", is that which your index must be less than.
I'm not saying it has to be this way and it can't start with 1, I'm saying what you are pointing out as a flaw is actually the mixing of systems.
in computer programming teams it's best to adopt a standard unambiguous language to communicate to one another so as not have to constantly say "do you mean...?"
I'm not arguing that the natural numbers should start from one, rather that the usual path for developing the natural numbers starting with set theory is that zero is the size of a set with nothing in it. In maths they always talk about the first element in a vector, not the zeroth. It is a bad argument to point to mathematics for using zero-based indices. It is not at all common to use zero-based indexing there.
I'm a mathematician, and 0-based indexing makes way more sense to me.
Dijkstra's argument (https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF) is that we count the number of predecessors. This fits well with the mathematical usage, at least among mathematicians who care to dig into the order-theoretic foundations: in the von Neumann construction of ordinals (and of cardinals as least ordinals of a given cardinality), each finite ordinal is equal to its cardinality: 0 = ∅ is a set with 0 elements, 1 = {0} is a set with 1 element, 2 = {0, 1} is a set with 2 elements, etc.
I'm a mathematician too, and it doesn't make sense to me. It seems somewhat arbitrary depending on the application, i.e. I think it is a type problem. We use an Int to mean something special about array elements. I'd be happier with first() and last() methods really, I don't care for coming up with meanings based on implementation details.
> I'm a mathematician too, and it doesn't make sense to me.
Just to be clear, I wasn't making some sort of bizarre appeal to authority; whether something makes sense is a matter of taste, not of mathematics, and can't be proven or dis-. I meant only to respond to:
> In maths they always talk about the first element in a vector, not the zeroth. It is a bad argument to point to mathematics for using zero based indices.
I don't know for sure which convention is more common, though I suspect you're right that it's the 1-based convention, but I do know that it's false that 'they'—meaning, among others, we!—always talk about the first element/component in a vector. I probably will do that in a first Linear Algebra class, because the textbook often does and I don't want to introduce unnecessary confusion; but usually in my own work, and sometimes when I teach upper-level math classes, I use 0-based indexing when it's necessary to choose.
It's not a convincing argument because you index into an array if you want to retrieve an element contained in the array. If 0 is the cardinality of an empty collection, it's not a valid index, because you can only index into non-empty collections.
I think an age-based way to phrase it: in your 1st year, your age is 0; in your 2nd year, your age is 1; and so on. We can assign people numbers indicating what year of their lives they're in, or how many years they have lived, and both are fine, but we've settled on the latter.
That's not an argument. It's a coincidence. There are applications where what you care about is the number of predecessors (indeed, that's what the compiler cares about, which is why we have 0-indexing in the first place), but they are a tiny minority of all indexing.
> I think an age-based way to phrase it: in your 1st year, your age is 0; in your 2nd year, your age is 1; and so on.
But that isn't even true. No one ever reports the age of their new child as 0; instead, they will report a positive number of months, or -- if it's an extremely new child -- of weeks or days.
> That's not an argument. It's a coincidence. There are applications where what you care about is the number of predecessors (indeed, that's what the compiler cares about, which is why we have 0-indexing in the first place), but they are a tiny minority of all indexing.
I think there's nothing to say to the first two sentences but that I regard it as an argument that may be more or less convincing. I don't know exactly what it means for something to be an argument vs. a coincidence; it is a coincidence that, say, my name is what it is, but it is nonetheless correct for me to argue that that is my name.
These are all conventions anyway, and there is not much use (or even meaning) in arguing about which one is the right or wrong convention, just which one makes more or less sense; and this is one way to make sense of 0-based indexing, though of course there are also ways to make sense of 1-based indexing.
I'm not sure I buy that these cases are a tiny minority, but I'm certainly in no position to produce any data to the contrary.
> > I think an age-based way to phrase it: in your 1st year, your age is 0; in your 2nd year, your age is 1; and so on.
> But that isn't even true. No one ever reports the age of their new child as 0; instead, they will report a positive number of months, or -- if it's an extremely new child -- of weeks or days.
I can believe that ages aren't reported that way, although I think that a child who will be 1 year in 1 year should logically be said to be 0 years; but we can avoid that debate by considering future years: in, to pick the example that applies to me, my 42nd year of life, I am 41 (I could say 41 and 1 month, but I don't—in fact, there's a Seinfeld joke about that). Similarly, the 42nd entry in a 0-indexed array is indexed 41. It doesn't have to be that way, but I don't think one can argue that there's anything logically amiss about it (nor about 1-based indexing … but we do have to pick one).
> I think there's nothing to say to the first two sentences but that I regard it as an argument that may be more or less convincing. I don't know exactly what it means for something to be an argument vs. a coincidence; it is a coincidence that, say, my name is what it is, but it is nonetheless correct for me to argue that that is my name.
Sure. But we're talking about whether to index arrays from 0 or 1. It is true that naming an array element after the quantity of its predecessors will tell you the number of predecessors the element has. But that's not an argument for why you should do it; there would need to be some kind of benefit to having that information. Without a benefit, it's just something that happens to be true.
That's the difference between an argument and a coincidence.
I learned them in elementary school (in the US) as starting at 1. Indeed, Google's first result for natural number is from Oxford Languages, and defines natural number as:
the positive integers (whole numbers) 1, 2, 3, etc., and sometimes zero as well.
Waste the 0th element, or reuse it for something like the length (hello, Pascal strings). Another option is using base_address - element_size as your array value. Another option is adding element_size to all array accesses; assembly languages usually have addressing modes for this. There are many options to use 1-based indexing without sacrificing performance.
At least x86_64 can put something like `ptr[index * 4 + 4]` in one instruction (assuming that variable values are in registers). Not completely sure about ARM.
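A standards-clean sketch of the "waste the 0th element" option in C (the base_address - element_size variant is technically undefined behaviour in ISO C, so this sketch over-allocates by one slot instead; the names set1/get1 and the fixed capacity N are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* 1-based indexing without per-access subtraction: allocate one
   extra slot and simply never use index 0. */
enum { N = 5 };
static int data[N + 1];   /* usable slots are data[1] .. data[N] */

void set1(size_t i, int v) { data[i] = v; }   /* i in 1..N */
int  get1(size_t i)        { return data[i]; }
```

The cost is one wasted element per array, not one extra instruction per access.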
Newer Pascal strings are #0-terminated; original Pascal strings were 255 chars max, with string[0] holding the string length. They were also not Unicode-friendly. Newer Pascal compilers still support this as the shortstring type. I use Pascal-style strings in an embedded C library I wrote in 1989; they're much more efficient on 8/16-bit systems.
I always assumed it started at zero; otherwise you don't use the full range of an unsigned integer, and you can only have an odd number as your maximum capacity (for whatever you use indexing for, not just RAM addressing).
Other advantages of zero based indexing, beyond being 'closer to the machine':
It works better with the modulo operator: `array[i%length]` vs `array[(i+length-1)%length+1]`. Or you would have to define a modulo-like operator that maps ℕ to [1..n].
It works better if you have a multi-dimensional index, for example the pixels in an image. With 0-based indexing, pixel `(x,y)` is at `array[x + width*y]`. With 1-based indexing it is at `array[x + width*(y-1)]`. You might argue that programming languages should support multi-dimensional arrays natively, but you still need operations like resizing, views, etc.
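Both points can be sketched in C (the function names are made up for illustration; the 1-based variants just encode the shifted formulas from the comment):

```c
#include <assert.h>
#include <stddef.h>

/* Wrap-around (ring) indexing: trivial with 0-based indices,
   needs the shifted formula with 1-based ones. */
size_t wrap0(size_t i, size_t length) { return i % length; }
size_t wrap1(size_t i, size_t length) { return (i + length - 1) % length + 1; }

/* Row-major pixel addressing in a width-by-height image. */
size_t pixel0(size_t x, size_t y, size_t width) { return x + width * y; }
/* 1-based: coordinates and the resulting index all start at 1. */
size_t pixel1(size_t x, size_t y, size_t width) { return x + width * (y - 1); }
```

In both cases the 1-based form carries an extra +1/-1 pair through the arithmetic that the 0-based form simply doesn't need.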
Another advantage is with ranges: 0-based indexing and exclusive ranges work well. This is apparent with cursor position in text selection
Consider:
  Characters      h  e  l  l  o
  Cursor index   0  1  2  3  4  5
  Char index      0  1  2  3  4
  Range [0,3) -> [0,1,2]
  Range [2,5) -> [2,3,4]
  Range [1,1) -> []
If we used 1-based indexing and exclusive ranges, it leads to ranges where the end index is greater than the string's length...
  Characters      h  e  l  l  o
  Cursor index   0  1  2  3  4  5
  Char index      1  2  3  4  5
  Range [1,4)     -> [1,2,3]
  Range [3,6) (!) -> [3,4,5]
  Range [2,2)     -> []
but if we use inclusive ranges, it leads to ranges where the end index is less than the start index...
  Characters      h  e  l  l  o
  Cursor index   0  1  2  3  4  5
  Char index      1  2  3  4  5
  Range [1,3]     -> [1,2,3]
  Range [3,5]     -> [3,4,5]
  Range [2,1] (!) -> []
Also:
  Characters      h  e  l  l  o
  Cursor index   0  1  2  3  4  5
  0-based range [0,3) -> [0,1,2]
  1-based range [1,4) -> [1,2,3]
For the 0-based range [0,3), the left bracket sits at cursor index 0 and the right bracket at cursor index 3. With 1-based indexing it doesn't work like that, because the range is [1,4).
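A small C sketch of why half-open ranges compose so well with 0-based cursors (slice and its static buffer are made up for illustration): the length is simply end - start, [i,i) is empty, and adjacent ranges share an endpoint without overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the half-open character range [start, end) of src into a
   static buffer and return it. start and end are cursor positions,
   i.e. values in 0 .. strlen(src). */
static char slice_buf[64];

const char *slice(const char *src, size_t start, size_t end) {
    size_t len = end - start;   /* the length falls out directly */
    memcpy(slice_buf, src + start, len);
    slice_buf[len] = '\0';
    return slice_buf;
}
```

With 1-based indexing, either the end cursor exceeds the string length (exclusive) or the empty range needs end < start (inclusive), exactly as the tables above show.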
This is the bane of my existence working with Lua.
Iterating an array or adding to the end are fine, we have ipairs and insert for that, but ranges on strings I'm constantly having to think harder and write more code than necessary.
I love the language, wouldn't trade it for another, but the 1-based indexing on strings, which represents an empty string at position 3 as (3,2), is egregious.
Not as egregious as a dynamic language where 0 is false though.
Yeah, offsets are just easier to mathematically manipulate than ordinals. It's not just pointer arithmetic where it matters that item i corresponds to start+i*step. Any time you want to convert between integer indices and general linearly-spaced values, 0-based indexing is more convenient.
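For instance (a sketch; value_at and index_of are made-up names), converting between indices and linearly spaced sample points is a pure multiply-add with 0-based indexing, with index 0 landing exactly on the start value:

```c
#include <assert.h>
#include <stddef.h>

/* Map a 0-based index to a linearly spaced value: item i sits at
   start + i*step, so index 0 is the start itself. */
double value_at(double start, double step, size_t i) {
    return start + (double)i * step;
}

/* The inverse mapping, rounded to the nearest index. */
size_t index_of(double start, double step, double v) {
    return (size_t)((v - start) / step + 0.5);
}
```

The 1-based versions of both formulas would need an (i - 1) and a + 1 respectively.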
In many domains code maintenance is more important than hardware costs. In many domains 1-based indexing is a better fit, meaning less conversion code, meaning simpler code. Thus, the best indexing choice depends on the domain and circumstances, as do many controversial questions. Most tend to specialize in specific kinds of domains and over-extrapolate their experience into other domains.
Ironically I learned this from a discussion of Julia bugs. Apparently changing offsetting of arrays has proven to be a source of bugs in Julia. So maybe someday they will come to the same conclusion as languages like Perl and stop allowing it.
I'd argue 0-base is a source of bugs too!
Ideally we'd be able to catch more array indexing bugs at compile time - there are definitely cases where it should be possible to determine that arrays are being incorrectly indexed via static analysis.
The problem is that libraries which assume 0-base break when you have a 1-based array. And vice versa. Trying to combine libraries with different conventions becomes impossible.
Therefore changing the base leads to more bugs than either base alone.
That said, the more you can just use a foreach to not worry about the index at all, the better.
Of 0-based and 1-based, the only data point I have is a side comment of Dijkstra's that the language Mesa allowed both, and found that 0-based arrays lead to the fewest bugs in practice. I'd love better data on that, but this is a good reason to prefer 0-based.
That said, I can work with either. But Python uses 0-based and plpgsql uses 1-based. Switching back and forth gets..annoying.
I'd expect the compiler not to let you to pass a 0-based array to a library function expecting a 1-based array. I'm pretty sure that's how it worked with Visual Basic, which was the only language I ever used such a feature in.
Search for OffsetArrays in https://yuri.is/not-julia/ for practical problems encountered in trying to make this feature work in a language whose compiler does try to be smart.
As Jens Gustedt points out[1], the following intentional unsigned overflow works perfectly for downwards iteration (even when length is 0 or SIZE_MAX), though it looks a bit confusing at first:
for (size_t i = length - 1; i < length; i--) ...
You are also free to start at any other (not necessarily in-bounds) index, just like with ascending iteration.
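Wrapped into a complete function (sum_desc is a made-up name for the sketch), the idiom looks like this; note that the length == 0 case never enters the body, because 0 - 1 wraps to SIZE_MAX and SIZE_MAX < 0 is false:

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array by iterating downward with an unsigned index.
   When i is 0 and gets decremented, it wraps to SIZE_MAX, and
   SIZE_MAX < length is false for every possible length, so the
   loop always terminates -- including when length == 0. */
long sum_desc(const int *a, size_t length) {
    long total = 0;
    for (size_t i = length - 1; i < length; i--)
        total += a[i];
    return total;
}
```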
The condition of a for loop is equivalent to a while loop's condition (more or less; there are some scoping differences not captured by that, and this is how pretty much every for loop in a C-syntax language works). So yes, length - 1 < length will be true on the first iteration, which is fine because the loop continues as long as that condition is true.
What the above approach takes advantage of is that when the index eventually wraps, you have this condition:
  SIZE_MAX < length
which is false for every possible value of length, so the loop terminates.
That was my reaction - why should anyone think it might be the latter? Are there languages that do have such a syntax without explicit keywords ("do...until")?
The loop continues until i transitions from 0 to 0 minus 1. In this case 0 - 1 doesn't actually equal -1, since size_t is an unsigned type; instead it wraps around to the largest possible value. TL;DR: yes, as you speculate, it terminates when the number wraps around.
the footnote [1] should be [0], just for the sake of this very topic.
Seriously though, while the idiom does work for unsigned integers, it's a bad idiom to learn [makes code reviews harder]. The post-decrement one in the loop body works with everything (signed/unsigned), and it's well known.
Or count from `length` to 1, but subtract 1 in the loop body, or count up and subtract the length in the loop body. Any modern compiler should be able to optimise these to be equivalent.
In the majority of cases, counting down is not necessary. Nor is ordered iteration. Most languages have a `for each` style syntax that's preferable anyway.
I like how GP is called "operator-name" and instead of doing himself, he makes others joking with operator names. Although I'm not sure if it's altruism or highly manipulative behaviour.
the correct way/idiom to reverse iterate an array is
for (size_t i = length; i-- > 0; )...
It's surprising how often the issue pops up; the idiom works well with both signed and unsigned integers.
(edit) I started with 1-based indexing (BASIC)... mixed with 0-based (assembly), more 1-based (Pascal), then more stuff (all zero-based). I have yet to see a real advantage of 1-based indexing... after the initial learning process.
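A runnable sketch of this idiom (sum_rev is a made-up name); the test-then-decrement shape means that for length == 0 the condition 0 > 0 fails immediately, and the same loop would work unchanged with a signed counter:

```c
#include <assert.h>
#include <stddef.h>

/* Reverse iteration with the test-then-decrement ("goes to") idiom.
   On each pass, i is compared against 0 first and decremented after,
   so the body sees length-1, length-2, ..., 0, and i is never used
   as an index after wrapping. */
long sum_rev(const int *a, size_t length) {
    long total = 0;
    for (size_t i = length; i-- > 0; )
        total += a[i];
    return total;
}
```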
Sure, why wouldn't it be? As far as a cache is concerned, I don't think reverse sequential iteration would be any different than forward sequential. The actual RAM accesses may be less optimal if there's some speculative pre-fetching with assumed forward sequential access, but that's conjecture.
With some exceptions, hardware prefetch works in terms of ascending accesses. To learn if a particular CPU will prefetch for descending access, benchmarking is essential. Best to use soft prefetch calls if performance is critical.
i would suspect that the cache prefetch/prediction could use the "velocity" of the memory access to predict the next access; so if the access pattern was going backwards, the "velocity" would be negative, but prefetching would still work if they just followed the predicted pattern.
It's not. It was nice on architectures where cache didn't matter much and where subtracting and comparing to zero was just one instruction (looking at you, old-core ARM).
In the C programming language unsigned integers do not overflow. They wrap. This is well-defined behaviour and the example code is simply incorrect. Most modern compilers will give you a diagnostic for this.
Unless wrapping underflow is sensible for the domain (which it isn’t when representing the size of something), unsigned integers are usually a bad idea.
That's quite broken (you'd want a different variable inside the body, vs clobbering the iteration counter, else this would process the last item in your list, then exit).
A rare case where 1-based indexing is more convenient is complete binary trees laid out breadth-first (as in a standard binary heap): parent is i div 2 and children are 2i and 2i+1 when starting at one and who knows what when starting at zero. But that’s the only one I know.
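The contrast can be sketched in C (the function names are made up): 1-based navigation, with slot 0 of the backing array left unused, next to the 0-based equivalents.

```c
#include <assert.h>
#include <stddef.h>

/* Binary heap navigation with a 1-based layout (array slot 0 unused):
   the root is at 1, and the formulas are pure shifts and adds. */
size_t parent1(size_t i) { return i / 2; }
size_t left1(size_t i)   { return 2 * i; }
size_t right1(size_t i)  { return 2 * i + 1; }

/* The same tree stored 0-based: every formula picks up a +/-1. */
size_t parent0(size_t i) { return (i - 1) / 2; }
size_t left0(size_t i)   { return 2 * i + 1; }
size_t right0(size_t i)  { return 2 * i + 2; }
```

The two layouts describe the same tree shifted by one slot, which is why many heap implementations simply sacrifice element 0.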
Except 1-based indexing is what we use in normal language. We don't use "zeroeth" or "player (number) zero" etc. And the word "first" is shortened to 1st etc.
Personally I think we'd be better off if programming languages stuck to the same convention - off-by-1 errors aren't the hardest problems to deal with but they're still annoying.
That's confusing ordinal and cardinal numbers. The element with index 0 is the first number. The element with index 1 is the second number, and so on.
Using the term "zeroth" is basically some form of showing off (even though it's kinda fun), but will be utterly confusing when you get to the fifty-second element which is the last in a group of 53 elements.
I'm not confusing them, my point about abbreviating "first" as 1st was that in typical speech we start counting at 1. Nobody says "let's start with item zero on the list".
But programmers are stuck with having to say/think "item 0 in the array".
> With 0-based indexing the children are at 2i+1 and 2i+2. The parent is at (i-1) div 2.
> Not hard to figure out.
While that's true, "you just shift by 1" is equally good at all arguments for or against 0-based indexing, so deploying it here probably won't convince.
That said, the effort of one versus the other is so trivial that there is no point in ever using effort as an argument either way. Doubly so because what seems like effort to us is simple unfamiliarity.
What is important is which one leads to more careless errors in practice. As a trivial example, consistent indentation takes effort, but failing to do it leads to more careless errors. Therefore everyone indents code.
An incredibly unrare case happens all the time in my work: array[length - 1], or variations of this. Anything involving the last element of the array, and often iterating through the whole array will use something similar at some point.
I think the 1-indexing folks would have to argue that a%b should return a value from 1 to b inclusive. This does make the same sort of intuitive sense as 1-indexing. For example we number clocks from 1 to 12.
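That 1-based mod can be written with the usual shift-by-one dance (clock_add is a made-up name; advance is assumed nonnegative, since C's % follows the sign of its operands):

```c
#include <assert.h>

/* "Clock arithmetic": advance a 1-based hour in 1..m, wrapping
   11 -> 12 -> 1 and never producing 0. advance must be >= 0. */
int clock_add(int hour, int advance, int m) {
    return (hour - 1 + advance) % m + 1;
}
```

Note that even here the implementation converts to 0-based, takes the ordinary %, and converts back.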
but nobody "says" zero o'clock - it's always twelve o'clock!
and the example is in 24hr format - which needs to have the 00 to differentiate it from being 12:30. But if you write in 12hr format, you don't ever use 00 - it's always 12:30am or 12:30pm
First of all, you don't. Or I guess most Americans don't. I do, or most Europeans do. 12.30am and 12.30pm feels just very wrong in Europe.
But yea, I do agree with you. we don't say 0, we say 12, no matter if it's noon or midnight. That's because we humans avoid saying zero when we mean zero.
It's the same with other comments here, talking about counting seconds: we say "OK go, one, two, ..." and not "zero, one, two, ...". We say a 3-month-old baby, not a 0-year-old baby.
We use 0-index in so many things, we just avoid saying the word "zero", and we use other names or other units to avoid that word.
Because otherwise you would be wasting a perfectly good number for no reason, which means you need to use more bits to do the same thing.
To write 4 numbers (including zero) you only need two bits
0: 00
1: 01
2: 10
3: 11
To write 4 numbers if you avoid using the number zero, you need three bits
1: 001
2: 010
3: 011
4: 100
If you extrapolate that a little bit, you'll realize that you'll need two bytes (1 Byte + 1 bit from another byte) to store the indices of an array with 2^8 elements, which is just dumb.
Some of you might be thinking: you don't need to store it the same way it's written, you can just substract 1 from whatever the user typed and convert it behind the scenes.
Yes, you could subtract 1, but then you would be making the whole system more complex, opaque, and inelegant, while hiding information from the programmer for no good reason.
As a kid, I saw these things often on multiple-choice test sheets. The tens digit at the end of each row not matching that at the beginning always seemed awkward, and I wondered why they didn't just start at zero. This was long before I did any programming.
The first year CE being "1" resulting in the new millennium starting at 2001 instead of 2000 also seemed idiotic, and the 1900s being referred to as "the 20th century" was something I always had to consciously compensate for. Numbering items starting at zero would have given us more elegant/less confusing ways to communicate those things.
Zero-based indexing makes things inherently simpler because it matches the way we write numbers (and becomes much more noticeable once you have more than one digit). It's not just an optimization for computers.
To keep it confusing: the traditional proleptic Gregorian calendar (like the Julian calendar) does not have a year 0 and instead uses the ordinal numbers 1, 2, ... both for years AD and BC. Thus the traditional time line is 2 BC, 1 BC, AD 1, and AD 2. ISO 8601 uses astronomical year numbering which includes a year 0 and negative numbers before it. Thus the ISO 8601 time line is −0001, 0000, 0001, and 0002.
The index is the distance away from the first element. So index 0 is the first element. Index 3 is 3 away from the first element, so the fourth element.
It's the difference between ordinal and cardinal numbers. In other words, counting and indexing.
Formally, ordinals start at zero too :-) [0]. We owe their latest definition to Von Neumann, but AFAIK, the former definitions were similar.
[...snip...]
--
[0] This is meta-meta-contrarianism.
- The layman counts from 1
- The uptight programmer counts from zero, because Dijkstra said so (or so he thinks).
- The meta-contrarian (I used to be one) says fuck it, ordinals start at one.
- The meta-meta-contrarian reads Wikipedia[1], realizes he was formally wrong, and goes one step further in pedantry, back to zero [0].
That being said, my brain prefers 1-indexing programming languages like Lua, Julia, R and Matlab...
Counting fingers (or other items) isn't a positional numeral system, so it's really apples to oranges. Also, "first" means "nothing precedes it", which is separate from the indexing system; it shouldn't mean "at index 1" unless otherwise specified.
Though the idea of an index as a position instead of an address is weird to me, you can indeed make that analogy.
The "first" element of an array has the index zero, as in "zero elements precede it".
That gets really dicey if you're storing the value of the index in a variable. Does the programmer need to know that if it's an INDEX they're storing that goes from 1-256 they can fit that into an 8-bit value because the compiler will magic away that last bit? Speaking of which, how will the compiler know which 1's are really 0's and which 256's are really 255's? Will it compile that 1+1=1 if I use that result as an index? Just having a 0th index is much simpler than all of that noise.
Not really? It just changes how the compiler emits the x := *(addr + idx) operation: instead of doing something like
  ldr x, [addr, idx]
it does
  sub t1, idx, #1
  ldr x, [addr, t1]
If we're talking about C, accessing invalid/OOB indices is undefined, so if idx happens to be zero and unsigned, the idx - 1 wraps around and we hit something unexpected, which is fine (as far as the compiler is concerned).
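The same translation can be sketched at the C level (the AT1 macro and helper functions are made up): a hypothetical 1-based front end only has to fold a single subtraction into every access.

```c
#include <assert.h>

/* What a 1-based language's a[i] could lower to in C terms:
   one subtraction folded into the element access. */
#define AT1(arr, i) ((arr)[(i) - 1])

int first_of(const int *a)           { return AT1(a, 1); }
int nth_of(const int *a, unsigned i) { return AT1(a, i); }
```

When i is a compile-time constant, the subtraction folds away entirely; only dynamic indices pay the extra instruction (or the constant offset in the addressing mode).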
That's what I was thinking too, except that they have a point. You might end up using more bytes to store indexes as variables (if the index is 256 for example), than you would with zero indexing.
Someone linked to your Johnny Decimal system earlier today: https://johnnydecimal.com/
I liked the idea, but I noticed that it didn't seem to be 0-indexed. If I adopt this, I'm definitely going for zero indexing.
But then of course the reason why 1-indexing is better (or really, arbitrary indexing, which is even better) is that it matches human semantics better.
Edsger Dijkstra wrote an interesting article titled 'why numbering should start at 0'.
Perhaps not answering the question directly but an interesting read nonetheless.
As the article implies, Dijkstra was specifically wrong about FORTRAN, as defined at the time — the '77 standard, when it was still only conscionable to SHOUT 6 alphamerics.
“… you probably know that arrogance in computer science is measured in nano-Dijkstras.” — Alan Kay
I've never understood the first half of Dijkstra's argument. Unless we are programming a theoretical Turing machine, there will also be a maximum number that can be represented, so the argument about using inclusive bounds for the lower end also applies to the upper end. So you can't represent all numbers with a single range style. You either end up excluding the smallest number, the largest number, or the empty set. Or you need special notation to handle one of those three cases.
I suppose the 0,1,infinity principle argues that including the empty set and the smallest number is more important than including the largest number, so that notation should be preferred by default. And when written as A <= i < B it is clear enough, but once you remove those symbols and start using them in indexing or function-call syntax, like [A:B] or range(A,B), the inconsistency of bounds really breaks my brain, and having to pass a number larger than the largest array index as an upper range bound gives me nervous tics. I'd rather use inclusive bounds everywhere and have special syntax for the empty set.
The handwriting skills were common back in the day... By the way, are you sure that it was Dijkstra himself who handprinted these texts (rather than having, say, his secretary do it)? Because this is not exactly the (beautiful) handwriting as it was taught back then.
He handwrote them, and occasionally typed them (especially early on, I gave up trying to find the transition point). It was something he was known for. If you read the EWD's over the years you'll see the same (or very similar) handwriting throughout, which would not be the case if he had a secretary writing them for him (who wouldn't have been the same person as he moved between countries).
If you ask people which floor of building they're on, it's going to depend on which country they're in. In North America, at least, the first floor you walk into (in a sane city: I understand there are some which do not qualify in this respect due to hills or historic disaster recovery) is the first floor. On other continents, you enter the ground floor and need to take stairs or an elevating device to get to the first floor.
A big international corporation (Siemens) had, at one point, standardized their buildings world wide to have room numbers starting with 2 on the ground floor, 3 on the floor above etc., leaving space for two basements.
All room numbers were 5 digits, and skipped numbers if there were multiple windows in one room, based on the theory that such a room might later be sub-divided, and wanting to avoid renumbering of rooms down the hallway.
(I encountered this around 2003, so might have well changed since then; didn't find any references only with a quick search).
I worked for them relatively recently and didn't encounter this; or at least never noticed. Then again, our office building was an open office plan where desks next to the windows and rooms (conference or office) were all in the center of each floor.
The ground floor is 0. The floor above 0 is 1. The floor below 0 is -1. Anything else is crazy TBH, the whole point of integer numbers is to count things that start at a defined point and go opposite ways. Why shift it and then reinvent weird pseudo-negative numbers like S1?
But then again people measure distance in feet and write dates as month day year :)
It would be nice if this were the case. In grad school I taught in a building that was on a hill and had once been separate buildings, so different entrances were on different floors, and the transition between the former separate buildings was 4 stairs.
So there was a floor with rooms numbered 1xx and 0xx. There was also a floor below the 0xx floor. They just numbered it 00xx. And you could enter the building on any of those floors depending on what door you picked.
The class I taught was in 0005. Room 005 was someone's office. So on the first day of every semester we would have students waiting outside the door of 005 instead of 0005...
Cool, Dwinelle at Berkeley is a fascinatingly bizarre building for seemingly very similar reasons, and (if I remember correctly) also with pretty much the same weird numbering system plus tiny staircases to adjust for mismatched heights between floors!
> If I build a building in the shape of a double helix, what order are the two floors in?
Why separate the floors that are on the same level?
> Hmm, are there actually an infinite number of floors?
I assumed there isn't and that we use a subset of integers. If you have infinite number of floors - feel free to use full set of integers (if you have enough memory that is).
> Do we need to switch to floating point?
No point, if you have more than aleph0 floors - floats won't help you.
We have a building at the local university that has ground at one end level with transitory step between floors -1 and 0, you enter the building through the stair well, and descend or ascend to get to a floor. I wouldn't be surprised if an architect designed a building that did this on flat ground, violating the ability to number floors relative to ground with integers.
There’s some confusion here about the exact word being used. The German word that is mistranslated here as “floor” is “Stock”, which actually means “addition”. So the floor above the ground floor is the first addition, not the “first floor”.
When I was at Leeds University the ground floor of the Physics Admin building was Level 6 (it went up to Level 11). It was built on a hill and according to legend this allowed room for expansion (which never happened) down the hill without requiring negative floor numbers.
To add to the fun the only continuous corridor through the entire T-shaped building was on Level 10, one end of which (the base of the T) was at ground level on its part of the hill. There was a famous 'kink' in the middle of the top / cross-bar of the T (apparently the longest corridor in Europe) because they started building from each end and were slightly off.
The Patterson Building at Acadia University in Wolfville, NS, has a basement and four floors. The rooms on the first floor are numbered 1XX, second floor 2XX, etc. up to the top floor 4XX.
The elevator, however, goes from floor '1' (the basement), to floor '5' (the top or fourth floor).
I've been living in North America for decades. Every building has G (zero). This is the floor that you walk into and has a lobby. The next floor up is 1 as such is labeled in the elevator.
So your initial assumption is incorrect from my experience.
This is not my experience in the US. Most places either have G or 1 for the first floor. Then 2 for the next. So some elevators say G, then 2, 3, and so on.
I've seen setups like you mention, but they definitely aren't the majority in places I've lived.
It really comes down to a choice between a machine-focused (0) or human-focused (1) approach.
The 0 makes a lot of sense in a C pointer world, where memcpy and other similar functions can be written very tight.
The 1 makes a lot of sense in a human world: when we count, we start at 1; we talk about the "1st"; counting on fingers starts with 1; etc.
I was once at a Lua (a 1-indexed language) conference where this was discussed, and Luis started explaining why Lua was 1-indexed with this sentence: "The 1st argument ..." :)
> It really comes down to a choice between a machine-focused (0) or human-focused (1) approach.
Just think of 0-based as offset-based and 1-based as index-based. Both are intuitive just like that. I never get why people arguing over this bring pointers and memory (or anything computer related) to the table. No normal person is going to understand that, but everyone understands that if you don't move at all (0 offset) you stay at the first (index 1) item. Add one and ... you get the point.
"Offset-based" is bringing in pointers; that's the thing it's an offset "from". "The beginning of the array" is just a pointer.
I suppose saying that does have an advantage over explicitly talking about pointers, in that the word "pointer" is a piece of jargon that has a lot of baggage. That's just avoiding jargon, though, not really using a different model.
No, offset means "measuring from origin"; pointers use that language, they don't provide it.
The advantages in measuring from origin can accrue to the person choosing to do it, because there are other reasons to do so which aren't satisfying the CPU.
No, offset means "measuring from a defined location". In a computer, the choice of location is more or less arbitrary[0]. If you pick the first element, you get zero indexing. If you pick a space before the first element, you get one indexing. Either one is a pointer, since a pointer is just computer jargon for "a defined location".
Calling that defined location "origin" doesn't suddenly make it not a pointer.
[0]- NUMA and cache effects aside, as they don't matter for this purpose
Offset-based lets you refer to an abstract location that is after the last element without resorting to n + 1.
  [ ] [ ] [ ] ... [ ]
   0   1   2     n-1  n
We can regard this n as a "virtual zero", which then makes it possible to index the element at n - 1 as just -1. The index -n then aliases to 0.
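Python's built-in indexing behaves exactly like this "virtual zero" scheme: -1 aliases the last element and -n aliases the first.

```python
xs = [10, 20, 30, 40]
n = len(xs)

assert xs[n - 1] == xs[-1]  # the last element, via the "virtual zero" at n
assert xs[-n] == xs[0]      # -n wraps around to the first element
assert xs[2:n] == xs[2:]    # n also names the "one past the end" position
```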
In Google standard SQL, to access an array it is simply not allowed to put a number inside square brackets. You must specify which way you mean. So `SELECT some_numbers[OFFSET(1)], some_numbers[ORDINAL(1)]` is allowed but not `SELECT some_numbers[1]`.
that's actually good verbosity, because the intent is very clear using this method, and multiple people can read the code and unambiguously identify the intent without having to talk to the original author.
This type of verbosity makes sense in a big organization where the left hand doesn't talk much to the right hand, so communication naturally evolves to happen at the code level.
I like for example using enum types to ensure that what’s actually passed is the expected value even though fundamentally the semantics do not need to be much more complicated than integer values. There could be the same thing with a distinction between offset and indices as two different integer numerical types to avoid any ambiguity.
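A minimal Python sketch of that idea, using two wrapper types to make the intent explicit at the call site (the names `Offset` and `Ordinal` are made up for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Offset:
    value: int  # 0-based distance from the start

@dataclass(frozen=True)
class Ordinal:
    value: int  # 1-based position

def get(seq, pos):
    # The caller must say which scheme they mean; a bare int is rejected.
    if isinstance(pos, Offset):
        if not 0 <= pos.value < len(seq):
            raise IndexError(pos)
        return seq[pos.value]
    if isinstance(pos, Ordinal):
        if not 1 <= pos.value <= len(seq):
            raise IndexError(pos)
        return seq[pos.value - 1]
    raise TypeError("pass Offset(...) or Ordinal(...)")

xs = ["a", "b", "c"]
assert get(xs, Offset(1)) == "b"
assert get(xs, Ordinal(1)) == "a"
```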
But I think that still leaves open the question of "offset from what?"
If you move zero from the first element, you're still at the first element... but if you move zero from the fourth element you're still at the fourth element. If you move one from before the first element, you're at the first element.
I think you're more or less right, but I think we still need something to motivate the first element as the point of reference, and the machine focus is one way to do that.
In some languages/environments, the array is literally the same as the pointer to the first element.
Other languages are more sophisticated (like humans are!)
and can say that position 0 does not exist. An array of size 0 has no elements. An array of size 5 has elements 1st through 5th. An array of size N has 1st through Nth, inclusive.
"ordinal" is preferred to "index", as "index" doesn't naturally suggest starting at "1", or even restricting the key to integers. "Ordinal" starts from 1, everyone agrees (except mathematicians, of course :-) ).
Do people not study grammar in American schools? I thought all kids learn the distinction between cardinal (one, two, three) and ordinal (first, second, third) numbers.
At the age that cardinal and ordinal numbers are taught, kids simply don't remember "cardinal" and "ordinal", they remember "one, two, three" and "first, second, third".
99% of the instances where I've seen "ordinal" outside of this thread have been in code/documentation. It is not a common word in everyday language.
The English grammar that Americans study in school is likely somewhat different than the English grammar that is taught outside of America as the goal of the latter is likely focused on helping students map their native language onto English. I would agree that `ordinal` is uncommon word for many Americans, but it wouldn't at all surprise me if there were languages where the equivalent word was far more common, and therefore its use and translation was a part of standard English as a second language curriculum.
> The English grammar that Americans study in school is likely somewhat different than the English grammar that is taught outside of America as the goal of the latter is likely focused on helping students map their native language onto English
English is spoken as a native language in many countries outside of America.
I disagree. The CS definition of array, sure, but if you were to say “we have an array of options to eat”, most people with a high school education would know what you mean.
And then the counting niceties come along: those perennial +1 mistakes are caused by the fact that, e.g. 4 and 5 are two numbers but they are only one apart.
> when we count, we start at 1, we talk about the "1st"
Although often with an implicit zero. Under typical North American culture, your 1st birthday, for example, is more accurately the first anniversary of your birth. Your birth is zero indexed.
It's fundamentally a distinction between end-index and start-index.
Most human counting uses end-index. I.e. "1" is after 1-thing (has passed, is physically obtained, etc.).
Most computer counting uses start-index + length, for efficiency and to better generalize. I.e. "1" is at the memory address immediately before the "1"st item.
Which ultimately creates the "0 index is 1st thing" linguistic confusion.
PS: Also, language predates computers by a few years, and the concept of zero is hard.
i think a more accurate and complete idea is that humans refer to a thing in its entirety, with things being lined up and scanned in order as only a potential convenience
if you ask someone to identify an object, they'll point to the middle (or center of the most important component) of the object, not to the 'start' or 'end' of it in their field of vision..
that said, the human perspective is subtle and convenient in completely different ways to how a computer manages memory, and this 'end-index' idea seems like a useful way to map the human whole-object perspective to a linear memory-index perspective
They may point to the middle, but that's not a reference to half an object. ;)
The assumption is that the thing is its entirety, as you said.
Hypothetically, I imagine an array index reference, in human terms, would be communicated as "this thing starts here" or "this is the beginning of this thing."
Which isn't a concept or phrasing we have much occasion to use, other than for routes or long length-measured objects?
Birthdays are anniversaries. Anniversaries are annual activities when we celebrate/honor past events. They do not include the event itself. The first anniversary is 1 year after the initial event.
Yes, that’s the explanation for how we count them as we do, but it has no greater weight (and I’d argue less weight) as to why than saying whether a[0] or a[1] should be the first element in an array.
I say it has less weight as birthday is a compound word, the root words of which suggest that your first birthday could logically be the day of your birth rather than a year after it.
My first weddingday was not a year after I got married. My first graduationday was not a year after I graduated.
Indeed they are, which is why I literally said so in the previous comment. Did you forget to finish reading it?
> They do not include the event itself.
That is true because the event itself is implied information. There is no value in speaking of it. If you stand before us, we can be certain that you had a day of birth (your 0th anniversary). We don't necessarily know how many times you've gone around the sun following that, however, so that is where we find value in communicating additional information.
If you were counting apples, there is the state where you have no apples (index 0), the state where you have one apple (index 1), the state where you have two apples (index 2), etc. When counting you don't need to worry about index 0 because the no apple state is naturally implied. It only becomes interesting and worthy of communication when you have at least one apple to speak of, thus you start at 1. The state found at index 0 is still implicitly there, though.
If no apples is the unusual state, like you expected to find an apple in your lunchbox but someone ate it without your knowledge, then certainly there would be reason to communicate the no apples state. It is ignored in communication when it does not provide useful information, but it is not forgotten. The 0th index is implicitly there.
Although, there is a fifth state in your example: when you are not in class. Which is different from an empty set that implies nothingness. When it comes to age, 0 being birth works well because there truly is nothingness (from your perspective) before birth. When counting from 1 there is a suggestion that there are variables that aren't worth speaking of because they are obvious.
Assuming there is nothingness for the fetus the entire nine months in the womb. For that matter, I can't recall being younger than four, so it's all nothingness before then.
Not really. Perspective isn't scientific and is often cultural. Koreans, for example, famously have a different take and use a different counting system to accommodate it.
I am assuming that the reader is biased towards the average HN user. That won't always work for everyone who will come across my comment, but close enough for an unpaid contribution. I'm not about to write a novel to make sure I catch every edge case.
That's why I prefer the Superior(TM) Korean age counting. You are one year old when you're born (it's your first year!). You are two years old on the next New Year's day. (Congratulations, it's your second year now!)
So, if you're born on December 31st, you're two years old the next day. (I see no problem, but apparently some people are hung up on such minor details. I can't fathom why.)
> You are one year old when you're born (it's your first year!)
That makes no sense to me. It's also your first century. Does that make you one century old? Of course not!
The moment you're born, you're not even one hour old, let alone one day.
They must like rounding up. I guess it depends how they refer to it in their language. If it's "he's in his 1st year", that's a big difference from, "he's 1 year old". I'm wondering if the parent comment simply doesn't know Korean or only knows it somewhat to misconstrue the culture behind this.
It could be, though, that Korea simply never encountered the Mayan or Arabic civilizations, as most did encounter one of those two in history; both are famously known to have independently discovered the concept of zero.
Birth of an array can only logically be defined as index zero, the same as a child. The concept of zero was not universal globally and Korea is pretty isolated.
Except that we actually do use this system for years. Thus 1 BC (the first year BC) was followed by 1 AD (the first year AD). Also for centuries, as in 'the 20th century' being the years 1901 to 2000.
A more sensible English translation from Korean would be to use the phrase "in year X" rather than the phrase "X years old": a newborn is in year 1; after 12 months they are in year 2; etc.
In fact, this whole discussion is more about a choice of phrasing rather than the numbers. When indexing arrays, sometimes we're talking about an offset from the first element (starting with "0"), and sometimes we're talking about ordinal element numbers (starting with "first"). Some programming language designers found that offsets are more useful (because that choice tends to simplify the underlying arithmetic), while others found that ordinal numbers are more useful (because the word "first" should mean "1", to simplify communication between people).
There's nothing wrong with the definition per se, but there's the question of what purpose you're putting it to. We should expect more similarity from two "newly 2" children with the "western" system than the Korean one.
No. Your age is 1-indexed. It's a 'birthday' in English and German ('Geburtstag'); in French it's 'anniversaire', and so on. But pretty much everyone indexes age from 1. The fact of your birth is the transition from (legal) non-existence to existence, the equivalent of a dimensionless point.
Age is definitely 0-indexed - a newborn and exactly 1 year old differ by a year. People also count months during the first year, which is still 0 years old. As in 00:xx is the first hour, and 01:xx is the second. If you're 30 years old, it's your 31st year of life now.
But the Gregorian epoch itself is 1-based. 1 AD (0001-01-01) goes right after 1 BC (-0001-12-31). There was no 0000-mm-dd. That's why the 3rd millennium and the 21st century started at 2001-01-01 and not at 2000-01-01. YYYY means not how many whole years have already passed, but which incomplete year is going on right now. On the other hand, your age means "whole years passed since birth [plus maybe a few months]".
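The 1-based year numbering is exactly why century arithmetic needs an off-by-one adjustment; a quick sketch:

```python
def century(year):
    # Years 1..100 are the 1st century, 1901..2000 the 20th, etc.
    # The "- 1" compensates for there being no year 0.
    return (year - 1) // 100 + 1

assert century(1) == 1
assert century(100) == 1
assert century(1900) == 19
assert century(2000) == 20
assert century(2001) == 21
```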
And the birthdays are definitely 1 indexed: you denote your first birthday with 1, and not with 0, and so on. You don't denote any birthdays of yours with 0. You may denote something else with it, but birthdays[0] gives an out of range exception. (Especially true in French, where birthdays are called anniversaire, but in English too.)
Oddly(?) though, most parents of young children don't refer to a baby as being "zero years old." Rather, we break them down into smaller units: two days old, six weeks old, four months old.
It helps that change happens particularly fast at that age, so the difference between newborn and 6 months is worth mentioning. Later on the units go up again and we just say "I'm in my 30s" :P
On the other hand a computer/car/house can also be 10 years old but not 0.
True, a few weeks can mean a few milestones at that age. It's still wild to me how quickly babies/kids develop. Every week brings new abilities, experiences, and emotions that weren't there before.
Exactly. What we call "new" is the 0th year. It's the same when you start counting seconds. You say "ok counting starts now", then wait a second and say "one", then wait and say "two". Before you say "one", you don't say zero, but it's implied by you waiting a second after you or someone says "go". That's still 0-indexed.
The moment you are born you are 0 years old (or perhaps 0.75 years old, but we don't usually recognize that). We count from zero in this case, at least implicitly.
In some cultures you are considered 1 the moment you are born, so the zero indexing isn't universal here, but typical in North America as noted earlier.
Typical counting of things starts from 0. If you count apples you implicitly start at zero and add 1 for each apple. If you count age, you start at birth (0) and count years; one for each birthday. That isn't zero based. The difference is the index of the item between the starting point and the next item. In zero based this item is number 0, in one based, this item is number 1. The first year of life is generally considered the year following birth.
This is a philosophically deep point which took me a long time to grasp.
There is a narrowing from "nothing at all" to "zero apples" which doesn't happen 'in the world' but is a necessary precondition to counting apples. The existence of any apple is a requirement for there to be zero of them before you put anything in the basket.
I just dealt with this problem (how to store periodic events, including birthdays). It is a surprisingly difficult problem. After reading up on Postgres interval types, my latest attempt uses them to store events, where a per-year event (like a birthday) is stored as "2 months 15 days". It turns out the Postgres project has put quite a bit of thought into making interval types work as expected with regard to months, not an easy task when you consider how difficult it is to treat dates mathematically.
The nice part is that now February 29 "just works"; the downside is the impedance between how months and days are numbered and how an offset from the epoch (the beginning of the year) is defined. January 3rd (1-3) is stored as "0 months 2 days": when it hits, 0 months have passed and 2 days have passed.
So the specific case of February 29 hits when 1 month and 28 days have passed. 3 years out of 4 this will be the same as "2 months 0 days" (3-1), but every fourth year this will be (2-29). As an aside, and the specific reason I went with interval types: every event past February 29 works just fine with or without a leap year; that is, the extra day in the middle does not mess up the offset to days after it.
Honestly I curse a little, as I wish months and days were 0-based. At least clocks get this right, almost; 12-hour clocks are a special breed of stupid: they start at 12, then go to 1 and proceed up to 11. 24-hour clocks properly start at 0. The worst part about 12-hour clocks is that they're almost correct: replace the 12 with a 0 and everything would be the same, but now it makes sense from a modular-math point of view.
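The offset scheme described above can be sketched with Python's stdlib datetime instead of Postgres intervals, assuming an event is stored as a `(months_passed, days_passed)` offset from January 1:

```python
from datetime import date, timedelta

def from_offset(year, months_passed, days_passed):
    # Advance whole months from January 1, then add the remaining days.
    anchor = date(year, 1 + months_passed, 1)
    return anchor + timedelta(days=days_passed)

# February 29, stored as "1 month and 28 days have passed":
assert from_offset(2024, 1, 28) == date(2024, 2, 29)  # leap year
assert from_offset(2023, 1, 28) == date(2023, 3, 1)   # non-leap: spills into March 1

# Events after February stay put whether or not it's a leap year:
assert from_offset(2024, 2, 14) == date(2024, 3, 15)
assert from_offset(2023, 2, 14) == date(2023, 3, 15)
```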
You're just 1/4 of the age of everyone else born the same year. You have the same number of laps around the sun as those people, but you've definitely had fewer birthdays. It's a common joke for leap year babies.
The counting activity doesn't begin when the first item is registered; it begins when the counter is initialized to zero. A decision is made to begin counting, along with the realization that nothing has been counting yet. That's when counting has started. When the first item is seen, the counting is then continuing.
Suppose your job is to count some events. You check in for work at 8:00 a.m., but the first event has not registered until noon. By your logic you should not be paid for four hours, because you're paid to count, and counting started at 1.
I don't think your definition is complete. We can count a set by mapping its elements to the natural numbers, and then identifying that number which is highest. However, we must have a provision for identifying zero as the highest when the set and mapping are empty.
Do you know the definition of countable? A set S is countable if there is a one-to-one mapping from S to N where N is the natural numbers. Do you know that 0 is not a member of the natural numbers? We literally start counting at 1 by definition of countable.
> Equivalently, a set S is countable if there exists an injective function f : S → N from S to N; it simply means that every element in S corresponds to a different element in N.
Defining N is usually done via a successor set, in which case 0 makes no sense to include.
Standard construction of ordinals is that each ordinal is the set of all its predecessors. (0 has no predecessors, hence 0 is the empty set.) (And so finite ordinals have the same ordinality as cardinality.)
Birthday[0] gives you an out of range exception, since there is no birthday called 0th. Birthdays are [first, second, ...], indexed from 1: birthday[1] = first, birthday[2] = second, and so on. That's what indexing a series from 1 means.
Ok, use the baker and cup as an example. If you have an empty cup and put half of a cup of flour in it, you now have 0.5 cups of flour. Notice the zero before the ".5". That is us, normal humans, realizing that until you add enough to have 1 of something, you have between 0 and 0.999 repeating.
No, it just means we have a different notion of what a year is in regards to age. Similar to how different cultures can use different units of measurement for length, mass, etc.
Yes, exactly. One of which carries an implicit zero indexing. The date of birth doesn't disappear just because you decided to use a different measuring device.
No, it doesn’t carry an implicit 0 index. Measuring age (in the West) is like measuring distance. You start at 0, but that doesn’t mean the first item is at index 0.
If you took a standard ruler, a measurer of distance and which has literal index marks painted on it, and placed items along it, the item found at the head of the ruler would be found at the 0th index. You're quite right that we think of birth and its anniversaries in the same way.
No, old calendar systems are one indexed. In those system, there is literally no year zero; the first year is year one. This leads to crazy things like year 100 being part of the "first century" and year 101 being part of the "second century".
That is not the case with age or birthdays which are, thankfully, zero indexed. The first year of human life is age=0, birthdays=0.
If you don't care then sure, it's not crazy, but if you did care then believe me, it is crazy. Crazy enough that astronomers[0] and software engineers[1] rebelled against the historians' practice and renamed the years preceding 1 AD in the proleptic Gregorian calendar year 0, year -1, year -2, et cetera. A major benefit of this is it allows the leap year pattern to stay consistent and the rule to remain legible for all years, going back before 1 AD. It's also nice because it lets us say "the 90's were the last ten years of the 20th century" and be correct.
Haha, this thing is messed up. Your "first birthday" is technically second, because the first was at your, well, day of birth. But we people love to complicate things and count 1st birthday as a number of annual celebration events after the first mm-dd of birth. Off by one as it is.
Elapsed time timers start at 0. Time is continuous. The elapsed time being “out of the womb”, that we call “age”, starts near 0. Five minutes after birth, the baby has been in the world for, or it’s age is, five minutes. If someone asked how old a new baby is you could hear “3 weeks”.
Another definition might be from conception. But birth being “one year old” is illogical. The sperm didn’t even exist yet, one year before birth.
Your mistake is using years - use milliseconds (or nanoseconds) when you wish to express a duration. Years don't even have the same duration/length (leap years, and leap seconds)
And before an hour has even passed, we'd probably say "minutes old" or perhaps "not even an hour/day old", all to avoid saying "zero days/months/years old". But some might still say zero days old, and they'd be both understood and correct (at least logically if not stylistically). That's the "implicit zero" everybody is talking about. We avoid saying it, but that's just a convention of communication. Logically it's there. You're zero days old before you're one day old.
TBH humans would have been better off if we were 0-based; it's just a convention. And we have the confusing language where "20th century" means the 1900's. If we wanted to bring the 1-indexing we use in language to its fullest extent here to fix that particular issue, time counting would have to start at 1111. Except that won't work once we reach 5-digit years.
If we would start with "zeroeth" instead of "1st", then this would have solved itself and 20th century would mean 20xx's.
I especially don't understand why mathematicians use 1-based indexing (for matrix rows/columns etc...). Like programmers, they should see the advantages of starting at 0 (e.g. the coordinates of subdividing into block matrices are simpler if starting at 0). Mathematicians do start at 0 for the origin of plots, after all.
> If we wanted to bring the 1-indexing we use in language to the fullest extent here to fix that particular issue, time counting would have to start at 1111.
> The 1 makes a lot of sense in a human world, when we count, we start at 1, we talk about the "1st", counting on finger starts with 1, etc.
It is more like counting from 1 is just a leftover from times when zero was not commonly considered a number. Once one has zero, it makes sense to use it as an initial ordinal (see e.g. set theory, where zero is both the initial ordinal and the initial cardinal number, way before computers).
Another example is time and date, we start counting of days from 1, but counting of hours (at least in 24-hour notation) and minutes from 0.
> Luis started explaining why Lua was 1-index with this sentence: "The 1st argument ...."
Note that for spoken language, it is "The first argument ..." and 'first' is etymologically unrelated to 'one', but related to 'foremost', 'front', so it makes sense to use 'first' for the initial item in the sequence even when counting from 0.
When talking about discrete things, 0 has a specific meaning: it is the absence of things. It does not make sense to count the 'first' element as the 0th. When you encounter the 'first' element, how many elements do you have? 1.
This is of course different for continuous quantities. When counting seconds, for example, we should absolutely start from 0.
In Western music theory, intervals are one based. No pitch change is "unison"; one diatonic step is a "major second" and so on. As a result of this silly state of affairs, an octave occurs every 7 notes, even though the root "oct" means eight. Furthermore, a "rule of nines" is needed to invert an interval: e.g. inversion of minor 3rd is a major 6th (exchange major/minor, subtract from 9).
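The "rule of nines" is a direct consequence of the 1-based naming; a small sketch showing how a 0-based "steps" scheme would simplify it:

```python
def invert(interval):
    # Western-theory interval inversion in 1-based naming: subtract from 9.
    return 9 - interval

assert invert(3) == 6  # a third inverts to a sixth
assert invert(5) == 4  # a fifth inverts to a fourth

# With 0-based diatonic steps (a "second" = 1 step, an octave = 7 steps),
# inversion would simply be 7 - steps:
def invert_steps(steps):
    return 7 - steps

assert invert_steps(2) == 5  # 2 steps ("third") inverts to 5 steps ("sixth")
```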
True, and it's a great example of how this whole drama is about a practical trade-off, not about a unique Right Answer. If you play piano, the second is the second finger; the fifth is the fifth finger, and it all makes sense. No problem. On the other hand trying to actually count that way (two thirds make a fifth and so forth) is maddening.
I don't think 1-based arrays are better notation, even completely ignoring that code has to run on a machine.
99% of the times, the correct approach is to use iterators. When you really need indices (and you almost never do), 0 is more practical, because it matches the "including start, not including end" convention.
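Python's slicing shows why the half-open convention pairs naturally with 0-based indices:

```python
xs = list(range(10))

# Half-open [start, stop): length is stop - start,
# and adjacent slices concatenate cleanly.
assert len(xs[2:7]) == 7 - 2
assert xs[0:4] + xs[4:10] == xs

# Splitting at any pivot never duplicates or drops an element:
for pivot in range(len(xs) + 1):
    assert xs[:pivot] + xs[pivot:] == xs
```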
For instance, someone who makes a fist with zero extended fingers before counting.
Or someone who simply becomes motivated to count something, without making any utterances or gestures to that effect. The motivation is followed by the persistent awareness that nothing has been counted yet, which then changes to 1.
Not saying "zero" doesn't mean we don't mean it. All counting starts with 0. We just say "one" out loud as the first number. We probably should say "zero" too, but why say a word when you can say no words?
Or we just start counting with 1, since 1 is the first number we start with. Do small kids start with a concept of zero and then adding one to it, or do they just start at one?
If you start at 1, how do you answer the question "how many apples are in the basket", when the basket is empty?
Or do you believe that the answer "none" or "zero" is then given without the activity of counting having taken place?
What do we call the meta-activity then: the procedure that results either in the "empty" answer or "one", "two"? Whatever that activity is called, it starts with a concept of zero. Let's call that activity "quanting". Quanting starts with a motivation to enumerate items, and an initially empty result. When no items are present, quanting terminates, reporting that zero/nothing/none result. Otherwise quanting branches into a subprocedure called counting, and that begins at 1.
Humans could do basic counting prior to the concept of zero. Obviously kids or anyone prior to zero would say there are no apples in the basket, but if you were to ask an ancient Greek philosopher if that meant "no apples" is something worthy of being denoted, they might think you're doing sophistry and trying to elevate nothing to something.
A smart ass kid might reply there are zero oranges in the basket, or zero miniature unicorns. Since the basket is empty, it could have potentially had anything if we're just going to imagine things in baskets. But we don't enumerate over all possible zero items in the basket. And anyway, the basket isn't really empty since it has N air molecules, N fibers or whatever.
The pedantic point I'm making is that counting at zero is a convention we developed for mathematical reasoning when appropriate, but not a starting place for counting things in everyday language.
If an ancient Greek philosopher had three apples in his basket, and I took them away while he wasn't looking, oh, he would definitely find "no apples" something worthy of being denoted.
If I ask you to count the number of red balls in a bag with only 3 yellow balls, then the initial count in your head is 0, you inspect the balls one by one, never encountering a red ball, and thus never incrementing the count.
And then you pronounce your final count of 0.
So that's counting starting from 0.
What you call "starting at 1" is not so much the start as it is the first increment, which need not arise.
I have used 1-based in BASIC/Pascal and 0-based in pretty much everything else. With 0-based I make effectively no off-by-one mistakes; with 1-based they were common. Most idioms, including forward and reverse iteration, work better and are easier to remember with the inclusive/exclusive pattern.
Other than that, binary AND and power-of-2 sized arrays are the backbone of any hashtable. Overall, modulus (via binary AND) is actually useful.
It has nothing to do with pointers and everything to do with basic computer arithmetic. You can constrain the range of an integer with a bitwise AND operation providing a cheap modulus by power of two. In this regime, zero-based indexing is the natural result. You have to make an adjustment for 1-based. There are whole host of other operations that are simpler with 0-based indexing.
The problem is that most people, even programmers, don't understand how computer arithmetic works and have fantasies of mathematical number lines that the hardware only partially simulates. You see this consistently in the post-Java crowd who think that unsigned integers are some sort of unholy aberration, because the languages they've grown up with went further to maintain the fictional number line semantics.
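The masking trick can be sketched in a few lines (the table capacity and hash values are arbitrary examples):

```python
CAP = 8  # table capacity, a power of two

def slot(h):
    # Bitwise AND with (CAP - 1) is a cheap modulus for powers of two,
    # and it yields 0-based slots 0..CAP-1 directly.
    return h & (CAP - 1)

assert slot(13) == 13 % CAP
assert all(0 <= slot(h) < CAP for h in range(1000))
# A 1-based table would need an extra "+ 1" after every mask.
```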
Humans make a lot of inconsistent things: we usually speak of the 1st floor, and of the basement as the 1st underground floor (-1), but then the floor numbering jumps from 1 to -1!
Also, the time jumps from 11 AM to 12 PM to 1 PM!
So I think "more human friendly" sometimes means "more confusing".
That's true. But as for floor numbering, specifically, it depends on the country: some places have a "ground floor" between the 1st underground floor and the 1st "above ground" floor.
It's like English pronunciation, there's little logic. I gave up finding logic in these things at an early age, and resorted to memorizing everything instead.
Doing the reversal test, if programming languages had all been 1-indexed, then I doubt we'd hear much from people, in 2022, saying "I think the first element should be 0, and the second 1".
> The 1 makes a lot of sense in a human world, when we count, we start at 1
As a kid I learned "one one-thousand, two one-thousand, three one-thousand" when counting time out loud, but at some point I realized this was incorrect. The prefix is the start of the nth second but it isn't complete yet, so for example stopping in the middle of saying "two one-thousand" you actually haven't reached two seconds yet.
My fix was to move the prefix to the end, so I say "one-thousand one, one-thousand two, one-thousand three".
In retrospect maybe I should have used "zero one-thousand, one one-thousand, two one-thousand".
I don't think it's necessarily incorrect to start at one there. If someone asks you to count out 3 seconds, you're going to say "one one-thousand, two one-thousand, three one-thousand", and only at the end of "three one-thousand" will you consider the 3 seconds to have actually elapsed. Basically you're already accounting for the time it takes you to say it. Which to me seems better than doing it the other way, because it gives a better heads-up as to when that second has been reached.
Ah, yes. Cardinal numbers and ordinal numbers both end in the word "numbers" so they're the same thing. No need to distinguish between having three apples and having the third apple.
And if someone did say that to me, I would pass them no apple, since that's how we use everyday language. And if I knew they were a programmer, I'd first ask them in which programming language they'd wish me to pass the apple.
Yes, but "human-focused" 1-based indexing comes at a cost since at the end of the day the CPU has to add (index * element-size) to the array base address to get the address of the indexed element. With a 1-based index there's additional overhead in either having to subtract 1 from the index or to have a wasted "element 0" to avoid the need to do that.
That's not possible - the subtraction and the multiply are dependent operations (the index must be adjusted before it's multiplied by the element size), so even if it were a single instruction it would still take longer than a multiply that didn't have to wait for a preceding subtraction.
The only way to avoid the speed penalty would be either to have a wasted element at offset 0, or to maintain the array base address as (address - (1 * element-size)) to avoid having to subtract 1 from the index when accessing. In the latter case for dynamically allocated arrays the code would still have to do a subtraction to adjust the pointer returned by the memory allocator, but at least that would be a 1-time penalty rather than per-element-access.
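A minimal C sketch of the wasted-element approach (the helper names are mine, not from any real codebase): allocate one extra slot so that 1-based indices land directly in the buffer with no per-access subtraction.

```c
#include <stdlib.h>

/* Allocate an int array meant for 1-based indexing: one extra element
 * is reserved so that indices 1..n map straight onto the buffer. */
int *alloc_one_based(size_t n) {
    return malloc((n + 1) * sizeof(int)); /* slot 0 is simply never used */
}

/* Sum a[1..n]: the loop body contains no "- 1" adjustment at all. */
int sum_one_based(const int *a, size_t n) {
    int sum = 0;
    for (size_t i = 1; i <= n; i++)
        sum += a[i];
    return sum;
}
```

The cost is one unused element per array; whether that trade is worth it depends on the element size, as noted elsewhere in the thread.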
Of course this is supposing a high-level language where an array is an abstraction, not one explicitly aliased to a chunk of memory such as C, where an array and a pointer to its first element are interchangeable.
That goes against my intuition. Multiplication in hardware to this day relies on addition. Is one adder going to add an extra cycle? Or would that time be amortized? Take a look at slides 45-46 here. https://acg.cis.upenn.edu/milom/cis371-Spring08/lectures/04_...
Do you know the answer to that question? (I don't, but if someone does, it will settle this issue).
I don't know, but looking at that 3-input add makes me think you may be right and the extra addition/subtraction could perhaps be combined into the multiplication.
OTOH, for arrays whose elements have size 2^n (char, short, int, long) I'm sure the generated code isn't using a multiply in the first place.
Anyways, an optimizing compiler could certainly remove much of any overhead added by 1-based indexing: for an array access in a for loop it could, if necessary, calculate the "base - 1" address once at the start of the loop.
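As a sketch of that loop optimization (both functions are hypothetical), here is a naive 1-based loop next to the pointer-walking form a compiler can strength-reduce it to:

```c
/* Naive 1-based summation: pays a "- 1" on every access. */
int sum_naive(const int *a, int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += a[i - 1];
    return sum;
}

/* What an optimizer effectively produces: the indexed accesses become a
 * walking pointer, and the per-iteration subtraction disappears. */
int sum_hoisted(const int *a, int n) {
    int sum = 0;
    for (const int *p = a; p != a + n; p++)
        sum += *p;
    return sum;
}
```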
Personally, having grown up with assembler and C, and still using C++ today, I'm quite happy with 0-based.
There are a lot of cases where a zero index makes sense. In physics and math there are cases where you start the index at 0 and others where you start at 1. And even some where you start at -1. Which you use depends on what is most convenient for the problem at hand.
A particularly topographically challenged building in Paris has exits to two enclosing streets end up on different floors. The elevator is numbered -2, -1, rez-de-rue [exit north], rez-de-chaussée [exit south], 1, 2, ... .
Don't know French, but I wonder if the meaning of "étage" is similar to Polish "piętro", which literally means something like "elevation". So, basically, in Polish we have a specific word for the ground level, and then we count how far above the ground the current level is elevated. That's why the "1st floor" is the one elevated one level above the ground.
In English you count "floors" and floor is a usable, hard surface on which you can put something, like a chair. That's why a floor on the ground level is treated the same as the floor above it - it is equally good on accommodating chairs, beds, and other stuff.
Etymologically, étage comes from the Greek στέγω (and gave the English word “stage”); it is a typically wooden cover. Since the first floor was often instead a continuation of the outside road (way back!), it was not considered a “stage”.
Yeah, most or all of Europe does that. In English it is called ground floor, first floor, etc. In Swedish the zero numbering makes total sense because instead of numbering the floors we say ground floor, "1 stair", "2 stairs", etc.
That depends on your language and culture: "floor" is not easily translated. Some languages have a word describing the layers added to the base layer, so "1. sal" (Danish, as an example) actually means "the first layer added on top of the base house".
Again, this is more a spoken language/culture thing, and this goes back to what premises we use to communicate with the machines, ours or the machines...
No elevator I've seen has yet taken a cue from UI design: simply put the buttons within an outline of the building, along with the local numbering scheme.
No more visits to the serial killer lurking in the basement.
The players in this market evidently have been operating at T'ump levels of intelligence. /s
There were a number of systems-oriented languages that, while Algol-like in syntax, used zero-based array indexing, predating C. BCPL was intended to be a generic systems-oriented language, but Burroughs had ESPOL for their mainframe architecture, and HP wrote their operating systems and compilers in SPL, which was specific to their stack-oriented HP3000 architecture. All of these used zero-based indexing, because for low-level, memory-conscious manipulation it's much more natural than one-based. But nothing is universal: IBM's systems-oriented languages (BPL, PL/S, and PLS/II) were more an evolution of IBM assembler (and later of Wirth's Pascal), and used one-based array indexing.
Personally, I've always found zero-based more intuitive and satisfying, absent arbitrary indexing (in which case I'd choose zero-based for most purposes where there wasn't a compelling solution-based reason to use a different index base). But then I spent years writing SPL ...
On a cursory look, I'd say one-based is more intuitive, for the sole fact that a[len(a)] is the last element. Actually, I remember that when I started programming in assembly it took me a while to get used to reasoning about the end of the array.
Much of intuition is made, not innate, at least for things as abstract and arbitrary as languages and notation. If you come at this question from a point of view where an array is a block of memory words indexed by a pointer - which is pretty much how system programmers thought back in the day when I did such things, 50 years ago - then zero-based seems natural and obvious. The pointer to the array is the array; the first element is the pointer plus zero, the second the pointer plus one, and so on. Once you start thinking that way, the arithmetic for multiple dimensions just falls out as well.
But I don't think the fact that many of us developed that intuition makes it inherently better. It just makes it more intuitive for those of us who developed that intuition. I've written oodles of code with both paradigms. Either works. Zero is just more natural, to me.
Unusually for me, I read TFA first. Immediately thought "this is all wrong", then realized 400 commenters had posted the same thing already.
I have Martin Richards book on my shelf, and for a while shared an office with his coworker. I'm also one of the few folks here who has coded in BCPL.
Ok, so that said, this article is obvious nonsense. Zero indexing comes from the CPU itself. It's how you generate an indirect address by adding the offset. Many CPUs have this in hardware. They don't support non-zero offsets. The reason some languages have zero-based indexes is that their developers had the mind set of doing the same thing you do when writing machine code. They probably also wanted to directly interface between their language and libraries written in assembler (and the kernel was typically written in assembler too). Also they wanted to be efficient. Any offset other than zero is not efficient because you can't use the hardware addressing modes.
So then why do some languages _not_ have zero-based indices? The obvious answer is that mathematics by convention always used 1 as the base (or some arbitrary value). Many languages were conceived by mathematicians or people with a strong mathematical background.
I've discussed this a lot in real world in a different field: apartment floors. In Japan (where I live) they are 1-indexed, where the floor on the ground is number 1, while in Spain (where I am from) they are 0-indexed, where the floor on the ground is number 0.
Both have inconsistencies: in Spain you might have a "middle ground" (entresuelo) which is neither 0 nor 1 but sits in between, and is normally commercial or non-livable, reserving 1 for the first floor where people live. I like that system better, though, because it usually goes 1 => 0 => -1 (underground), while in Japan you go 2 => 1 => -1, so I feel like it's missing a floor. You could justify it as 0 being the ground line, so +1 is "the first above the ground" and -1 is "the first below the ground", but I still prefer each floor to be a natural consecutive number.
Arrays in Ada start at the index based on the index type of the array. You can even use an enumeration type as the index:
type Day is (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday);
type Hours is array (Day range <>) of Natural;
V : Hours (Monday .. Friday);
Which index type you should use depends on the problem that you're trying to model. You can find out the lower/higher end of a type/variable with the 'First and 'Last attributes.
IIRC there's an RFC though to force the lower end of the index to a certain value like 1 or any other number/enum.
Wow, I never really looked at Ada code. Been using VHDL for the past 2 years a lot and when I looked at your code I was like: 'huh strange, this looks a lot like VHDL'.
Turns out both were invented by the DoD.
>Due to the Department of Defense requiring as much of the syntax as possible to be based on Ada, in order to avoid re-inventing concepts that had already been thoroughly tested in the development of Ada,[citation needed] VHDL borrows heavily from the Ada programming language in both concept and syntax. - https://en.wikipedia.org/wiki/VHDL
Maybe I should pick up Ada soon. That could be a fun journey! (I really love writing VHDL)
This comes up when giving fractional coordinates into a grid. Is (1,1) the upper-left corner of the upper-left grid box, the center, or the lower-right (or the center of the next grid box)? Alternately, is the upper-left corner of the upper-left grid box (-0.5, -0.5), (0.0, 0.0), (0.5, 0.5), or (1.0, 1.0). I've seen all four conventions used in different places, depending on whether the grid boxes themselves are numbered starting at 0 or 1, and when extended to fractional if it makes more sense to have the corner or the center of the grid box take the value of the box itself.
Worse than 0 or 1 is having the choice. In Visual Basic 6 you had the option of base 0 or 1 for each module. It makes debugging a nightmare, and you can create bugs just by copy/pasting code from one program to another with a different index base.
You also have the choice in Perl, because of course, it is Perl.
The old way of doing it was setting the $[ variable to 1. You can actually set it to any value, in case you prefer to start your arrays at 42. It is now deprecated but you now have Array::Base that offers similar functionality.
Use it if you absolutely hate the person who will read your code, as if having someone read Perl code wasn't hateful enough.
That comes from C’s standard library (and presumably from somewhere else before that). Classic bad design, took a very long time for people to figure out it was unhelpful and dangerous.
Convenience. Look at how the days in the months are stored and accessed. Using 1-based months would introduce an extra calculation (-1) on all searches or an unused value in the 0-index. Also look at how printing is handled for weekday and month names. They, again, take advantage of 0-based indexing.
Day and year are already represented as numbers, so it's natural to keep them as the "correct" (conventional) number as used by most people. Since the months aren't being stored as strings but as an index, this saves them from having useless data (entries in 0) or doing an extra calculation.
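For a concrete illustration, C's struct tm (from <time.h>) mixes exactly these conventions: tm_mon is 0-based so it can index a name table directly, while tm_mday stays 1-based and tm_year is an offset from 1900. A small sketch (format_date and month_name are my own names, not library API):

```c
#include <stdio.h>
#include <time.h>

static const char *month_name[12] = {
    "January", "February", "March",     "April",   "May",      "June",
    "July",    "August",   "September", "October", "November", "December"
};

/* Format a struct tm as e.g. "August 25, 2022". */
void format_date(const struct tm *t, char *out, size_t n) {
    snprintf(out, n, "%s %d, %d",
             month_name[t->tm_mon], /* 0-based month: direct table index */
             t->tm_mday,            /* 1-based day: printed as-is */
             t->tm_year + 1900);    /* year: stored as offset from 1900 */
}
```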
Subtracting 1 from the user provided index isn't that expensive.
Or, if you're willing to give up three bytes, index into "ErrSunMonTueWedThuFriSat".
Or, and this is mildly insane but perhaps in keeping with early C, have your pointer be three bytes before the beginning of "SunMonTueWedThuFriSat" and use one-based indexing.
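The packed-string trick can be sketched like this (day_abbrev is a hypothetical helper; the layout follows the comment above, spending three bytes on an "Err" slot so that days index 1..7 directly):

```c
#include <string.h>

/* Day abbreviations packed into one string; slot 0 is the sentinel. */
static const char daytab[] = "ErrSunMonTueWedThuFriSat";

/* Copy the 3-letter name for day 1..7 into out; 0 (or anything out of
 * range) yields "Err" instead of garbage. */
void day_abbrev(int day, char out[4]) {
    if (day < 0 || day > 7)
        day = 0;
    memcpy(out, daytab + day * 3, 3); /* each abbreviation is 3 bytes */
    out[3] = '\0';
}
```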
It all boils down to the fact that C's memory model is meant to map closely to the physical memory model, and the C and Unix developers preferred simplicity of implementation over simplicity of interface. Leaky abstractions (like that the weekday or months start at 0 so that they can reference into arrays of names) are fine given that prioritization. Not everyone shares that preference.
Dijkstra's answer (linked in the article) is best:
"When dealing with a sequence of length N, the elements of which we wish to distinguish by subscript, the next vexing question is what subscript value to assign to its starting element. Adhering to convention a) yields, when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N. So let us let our ordinals start at zero: an element's ordinal (subscript) equals the number of elements preceding it in the sequence. And the moral of the story is that we had better regard —after all those centuries!— zero as a most natural number."
Closed ranges (with both ends inclusive) are super annoying to work with. You can't represent the empty range (unless you are willing to write [i : i-1]), and they don't compose like half-open ones do: [a : b) + [b : c) = [a : c).
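The composition property is easy to check in code; a tiny sketch with hypothetical helpers:

```c
/* Length of the half-open range [lo, hi). */
int range_len(int lo, int hi) { return hi - lo; }

/* Membership test for [lo, hi). */
int range_contains(int lo, int hi, int x) { return lo <= x && x < hi; }
```

Every point of [a, c) falls in exactly one of [a, b) and [b, c), and the empty range [i, i) exists without any i-1 trickery.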
For some reason I think in terms of closed ranges if not specified otherwise/doing it for myself. [i : i-1] is how I think of empty ranges, and [i : i] if I want to capture element i with a range. Also, [a : b] + [b+1 : c] corresponds more to what I want than [a : b) + [b : c).
I guess the majority of people are not like this, but arguments like "just look at how strange it looks" don't do it for me, because it looks natural to me, and the other way looks complicated.
I'm curious if you would also accept [i : i-2] as an empty list, or in general anything where the right side is smaller than the left?
If I am working with closed ranges, [i:i-1] looks like the list [i, i-1]. Like [5:2] would be [5, 4, 3, 2].
With [b+1:c] I would feel like I needed to insert a check to ensure b+1 <= c. With the closed ranges the invariant "left is <= right" is maintained automatically. Though I guess it doesn't matter so much if you accept any list with left > right as the empty list.
The issues with compositionality become even more noticeable with floats. Then you would need [a:b] + [b + minimum_float : c], or something like that.
Then the delta of both bounds (N – 1) wouldn’t equal the length of the range (N). The inclusive-exclusive convention is used in order for `end = start + length` to hold.
Why does that matter when the starting point is 1? In that case, you don't need the delta because you have the length already. In the case of a 0-based range you also don't need the delta, though you will want to use the inclusive-exclusive convention so that you get the length "for free".
1 <= a <= N -- N = length, no delta
0 <= a < N -- N = length, no delta
You only calculate the length when dealing with other than 0- or 1-based ranges. There, the inclusive-exclusive convention is very handy as you point out. So if we're fixing the initial offset at either 0 or 1, then use the appropriate convention for that offset. If we let the initial offset float, then the inclusive-exclusive makes sense.
It matters whenever you need to process a proper subrange of an array, and that shouldn’t be different from when the subrange happens to be the whole range. It’s simpler if the same convention is used in all cases.
E.g. in Java, a typical example is that OutputStream has the following two methods, where the first can delegate to the second, and the second (which writes the specified subrange of the array) can easily calculate the number of bytes to write:
void write(byte[] bytes)
{
    write(bytes, 0, bytes.length);
}

void write(byte[] bytes, int start, int end)
{
    int length = end - start;
    ...
}
> If we let the initial offset float, then the inclusive-exclusive makes sense.
But you replied to someone using inclusive-inclusive for 1-based arrays and complained about the delta not matching the length, which is a nonsensical complaint: you already have the length, so why would you need to calculate anything?
In case of the subrange you don’t have the length (or you conversely have the length but not the upper bound), and you don’t want to use different conventions based on whether you process the subrange or the whole range. I’m not sure how I can make it clearer.
The mental model of being an offset to a memory address is, I think, part of it.
I'd be surprised if the use of zero vs one actually made any difference, from a number of operations point of view -- from the compiler's point of view, the first element in the array is the first element in the array, no matter what we call it.
I mean, in an extreme edge case, if you are computing an index and it happens to be zero, then maybe some identity related to zero could be exploited in your math (go to element simple_integer * complicated_function(), where simple_integer might be zero), but that seems a bit silly.
How would it not make a difference? If you calculate an index at runtime, then to access the element, 0-based is pointer + index, while 1-based is pointer + index - 1. Clearly there is an extra subtraction there?
In x86 you could probably hide it in the addressing mode, but that does not mean it does not need to be computed.
One option, at least, would be for malloc and friends to return pointer-1, rather than pointer. That said, I'm sure a compilers expert could come up with a much better option.
This is primarily the actual answer. While most languages force you to see the array as an array, in C/C++ (in C++ I'm only referring to the memory allocated variable type, not an "array class") you can see it as a pointer and do your own math to address any part of it that you want. And the calculation for that memory address is idx * sizeof(the thing in your array). So the real answer here is that array semantics needed to match memory address semantics in the languages that provide that.
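A small sketch of that identity (byte_offset is a hypothetical helper): in C, a[i] is defined as *(a + i), and the element at index i sits exactly i * sizeof(element) bytes past the base.

```c
#include <stddef.h>

/* Byte distance between an element and the array base. */
ptrdiff_t byte_offset(const int *base, const int *elem) {
    return (const char *)elem - (const char *)base;
}
```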
3. Less buggy for small systems since the Base Pointer address points directly to a record.
4. Macro expansion is often used to store indexes and other 'magic numbers' in Assembler, and this pattern originated either when Assembly was the primary programming language, or even earlier when humans directly punched out cards with machine instructions. Compilers in any remote sense of the luxury we have today did not exist or were not common.
I'm fine with 0-based, 1-based or anything-based arrays (I recall it being convenient solving 8queen with pascal), but for Rage-Over-A-Lost-Penny sake, music note intervals are always beyond my understanding.
Same-pitched notes are called "interval 1", and then come thirds, fifths, sevenths... All off-by-one in my base-offset-addressing mind... And then major vs. minor, which creates all kinds of "aliased addresses"...
I'd really love a BASE-12 floating number representation. Like 4.00 for the middle C; Chords can be then represented by a tuple of such numbers -- major = [+0.04, +0.07] (some sequencers already do something like that and I'm far better at reading that kind of sequencer data than a sheet)
You do understand that besides thirds, fifths and sevenths, there really are seconds, fourths, sixths, ninths (same as second), elevenths (same as fourth) etc... as well, right? There even are intervals that are not named after a number e.g. the "tritone". The reason the 2nd and the 3rd note in a chord are called third and fifth is because usually chords are made with these intervals instead of dissonant intervals like seconds or fourths. It seems pretty clear you'd already know these things, so can you explain what's your issue with music note intervals?
It's that off-by-one nature of intervals that always bumps me.
The difference between note a and b is (a-b+1).
Calling an octave "an octave" feels to me like calling a numeric system with digits 0x0-0xf as "base 17"
By your logic, an octave/unison would be "seventh"/"zeroth"? (Note that "octave" literally means "eighth" in Latin.) I would think that could work too but the established terminology is not inconsistent. It's just 1-indexed. A lot of things are 1-indexed, in fact I think almost everything in spoken English is 1-indexed. A lot of mathematics, like number theory, is also 1-indexed. I think software engineers are a little too obsessed with 0-indexed things. I understand that things not being standard is annoying to us but pretending like this somehow makes music terminology broken is going too far.
I don't see how 0x0-0xf could be called base 17. 0xf is 15. Did you mean base 15? I think if mathematics terminology developed differently it could be called base 15. The same way binary is 0 and 1 but we call it base 2 because there are 2 digits, but we could totally call it base 1 too, who cares, it's all convention.
I found it interesting that the author phrased their critique so definitively, i.e. "it's NOT this and NOT that", yet concluded the article by saying their whole argument was speculative. That said, I do agree with their premise that it's going to be pretty tough to get a definitive answer as to why 0-indexing won out. And their criticism of others being so ready to declare results is apt, if a little ironic in this case :)
> yet concluded the article by saying their whole argument was speculative
It's possible to have a speculative argument and also refute other arguments that claim certainty. Pointing out another answer as being wrong does not mean one needs to know the correct answer or claim to have a precise answer.
When he talked about BASIC having 1-based arrays, I had to check my memory against online AppleSoft BASIC documentation and indeed, AppleSoft, at least did have 0-based arrays.¹ I also had to check on Pascal which I haven’t written anything significant in for some 20 years and it turns out that by default, Pascal arrays are 1-based, but most of the code that I worked with (which tended to have its roots in Knuth-written Pascal) explicitly specified the range of indices.
⸻
1. As did all the other contemporaneous BASICs that I encountered. For added fun, when creating an array with DIM, you gave the highest index and not the number of elements so, e.g., DIM A(20) created a 21-element array. The 1980 Apple ][ manual I found which discusses both Integer BASIC and AppleSoft doesn’t admit that 0 is a valid index for an array, but elsewhere I saw it indicated as such which leads me to suspect that Integer BASIC has 1-based arrays.
Modern Basic (VBA etc.) still supports 1 based arrays, and there's even Option Base (0|1). This is a global directive for all arrays declared in the same file. And individual arrays can be declared with any starting index you like.
I may think that counting 0 as a natural number makes a lot of math more elegant, but clearly I’m just too dumb to rise to the level of wrong.
fruits = ['apple']  # Hello I have fruits
x = len(fruits)     # Only one kind tho lol
fruits[x]           # OK here is what I have
SUBSCRIPT OUT OF RANGE
I have always hated zero indexing for this reason (other than thinking in assembler where it is beautiful). It is of course useful in many contexts, albeit with tradeoffs; more importantly everyone is used to it and it's predictable. But I was absolutely thrilled to discover Julia defaults to indexing from 1 like a normal person.
Essentially what we have here is a mismatch between the tool (computer that does everything in binary) and the task (mathematical abstraction of quantities). Now it's completely understandable that people work within the limitations of the tool when there is no other choice, but that's the same sort of path dependency that creates technical debt.
If zero indexing were so great, mathematicians would have made it the default centuries if not millennia ago; but mathematicians don't want to do off-by-1 adjustments on all sorts of common operations because it makes things unnecessarily complicated. To be honest, I think this has become a moat to keep people out of programming even though it's a shallow one.
It could be interesting to test this, say by taking two classes of schoolchildren and teaching one Julia and the other Python (or...). By not having to take on the idea of zero-indexing at the same time as the concept of an array, the Julia group can get into collections using their intuitive understanding of the natural numbers. I expect that picking up language elements quickly will have a compounding effect and that at the end of the evaluation period the Julia group will be able to make significantly more complex programs than a control group.
> If zero indexing were so great, mathematicians would have made it the default centuries if not millennia ago
They did. For any context where the index contributes to the math[1] (i.e. is not simply a labeling convention like "the x (first) component"), the index always starts at zero: polynomials, Fourier series, geometric series, Bessel functions (both their order and their series expansions), etc.
[1]: Except, of course, when a divergent 1/0 term shows up.
Also, consider that Python uses -1 for indexing the last element, instead of -0, which means negative indices count 1-based from the end rather than acting as offsets.
Funny, just a few hours ago I asked myself the related question "Should indices start with 0 or 1?" (not for the first time). I pretty much switch my opinion as many times as I think about it.
The first answer nicely retells the Dijkstra argument: integer ranges should be described using half-open intervals, and [n, m) is nicer than (n, m]. Furthermore, [0, n) is nicer than [1, n+1), and that's that.
The second answer makes the observation that there is a difference between indexing and counting, and that even in daily life often the first element is indexed by 0.
Still, I rather like indexing the first element of a sequence with 1.
> Was C ever much more than a veneer on top of assembler?
Maybe in the very early days, but that necessarily had to end as soon as compilers started doing non-trivial optimizations. Today, C is very far from being "portable assembly".
The crucial fact is that if your language supports 0-indexing, you can do that if you like. OR, you can pretend it has 1-indexing, and do that instead.
If your language doesn't support zero-indexing, then you are stuck with 1-indexing, and that's that.
So, zero-indexing is the natural thing to put in a language, as it accommodates everybody.
"It's an offset from the beginning. The first element is 0 slots from the beginning." is what I was taught.
(Of course, it's more complicated than that, with pointer math etc etc, but that works as a general gist)
In the past, I've written roughly half a million lines of Lua. Arrays don't always start at 0. I've regularly shaken my fist at Lua and cursed its wicked ways, but it's really damn useful in a lot of contexts.
Because zero is a concept that can mean "at the beginning"
The symbol changed over time as positional notation (for which zero was crucial), made its way to the Babylonian empire and from there to India, via the Greeks (in whose own culture zero made a late and only occasional appearance; the Romans had no trace of it at all)
Another meaning is "nothing" but we're talking about position, not nothingness
The mathematical zero and the philosophical notion of nothingness are related but are not the same. Nothingness plays a central role very early on in Indian thought (there called sunya), and we find speculation in virtually all cosmogonical myths about what must have preceded the world's creation. So in the Bible's book of Genesis (1:2): "And the earth was without form, and void."
I find 0-based indexing easier to work with when working with (dynamically sized) multi-dimensional arrays.
For example, with 0-based indexing getting an element at (x, y, z) can be done with
a[z * h * w + y * w + x]
But with 1-based indexing you need to subtract one first from each component except the last:
a[(z - 1) * h * w + (y - 1) * w + x]
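Both formulas as a sketch (idx0 and idx1 are my names; w and h are the x- and y-extents). Under the 1-based scheme each index comes out one higher, since a 1-based index is just the 0-based offset plus one:

```c
/* Flattened index for element (x, y, z): 0-based components and array. */
int idx0(int x, int y, int z, int w, int h) {
    return z * h * w + y * w + x;
}

/* Flattened index for 1-based components and a 1-based array: every
 * component except the last needs an explicit "- 1". */
int idx1(int x, int y, int z, int w, int h) {
    return (z - 1) * h * w + (y - 1) * w + x;
}
```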
The 1900's were the twentieth century. Except the year 1900 specifically was still the nineteenth century; the twentieth century started in 1901 and ended at the end of 2000, much to the chagrin of all the chumps who celebrated the millennium a year early.
You know what would make all this confusion go away? Zero-based indexing.
In English, the day of your birth, when you're 0, is also your birthday. So, technically speaking, when you're turning 29, you're having your 30th birthday, even though no-one says it like that. I always assumed 0-based indexing was something like that, similar to the example given about the ground floor being the 0th floor in some countries and the first floor in others.
Another example is if I'm getting directions from someone, and they say, "Walk 3 blocks that way", in my mind I conceptualize that I'm presently on the 0th block and I don't yet count it.
I don't think every day people disagree with these examples so much (well, maybe the birthday one, but that's because we use the same word for the literal day as well as anniversary), and they're all fairly intuitive. I never realized that 0-based indexing required such a deep background to be justified.
> In English, the day of your birth, when you're 0, is also your birthday.
That depends on whether you think of "birthday" as synonymous with "the actual day of my birth" or "a celebration of the day of my birth". I think most of us actually consider it the latter, which is why the first one is after you've been born for a year. If you look up the definition of "birthday" (one word, no spaces) you'll see that there is often some acknowledgement of this in the differing definitions offered.
> "Walk 3 blocks that way", in my mind I conceptualize that I'm presently on the 0th block and I don't yet count it.
You have to decide whether the person is saying something is on the third block from your position, or you need to walk about three blocks of distance. You probably do this automatically based on how close you are to the edge of the current block. If I'm 30 feet from the edge of a block, I'm probably not going to include the current block in that distance. I imagine you probably won't as well.
> they're all fairly intuitive. I never realized that 0-based indexing required such a deep background to be justified.
I think they're only intuitive because we're all context sensitive. The issue is when people don't have the same context, or the context no longer strictly makes sense when the terminology is used in contexts that make less sense.
As an offset, and in programming languages where it's easy to see it's an offset (C), it makes perfect sense. In languages where that's all hidden from you, and you're really just referencing the nth item in a list, the 0th item doesn't make a lot of sense. People in these different contexts will likely have different ideas of what "intuitive" means with regards to this.
In Mathematica/Wolfram Language, part 0 of an expression is reserved for the head of the expression: f[x, y, z][[0]] == f. Or g[f][x, y][[0, 1]] == f.
For me, it was 1-based indexing that made sense.
I started programming in FORTRAN, then COBOL, then (Lord help me) RPG II. All had 1 based arrays. I got paid to program in RPG II eight hours a day, five days a week, for two and a half years.
A common bit of code was to create an array of month names. Then take an input date, say 20220825, and split that into $Year (first four characters) and $MonthNumber (next two) and $Day (next two).
$MonthIndex = strip the leading zero (if present) from $MonthNumber
Then, this code worked: $MonthName = @Month[$MonthIndex]
When I ran into my first language with 0 based arrays, and @Month[1] returned February, I was pretty much convinced you all had gone insane. December is an out of bounds condition now? Really?
Much later, a friend explained the stuff about multiplying the index by the array element length, and easily getting the memory location of the item in that array. It does make sense; but, I've always hated the code I've had to write that says $MonthName = @Month[$MonthNumber - 1]
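For illustration, here's how the two workarounds look in a 0-based language (a Python sketch; the names are mine, not the original RPG code):

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

def month_name(month_number):
    """month_number is the human 1-based value parsed from e.g. '20220825'."""
    return MONTHS[month_number - 1]   # the conversion the comment dislikes

# Alternative: sacrifice slot 0 so human month numbers index directly.
MONTHS_PADDED = [None] + MONTHS

def month_name_padded(month_number):
    return MONTHS_PADDED[month_number]

assert month_name(8) == month_name_padded(8) == "August"
```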
Simple answer: because everything in computers starts at 0. Address 0 is the first byte of memory, etc.
You can now ask, "why does everything start with 0"? I guess because it gives you nice round (base-2) numbers. With 4 bits you can have 16 different "numbers", either 0 to 15 or 1 to 16. However, representing the number "16" in binary requires 5 bits, so you need an additional bit for the same number of elements. So representing 4 bits as 0 to 15 makes more sense
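The bit arithmetic above is easy to check, e.g. in Python:

```python
# With n bits you can represent 2**n distinct values.
# Starting at 0, the largest value still fits in those n bits;
# starting at 1, the largest value (2**n) needs one extra bit.
n = 4
values_zero_based = list(range(0, 2 ** n))      # 0..15
values_one_based = list(range(1, 2 ** n + 1))   # 1..16

assert len(values_zero_based) == len(values_one_based) == 16
assert max(values_zero_based).bit_length() == 4   # 15 -> 0b1111
assert max(values_one_based).bit_length() == 5    # 16 -> 0b10000
```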
The correct starting ordinal is 0. A list with a 0th element has cardinality 1. This is a concept that exists in many places in English, where the naturalness of starting with the zeroth element trumps inertia; but English is wildly inconsistent, so it has both.
Constructing the naturals without including 0 and starting with it is incredibly awkward. Addition doesn't even have an identity.
Look at a clock: is 1 at the top, or is it the next one after the top?
Look at a ruler. Does it start at 1?
Do you want to have to shuffle everything around when you go from whole numbers to halves? Do you start at 0.5 or 0? Depends if it is "halves" or the real number 0.5
This is a recipe for pain and off by ones any time you're dealing with time steps (which is why it's so stupid that matlab and fortran are 1 indexed by default, if anything there's a better argument for C to be 1 indexed than scientific languages).
Now index and compose a bunch of ranges. With 1-indexing, a half-open range of two elements is the incredibly awkward 1 <= i < 3.
So closed ranges go with 1-indexing. Composing and splitting them is incredibly awkward: [a,c] is split into [a,b], [b+1,c] ... that's kind of okay, but not unique, as you could use [a,b-1], [b,c]. Now: do [d,e] and [e,f] compose at a glance?
Half open ranges go with zero indexing and they just work.
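A quick sketch of how half-open ranges split and compose without any ±1 bookkeeping (Python's `range` is half-open):

```python
# Half-open ranges split and compose cleanly:
# [a, c) == [a, b) + [b, c) for any a <= b <= c, and the split is unique.
def split(a, c, b):
    return (range(a, b), range(b, c))

left, right = split(0, 10, 4)
assert list(left) + list(right) == list(range(0, 10))

# Length is just b - a, and an empty range is simply [a, a).
assert len(range(3, 3)) == 0
assert len(range(0, 10)) == 10
```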
I've spent so long working with both 0-indexed and 1-indexed (I work with Lua a LOT at work) that it makes almost no difference to me, and I can switch between the two with usually no hiccups.
Either one is something you get used to with practice. I do prefer 0-based, but 1-based isn't as big of a deal as many people treat it in the vast majority of situations, and almost every situation where you'd be using something like Lua in the first place.
0 is a number, and just because the west ignored it for quite some time (read: "Zero: The Biography of a Dangerous Idea") we have these odd built-in aversions to it but in the end it makes plenty of sense.
0th element is 0, so why don't we just do a better job of educating kids to start counting with 0 and thinking about 0.
in the end - as a CS person, I think it's good - but I also don't mind if you disagree with me. :)
Maybe, but then they'd just become discussions about whether `y` in the `x[y]` syntax "should" be an offset or an index. Which one gets the language priority?
...Of course, the answer is offset, because offsets are beautiful, composable, elegant things that harmonize with bitmasks, modulo, array slices, and everything else, while (one-based) indexes are nothing but "offset plus one", no more useful than "offset plus two".
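A tiny example of that composability: a ring-buffer read with 0-based offsets needs only a modulo, or a bitmask when the size is a power of two (the names here are mine):

```python
n = 8
buf = list(range(n))   # stand-in for a circular buffer of 8 slots
head = 5

# With 0-based offsets, wrap-around is plain modulo; no ±1 corrections.
window = [buf[(head + i) % n] for i in range(4)]
assert window == [5, 6, 7, 0]

# When n is a power of two, the modulo collapses to a bitmask.
assert (head + 3) % n == (head + 3) & (n - 1)
```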
0-based indexing is clearly the best option because it results in more elegant code and it matches what computers actually do.
In fact, it's so fundamentally more elegant that I've come to the conclusion that the real issue is that it's actually the mathematicians and language itself that got indexing wrong. Instead of "first" being associated with 1, it should be associated with 0. We should give athletes 0th place, and talk about the 0th man on the moon etc.
That's where the cognitive dissonance for the 1-based-indexing people is coming from. They can't deal with the fact that their "normal" way of indexing is wrong.
It's a bit like how pi should really be tau (2*pi), or the electron should really be positive. We got it wrong, but we're stuck with it because it's too much of a hassle to change it. Fortunately computing got it right! But now there's a mismatch with everyday life where people are used to the wrong thing.
- makes index == element-wise offset from the first element
- there are algorithms which are (very slightly) easier to implement with 0-indexing and others with 1-indexing; AFAIK more of the "bread and butter" algorithms are easier with 0 than 1 (e.g. indexing of C-style arrays used all the time, heaps, etc.)
- 0 is not special => less bounds checks in typed languages (if you have an unsigned int index you only have to check the upper bounds, with 1-index you also have to check for 0)
- 0 is not special => with signed integer you can have negative indices (from the end) in which case all signed integers are valid (but potential out of bounds), with 1 indexing you have a "gap" in the middle which for some (rare) use cases can be very painful to handle
So, independent of what the history was, it's just much more pragmatic to use 0-indexing in many common use-cases. Especially since 1-indexing produces more special cases for very common operations, it's sub-optimal for both usability and performance.
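Python's negative indexing illustrates the "no gap" point nicely (the `get_1based` helper below is a made-up illustration, not a real API):

```python
a = ["x", "y", "z"]

# 0-based with signed indices: -len(a) .. len(a)-1 are all valid,
# with no unused value in the middle.
assert a[0] == "x" and a[-1] == "z" and a[-3] == "x"

# A 1-based "from the end" scheme leaves a gap at 0: index 1 is the first
# element and -1 the last, but 0 maps to nothing, so a wrapper needs an
# extra check for it.
def get_1based(seq, i):
    if i == 0:
        raise IndexError("index 0 is invalid in a 1-based scheme")
    return seq[i - 1] if i > 0 else seq[i]

assert get_1based(a, 1) == "x"
assert get_1based(a, -1) == "z"
```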
Reminds me of a simple implementation of a Prisoners and Boxes riddle that showed up in my YouTube feed where I had to add a comment so others (and eventually future older me) will understand what I did.
>// if one prisoner fails all are dead (1st try -> tries = 0)
That being said, it seems to be a junior software engineer issue where the brain isn't wired yet for "0-based" indexing. I don't hear such complaints from more experienced developers who have simply learned to appreciate it because like in my example above it is more intuitive to me.
I have added "and eventually future older me" above because I don't do much coding these days, and my career trajectory looks like it's going to be even less in the future.
I can understand the rationale for 0 and 1. I can even understand the rationale for an arbitrary range (from -5 to +10) in the current era, where languages have iterated for many generations.
However, I did not understand why they would have arbitrary ranges in the early days of programming. It would already have been hard to make a language and its compilers; why add arbitrary ranges on top of that?
Early programmers were mostly mathematicians and physicists, so the notation borrowed a lot from their fields, where it's common to index sequences with arbitrary numbers. And it's not particularly complicated to implement either.
What if there was no t0 for the universe, as Hawking and a few other physicists have argued? There was just the first Planck second and no meaningful before.
Well then clearly you'd have to index at -1, for the time state before the point at which there is no meaningful before. I'm not sure why this answer didn't occur to you.
When you index starting at 1, you either have to add/subtract 1 internally, which sometimes but not always can be optimized away by a smart JIT/compiler, or you have to "waste" the first element in memory, trading memory for performance.
Lua always subtracts 1, while LuaJIT instead went with the latter approach.
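A rough Python sketch of the two strategies (an illustration of the trade-off described, not Lua's actual internals; all names are mine):

```python
import struct

# Strategy 1: subtract 1 on every access.
def read_subtracting(buf, index, size=4):
    offset = (index - 1) * size            # extra arithmetic per access
    return struct.unpack_from("<i", buf, offset)[0]

# Strategy 2: waste slot 0 so the 1-based index maps directly:
# one element of memory traded for zero per-access cost.
def read_padded(buf, index, size=4):
    return struct.unpack_from("<i", buf, index * size)[0]

data = struct.pack("<3i", 10, 20, 30)        # elements 1..3, packed tight
padded = struct.pack("<4i", 0, 10, 20, 30)   # dummy slot 0 up front

assert read_subtracting(data, 1) == read_padded(padded, 1) == 10
```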
For me, far more significant than anything else here is having to convert array indices into human readable text describing objects. And the objects are named by mechanical engineers who don't use 0 based indexing. So my code is full of
print("Finger {}'s motor has overheated".format(i+1))
I think that zero indexed is generally better, including that it involves adding the index number to the base address, and other properties that can be helpful.
However, it can also be useful for many purposes to allow arbitrary ranges, so a programming language probably should allow that; if the index of the first element is not zero, then the base address will not be the address of the first element (if the first element index is positive, then the base address may actually point into a different array, or a different variable). Then, if you have the base address and add the index, you will have the proper address of that element, whether or not the first index number is zero.
Zero is not always the most useful starting index; sometimes other numbers (which may be positive or negative) are useful. But, I think in general, zero is better.
> By a general rule, in B the expression
> *(V+i)
> adds V and i, and refers to the i-th location after V. Both BCPL and B each add special notation to sweeten such array accesses; in B an equivalent expression is V[i]
C was designed to be as close to the machine as possible. Prior to C the only programming language for systems programming was assembly. If the compiler-generated code subtracted 1 from the index every time you indexed an array that would have been seen as not as efficient as assembly.
Maybe my thinking is just warped by the language, but I think 0 indexing is the right thing most-- but not all-- of the time. Looking at my code I have relatively few places where I need to -1 to convert a natural 1 index into a zero index.
It would be easy for languages to offer two array access syntaxes, one for 0 and one for 1 like array[i] for 0 index and array{i} for 1 index access, then the accessing code could use the right tool for the job. Unlike defining the indexing type as a property of an array you shouldn't have problems with accidentally using the wrong one at time of use.
... but to really know if it's a good idea you'd have to test it, and sadly very few concepts in programming are subjected to any kind of rigorous scientific study.
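For what it's worth, the idea can be prototyped in languages with operator overloading; a hypothetical Python sketch (the `OneBased` wrapper is made up for illustration, not an existing library):

```python
class OneBased:
    """Hypothetical 1-based view over a list; the underlying storage
    stays 0-based, so both views coexist over the same data."""
    def __init__(self, items):
        self.items = items
    def __getitem__(self, i):
        if i < 1:
            raise IndexError("1-based view: index must be >= 1")
        return self.items[i - 1]

raw = ["a", "b", "c"]    # raw[0] is "a"  (0-based access)
one = OneBased(raw)      # one[1] is "a"  (1-based access)

assert raw[0] == one[1] == "a"
assert raw[2] == one[3] == "c"
```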
Array indices denote offsets from the base address. The first element is at the base address of the array, hence the offset is zero. It avoids a conditional when resolving the address, introducing it would slow down all forms of memory access across the language domain.
After programming in C for a while, I really began to have the feeling that arrays were physical things that took up space in RAM. When I think like this, starting an index at 0 is very intuitive, as 0 implies the origin point of the physical construct.
If you think of arrays just as a kind of abstract list, then beginning at 1 makes much more sense. No one who looks at a to-do list at home talks about the 0th thing on their list to do. But the spacial nature of arrays makes starting at 1 confusing. After I take 1 step, I am no longer at my starting point. And cycling through arrays in code for me has a very similar feeling to taking actual physical steps through a data structure.
I'd claim intuitively that having all bits 0 is a proper starting point and more efficient if there were any constraints on address, which was likely the case historically.
I'm happy to hear any dissenting opinions if this is inaccurate.
What? Is this some kind of trolling challenge? If you don't care about math, just say so. Most people don't. But making up a nonexistent convention is just silly.
A related question: why is a rectangle containing a single pixel defined as (x,y):(x+1,y+1) instead of (x,y):(x,y)?
Here's a discussion from some years ago that starts with some of Guido's thoughts on array indexing and moves on to the QuickDraw coordinate plane and points and rectangles, including how grid lines and points are infinitely thin/small, but pixels occupy the space between the grid lines.
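Treating the edges as half-open intervals makes the arithmetic come out cleanly; a small sketch (the function name is mine):

```python
# With half-open edges, a rectangle is [x0, x1) x [y0, y1),
# so a single pixel at (x, y) is the rectangle (x, y)-(x+1, y+1).
def area(x0, y0, x1, y1):
    return (x1 - x0) * (y1 - y0)

assert area(5, 5, 6, 6) == 1      # one pixel
assert area(5, 5, 5, 5) == 0      # (x,y)-(x,y) would be empty
# Adjacent rectangles tile without overlap or gaps:
assert area(0, 0, 4, 4) == area(0, 0, 2, 4) + area(2, 0, 4, 4)
```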
The general term of arithmetic and geometric sequences seems simpler when indexing from 0 rather than from 1. I do not think that '1' is more human-focused for anything than '0'.
It's an absolute pain in my arse. I honestly don't know what problems I would have with an array system that started at 1, but since the length method sometimes starts at 0 and sometimes 1 (I'm thinking Perl vs JS), I'm always forgetting whether to put = or <= into for loops, and when to add one for the read-out or subtract 1 when looking at the array again. I mean, in what universe should [2]+[3]=7?
The first element having the index zero may be natural or unnatural depending on the context. In mathematics, we count the rows in a matrix starting from 1, but we count the monomials in a polynomial starting (or ending) with 0. Generally, there is less need to use zero as the starting index: the Common Era does not have "the year zero" (there, -1 is followed by +1), etc.
Easy: I have an array with 256 elements. And I want to store the indexes of various elements of the array off in another data structure. So many of them, in fact, that it’s important that the index values each fit into a byte. So, I want the index values to be 0..255, and not 1..256, since 256 doesn’t fit into a byte.
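A quick check of that constraint (Python sketch; `struct.pack("B", …)` enforces the unsigned-byte range):

```python
import struct

# 256 elements: 0-based indexes 0..255 all fit in one unsigned byte,
# but the 1-based top index 256 does not.
assert struct.pack("B", 0) == b"\x00"
assert struct.pack("B", 255) == b"\xff"

try:
    struct.pack("B", 256)
    assert False, "256 should not fit in a byte"
except struct.error:
    pass
```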
You can still fit that information into one byte and convert it at compile time or runtime :)
In a minecraft clone I made, each chunk consists of 16 subchunks so that I only need 12 bits to represent the block position in the shader. Technically, the block position ranges from INT32_MIN -> INT32_MAX, but since each subchunk only consists of 16x16x16 blocks I can get away with storing the position in 12 bits and then passing in the subchunk's world position once as a uniform.
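A sketch of the packing described, assuming 4 bits per axis laid out x|y|z (the exact bit layout is my guess, not necessarily the poster's):

```python
# A block position inside a 16x16x16 subchunk needs only 4 bits per
# axis, 12 bits total. 0-based local coordinates use every bit pattern
# 0..4095 with no waste; 1-based coordinates (1..16) would not fit in
# 4 bits per axis.
def pack_pos(x, y, z):
    assert 0 <= x < 16 and 0 <= y < 16 and 0 <= z < 16
    return (x << 8) | (y << 4) | z

def unpack_pos(p):
    return (p >> 8) & 0xF, (p >> 4) & 0xF, p & 0xF

assert pack_pos(15, 15, 15) == 0xFFF
assert unpack_pos(pack_pos(15, 15, 15)) == (15, 15, 15)
```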
Arrays start at 0 because they're really just pointers. "[x]" is just shorthand for "+ x * sizeof". The first element is the one stored at the pointer address (0 sizeofs ahead of the pointer), the next element is stored 1 sizeof ahead of the pointer, etc.
In C, "a[x]" is effectively syntactic sugar for just "*(a+x)" (conversion rules for "+" means you don't need the sizeof). It also means you can reverse it and write e.g. "5[somearray]" if you really want to make developers want to throw rocks at you.
That's just convention though. There's no reason "[x]" couldn't have been defined to be "+ (x - 1) * sizeof". There would be no runtime cost to this, and the compile time penalty would be tiny even on ancient systems.
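The underlying address formula can be emulated outside C too; a hedged Python sketch reading int32 elements out of a flat byte buffer (all names are mine):

```python
import struct

ELEM_SIZE = 4  # sizeof(int32) in this sketch

def elem_at(buf, base, i):
    """Read element i of an int32 array laid out flat starting at byte
    offset `base`: address = base + i * ELEM_SIZE, with i 0-based."""
    return struct.unpack_from("<i", buf, base + i * ELEM_SIZE)[0]

buf = struct.pack("<4i", 7, 8, 9, 10)
assert elem_at(buf, 0, 0) == 7     # first element lives at offset 0
assert elem_at(buf, 0, 3) == 10
# A 1-based scheme would have to compute base + (i - 1) * ELEM_SIZE.
```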
That answers something like "what are the technical benefits to zero-based arrays" but not "why are zero-based arrays such a strong convention in programming languages." For that you'd have to read the article I guess.
Perhaps also worth adding that the linked article (the one titled "Citation Needed") itself originally had hundreds of comments beneath it, many quite argumentative, which seem to be now missing - I guess lost in a site update. It was a very contentious article.
I'm sympathetic to its author's approach - we talk about indexing and it's a strangely fascinating topic but the history of it hasn't been all that well dug out. We do indeed tend to respond to articles like this by going "well obviously [x]-based is good because" - you can see that at work in this discussion here - but those are not necessarily the true historical reasons. But I think the author made a leap too far, the article was a bit too brittle, and it landed in a slightly too argumentative spot.
Because indexing happens with integers. Back then space was a concern, so you'd use unsigned integers, and those start with zero. Internally they'll all be using offsets, and it's easier to not do a conversion than to do one.
0-based or any arbitrary offset of your choice so that your data model can more closely map to what you need it to. And use eachindex instead of making assumptions and you can work with arrays using 1-based, 0-based, or arbitrary-based indexing.
For this particular problem, having a choice is even worse than 1-based indexing. Code becomes a giant PITA to read because you have to switch back and forth.
Looking at my own coding experience, I certainly favor 0 over 1. When I examine why, I can't help but see my aversion to & distaste of Visual Basic as a driver.
VB is the only 1-indexed language I have used.
VB is bad.
Therefore 1-indexed languages must be bad.
The index of an array is an integer, and integer values start at zero, so it seems like an obvious choice as to where the starting point should be. It's only outside of computer science that it seems to be counter-intuitive.
Actually in not all languages do arrays start at 0. In ColdFusion for example arrays start at 1. I've never been able to receive a reason as to why CF's original author, J.J. Allaire, did it that way.
Came here to say this about CF.
I work in Coldfusion but coming from a background of other languages I would always reference 0 as the start of an array. This was the source of countless errors for me until I got used to it.
My manager thinks it was done to make the language easier to understand for people just getting into programming.
The argument that "BCPL doesn't have pointers, so it's not about pointers" fails immediately because every language has pointers, but some of them don't expose them to the developer.
Lua's arrays are 1-indexed, and it is a terrific interpreted language! Array[0] is nil, makes sense. v5.1 has a solid JIT, and it's got a great 2D game engine called Löve2D.
Because back when computers were young, all bit representations were needed to represent something and having 000 not be used as a valid value wastes a huge percentage of available space.
In defense of starting at 1: if you're counting things, starting at 1 ensures that the last number you said is equal to the number of things you've counted so far.
I always assumed that it was because arrays and pointers would've been interchangeable in the start, and that adding a zero index times the object size gives you the first element
Welcome to pagination and virtualization, where the physical address is different from what your program thinks it is. Also welcome to hypervisors, where entire operating systems think they own an entire physical machine, only to have something like Spectre/Meltdown say otherwise.
Wow, we are still talking about this, are we? Honestly, linear algebra texts should just give in and adopt zero based indexing for matrices at this point.
So what's the actual reason you'd want to have zero based arrays and such in your language (In 2020, not 1970)? In a language from scratch that doesn't care about being like previous ones, and doesn't care about micro optimization.
I've been annoyed recently by this obnoxious neckbeard behavior when they make things zero based for no reason (other than to conform to their supposedly existent philosophy), like Ratpoison and Screen selecting windows by a zero based index, which means you have to move your hand to the far right of the keyboard to select the first window with the 0 key, the far left to select the second window, with the 1 key, and one to the right to select the third window, with the 2 key.
I so happen to have been trying to work out a tangible explanation on paper of which system is better on trips a few weeks ago, and could not come up with one (and none are given in the article nor the top comments in this thread).
On a side note, the moment you say, "zeroeth", you can no longer be sure anyone knows what you're talking about because at any moment you may count in English or this pretentious UN*Xtard dialect of English. Seems like another pedagogical stumbling block held on to boomers and boomer-wannabes who just want to have their little counter counter counter culture or whatever.
This is false. Pascal arrays start at whatever you want. As for what is happening behind closed doors (aka the linker), those start at zero, but with an offset kept separately (that's why C blew Pascal out of the water in terms of speed when looping through an array). And in Pascal you cannot create a dynamic array with anything other than 0-based.
You are confusing arrays with strings. Those used to start at 1, but since Unicode took over (~2007) strings are implemented as zero-based too (even though you can still think of them as 1-based when writing code).
Also in Delphi, when creating a cross-platform application, strings are treated as 0-based, with even multiple locations in documentation stressing this importance due to having enough differences between classic VCL and the new kid on the block (aka FireMonkey).
Except VB.NET, which is 0-based only, but you can just shut your eyes and imagine it's 1-based, because you declare an array by specifying its upper bound, which equals its number of elements in 1-based terms.
This is my take and when working directly with computer memory/addresses, 0-based is most convenient. For working with data/stuff at a higher level, 1-based is more natural and convenient.
Zero-based indexing is always wrong; an ordered collection of things does not have a zeroth thing.
Sometimes you have a data structure called an "array", which means "a bunch of things that are the same size next to each other in memory". You can store the address of the whole array as the address of its first element, and offset into it by an offset; clearly, the offset of the first element is zero, but it's not the "zeroth" element.
And finally, sometimes you have cargo cult behavior by people who think that the way offsets into arrays work is because "numbering starts at zero in computers!" or some similarly wrong rationalization. This is when you get stupid things like (nth 0 list) in a lisp.
I hate to break it to you but natural spoken language is not a fixed concept, words can have many meanings and interpretations, and those meanings and interpretations can mean different things to different people at different times.
Because you predominantly think of indexing as the first definition as shown here[1], does not mean everyone else does. Be careful when using absolutes, especially when talking about languages.
An ordered collection of things starts no things from the beginning.
A journey starts 0 metres from the origin.
If you count the number of metres you are along your journey and start with 1, then at the second metre you are at the 101st centimetre.
When someone says how many dragons have you slain? Just as you leave and haven't heard of any dragons yet, you don't say "I'm slaying my first dragon now"
Every construction of the natural numbers I have seen starts with 0.
The first hour of the day has passed when the clock strikes one. The hour preceding that is the same day. English calls it 12 because it was invented before we had a firm grasp of zero, and English is inconsistent. 24-hour clocks call it 00.
1 based indexing can be correct, but 1 based ordered discrete sets map incredibly poorly to any ordered set with a different number of elements (such as the reals). Additionally dealing with ranges is less easy to make consistent.
This reminds me of arguments about male -> female and man -> woman (and human etc). Without offending anyone's delicate sensibilities - you may note that there is a short version and long version of those words, suggesting that the longer version is derivative of the short version.
However, when you examine the origin of the words, that is not the derivation. The derivation and path to English is quite complicated. You may feel a sense of righteous indignation or righteous repudiation on this topic. I would gently suggest deferring that feeling because I think the story is very interesting to follow.
That is not the end of the story though. The reason those pairs of words stuck around is almost certainly because it makes a consistent mental picture.
Bringing it back to the topic at hand - "Why we use 0" - because it works. There are many times when it has been evaluated, and at least to the people who design languages, it is more mentally appealing.
> "Bringing it back to the topic at hand - "Why we use 0" - because it works. There are many times when it has been evaluated, and at least to the people who design languages, it is more mentally appealing."
The linked post is not just about a technical decision, it's about a social history.
The conclusion of the linked post:
> "Lessons
> Things can have more than one cause. Don’t trust easy explanations.
> Always look for information that refutes your theory, not just information that supports it. Hoye stopped as soon as he had a satisfying answer and didn’t keep researching.
> Don’t be a dick.
> It is tragically easy to trick me into doing free research."