Why do arrays start at 0? (buttondown.email/hillelwayne)
413 points by snikolaev on Aug 24, 2022 | 674 comments



I don't quite understand the argument "0-based being easier for pointer arithmetic is nonsense because the language doesn't have pointers".

Whether or not the language presents the concept of "pointer" to the user is independent of whether or not it uses pointers internally. And if it exposes arrays as a concept, it has to implement them somehow.

The simplest possible implementation of arrays is having a start address and putting all elements next to each other in RAM. To get the address of a particular item, this layout naturally leads to the formula "base address + index * element size", with "index" being 0-based. If you want to expose other indexing schemes in your language, you'll have to add more logic to convert the user-visible index back to 0-based before you can obtain the address.
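
A minimal sketch of both in C (function names mine, purely illustrative):

  #include <stddef.h>
  #include <stdint.h>

  /* the "natural" layout: no adjustment needed */
  uintptr_t addr_0based(uintptr_t base, size_t index, size_t elem_size) {
      return base + index * elem_size;
  }

  /* 1-based user indices: an extra subtraction (or a pre-biased base) */
  uintptr_t addr_1based(uintptr_t base, size_t index, size_t elem_size) {
      return base + (index - 1) * elem_size;
  }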

All of this is completely independent of the fact whether your language exposes pointers to the user or not.

Even the yacht story sort of hints at this:

> To keep people from having their jobs cut short by yacht-racing, Richards designed the language to compile as fast as possible. One optimization was setting arrays to start at 0.

If all indexing schemes were equal, why would this change even be an optimisation in the first place?


The "language" that has a 0-based indexing scheme is assembly. An HLL with 1-based counting that compiles to assembly will introduce additional computational overhead for the translation of the index.

If 1-based indexing were used in assembly, then "mov 1(%ebx),%eax" would be the equivalent of "mov (%ebx),%eax".


> additional computational overhead

at compile time, not runtime.


That only works for constant addressing; for arr[i] = 2, you'll still need to subtract 1 from i at runtime with 1-based addressing when converting to machine instructions.


> for arr[i] = 2, you'll still need to subtract 1

You can, but don't need to. Just have the compiler store the array pointer constant as (arr-1) instead of arr, et voilà: zero runtime overhead.

Or, for modern(-ish) ISAs, often you can add/subtract a small constant at runtime, with no extra cycles taken. For example, for x86_64:

  # rbp contains "true" pointer to arr
  # rax contains 1-based array index
  mov rax, QWORD PTR [rbp-8+rax*8]


Another option for users is to "sacrifice" the 0th entry of an array. Depending on the size (as in sizeof) of the entry, it can be worth it.

A benefit is that you can then use 0 as a sentinel value; for instance, if you have a find() routine that can of course fail, it can just return 0 instead of e.g. -1 (which can introduce minor issues).
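
For example, a hypothetical sketch (assuming the array actually has len+1 slots, with slot 0 sacrificed):

  #include <stddef.h>

  /* arr points to len+1 ints; arr[0] is the sacrificed slot */
  size_t find(const int *arr, size_t len, int needle) {
      for (size_t i = 1; i <= len; i++)
          if (arr[i] == needle)
              return i;   /* valid hits are 1..len */
      return 0;           /* 0 can never be a hit, so it means "not found" */
  }

  /* caller: if (find(a, n, x)) ... reads naturally, no -1 special case */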

In my experience, though, I am so used to 0-based indexing that switching schemes can cause stupid off-by-one bugs. I guess that's the main reason behind complaints about Lua. It's not that "natural" arrays are thought of as bad, but mixing both schemes (often C and Lua) is error-prone.


It's basically only x86 among modern ISAs that lets you do base + literal + register * literal, aarch64 only gives you base + register shifted by a literal, and I believe RISC-V is similar: https://gcc.godbolt.org/z/dz768768z


>and I believe RISC-V is similar

Yes it is. It was evaluated, carefully weighed and discarded, as it was not worth it.


But it will still execute with likely no extra time at all due to OOE and how fast integer arithmetic is.


OOE doesn't necessarily save you if you end up with a hard dependency on the value of the read (and even if it did, the little cores on ARM SoCs are in-order). This is a pretty obvious candidate for macro-op fusion, but I'm not sure whether this actually happens (and if it happens on ARM little cores, etc.)


If it is not on the hot path, it is likely free, but not guaranteed. If it is on the hot path then it is wasting a whole cycle. And of course in highly ALU-dependent code it is another instruction, so a fraction of a clock.


What do you mean it wastes a whole cycle? It may indeed have worse performance due to blowing the instruction cache, but I don't see why out-of-order execution would be slower on the hot path - I doubt there would be many hot paths without any dependence on memory fetches outside specific benchmarks - the memory loads will take significantly more time even if they hit cache.


What if the variable is used for array indexing and something else, like displayed to the user?


Can't say I've really thought this through, but couldn't you just subtract 1 (*sizeof(X)) from the arr address?


What would memset(arr, 0, sizeof(arr)) do in that case? I can see myriad problems with having arr itself point outside the memory range allocated to arr[min..max].


Hmm, maybe?

You'd still have the odd subtract here or there, but for many use patterns you could probably ignore the overhead.


For most cases you could get away with some index transformation logic in the compiler (but probably your language would need special types for indices, so I do agree with you that it is not worth the trouble).


> An HLL with 1-based counting that compiles to assembly will introduce additional computational overhead for the translation of the index

Not necessarily so for translation of an index that is evaluated at run time. PL/I (back in the 1960s) compiled arrays into 'dope vectors'. Each array's dope vector included the highest and lowest valid subscripts for each dimension of the array and the address where the element with all-zero subscripts would be found, if it existed.
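
A rough C rendering of the idea (the field names are my guesses, not PL/I's actual layout):

  #include <stddef.h>

  struct dope_dim {
      long lo, hi;   /* lowest/highest valid subscript for this dimension */
      long stride;   /* elements to step per unit of this subscript */
  };

  struct dope_vector {
      char  *virtual_origin;  /* address element [0,0,...] would have,
                                 even if those subscripts are out of bounds */
      size_t elem_size;
      int    ndims;
      struct dope_dim dim[];  /* one entry per dimension */
  };

  /* addr(i,j) = virtual_origin + elem_size*(i*dim[0].stride + j*dim[1].stride):
     no per-access subtraction, whatever the declared lower bounds are */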


Not sure if assembly/machine code is that relevant. If 1-based indexing were more prevalent, the LEA instruction on x86 would just subtract one during execution.


u/blutomcat's point clearly (I think) was that instruction sets didn't do what you suggest, so zero-based index is (was and still is) what you'd do if you wrote in assembly (which was not uncommon!), and on any given platform, assembly was the main language 50-60 years ago. It follows that it's easier to carry that over to higher level programming languages.

Whether that's what actually happened in the cases of HLLs that adopted zero-based arrays or not, I don't know. But today, to me, zero-based array indexing feels very natural, and the idea that zero-based arrays being simpler in assembly carrying over to HLLs seems at the very least plausible.


"just subtract one" would take a long time 50-60 years ago.


If you did this, you'd subtract one once at array creation. Or subtract whatever the offset is (really, you'd subtract 1 * data type size). Under the hood, using C syntax instead of assembly:

  int foo[n]; // what the user wrote

  // what happens behind the scenes, after a fashion
  int* foo = malloc(n * sizeof(int)); // or sp - n * sizeof(int) if stack allocated; subtraction since stacks usually grow "down"
  foo = foo - 1;
All references into foo will now work just fine so long as they are within the [1,n] range (same issues as with 0-based there since C doesn't carry size information for checking array bounds access). This adds one extra instruction per allocation (which includes allocation on the stack) for all non-0 offsets, but then all access will have the same cost whether 0-based, 1-based, or arbitrary-based. That's a non-zero cost, but it's not exorbitant since you'll be accessing much more often than allocating (and if it's reversed, something weird is happening).


Since as you say, C doesn't carry size information, you would actually need to do it each time a pointer is created from an object. In C you can have arrays of arrays (not pointers) or arrays in structs, where there is no pointer on creation:

  // 11 arrays are created here
  int foo[10][10];
It doesn't make sense to create pointers for the 10 inner arrays, so the subtraction would presumably happen when referring to each inner array:

  int *z = foo[x]; // (char *)foo_sub_1 + (sizeof *foo)*x - (sizeof **foo)
  foo[x][y]; // (char *)foo_sub_1 + (sizeof *foo)*x - (sizeof **foo) + (sizeof **foo)*y
In any case, this is more work, both for the computer and for the human. It's analogous to trying to figure out "which year of which century is X in?" (2022 is the 22nd year of the 21st century):

  year(x) = (x - 1)%100 + 1
  century(x) = floor((x - 1)/100) + 1
The above formulae work for the 1-based convention, because they do the necessary 1 subtraction/addition in the right places. If we instead counted from 0, things would be a lot simpler (2021 would be year 21 of century 20):

  year(x) = x%100
  century(x) = floor(x/100)
Unfortunately, the Romans started counting years before anyone knew about 0. I think this is basically why we have a convention of counting from 1: 0 simply wasn't discovered until relatively recently in human history. Earlier number systems such as Greek/Roman numerals don't have a way of representing it.


when I count apples I happily start with one. it's "natural" to label the first thing "1".

or how would you count, say, 3 apples? or three legionaries?


Counter-example:

You have to remove all desks from #20 to #30, how many desks are there?

Oh, it's 11 desks, I see. This is because we used INCLUSIVE indexing instead of a semi-open interval.

If we use semi-open intervals, we don't include the last item, so that we can write

    match index {
        0..10 => println!("first ten"),
        10..20 => println!("next ten"),
        _ => println!("something else"),
    }
but this forces us to start our intervals at 0, otherwise we would have to write 1..11 which would be awkward


Indeed. End exclusion is advantageous because we don't need to add or subtract 1 when constructing or interpreting ranges. Forgetting to add or subtract 1 is basically why off-by-one errors exist (and doing add rather than subtract or vice versa is probably why off-by-two errors exist).

`a..(a+n)` is an n-length range, not an (n + 1)-length range (oh look, an "add 1" operation!).

And an empty range is denoted by `a..a`, not `a..(a-1)` (oh look, a "subtract 1" operation!).
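
A small C illustration of why half-open ranges compose so cleanly:

  #include <stddef.h>

  /* process [lo, hi): the length is hi - lo, and [lo, lo) is naturally empty */
  void process(const int *a, size_t lo, size_t hi) {
      for (size_t i = lo; i < hi; i++) {
          /* ... use a[i] ... */
      }
  }

  /* split at any mid with lo <= mid <= hi: [lo, mid) and [mid, hi)
     cover [lo, hi) exactly -- no overlap, no gap, no +1/-1 anywhere */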


doesn't Swift use inclusive ranges? 1...10 includes 10


Haven't used Swift, but looking at the documentation, it seems to support `...` for end-inclusive and `..<` for end-exclusive.

Looking at the Swift standard library, end-exclusive ranges seem to be far more commonly used:

  ~/build/swift$ git grep -e '\.\.< *[a-zA-Z0-9_(]' --and --not -e '^ *//' -- stdlib/\*.swift | wc -l
  599

  ~/build/swift$ git grep -e '\.\.\. *[a-zA-Z0-9_(]' --and --not -e '^ *//' -- stdlib/\*.swift | wc -l
  112


[0, 1, 2] would be how I would refer to my 3 apples. Each apple is identified by the number of apples that precede it.

I don't think it's inherently more natural to start from 1, just conventional. Disregarding history/convention, I think it would be more natural to use the lowest available natural number.

Back in Roman times, the lowest natural number that people were aware of was "1", so obviously they started counting from that number.

Our understanding of and use of mathematics has evolved since then, and accordingly there are fields such as computer science and combinatorics where there are clear advantages to starting from the smallest number (zero). In virtually all other cases, the reason "1" seems more natural is because that's the way it's been done historically.

It seems that when labelling those apples using a 1-based count, the logic is basically: each apple is identified by the number of apples that precede it, plus 1. The reason for the "plus 1" is that that was your starting number, but it could have easily been 2 or 3. If you instead start from 0, you can omit the "plus X" logic, just as I omitted the "+ 1" and "- 1" logic in my year/century formulae when moving from 1-based to 0-based counting.


The Romans weren't the only people who knew how to count back then... And how many of the other (presumably arbitrary) choices for smallest number were zero? And why did it take over one thousand years to come up with ordinal "zeroth" after we knew about cardinal "zero"?

> It seems that when labeling ... number ... that precede it, plus one.

I don't think so. It's the number that you have counted once you've counted that one.


How do you write zero in Roman numerals? Answer: you can't. Even though they used their number system for adding amounts of money, they hadn't figured out "cardinal zero" yet.

It was introduced into western mathematics through Fibonacci in the 1200s at the same time that Hindu-Arabic numerals were adopted, which use "0" as a placeholder (compare this to earlier Greek numerals which work similarly to the system we use today but without placeholders and using different sets of symbols for the different places—and of course no way of representing zero).


That's what I'm talking about (though I guess the "thousand years" estimate was slightly high). Why did it take hundreds of years after the introduction of zero to get "the zeroth element of a sequence", which is quite a new usage, and even now is confined to specialized fields?


The well known “Numerical Recipes in C 2nd ed.” talks about how the first edition promoted exactly this - offset the pointer after allocation and you can use any indexing origin you want. And they promoted 1-indexing and wrote all the algorithms as such, before switching everything to 0-indexing for the 2nd edition.

I agree with you the subtraction instruction is not exorbitant, especially considering malloc is much more expensive and always has been.
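
Roughly the 1st-edition trick (my function names, and technically UB per the standard, as discussed elsewhere in the thread):

  #include <stdlib.h>

  /* Allocate a vector indexed v[1..n]: one subtract at allocation time,
     then every access is just v[i] with no per-access correction.
     (Forming v - 1 is formally undefined behaviour in ISO C.) */
  float *vector_1based(size_t n) {
      float *v = malloc(n * sizeof *v);
      return v ? v - 1 : NULL;
  }

  void free_vector_1based(float *v) {
      if (v) free(v + 1);   /* undo the bias before freeing */
  }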


Integer add/subtract never really took a long time. Slower than today’s sub-nanosecond, and it depends on the computer, but there were CPUs in the mid 1960s doing millions of instructions per second. They weren’t crazy slow, they were just crazy expensive.


A million, perhaps. But you're talking about adding one instruction to every single data access. Plus your code bloats up with all those extra dec instructions. BCPL and C were created for departmental and lab computers like the PDP-9.

Note, for example, that the mighty 6502 succeeded in the market because its designers had heard from customers that the $100 price for the 6800 was too much, so they removed some instructions and addressing modes to get the die small enough that it could be sold for $20.


> But you're talking about adding one instruction to every single data access.

No, you misunderstood. You subtract one from the pointer at allocation time. You don’t add an instruction every time you access.


And even if it was an instruction for every access, memory access is much more expensive than an add (at least nowadays -- I don't know if that's always been true).


Similarly I like to think about it as the add/subtract instruction is very small compared to malloc (and maybe could be included in a custom allocator for free). So the one extra instruction will always be somewhere between relatively minor to negligible and unnoticed.

Yeah today memory access on a cache miss is really expensive. I think memory was normally running at a different clock rate than the CPU even 50-60 years ago, so it has always been something where you can’t touch memory every instruction. The latency of a cache miss wasn’t nearly as bad in the past. When the data cache was invented, a main memory access was like 4 times slower than register access. Today, main memory access can be more than a hundred times slower, sometimes it can approach a thousand times slower, which is why we always have multi-level caches now. And memory latency is still getting worse. If someone figured out how to make memory faster without increasing the cost or the energy consumption, they’d be rich! ;)


would it? or would it be just a differently wired circuit?


the "differently wired circuit" would be an extra stage of logic computing a carry all the way from lsb to msb (an ALU outside the ALU?) and would contribute a fair bit of extra time. Easier to just use the ALU to do it, which is inserting an extra instruction, also a time waster.


If you're already doing base index addressing (e.g., [SI + DI]) you're already using the ALU to compute your memory address, and it's not _that_ much extra wiring to have the ALU support either a constant power-of-two displacement or even a three-way addition; a constant displacement for a basic ripple adder is an XOR and an AND NOT, e.g. The x86 family does three-way addition (and a shift) to support "[ESI + EBX * scale + displacement]".


It's not just the extra wiring, but the extra time; each additional computation layer in an integrated circuit introduces delays that would make the operation noticeably slower.


Well, maybe, if the specific use case of 1-based arrays was all that got supported.

It'd be a pretty huge waste of processor design space to do that, though: if an HLL wants to support n-based arrays for n != 0 it can just store the base pointer with the initial index offset already applied.

Early 8-bit processors had a pretty poor set of addressing modes: you couldn't even expect [R1 + R2], let alone [R1 + R2 * scale + displacement], so random access into an array would be a multi-instruction task. By the time of the 80386, base + scaled index addressing with displacement was in the CPU core, but it's not _free_ - there's an additional byte in the instruction encoding for the displacement, so even if the CPU computes the displacement with no additional clock cycles you'll have a latency cost for fetching the byte.


I would be fine with it all if 0-based was called "offset", with "index" reserved for 1-based.

The element at offset 0 is the first element.

Precision in naming is important.


Defining index as the element-wise offset is a precise, well-formed definition/name.

It's also a definition commonly used in math, though not always. It depends a lot on the area of mathematics and even the cultural context.

The only reason we sometimes feel 0-indexing is wrong IMHO is because the English language describes entities in a sequence as 1st, 2nd, 3rd etc.

But I wouldn't be surprised if there is some human language somewhere which does it differently. Like something that roughly maps to English as "head, head+1, head+2, ...".


Which floor is the first floor in your building?

Depends which country your building is in, and more!

My apartment building is 1,2… but my mall is G,M,1,2… (same country different architects). For many years I lived in Europe where it’s usually G,1,2… (where G can also be E or Fsz or whatever) and when I was in the US I had to remember that 1 is G, though L,2,3… is also common.

City people live with index ambiguity all the time and our brains manage just fine.


Quick complement. Where I’m from (Portugal) we use the literal number 0 for the ground floor.


In Europe it's common for elevators to use zero for ground, even if the language/culture has a specific name for it (piano terra/piano rialzato here)


Actually, "rés-do-chão" would be the more common term. Still... in Portugal, the "first floor" is implicitly never the ground floor (but the floor above it), so 0-based indexing is respected (as you suggest).


On elevators you don’t see “R/C” unless the elevator is very old, and the same for office building directories or shopping centres. In almost all cases it’s the number zero. OTOH, in countries like Germany you will see the “E” for Erdgeschoss everywhere.


> Which floor is the first floor in your building?

0.5 ... yep that won't work

In some buildings built around 1900 where I live, the "ground" floor is not on the ground but about half a floor above ground level, while the cellar is only half below the ground. Because of this the cellar has above-ground windows, allowing legal cellar apartments (oversimplified: in Germany apartments require windows which are above ground).

This creates a lot of confusion.

By convention this 0.5 floor still counts as ground floor with the floor number 0. But people confuse it all the time for the first floor. Especially if in some cases you do have a few utility rooms exactly at ground level.

Now things get worse if you live in a cellar apartment in such a house: by convention, floor -1.

Because nearly everyone will think you mistyped, as normally you legally can't have apartments below the ground floor.

Another fun index ambiguity is for apartments built on a mountainside/cliff.

In this case, depending on which side of the house you look at, the ground floor might be at, above or below ground level. Often the floor with the main entrance counts as the ground floor, but sometimes that is not the case and it's e.g. the lowest floor not below ground, or the highest floor level with "ground".

So yes, index/floor-level ambiguity is everywhere and much worse than just "is the ground floor 0 or 1" ;)


Not only that but which floor is the 13th (in the West) or 14th (in China)? Often those numbers are skipped because of "luck", "feng shui", or other superstitious nonsense.

Do you count the machine floors in high rise buildings?

In AU, the ground floor is effectively "zero", the first floor is the one above ground. Depending on the design of the building, there might be a "mezzanine" floor between ground and 1st.

Basements usually follow the same pattern, so B1 is the first floor below ground etc.


In one department store in Finland there are two separate numbering schemes, see this photo from the elevator: https://hs.mediadelivery.fi/img/some/default/c244c43c3008429...


In Belgium (the French-speaking part of the country, like in France) we call the ground floor (where you enter the building) "rez-de-chaussée" (or "rez" for short), which is the equivalent of 0, and then the first floor, numbered 1, is on top of it and you use an elevator or stairs to reach it.


Yes, where I'm at — large country in Asia — we count floors starting from 0 (aka G for Ground).

Confused me when I visited USA.


G or L (for lobby) followed by 1 isn't uncommon in the US.

> Confused me when I visited USA

It's applied inconsistently enough to be confusing for natives too!


I've just come back from a vacation in the US, and my experience was: when the hotel has G or L for the ground floor, it still has 2 for the floor directly above it. "G or L followed by 1" never happened to me.


It's the consistent inconsistencies that cause it to be less confusing. It would be really weird to go from someplace very orderly to another that is haphazard.


> Defining index as the element wise offset is a precise well formed definition/name.

But it's less precise than offset and obviously introduces confusion. Whoever originated the term "0-indexed" should have just suggested we use a better word rather than keeping the inaccurate word (proven by the fact you're currently looking for a way to remove some confusion around the currently chosen word) and prefixing it with a digit, which itself is a confusing thing to do to an English word, thus adding to the confusion while subtracting none.


you can say the same for 1-indexed, because from the get-go both existed.

Also it's not less precise, as there are many different kinds of offsets too. First you have element-wise and byte-wise and similar offsets; then offsets are inherently relative to something, and that's not always the first element. It could be an offset from the back, and I have even seen an offset from one element before the first element in some very unusual use-cases.


But we'd have to change all the "i"s in our for loops with "o"s across the planet, maybe even beyond. I'm willing to help, whatever it takes.


> Precision in naming is important.

It really isn't. The human race gets by without it the vast majority of the time.


The human race doesn't do science or engineering the vast majority of the time. When it does, precision in naming is important.


That so many people in this thread argue about the higher-level language (missing the point) shows that few people ever found their way to the underlying machine code.

Which is sad, because all code is still executed as machine instructions even when the developer does not see it or does not want to care.


We're 20 years past the point where you can expect everyone in tech to trace every instruction down to machine instructions. Higher level languages abstract away the need for it, and for the most part, we can rely on the authors of those languages to make many of the decisions that impact performance. The rest of us learn about these details on posts like this. It's not a sad fact, in fact it's probably one of the most useful things that's happened in tech. It frees people to think about other problems.


I barely touch assembly in my day to day work, but I do understand on a fairly deep level how a computer works, which I feel is extremely important and very frequently influences how I write high-level code.

Certainly one can bang out code their entire career without ever having a clue how machine code works, but I really wouldn't advise it. At worst it leads to total ignorance, and at best you accumulate a disconnected set of "best practices" as inscrutable lore handed down from on high.


When I did web dev it was more important to know how the WEB BROWSER works, not how machine code works

it's not useful to know how the browser allocates memory or garbage collects it, it's more useful to know that it has web sockets. Don't get the idea that things you know are relevant to other people.


You go as deep as you need to. My day job involves writing tons of Javascript and there is zero need for me to dive into the underlying processes until something goes wrong. So far the lowest I’ve ever needed to go was the node source.


What do you consider "high-level code" and in what way does deeply understanding how a computer works influence your approach?


In college, I optimized the array removal routine from O(N^2) to O(N) by using INVARIANTS

But it probably ran in like 100us anyway


Ugh, invariants...

Anyway, complexity stays relevant, no matter what language you use. That's the real deal.


It only stays relevant if the JavaScript is too slow. In the case of removing two elements from an array of length 8, it's not too slow to remove them one by one since, again, it runs in microseconds


If you're not reasoning about how your code translates to logic gates, you're not a competent programmer. /s

Abstractions are good, and while we should be careful not to completely trust a brand new abstraction, high level languages are established enough that we don't need to worry about it.


20 years ago ASM was just as much cryptic black magic to most programmers as it is today... try perhaps 40 years ago...


Not sure if you were around 20 years ago, but you sure couldn't expect everyone in tech to understand machine instructions as well as the layers above, all the way up to corporate politics.

But some people, yeah, they could do it.

That has not changed.


If this were the primary reason, it seems odd that some of the earliest popular languages used 1-based indexing in an era with much slower CPUs (<= 1 MIPS) and less available memory to store instruction code. If the authors of FORTRAN were comfortable with 1-indexing on the IBM 704 (40 KIPS) in 1957, it couldn't have been prohibitive. At the same time, as computer use cases became more interactive there may have been a greater focus on performance, but it is likely that the semantics of the high-level language were at least as important.


“machine code” isn't really a special zero point nowadays.

The code the programmer wrote is compiled to something such as LLVM IR, LLVM IR is further compiled by LLVM to Assembly, this is further compiled by an assembler into machine code, and then the c.p.u. further compiles this to its internal code as it executes it. “machine code” really is no more special in this chain of events than, say, LLVM IR.


Machine code is special because its the only thing the CPU can actually execute.


> ... then the c.p.u. further compiles this to its internal code as it executes it. “machine code” really is no more special in this chain of events than, say, LLVM IR.

I think this is the core point. Even though you are mechanically feeding machine code into the CPU, modern CPUs deeply transform your program before (or even while) running it, with potentially deep performance implications. With modern CPUs, the only choice we truly have is which compiler we trust to optimize our program.


Sometimes it executes it exactly how you think it will execute. But it's still slower because the "less optimized code" fit in its cache.


It's slightly different - machine code is special because it's a visible API between our programming work and the CPU silicon. Microcode is a private implementation detail which you usually can't affect as a software developer. Machine code is something you are directly creating by writing things, though.


Why is that visible a.p.i. different than, say, LLVM i.r. which is also a visible a.p.i.?

Why is the c.p.u. special?


Except if the CPU is microcoded, those instructions might not mean what you think they do, and you need something like Intel's V-Tune to actually understand what the CPU is doing with them.


> ... to actually understand what the CPU is doing with them.

x86 has a developers manual for this purpose.


It has, and if you manage to know everything on those pages, for every single processor on the market, while taking into consideration motherboard architectures and OS workloads, you're our hero.

We are way beyond MS-DOS days and Michael Abrash's books.


It was just the most basic example. Indices are connected with math all the time. Consider implementing circular queues with 1-based arrays. Any arguments about what's _possible_ miss the point. Arguments about what _feels_ natural purport to know the human mind, which I find generally suspect. Mostly I think of computers as tools and am interested in getting the most functionality out as smoothly as possible, not as slaves whose job it is to make my life worry-free.


> To get the address of a particular item, this layout naturally leads to the formula "base address + index * element size", with "index" being 0-based. If you want to expose other indexing schemes in your language, you'll have to add more logic to convert the user-visible index back to 0-based before you can obtain the address.

The easy way to do that is by shifting the base address. In pseudo-C (I think that computing the b pointer is non-conforming, even if the code never tries to access b[0], but compilers can do this without problems):

   int a[100];      // array of 100 integers, zero-based
   int * b = a - 1; // array of 100 integers, one-based
Things only get costly when you want to check array bounds or when you have multi-dimensional arrays. There also may be non-standard architectures where this kind of stuff isn’t possible.


In this case, b is a pointer that does not point to a valid address. You can't memset(b, 0, 100), or do any of the other regular pointer things with b.

It feels like you're introducing a million edge cases.


Okay, now do 2D arrays.


This is undefined behaviour, as a pointer is only allowed to point to valid addresses (or one past the last element, but that one can't be dereferenced).


Should work 99.99999% of the time until someone tries to put an array at 0x000001 :)


I think, in theory, it would work regardless of the starting address. As long as you don't try to access the invalid address (which you wouldn't assuming that it's starting in the index 1, you would always be accessing the first valid address)


In theory, it’s not guaranteed to work at all. https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf#p... (emphasis added):

“In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, _provided_they_exist”

[…]

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; _otherwise,_the_behavior_is_undefined.

In this case, the minus-one-th element doesn’t exist, so the expression

  int * b = a - 1;
triggers undefined behavior.

I think some compilers use this in practice to produce faster code (that, often, will not do what the programmer expects it to do). Start reading at https://stackoverflow.com/questions/56360316/c-standard-rega... if you’re sure they don’t. I expect that will change your opinion.


That is a description of the commonly agreed upon definition of the C abstract machine and language semantics. You could simply define the language another way with regards to this behaviour.

Not that I would want to do it, I think zero-based addressing is not very taxing for the convenience of being closer to how we think of memory addressing.


Amusing but probably irrelevant: the sailing yacht racing handicapping calculations mentioned are almost certainly under the Performance Handicap Racing Fleet (PHRF) system[0]. This system is "zero based" in its own way. A sailboat's performance is assessed and handicapped against a standard yacht with a zero rating. Your time to complete the race course is adjusted by your handicap to determine your final standing. This lets you race slower and faster types of boats in a "fair" way.

[0] https://en.wikipedia.org/wiki/Performance_Handicap_Racing_Fl....


Implementation matters for performance, but even beyond that, the interface matters for users. 0-based offsets are convenient for users doing math on indexes.

A pointer is just one kind of array-like indexing scheme. Good pointery languages will distinguish Address from Offset from Integer.


Nats start at 0, end of discussion - it's only logical to index by the naturals.


That's really silly - the natural numbers also famously start at either 0 or 1, depending on the country, the discipline, and the individual.


I like to treat the naturals as starting at zero, but not for any mathematical reason. I just think that since it's easy to write ℤ+, it's more useful for ℕ to refer to something slightly different than it would be for ℕ to be a synonym for ℤ+ while anyone wanting to include zero in their set was forced to spell out "nonnegative integers".

A dichotomy between "ℤ+" on the one hand versus "ℕ" on the other is just plain more convenient than a dichotomy between "ℤ+" and "ℕ" on the one hand versus "ℤ\ℤ-" on the other.


It is because people mean two different things by "natural numbers", even though the same words are used for things that are not the same.

I prefer to start at zero; I think it is more useful in general, and makes more sense mathematically for many (although not all) purposes, and in many systems you will find objects and operations that have these properties too.


If there were any justice in the world, they'd start at 2, as a handicap. They might even make it out of the bottom of the NL east if that was the case.


> Nats start at 0, end of discussion - it's only logical to index by the naturals.

Now you'll just bring out the people who start the naturals at 1.


Starting the naturals at 1 is quite onerous.


That’s why I compromise and start at -1


Other important, day-to-day tools like rulers, clocks and speedometers also start at zero. So it's not exactly "bending humans to microprocessors' weird alien ways".

It all makes perfect sense in the context of measuring.


The analogy is slightly off though. The first position of an array occupies 1 unit of memory. 0 is not a possible measurement in the world of arrays.

Maybe a null terminated string could have length 0 if you don't count the terminator. But that 0 is a property of the "string" abstraction. The actual "array" would still be length 1.


An array of length 0 starts at 0 and ends with 0


> An array of length 0 starts at 0 and ends with 0

Nope. The moment you "start" somewhere you occupy 1 unit of memory. Thus no longer an "array of length 0".

There is no such thing as an array of length 0. It absolutely does not exist. You cannot write source code to represent it.

i.e. assuming we are talking about the pure data structure of an array. Some languages may have some abstraction built on top of arrays (i.e. C strings) that can have length 0, but these are not "arrays", and if they are defined they still occupy 1 unit of memory for the terminator.


C++ requires every value to take at least one byte of memory, but this isn’t true of all languages. Additionally, C++20 adds [[no_unique_address]], which allows zero-sized types.

iopq points out Rust as a language that does allow zero length arrays in a sibling comment.

While ISO C does not allow zero length arrays, GNU C does. There’s also flexible array members, which can be length 0.


I'd say this transcends language.

Sure, a language can define some abstraction that says "hey, I'm a length-0 array!". But this is more like a statement of "hey, I don't exist yet, but if I do exist in the future I'll be of type array and at least length 1".

You can't access index 0 of a length-0 array.

You can only access index 0 of length-1+ array.


Allowed in Rust:

https://play.rust-lang.org/?version=stable&mode=debug&editio...

it's an array of length 0, but it can't be "used" because it has a length of 0 so it occupies no RAM


> An array of length 0 starts at 0 and ends with 0

But such an array does not "start" at 0 as it does not exist. You need an array of at least length-1 to "start" at 0.


I can write a loop like this:

`for(int i=0; i < 0; ++i)` and it will run 0 times, the exact amount of times I expected


But again, you're not starting at "index 0" of an array that exists.

This loop is not processing any physically existing array. In fact the compiler would remove this block of code as it is a no-op.


The programming languages that I use most often can all represent empty arrays just fine. Of course, indexing into them is an error. But defining them is commonplace, and not being able to define them would make a lot of my code a lot more complicated.


With those other things zero is zero, but with arrays the zero index is the first item. The disconnect really starts when you get the size of an array and then have to subtract one from it to get the last index.

To take your ruler example and apply array logic to it: say you are measuring a 2 cm thing. If the ruler followed the index, it would show 1 cm (the last index), and you would have to add 1 to get the size/length.

Having the index not align with the size is a pain that results in a lot of +/-1 code that a compiler could just as easily have handled.
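
i.e., the trivial case in C:

  int a[10];              /* the size is 10 ...              */
  int first = a[0];       /* ... but the first index is 0    */
  int last  = a[10 - 1];  /* ... and the last is size - 1 -- the +/-1 code in question */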


it's a good idea to call the 0th element of an array the zeroth element, else you encourage off-by-one errors. The size of the array, the "count", is that which your index must be less than.

I'm not saying it has to be this way and it can't start with 1, I'm saying what you are pointing out as a flaw is actually the mixing of systems.

in computer programming teams it's best to adopt a standard unambiguous language to communicate to one another so as not have to constantly say "do you mean...?"


No, it's the first element. If it's a 32-bit int it still occupies bytes

0..4

the second int occupies bytes

4..8

you can see the array up to the second int occupies the contiguous space 0..8, which is 0..4*2, and it has TWO elements


Rulers, clocks, and speedometers measure continuous quantities, i.e. ℝ. The natural numbers are irrelevant there.


> Nats start at 0, end of discussion - it's only logical to index by the naturals.

Well, no, obviously you want to index by ordinals, and they start at "first".


Where 0 is the cardinality of the empty set, i.e. some empty collection. Not a convincing argument.


> Where 0 is the cardinality of the empty set, i.e. some empty collection. Not a convincing argument.

Why not? "Every cardinality, except of the empty set, is a natural number" isn't very convincing to me, if we're making cardinal-based arguments.


I'm not arguing that the natural numbers should start from one, rather that the usual path for developing the natural numbers starting with set theory is that zero is the size of a set with nothing in it. In maths they always talk about the first element in a vector, not the zeroth. It is a bad argument to point to mathematics for using zero based indices. It is not at all common to use zero based indexing there.


I'm a mathematician, and 0-based indexing makes way more sense to me.

Dijkstra's argument (https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF) is that we count the number of predecessors. This fits well with the mathematical usage, at least among mathematicians who care to dig into the order-theoretic foundations: in the von Neumann construction of ordinals (and of cardinals as least ordinals of a fixed, well, cardinality), each finite ordinal is equal to its cardinality: 0 = \emptyset is a set with 0 elements, 1 = {0} is a set with 1 element, 2 = {0, 1} is a set with 2 elements, etc.


I'm a mathematician too, and it doesn't make sense to me. It seems somewhat arbitrary depending on the application, i.e. I think it is a type problem. We use an Int to mean something special about array elements. I'd be happier with first() and last() methods really, I don't care for coming up with meanings based on implementation details.


> I'm a mathematician too, and it doesn't make sense to me.

Just to be clear, I wasn't making some sort of bizarre appeal to authority; whether something makes sense is a matter of taste, not of mathematics, and can't be proven or dis-. I meant only to respond to:

> In maths they always talk about the first element in a vector, not the zeroth. It is a bad argument to point to mathematics for using zero based indices.

I don't know for sure which convention is more common, though I suspect you're right that it's the 1-based convention, but I do know that it's false that 'they'—meaning, among others, we!—always talk about the first element/component in a vector. I probably will do that in a first Linear Algebra class, because the textbook often does and I don't want to introduce unnecessary confusion; but usually in my own work, and sometimes when I teach upper-level math classes, I use 0-based indexing when it's necessary to choose.


The question is whether 0 is a valid ordinal. If it is, why skip it?

If you allow 0 as a valid ordinal, it unifies ordination and measure. That's the best reason to do so.

There. Dijkstra as haiku.


It's not a convincing argument because you index into an array if you want to retrieve an element contained in the array. If 0 is the cardinality of an empty collection, it's not a valid index, because you can only index into non-empty collections.


Dijkstra's argument (https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF) is that the index counts the number of predecessors.

I think an age-based way to phrase it: in your 1st year, your age is 0; in your 2nd year, your age is 1; and so on. We can assign people numbers indicating what year of their lives they're in, or how many years they have lived, and both are fine, but we've settled on the latter.


> Dijkstra's argument (https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF) is that the index counts the number of predecessors.

That's not an argument. It's a coincidence. There are applications where what you care about is the number of predecessors (indeed, that's what the compiler cares about, which is why we have 0-indexing in the first place), but they are a tiny minority of all indexing.

> I think an age-based way to phrase it: in your 1st year, your age is 0; in your 2nd year, your age is 1; and so on.

But that isn't even true. No one ever reports the age of their new child as 0; instead, they will report a positive number of months, or -- if it's an extremely new child -- of weeks or days.


> That's not an argument. It's a coincidence. There are applications where what you care about is the number of predecessors (indeed, that's what the compiler cares about, which is why we have 0-indexing in the first place), but they are a tiny minority of all indexing.

I think there's nothing to say to the first two sentences but that I regard it as an argument that may be more or less convincing. I don't know exactly what it means for something to be an argument vs. a coincidence; it is a coincidence that, say, my name is what it is, but it is nonetheless correct for me to argue that that is my name.

These are all conventions anyway, and there is not much use (or even meaning) in arguing about which one is the right or wrong convention, just which one makes more or less sense; and this is one way to make sense of 0-based indexing, though of course there are also ways to make sense of 1-based indexing.

I'm not sure I buy that these cases are a tiny minority, but I'm certainly in no position to produce any data to the contrary.

> > I think an age-based way to phrase it: in your 1st year, your age is 0; in your 2nd year, your age is 1; and so on.

> But that isn't even true. No one ever reports the age of their new child as 0; instead, they will report a positive number of months, or -- if it's an extremely new child -- of weeks or days.

I can believe that ages aren't reported that way, although I think that a child who will be 1 year in 1 year should logically be said to be 0 years; but we can avoid that debate by considering future years: in, to pick the example that applies to me, my 42nd year of life, I am 41 (I could say 41 and 1 month, but I don't—in fact, there's a Seinfeld joke about that). Similarly, the 42nd entry in a 0-indexed array is indexed 41. It doesn't have to be that way, but I don't think one can argue that there's anything logically amiss about it (nor about 1-based indexing … but we do have to pick one).


> I think there's nothing to say to the first two sentences but that I regard it as an argument that may be more or less convincing. I don't know exactly what it means for something to be an argument vs. a coincidence; it is a coincidence that, say, my name is what it is, but it is nonetheless correct for me to argue that that is my name.

Sure. But we're talking about whether to index arrays from 0 or 1. It is true that naming an array element after the quantity of its predecessors will tell you the number of predecessors the element has. But that's not an argument for why you should do it; there would need to be some kind of benefit to having that information. Without a benefit, it's just something that happens to be true.

That's the difference between an argument and a coincidence.


I learned them in elementary school (in the US) as starting at 1. Indeed, Google's first result for natural number is from Oxford Languages, and defines natural number as:

the positive integers (whole numbers) 1, 2, 3, etc., and sometimes zero as well.


Waste the 0th element or reuse it for something like length (hello, Pascal strings). Another option is using base_address - element_size as your array value. Another option is using +element_size for all array accesses, assembly languages usually have this instruction. There are many options to use 1-based indexing without sacrificing performance.


> assembly languages usually have this instruction

Is that so? I wasn't aware of that.


At least x86_64 can put something like `ptr[index * 4 + 4]` in one instruction (assuming that variable values are in registers). Not completely sure about ARM.


newer pascal strings are #0 terminated; original pascal strings were 255 chars max, with string[0] being the string length. They were also not Unicode friendly. Newer pascal compilers still support this as the shortstring type. I use pascal-style strings in my embedded C library I wrote in 1989; they're much more efficient on 8/16-bit systems
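
Presumably something like this (a sketch of the classic ShortString layout, not the actual library code):

  #include <stddef.h>

  /* Classic Pascal ShortString: s[0] holds the length, the characters
     live in s[1..s[0]] -- a natural fit for 1-based indexing. */
  typedef unsigned char pstr[256];

  static size_t pstr_len(const pstr s) { return s[0]; }

  /* O(1) length and no scan for a NUL terminator -- cheap on 8/16-bit systems */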


This is far easier and clearer than the entire article. Good stuff.


0 for offsets, 1 for ordinals, that's usually the rule. Of course, both are conventions.


I always assumed it started at zero because otherwise you don't use the full range of an unsigned integer and your maximum capacity can only be an odd number (for whatever you use indexing for, not just RAM addressing).
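
E.g., scaled down to an 8-bit index to make the ranges obvious (illustrative only):

  #include <stdint.h>

  int buf[256];
  uint8_t i = 0;    /* an 8-bit index has 256 values: 0..255 */

  /* 0-based: buf[i] can reach all 256 elements.
     1-based: index 0 is dead, so at most 255 elements are reachable,
     and the largest usable capacity is the odd number 2^8 - 1 = 255. */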


Other advantages of zero based indexing, beyond being 'closer to the machine':

It works better with the modulo operator: `array[i%length]` vs `array[(i+length-1)%length+1]`. Or you would have to define a modulo-like operator that maps ℕ to [1..n].

It works better if you have a multi-dimensional index, for example the pixels in an image. With 0-based indexing, pixel `(x,y)` is at `array[x + width*y]`. With 1-based indexing it is at `array[x + width*(y-1)]`. You might argue that programming languages should support multi-dimensional arrays, but you still need operations like resizing, views, etc.
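
Both points as a minimal C sketch (names mine):

  #include <stddef.h>

  /* circular buffer: 0-based wraps with a bare modulo */
  int ring_at(const int *ring, size_t length, size_t i) {
      return ring[i % length];
  }

  /* row-major image, 0 <= x < width: no corrective -1 term */
  int pixel_at(const int *image, size_t width, size_t x, size_t y) {
      return image[x + width * y];
  }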


Another advantage is with ranges: 0-based indexing and exclusive ranges work well. This is apparent with cursor position in text selection

Consider:

    Characters       h e l l o
    Cursor index    0 1 2 3 4 5
    Char index       0 1 2 3 4
    Range [0,3)     [0,1,2]
    Range [2,5)         [2,3,4]
    Range [1,1)       []
If we used 1-based indexing and exclusive ranges, it leads to ranges where the end index is greater than the string's length...

    Characters       h e l l o
    Cursor index    0 1 2 3 4 5
    Char index       1 2 3 4 5
    Range [1,4)     [1,2,3]
    Range [3,6) (!)     [3,4,5]
    Range [2,2)       []
but if we use inclusive ranges, it leads to ranges where the end index is less than the start index...

    Characters       h e l l o
    Cursor index    0 1 2 3 4 5
    Char index       1 2 3 4 5
    Range [1,3]     [1,2,3]
    Range [3,5]         [3,4,5]
    Range [2,1] (!)   []
Also:

    Characters            h e l l o
    Cursor index         0 1 2 3 4 5
    0-based range [0,3)  [0,1,2]
    1-based range [1,4)  [1,2,3]
for the 0-based range [0, 3), the left array bracket is at cursor index 0, and the right bracket is at index 3. With 1-based indexing it doesn't work like that because the range is [1, 4)


This is the bane of my existence working with Lua.

Iterating an array or adding to the end are fine, we have ipairs and insert for that, but with ranges on strings I'm constantly having to think harder and write more code than necessary.

I love the language, wouldn't trade it for another, but the 1-based indexing on strings, which represents an empty string at position 3 as (3,2), it's egregious.

Not as egregious as a dynamic language where 0 is false though.


> a dynamic language where 0 is false though.

well, C also considers 0 to be false (and you can argue that C is "dynamic"!).


Yeah, offsets are just easier to mathematically manipulate than ordinals. It's not just pointer arithmetic where it matters that item i corresponds to start+i*step. Any time you want to convert between integer indices and general linearly-spaced values, 0-based indexing is more convenient.


In many domains code maintenance is more important than hardware costs. In many domains 1-based indexing is a better fit, meaning less conversion code, meaning simpler code. Thus, the best indexing choice depends on the domain and circumstances, as do many controversial questions. Most tend to specialize in specific kinds of domains and over-extrapolate their experience into other domains.


I'd agree, so why did languages that allow specifying the base die out (other than maybe VBA)?


Because changing the base is a great source of bugs.

That said, not all languages have given up on this. For example Julia allows it. https://docs.julialang.org/en/v1/devdocs/offset-arrays/

Ironically I learned this from a discussion of Julia bugs. Apparently changing offsetting of arrays has proven to be a source of bugs in Julia. So maybe someday they will come to the same conclusion as languages like Perl and stop allowing it.


I'd argue 0-base is a source of bugs too! Ideally we'd be able to catch more array indexing bugs at compile time - there are definitely cases where it should be possible to determine that arrays are being incorrectly indexed via static analysis.


The problem is that libraries which assume 0-base break when you have a 1-based array. And vice versa. Trying to combine libraries with different conventions becomes impossible.

Therefore changing the base leads to more bugs than either base alone.

That said, the more you can just use a foreach to not worry about the index at all, the better.

Of 0-based and 1-based, the only data point I have is a side comment of Dijkstra's that the language Mesa allowed both, and found that 0-based arrays led to the fewest bugs in practice. I'd love better data on that, but this is a good reason to prefer 0-based.

That said, I can work with either. But Python uses 0-based and plpgsql uses 1-based. Switching back and forth gets... annoying.


I'd expect the compiler not to let you pass a 0-based array to a library function expecting a 1-based array. I'm pretty sure that's how it worked with Visual Basic, which was the only language I ever used such a feature in.


You are demanding a lot from the type system.

Search for OffsetArrays in https://yuri.is/not-julia/ for practical problems encountered in trying to make this feature work in a language whose compiler does try to be smart.


The type system does tell you if this is used. `::OffsetArray`.


Yes, but did the programmer tell the type system that they are expecting an OffsetArray, they have tested it, and it will work correctly?

The existence of a mechanism does not guarantee its correct use. As that link demonstrates.


A disadvantage comes to mind. While the following loop works as expected:

    for (size_t i = 0; i < length; i++) ...
The following causes an unsigned integer underflow and is an infinite loop:

    for (size_t i = length - 1; i >= 0; i--) ...


As Jens Gustedt points out[1], the following intentional unsigned overflow works perfectly for downwards iteration (even when length is 0 or SIZE_MAX), though it looks a bit confusing at first:

  for (size_t i = length - 1; i < length; i--) ...
You are also free to start at any other (not necessarily in-bounds) index, just like with ascending iteration.

[1] https://gustedt.wordpress.com/2013/07/15/a-praise-of-size_t-...


Principle of least surprise violated.

Also, that behavior is not guaranteed. The programmer would need to be aware of how the particular machine in question actually handles that.

Then again, that's C.


Unsigned integer underflow and overflow are both guaranteed to wrap by the C standard.
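
For example:

  size_t i = 0;
  i--;    /* well-defined: i is now SIZE_MAX, not undefined behaviour */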


Uh I'm confused but don't know c++.

why doesn't that loop end instantly?

I mean length - 1 < length should always be true, right?

Or does it only terminate when the number underflows? Terribly confused here


For loops are translatable from:

  for(initialize; condition; increment) { ... }
to:

  initialize;
  while(condition) {
    ...
    increment
  }
(more or less, some scoping things not encompassed by the above; this is also how pretty much every for loop in a C-syntax language works) The condition of a for loop is equivalent to a while loop's condition. So yes, length - 1 < length will be true on the first iteration, which is fine because the loop continues as long as that condition is true.

What the above approach takes advantage of is that when underflow eventually happens you'll have this condition:

  MAXINT < length
Which will terminate it for all possible values of length.


It's an unsigned int, so past 0 it overflows back to the maximum


Ooh, I see. I wasn't aware that they under/overflow at 0 when they're unsigned. Thanks for broadening my horizon!


It loops while the condition is true. When an underflow happens, it stops being true.


It’s a condition to run, not a condition to stop


That was my reaction - why should anyone think it might be the latter? Are there languages that do have such a syntax without explicit keywords ("do...until")?


I think Lisp or Scheme does. I was often confused by that when I was playing with it


The loop continues until i transitions from 0 to 0 minus 1. 0-1 in this case doesn't equal -1, since size_t is an unsigned type; instead it wraps around to the largest possible positive integer. TLDR; yes, as you speculate, it terminates when the number underflows.


the footnote [1] should be [0], just for the sake of this very topic.

Seriously though, while the idiom does work for unsigned integers, it's a bad idiom to learn [makes code reviews harder]. The post-decrement one in the loop body works with everything (signed/unsigned), and it's well known.


In that specific case I'd do the following:

    for (size_t i = length; i-- > 0; ) ...
Or count from `length` to 1, but subtract 1 in the loop body, or count up and subtract the length in the loop body. Any modern compiler should be able to optimise these to be equivalent.

In the majority of cases, counting down is not necessary. Nor is ordered iteration. Most languages have a `for each` style syntax that's preferable anyway.
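Sketches of those alternatives, for reference (arr, length, and use are placeholder names):

  // Count down from length to 1, subtracting 1 in the body:
  for (size_t i = length; i >= 1; i--)
      use(arr[i - 1]);

  // Count up, deriving the descending index in the body:
  for (size_t i = 0; i < length; i++)
      use(arr[length - 1 - i]);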


Ah yes, the goes-to operator -->


And the wink operator, so the compiler knows you know the deal


I like how GP is called "operator-name" and, instead of doing it himself, he gets others to joke about operator names. Although I'm not sure if it's altruism or highly manipulative behaviour.


Gold - have an upvote!


Or alternatively:

  size_t i = length;
  while (i--) ...


Unless I am remembering C wrong, this would work but the

   for (size_t i = length; i-- > 0 ; ) ...
that several other people posted would not execute for index 0. Shouldn't it be this instead?

   for (size_t i = length; --i > 0 ; ) ...


the correct way/idiom to reverse iterate an array is

  for (size_t i = length; i-- > 0; )...
It's surprising how often the issue pops up; it works well with both signed and unsigned integers.

(edit) I started with one-based indexing (BASIC)... mixed with 0-based (assembly), more 1-based (Pascal), then more stuff (all zero-based). I have yet to see a real advantage of one-based indexing... after the initial learning process.


Is this cache friendly?


Sure, why wouldn't it be? As far as a cache is concerned, I don't think reverse sequential iteration would be any different than forward sequential. The actual RAM accesses may be less optimal if there's some speculative pre-fetching with assumed forward sequential access, but that's conjecture.


With some exceptions, hardware prefetch works in terms of ascending accesses. To learn if a particular CPU will prefetch for descending access, benchmarking is essential. Best to use software prefetch calls if performance is critical.


I would suspect that the cache prefetch/prediction could use the "velocity" of the memory access to predict the next access; so if the access pattern was going backwards, the "velocity" would be negative, but prefetching would still work if they just followed the predicted pattern.


It is not, unless your compiler is smart enough to recognize reverse iteration and prefetch appropriately.

If performance matters, you should experiment with __builtin_prefetch, which is available in clang and GCC.
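For illustration, a sketch of a descending loop with an explicit prefetch hint; the 8-element distance is a made-up tuning parameter that would need benchmarking, and arr/length are placeholder names:

  long sum = 0;
  for (size_t i = length; i-- > 0; ) {
      // Request a read (rw = 0) with high temporal locality (3)
      // a few elements behind the current position.
      if (i >= 8)
          __builtin_prefetch(&arr[i - 8], 0, 3);
      sum += arr[i];
  }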


It's not. It was nice on architectures where cache didn't matter much and where subtracting and comparing to zero was just one instruction (looking at you, old-core ARM).


In the C programming language unsigned integers do not overflow. They wrap. This is well-defined behaviour and the example code is simply incorrect. Most modern compilers will give you a diagnostic for this.


If you have foolishly turned off -W in your build system, that could happen. Otherwise, you get a nice warning pointing out your folly.


Unless wrapping underflow is sensible for the domain (which it isn’t when representing the size of something), unsigned integers are usually a bad idea.


You can always rip a page out of C++’s playbook:

    for (size_t i = length; i > 0; i--) {
        // ...
        item = array[i - 1];
    }
(This is how reverse iterators work in C++.)


  for (size_t i = 0; i < length; i++) {
      size_t j = (length - 1) - i;
      ...
  }
EDIT: change i to j


That's quite broken (you'd want a different variable inside the body, vs clobbering the iteration counter, else this would process the last item in your list, then exit).


I hope you haven't done this anywhere.

Makes my brain hurt, but I think this will only run through the loop one time looking at the last element of the array.


Uh no don't reassign the loop variable in the inner scope. Use:

    const size_t j = (length - 1) - i;
in that case. Much safer.


You should save the old value of i somewhere and restore it at the very end of the loop. Or simply define a new j like another comment says.


A rare case where 1-based indexing is more convenient is complete binary trees laid out breadth-first (as in a standard binary heap): parent is i div 2 and children are 2i and 2i+1 when starting at one and who knows what when starting at zero. But that’s the only one I know.
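In C, the 1-based navigation is just shifts and adds (a sketch; the helper names are illustrative):

  // 1-based heap layout: root at index 1, slot 0 unused.
  size_t parent(size_t i) { return i / 2; }
  size_t left(size_t i)   { return 2 * i; }
  size_t right(size_t i)  { return 2 * i + 1; }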


Except 1-based indexing is what we use in normal language. We don't use "zeroeth" or "player (number) zero" etc. And the word "first" is shortened to 1st etc. Personally I think we'd be better off if programming languages stuck to the same convention - off-by-1 errors aren't the hardest problems to deal with but they're still annoying.



Sure, it's technically a word, but hardly one you'd casually drop into your conversations (with non-programmers)


> "player (number) zero"

That would be because any game worth playing has at least one player... and so it's natural to continue from there. (In terms of language.)


You might have heard of Conway's Game of Life.


Fair :)


That's confusing ordinal and cardinal numbers. The element with index 0 is the first element. The element with index 1 is the second element, and so on.

Using the term "zeroth" is basically some form of showing off (even though it's kinda fun), but will be utterly confusing when you get to the fifty-second element which is the last in a group of 53 elements.


I'm not confusing them, my point about abbreviating "first" as 1st was that in typical speech we start counting at 1. Nobody says "let's start with item zero on the list". But programmers are stuck with having to say/think "item 0 in the array".


I don't disagree with you. I just don't think a programmer should be confused about the statement "item 0 is the 1'st item".


The 2 major blunders in programming: 1) Off by one errors.


With 0-based indexing the children are at 2i+1 and 2i+2. The parent is at (i-1) div 2.

Not hard to figure out.
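As a sketch (helper names illustrative):

  // 0-based heap layout: root at index 0.
  size_t parent(size_t i) { return (i - 1) / 2; }
  size_t left(size_t i)   { return 2 * i + 1; }
  size_t right(size_t i)  { return 2 * i + 2; }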


> With 0-based indexing the children are at 2i+1 and 2i+2. The parent is at (i-1) div 2.

> Not hard to figure out.

While that's true, "you just shift by 1" applies equally well to all arguments for or against 0-based indexing, so deploying it here probably won't convince anyone.


I was not trying to convince.

That said, the effort of one versus the other is so trivial that there is no point in ever using effort as an argument either way. Doubly so because what seems like effort to us is simple unfamiliarity.

What is important is which one leads to more careless errors in practice. As a trivial example, consistent indentation takes effort, but failing to do it leads to more careless errors. Therefore everyone indents code.

The only data point I've seen on that is the side remark about Mesa in https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E.... That remark, therefore, is the only argument that I care about.


An incredibly unrare case happens all the time in my work: array[length - 1], or variations of this. Anything involving the last element of the array, and often iterating through the whole array will use something similar at some point.


This is the reason I always think of when I hear the question.


I think the 1-indexing folks would have to argue that a%b should return a value from 1 to b inclusive. This does make the same sort of intuitive sense as 1-indexing. For example we number clocks from 1 to 12.
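In C terms, a sketch of such a clock-style modulo (mod1 is a made-up name; assumes a >= 1 and b >= 1):

  // Returns a value in 1..b instead of 0..b-1:
  // mod1(12, 12) == 12, mod1(13, 12) == 1.
  int mod1(int a, int b) { return (a - 1) % b + 1; }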


% is defined as (mostly) a remainder operator. I don't think you want to change the semantics of division itself.


Right. Ultimately the 1-indexers are wrong.


But interestingly, the number at the top is 12, not 1.


No we don't. Clocks start at 0. What time is it when it's half an hour after midnight? 00:30.


But nobody "says" zero o'clock - it's always twelve o'clock!

And the example is in 24hr format - which needs the 00 to differentiate it from 12:30. But if you write in 12hr format, you don't ever use 00 - it's always 12:30am or 12:30pm.


First of all, you don't. Or I guess most Americans don't. I do, or most Europeans do. 12.30am and 12.30pm feel just very wrong in Europe.

But yea, I do agree with you. We don't say 0, we say 12, no matter if it's noon or midnight. That's because we humans avoid saying zero when we mean zero.

It's the same with other comments here, talking about counting seconds: we say "ok go, one, two,..." and not "zero, one, two,...". We say a 3-month-old baby, and not a 0-year-old baby.

We use 0-index in so many things, we just avoid saying the word "zero", and we use other names or other units to avoid that word.


I'm from Sweden, and I certainly would say 00:15 (as zero fifteen). Although the time at exactly 00:00 would be called midnight.


12 is zero, in 12-hour clocks.


Because otherwise you would be wasting a perfectly good number for no reason, which means you need to use more bits to do the same thing.

To write 4 numbers (including zero) you only need two bits

    0: 00
    1: 01
    2: 10
    3: 11
To write 4 numbers if you avoid using the number zero, you need three bits

    1: 001
    2: 010
    3: 011
    4: 100
If you extrapolate that a little bit, you'll realize that you'll need two bytes (1 Byte + 1 bit from another byte) to store the indices of an array with 2^8 elements, which is just dumb.

Some of you might be thinking: you don't need to store it the same way it's written, you can just subtract 1 from whatever the user typed and convert it behind the scenes.

Yes, you could subtract 1, but then you would be making the whole system more complex, opaque, and inelegant, while hiding information from the programmer for no good reason.


The same applies to counting in other bases too. For instance, in 1-indexed counting grids for kids, the last column always feels out of place.

0-indexed decimal grid:

   0  1  2  3  4  5  6  7  8  9
  10 11 12 13 14 15 16 17 18 19
  20 21 22 23 24 25 26 27 28 29
  30 31 32 33 34 35 36 37 38 39
  40 41 42 43 44 45 46 47 48 49
  50 51 52 53 54 55 56 57 58 59
  60 61 62 63 64 65 66 67 68 69
  70 71 72 73 74 75 76 77 78 79
  80 81 82 83 84 85 86 87 88 89
  90 91 92 93 94 95 96 97 98 99
1-indexed decimal grid:

  1  2  3  4  5  6  7  8  9  10
 11 12 13 14 15 16 17 18 19  20
 21 22 23 24 25 26 27 28 29  30
 31 32 33 34 35 36 37 38 39  40
 41 42 43 44 45 46 47 48 49  50
 51 52 53 54 55 56 57 58 59  60
 61 62 63 64 65 66 67 68 69  70
 71 72 73 74 75 76 77 78 79  80
 81 82 83 84 85 86 87 88 89  90
 91 92 93 94 95 96 97 98 99 100


Are you sure it doesn't just look that way because you are used to monospaced fonts? Ask a kid to show you their zeroth finger.


As a kid, I saw these things often on multiple-choice test sheets. The tens digit at the end of each row not matching that at the beginning always seemed awkward, and I wondered why they didn't just start at zero. This was long before I did any programming.

The first year CE being "1" resulting in the new millennium starting at 2001 instead of 2000 also seemed idiotic, and the 1900s being referred to as "the 20th century" was something I always had to consciously compensate for. Numbering items starting at zero would have given us more elegant/less confusing ways to communicate those things.

Zero-based indexing makes things inherently simpler because it matches the way we write numbers (and becomes much more noticeable once you have more than one digit). It's not just an optimization for computers.


Well, given that the year numbering system started in Europe, zero wasn't as broadly accepted as the other digits at the time.

By AD 150, Ptolemy, influenced by Hipparchus and the Babylonians, was using a symbol for zero.

https://en.wikipedia.org/wiki/0#History

To keep it confusing: the traditional proleptic Gregorian calendar (like the Julian calendar) does not have a year 0 and instead uses the ordinal numbers 1, 2, ... both for years AD and BC. Thus the traditional time line is 2 BC, 1 BC, AD 1, and AD 2. ISO 8601 uses astronomical year numbering which includes a year 0 and negative numbers before it. Thus the ISO 8601 time line is −0001, 0000, 0001, and 0002.

https://en.wikipedia.org/wiki/Gregorian_calendar#Dual_dating


I asked my son to show me zero fingers (after showing me 2 fingers and 5 fingers); he closed his fists.


My point exactly - there isn't an identifiable zeroth finger.


There's not a second finger either. You can convey "two" by showing any two fingers.


And? My solution? Let the kid grow up and learn.


Anecdotally, I taught my children numbers starting from zero with a similar routine, based on cardinality. They immediately got it.

My kids' school also uses zero-based tables.

"Natural" is a very loaded and ambiguous term, best avoided if you hope to have sane discussions.


The index is the distance away from the first element. So index 0 is the first element. Index 3 is 3 away from the first element, so the fourth element.

It's the difference between ordinal and cardinal numbers. In other words, counting and indexing.


Well well well...

https://news.ycombinator.com/item?id=5888459 (quoted here for convenience)

—————

Formally, ordinals start at zero too :-) [0]. We owe their latest definition to Von Neumann, but AFAIK, the former definitions were similar. [...snip...]

--

[0] This is meta-meta-contrarianism.

- The layman counts from 1

- The uptight programmer counts from zero, because Dijkstra said so (or so he thinks).

- The meta-contrarian (I used to be one) says fuck it, ordinals start at one.

- The meta-meta-contrarian reads Wikipedia[1], realizes he was formally wrong, and goes one step further in pedanticity, back to zero [0].

That being said, my brain prefers 1-indexing programming languages like Lua, Julia, R and Matlab...

[1] http://en.wikipedia.org/wiki/Ordinal_number

[0] Help! I'm stuck in a Boolean algebra[1][0]!

[1] http://en.wikipedia.org/wiki/Boolean_algebras_canonically_de...

[0] ... where xor is an addition that wraps around ...


Teaching kids about zero I have no issue with.

I do not believe for a second that you taught them to count by holding up one finger for zero, two fingers for one etc.

Reminder that we're specifically talking about array indexing.


Of course there is - I can write “0” on it!


Counting fingers (or other items) isn't a positional numeral system, so it's really apples to oranges. Also, "first" has a meaning of "nothing precedes it", which is separate from the indexing system; it shouldn't mean "at index 1" unless otherwise specified.


Though the idea of an index as a position instead of an address is weird to me, you can indeed make that analogy. The "first" element of an array has the index zero, as in "zero elements precede it".


> Ask a kid to show you their zeroth finger.

They will get confused because everyday counting got it wrong. In a perfect world we would count fingers from 0.


Neither looks like the canonical choice to me. They are both just as good.


Counting grids? What even are these?


Yeah, but then couldn't the compiler just swap the 1 with a 0 and 256 with 255?


That gets really dicey if you're storing the value of the index in a variable. Does the programmer need to know that if it's an INDEX they're storing that goes from 1-256 they can fit that into an 8-bit value because the compiler will magic away that last bit? Speaking of which, how will the compiler know which 1's are really 0's and which 256's are really 255's? Will it compile that 1+1=1 if I use that result as an index? Just having a 0th index is much simpler than all of that noise.


Not really? It just changes how the compiler emits the x := *(addr + idx) operation, where instead of doing something like

    ldr x, [addr, idx]
it just does

    sub t1, idx, #1
    ldr x, [addr, t1]
If we're talking about C, accessing invalid/OOB indices is undefined, and so if idx happens to be zero and unsigned, we'll wrap around and hit something unexpected, which is fine.


That's what I was thinking too, except that they have a point. You might end up using more bytes to store indexes as variables (if the index is 256 for example), than you would with zero indexing.


Someone linked to your Johnny Decimal system earlier today. https://johnnydecimal.com/ I liked the idea, but I noticed that it didn't seem to be 0-indexed. If I adopt this I'm definitely going for zero indexing.



You could have a 1 based type that doesn’t include 0. Or make zero the max value.


From the article:

> It's not the reason you think. No, it's not that reason either.

Indexing is a language-level concern; it has nothing to do with the generated code.

Unless you write machine code I guess.


> Indexing is a language level concern

IIRC, Visual Basic 6 took a "why not both?" approach to array indexing.


I like how the article summarizes this idea:

> These all point to one reason why 0-indexing might be preferred: it matches machine semantics more


But then of course the reason why 1-indexing is better (or really, arbitrary indexing, which is even better) is that it matches human semantics better.


Edsger Dijkstra wrote an interesting article titled 'Why numbering should start at zero'. Perhaps not answering the question directly, but an interesting read nonetheless.

https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...


As the article implies, Dijkstra was specifically wrong about FORTRAN, as defined at the time — the '77 standard, when it was still only conscionable to SHOUT 6 alphamerics.

“… you probably know that arrogance in computer science is measured in nano-Dijkstras.” — Alan Kay


"This quote keeps on showing up out of context." — Alan Kay https://news.ycombinator.com/item?id=11799963


Well, I did hear it in context, and have also heard him talking about concocting aphorisms. Kay also said “He did like to pull chains …”


I've never understood the first half of Dijkstra's argument. Unless we are programming a theoretical Turing machine, there will also be a maximum number that can be represented, so the argument about using inclusive bounds for the lower end also applies to the upper end. So you can't represent all numbers with a single range style. You either end up excluding the smallest number, the largest number, or the empty set. Or you need special notation to handle one of those three cases.

I suppose the 0,1,infinity principle argues that including the empty set and smallest number is more important than including the largest number, so that notation should be preferred by default. And when writing using A<=i<B it is clear enough, but once you remove those symbols and start using them in indexing or function call syntax, like [A:B] or range(A,B), the inconsistency of bounds really breaks my brain, and having to pass a number larger than the largest array index as an upper index range bound gives me nervous tics. I'd rather use inclusive bounds everywhere and have special syntax for the empty set.


Sure, but it's much more common for a range to include 0 than INT_MAX.


Interestingly, that Dijkstra article was originally published/circulated in handwritten form. It helps showcase the author's handwriting skills.

https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF


The handwriting skills were common back in the day... By the way, are you sure that it was Dijkstra himself who handprinted these texts (rather than have, say, his secretary do that)? Because this is not exactly the (beautiful) handwriting as it was taught back then.


He handwrote them, and occasionally typed them (especially early on, I gave up trying to find the transition point). It was something he was known for. If you read the EWD's over the years you'll see the same (or very similar) handwriting throughout, which would not be the case if he had a secretary writing them for him (who wouldn't have been the same person as he moved between countries).

https://en.wikipedia.org/wiki/Edsger_W._Dijkstra#EWD_manuscr...

https://www.cs.utexas.edu/~EWD/ - Pick random ones from different decades and you'll see very similar handwriting.

(Didn't read these, picked two at random):

2001 - https://www.cs.utexas.edu/~EWD/ewd13xx/EWD1307.PDF

1984 - https://www.cs.utexas.edu/~EWD/ewd09xx/EWD901.PDF


If you ask people which floor of a building they're on, it's going to depend on which country they're in. In North America, at least, the first floor you walk into (in a sane city: I understand there are some which do not qualify in this respect due to hills or historic disaster recovery) is the first floor. On other continents, you enter the ground floor and need to take stairs or an elevating device to get to the first floor.

"Natural" zero-based counting at its best.


A big international corporation (Siemens) had, at one point, standardized their buildings worldwide to have room numbers starting with 2 on the ground floor, 3 on the floor above, etc., leaving space for two basements.

All room numbers were 5 digits, and skipped numbers if there were multiple windows in one room, based on the theory that such a room might later be sub-divided, and wanting to avoid renumbering of rooms down the hallway.

(I encountered this around 2003, so it might well have changed since then; didn't find any references with a quick search.)


I worked for them relatively recently and didn't encounter this; or at least never noticed. Then again, our office building was an open office plan where desks next to the windows and rooms (conference or office) were all in the center of each floor.


The ground floor is 0. The floor above 0 is 1. The floor below 0 is -1. Anything else is crazy TBH, the whole point of integer numbers is to count things that start at a defined point and go opposite ways. Why shift it and then reinvent weird pseudo-negative numbers like S1?

But then again people measure distance in feet and write dates as month day year :)


It would be nice if this was the case. In grad school I taught in a building that was on a hill and had once been separate buildings, so different entrances were on different floors and the transition between the former separate buildings was 4 stairs.

So there was a floor with rooms numbered 1xx and 0xx. There was also a floor below the 0xx floor. They just numbered it 00xx. And you could enter the building on any of those floors depending on what door you picked.

The class I taught was in 0005. Room 005 was someone's office. So on the first day of every semester we would have students waiting outside the door of 005 instead of 0005...


You'd think if they had to go below 0 they could just prefix with a letter, like B (Below? Basement?) B1, B2, etc.


Could this be Dwinelle Hall at UC Berkeley?


nope. IU


Cool, Dwinelle at Berkeley is a fascinatingly bizarre building for seemingly very similar reasons, and (if I remember correctly) also with pretty much the same weird numbering system plus tiny staircases to adjust for mismatched heights between floors!


Counterpoint: because floors are sum types.

Floors 1 through 10 are "regular" floors. L is the lobby. P1 through P3 are the parking (basement) levels.

So you have three types: regular, lobby, and parking. And the elevator labels are of type regular | lobby | parking.

(Now you can't disagree with me because sum types are popular on HN, and I have framed this in terms of sum types.)


Sum types don't automatically define ordering between the possible values AFAIK? So in your implementation you'd have to manually define the

    up: (regular | lobby | parking) -> (regular | lobby | parking)

    down: (regular | lobby | parking) -> (regular | lobby | parking)
and

    which_way: (regular | lobby | parking) x (regular | lobby | parking) -> direction
If you use a subset of the integers you can just use the operators inc, dec and <.

Yeah I know it was a joke.


If I build a building in the shape of a double helix, what order are the two floors in?

Hmm, are there actually an infinite number of floors? Do we need to switch to floating point?


> If I build a building in the shape of a double helix, what order are the two floors in?

Why separate the floors that are on the same level?

> Hmm, are there actually an infinite number of floors?

I assumed there isn't and that we use a subset of the integers. If you have an infinite number of floors, feel free to use the full set of integers (if you have enough memory, that is).

> Do we need to switch to floating point?

No point, if you have more than aleph0 floors - floats won't help you.


We have a building at the local university where the ground at one end is level with a transitional step between floors -1 and 0; you enter the building through the stairwell and descend or ascend to get to a floor. I wouldn't be surprised if an architect designed a building that did this on flat ground, violating the ability to number floors relative to ground with integers.


This is how it should be, but the only country in the world, AFAIK, that does it that way is Germany.


I think the only country I've been to which does not do this is the US. It was pretty weird at first.


There’s some confusion here about the exact word being used. The German word that is mistranslated here as “floor” is “Stock”, which actually means “addition”. So the floor above the ground floor is the first addition, not the “first floor”.


Not German here; this is standard and I'm a bit confused by your comment.

0 is ground floor.


When I was at Leeds University the ground floor of the Physics Admin building was Level 6 (it went up to Level 11). It was built on a hill and according to legend this allowed room for expansion (which never happened) down the hill without requiring negative floor numbers.

To add to the fun the only continuous corridor through the entire T-shaped building was on Level 10, one end of which (the base of the T) was at ground level on its part of the hill. There was a famous 'kink' in the middle of the top / cross-bar of the T (apparently the longest corridor in Europe) because they started building from each end and were slightly off.


I always said it was an argument between the counting of things versus the relative offset of things.

This is a great example.


I call `array[0]` the first element. I don't know what you'll think of that.

I wonder how the number of people who like arguing about starting from 1 vs 0 compares to the number of people who were actually confused by it for more than a day.


The Patterson Building at Acadia University in Wolfville, NS, has a basement and four floors. The rooms on the first floor are numbered 1XX, second floor 2XX, etc. up to the top floor 4XX.

The elevator, however, goes from floor '1' (the basement), to floor '5' (the top or fourth floor).

This really confuses visitors.

The Elevator Mafia want $10,000 to fix it.


I've been living in North America for decades. Every building has G (zero). This is the floor that you walk into and that has a lobby. The next floor up is 1 and is labeled as such in the elevator.

So your initial assumption is incorrect from my experience.


A lot of elevators in the US do indeed have G for the ground floor, but that's because they are made by ThyssenKrupp AG, a German company ;).


This is not my experience in the US. Most places either have G or 1 for the first floor. Then 2 for the next. So some elevators say G, then 2, 3, and so on.

I've seen setups like you mention, but they definitely aren't the majority in places I've lived.


I'll have to look actively next time I ride an elevator, this is just from my recollection.


What city are you in, out of curiosity?


SF Bay Area (currently), but have lived in several major metro areas.


It really comes down to a choice between a machine-focused (0) or human-focused (1) approach.

The 0 makes a lot of sense in a C pointer world where memcpy and similar functions can be written very tight.

The 1 makes a lot of sense in a human world: when we count, we start at 1, we talk about the "1st", counting on fingers starts with 1, etc.

I once was at a Lua (1-indexed language) conference where this was discussed, and Luis started explaining why Lua was 1-indexed with this sentence: "The 1st argument ...." :)


> It really comes down to a choice between a machine-focused (0) or human-focused (1) approach.

Just think of 0-based as offset-based and 1-based as index-based. Both are intuitive just like that. I never get why people arguing over this bring pointers and memory (or anything computer related) to the table. No normal person is going to understand that, but everyone understands that if you don't move at all (0 offset) you stay at the first (index 1) item. Add one and ... you get the point.


"Offset-based" is bringing in pointers; that's the thing it's an offset "from". "The beginning of the array" is just a pointer.

I suppose saying that does have an advantage over explicitly talking about pointers, in that the word "pointer" is a piece of jargon that has a lot of baggage. That's just avoiding jargon, though, not really using a different model.


No, offset means "measuring from origin"; pointers use that language, they don't provide it.

The advantages in measuring from origin can accrue to the person choosing to do it, because there are other reasons to do so which aren't satisfying the CPU.


No, offset means "measuring from a defined location". In a computer, the choice of location is more or less arbitrary[0]. If you pick the first element, you get zero indexing. If you pick a space before the first element, you get one indexing. Either one is a pointer, since a pointer is just computer jargon for "a defined location".

Calling that defined location "origin" doesn't suddenly make it not a pointer.

[0]- NUMA and cache effects aside, as they don't matter for this purpose


Offset-based lets you refer to an abstract location that is after the last element without resorting to n + 1.

  [ ] [ ] [ ] ...    [ ]
 0   1   2   3   n-1     n

We can regard this n as a "virtual zero", and then make it possible to index the n - 1 element also indexable as just -1. The index -n then aliases to 0.
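A sketch of that aliasing in C (wrap is a made-up helper; i may be negative or out of bounds, and n must be nonzero):

  #include <stddef.h>

  // Maps any i onto 0..n-1, so -1 names the last element
  // and -n aliases to 0.
  size_t wrap(ptrdiff_t i, size_t n) {
      ptrdiff_t m = i % (ptrdiff_t)n;
      return (size_t)(m < 0 ? m + (ptrdiff_t)n : m);
  }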


I must say in this case Google does it right: https://cloud.google.com/spanner/docs/reference/standard-sql...

In Google standard SQL, to access an array it is simply not allowed to put a number inside square brackets. You must specify which way you mean. So `SELECT some_numbers[OFFSET(1)], some_numbers[ORDINAL(1)]` is allowed but not `SELECT some_numbers[1]`.


That sounds actually great to avoid any ambiguity. At the cost of a bit of verbosity, though.


That's actually good verbosity, because the intent is very clear using this method, and multiple people can read the code and unambiguously identify the intent without having to talk to the original author.

This type of verbosity makes sense in a big organization where the left hand doesn't talk to the right hand much, so communication naturally evolves to happen at the code level.


Indeed.

I like, for example, using enum types to ensure that what's actually passed is the expected kind of value, even though fundamentally the semantics need not be much more complicated than integer values. There could be the same thing with a distinction between offsets and indices as two different integer-like types, to avoid any ambiguity.
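A sketch of that idea in C, using single-member structs as deliberately incompatible wrapper types (all names invented):

  #include <stddef.h>

  typedef struct { size_t v; } Offset;   // 0-based
  typedef struct { size_t v; } Ordinal;  // 1-based

  // Passing the wrong wrapper type is a compile-time error.
  int get_by_offset(const int *a, Offset o)   { return a[o.v]; }
  int get_by_ordinal(const int *a, Ordinal n) { return a[n.v - 1]; }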


But I think that still leaves open the question of "offset from what?"

If you move zero from the first element, you're still at the first element... but if you move zero from the fourth element you're still at the fourth element. If you move one from before the first element, you're at the first element.

I think you're more or less right, but I think we still need something to motivate the first element as the point of reference, and the machine focus is one way to do that.


In some languages/environments, the array is literally the same as the pointer to the first element.

Other languages are more sophisticated (like humans are!) and can say that position 0 does not exist. An array of size 0 has no elements. An array of size 5 has elements 1st through 5th. An array of size N has 1st through Nth, inclusive.


"ordinal" is preferred to "index", as "index" doesn't naturally suggest starting at "1", or even restricting the key to integers. "Ordinal" starts from 1, everyone agrees (except mathematicians, of course :-) ).


Ask 100 random people on the street and I'd be surprised if even 1 knew the definition of "ordinal". It's an uncommon word.


Do people not study grammar in American schools? I thought all kids learn the distinction between cardinal (one, two, three) and ordinal (first, second, third) numbers.


At the age that cardinal and ordinal numbers are taught, kids simply don't remember "cardinal" and "ordinal", they remember "one, two, three" and "first, second, third".

99% of the instances I've seen "ordinal" outside of this thread has been in code/documentation. It is not a common word in everyday language.


The English grammar that Americans study in school is likely somewhat different than the English grammar that is taught outside of America as the goal of the latter is likely focused on helping students map their native language onto English. I would agree that `ordinal` is uncommon word for many Americans, but it wouldn't at all surprise me if there were languages where the equivalent word was far more common, and therefore its use and translation was a part of standard English as a second language curriculum.


> The English grammar that Americans study in school is likely somewhat different than the English grammar that is taught outside of America as the goal of the latter is likely focused on helping students map their native language onto English

English is spoken as a native language in many countries outside of America.


So is "array". What's your point? My point is that it is unambiguous once you say what you are talking about.


I disagree. The CS definition of array, sure, but if you were to say “we have an array of options to eat”, most people with a high school education would know what you mean.


And then the counting niceties come along: those perennial off-by-one mistakes are caused by the fact that, e.g., 4 and 5 are two numbers but they are only one apart.


> when we count, we start at 1, we talk about the "1st"

Although often with an implicit zero. Under typical North American culture, your 1st birthday, for example, is more accurately the first anniversary of your birth. Your birth is zero-indexed.


It's fundamentally a distinction between end-index and start-index.

Most human counting uses end-index. I.e. "1" is after 1-thing (has passed, is physically obtained, etc.).

Most computer counting uses start-index + length, for efficiency and to better generalize. I.e. "1" is at the memory address immediately before the "1"st item.

Which ultimately creates the "0 index is 1st thing" linguistic confusion.

PS: Also, language predates computers by a few years, and the concept of zero is hard.


Good take!

I think a more accurate and complete idea is that humans refer to a thing in its entirety, with things being lined up and scanned in order as only a potential convenience.

If you ask someone to identify an object, they'll point to the middle (or the center of the most important component) of the object, not to the 'start' or 'end' of it in their field of vision.

That said, the human perspective is subtle and convenient in completely different ways to how a computer manages memory, and this 'end-index' idea seems like a useful way to map the human whole-object perspective to a linear memory-index perspective.


They may point to the middle, but that's not a reference to half an object. ;)

The assumption is that the thing is its entirety, as you said.

Hypothetically, I imagine an array index reference, in human terms, would be communicated as "this thing starts here" or "this is the beginning of this thing."

Which isn't a concept or phrasing we have much occasion to use, other than for routes or long length-measured objects?


Birthdays are anniversaries. Anniversaries are annual activities when we celebrate/honor past events. They do not include the event itself. The first anniversary is 1 year after the initial event.


Yes, that’s the explanation for how we count them as we do, but it has no greater weight (and I’d argue less weight) as to why than saying whether a[0] or a[1] should be the first element in an array.

I say it has less weight as birthday is a compound word, the root words of which suggest that your first birthday could logically be the day of your birth rather than a year after it.

My first weddingday was not a year after I got married. My first graduationday was not a year after I graduated.


> Birthdays are anniversaries.

Indeed they are, which is why I literally said so in the previous comment. Did you forget to finish reading it?

> They do not include the event itself.

That is true because the event itself is implied information. There is no value in speaking of it. If you stand before us, we can be certain that you had a day of birth (your 0th anniversary). We don't necessarily know how many times you've gone around the sun following that, however, so that is where we find value in communicating additional information.

If you were counting apples, there is the state where you have no apples (index 0), the state where you have one apple (index 1), the state where you have two apples (index 2), etc. When counting you don't need to worry about index 0 because the no apple state is naturally implied. It only becomes interesting and worthy of communication when you have at least one apple to speak of, thus you start at 1. The state found at index 0 is still implicitly there, though.


If I care about apples (for my lunch, or my store), I care about the difference between having 0 of them and not having bothered to count yet.

0!=null


If no apples is the unusual state, like you expected to find an apple in your lunchbox but someone ate it without your knowledge, then certainly there would be reason to communicate the no apples state. It is ignored in communication when it does not provide useful information, but it is not forgotten. The 0th index is implicitly there.


Come to the infinity store - we literally sell everything! (some items may be out of stock)


Yes, the infinite state is also implicitly found within the state set. Conveniently found at the infinity index.


Ok then everyone must do it like us, right?

There's no culture where you're born at 1 year old and turn 2 on New Years when everyone gets older?

You sure?


Yeah age is confusing even for non-computer-folks :)

If you have 4 classes in school today, you would never talk about the 1st class as number 0, the last one is the 4th, not the 3rd.


Although, there is a fifth state in your example: when you are not in class. Which is different from an empty set that implies nothingness. When it comes to age, 0 being birth works well because there truly is nothingness (from your perspective) before birth. When counting from 1 there is a suggestion that there are variables that aren't worth speaking of because they are obvious.


Assuming there is nothingness for the fetus the entire nine months in the womb. For that matter, I can't recall being younger than four, so it's all nothingness before then.


Not really. Perspective isn't scientific and is often cultural. Koreans, for example, famously have a different take and use a different counting system to accommodate it.

I am assuming that the reader is biased towards the average HN user. That won't always work for everyone who will come across my comment, but close enough for an unpaid contribution. I'm not about to write a novel to make sure I catch every edge case.


That's why I prefer the Superior(TM) Korean age counting. You are one year old when you're born (it's your first year!). You are two years old on the next New Year's Day. (Congratulations, it's your second year now!)

So, if you're born on December 31st, you're two years old the next day. (I see no problem, but apparently some people are hung up on such minor details. I can't fathom why.)


> You are one year old when you're born (it's your first year!)

That makes no sense to me. It's also your first century. Does that make you one century old? Of course not! The moment you're born, you're not even one hour old, let alone one day.


They must like rounding up. I guess it depends on how they refer to it in their language. If it's "he's in his 1st year", that's a big difference from "he's 1 year old". I'm wondering if the parent commenter simply doesn't know Korean, or knows it only somewhat and misconstrues the culture behind this.

It could be, though, that Korea simply never encountered the Mayan or Arabic civilizations, as most others did at some point in history; those two are famously known to have independently discovered the concept of zero.

Birth of an array can only logically be defined as index zero, the same as a child. The concept of zero was not universal globally and Korea is pretty isolated.


Except that we actually do use this system for years. Thus 1 BC (the first year BC) was followed by 1 AD (the first year AD). Also for centuries, as in 'the 20th century' being the years 1901 to 2000.


A more sensible English translation from Korean would be to use the phrase "in year X" rather than the phrase "X years old": a newborn is in year 1; after 12 months they are in year 2; etc.

In fact, this whole discussion is more about a choice of phrasing rather than the numbers. When indexing arrays, sometimes we're talking about an offset from the first element (starting with "0"), and sometimes we're talking about ordinal element numbers (starting with "first"). Some programming language designers found that offsets are more useful (because that choice tends to simplify the underlying arithmetic), while others found that ordinal numbers are more useful (because the word "first" should mean "1", to simplify communication between people).


Right. I try to refer to a[n] as 'element number n' rather than 'the nth element'.


> I see no problem, but apparently some people are hung up on such minor details. I can't fathom why.

Americans and their alcohol laws…


This system is used for racehorses as well.


There's nothing wrong with the definition per se, but there's the question of what purpose you're putting it to. We should expect more similarity from two "newly 2" children with the "western" system than the Korean one.


Doesn't Korea use the same new year as China? Usually second new moon after winter solstice.


hung up on minor details!


Rulers too. A ruler starts at 0, not 1.



Nice! I want a Lua ruler now.


That's because it's dimensional. How long is the shortest possible ruler?


1 planck length, and... It starts - and ends - at 1?


No. Your age is 1-indexed. It's a 'birthday' in English and German ('Geburtstag'); in French it's 'anniversaire', and so on. But pretty much everyone indexes age from 1. The fact of your birth is the transition from (legal) non-existence to existence, the equivalent of a dimensionless point.


Age is definitely 0-indexed - a newborn and exactly 1 year old differ by a year. People also count months during the first year, which is still 0 years old. As in 00:xx is the first hour, and 01:xx is the second. If you're 30 years old, it's your 31st year of life now.

But the Gregorian epoch itself is 1-based. 1AD (0001-01-01) goes right after 1BC (-0001-12-31). There was no 0000-mm-dd. That's why the 3rd "millennium" and the 21st century started at 2001-01-01 and not at 2000-01-01. YYYY means not how many whole years have already passed, but which incomplete year is going on right now. On the other hand, your age means "whole years passed since birth [plus maybe a few months]".


Age is not indexed.

And the birthdays are definitely 1 indexed: you denote your first birthday with 1, and not with 0, and so on. You don't denote any birthdays of yours with 0. You may denote something else with it, but birthdays[0] gives an out of range exception. (Especially true in French, where birthdays are called anniversaire, but in English too.)


Oddly(?) though, most parents of young children don't refer to a baby as being "zero years old." Rather, we break them down into smaller units: two days old, six weeks old, four months old.


It helps that change happens particularly fast at that age, so the difference between newborn and 6 months is worth mentioning. Later on the units go up again and we just say "I'm in my 30s" :P

On the other hand a computer/car/house can also be 10 years old but not 0.


Duration only ever departs from 0, it can never arrive there.

That's why zero-based indexing is good!


True, a few weeks can mean a few milestones at that age. It's still wild to me how quickly babies/kids develop. Every week brings new abilities, experiences, and emotions that weren't there before.


> On the other hand a computer/car/house can also be 10 years old but not 0

Really? When I buy a new PC, to me it will be 1 year old only after a year. Before that it is just NEW. Is that what you meant?


Exactly. What we call "new" is the 0th year. It's the same when you start counting seconds. You say "ok counting starts now", then wait a second and say "one", then wait and say "two". Before you say "one", you don't say zero, but it's implied by you waiting a second after you or someone says "go". That's still 0-indexed.


Demographers sometimes describe age as “the number of completed years”, which always struck me as a wonderfully simple explanation.


The first floor in a USA building is not where a European would expect.


No, birthdays are 1 indexed. Your Nth birthday is the day you turn N years old. When you're 1 year old, you've been alive for 1 year, not 2 years.


The moment you are born you are 0 years old (or perhaps 0.75 years old, but we don't usually recognize that). We count from zero in this case, at least implicitly.

In some cultures you are considered 1 the moment you are born, so the zero indexing isn't universal here, but typical in North America as noted earlier.


Typical counting of things starts from 0. If you count apples you implicitly start at zero and add 1 for each apple. If you count age, you start at birth (0) and count years; one for each birthday. That isn't zero based. The difference is the index of the item between the starting point and the next item. In zero based this item is number 0, in one based, this item is number 1. The first year of life is generally considered the year following birth.


This is a philosophically deep point which took me a long time to grasp.

There is a narrowing from "nothing at all" to "zero apples" which doesn't happen 'in the world' but is a necessary precondition to counting apples. The existence of any apple is a requirement for there to be zero of them before you put anything in the basket.


>If you count age, you start at birth (0) and count years; one for each birthday.

nitpick: unless you were born on Feb 29.


I just dealt with this problem (how to store periodic events, including birthdays). It is a surprisingly difficult problem. After reading up on Postgres interval types, my latest attempt uses them to store events, where a per-year event (like a birthday) is stored as "2 months 15 days". It turns out the Postgres project has put quite a bit of thought into making interval types work as expected with regard to months, not an easy task when you consider how difficult it is to treat dates mathematically.

The nice part is that now February 29 "just works"; the downside is the impedance between how months and days are numbered and how an offset from the epoch (beginning of the year) is defined. January 3rd (1-3) is stored as "0 months 2 days", as when it hits, 0 months have passed and 2 days have passed.

So the specific case of February 29 hits when 1 month has passed and 28 days have passed. 3 years out of 4 this will be the same as "2 months 0 days" (3-1), but every fourth year this will be (2-28). As an aside, and the specific reason I went with interval types, every event past February 29 works just fine with or without a leap year; that is, the extra day in the middle does not mess up the offset to days after it.

Honestly I curse a little as I wish months and days were 0-based. At least clocks get this right, almost: 12-hour clocks are a special breed of stupid. They start at 12, then go to 1 and proceed up to 11. 24-hour clocks properly start at 0. The worst part about 12-hour clocks is that they are almost correct: replace the 12 with a 0 and everything would be the same, but now it would make sense from a modular math point of view.

A special curse is reserved when I think about how there is no year zero. https://www.postgresql.org/docs/13/functions-datetime.html#F...


I didn't know that. Do you get to be 1 year old if you are born on Feb 29, to account for leap years?


You're just 1/4 of the age of everyone else born the same year. You have the same number of laps around the sun as those people, but you've definitely had fewer birthdays. It's a common joke for leap year babies.


> If you count apples you implicitly start at zero and add 1 for each apple.

If that's the case, then how did the ancient Greeks or Romans count before zero was an acceptable concept in their counting system?


By describing it in other ways, I imagine: "How many apples do you have?" "I don't have any."


No, we don't count from 0. That's like saying we start counting a baker's cup from 0 because you can have half a cup.


The counting activity doesn't begin when the first item is registered; it begins when the counter is initialized to zero. A decision is made to begin counting, along with the realization that nothing has been counted yet. That's when counting has started. When the first item is seen, the counting is then continuing.

Suppose your job is to count some events. You check in for work at 8:00 a.m., but the first event has not registered until noon. By your logic you should not be paid for four hours, because you're paid to count, and counting started at 1.


Are you sure?

We are talking about the numbering, not the work-doing.

You start waiting at 8, and you wait between events. But you only count (increment) when an event happens.

The 0 comes for free when you are ready to count (hello golang and C++, as well as human intuition).

When you count stars in the sky you don't say "0, 1, 2".

You say "..., 1, 2", or if no stars show up, you say "there are 0" after timer expires.


> But you only count (increment) when an event happens.

Increment what?

> When you count stars in the sky you don't say "0, 1, 2".

No, you say, "I'm going to start counting stars now. Okay, 1, 2, ...".

The preparation part is the zero. You counted stars before and reached some number; you're not starting at that number.


You're still thinking like a computer. Most people think of counting in terms of 'here are some apples, how many exactly?'

If you just look at an empty space, the # of apples is equivalent to the # of dinosaurs, but they're only equivalent by their absence.


No, the mathematical definition of counting (i.e. whether or not a set is countable) involves mapping to the natural numbers, which doesn’t include 0.


I don't think your definition is complete. We can count a set by mapping its elements to the natural numbers, and then identifying that number which is highest. However, we must have a provision for identifying zero as the highest when the set and mapping are empty.


Math counting doesn't care where you start. You could start at -400 if it's useful for the problem.


You can name things however you want, but there is a canonical bijection to zero-based ordinals.

https://en.m.wikipedia.org/wiki/Ordinal_number


What counter? You're saying a counter exists before people start counting? Is this a form of mathematical Platonism?


A counter can be lazily instantiated just before the first item is counted.


What does your stopwatch say right now?

Your pedometer?

Your traffic clicker?


Those are devices, not the spoken language.


So if the counter doesn't speak, there is no counting?


What do you do for a living, and why are they paying you not to understand this?


Do you know the definition of countable? A set S is countable if there is a one-to-one mapping from S to N where N is the natural numbers. Do you know that 0 is not a member of the natural numbers? We literally start counting at 1 by definition of countable.


Nope.

Is the empty set countable? (Yes.)

Dictionary:

nat·u·ral num·bers

  the positive integers (whole numbers) 1, 2, 3, etc., and sometimes zero as well

Countable: https://en.wikipedia.org/wiki/Countable_set

Set theory:

https://en.wikipedia.org/wiki/Ordinal_number


From your own link on countable sets:

> Equivalently, a set S is countable if there exists an injective function f : S → N from S to N; it simply means that every element in S corresponds to a different element in N.

Defining N is usually done via a successor set, in which case 0 makes no sense to include.


A successor set is the set of successors of... 0 or 1, depending on what you are doing.


An empty set is countable; it has an empty mapping to the natural numbers. Its cardinality is zero.


And its ordinalitiy is also 0.

Standard construction of ordinals is that each ordinal is the set of all its predecessors. (0 has no predecessors, hence 0 is the empty set.) (And so finite ordinals have the same ordinality as cardinality.)


Show me a mathematical text where ‘ordinality’ is defined.


Are you a time traveller from the late 17th century?


Can't they just .. disagree?

Birthdays are clearly 1 indexed, the first birthday is indexed with 1, and not with 0.


The day you're born is birthday number 0


Exactly.

Birthday[0] gives you an out of range exception, since there is no birthday called 0th. Birthdays are [first, second, ...] indexed from 1: birthday[1] = first, birthday[2] = second, and so on. That's what indexing a series from 1 means.


Ok, use the baker and cup as an example. If you have an empty cup and put half of a cup of flour in it, you now have 0.5 cups of flour. Notice the zero before the ".5". That is us, normal humans, realizing that until you add enough to have 1 of something, you have between 0 and 0.999 repeating.


When I'm baking I start by zeroing the scale, then adding enough flour to reach 500 g (or whatever the recipe calls for). That's counting from zero.


If we (speaking as a North American) count from 1, does that mean other cultures (e.g. Korean) count from 2?


No, it just means we have a different notion of what a year is in regards to age. Similar to how different cultures can use different units of measurement for length, mass, etc.


Yes, exactly. One of which carries an implicit zero indexing. The date of birth doesn't disappear just because you decided to use a different measuring device.


No, it doesn’t carry an implicit 0 index. Measuring age (in the West) is like measuring distance. You start at 0, but that doesn’t mean the first item is at index 0.


If you took a standard ruler, a measurer of distance and which has literal index marks painted on it, and placed items along it, the item found at the head of the ruler would be found at the 0th index. You're quite right that we think of birth and its anniversaries in the same way.


Birthdays are literally indexed by the moment of birth as zero.

You can't possibly be serious.


I start with an empty cup, then I fill it up to 1 cup.

I then put it in an initially empty bowl.


Yes this is correct!

The milliliters in a baker's cup are a scale based on 0!


No, old calendar systems are one indexed. In those system, there is literally no year zero; the first year is year one. This leads to crazy things like year 100 being part of the "first century" and year 101 being part of the "second century".

That is not the case with age or birthdays which are, thankfully, zero indexed. The first year of human life is age=0, birthdays=0.


So that we don't create more falsehoods programmers believe about dates, ISO has defined decades and centuries as starting with 0, not 1.

So as far as your job is concerned, that's when they start. Hope that helps.


It's not "crazy". No one care about which century X00 is. The term "century" doesn't have enough sig figs. X00 is "the turn of the century".


If you don't care then sure, it's not crazy, but if you did care then believe me, it is crazy. Crazy enough that astronomers[0] and software engineers[1] rebelled against the historians' practice and renamed the years preceding 1 AD in the proleptic Gregorian calendar year 0, year -1, year -2, et cetera. A major benefit of this is that it allows the leap year pattern to stay consistent and the rule to remain legible for all years going back before 1 AD. It's also nice because it lets us say "the 90's were the last ten years of the 20th century" and be correct.

0. https://en.wikipedia.org/wiki/Astronomical_year_numbering

1. https://docs.oracle.com/javase/8/docs/api/java/time/temporal...


Your birth day is the day you are born.

Your first birthday is the first anniversary of your birth day.

We celebrate the anniversaries of our birth day.


Yep, clearer in Spanish. They say cumpleaños, which literally means "completes years".


Haha, this thing is messed up. Your "first birthday" is technically your second, because the first was on your, well, day of birth. But we people love to complicate things and count birthdays as the number of annual celebration events after the first mm-dd of birth. Off by one as it is.


So how old is a not-yet-1-year old?


Elapsed time timers start at 0. Time is continuous. The elapsed time "out of the womb", which we call "age", starts near 0. Five minutes after birth, the baby has been in the world for, or its age is, five minutes. If someone asked how old a new baby is you could hear "3 weeks".

Another definition might be from conception. But birth being "one year old" is illogical. The sperm didn't even exist yet, one year before birth.


Your mistake is using years - use milliseconds (or nanoseconds) when you wish to express a duration. Years don't even have the same duration/length (leap years and leap seconds).


N days or N months depending on how old the baby is.


And before an hour has even passed, we'd probably say "minutes old" or perhaps "not even an hour/day old", all to avoid saying "zero days/months/years old". But some might still say zero days old, and they'd be both understood and correct (at least logically if not stylistically). That's the "implicit zero" everybody is talking about. We avoid saying it, but that's just a convention of communication. Logically it's there. You're zero days old before you're one day old.


And it turns out that 0 days, 0 months is 0 years.


> human-focused (1) approach.

TBH humans would have been better off if we were 0-based, it's just a convention. And we have the confusing language where "20th century" means 1900's. If we wanted to bring the 1-indexing we use in language to its fullest extent here to fix that particular issue, time counting would have to start at 1111. Except that won't work once we reach 5-digit years.

If we started with "zeroth" instead of "1st", this would have solved itself and "20th century" would mean the 20xx's.

I especially don't understand why mathematicians use 1-based indexing (for matrix rows/columns etc...). Like programmers, they should see the advantages of starting at 0 (e.g. the coordinates of subdividing into block matrices are simpler if starting at 0). Mathematicians do start at 0 for the origin of plots, after all.


> If we wanted to bring the 1-indexing we use in language to the fullest extent here to fix that particular issue, time counting would have to start at 1111.

That's funny, but there's actually an elegant way to do it called Bijective numeration (https://en.wikipedia.org/wiki/Bijective_numeration). We're currently living in the 1A22th year.


> The 1 makes a lot of sense in a human world, when we count, we start at 1, we talk about the "1st", counting on finger starts with 1, etc.

It is more like counting from 1 is just a leftover from times when zero was not commonly considered a number. Once one has zero, it makes sense to use it as an initial ordinal (see e.g. set theory, where zero is both the initial ordinal and the initial cardinal number, way before computers).

Another example is time and date: we count days from 1, but hours (at least in 24-hour notation) and minutes from 0.

> Luis started explaining why Lua was 1-index with this sentence: "The 1st argument ...."

Note that for spoken language, it is "The first argument ..." and 'first' is etymologically unrelated to 'one', but related to 'foremost', 'front', so it makes sense to use 'first' for the initial item in the sequence even when counting from 0.


When talking about discrete things, 0 has a specific meaning: it is the absence of things. It does not make sense to count the 'first' element as the 0th. When you encounter the 'first' element, how many elements do you have? 1.

This is of course different for continuous quantities. When counting seconds, for example, we should absolutely start from 0.


In Western music theory, intervals are one based. No pitch change is "unison"; one diatonic step is a "major second" and so on. As a result of this silly state of affairs, an octave occurs every 7 notes, even though the root "oct" means eight. Furthermore, a "rule of nines" is needed to invert an interval: e.g. inversion of minor 3rd is a major 6th (exchange major/minor, subtract from 9).


And addition also gets broken. Like a third plus a fourth is a sixth.


True, and it's a great example of how this whole drama is about a practical trade-off, not about a unique Right Answer. If you play piano, the second is the second finger; the fifth is the fifth finger, and it all makes sense. No problem. On the other hand trying to actually count that way (two thirds make a fifth and so forth) is maddening.


Clearly the solution is we need to start numbering fingers from 0.


No, just use subtraction for the intervals. Third finger minus first finger = 3 - 1 = 2.

The floors of a building might start at 1, but you go up 1 flight of stairs from the 5th to the 6th floor, not two.


Maybe that would do it. The only problem is that I'd have four fingers on each hand.


I don't think 1-based arrays are better notation, even completely ignoring that code has to run on a machine.

99% of the time, the correct approach is to use iterators. When you really need indices (and you almost never do), 0 is more practical, because it matches the "including start, not including end" convention.


> when we count, we start at 1

That is false; counting begins by initializing an accumulator to zero. When you register the first item, the count jumps from 0 to 1.


Who sets an accumulator to zero and then adds one in everyday language?


For instance, someone who makes a fist with zero extended fingers before counting.

Or someone who simply becomes motivated to count something, without making any utterances or gestures to that effect. The motivation is followed by the persistent awareness that nothing has been counted yet, which then changes to 1.


Not saying "zero" doesn't mean we don't mean it. All counting starts with 0. We just say "one" out loud as the first number. We probably should say "zero" too, but why say a word when you can say no words?


Or we just start counting with 1, since 1 is the first number we start with. Do small kids start with a concept of zero and then add one to it, or do they just start at one?


If you start at 1, how do you answer the question "how many apples are in the basket", when the basket is empty?

Or do you believe that the answer "none" or "zero" is then given without the activity of counting having taken place?

What do we call the meta-activity then: the procedure that results either in the "empty" answer or "one", "two"? Whatever that activity is called, it starts with a concept of zero. Let's call that activity "quanting". Quanting starts with a motivation to enumerate items, and an initially empty result. When no items are present, quanting terminates, reporting the zero/nothing/none result. Otherwise quanting branches into a subprocedure called counting, and that begins at 1.


Humans could do basic counting prior to the concept of zero. Obviously kids or anyone prior to zero would say there are no apples in the basket, but if you were to ask an ancient Greek philosopher if that meant "no apples" is something worthy of being denoted, they might think you're doing sophistry and trying to elevate nothing to something.

A smart ass kid might reply there are zero oranges in the basket, or zero miniature unicorns. Since the basket is empty, it could have potentially had anything if we're just going to imagine things in baskets. But we don't enumerate over all possible zero items in the basket. And anyway, the basket isn't really empty since it has N air molecules, N fibers or whatever.

The pedantic point I'm making is that starting a count at zero is a convention we developed for mathematical reasoning where appropriate, but not a starting place for counting things in everyday language.


If an ancient Greek philosopher had three apples in his basket, and I took them away while he wasn't looking, oh, he would definitely find "no apples" something worthy of being denoted.


> when we count, we start at 1

If I ask you to count the number of red balls in a bag with only 3 yellow balls, then the initial count in your head is 0, you inspect the balls one by one, never encountering a red ball, and thus never incrementing the count. And then you pronounce your final count of 0. So that's counting starting from 0.

What you call "starting at 1" is not so much the start as it is the first increment, which need not arise.
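
A minimal sketch of that model in C (the function and enum names are mine, purely for illustration):

  enum color { YELLOW, RED };

  int count_red(const enum color *balls, int n) {
      int count = 0;                  /* the accumulator starts at zero */
      for (int i = 0; i < n; i++)
          if (balls[i] == RED)
              count++;                /* "one" is merely the first increment */
      return count;                   /* never incremented -> final count 0 */
  }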


> where "20th century" means 1900's

to this day I can't seem to explain to people that Jan 1 2000 was NOT the start of the new millennium, but rather Jan 1 2001

https://www.latimes.com/archives/la-xpm-2000-dec-26-mn-4810-...


I have used 1-based in BASIC/Pascal and 0-based in pretty much everything else. With 0-based I get effectively no off-by-one mistakes; with 1-based they're common. Most idioms, including forward and reverse iteration, work better and are easier to remember with the inclusive/exclusive pattern.

Other than that: bitwise AND and power-of-2 sized arrays are the backbone of any hashtable. Overall, cheap modulus (binary AND) is actually useful.


It has nothing to do with pointers and everything to do with basic computer arithmetic. You can constrain the range of an integer with a bitwise AND operation, providing a cheap modulus by a power of two. In this regime, zero-based indexing is the natural result; you have to make an adjustment for 1-based. There are a whole host of other operations that are simpler with 0-based indexing.

The problem is that most people, even programmers, don't understand how computer arithmetic works and have fantasies of mathematical number lines that the hardware only partially simulates. You see this consistently in the post-Java crowd who think that unsigned integers are some sort of unholy aberration, because the languages they've grown up with went further to maintain the fictional number-line semantics.
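
A minimal sketch of that point in C (TABLE_SIZE and slot are my names; assumes a power-of-two table size):

  #include <stdint.h>
  #include <stddef.h>

  #define TABLE_SIZE 1024u                       /* must be a power of two */

  size_t slot(uint64_t hash) {
      return (size_t)(hash & (TABLE_SIZE - 1));  /* 0 .. 1023, no division */
  }

  /* A 1-based scheme needs an adjustment on top of the mask:
     (size_t)(hash & (TABLE_SIZE - 1)) + 1       gives 1 .. 1024 */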


Humans make a lot of inconsistent things: we usually think of the 1st floor, and of the basement as the 1st underground floor (-1), but the floor numbering jumps from 1 to -1!

Also, the time jumps from 11 AM to 12 PM to 1 PM! So I think more human-friendly sometimes means more confusing.


In Europe, the floor level with the ground is called "ground" or "0".


That's true. But as for floor numbering, specifically, it depends on the country: some places have a "ground floor" between the 1st underground floor and the 1st "above ground" floor.


It's like English pronunciation, there's little logic. I gave up finding logic in these things at an early age, and resorted to memorizing everything instead.


Is it machine-focused that we can convert a distance of 1.5 m to 150 cm just by multiplying by 100?

Maybe distances should be human-focused, so that no displacement is equivalently expressed as 1 cm, 1 m, 1 km, ...

Then converting a distance d from m to cm is (d - 1) x 100 + 1.


... but if you have something that's 150cm, that's not 151cm, that's 150cm?


I believe there's a powerful status quo bias.

Doing the reversal test, if programming languages had all been 1-indexed, then I doubt we'd hear much from people, in 2022, saying "I think the first element should be 0, and the second 1".


> The 1 makes a lot of sense in a human world, when we count, we start at 1

As a kid I learned "one one-thousand, two one-thousand, three one-thousand" when counting time out loud, but at some point I realized this was incorrect. The prefix is the start of the nth second but it isn't complete yet, so for example stopping in the middle of saying "two one-thousand" you actually haven't reached two seconds yet.

My fix was to move the prefix to the end, so I say "one-thousand one, one-thousand two, one-thousand three".

In retrospect maybe I should have used "zero one-thousand, one one-thousand, two one-thousand".


I don't think it's necessarily incorrect to start at one there. If someone asks you to count out 3 seconds, you're going to say "one one-thousand, two one-thousand, three one-thousand" and only at the end of the "three one-thousand" will you consider the 3 seconds to have actually elapsed. Basically you're already accounting for the time it's taking you to say it. Which to me seems better than the other way, because it gives a better heads up as to when that second has been reached.


None one-thousand, one one-thousand, two etc

Doesn't sound half bad


Ah, yes. Cardinal numbers and ordinal numbers both end in the word "numbers" so they're the same thing. No need to distinguish between having three apples and having the third apple.


If you ask most people to just label their apples, they'd call the first one "1", and so on. Very few people would say, "please pass me apple #0".


And if someone did say that to me, I would pass them no apple, since that's how we use everyday language. And if I knew they were a programmer, I'd first ask them in which programming language they'd wish me to pass the apple.


Yes, but "human-focused" 1-based indexing comes at a cost since at the end of the day the CPU has to add (index * element-size) to the array base address to get the address of the indexed element. With a 1-based index there's additional overhead in either having to subtract 1 from the index or to have a wasted "element 0" to avoid the need to do that.


I assure you, that if we counted from zero, there would be an instruction to add and multiply in the same number of cycles as a multiply.


That's not possible - the subtract and multiply need to be consecutive (adjust index before multiply by element size), so even if it was a single instruction it would still take longer than a multiply that didn't have to wait for a preceding subtraction.

The only way to avoid the speed penalty would be either to have a wasted element at offset 0, or to maintain the array base address as (address - (1 * element-size)) to avoid having to subtract 1 from the index when accessing. In the latter case for dynamically allocated arrays the code would still have to do a subtraction to adjust the pointer returned by the memory allocator, but at least that would be a 1-time penalty rather than per-element-access.

Of course this is supposing a high-level language where an array is an abstraction, not one explicitly aliased to a chunk of memory such as C, where an array and a pointer to its first element are interchangeable.


That goes against my intuition. Multiplication in hardware to this day relies on addition. Is one adder going to add an extra cycle? Or would that time be amortized? Take a look at slides 45-46 here. https://acg.cis.upenn.edu/milom/cis371-Spring08/lectures/04_...

Do you know the answer to that question? (I don't, but if someone does, it will settle this issue).


I don't know, but looking at that 3-input add makes me think you may be right and the extra addition/subtraction could perhaps be combined into the multiplication.

OTOH, for arrays whose elements are of size 2^n (char, short, int, long) I'm sure the generated code isn't using multiply in the first place.

Anyways, an optimizing compiler could certainly remove much of any overhead added by 1-based indexing .. for an array access in a for loop it could, if necessary, calculate the "base-1" address once at start of loop.

Personally, having grown up with assembler and C, and still using C++ today, I'm quite happy with 0-based.


There are a lot of cases where a zero index makes sense. In physics and math there are cases where you start the index at 0 and others where you start at 1. And even some where you start at -1. Which you use depends on what is most convenient for the problem at hand.


At which number does the first centimeter on a ruler start?


At which number does the zeroth centimeter on a ruler start?


it's the first and it is 0, just like the first index in a C array is 0


I always thought of the machine focus being that the location of the first element was the location of the array plus zero.


If you think of the index as an offset you would start with 0.

BTW on which level is the 1st floor?


> BTW on which level is the 1st floor?

in Europe or in the US?


Does Europe have a zero floor? Please tell me Europe zero-indexes their stories!


They do, it's called the ground floor.


In France, it is called the rez-de-chaussée. The “premier étage” (literally translated as “first floor”) is what the US calls the second floor.


A particularly topographically challenged building in Paris has exits onto two enclosing streets that end up on different floors. The elevator is numbered -2, -1, rez-de-rue [exit north], rez-de-chaussée [exit south], 1, 2, ... .


Don't know French, but I wonder if the meaning of "étage" is similar to Polish "piętro", which literally means something like "elevation". So, basically, in Polish we have a specific word for the ground level, and then we count how much elevated above the ground the current level is. That's why the "1st floor" is the one elevated one level above the ground.

In English you count "floors", and a floor is a usable, hard surface on which you can put something, like a chair. That's why a floor on the ground level is treated the same as the floor above it - it is equally good at accommodating chairs, beds, and other stuff.


Interesting, thanks!

Etymologically, étage comes from the Greek στέγω (and gave the English word “stage”); it is a typically wooden cover. Since the first floor was often instead a continuation of the outside road (way back!), it was not considered a “stage”.


"Bajo" in Spain


I've always called it floor zero and everybody understands.


Yeah, most or all of Europe does that. In English it is called ground floor, first floor, etc. In Swedish the zero numbering makes total sense because instead of numbering the floors we say ground floor, "1 stair", "2 stairs", etc.


That depends on your language and culture, "floor" is not easily translated; some languages have a word describing all the layers added to the base layer, so "1. sal" (Danish as an example) actually means "the first layer added on top of the base house".

Again, this is more a spoken language/culture thing, and it goes back to whose premises we use to communicate with the machines: ours or the machine's...


You've got ...P3, P2, P1, G, M, 1, 2, 3...


In Tucson I've seen some buildings where the basement is the first floor. You enter at floor two.


Nobody knows that one.

No elevator I've seen has yet taken a cue from UI design: simply put the buttons within an outline of the building, along with the local numbering scheme.

No more visits to the serial killer lurking in the basement.

The players in this market evidently have been operating at T'ump levels of intelligence. /s


There were a number of systems oriented languages that, while Algol-like in syntax, used zero-based array indexing, predating C. BCPL was intended to be a generic systems oriented language, but Burroughs had ESPOL for their mainframe architecture, and HP wrote their operating systems and compilers in SPL, which was specific to their stack-oriented HP3000 architecture. All these used zero-based indexing, because for low-level, memory-conscious manipulations it's much more natural than one-based. But nothing is universal: IBM's systems oriented languages (BPL, PL/S, and PLS/II) were more an evolution of IBM assembler and, later, of Wirth's PASCAL, and used one-based array indexing.

Personally, I've always found zero-based more intuitive and satisfying, absent arbitrary indexing (in which case I'd choose zero-based for most purposes where there wasn't a compelling solution-based reason to use a different index base). But then I spent years writing SPL ...


On a cursory look, I'd say one-based is more intuitive, for the sole fact that a[len(a)] is the last element. Actually, I remember that when I started programming in assembly it took me a while to get used to reasoning about the end of the array.


Much of intuition is made, not innate, at least for things as abstract and arbitrary as languages and notation. If you come at this question from a point of view where an array is a block of memory words indexed by a pointer - which is pretty much how system programmers thought back in the day when I did such things, 50 years ago - then zero-based seems natural and obvious. The pointer to the array is the array, and the first element in the array is the pointer plus zero, the second, the pointer plus one, and so on. Once you start thinking that way, the arithmetic for multiple dimensions just falls out as well.

But I don't think the fact that many of us developed that intuition makes it inherently better. It just makes it more intuitive for those of us who developed that intuition. I've written oodles of code with both paradigms. Either works. Zero is just more natural, to me.


Unusually for me, I read TFA first. Immediately thought "this is all wrong", then realized 400 commenters had posted the same thing already.

I have Martin Richards book on my shelf, and for a while shared an office with his coworker. I'm also one of the few folks here who has coded in BCPL.

Ok, so that said, this article is obvious nonsense. Zero indexing comes from the CPU itself. It's how you generate an indirect address by adding the offset. Many CPUs have this in hardware. They don't support non-zero offsets. The reason some languages have zero-based indexes is that their developers had the mindset of doing the same thing you do when writing machine code. They probably also wanted to directly interface between their language and libraries written in assembler (and the kernel was typically written in assembler too). Also they wanted to be efficient. Any offset other than zero is not efficient because you can't use the hardware addressing modes.

So then why do some languages _not_ have zero based indices? The obvious answer is that Mathematics by convention always used 1 as the base (or some arbitrary value). Many languages were conceived by mathematicians or people with a strong mathematical background.


I've discussed this a lot in the real world in a different field: apartment floors. In Japan (where I live) they are 1-indexed, where the floor on the ground is number 1, while in Spain (where I am from) they are 0-indexed, where the floor on the ground is number 0.

Both have inconsistencies, like in Spain you might have a "middle ground" (entresuelo) which is neither 0 nor 1 but sits between, and is normally commercial or non-livable, reserving 1 for the first floor where people live. I like that system better though, because it usually goes 1 => 0 => -1 (underground), while in Japan you go 2 => 1 => -1, so I feel like it's missing a floor. You could justify it though as 0 being the ground line, so +1 is "the first above the ground" and -1 is "the first below the ground", but I still prefer having each floor be a natural consecutive number.


English also has that "mezzanine" concept, for partial floors with a balcony over the main floor. You see it sometimes on elevators as an M.


It seems like most newer buildings in the US and EU have "G" or "L" be the ground floor (0 / PB), and then the floor above is "1".

In older buildings, the ground floor is usually "1" - which means the floor above has to be "2".


I (US) don’t remember seeing the second floor labeled 1. I often see G, 2, 3


Arrays in Ada start at the index based on the index type of the array. You can even use an enumeration type as the index:

    type Day is (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday);
    type Hours is array (Day range <>) of Natural;

    V : Hours (Monday .. Friday);
Which index type you should use depends on the problem that you're trying to model. You can find out the lower/higher end of a type/variable with the 'First and 'Last attributes.

IIRC there's an RFC though to force the lower end of the index to a certain value like 1 or any other number/enum.


Wow, I never really looked at Ada code. Been using VHDL for the past 2 years a lot and when I looked at your code I was like: 'huh strange, this looks a lot like VHDL'.

Turns out both were invented by the DoD.

>Due to the Department of Defense requiring as much of the syntax as possible to be based on Ada, in order to avoid re-inventing concepts that had already been thoroughly tested in the development of Ada,[citation needed] VHDL borrows heavily from the Ada programming language in both concept and syntax. - https://en.wikipedia.org/wiki/VHDL

Maybe I should pick up Ada soon. That could be a fun journey! (I really love writing VHDL)


I always wonder why such fundamental features were not adopted by all languages.


"Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration." -- Stan Kelly-Bootle


This comes up when giving fractional coordinates into a grid. Is (1,1) the upper-left corner of the upper-left grid box, the center, or the lower-right (or the center of the next grid box)? Alternately, is the upper-left corner of the upper-left grid box (-0.5, -0.5), (0.0, 0.0), (0.5, 0.5), or (1.0, 1.0). I've seen all four conventions used in different places, depending on whether the grid boxes themselves are numbered starting at 0 or 1, and when extended to fractional if it makes more sense to have the corner or the center of the grid box take the value of the box itself.


Which is how we got the DX9 half-pixel offset in fragment shaders. Luckily not replicated in newer revisions.


Worse than 0 or 1 is having the choice. In Visual Basic 6 you have the option between base 0 or 1 for each module. It makes debugging a nightmare, and you can create bugs just by copy/pasting code from one program to another with a different base index.


You also have the choice in Perl, because of course, it is Perl.

The old way of doing it was setting the $[ variable to 1. You can actually set it to any value, in case you prefer to start your arrays at 42. It is now deprecated but you now have Array::Base that offers similar functionality.

Use it if you absolutely hate the person who will read your code, as if having someone read Perl code wasn't hateful enough.


One of the least-loved APL features is the fact that you can change the index origin at runtime.


My favorite example of this confusion is JS and (original) Java dates:

    > new Date()
    Wed Aug 24 2022 ...
    > new Date().getFullYear()
    2022
    > new Date().getMonth()
    7
    > new Date().getDate()
    24
    
So 2022-08-24 comes out as (2022, 7, 24). One-based indexing for the year and day, zero-based indexing for the month.


That comes from C’s standard library (and presumably from somewhere else before that). Classic bad design, took a very long time for people to figure out it was unhelpful and dangerous.


https://linux.die.net/man/3/localtime:

    tm_mday: The day of the month, in the range 1 to 31.
    tm_mon:  The number of months since January, in the range 0 to 11.
I wonder where C got it? It goes back to at least 1973's V4: https://github.com/dspinellis/unix-history-repo/commit/92779...


https://minnie.tuhs.org/cgi-bin/utree.pl?file=V6/usr/source/...

Convenience. Look at how the days in the months are stored and accessed. Using 1-based months would introduce an extra calculation (-1) on all searches or an unused value in the 0-index. Also look at how printing is handled for weekday and month names. They, again, take advantage of 0-based indexing.

Day and year are already represented as numbers, so it's natural to keep them as the "correct" (conventional) number as used by most people. Since the months aren't being stored as strings but as an index, this saves them from having useless data (entries in 0) or doing an extra calculation.


Subtracting 1 from the user provided index isn't that expensive.

Or, if you're willing to give up three bytes, index into "ErrSunMonTueWedThuFriSat".

Or, and this is mildly insane but perhaps in keeping with early C, have your pointer be three bytes before the beginning of "SunMonTueWedThuFriSat" and use one-based indexing.
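
All three options, sketched in C (my code, not the historical Unix source; the biased-pointer variant is formally undefined behavior in ISO C, in keeping with its "mildly insane" billing):

  #include <stdio.h>

  int main(void) {
      /* 0-based index into a packed name table. */
      const char *days = "SunMonTueWedThuFriSat";
      int wday = 2;                          /* 0 = Sun, so 2 = Tue */
      printf("%.3s\n", days + 3 * wday);     /* prints "Tue" */

      /* 1-based with a sacrificial "Err" slot at offset 0. */
      const char *days1 = "ErrSunMonTueWedThuFriSat";
      int wday1 = 3;                         /* 1 = Sun, so 3 = Tue */
      printf("%.3s\n", days1 + 3 * wday1);   /* prints "Tue" */

      /* 1-based via a pointer biased three bytes before the table
         (strictly undefined behavior to even form this pointer). */
      const char *biased = days - 3;
      printf("%.3s\n", biased + 3 * wday1);  /* prints "Tue" */
      return 0;
  }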


It all boils down to the fact that C's memory model is meant to map closely to the physical memory model, and the C and Unix developers preferred simplicity of implementation over simplicity of interface. Leaky abstractions (like that the weekday or months start at 0 so that they can reference into arrays of names) are fine given that prioritization. Not everyone shares that preference.


Dijkstra's answer (linked in the article) is best:

"When dealing with a sequence of length N, the elements of which we wish to distinguish by subscript, the next vexing question is what subscript value to assign to its starting element. Adhering to convention a) yields, when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N. So let us let our ordinals start at zero: an element's ordinal (subscript) equals the number of elements preceding it in the sequence. And the moral of the story is that we had better regard —after all those centuries!— zero as a most natural number."

https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...


It could just as easily be 1 ≤ i ≤ N and then I wouldn't have to remember that the lower bound is inclusive and the upper exclusive.


Closed ranges (with both ends inclusive) are super annoying to work with. You can't represent the empty range (unless you are willing to do [i : i-1]), and they don't compose like half-open ones: [a : b) + [b : c) = [a : c).
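
A quick sketch in C of the composition property (function names are mine):

  #include <stddef.h>

  /* Processes the half-open range [begin, end). */
  void process(double *data, size_t begin, size_t end) {
      for (size_t i = begin; i < end; i++)
          data[i] *= 2.0;
  }

  /* [0, mid) followed by [mid, n) covers [0, n) exactly once, with no
     +1/-1 fixups at the seam; mid == 0 or mid == n simply makes one
     chunk empty, which closed ranges can't express cleanly. */
  void process_in_two_chunks(double *data, size_t n) {
      size_t mid = n / 2;
      process(data, 0, mid);
      process(data, mid, n);
  }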


For some reason I am thinking in terms of closed ranges if not specified otherwise/doing it for myself. [i:i-1] is how I think of empty ranges, and [i:i] if I want to capture the element i with a range. Also [a:b] + [b+1:c] corresponds more to what I want than [a:b) + [b:c).

I guess the majority of the people are not like this, but arguments like "just look at it how strange it looks" don't do it for me, because it looks natural, and the other way looks complicated.


I'm curious if you would also accept [i:i-2] as an empty list, or in general anything where the right side is smaller than the left?

If I am working with closed ranges, [i:i-1] looks like the list [i, i-1]. Like [5:2] would be [5, 4, 3, 2].

With [b+1:c] I would feel like I needed to insert a check to ensure b+1 <= c. With the closed ranges the invariant "left is <= right" is maintained automatically. Though I guess it doesn't matter so much if you accept any list with left > right as the empty list.

The issues with compositionality become even more noticeable with floats. Then you would need [a:b] + [b + minimum_float : c], or something like that.


Then the delta of both bounds (N – 1) wouldn’t equal the length of the range (N). The inclusive-exclusive convention is used in order for `end = start + length` to hold.


Why does that matter when the starting point is 1? In that case, you don't need the delta because you have the length already. In the case of a 0-based range you also don't need the delta, though you will want to use the inclusive-exclusive convention so that you get the length "for free".

  1 <= a <= N -- N = length, no delta
  0 <= a < N  -- N = length, no delta
You only calculate the length when dealing with other than 0- or 1-based ranges. There, the inclusive-exclusive convention is very handy as you point out. So if we're fixing the initial offset at either 0 or 1, then use the appropriate convention for that offset. If we let the initial offset float, then the inclusive-exclusive makes sense.


It matters whenever you need to process a proper subrange of an array, and that shouldn’t be different from when the subrange happens to be the whole range. It’s simpler if the same convention is used in all cases.

E.g. in Java, a typical example is a stream class with the following two methods (the real OutputStream pair takes an offset and a length rather than start and end, but the delegation is the same), where the first can delegate to the second, and the second (which writes the specified subrange of the array) can easily calculate the number of bytes to write:

  int write(byte[] bytes)
  {
      return write(bytes, 0, bytes.length);
  }

  int write(byte[] bytes, int start, int end)
  {
      int length = end - start;
      ...
  }


I addressed that:

> If we let the initial offset float, then the inclusive-exclusive makes sense.

But you replied to someone using inclusive-inclusive for 1-based arrays and complained about the delta not matching the length. That is a nonsensical complaint: you already have the length, so why would you need to calculate anything?


In case of the subrange you don’t have the length (or you conversely have the length but not the upper bound), and you don’t want to use different conventions based on whether you process the subrange or the whole range. I’m not sure how I can make it clearer.


I don't understand why 0 ≤ i ≤ N wouldn't have worked instead, since you already have a less than or equal operator at 0.


That would give a sequence of N+1 elements, though. Confusing if you had a function like range(N), for example.


Why, it would be an inclusive instead of exclusive range. N is the last element, so no reason for N+1.


If we use inclusive ranges where 0 ≤ i ≤ N, then len(range(N)) == N+1.

e.g. len(range(3)) == len([0, 1, 2, 3]) == 4


Wait, why would range(3) start at 0 and not 1 in a 1-based language?


You said: "I don't understand why 0 ≤ i ≤ N wouldn't have worked instead". Was that a typo? Did you mean 0 < i ≤ N?


I have always assumed that is just one less operation required to resolve the absolute memory address.


The mental model of being an offset to a memory address is, I think, part of it.

I'd be surprised if the use of zero vs one actually made any difference, from a number of operations point of view -- from the compiler's point of view, the first element in the array is the first element in the array, no matter what we call it.

I mean, in an extreme edge case, maybe if you are computing an index and it happens to be zero, then in your math some identity related to zero could be exploited (go to element simple_integer*complicated_function(), where simple_integer might be zero), but that seems a bit silly.


How would it not make a difference? If you calculate an index at runtime, then to access the element, 0-based needs pointer + index, while 1-based needs pointer + index - 1. Clearly there is an extra subtraction there?

In x86 you could probably hide it in the addressing, but that does not mean it does not need to be computed.


One option, at least, would be for malloc and friends to return pointer-1, rather than pointer. That said, I'm sure a compilers expert could come up with a much better option.


If you’re calculating indexes at runtime, you’re probably not worried about that level of optimization.


This is primarily the actual answer. While most languages force you to see the array as an array, in C/C++ (in C++ I'm only referring to the memory-allocated variable type, not an "array class") you can see it as a pointer and do your own math to address any part of it that you want. And the calculation for that memory address is base + idx * sizeof(the thing in your array). So the real answer here is that array semantics needed to match memory address semantics in the languages that provide that.


There's no reason why memory addresses had to start at 0.


No.

1. Optimizing compilers exist.

2. Addressing at fixed offsets is cheap (a single instruction): https://en.wikipedia.org/wiki/Addressing_mode

0-based indexing is superior (as a default) because it simplifies a lot of common math, as discussed in TFA.


> 0-based indexing is superior (as a default) because it simplifies a lot of common math, as discussed in TFA.

If that was true, Fortran, Julia, R and Matlab would use 0 instead of 1-based.


3. Less buggy for small systems since the Base Pointer address points directly to a record.

4. Macro expansion is often used to store indexes and other 'magic numbers' in Assembler, and this pattern originated either when Assembly was the primary programming language, or even earlier when humans directly punched out cards with machine instructions. Compilers with anything like the luxury we have today did not exist or were not common.


That is true for vectors, but not matrices, right?

  char at(char matrix[10][10], char i, char j) {
    return matrix[i][j];
  }
still has more computation on 1-indexed than on 0-indexed, I believe.


Well you've either got:

memory_address + array_width*i + j

or

memory_address + array_width*(i-1) + (j-1) = (memory_address-array_width-1)+ array_width*i + j

so you can just absorb the extra computation into the pointer.
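
A sketch of that absorption in C (my example; note that even forming a pointer before the start of an array is technically undefined behavior in ISO C):

  #include <stdio.h>

  int main(void) {
      enum { H = 3, W = 4 };
      int m[H * W];
      for (int k = 0; k < H * W; k++)
          m[k] = k;

      int *biased = m - W - 1;                   /* bias computed once */
      int i = 2, j = 3;                          /* 1-based coordinates */

      printf("%d\n", m[W * (i - 1) + (j - 1)]);  /* per-access subtractions, prints 6 */
      printf("%d\n", biased[W * i + j]);         /* same element, no subtractions */
      return 0;
  }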


TFA = Thanks For Asking?



The "Friendly" Article


I'm fine with 0-based, 1-based or anything-based arrays (I recall it being convenient for solving 8-queens with Pascal), but for Rage-Over-A-Lost-Penny's sake, music note intervals are always beyond my understanding.

Same-pitched notes are called "interval 1" and then come thirds, fifths, sevenths... All off-by-one in my base-offset-addressing mind... And then major vs. minor, which creates all kinds of "aliased addresses"...

I'd really love a base-12 floating number representation. Like 4.00 for middle C; chords can then be represented by a tuple of such numbers -- major = [+0.04, +0.07] (some sequencers already do something like that, and I'm far better at reading that kind of sequencer data than a sheet).


You do understand that besides thirds, fifths and sevenths, there really are seconds, fourths, sixths, ninths (same as second), elevenths (same as fourth) etc... as well, right? There even are intervals that are not named after a number e.g. the "tritone". The reason the 2nd and the 3rd note in a chord are called third and fifth is because usually chords are made with these intervals instead of dissonant intervals like seconds or fourths. It seems pretty clear you'd already know these things, so can you explain what's your issue with music note intervals?


It's that off-by-one nature of intervals that always bumps me. The difference between note a and b is (a-b+1). Calling an octave "an octave" feels to me like calling a numeric system with digits 0x0-0xf as "base 17"


By your logic, an octave/unison would be "seventh"/"zeroth"? (Note that "octave" literally means "eighth" in Latin.) I would think that could work too but the established terminology is not inconsistent. It's just 1-indexed. A lot of things are 1-indexed, in fact I think almost everything in spoken English is 1-indexed. A lot of mathematics, like number theory, is also 1-indexed. I think software engineers are a little too obsessed with 0-indexed things. I understand that things not being standard is annoying to us but pretending like this somehow makes music terminology broken is going too far.

I don't see how 0x0-0xf could be called base 17. 0xf is 15. Did you mean base 15? I think if mathematics terminology developed differently it could be called base 15. The same way binary is 0 and 1 but we call it base 2 because there are 2 digits, but we could totally call it base 1 too, who cares, it's all convention.


And then, there's the imperial system.


I found it interesting that the author phrased their critique so definitively, i.e. "it's NOT this and NOT that", yet concluded the article by saying their whole argument was speculative. That said, I do agree with their premise that it's going to be pretty tough to get a definitive answer as to why 0-indexing won out. And their criticism of others being so ready to declare results is apt, if not a little ironic in this case :)


> yet concluded the article by saying their whole argument was speculative

It's possible to have a speculative argument and also refute other arguments that claim certainty. Pointing out another answer as being wrong does not mean one needs to know the correct answer or claim to have a precise answer.


When he talked about BASIC having 1-based arrays, I had to check my memory against online AppleSoft BASIC documentation and indeed, AppleSoft at least did have 0-based arrays.¹ I also had to check on Pascal, which I haven't written anything significant in for some 20 years, and it turns out that by default Pascal arrays are 1-based, but most of the code that I worked with (which tended to have its roots in Knuth-written Pascal) explicitly specified the range of indices.

1. As did all the other contemporaneous BASICs that I encountered. For added fun, when creating an array with DIM, you gave the highest index and not the number of elements so, e.g., DIM A(20) created a 21-element array. The 1980 Apple ][ manual I found which discusses both Integer BASIC and AppleSoft doesn’t admit that 0 is a valid index for an array, but elsewhere I saw it indicated as such which leads me to suspect that Integer BASIC has 1-based arrays.


Most of the BASICs I remember had 0-based indexing by default. But you could change this with the statement OPTION BASE 1 at the top of your program.

This was true of even TI-99/4A BASIC, which was closer to Dartmouth BASIC and not written by Microsoft.


Modern Basic (VBA etc.) still supports 1 based arrays, and there's even Option Base (0|1). This is a global directive for all arrays declared in the same file. And individual arrays can be declared with any starting index you like.


I may think that counting 0 as a natural number makes a lot of math more elegant, but clearly I’m just too dumb to rise to the level of wrong.

  fruits = ['apple']  # Hello I have fruits
  x = len(fruits)     # Only one kind tho lol
  fruits[x]           # OK here is what I have
    IndexError: list index out of range
I have always hated zero indexing for this reason (other than thinking in assembler where it is beautiful). It is of course useful in many contexts, albeit with tradeoffs; more importantly everyone is used to it and it's predictable. But I was absolutely thrilled to discover Julia defaults to indexing from 1 like a normal person.

Essentially what we have here is a mismatch between the tool (computer that does everything in binary) and the task (mathematical abstraction of quantities). Now it's completely understandable that people work within the limitations of the tool when there is no other choice, but that's the same sort of path dependency that creates technical debt.

If zero indexing were so great, mathematicians would have made it the default centuries if not millennia ago; but mathematicians don't want to do off-by-1 adjustments on all sorts of common operations because it makes things unnecessarily complicated. To be honest, I think this has become a moat to keep people out of programming even though it's a shallow one.

It could be interesting to test this, say by taking two classes of schoolchildren and teaching one Julia and the other Python (or...). By not having to take on the idea of zero-indexing at the same time as the concept of an array, the Julia group can get into collections using their intuitive understanding of the natural numbers. I expect that picking up language elements quickly will have a compounding effect and that at the end of the evaluation period the Julia group will be able to make significantly more complex programs than a control group.

Edit: in your heart you know I'm right


> If zero indexing were so great, mathematicians would have made it the default centuries if not millennia ago

They did. For any context where the index contributes to the math[1] (i.e. is not simply a label convention like "the x (first) component"), the index always starts at zero. Polynomials, Fourier series, geometric series, Bessel functions (both their order and their series expansions), etc.

[1]: Except, of course, when a divergent 1/0 term shows up.


Also, consider that Python uses -1 for indexing the last element, instead of -0, which means the last element is not an offset when using negative indices.


Funny, just a few hours ago I asked myself the related question "Should indices start with 0 or 1?" (not for the first time). I pretty much switch my opinion as many times as I think about it.

Here is a nice discussion on stackoverflow https://cseducators.stackexchange.com/questions/5023/why-do-...

The first answer nicely retells the Dijkstra argument: integer ranges should be described using half-open intervals, and [n, m) is nicer than (n, m]. Furthermore, [0, n) is nicer than [1, n+1), and that's that.

The second answer makes the observation that there is a difference between indexing and counting, and that even in daily life often the first element is indexed by 0.

Still, I rather like indexing the first element of a sequence with 1.


> The usual arguments involving pointer arithmetic

Yeah, I think this exactly why it is zero based. The index became a simple multiplier for sizeof(int).


An assembler spillover into C.


Was C ever much more than a veneer on top of assembler?

Don't get me wrong, it's why I like C.


> Was C ever much more than a veneer on top of assembler?

Maybe in the very early days, but that necessarily had to end as soon as compilers started doing non-trivial optimizations. Today, C is very far from being "portable assembly".


The crucial fact is that if your language supports 0-indexing, you can do that if you like. OR, you can pretend it has 1-indexing, and do that instead.

If your language doesn't support zero-indexing, then you are stuck with 1-indexing, and that's that.

So, zero-indexing is the natural thing to put in a language, as it accommodates everybody.
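
For example, a sketch in C of "pretending" 1-indexing in a 0-indexed language, by allocating one extra slot and ignoring index 0 (my example):

  #include <stdlib.h>

  int main(void) {
      int n = 12;
      /* slots 0..n; slot 0 is deliberately wasted */
      double *monthly = malloc((size_t)(n + 1) * sizeof *monthly);
      if (!monthly)
          return 1;
      for (int m = 1; m <= n; m++)       /* natural 1..12 loop */
          monthly[m] = 0.0;
      free(monthly);
      return 0;
  }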


"It's an offset from the beginning. The first element is 0 slots from the beginning." is what I was taught. (Of course, it's more complicated than that, with pointer math etc etc, but that works as a general gist)

In the past, I've written roughly half a million lines of Lua. Arrays don't always start at 0. I've regularly shaken my fist at Lua and cursed its wicked ways, but it's really damn useful in a lot of contexts.


Indexes start at 1. Offsets start at 0.

Arrays in programming use offsets.

    for(int i=0;i<10;i++) 
    {
       array[i] = x;
    }
The error here is calling it 'i' for index. It should be 'o' for offset.


Arrays in the programming languages you’re familiar with use offsets.

Not all do [1].

1. https://therenegadecoder.com/code/which-programming-language...


Of course, and I've used Lua for a long time.

The point is that offsets and indexes have different meaning.


Where are you getting this definition of index?


Because zero is a concept that can mean "at the beginning"

The symbol changed over time as positional notation (for which zero was crucial) made its way to the Babylonian empire and from there to India, via the Greeks (in whose own culture zero made a late and only occasional appearance; the Romans had no trace of it at all).

Another meaning is "nothing" but we're talking about position, not nothingness

The mathematical zero and the philosophical notion of nothingness are related but are not the same. Nothingness plays a central role very early on in Indian thought (there called sunya), and we find speculation in virtually all cosmogonical myths about what must have preceded the world's creation. So in the Bible's book of Genesis (1:2): "And the earth was without form, and void."

https://www.scientificamerican.com/article/what-is-the-origi...


I find 0-based indexing easier to work with when working with (dynamically sized) multi-dimensional arrays. For example, with 0-based indexing getting an element at (x, y, z) can be done with

  a[z * h * w + y * w + x]
But with 1-based indexing you need to subtract one first from each component except the last:

  a[(z - 1) * h * w + (y - 1) * w + x]


The 1900's were the twentieth century. Except the year 1900 specifically was still the nineteenth century; the twentieth century started in 1901 and ended at the end of 2000, much to the chagrin of all the chumps who celebrated the millennium a year early.

You know what would make all this confusion go away? Zero-based indexing.


In English, the day of your birth, when you're 0, is also your birthday. So, technically speaking, when you're turning 29, you're having your 30th birthday, even though no-one says it like that. I always assumed 0-based indexing was something like that, similar to the example given about the ground floor being the 0th floor in some countries and the first floor in others.

Another example is if I'm getting directions from someone, and they say, "Walk 3 blocks that way", in my mind I conceptualize that I'm presently on the 0th block and I don't yet count it.

I don't think everyday people disagree with these examples so much (well, maybe the birthday one, but that's because we use the same word for the literal day as well as the anniversary), and they're all fairly intuitive. I never realized that 0-based indexing required such a deep background to be justified.


> In English, the day of your birth, when you're 0, is also your birthday.

That depends on whether you think of "birthday" as synonymous with "the actual day of my birth" or "a celebration of the day of my birth". I think most of us actually consider it the latter, which is why the first one is after you've been born for a year. If you look up the definition of "birthday" (one word, no spaces) you'll see that there is often some acknowledgement of this in the differing definitions offered.

> "Walk 3 blocks that way", in my mind I conceptualize that I'm presently on the 0th block and I don't yet count it.

You have to decide whether the person is saying something is on the third block from your position, or that you need to walk about three blocks of distance. You probably do this automatically based on how close you are to the edge of the current block. If I'm 30 feet from the edge of a block, I'm probably not going to include the current block in that distance. I imagine you probably won't as well.

> they're all fairly intuitive. I never realized that 0-based indexing required such a deep background to be justified.

I think they're only intuitive because we're all context sensitive. The issue is when people don't have the same context, or when the terminology is carried into contexts where it no longer strictly makes sense.

As an offset, and in programming languages where it's easy to see it's an offset (C), it makes perfect sense. In languages where that's all hidden from you, and you're really just referencing the nth item in a list, the 0th item doesn't make a lot of sense. People in these different contexts will likely have different ideas of what "intuitive" means with regards to this.


I liked how in Pascal, one had a lot more flexibility here.

Something like the following IIRC: Type IntArray = Array [-10..10] of Integer;


what size is that array? (hint - it's not 20)


In Mathematica/Wolfram Languege, the part 0 of an expression is reserved for the head of the expression: f[x, y, z][[0]] == f. Or g[f][x, y][[0, 1]] == f.


I started programming in assembly, then Fortran, Algol, COBOL, etc.

0 based index makes sense to me. The memory location is offset 0 bytes from where the first data byte is stored (without getting into endianism).


For me, it was 1 based index that makes sense to me.

I started programming in FORTRAN, then COBOL, then (Lord help me) RPG II. All had 1 based arrays. I got paid to program in RPG II eight hours a day, five days a week, for two and a half years.

A common bit of code was to create an array of month names. Then take an input date, say 20220825, and split that into $Year (first four characters) and $MonthNumber (next two) and $Day (next two).

$MonthIndex = strip the leading zero (if present) from $MonthNumber

Then, this code worked: $MonthName = @Month[$MonthIndex]

When I ran into my first language with 0 based arrays, and @Month[1] returned February, I was pretty much convinced you all had gone insane. December is an out of bounds condition now? Really?

Much later, a friend explained the stuff about multiplying the index by the array element length, and easily getting the memory location of the item in that array. It does make sense; but, I've always hated the code I've had to write that says $MonthName = @Month[$MonthNumber - 1]


Simple answer: because everything in computers starts at 0. Address 0 is the first byte of memory, etc.

You can now ask, "why does everything start with 0"? I guess because it gives you nice round (base-2) numbers. With 4 bits you can have 16 different "numbers", either 0 to 15 or 1 to 16. However, representing the number "16" in binary requires 5 bits, so you'd need an additional bit for the same number of elements. So representing 4 bits as 0 to 15 makes more sense.


The correct starting ordinal is 0. A list with a 0th element has cardinality 1. This is a concept that exists in many places in English, where the naturalness of starting with the zeroth element trumps inertia, but English is wildly inconsistent, so it has both.

Constructing the naturals without including 0 and starting with it is incredibly awkward. Addition doesn't even have an identity.

Look at a clock: is 1 at the top, or is it the next one after the top?

Look at a ruler. Does it start at 1?

Do you want to have to shuffle everything around when you go from whole numbers to halves? Do you start at 0.5 or 0? Depends if it is "halves" or the real number 0.5

This is a recipe for pain and off-by-ones any time you're dealing with time steps (which is why it's so stupid that Matlab and Fortran are 1-indexed by default; if anything there's a better argument for C to be 1-indexed than for scientific languages).

Now index and compose a bunch of ranges. A half-open range of two elements is incredibly stupid: 1 <= i < 3.

So closed ranges go with 1-indexing. Composing and splitting them is incredibly awkward: [a,c] is split into [a,b], [b+1,c] ... that's kind of okay, but not unique, as you could use [a,b-1], [b,c]. Now, do [d,e], [e,f] compose at a glance?

Half open ranges go with zero indexing and they just work.


So that modulo works the way god intended inside a ring buffer.
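
For instance, a minimal sketch in C (my naming; full/empty checks omitted): with a power-of-two capacity and 0-based slots, free-running counters wrap with a single AND.

  #include <stdint.h>

  #define CAP 8u                           /* must be a power of two */

  typedef struct {
      int buf[CAP];
      uint32_t head, tail;                 /* free-running counters */
  } ring;

  void push(ring *r, int v) {
      r->buf[r->tail++ & (CAP - 1)] = v;   /* slot 7 + 1 wraps to slot 0 */
  }

  int pop(ring *r) {
      return r->buf[r->head++ & (CAP - 1)];
  }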


I've spent so long working with both 0-indexed and 1-indexed (I work with Lua a LOT at work) that it makes almost no difference to me, and I can switch between the two with usually no hiccups.

Either one is something you get used to with practice. I do prefer 0-based, but 1-based isn't as big of a deal as many people treat it in the vast majority of situations, and almost every situation where you'd be using something like Lua in the first place.


my thought is just "why not"?

0 is a number, and just because the west ignored it for quite some time (read: "Zero: The Biography of a Dangerous Idea") we have these odd built-in aversions to it but in the end it makes plenty of sense.

The 0th element is 0, so why don't we just do a better job of educating kids to start counting with 0 and thinking about 0?

In the end - as a CS person, I think it's good - but I also don't mind if you disagree with me. :)


If it starts at zero we should call it an "offset"

That would fix so many of these discussions.


Maybe, but then they'd just become discussions about whether `y` in the `x[y]` syntax "should" be an offset or an index. Which one gets the language priority?

...Of course, the answer is offset, because offsets are beautiful, composable, elegant things that harmonize with bitmasks, modulo, array slices, and everything else, while (one-based) indexes are nothing but "offset plus one", no more useful than "offset plus two".


0-based indexing is clearly the best option because it results in more elegant code and it matches what computers actually do.

In fact, it's so fundamentally more elegant that I've come to the conclusion that the real issue is that it's actually the mathematicians and language itself that got indexing wrong. Instead of "first" being associated with 1, it should be associated with 0. We should give athletes 0th place, and talk about the 0th man on the moon etc.

That's where the cognitive dissonance for the 1-based-indexing people is coming from. They can't deal with the fact that their "normal" way of indexing is wrong.

It's a bit like how pi should really be tau (2*pi), or the electron should really be positive. We got it wrong, but we're stuck with it because it's too much of a hassle to change it. Fortunately computing got it right! But now there's a mismatch with everyday life where people are used to the wrong thing.


Ignoring the past; 0 indexing is:

- closer to hardware

- makes the index equal the element-wise offset from the first element

- there are algorithms which are (very slightly) easier to implement with 0-indexing and others with 1-indexing; AFAIK more of the "bread and butter" algorithms are easier with 0 than 1 (e.g. indexing of C-style arrays, used all the time; heaps; etc. See the sketch below this list.)

- 0 is not special => fewer bounds checks in typed languages (if you have an unsigned int index you only have to check the upper bound; with 1-indexing you also have to check for 0)

- 0 is not special => with signed integers you can have negative indices (from the end), in which case all signed integers are valid (though potentially out of bounds); with 1-indexing you have a "gap" in the middle, which for some (rare) use cases can be very painful to handle

So, independent of what the history was, it's just much more pragmatic to use 0-indexing in many common use-cases. Especially given that 1-indexing produces more special cases for very common operations, it is sub-optimal for both usability and performance.
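
A sketch of the "bread and butter" case mentioned above (illustrative C): row-major 2D indexing composes cleanly with 0-based offsets, with no +1/-1 corrections.

  #include <assert.h>
  #define ROWS 3
  #define COLS 4
  int main(void) {
      int a[ROWS][COLS] = {0};
      a[2][3] = 42;
      int *flat = &a[0][0];
      assert(flat[2 * COLS + 3] == 42);   /* a[i][j] sits at flat offset i*COLS + j */
      return 0;
  }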


Reminds me of a simple implementation of a Prisoners and Boxes riddle that showed up in my YouTube feed, where I had to add a comment so that others (and eventually future older me) would understand what I did.

>// if one prisoner fails all are dead (1st try -> tries = 0)

https://gist.github.com/amalic/c107b5289d946b758418778a95fdc...

That being said, it seems to be a junior-software-engineer issue, where the brain isn't wired yet for "0-based" indexing. I don't hear such complaints from more experienced developers, who have simply learned to appreciate it; as in my example above, it has become the more intuitive option for me.

I have added "and eventually future older me" above because I don't do much coding these days, and my career trajectory looks like it's going to be even less in the future.


I can understand the rationale for 0 and 1. I can even understand the rationale for an arbitrary range (from -5 to +10) in the current era, where languages have iterated for many generations.

However, I don't understand why they would have supported arbitrary ranges in the early days of programming. It was already hard enough to build a language and its compilers; why add arbitrary ranges on top of that?


Early programmers were mostly mathematicians and physicists, so the notation borrowed a lot from their fields, where it's common to index sequences with arbitrary numbers. And it's not particularly complicated to implement either.


I always assumed it came back to math, and that t0 or time-zero is useful for the beginning state before anything happens.


What if there was no t0 for the universe, as Hawking and a few other physicists have argued? There was just the first Planck second and no meaningful "before".


Well then clearly you'd have to index at -1, for the time state before the point at which there is no meaningful before. I'm not sure why this answer didn't occur to you.


I think Hawking referenced imaginary time, so you might have to index at i.


Another reason can be performance/memory.

When you index starting at 1, you either have to add/subtract 1 internally, which sometimes but not always can be optimized away by a smart JIT/compiler, or you have to "waste" the first element in memory, trading memory for performance.

Lua always subtracts 1, while LuaJIT instead went with the latter approach.


For me, far more significant than anything else here is having to convert array indices into human-readable text describing objects. And the objects are named by mechanical engineers who don't use 0-based indexing. So my code is full of

  print("Finger {}'s motor has overheated".format(i+1))


I think that zero-indexing is generally better, including because the element address is simply the base address plus the index, among other properties that can be helpful.

However, it can also be useful for many purposes to allow arbitrary ranges, so a programming language probably should allow that. If the index of the first element is not zero, then the base address will not be the address of the first element (if the first element's index is positive, the base address may actually point into a different array, or a different variable). Then, if you take the base address and add the index, you will have the proper address of that element, whether or not the first index is zero.

Zero is not always the most useful starting index; sometimes other numbers (which may be positive or negative) are useful. But, I think in general, zero is better.


The best source for the development of the C language is Bell Labs: https://www.bell-labs.com/usr/dmr/www/chist.html

Excerpt:

   By a general rule, in B the expression
      *(V+i)
   adds V and i, and refers to the i-th location after V.
   Both BCPL and B each add special notation to sweeten such 
   array accesses; in B an equivalent expression is V[i]
C was designed to be as close to the machine as possible. Prior to C, systems programming was done almost exclusively in assembly. If the compiler-generated code had subtracted 1 from the index every time you indexed an array, that would have been seen as less efficient than assembly.


Maybe my thinking is just warped by the language, but I think 0-indexing is the right thing most (but not all) of the time. Looking at my code, I have relatively few places where I need to subtract 1 to convert a natural 1-index into a 0-index.

It would be easy for languages to offer two array-access syntaxes, one for 0-based and one for 1-based access: say, array[i] for 0-based and array{i} for 1-based. Then the accessing code could use the right tool for the job. Unlike making the indexing base a property of the array, you shouldn't have problems with accidentally using the wrong one at the point of use.

... but to really know if it's a good idea you'd have to test it, and sadly very few concepts in programming are subjected to any kind of rigorous scientific study.
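
C can't add an array{i} syntax, but a macro sketch shows the per-use-site idea (AT1 is a hypothetical helper, not a standard facility):

  #define AT1(a, i) ((a)[(i) - 1])   /* 1-based view of a 0-based array */

  /* e.g. month is 1..12, as humans number it */
  int days(const int *days_in_month, int month) {
      return AT1(days_in_month, month);
  }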


Starting at one led to the whole "2000 is not the millennium" thing, as there is no year zero.


Even more painful is that 2001 began the 21st century; 2000 was still part of the 20th.


Array indices denote offsets from the base address. The first element is at the base address of the array, hence its offset is zero. Starting at zero avoids an extra adjustment when resolving the address; introducing one would slow down all forms of memory access across the language domain.


Yet somehow Fortran manages to be highly performant.


After programming in C for a while, I really began to have the feeling that arrays were physical things that took up space in RAM. When I think like this, starting an index at 0 is very intuitive, as 0 implies the origin point of the physical construct.

If you think of arrays just as a kind of abstract list, then beginning at 1 makes much more sense. No one who looks at a to-do list at home talks about the 0th thing on their list. But the spatial nature of arrays makes starting at 1 confusing: after I take 1 step, I am no longer at my starting point. And cycling through arrays in code, for me, has a very similar feeling to taking actual physical steps through a data structure.


An Array in Ada is interesting. https://learn.adacore.com/courses/intro-to-ada/chapters/arra...

Like other languages, it is a collection of contiguous elements that can be selected by indexing.

Unlike other languages:

- any discrete type can be used for indexing, not just integers

- bounds can be any value (they don't have to start at 0 or 1; the first index could be 13)

- one consequence of the aforementioned is that arrays are types that map to the problem domain, rather than being tied to the computing domain

- and more…


I'd claim, intuitively, that having all bits 0 is the natural starting point, and more efficient if there were any constraints on addressing, which was likely the case historically.

I'm happy to hear any dissenting opinions if this is inaccurate.


"math [...] sums seem to always be 0-indexed,"

What? Is this some kind of trolling challenge? If you don't care about math, just say so. Most people don't. But making up a nonexistent convention is just silly.


If your arrays start at 0, you have offsets.

If your arrays start at 1, you have indexes.

"0-indexed" shouldn't be a thing. I like the screenshot quote that refers to this as either "1-origin" or "0-origin".


A related question: why is a rectangle containing a single pixel defined as (x,y):(x+1,y+1) instead of (x,y):(x,y)?

Here's a discussion from some years ago that starts with some of Guido's thoughts on array indexing and moves on to the QuickDraw coordinate plane and points and rectangles, including how grid lines and points are infinitely thin/small, but pixels occupy the space between the grid lines.

https://news.ycombinator.com/item?id=6602497


The general term of arithmetic and geometric sequences seems simpler when indexing from 0 rather than 1. I do not think that "1" is more human-focused for anything than "0".
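
For instance, with 0-indexing the general terms are a_n = a_0 + n*d and g_n = g_0 * r^n, while 1-indexing forces an (n-1) into both.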


It's an absolute pain in my arse. I honestly don't know what problems I would have with an array system that started at 1, but since the length convention sometimes counts from 0 and sometimes from 1 (I'm thinking Perl vs JS), I'm always forgetting whether to put = or <= into for loops, and when to add one for the readout or subtract 1 when looking at the array again. I mean, in what universe should [2]+[3]=7?


The first element having the index zero may be natural or unnatural depending on the context. In mathematics, we count the rows in a matrix starting from 1, but we count the monomials in a polynomial starting (or ending) with 0. Generally, there is less need to use zero as the starting index: the Common Era does not have "the year zero" (there, -1 is followed by +1), etc.


Easy: I have an array with 256 elements. And I want to store the indexes of various elements of the array off in another data structure. So many of them, in fact, that it’s important that the index values each fit into a byte. So, I want the index values to be 0..255, and not 1..256, since 256 doesn’t fit into a byte.
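
A sketch of that layout (names illustrative):

  #include <stdint.h>
  int table[256];
  uint8_t saved[1000];            /* one byte per stored index, values 0..255 */
  int lookup(int k) {
      return table[saved[k]];     /* 1-based would need the value 256, which doesn't fit */
  }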


You can still fit that information into one byte and convert it at compile time or runtime :)

In a Minecraft clone I made, each chunk consists of 16 subchunks, so I only need 12 bits to represent the block position in the shader. Technically, the block position ranges from INT32_MIN to INT32_MAX, but since each subchunk consists of only 16x16x16 blocks, I can get away with storing the position in 12 bits and then passing in the subchunk's world position once as a uniform.
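
A sketch of that packing (function name illustrative): three 0-based 4-bit coordinates, each 0..15, fit exactly in 12 bits.

  #include <stdint.h>
  uint16_t pack_block_pos(unsigned x, unsigned y, unsigned z) {
      return (uint16_t)((x & 0xFu) | ((y & 0xFu) << 4) | ((z & 0xFu) << 8));
  }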


That reminds me of this article about the history of hjkl in VIM: https://www.hillelwayne.com/post/always-more-history/

Lo and behold, it's written by the same author.


Arrays start at 0 because they're really just pointers. "[x]" is just shorthand for "+ x * sizeof". The first element is the one stored at the pointer address (0 sizeofs ahead of the pointer), the next element is stored 1 sizeof ahead of the pointer, etc.


In C, "a[x]" is effectively syntactic sugar for just "*(a+x)" (conversion rules for "+" means you don't need the sizeof). It also means you can reverse it and write e.g. "5[somearray]" if you really want to make developers want to throw rocks at you.
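
Both spellings below are legal C and hit the same element:

  #include <assert.h>
  int main(void) {
      int arr[10] = {0};
      arr[5] = 7;
      assert(*(arr + 5) == 7);   /* arr[5] is defined as *(arr + 5) */
      assert(5[arr] == 7);       /* addition commutes, so this compiles too */
      return 0;
  }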


hehe, that was me :) when I was young.


That's just convention though. There's no reason "[x]" couldn't have been defined to be "+ (x - 1) * sizeof". There would be no runtime cost to this, and the compile time penalty would be tiny even on ancient systems.


Is `(x - 1)` not a runtime cost if `x` is a runtime variable?


Not on many instruction architectures - addressing modes often support adding/subtracting a constant.


That answers something like "what are the technical benefits to zero-based arrays" but not "why are zero-based arrays such a strong convention in programming languages." For that you'd have to read the article I guess.


Or it could be:

"technical benefits to zero-based arrays" -> "why zero-based arrays are a strong convention"

The article mentions that Hoye says as much but is dismissive of it. I am not sure why he would be so quick to dismiss it.


The article this one points to has had a few threads:

The origin of zero-based array indexing - https://news.ycombinator.com/item?id=6879478 - Dec 2013 (107 comments)

Zero-based arrays and the mythology of programming - https://news.ycombinator.com/item?id=6708409 - Nov 2013 (1 comment)

Citation Needed - https://news.ycombinator.com/item?id=6595521 - Oct 2013 (2 comments)

Threads about the EWD mentioned by jaapsen01:

Why numbering should start at zero (1982) - https://news.ycombinator.com/item?id=22162705 - Jan 2020 (220 comments)

Dijkstra's argument on why numbering should start at zero [pdf] - https://news.ycombinator.com/item?id=17850441 - Aug 2018 (1 comment)

Why numbering should start at zero (1982) - https://news.ycombinator.com/item?id=17765034 - Aug 2018 (63 comments)

Why numbering should start at zero (1982) - https://news.ycombinator.com/item?id=13186225 - Dec 2016 (216 comments)

Why numbering should start at zero (1982) - https://news.ycombinator.com/item?id=9761355 - June 2015 (47 comments)

Dijkstra: Why numbering should start at zero - https://news.ycombinator.com/item?id=777580 - Aug 2009 (71 comments)

Also these. Others? I'm a little surprised there aren't more.

Why do we count starting from zero? - https://news.ycombinator.com/item?id=17923391 - Sept 2018 (1 comment)

Why C Arrays Start at Zero: I Don't Know - https://news.ycombinator.com/item?id=11228267 - March 2016 (3 comments)

Why C Arrays Start at Zero: I Don't Know - https://news.ycombinator.com/item?id=11114704 - Feb 2016 (2 comments)


Perhaps also worth adding that the linked article (the one titled "Citation Needed") itself originally had hundreds of comments beneath it, many quite argumentative, which seem to be now missing - I guess lost in a site update. It was a very contentious article.

I'm sympathetic to its author's approach - we talk about indexing and it's a strangely fascinating topic but the history of it hasn't been all that well dug out. We do indeed tend to respond to articles like this by going "well obviously [x]-based is good because" - you can see that at work in this discussion here - but those are not necessarily the true historical reasons. But I think the author made a leap too far, the article was a bit too brittle, and it landed in a slightly too argumentative spot.


Because indexing happens with integers. Back then space was a concern, so you'd use unsigned integers, and those start at zero. Internally it's all offsets anyway, and it's easier not to do a conversion than to do one.


Unfortunate that the Julia folks weren't exposed to this when they designed the language.


https://juliaarrays.github.io/OffsetArrays.jl/stable/

0-based, or any arbitrary offset of your choice, so that your data model can more closely map to the problem. And if you use eachindex instead of making assumptions, you can work with arrays using 1-based, 0-based, or arbitrary-based indexing.


For this particular problem, having a choice is even worse than 1-based indexing. Code becomes a giant PITA to read because you have to switch back and forth.


Looking at my own coding experience, I certainly favor 0 over 1. When I examine why, I can't help but see my aversion to & distaste of Visual Basic as a driver.

VB is the only 1-indexed language I have used. VB is bad. Therefore 1-indexed languages must be bad.


The index of an array is an integer, and unsigned integer values start at zero, so it seems like an obvious choice as to where the starting point should be. It's only outside of computer science that it seems counter-intuitive.


There is definitely a bunch of math that works out better in 0 based indices. I saw this a lot when I was working on filesystem focused stuff.

I feel like I've also seen algorithms that work out better with 1 based indices, but not as many.



Arrays don't actually start at 0 in all languages. In ColdFusion, for example, arrays start at 1. I've never been able to get a reason as to why CF's original author, J.J. Allaire, did it that way.


Came here to say this about CF. I work in ColdFusion, but coming from a background of other languages I would always reference 0 as the start of an array. This was the source of countless errors for me until I got used to it.

My manager thinks it was done to make the language easier to understand for people just getting into programming.


And Julia, an otherwise nice language.


also Lua and R.


Almost as interesting as hearing arguments for/against tabs/spaces.


The argument that "BCPL doesn't have pointers, so it's not about pointers" fails immediately, because every language uses pointers; some just don't expose them to the developer.


Nobody mentioned Lua? Really?

Lua's arrays are 1-indexed, and it is a terrific interpreted language! Array[0] is nil, which makes sense. v5.1 has a solid JIT, and there's a great 2D game engine for it called Löve2D.


Pascal and SQL also use 1-based arrays


Because back when computers were young, every bit pattern was needed to represent something, and not using 000 as a valid value would waste a meaningful percentage of the available space.


In defense of starting at 1: if you're counting things, starting at 1 ensures that the last number you said is equal to the number of things you've counted so far.

Toddlers love this invariant.


I always assumed that it was because arrays and pointers would have been interchangeable at the start, and that adding a zero index times the object size gives you the first element.


Because memory starts at zero, because the lowest address you can select in memory has all its address lines set to zero.

If you don't understand this, you should not be writing computer software.


Welcome to paging and virtualization, where the physical address is different from what your program thinks it is. Also welcome to hypervisors, where entire operating systems think they own the whole physical machine, only to have something like Spectre/Meltdown say otherwise.


Because sometimes we number things, and sometimes we count things.


base^0 = 1, base^1 = base, base^2 = base * base, et cetera.

I've always considered the index and the power to which a base has been raised to be equivalent in many circumstances.

Thus starting at 0 makes sense.
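
A sketch of that equivalence (illustrative): the i-th digit (0-based) of n is the coefficient of base^i, so the digit's index and the exponent coincide only when counting from 0.

  unsigned digit(unsigned n, unsigned base, unsigned i) {
      while (i--) n /= base;     /* shift i digits off the bottom */
      return n % base;           /* e.g. digit(372, 10, 0) == 2, the 10^0 digit */
  }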



Wow, we are still talking about this, are we? Honestly, linear algebra texts should just give in and adopt zero based indexing for matrices at this point.


The concept of nothing doesn’t really belong in the number line. It’s like inventing a new “number” to represent blue and then using it instead of pi.


Simple: Because you add zero to the memory address of an array to access the first element.

The array indexer was syntactic sugar for pointer arithmetic.


QBasic uses 1, and that's what I always loved about it. Coincidentally, it was the language I learned programming with at age 12.


I always thought it was because it's easier to multiply the index by the size of the array item to get the correct offset.


I write code that interfaces between C# and R and you have to be careful to remember to do the indexing change. Fun!


Languages like Fortran and Ada allow arrays to start at any index. This lets you match the indexing to your problem.


Natural numbers start at 0, so it makes sense.

1 is just as arbitrary as 2 or 79823, etc. Math wouldn't work without 0.

That's good enough for me.


So what's the actual reason you'd want to have zero-based arrays and such in your language (in 2020, not 1970)? In a language built from scratch that doesn't care about being like previous ones, and doesn't care about micro-optimization.

I've been annoyed recently by this obnoxious neckbeard behavior where things are made zero-based for no reason (other than to conform to some supposedly existent philosophy), like Ratpoison and Screen selecting windows by a zero-based index, which means you have to move your hand to the far right of the keyboard to select the first window with the 0 key, to the far left to select the second window with the 1 key, and one to the right of that to select the third window with the 2 key.

I happen to have been trying to work out a tangible explanation on paper, on trips a few weeks ago, of which system is better, and could not come up with one (and none are given in the article, nor in the top comments of this thread).

On a side note, the moment you say "zeroeth", you can no longer be sure anyone knows what you're talking about, because at any moment you may be counting in English or in this pretentious UN*Xtard dialect of English. Seems like another pedagogical stumbling block held onto by boomers and boomer-wannabes who just want to have their little counter-counter-counterculture or whatever.


My 4-year-old starts at 0 when asked to count. So proud! I never said a thing about starting at 0.


They point at 3 apples and go "0, 1, 2" naturally?


Look at your 4 year old's ruler. What's the first number on it?


A problem with 0-based arrays is that 0 is the only index with a falsy Boolean value.


Because ordinal counting has been a tenet of computer science since inception.


Because an array is just syntactic sugar for pointer arithmetic ... at least in C :)


If everyone just called it offset instead of index there'd be no confusion.


For me it comes down to: are you enumerating intervals, or items to offset into?


Seeing 544 comments about this story has restored my faith in HN. :-)


It's an offset from the beginning. Simple as that.


Simple: because we start with index 0 in mathematics.


I have encountered a lot of linear algebra textbooks that use 1-indexing to describe matrices and matrix operations.


Pascal's arrays start at 1, at least by default.


This is false. Pascal arrays start at whatever you want. As for what happens behind closed doors (aka the linker), those start at zero, with the offset kept separately (that's why C blew Pascal out of the water in terms of speed when looping through an array). And in Pascal you cannot create a dynamic array with any base other than 0.

You are confusing arrays with strings. Those used to start at 1, but since Unicode took over (~2007), strings are implemented as zero-based too (even though you can still think of them as 1-based when writing code).

Also, in Delphi, when creating a cross-platform application, strings are treated as 0-based, with multiple places in the documentation stressing this due to the differences between the classic VCL and the new kid on the block (aka FireMonkey).


Pretty much all variants of BASIC have arrays starting at 1.


I think a language needs to decide if it is "high level" or if it is instead a convenience language on top of assembly.

BASIC and Pascal, I am sure, would have been considered high-level languages in their day, and naturally would have 1-based arrays.

C on the other hand....


Except VB.NET, which is 0-based only, but you can just shut your eyes and imagine it's 1-based, because you declare an array by specifying its upper bound, which equals its number of elements if you count 1-based.


Pascal is a variant of ALGOL-60, made in part because Wirth thought ALGOL-68 was way too complicated. BASIC instead comes from the FORTRAN line.


BSD bash arrays are 1 indexed for some reason


But the first year of this century was 2001…


Good article.

Especially the DBAD comment.


"Why do months start at 0?" is my real question.


The Lessons section is brilliant.


0 is offset based. 1 is index based.


This is my take too: when working directly with computer memory/addresses, 0-based is most convenient. For working with data at a higher level, 1-based is more natural and convenient.


Why do tape measures start at 0?


Obligatory, there are a lot of memes on this subject:

https://www.google.com/search?q=arrays+start+at+zero+meme&tb...


data Position = Start | Next Position


Zero-based indexing is always wrong; an ordered collection of things does not have a zeroth thing.

Sometimes you have a data structure called an "array", which means "a bunch of things that are the same size, next to each other in memory". You can store the address of the whole array as the address of its first element, and index into it by an offset; clearly, the offset of the first element is zero, but that doesn't make it the "zeroth" element.

And finally, sometimes you have cargo cult behavior by people who think that the way offsets into arrays work is because "numbering starts at zero in computers!" or some similarly wrong rationalization. This is when you get stupid things like (nth 0 list) in a lisp.


I hate to break it to you but natural spoken language is not a fixed concept, words can have many meanings and interpretations, and those meanings and interpretations can mean different things to different people at different times.

Because you predominantly think of indexing as the first definition as shown here[1], does not mean everyone else does. Be careful when using absolutes, especially when talking about languages.

[1] - https://www.merriam-webster.com/dictionary/index


An ordered collection of things starts no things from the beginning.

A journey starts 0 metres from the origin.

If you count which unit of your journey you are in, starting with 1, then at the start of the second metre you are at the 101st centimetre; the units no longer line up.

When someone asks "how many dragons have you slain?" just as you set out and haven't even heard of any dragons yet, you don't say "I'm slaying my first dragon now".

Every construction of the natural numbers I have seen starts with 0.

The first hour of the day has passed when the clock strikes one. The hour preceding that is part of the same day. English calls it 12 because the convention was invented before we had a firm grasp of zero, and English is inconsistent. 24-hour clocks call it 00.

1-based indexing can be correct, but 1-based ordered discrete sets map incredibly poorly onto any ordered set with a different number of elements (such as the reals). Additionally, dealing with ranges is harder to make consistent.

0 based indexing is always correct.


An ordered collection of events in time usually starts at t(0).


Some guitars have a zeroth fret; an actual fret at the nut.


This reminds me of arguments about male -> female and man -> woman (and human etc). Without offending anyone's delicate sensibilities - you may note that there is a short version and long version of those words, suggesting that the longer version is derivative of the short version.

However, when you examine the origin of the words, that is not the derivation. The derivation and path to English is quite complicated. You may feel a sense of righteous indignation or righteous repudiation on this topic. I would gently suggest deferring that feeling because I think the story is very interesting to follow.

That is not the end of the story though. The reason those pairs of words stuck around is almost certainly because it makes a consistent mental picture.

Bringing it back to the topic at hand - "Why we use 0" - because it works. There are many times when it has been evaluated, and at least to the people who design languages, it is more mentally appealing.

https://www.etymologynerd.com/blog/man-vs-woman

https://medium.com/interesting-histories/interesting-histori...


I really fail to understand what the etymology of sex and gender classifications has to do with array based indexing.


> "Bringing it back to the topic at hand - "Why we use 0" - because it works. There are many times when it has been evaluated, and at least to the people who design languages, it is more mentally appealing."

The linked post is not just about a technical decision, it's about a social history.

The conclusion of the linked post:

> "Lessons

Things can have more than one cause. Don’t trust easy explanations.

Always look for information that refutes your theory, not just information that supports it. Hoye stopped as soon as he had a satisfying answer and didn’t keep researching.

Don’t be a dick.

It is tragically easy to trick me into doing free research."



