I don't quite understand the argument "0-based being easier for pointer arithmetic is nonsense because the language doesn't have pointers".
Whether or not the language presents the concept of "pointer" to the user is independent of whether or not it uses pointers internally. And if it exposes arrays as a concept, it has to implement them somehow.
The simplest possible implementation of arrays is having a start address and putting all elements next to each other in RAM. To get the address of a particular item, this layout naturally leads to the formula "base address + index * element size", with "index" being 0-based. If you want to expose other indexing schemes in your language, you'll have to add more logic to convert the user-visible index back to 0-based before you can obtain the address.
All of this is completely independent of the fact whether your language exposes pointers to the user or not.
Even the yacht story sort of hints at this:
> To keep people from having their jobs cut short by yacht-racing, Richards designed the language to compile as fast as possible. One optimization was setting arrays to start at 0.
If all indexing schemes were equal, why would this change even be an optimisation in the first place?
The "language" that has a 0-based indexing scheme is assembly. An HLL with 1-based counting that compiles to assembly will introduce additional computational overhead for the translation of the index.
If 1-based indexing were used in assembly, then "mov 1(%ebx),%eax" would be the equivalent of "mov (%ebx),%eax".
Another option for users is to "sacrifice" the 0th entry of an array. Depending on the size (as in sizeof) of the entry, it can be worth it.
A benefit is that you can then use 0 as a sentinel value; for instance, if you have a find() routine that can fail, it can just return 0 instead of e.g. -1 (which can introduce minor issues).
In my experience, though, I am so used to 0-based indexing that switching schemes can cause stupid off-by-one bugs. I guess that's the main reason behind complaints about Lua. It's not that "natural" arrays are thought of as bad, but mixing both schemes (often C and Lua) is error-prone.
It's basically only x86 among modern ISAs that lets you do base + literal + register * literal; aarch64 only gives you base + register shifted by a literal, and I believe RISC-V is similar: https://gcc.godbolt.org/z/dz768768z
Out-of-order execution doesn't necessarily save you if you end up with a hard dependency on the value of the read (and even if it did, the little cores on ARM SoCs are in-order). This is a pretty obvious candidate for macro-op fusion, but I'm not sure whether this actually happens (and if it happens on ARM little cores, etc.)
If it is not on the hot path, it is likely free, but not guaranteed. If it is on the hot path then it is wasting a whole cycle. And of course in highly ALU-dependent code it is another instruction, so a fraction of a clock.
What do you mean, it wastes a whole cycle? It may indeed have worse performance due to blowing the instruction cache, but I don't see why out-of-order execution would be slower on the hot path. I doubt there would be many hot paths without any dependence on memory fetches outside specific benchmarks; the memory loads will take significantly more time even if they hit cache.
What would memset(arr, 0, sizeof(arr)) do in that case? I can see myriad problems with having arr itself point outside the memory range allocated to arr[min..max].
For most cases you could get away with some index transformation logic in the compiler (but probably your language would need special types for indices, so I do agree with you that it is not worth the trouble).
> An HLL with 1-based counting that compiles to assembly will introduce additional computational overhead for the translation of the index
Not necessarily so for translation of an index that is evaluated at run time. PL/I (back in the 1960s) compiled arrays into 'dope vectors'. Each array's dope vector included the highest and lowest valid subscripts for each dimension of the array and the address where the element with all-zero subscripts would be found, if it existed.
Not sure if assembly/machine code is that relevant. If one-based indexing were more prevalent, the LEA instruction on x86 would just subtract one during execution.
u/blutomcat's point clearly (I think) was that instruction sets didn't do what you suggest, so zero-based index is (was and still is) what you'd do if you wrote in assembly (which was not uncommon!), and on any given platform, assembly was the main language 50-60 years ago. It follows that it's easier to carry that over to higher level programming languages.
Whether that's what actually happened in the cases of HLLs that adopted zero-based arrays or not, I don't know. But today, to me, zero-based array indexing feels very natural, and the idea that zero-based arrays being simpler in assembly carrying over to HLLs seems at the very least plausible.
If you did this, you'd subtract one once at array creation. Or subtract whatever the offset is (really, you'd subtract 1 * data type size). Under the hood, using C syntax instead of assembly:
int foo[n]; // what the user wrote
// what happens behind the scenes, after a fashion
int* foo = malloc(n * sizeof(int)); // or sp - n * sizeof(int) if stack allocated; subtraction since stacks usually grow "down"
foo = foo - 1;
All references into foo will now work just fine so long as they are within the [1,n] range (same issues as with 0-based there since C doesn't carry size information for checking array bounds access). This adds one extra instruction per allocation (which includes allocation on the stack) for all non-0 offsets, but then all access will have the same cost whether 0-based, 1-based, or arbitrary-based. That's a non-zero cost, but it's not exorbitant since you'll be accessing much more often than allocating (and if it's reversed, something weird is happening).
Since as you say, C doesn't carry size information, you would actually need to do it each time a pointer is created from an object. In C you can have arrays of arrays (not pointers) or arrays in structs, where there is no pointer on creation:
// 11 arrays are created here
int foo[10][10];
It doesn't make sense to create pointers for the 10 inner arrays, so the subtraction would presumably happen when referring to each inner array.
In any case, this is more work, both for the computer and for the human. It's analogous to trying to figure out "which year of which century is X in?" (2022 is the 22nd year of the 21st century):
year(x) = ((x - 1) % 100) + 1
century(x) = floor((x - 1) / 100) + 1
The above formulae work for the 1-based convention, because they do the necessary 1 subtraction/addition in the right places. If we instead counted from 0, things would be a lot simpler (2021 would be year 21 of century 20):
year(x) = x%100
century(x) = floor(x/100)
Unfortunately, the Romans started counting years before anyone knew about 0. I think this is basically why we have a convention of counting from 1: 0 simply wasn't discovered until relatively recently in human history. Earlier number systems such as Greek/Roman numerals don't have a way of representing it.
Indeed. End exclusion is advantageous because we don't need to add or subtract 1 when constructing or interpreting ranges. Forgetting to add or subtract 1 is basically why off-by-one errors exist (and doing add rather than subtract or vice versa is probably why off-by-two errors exist).
`a..(a+n)` is an n-length range, not an (n + 1)-length range (oh look, an "add 1" operation!).
And an empty range is denoted by `a..a`, not `a..(a-1)` (oh look, a "subtract 1" operation!).
[0, 1, 2] would be how I would refer to my 3 apples. Each apple is identified by the number of apples that precede it.
I don't think it's inherently more natural to start from 1, just conventional. Disregarding history/convention, I think it would be more natural to use the lowest available natural number.
Back in Roman times, the lowest natural number that people were aware of was "1", so obviously they started counting from that number.
Our understanding of and use of mathematics has evolved since then, and accordingly there are fields such as computer science and combinatorics where there are clear advantages to starting from the smallest number (zero). In virtually all other cases, the reason "1" seems more natural is because that's the way it's been done historically.
It seems that when labelling those apples using a 1-based count, the logic is basically: each apple is identified by the number of apples that precede it, plus 1. The reason for the "plus 1" is that that was your starting number, but it could have easily been 2 or 3. If you instead start from 0, you can omit the "plus X" logic, just as I omitted the "+ 1" and "- 1" logic in my year/century formulae when moving from 1-based to 0-based counting.
The Romans weren't the only people who knew how to count back then... And how many of the other (presumably arbitrary) choices for smallest number were zero? And why did it take over one thousand years to come up with ordinal "zeroth" after we knew about cardinal "zero"?
> It seems that when labeling ... number ... that precede it, plus one.
I don't think so. It's the number that you have counted once you've counted that one.
How do you write zero in Roman numerals? Answer: you can't. Even though they used their number system for adding amounts of money, they hadn't figured out "cardinal zero" yet.
It was introduced into western mathematics through Fibonacci in the 1200s at the same time that Hindu-Arabic numerals were adopted, which use "0" as a placeholder (compare this to earlier Greek numerals which work similarly to the system we use today but without placeholders and using different sets of symbols for the different places—and of course no way of representing zero).
That's what I'm talking about (though I guess the "thousand years" estimate was slightly high). Why did it take hundreds of years after the introduction of zero to get "the zeroth element of a sequence", which is quite a new usage, and even now is confined to specialized fields?
The well known “Numerical Recipes in C 2nd ed.” talks about how the first edition promoted exactly this - offset the pointer after allocation and you can use any indexing origin you want. And they promoted 1-indexing and wrote all the algorithms as such, before switching everything to 0-indexing for the 2nd edition.
I agree with you the subtraction instruction is not exorbitant, especially considering malloc is much more expensive and always has been.
Integer add/subtract never really took a long time. Slower than today’s sub-nanosecond, and it depends on the computer, but there were CPUs in the mid 1960s doing millions of instructions per second. They weren’t crazy slow, they were just crazy expensive.
A million, perhaps. But you're talking about adding one instruction to every single data access. Plus your code bloats up with all those extra dec instructions. BCPL and C were created for departmental and lab computers like the PDP-9.
Note, for example that the mighty 6502 succeeded in the market because its designers had heard from customers that the $100 price for the 6800 was too much, so they removed some instructions and addressing modes to get the die small enough that it could be sold for $20.
And even if it were an instruction for every access, memory access is much more expensive than an add (at least nowadays -- I don't know if that's always been true).
Similarly, I like to think of it as: the add/subtract instruction is very small compared to malloc (and maybe could be included in a custom allocator for free). So the one extra instruction will always be somewhere between relatively minor and outright negligible.
Yeah, today memory access on a cache miss is really expensive. I think memory was normally running at a different clock rate than the CPU even 50-60 years ago, so it has always been something where you can't touch memory every instruction. The latency of a cache miss wasn't nearly as bad in the past. When the data cache was invented, a main memory access was like 4 times slower than register access. Today, main memory access can be more than a hundred times slower, sometimes approaching a thousand times slower, which is why we always have multi-level caches now. And memory latency is still getting worse. If someone figured out how to make memory faster without increasing the cost or the energy consumption, they'd be rich! ;)
The "differently wired circuit" would be an extra stage of logic computing a carry all the way from lsb to msb (an ALU outside the ALU?) and would contribute a fair bit of extra time. Easier to just use the ALU to do it, which means inserting an extra instruction, also a time waster.
If you're already doing base-index addressing (e.g., [SI + DI]), you're already using the ALU to compute your memory address, and it's not _that_ much extra wiring to have the ALU support either a constant power-of-two displacement or even a three-way addition; a constant displacement for a basic ripple adder is an XOR and an AND NOT, e.g. The x86 family does three-way addition (and a shift) to support "[ESI + EBX * scale + displacement]".
It's not just the extra wiring, but the extra time; each additional computation layer in an integrated circuit introduces delays that would make the operation noticeably slower.
Well, maybe, if the specific use case of 1-based arrays was all that got supported.
It'd be a pretty huge waste of processor design space to do that, though: if an HLL wants to support n-based arrays for n != 0 it can just store the base pointer with the initial index offset already applied.
Early 8-bit processors had a pretty poor set of addressing modes: you couldn't even expect [R1 + R2] let alone [R1 + R2 * scale + displacement], so random access into an array would be a multi-instruction task. By the time of the 80386 base scaled index addressing with displacement was in the CPU core, but it's not _free_ - there's an additional byte in the instruction encoding for the displacement, so even if the CPU computes the displacement with no additional clock cycles you'll have a latency cost for fetching the byte.
Defining index as the element-wise offset is a precise, well-formed definition/name.
It's also a definition commonly used in math, though not always. It depends a lot on the area of mathematics and even the cultural context.
The only reason we sometimes feel 0-indexing is wrong, IMHO, is that the English language describes entities in a sequence as 1st, 2nd, 3rd, etc.
But I wouldn't be surprised if there is some human language somewhere which does it differently. Like something which roughly maps to English as "head, head+1, head+2, ...".
Depends which country your building is in, and more!
My apartment building is 1,2… but my mall is G,M,1,2… (same country different architects). For many years I lived in Europe where it’s usually G,1,2… (where G can also be E or Fsz or whatever) and when I was in the US I had to remember that 1 is G, though L,2,3… is also common.
City people live with index ambiguity all the time and our brains manage just fine.
Actually, "rés-do-chão" would be the more common term. Still... in Portugal, the "first floor" is implicitly never the ground floor (but the floor above it), so 0-based indexing is respected (as you suggest).
On elevators you don’t see “R/C” unless the elevator is very old, and the same for office building directories or shopping centres. In almost all cases it’s the number zero. OTOH, in countries like Germany you will see the “E” for Erdgeschoss everywhere.
> Which floor is the first floor in your building?
0.5 ... yep that won't work
In some buildings built around 1900 where I live, the "ground" floor is not on the ground but around half a floor above it, while the cellar is only half below ground. Because of this, the cellar has above-ground windows, allowing legal cellar apartments (oversimplified: in Germany, apartments require windows which are above ground).
This creates a lot of confusion.
By convention this 0.5 floor still counts as ground floor with the floor number 0. But people confuse it all the time for the first floor. Especially if in some cases you do have a few utility rooms exactly at ground floor.
Now things get worse if you live in a cellar apartment in such house: By convention floor -1.
Because nearly everyone will think you mistyped, as normally you legally can't have apartments below the ground floor.
Another fun index ambiguity is for apartments built on a mountainside/cliff.
In this case, depending on which side of the house you look at, the ground floor might be at, above, or below ground level. Often the floor level with the main entrance counts as the ground floor, but sometimes that is not the case, and it is e.g. the lowest floor not below ground, or the highest floor level with the ground.
So yes, index/floor-level ambiguity is everywhere, and much worse than just whether the ground floor is 0 or 1 ;)
Not only that but which floor is the 13th (in the West) or 14th (in China)? Often those numbers are skipped because "luck", "feng shui", or other superstitious nonsense.
Do you count the machine floors in high rise buildings?
In AU, the ground floor is effectively "zero", the first floor is the one above ground. Depending on the design of the building, there might be a "mezzanine" floor between ground and 1st.
Basements usually follow the same pattern, so B1 is the first floor below ground etc.
In Belgium (the French-speaking part of the country, like in France) we call the ground floor (where you enter the building) "rez-de-chaussée" (or "rez" for short), which is the equivalent of 0, and then the first floor, numbered 1, is on top of it and you use an elevator or stairs to reach it.
I've just come back from the vacation in the US, and my experience was: when the hotel has G or L for the ground floor, it still has 2 for the floor directly above it. "G or L followed by 1" never happened to me.
It's the consistent inconsistencies that cause it to be less confusing. It would be really weird to go from someplace very orderly to another that is haphazard.
> Defining index as the element wise offset is a precise well formed definition/name.
But it's less precise than offset, and obviously introduces confusion. Whoever originated the term "0-indexed" should have just suggested a better word rather than keeping the inaccurate one (proven by the fact that you're currently looking for a way to remove some confusion around the currently chosen word) and prefixing it with a digit, which is itself a confusing thing to do to an English word, thus adding to the confusion while subtracting none.
You can say the same for 1-indexed, because both existed from the get-go.
Also, it's not less precise, as there are many different kinds of offsets too. First you have element-wise, byte-wise, and similar offsets; then, offsets are inherently relative to something, and that's not always the first element. It could be an offset from the back, and I have even seen an offset from one element before the first in some very unusual use-cases.
That so many people in this thread argue about the higher-level language (missing the point) shows that few people have ever found their way down to the underlying machine code.
Which is sad, because all code is still executed as machine instructions, even when the developer does not see it or does not want to care.
We're 20 years past the point where you can expect everyone in tech to trace every instruction down to machine instructions. Higher level languages abstract away the need for it, and for the most part, we can rely on the authors of those languages to make many of the decisions that impact performance. The rest of us learn about these details on posts like this. It's not a sad fact, in fact it's probably one of the most useful things that's happened in tech. It frees people to think about other problems.
I barely touch assembly in my day to day work, but I do understand on a fairly deep level how a computer works, which I feel is extremely important and very frequently influences how I write high-level code.
Certainly one can bang out code their entire career without ever having a clue how machine code works, but I really wouldn't advise it. At worst it leads to total ignorance, and at best you accumulate a disconnected set of "best practices" as inscrutable lore handed down from on high.
When I did web dev, it was more important to know how the WEB BROWSER works, not how machine code works.
It's not useful to know how the browser allocates memory or garbage-collects it; it's more useful to know that it has WebSockets. Don't get the idea that things you know are relevant to other people.
You go as deep as you need to. My day job involves writing tons of Javascript and there is zero need for me to dive into the underlying processes until something goes wrong. So far the lowest I’ve ever needed to go was the node source.
It only stays relevant if the JavaScript is too slow. In the case of removing two elements from an array of length 8, it's not too slow to remove them one by one since, again, it runs in microseconds
If you're not reasoning about how your code translates to logic gates, you're not a competent programmer. /s
Abstractions are good, and while we should be careful not to completely trust a brand new abstraction, high level languages are established enough that we don't need to worry about it.
Not sure if you were around 20 years ago, but you sure couldn't expect everyone in tech to understand machine instructions as well as the layers above, all the way up to corporate politics.
If this were the primary reason, it seems odd that some of the earliest popular languages used 1-based indexing in an era with much slower CPUs (<= 1 MIPS) and less available memory to store instruction code. If the authors of FORTRAN were comfortable with 1-indexing on the IBM 704 (40 KIPS) in 1957, it couldn't have been prohibitive. At the same time, as computer use cases became more interactive there may have been a greater focus on performance, but it is likely that the semantics of the high-level language were at least as important.
“machine code” isn't really the zero point that's special nowadays.
The code the programmer wrote is compiled to something such as LLVM IR, LLVM IR is further compiled by LLVM to assembly, this is further compiled by an assembler into machine code, and then the c.p.u. further compiles this to its internal code as it executes it. “machine code” really is no more special in this chain of events than, say, LLVM IR.
> ... then the c.p.u. further compiles this to its internal code as it executes it. “machine code” really is no more special in this chain of events than, say, LLVM IR.
I think this is the core point. Even though you are mechanically feeding machine code into the CPU, modern CPUs deeply transform your program before (or even while) running it, with potentially deep performance implications. With modern CPUs, the only choice we truly have is which compiler we trust to optimize our program.
It's slightly different - machine code is special because it's a visible API between our programming work and the CPU silicon. Microcode is a private implementation detail which you usually can't affect as a software developer. Machine code is something you are directly creating by writing things, though.
Except if the CPU is microcoded, those instructions might not mean what you think they do, and you need something like Intel's VTune to actually understand what the CPU is doing with them.
It has, and if you manage to know everything on those pages, for every single processor on the market, while taking into consideration motherboard architectures, and OS workloads you're our hero.
We are way beyond MS-DOS days and Michael Abrash's books.
It was just the most basic example. Indices are connected with math all the time. Consider implementing circular queues with 1-based arrays. Any arguments about what's _possible_ miss the point. Arguments about what _feels_ natural purport to know the human mind, which I find generally suspect. Mostly I think of computers as tools and am interested in getting the most functionality out as smoothly as possible, not as slaves whose job it is to make my life worry-free.
> To get the address of a particular item, this layout naturally leads to the formula "base address + index * element size", with "index" being 0-based. If you want to expose other indexing schemes in your language, you'll have to add more logic to convert the user-visible index back to 0-based before you can obtain the address.
The easy way to do that is by shifting the base address. In pseudo-C (I think that computing the b pointer is non-conforming, even if the code never tries to access b[0], but compilers can do this without problems):
int a[100]; // array of 100 integers, zero-based
int * b = a - 1; // array of 100 integers, one-based
Things only get costly when you want to check array bounds or when you have multi-dimensional arrays. There also may be non-standard architectures where this kind of stuff isn’t possible.
In this case, b is a pointer that does not point to a valid address. You can't memset(b, 0, 100), or do any of the other regular pointer things with b.
It feels like you're introducing a million edge cases.
This is undefined behaviour, as a pointer is only allowed to point to valid addresses (as well as the index of the last element + 1, but it can't be dereferenced).
I think, in theory, it would work regardless of the starting address. As long as you don't try to access the invalid address (which you wouldn't assuming that it's starting in the index 1, you would always be accessing the first valid address)
“In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, _provided_they_exist”
[…]
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; _otherwise,_the_behavior_is_undefined.
In this case, the minus-one-th element doesn’t exist, so the expression
int * b = a - 1;
triggers undefined behavior.
I think some compilers use this in practice to produce faster code (that, often, will not do what the programmer expects it to do). Start reading at https://stackoverflow.com/questions/56360316/c-standard-rega... if you’re sure they don’t. I expect that will change your opinion.
That is a description of the commonly agreed upon definition of the C abstract machine and language semantics. You could simply define the language another way with regards to this behaviour.
Not that I would want to do it, I think zero-based addressing is not very taxing for the convenience of being closer to how we think of memory addressing.
Amusing but probably irrelevant: the sailing yacht racing handicapping calculations mentioned are almost certainly under the Performance Handicap Racing Fleet (PHRF) system[0]. This system is "zero based" in its own way. A sailboat's performance is assessed and handicapped against a standard yacht with a zero rating. Your time to complete the race course is adjusted by your handicap to determine your final standing. This lets you race slower and faster types of boats in a "fair" way.
Implementation matters for performance, but even beyond that, the interface matters for users. 0-based offsets are convenient for users doing math on indexes.
A pointer is just one kind of array-like indexing scheme. Good pointery languages will distinguish Address from Offset from Integer.
I like to treat the naturals as starting at zero, but not for any mathematical reason. I just think that since it's easy to write ℤ+, it's more useful for ℕ to refer to something slightly different than it would be for ℕ to be a synonym for ℤ+ while anyone wanting to include zero in their set was forced to spell out "nonnegative integers".
A dichotomy between "ℤ+" on the one hand versus "ℕ" on the other is just plain more convenient than a dichotomy between "ℤ+" and "ℕ" on the one hand versus "ℤ\ℤ-" on the other.
It is because you mean two different things by "natural numbers", even though the same words are used for things that are not the same.
I prefer to start at zero; I think it is more useful in general, and makes more sense mathematically for many (although not all) purposes, and in many systems you will find objects and operations that have these properties too.
If there were any justice in the world, they'd start at 2, as a handicap. They might even make it out of the bottom of the NL East if that were the case.
Other important day-to-day tools like rulers, clocks, and speedometers also start at zero. So it's not exactly "bending humans to microprocessors' weird alien ways".
It all makes perfect sense in the context of measuring.
The analogy is slightly off though. The first position of an array occupies 1 unit of memory. 0 is not a possible measurement in the world of arrays.
Maybe a null terminated string could have length 0 if you don't count the terminator. But that 0 is a property of the "string" abstraction. The actual "array" would still be length 1.
> An array of length 0 starts at 0 and ends with 0
Nope. The moment you "start" somewhere you occupy 1 unit of memory. Thus no longer an "array of length 0".
There is no such thing as an array of length 0. It absolutely does not exist. You cannot write source code to represent it.
i.e. assuming we are talking about the pure data structure of an array. Some languages may have some abstraction built on top of arrays (e.g. C strings) that can have length 0, but these are not "arrays", and if they are defined they still occupy 1 unit of memory for the terminator.
C++ requires every value to take at least one byte of memory, but this isn't true of all languages. Additionally, C++20 adds [[no_unique_address]], which allows zero-sized types.
iopq points out Rust as a language that does allow zero length arrays in a sibling comment.
While ISO C does not allow zero length arrays, GNU C does. There’s also flexible array members, which can be length 0.
Sure, a language can define some abstraction that says "hey, I'm a length-0 array!". But this is more like a statement of "hey, I don't exist yet, but if I do exist in the future I'll be of type array and at least length 1".
The programming languages that I use most often can all represent empty arrays just fine. Of course, indexing into them is an error. But defining them is commonplace, and not being able to define them would make a lot of my code a lot more complicated.
With those other things zero is zero, but with arrays the zero index is the first item.
The disconnect really starts when you get the size of an array and then have to subtract one from it to get the last index.
To take your ruler example and apply array logic to it: say you are measuring a 2cm thing. If the ruler followed the index, it would show 1cm (the last index), and you would have to add 1 to get the size/length.
Having the index not align with the size is a pain that results in a lot of +/-1 code that a compiler could just as easily have handled.
It's a good idea to call the 0th element of an array the zeroth element, else you encourage off-by-one errors. The size of the array, the "count", is that which your index must be less than.
I'm not saying it has to be this way and it can't start with 1, I'm saying what you are pointing out as a flaw is actually the mixing of systems.
in computer programming teams it's best to adopt a standard unambiguous language to communicate to one another so as not have to constantly say "do you mean...?"
I'm not arguing that the natural numbers should start from one, rather that the usual path for developing the natural numbers starting with set theory is that zero is the size of a set with nothing in it. In maths they always talk about the first element in a vector, not the zeroth. It is a bad argument to point to mathematics for using zero-based indices. It is not at all common to use zero-based indexing there.
I'm a mathematician, and 0-based indexing makes way more sense to me.
Dijkstra's argument (https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF) is that we count the number of predecessors. This fits well with the mathematical usage, at least among mathematicians who care to dig into the order-theoretic foundations: in the von Neumann construction of ordinals (and of cardinals as least ordinals of a given cardinality), each finite ordinal is equal to its cardinality: 0 = ∅ is a set with 0 elements, 1 = {0} is a set with 1 element, 2 = {0, 1} is a set with 2 elements, etc.
I'm a mathematician too, and it doesn't make sense to me. It seems somewhat arbitrary depending on the application, i.e. I think it is a type problem. We use an Int to mean something special about array elements. I'd be happier with first() and last() methods really, I don't care for coming up with meanings based on implementation details.
> I'm a mathematician too, and it doesn't make sense to me.
Just to be clear, I wasn't making some sort of bizarre appeal to authority; whether something makes sense is a matter of taste, not of mathematics, and can't be proven or dis-. I meant only to respond to:
> In maths they always talk about the first element in a vector, not the zeroth. It is a bad argument to point to mathematics for using zero based indices.
I don't know for sure which convention is more common, though I suspect you're right that it's the 1-based convention, but I do know that it's false that 'they'—meaning, among others, we!—always talk about the first element/component in a vector. I probably will do that in a first Linear Algebra class, because the textbook often does and I don't want to introduce unnecessary confusion; but usually in my own work, and sometimes when I teach upper-level math classes, I use 0-based indexing when it's necessary to choose.
It's not a convincing argument because you index into an array if you want to retrieve an element contained in the array. If 0 is the cardinality of an empty collection, it's not a valid index, because you can only index into non-empty collections.
I think an age-based way to phrase it: in your 1st year, your age is 0; in your 2nd year, your age is 1; and so on. We can assign people numbers indicating what year of their lives they're in, or how many years they have lived, and both are fine, but we've settled on the latter.
That's not an argument. It's a coincidence. There are applications where what you care about is the number of predecessors (indeed, that's what the compiler cares about, which is why we have 0-indexing in the first place), but they are a tiny minority of all indexing.
> I think an age-based way to phrase it: in your 1st year, your age is 0; in your 2nd year, your age is 1; and so on.
But that isn't even true. No one ever reports the age of their new child as 0; instead, they will report a positive number of months, or -- if it's an extremely new child -- of weeks or days.
> That's not an argument. It's a coincidence. There are applications where what you care about is the number of predecessors (indeed, that's what the compiler cares about, which is why we have 0-indexing in the first place), but they are a tiny minority of all indexing.
I think there's nothing to say to the first two sentences but that I regard it as an argument that may be more or less convincing. I don't know exactly what it means for something to be an argument vs. a coincidence; it is a coincidence that, say, my name is what it is, but it is nonetheless correct for me to argue that that is my name.
These are all conventions anyway, and there is not much use (or even meaning) in arguing about which one is the right or wrong convention, just which one makes more or less sense; and this is one way to make sense of 0-based indexing, though of course there are also ways to make sense of 1-based indexing.
I'm not sure I buy that these cases are a tiny minority, but I'm certainly in no position to produce any data to the contrary.
> > I think an age-based way to phrase it: in your 1st year, your age is 0; in your 2nd year, your age is 1; and so on.
> But that isn't even true. No one ever reports the age of their new child as 0; instead, they will report a positive number of months, or -- if it's an extremely new child -- of weeks or days.
I can believe that ages aren't reported that way, although I think that a child who will be 1 year in 1 year should logically be said to be 0 years; but we can avoid that debate by considering future years: in, to pick the example that applies to me, my 42nd year of life, I am 41 (I could say 41 and 1 month, but I don't—in fact, there's a Seinfeld joke about that). Similarly, the 42nd entry in a 0-indexed array is indexed 41. It doesn't have to be that way, but I don't think one can argue that there's anything logically amiss about it (nor about 1-based indexing … but we do have to pick one).
> I think there's nothing to say to the first two sentences but that I regard it as an argument that may be more or less convincing. I don't know exactly what it means for something to be an argument vs. a coincidence; it is a coincidence that, say, my name is what it is, but it is nonetheless correct for me to argue that that is my name.
Sure. But we're talking about whether to index arrays from 0 or 1. It is true that naming an array element after the quantity of its predecessors will tell you the number of predecessors the element has. But that's not an argument for why you should do it; there would need to be some kind of benefit to having that information. Without a benefit, it's just something that happens to be true.
That's the difference between an argument and a coincidence.
I learned them in elementary school (in the US) as starting at 1. Indeed, Google's first result for natural number is from Oxford Languages, and defines natural number as:
the positive integers (whole numbers) 1, 2, 3, etc., and sometimes zero as well.
Waste the 0th element, or reuse it for something like the length (hello, Pascal strings). Another option is using base_address - element_size as your array value. Another option is adding element_size to all array accesses; assembly languages usually have addressing modes for this. There are many options to use 1-based indexing without sacrificing performance.
At least x86_64 can put something like `ptr[index * 4 + 4]` in one instruction (assuming that variable values are in registers). Not completely sure about ARM.
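A standards-clean sketch of the "waste the 0th element" option in C (the base_address - element_size variant is technically undefined behaviour in ISO C, so this sketch over-allocates by one slot instead; the names set1/get1 and the fixed capacity N are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* 1-based indexing without per-access subtraction: allocate one
   extra slot and simply never use index 0. */
enum { N = 5 };
static int data[N + 1];   /* usable slots are data[1] .. data[N] */

void set1(size_t i, int v) { data[i] = v; }   /* i in 1..N */
int  get1(size_t i)        { return data[i]; }
```

The cost is one wasted element per array, not one extra instruction per access.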
Newer Pascal strings are #0-terminated; original Pascal strings were 255 chars max, with string[0] holding the string length. They were also not Unicode-friendly. Newer Pascal compilers still support this as the shortstring type. I use Pascal-style strings in an embedded C library I wrote in 1989; they're much more efficient on 8/16-bit systems.
I always assumed it started at zero; otherwise you don't use the full range of an unsigned integer, and you can only have an odd number as your maximum capacity (for whatever you use indexing for, not just RAM addressing).
Other advantages of zero based indexing, beyond being 'closer to the machine':
It works better with the modulo operator: `array[i%length]` vs `array[(i+length-1)%length+1]`. Or you would have to define a modulo-like operator that maps ℕ to [1..n].
It works better if you have a multi-dimensional index, for example the pixels in an image. With 0-based indexing, pixel `(x,y)` is at `array[x + width*y]`. With 1-based indexing it is at `array[x + width*(y-1)]`. You might argue that programming languages should support multi-dimensional arrays natively, but you still need operations like resizing, views, etc.
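Both points can be sketched in C (the function names are made up for illustration; the 1-based variants just encode the shifted formulas from the comment):

```c
#include <assert.h>
#include <stddef.h>

/* Wrap-around (ring) indexing: trivial with 0-based indices,
   needs the shifted formula with 1-based ones. */
size_t wrap0(size_t i, size_t length) { return i % length; }
size_t wrap1(size_t i, size_t length) { return (i + length - 1) % length + 1; }

/* Row-major pixel addressing in a width-by-height image. */
size_t pixel0(size_t x, size_t y, size_t width) { return x + width * y; }
/* 1-based: coordinates and the resulting index all start at 1. */
size_t pixel1(size_t x, size_t y, size_t width) { return x + width * (y - 1); }
```

In both cases the 1-based form carries an extra +1/-1 pair through the arithmetic that the 0-based form simply doesn't need.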
Another advantage is with ranges: 0-based indexing and exclusive ranges work well. This is apparent with cursor position in text selection
Consider:
  Characters      h  e  l  l  o
  Cursor index   0  1  2  3  4  5
  Char index      0  1  2  3  4
  Range [0,3) -> [0,1,2]
  Range [2,5) -> [2,3,4]
  Range [1,1) -> []
If we used 1-based indexing and exclusive ranges, it leads to ranges where the end index is greater than the string's length...
  Characters      h  e  l  l  o
  Cursor index   0  1  2  3  4  5
  Char index      1  2  3  4  5
  Range [1,4)     -> [1,2,3]
  Range [3,6) (!) -> [3,4,5]
  Range [2,2)     -> []
but if we use inclusive ranges, it leads to ranges where the end index is less than the start index...
  Characters      h  e  l  l  o
  Cursor index   0  1  2  3  4  5
  Char index      1  2  3  4  5
  Range [1,3]     -> [1,2,3]
  Range [3,5]     -> [3,4,5]
  Range [2,1] (!) -> []
Also:
  Characters      h  e  l  l  o
  Cursor index   0  1  2  3  4  5
  0-based range [0,3) -> [0,1,2]
  1-based range [1,4) -> [1,2,3]
For the 0-based range [0,3), the left bracket sits at cursor index 0 and the right bracket at cursor index 3. With 1-based indexing it doesn't work like that, because the range is [1,4).
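A small C sketch of why half-open ranges compose so well with 0-based cursors (slice and its static buffer are made up for illustration): the length is simply end - start, [i,i) is empty, and adjacent ranges share an endpoint without overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the half-open character range [start, end) of src into a
   static buffer and return it. start and end are cursor positions,
   i.e. values in 0 .. strlen(src). */
static char slice_buf[64];

const char *slice(const char *src, size_t start, size_t end) {
    size_t len = end - start;   /* the length falls out directly */
    memcpy(slice_buf, src + start, len);
    slice_buf[len] = '\0';
    return slice_buf;
}
```

With 1-based indexing, either the end cursor exceeds the string length (exclusive) or the empty range needs end < start (inclusive), exactly as the tables above show.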
This is the bane of my existence working with Lua.
Iterating an array or adding to the end are fine, we have ipairs and insert for that, but ranges on strings I'm constantly having to think harder and write more code than necessary.
I love the language, wouldn't trade it for another, but the 1-based indexing on strings, which represents an empty string at position 3 as (3,2), is egregious.
Not as egregious as a dynamic language where 0 is false though.
Yeah, offsets are just easier to mathematically manipulate than ordinals. It's not just pointer arithmetic where it matters that item i corresponds to start+i*step. Any time you want to convert between integer indices and general linearly-spaced values, 0-based indexing is more convenient.
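For instance (a sketch; value_at and index_of are made-up names), converting between indices and linearly spaced sample points is a pure multiply-add with 0-based indexing, with index 0 landing exactly on the start value:

```c
#include <assert.h>
#include <stddef.h>

/* Map a 0-based index to a linearly spaced value: item i sits at
   start + i*step, so index 0 is the start itself. */
double value_at(double start, double step, size_t i) {
    return start + (double)i * step;
}

/* The inverse mapping, rounded to the nearest index. */
size_t index_of(double start, double step, double v) {
    return (size_t)((v - start) / step + 0.5);
}
```

The 1-based versions of both formulas would need an (i - 1) and a + 1 respectively.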
In many domains code maintenance is more important than hardware costs. In many domains 1-based indexing is a better fit, meaning less conversion code, meaning simpler code. Thus, the best indexing choice depends on the domain and circumstances, as do many controversial questions. Most tend to specialize in specific kinds of domains and over-extrapolate their experience into other domains.
Ironically I learned this from a discussion of Julia bugs. Apparently changing offsetting of arrays has proven to be a source of bugs in Julia. So maybe someday they will come to the same conclusion as languages like Perl and stop allowing it.
I'd argue 0-base is a source of bugs too!
Ideally we'd be able to catch more array indexing bugs at compile time - there are definitely cases where it should be possible to determine that arrays are being incorrectly indexed via static analysis.
The problem is that libraries which assume 0-base break when you have a 1-based array. And vice versa. Trying to combine libraries with different conventions becomes impossible.
Therefore changing the base leads to more bugs than either base alone.
That said, the more you can just use a foreach to not worry about the index at all, the better.
Of 0-based and 1-based, the only data point I have is a side comment of Dijkstra's that the language Mesa allowed both, and found that 0-based arrays lead to the fewest bugs in practice. I'd love better data on that, but this is a good reason to prefer 0-based.
That said, I can work with either. But Python uses 0-based and plpgsql uses 1-based. Switching back and forth gets..annoying.
I'd expect the compiler not to let you to pass a 0-based array to a library function expecting a 1-based array. I'm pretty sure that's how it worked with Visual Basic, which was the only language I ever used such a feature in.
Search for OffsetArrays in https://yuri.is/not-julia/ for practical problems encountered in trying to make this feature work in a language whose compiler does try to be smart.
As Jens Gustedt points out[1], the following intentional unsigned overflow works perfectly for downwards iteration (even when length is 0 or SIZE_MAX), though it looks a bit confusing at first:
for (size_t i = length - 1; i < length; i--) ...
You are also free to start at any other (not necessarily in-bounds) index, just like with ascending iteration.
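Wrapped into a complete function (sum_desc is a made-up name for the sketch), the idiom looks like this; note that the length == 0 case never enters the body, because 0 - 1 wraps to SIZE_MAX and SIZE_MAX < 0 is false:

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array by iterating downward with an unsigned index.
   When i is 0 and gets decremented, it wraps to SIZE_MAX, and
   SIZE_MAX < length is false for every possible length, so the
   loop always terminates -- including when length == 0. */
long sum_desc(const int *a, size_t length) {
    long total = 0;
    for (size_t i = length - 1; i < length; i--)
        total += a[i];
    return total;
}
```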
The condition of a for loop is equivalent to a while loop's condition (more or less; there are some scoping differences not captured by that, and this is how pretty much every for loop in a C-syntax language works). So yes, length - 1 < length will be true on the first iteration, which is fine because the loop continues as long as that condition is true.
What the above approach takes advantage of is that when the index eventually wraps, you have this condition:
  SIZE_MAX < length
which is false for every possible value of length, so the loop terminates.
That was my reaction - why should anyone think it might be the latter? Are there languages that do have such a syntax without explicit keywords ("do...until")?
The loop continues until i transitions from 0 to 0 minus 1. In this case 0 - 1 doesn't actually equal -1, since size_t is an unsigned type; instead it wraps around to the largest possible value. TL;DR: yes, as you speculate, it terminates when the number wraps around.
the footnote [1] should be [0], just for the sake of this very topic.
Seriously though, while the idiom does work for unsigned integers, it's a bad idiom to learn [makes code reviews harder]. The post-decrement one in the loop body works with everything (signed/unsigned), and it's well known.
Or count from `length` to 1, but subtract 1 in the loop body, or count up and subtract the length in the loop body. Any modern compiler should be able to optimise these to be equivalent.
In the majority of cases, counting down is not necessary. Nor is ordered iteration. Most languages have a `for each` style syntax that's preferable anyway.
I like how GP is called "operator-name" and instead of doing himself, he makes others joking with operator names. Although I'm not sure if it's altruism or highly manipulative behaviour.
the correct way/idiom to reverse iterate an array is
for (size_t i = length; i-- > 0; )...
It's surprising how often the issue pops up; the idiom works well with both signed and unsigned integers.
(edit) I started with 1-based indexing (BASIC)... mixed with 0-based (assembly), more 1-based (Pascal), then more stuff (all zero-based). I have yet to see a real advantage of 1-based indexing... after the initial learning process.
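A runnable sketch of this idiom (sum_rev is a made-up name); the test-then-decrement shape means that for length == 0 the condition 0 > 0 fails immediately, and the same loop would work unchanged with a signed counter:

```c
#include <assert.h>
#include <stddef.h>

/* Reverse iteration with the test-then-decrement ("goes to") idiom.
   On each pass, i is compared against 0 first and decremented after,
   so the body sees length-1, length-2, ..., 0, and i is never used
   as an index after wrapping. */
long sum_rev(const int *a, size_t length) {
    long total = 0;
    for (size_t i = length; i-- > 0; )
        total += a[i];
    return total;
}
```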
Sure, why wouldn't it be? As far as a cache is concerned, I don't think reverse sequential iteration would be any different than forward sequential. The actual RAM accesses may be less optimal if there's some speculative pre-fetching with assumed forward sequential access, but that's conjecture.
With some exceptions, hardware prefetch works in terms of ascending accesses. To learn if a particular CPU will prefetch for descending access, benchmarking is essential. Best to use soft prefetch calls if performance is critical.
i would suspect that the cache prefetch/prediction could use the "velocity" of the memory access to predict the next access; so if the access pattern was going backwards, the "velocity" would be negative, but prefetching would still work if they just followed the predicted pattern.
It's not. It was nice on architectures where cache didn't matter much and where subtracting and comparing to zero was just one instruction (looking at you, old-core ARM).
In the C programming language unsigned integers do not overflow. They wrap. This is well-defined behaviour and the example code is simply incorrect. Most modern compilers will give you a diagnostic for this.
Unless wrapping underflow is sensible for the domain (which it isn’t when representing the size of something), unsigned integers are usually a bad idea.
That's quite broken (you'd want a different variable inside the body, vs clobbering the iteration counter, else this would process the last item in your list, then exit).
A rare case where 1-based indexing is more convenient is complete binary trees laid out breadth-first (as in a standard binary heap): parent is i div 2 and children are 2i and 2i+1 when starting at one and who knows what when starting at zero. But that’s the only one I know.
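The contrast can be sketched in C (the function names are made up): 1-based navigation, with slot 0 of the backing array left unused, next to the 0-based equivalents.

```c
#include <assert.h>
#include <stddef.h>

/* Binary heap navigation with a 1-based layout (array slot 0 unused):
   the root is at 1, and the formulas are pure shifts and adds. */
size_t parent1(size_t i) { return i / 2; }
size_t left1(size_t i)   { return 2 * i; }
size_t right1(size_t i)  { return 2 * i + 1; }

/* The same tree stored 0-based: every formula picks up a +/-1. */
size_t parent0(size_t i) { return (i - 1) / 2; }
size_t left0(size_t i)   { return 2 * i + 1; }
size_t right0(size_t i)  { return 2 * i + 2; }
```

The two layouts describe the same tree shifted by one slot, which is why many heap implementations simply sacrifice element 0.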
Except 1-based indexing is what we use in normal language. We don't use "zeroeth" or "player (number) zero" etc. And the word "first" is shortened to 1st etc.
Personally I think we'd be better off if programming languages stuck to the same convention - off-by-1 errors aren't the hardest problems to deal with but they're still annoying.
That's confusing ordinal and cardinal numbers. The element with index 0 is the first number. The element with index 1 is the second number, and so on.
Using the term "zeroth" is basically some form of showing off (even though it's kinda fun), but will be utterly confusing when you get to the fifty-second element which is the last in a group of 53 elements.
I'm not confusing them, my point about abbreviating "first" as 1st was that in typical speech we start counting at 1. Nobody says "let's start with item zero on the list".
But programmers are stuck with having to say/think "item 0 in the array".
> With 0-based indexing the children are at 2i+1 and 2i+2. The parent is at (i-1) div 2.
> Not hard to figure out.
While that's true, "you just shift by 1" is equally good at all arguments for or against 0-based indexing, so deploying it here probably won't convince.
That said, the effort of one versus the other is so trivial that there is no point in ever using effort as an argument either way. Doubly so because what seems like effort to us is simple unfamiliarity.
What is important is which one leads to more careless errors in practice. As a trivial example, consistent indentation takes effort, but failing to do it leads to more careless errors. Therefore everyone indents code.
An incredibly unrare case happens all the time in my work: array[length - 1], or variations of this. Anything involving the last element of the array, and often iterating through the whole array will use something similar at some point.
I think the 1-indexing folks would have to argue that a%b should return a value from 1 to b inclusive. This does make the same sort of intuitive sense as 1-indexing. For example we number clocks from 1 to 12.
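That 1-based mod can be written with the usual shift-by-one dance (clock_add is a made-up name; advance is assumed nonnegative, since C's % follows the sign of its operands):

```c
#include <assert.h>

/* "Clock arithmetic": advance a 1-based hour in 1..m, wrapping
   11 -> 12 -> 1 and never producing 0. advance must be >= 0. */
int clock_add(int hour, int advance, int m) {
    return (hour - 1 + advance) % m + 1;
}
```

Note that even here the implementation converts to 0-based, takes the ordinary %, and converts back.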
but nobody "says" zero o'clock - it's always twelve o'clock!
and the example is in 24hr format - which needs to have the 00 to differentiate it from being 12:30. But if you write in 12hr format, you don't ever use 00 - it's always 12:30am or 12:30pm
First of all, you don't. Or I guess most Americans don't. I do, or most Europeans do. 12.30am and 12.30pm feels just very wrong in Europe.
But yea, I do agree with you. we don't say 0, we say 12, no matter if it's noon or midnight. That's because we humans avoid saying zero when we mean zero.
It's the same with other comments here, talking about counting seconds: we say "OK go, one, two, ..." and not "zero, one, two, ...". We say a 3-month-old baby, not a 0-year-old baby.
We use 0-index in so many things, we just avoid saying the word "zero", and we use other names or other units to avoid that word.
Because otherwise you would be wasting a perfectly good number for no reason, which means you need to use more bits to do the same thing.
To write 4 numbers (including zero) you only need two bits
0: 00
1: 01
2: 10
3: 11
To write 4 numbers if you avoid using the number zero, you need three bits
1: 001
2: 010
3: 011
4: 100
If you extrapolate that a little bit, you'll realize that you'll need two bytes (1 Byte + 1 bit from another byte) to store the indices of an array with 2^8 elements, which is just dumb.
Some of you might be thinking: you don't need to store it the same way it's written, you can just substract 1 from whatever the user typed and convert it behind the scenes.
Yes, you could subtract 1, but then you would be making the whole system more complex, opaque, and inelegant, while hiding information from the programmer for no good reason.
As a kid, I saw these things often on multiple-choice test sheets. The tens digit at the end of each row not matching that at the beginning always seemed awkward, and I wondered why they didn't just start at zero. This was long before I did any programming.
The first year CE being "1" resulting in the new millennium starting at 2001 instead of 2000 also seemed idiotic, and the 1900s being referred to as "the 20th century" was something I always had to consciously compensate for. Numbering items starting at zero would have given us more elegant/less confusing ways to communicate those things.
Zero-based indexing makes things inherently simpler because it matches the way we write numbers (and becomes much more noticeable once you have more than one digit). It's not just an optimization for computers.
To keep it confusing: the traditional proleptic Gregorian calendar (like the Julian calendar) does not have a year 0 and instead uses the ordinal numbers 1, 2, ... both for years AD and BC. Thus the traditional time line is 2 BC, 1 BC, AD 1, and AD 2. ISO 8601 uses astronomical year numbering which includes a year 0 and negative numbers before it. Thus the ISO 8601 time line is −0001, 0000, 0001, and 0002.
The index is the distance away from the first element. So index 0 is the first element. Index 3 is 3 away from the first element, so the fourth element.
It's the difference between ordinal and cardinal numbers. In other words, counting and indexing.
Formally, ordinals start at zero too :-) [0]. We owe their latest definition to Von Neumann, but AFAIK, the former definitions were similar.
[...snip...]
--
[0] This is meta-meta-contrarianism.
- The layman counts from 1
- The uptight programmer counts from zero, because Dijkstra said so (or so he thinks).
- The meta-contrarian (I used to be one) says fuck it, ordinals start at one.
- The meta-meta-contrarian reads Wikipedia[1], realizes he was formally wrong, and goes one step further in pedantry, back to zero [0].
That being said, my brain prefers 1-indexing programming languages like Lua, Julia, R and Matlab...
Counting fingers (or other items) isn't a positional numeral system, so it's really apples to oranges. Also, "first" means "nothing precedes it", which is separate from the indexing system; it shouldn't mean "at index 1" unless otherwise specified.
Though the idea of an index as a position instead of an address is weird to me, you can indeed make that analogy.
The "first" element of an array has the index zero, as in "zero elements precede it".
That gets really dicey if you're storing the value of the index in a variable. Does the programmer need to know that if it's an INDEX they're storing that goes from 1-256 they can fit that into an 8-bit value because the compiler will magic away that last bit? Speaking of which, how will the compiler know which 1's are really 0's and which 256's are really 255's? Will it compile that 1+1=1 if I use that result as an index? Just having a 0th index is much simpler than all of that noise.
Not really? It just changes how the compiler emits the x := *(addr + idx) operation: instead of doing something like
  ldr x, [addr, idx]
it does
  sub t1, idx, #1
  ldr x, [addr, t1]
If we're talking about C, accessing invalid/OOB indices is undefined, so if idx happens to be zero and unsigned, the idx - 1 wraps around and we hit something unexpected, which is fine (as far as the compiler is concerned).
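The same translation can be sketched at the C level (the AT1 macro and helper functions are made up): a hypothetical 1-based front end only has to fold a single subtraction into every access.

```c
#include <assert.h>

/* What a 1-based language's a[i] could lower to in C terms:
   one subtraction folded into the element access. */
#define AT1(arr, i) ((arr)[(i) - 1])

int first_of(const int *a)           { return AT1(a, 1); }
int nth_of(const int *a, unsigned i) { return AT1(a, i); }
```

When i is a compile-time constant, the subtraction folds away entirely; only dynamic indices pay the extra instruction (or the constant offset in the addressing mode).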
That's what I was thinking too, except that they have a point. You might end up using more bytes to store indexes as variables (if the index is 256 for example), than you would with zero indexing.
Someone linked to your Johnny Decimal system earlier today: https://johnnydecimal.com/
I liked the idea, but I noticed that it didn't seem to be 0-indexed. If I adopt this, I'm definitely going for zero indexing.
But then of course the reason why 1-indexing is better (or really, arbitrary indexing, which is even better) is that it matches human semantics better.
Edsger Dijkstra wrote an interesting article titled 'why numbering should start at 0'.
Perhaps not answering the question directly but an interesting read nonetheless.
As the article implies, Dijkstra was specifically wrong about FORTRAN, as defined at the time — the '77 standard, when it was still only conscionable to SHOUT 6 alphamerics.
“… you probably know that arrogance in computer science is measured in nano-Dijkstras.” — Alan Kay
I've never understood the first half of Dijkstra's argument. Unless we are programming a theoretical Turing machine, there will also be a maximum number that can be represented, so the argument about using inclusive bounds for the lower end also applies to the upper end. So you can't represent all numbers with a single range style. You either end up excluding the smallest number, the largest number, or the empty set. Or you need special notation to handle one of those three cases.
I suppose the 0,1,infinity principle argues that including the empty set and the smallest number is more important than including the largest number, so that notation should be preferred by default. And when written as A <= i < B it is clear enough, but once you remove those symbols and start using them in indexing or function-call syntax, like [A:B] or range(A,B), the inconsistency of bounds really breaks my brain, and having to pass a number larger than the largest array index as an upper range bound gives me nervous tics. I'd rather use inclusive bounds everywhere and have special syntax for the empty set.
The handwriting skills were common back in the day... By the way, are you sure that it was Dijkstra himself who handprinted these texts (rather than having, say, his secretary do it)? Because this is not exactly the (beautiful) handwriting as it was taught back then.
He handwrote them, and occasionally typed them (especially early on, I gave up trying to find the transition point). It was something he was known for. If you read the EWD's over the years you'll see the same (or very similar) handwriting throughout, which would not be the case if he had a secretary writing them for him (who wouldn't have been the same person as he moved between countries).
If you ask people which floor of building they're on, it's going to depend on which country they're in. In North America, at least, the first floor you walk into (in a sane city: I understand there are some which do not qualify in this respect due to hills or historic disaster recovery) is the first floor. On other continents, you enter the ground floor and need to take stairs or an elevating device to get to the first floor.
A big international corporation (Siemens) had, at one point, standardized their buildings world wide to have room numbers starting with 2 on the ground floor, 3 on the floor above etc., leaving space for two basements.
All room numbers were 5 digits, and skipped numbers if there were multiple windows in one room, based on the theory that such a room might later be sub-divided, and wanting to avoid renumbering of rooms down the hallway.
(I encountered this around 2003, so might have well changed since then; didn't find any references only with a quick search).
I worked for them relatively recently and didn't encounter this; or at least never noticed. Then again, our office building was an open office plan where desks next to the windows and rooms (conference or office) were all in the center of each floor.
The ground floor is 0. The floor above 0 is 1. The floor below 0 is -1. Anything else is crazy TBH, the whole point of integer numbers is to count things that start at a defined point and go opposite ways. Why shift it and then reinvent weird pseudo-negative numbers like S1?
But then again people measure distance in feet and write dates as month day year :)
It would be nice if this were the case. In grad school I taught in a building that was on a hill and had once been separate buildings, so different entrances were on different floors, and the transition between the former separate buildings was 4 stairs.
So there was a floor with rooms numbered 1xx and 0xx. There was also a floor below the 0xx floor. They just numbered it 00xx. And you could enter the building on any of those floors depending on what door you picked.
The class I taught was in 0005. Room 005 was someone's office. So on the first day of every semester we would have students waiting outside the door of 005 instead of 0005...
Cool, Dwinelle at Berkeley is a fascinatingly bizarre building for seemingly very similar reasons, and (if I remember correctly) also with pretty much the same weird numbering system plus tiny staircases to adjust for mismatched heights between floors!
> If I build a building in the shape of a double helix, what order are the two floors in?
Why separate the floors that are on the same level?
> Hmm, are there actually an infinite number of floors?
I assumed there isn't and that we use a subset of integers. If you have infinite number of floors - feel free to use full set of integers (if you have enough memory that is).
> Do we need to switch to floating point?
No point, if you have more than aleph0 floors - floats won't help you.
We have a building at the local university that has ground at one end level with transitory step between floors -1 and 0, you enter the building through the stair well, and descend or ascend to get to a floor. I wouldn't be surprised if an architect designed a building that did this on flat ground, violating the ability to number floors relative to ground with integers.
There’s some confusion here about the exact word being used. The German word that is mistranslated here as “floor” is “Stock”, which actually means “addition”. So the floor above the ground floor is the first addition, not the “first floor”.
When I was at Leeds University the ground floor of the Physics Admin building was Level 6 (it went up to Level 11). It was built on a hill and according to legend this allowed room for expansion (which never happened) down the hill without requiring negative floor numbers.
To add to the fun the only continuous corridor through the entire T-shaped building was on Level 10, one end of which (the base of the T) was at ground level on its part of the hill. There was a famous 'kink' in the middle of the top / cross-bar of the T (apparently the longest corridor in Europe) because they started building from each end and were slightly off.
The Patterson Building at Acadia University in Wolfville, NS, has a basement and four floors. The rooms on the first floor are numbered 1XX, second floor 2XX, etc. up to the top floor 4XX.
The elevator, however, goes from floor '1' (the basement), to floor '5' (the top or fourth floor).
I've been living in North America for decades. Every building has G (zero). This is the floor that you walk into and has a lobby. The next floor up is 1 as such is labeled in the elevator.
So your initial assumption is incorrect from my experience.
This is not my experience in the US. Most places either have G or 1 for the first floor. Then 2 for the next. So some elevators say G, then 2, 3, and so on.
I've seen setups like you mention, but they definitely aren't the majority in places I've lived.
It really comes down to a choice between a machine-focused (0) or human-focused (1) approach.
The 0 makes a lot of sense in a C pointer world, where memcpy and other similar functions can be written very tight.
The 1 makes a lot of sense in a human world: when we count, we start at 1; we talk about the "1st"; counting on fingers starts with 1; etc.
I was once at a Lua (a 1-indexed language) conference where this was discussed, and Luis started explaining why Lua was 1-indexed with this sentence: "The 1st argument ..." :)
> It really comes down to a choice between a machine-focused (0) or human-focused (1) approach.
Just think of 0-based as offset-based and 1-based as index-based. Both are intuitive just like that. I never get why people arguing over this bring pointers and memory (or anything computer related) to the table. No normal person is going to understand that, but everyone understands that if you don't move at all (0 offset) you stay at the first (index 1) item. Add one and ... you get the point.
"Offset-based" is bringing in pointers; that's the thing it's an offset "from". "The beginning of the array" is just a pointer.
I suppose saying that does have an advantage over explicitly talking about pointers, in that the word "pointer" is a piece of jargon that has a lot of baggage. That's just avoiding jargon, though, not really using a different model.
No, offset means "measuring from origin"; pointers use that language, they don't provide it.
The advantages in measuring from origin can accrue to the person choosing to do it, because there are other reasons to do so which aren't satisfying the CPU.
No, offset means "measuring from a defined location". In a computer, the choice of location is more or less arbitrary[0]. If you pick the first element, you get zero indexing. If you pick a space before the first element, you get one indexing. Either one is a pointer, since a pointer is just computer jargon for "a defined location".
Calling that defined location "origin" doesn't suddenly make it not a pointer.
[0]- NUMA and cache effects aside, as they don't matter for this purpose
Offset-based lets you refer to an abstract location that is after the last element without resorting to n + 1.
  [ ] [ ] [ ] ... [ ]
   0   1   2     n-1  n
We can regard this n as a "virtual zero", which then makes it possible to index the element at n - 1 as just -1. The index -n then aliases to 0.
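Python's built-in indexing behaves exactly like this "virtual zero" scheme: -1 aliases the last element and -n aliases the first.

```python
xs = [10, 20, 30, 40]
n = len(xs)

assert xs[n - 1] == xs[-1]  # the last element, via the "virtual zero" at n
assert xs[-n] == xs[0]      # -n wraps around to the first element
assert xs[2:n] == xs[2:]    # n also names the "one past the end" position
```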
In Google standard SQL, to access an array it is simply not allowed to put a number inside square brackets. You must specify which way you mean. So `SELECT some_numbers[OFFSET(1)], some_numbers[ORDINAL(1)]` is allowed but not `SELECT some_numbers[1]`.
that's actually good verbosity, because the intent is very clear using this method, and multiple people can read the code and unambiguously identify the intent without having to talk to the original author.
This type of verbosity makes sense in a big organization where the left hand doesn't talk much to the right hand, so communication naturally evolves to happen at the code level.
I like for example using enum types to ensure that what’s actually passed is the expected value even though fundamentally the semantics do not need to be much more complicated than integer values. There could be the same thing with a distinction between offset and indices as two different integer numerical types to avoid any ambiguity.
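A minimal Python sketch of that idea, using two wrapper types to make the intent explicit at the call site (the names `Offset` and `Ordinal` are made up for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Offset:
    value: int  # 0-based distance from the start

@dataclass(frozen=True)
class Ordinal:
    value: int  # 1-based position

def get(seq, pos):
    # The caller must say which scheme they mean; a bare int is rejected.
    if isinstance(pos, Offset):
        if not 0 <= pos.value < len(seq):
            raise IndexError(pos)
        return seq[pos.value]
    if isinstance(pos, Ordinal):
        if not 1 <= pos.value <= len(seq):
            raise IndexError(pos)
        return seq[pos.value - 1]
    raise TypeError("pass Offset(...) or Ordinal(...)")

xs = ["a", "b", "c"]
assert get(xs, Offset(1)) == "b"
assert get(xs, Ordinal(1)) == "a"
```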
But I think that still leaves open the question of "offset from what?"
If you move zero from the first element, you're still at the first element... but if you move zero from the fourth element you're still at the fourth element. If you move one from before the first element, you're at the first element.
I think you're more or less right, but I think we still need something to motivate the first element as the point of reference, and the machine focus is one way to do that.
In some languages/environments, the array is literally the same as the pointer to the first element.
Other languages are more sophisticated (like humans are!)
and can say that position 0 does not exist. An array of size 0 has no elements. An array of size 5 has elements 1st through 5th. An array of size N has 1st through Nth, inclusive.
"ordinal" is preferred to "index", as "index" doesn't naturally suggest starting at "1", or even restricting the key to integers. "Ordinal" starts from 1, everyone agrees (except mathematicians, of course :-) ).
Do people not study grammar in American schools? I thought all kids learn the distinction between cardinal (one, two, three) and ordinal (first, second, third) numbers.
At the age that cardinal and ordinal numbers are taught, kids simply don't remember "cardinal" and "ordinal", they remember "one, two, three" and "first, second, third".
99% of the instances where I've seen "ordinal" outside of this thread have been in code/documentation. It is not a common word in everyday language.
The English grammar that Americans study in school is likely somewhat different than the English grammar that is taught outside of America as the goal of the latter is likely focused on helping students map their native language onto English. I would agree that `ordinal` is uncommon word for many Americans, but it wouldn't at all surprise me if there were languages where the equivalent word was far more common, and therefore its use and translation was a part of standard English as a second language curriculum.
> The English grammar that Americans study in school is likely somewhat different than the English grammar that is taught outside of America as the goal of the latter is likely focused on helping students map their native language onto English
English is spoken as a native language in many countries outside of America.
I disagree. The CS definition of array, sure, but if you were to say “we have an array of options to eat”, most people with a high school education would know what you mean.
And then the counting niceties come along: those perennial +1 mistakes are caused by the fact that, e.g. 4 and 5 are two numbers but they are only one apart.
> when we count, we start at 1, we talk about the "1st"
Although often with an implicit zero. Under typical North American culture, your 1st birthday, for example, is more accurately the first anniversary of your birth. Your birth is zero indexed.
It's fundamentally a distinction between end-index and start-index.
Most human counting uses end-index. I.e. "1" is after 1-thing (has passed, is physically obtained, etc.).
Most computer counting uses start-index + length, for efficiency and to better generalize. I.e. "1" is at the memory address immediately before the "1"st item.
Which ultimately creates the "0 index is 1st thing" linguistic confusion.
PS: Also, language predates computers by a few years, and the concept of zero is hard.
i think a more accurate and complete idea is that humans refer to a thing in its entirety, with things being lined up and scanned in order as only a potential convenience
if you ask someone to identify an object, they'll point to the middle (or center of the most important component) of the object, not to the 'start' or 'end' of it in their field of vision..
that said, the human perspective is subtle and convenient in completely different ways to how a computer manages memory, and this 'end-index' idea seems like a useful way to map the human whole-object perspective to a linear memory-index perspective
They may point to the middle, but that's not a reference to half an object. ;)
The assumption is that the thing is its entirety, as you said.
Hypothetically, I imagine an array index reference, in human terms, would be communicated as "this thing starts here" or "this is the beginning of this thing."
Which isn't a concept or phrasing we have much occasion to use, other than for routes or long length-measured objects?
Birthdays are anniversaries. Anniversaries are annual activities when we celebrate/honor past events. They do not include the event itself. The first anniversary is 1 year after the initial event.
Yes, that’s the explanation for how we count them as we do, but it has no greater weight (and I’d argue less weight) as to why than saying whether a[0] or a[1] should be the first element in an array.
I say it has less weight as birthday is a compound word, the root words of which suggest that your first birthday could logically be the day of your birth rather than a year after it.
My first weddingday was not a year after I got married. My first graduationday was not a year after I graduated.
Indeed they are, which is why I literally said so in the previous comment. Did you forget to finish reading it?
> They do not include the event itself.
That is true because the event itself is implied information. There is no value in speaking of it. If you stand before us, we can be certain that you had a day of birth (your 0th anniversary). We don't necessarily know how many times you've gone around the sun following that, however, so that is where we find value in communicating additional information.
If you were counting apples, there is the state where you have no apples (index 0), the state where you have one apple (index 1), the state where you have two apples (index 2), etc. When counting you don't need to worry about index 0 because the no apple state is naturally implied. It only becomes interesting and worthy of communication when you have at least one apple to speak of, thus you start at 1. The state found at index 0 is still implicitly there, though.
If no apples is the unusual state, like you expected to find an apple in your lunchbox but someone ate it without your knowledge, then certainly there would be reason to communicate the no apples state. It is ignored in communication when it does not provide useful information, but it is not forgotten. The 0th index is implicitly there.
Although, there is a fifth state in your example: when you are not in class. Which is different from an empty set that implies nothingness. When it comes to age, 0 being birth works well because there truly is nothingness (from your perspective) before birth. When counting from 1 there is a suggestion that there are variables that aren't worth speaking of because they are obvious.
Assuming there is nothingness for the fetus the entire nine months in the womb. For that matter, I can't recall being younger than four, so it's all nothingness before then.
Not really. Perspective isn't scientific and is often cultural. Koreans, for example, famously have a different take and use a different counting system to accommodate it.
I am assuming that the reader is biased towards the average HN user. That won't always work for everyone who will come across my comment, but close enough for an unpaid contribution. I'm not about to write a novel to make sure I catch every edge case.
That's why I prefer the Superior(TM) Korean age counting. You are one year old when you're born (it's your first year!). You are two years old on the next New Year's day. (Congratulations, it's your second year now!)
So, if you're born on December 31st, you're two years old the next day. (I see no problem, but apparently some people are hung up on such minor details. I can't fathom why.)
> You are one year old when you're born (it's your first year!)
That makes no sense to me. It's also your first century. Does that make you one century old? Of course not!
The moment you're born, you're not even one hour old, let alone one day.
They must like rounding up. I guess it depends how they refer to it in their language. If it's "he's in his 1st year", that's a big difference from, "he's 1 year old". I'm wondering if the parent comment simply doesn't know Korean or only knows it somewhat to misconstrue the culture behind this.
It could be, though, that Korea simply never encountered the Mayan or Arabic civilizations, as most did encounter one of those two in history; both are famously known to have independently discovered the concept of zero.
Birth of an array can only logically be defined as index zero, the same as a child. The concept of zero was not universal globally and Korea is pretty isolated.
Except that we actually do use this system for years. Thus 1 BC (the first year BC) was followed by 1 AD (the first year AD). Also for centuries, as in 'the 20th century' being the years 1901 to 2000.
A more sensible English translation from Korean would be to use the phrase "in year X" rather than the phrase "X years old": a newborn is in year 1; after 12 months they are in year 2; etc.
In fact, this whole discussion is more about a choice of phrasing rather than the numbers. When indexing arrays, sometimes we're talking about an offset from the first element (starting with "0"), and sometimes we're talking about ordinal element numbers (starting with "first"). Some programming language designers found that offsets are more useful (because that choice tends to simplify the underlying arithmetic), while others found that ordinal numbers are more useful (because the word "first" should mean "1", to simplify communication between people).
There's nothing wrong with the definition per se, but there's the question of what purpose you're putting it to. We should expect more similarity from two "newly 2" children with the "western" system than the Korean one.
No. Your age is 1-indexed. It's a 'birthday' in English and German ('Geburtstag'); in French it's 'anniversaire', and so on. But pretty much everyone indexes age from 1. The fact of your birth is the transition from (legal) non-existence to existence, the equivalent of a dimensionless point.
Age is definitely 0-indexed - a newborn and exactly 1 year old differ by a year. People also count months during the first year, which is still 0 years old. As in 00:xx is the first hour, and 01:xx is the second. If you're 30 years old, it's your 31st year of life now.
But the Gregorian epoch itself is 1-based. 1 AD (0001-01-01) goes right after 1 BC (-0001-12-31). There was no 0000-mm-dd. That's why the 3rd millennium and the 21st century started at 2001-01-01 and not at 2000-01-01. YYYY means not how many whole years have already passed, but which incomplete year is going on right now. On the other hand, your age means "whole years passed since birth [plus maybe a few months]".
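The 1-based year numbering is exactly why century arithmetic needs an off-by-one adjustment; a quick sketch:

```python
def century(year):
    # Years 1..100 are the 1st century, 1901..2000 the 20th, etc.
    # The "- 1" compensates for there being no year 0.
    return (year - 1) // 100 + 1

assert century(1) == 1
assert century(100) == 1
assert century(1900) == 19
assert century(2000) == 20
assert century(2001) == 21
```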
And the birthdays are definitely 1 indexed: you denote your first birthday with 1, and not with 0, and so on. You don't denote any birthdays of yours with 0. You may denote something else with it, but birthdays[0] gives an out of range exception. (Especially true in French, where birthdays are called anniversaire, but in English too.)
Oddly(?) though, most parents of young children don't refer to a baby as being "zero years old." Rather, we break them down into smaller units: two days old, six weeks old, four months old.
It helps that change happens particularly fast at that age, so the difference between newborn and 6 months is worth mentioning. Later on the units go up again and we just say "I'm in my 30s" :P
On the other hand a computer/car/house can also be 10 years old but not 0.
True, a few weeks can mean a few milestones at that age. It's still wild to me how quickly babies/kids develop. Every week brings new abilities, experiences, and emotions that weren't there before.
Exactly. What we call "new" is the 0th year. It's the same when you start counting seconds. You say "ok counting starts now", then wait a second and say "one", then wait and say "two". Before you say "one", you don't say zero, but it's implied by you waiting a second after you or someone says "go". That's still 0-indexed.
The moment you are born you are 0 years old (or perhaps 0.75 years old, but we don't usually recognize that). We count from zero in this case, at least implicitly.
In some cultures you are considered 1 the moment you are born, so the zero indexing isn't universal here, but typical in North America as noted earlier.
Typical counting of things starts from 0. If you count apples you implicitly start at zero and add 1 for each apple. If you count age, you start at birth (0) and count years; one for each birthday. That isn't zero based. The difference is the index of the item between the starting point and the next item. In zero based this item is number 0, in one based, this item is number 1. The first year of life is generally considered the year following birth.
This is a philosophically deep point which took me a long time to grasp.
There is a narrowing from "nothing at all" to "zero apples" which doesn't happen 'in the world' but is a necessary precondition to counting apples. The existence of any apple is a requirement for there to be zero of them before you put anything in the basket.
I just dealt with this problem (how to store periodic events, including birthdays). It is a surprisingly difficult problem. After reading up on Postgres interval types, my latest attempt uses them to store events, where a per-year event (like a birthday) is stored as "2 months 15 days". It turns out the Postgres project has put quite a bit of thought into making interval types work as expected with regard to months, not an easy task when you consider how difficult it is to treat dates mathematically.
The nice part is that now February 29 "just works"; the downside is the impedance between how months and days are numbered and how an offset from the epoch (the beginning of the year) is defined. January 3rd (1-3) is stored as "0 months 2 days": when it hits, 0 months have passed and 2 days have passed.
So the specific case of February 29 hits when 1 month and 28 days have passed. 3 years out of 4 this will be the same as "2 months 0 days" (3-1), but every fourth year this will be (2-29). As an aside, and the specific reason I went with interval types: every event past February 29 works just fine with or without a leap year; that is, the extra day in the middle does not mess up the offset to days after it.
Honestly I curse a little, as I wish months and days were 0-based. At least clocks get this right, almost; 12-hour clocks are a special breed of stupid: they start at 12, then go to 1 and proceed up to 11. 24-hour clocks properly start at 0. The worst part about 12-hour clocks is that they're almost correct: replace the 12 with a 0 and everything would be the same, but now it makes sense from a modular-math point of view.
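The offset scheme described above can be sketched with Python's stdlib datetime instead of Postgres intervals, assuming an event is stored as a `(months_passed, days_passed)` offset from January 1:

```python
from datetime import date, timedelta

def from_offset(year, months_passed, days_passed):
    # Advance whole months from January 1, then add the remaining days.
    anchor = date(year, 1 + months_passed, 1)
    return anchor + timedelta(days=days_passed)

# February 29, stored as "1 month and 28 days have passed":
assert from_offset(2024, 1, 28) == date(2024, 2, 29)  # leap year
assert from_offset(2023, 1, 28) == date(2023, 3, 1)   # non-leap: spills into March 1

# Events after February stay put whether or not it's a leap year:
assert from_offset(2024, 2, 14) == date(2024, 3, 15)
assert from_offset(2023, 2, 14) == date(2023, 3, 15)
```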
You're just 1/4 of the age of everyone else born the same year. You have the same number of laps around the sun as those people, but you've definitely had fewer birthdays. It's a common joke for leap year babies.
The counting activity doesn't begin when the first item is registered; it begins when the counter is initialized to zero. A decision is made to begin counting, along with the realization that nothing has been counting yet. That's when counting has started. When the first item is seen, the counting is then continuing.
Suppose your job is to count some events. You check in for work at 8:00 a.m., but the first event has not registered until noon. By your logic you should not be paid for four hours, because you're paid to count, and counting started at 1.
I don't think your definition is complete. We can count a set by mapping its elements to the natural numbers, and then identifying that number which is highest. However, we must have a provision for identifying zero as the highest when the set and mapping are empty.
Do you know the definition of countable? A set S is countable if there is a one-to-one mapping from S to N where N is the natural numbers. Do you know that 0 is not a member of the natural numbers? We literally start counting at 1 by definition of countable.
> Equivalently, a set S is countable if there exists an injective function f : S → N from S to N; it simply means that every element in S corresponds to a different element in N.
Defining N is usually done via a successor set, in which case 0 makes no sense to include.
Standard construction of ordinals is that each ordinal is the set of all its predecessors. (0 has no predecessors, hence 0 is the empty set.) (And so finite ordinals have the same ordinality as cardinality.)
Birthday[0] gives you an out of range exception, since there is no birthday called 0th. Birthdays are [first, second, ...], indexed from 1: birthday[1] = first, birthday[2] = second, and so on. That's what indexing a series from 1 means.
Ok, use the baker and cup as an example. If you have an empty cup and put half of a cup of flour in it, you now have 0.5 cups of flour. Notice the zero before the ".5". That is us, normal humans, realizing that until you add enough to have 1 of something, you have between 0 and 0.999 repeating.
No, it just means we have a different notion of what a year is in regards to age. Similar to how different cultures can use different units of measurement for length, mass, etc.
Yes, exactly. One of which carries an implicit zero indexing. The date of birth doesn't disappear just because you decided to use a different measuring device.
No, it doesn’t carry an implicit 0 index. Measuring age (in the West) is like measuring distance. You start at 0, but that doesn’t mean the first item is at index 0.
If you took a standard ruler, a measurer of distance and which has literal index marks painted on it, and placed items along it, the item found at the head of the ruler would be found at the 0th index. You're quite right that we think of birth and its anniversaries in the same way.
No, old calendar systems are one indexed. In those system, there is literally no year zero; the first year is year one. This leads to crazy things like year 100 being part of the "first century" and year 101 being part of the "second century".
That is not the case with age or birthdays which are, thankfully, zero indexed. The first year of human life is age=0, birthdays=0.
If you don't care then sure, it's not crazy, but if you did care then believe me, it is crazy. Crazy enough that astronomers[0] and software engineers[1] rebelled against the historians' practice and renamed the years preceding 1 AD in the proleptic Gregorian calendar year 0, year -1, year -2, et cetera. A major benefit of this is it allows the leap year pattern to stay consistent and the rule to remain legible for all years, going back before 1 AD. It's also nice because it lets us say "the 90's were the last ten years of the 20th century" and be correct.
Haha, this thing is messed up. Your "first birthday" is technically second, because the first was at your, well, day of birth. But we people love to complicate things and count 1st birthday as a number of annual celebration events after the first mm-dd of birth. Off by one as it is.
Elapsed time timers start at 0. Time is continuous. The elapsed time being “out of the womb”, that we call “age”, starts near 0. Five minutes after birth, the baby has been in the world for, or it’s age is, five minutes. If someone asked how old a new baby is you could hear “3 weeks”.
Another definition might be from conception. But birth being “one year old” is illogical. The sperm didn’t even exist yet, one year before birth.
Your mistake is using years - use milliseconds (or nanoseconds) when you wish to express a duration. Years don't even have the same duration/length (leap years, and leap seconds)
And before an hour has even passed, we'd probably say "minutes old" or perhaps "not even an hour/day old", all to avoid saying "zero days/months/years old". But some might still say zero days old, and they'd be both understood and correct (at least logically if not stylistically). That's the "implicit zero" everybody is talking about. We avoid saying it, but that's just a convention of communication. Logically it's there. You're zero days old before you're one day old.
TBH humans would have been better off if we were 0-based; it's just a convention. And we have the confusing language where "20th century" means the 1900's. If we wanted to bring the 1-indexing we use in language to its fullest extent here to fix that particular issue, time counting would have to start at 1111. Except that won't work once we reach 5-digit years.
If we would start with "zeroeth" instead of "1st", then this would have solved itself and 20th century would mean 20xx's.
I especially don't understand why mathematicians use 1-based indexing (for matrix rows/columns etc...). Like programmers, they should see the advantages of starting at 0 (e.g. the coordinates of subdividing into block matrices are simpler if starting at 0). Mathematicians do start at 0 for the origin of plots, after all.
> If we wanted to bring the 1-indexing we use in language to the fullest extent here to fix that particular issue, time counting would have to start at 1111.
> The 1 makes a lot of sense in a human world, when we count, we start at 1, we talk about the "1st", counting on finger starts with 1, etc.
It is more like counting from 1 is just a leftover from times when zero was not commonly considered a number. Once one has zero, it makes sense to use it as an initial ordinal (see e.g. set theory, where zero is both the initial ordinal and the initial cardinal number, way before computers).
Another example is time and date, we start counting of days from 1, but counting of hours (at least in 24-hour notation) and minutes from 0.
> Luis started explaining why Lua was 1-index with this sentence: "The 1st argument ...."
Note that for spoken language, it is "The first argument ..." and 'first' is etymologically unrelated to 'one', but related to 'foremost', 'front', so it makes sense to use 'first' for the initial item in the sequence even when counting from 0.
When talking about discrete things, 0 has a specific meaning: it is the absence of things. It does not make sense to count the 'first' element as the 0th. When you encounter the 'first' element, how many elements do you have? 1.
This is of course different for continuous quantities. When counting seconds, for example, we should absolutely start from 0.
In Western music theory, intervals are one based. No pitch change is "unison"; one diatonic step is a "major second" and so on. As a result of this silly state of affairs, an octave occurs every 7 notes, even though the root "oct" means eight. Furthermore, a "rule of nines" is needed to invert an interval: e.g. inversion of minor 3rd is a major 6th (exchange major/minor, subtract from 9).
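The "rule of nines" is a direct consequence of the 1-based naming; a small sketch showing how a 0-based "steps" scheme would simplify it:

```python
def invert(interval):
    # Western-theory interval inversion in 1-based naming: subtract from 9.
    return 9 - interval

assert invert(3) == 6  # a third inverts to a sixth
assert invert(5) == 4  # a fifth inverts to a fourth

# With 0-based diatonic steps (a "second" = 1 step, an octave = 7 steps),
# inversion would simply be 7 - steps:
def invert_steps(steps):
    return 7 - steps

assert invert_steps(2) == 5  # 2 steps ("third") inverts to 5 steps ("sixth")
```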
True, and it's a great example of how this whole drama is about a practical trade-off, not about a unique Right Answer. If you play piano, the second is the second finger; the fifth is the fifth finger, and it all makes sense. No problem. On the other hand trying to actually count that way (two thirds make a fifth and so forth) is maddening.
I don't think 1-based arrays are better notation, even completely ignoring that code has to run on a machine.
99% of the times, the correct approach is to use iterators. When you really need indices (and you almost never do), 0 is more practical, because it matches the "including start, not including end" convention.
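Python's slicing shows why the half-open convention pairs naturally with 0-based indices:

```python
xs = list(range(10))

# Half-open [start, stop): length is stop - start,
# and adjacent slices concatenate cleanly.
assert len(xs[2:7]) == 7 - 2
assert xs[0:4] + xs[4:10] == xs

# Splitting at any pivot never duplicates or drops an element:
for pivot in range(len(xs) + 1):
    assert xs[:pivot] + xs[pivot:] == xs
```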
For instance, someone who makes a fist with zero extended fingers before counting.
Or someone who simply becomes motivated to count something, without making any utterances or gestures to that effect. The motivation is followed by the persistent awareness that nothing has been counted yet, which then changes to 1.
Not saying "zero" doesn't mean we don't mean it. All counting starts with 0. We just say "one" out loud as the first number. We probably should say "zero" too, but why say a word when you can say no words?
Or we just start counting with 1, since 1 is the first number we start with. Do small kids start with a concept of zero and then adding one to it, or do they just start at one?
If you start at 1, how do you answer the question "how many apples are in the basket", when the basket is empty?
Or do you believe that the answer "none" or "zero" is then given without the activity of counting having taken place?
What do we call the meta-activity then: the procedure that results either in the "empty" answer or "one", "two"? Whatever that activity is called, it starts with a concept of zero. Let's call that activity "quanting". Quanting starts with a motivation to enumerate items, and an initially empty result. When no items are present, quanting terminates, reporting that zero/nothing/none result. Otherwise quanting branches into a subprocedure called counting, and that begins at 1.
Humans could do basic counting prior to the concept of zero. Obviously kids or anyone prior to zero would say there are no apples in the basket, but if you were to ask an ancient Greek philosopher if that meant "no apples" is something worthy of being denoted, they might think you're doing sophistry and trying to elevate nothing to something.
A smart ass kid might reply there are zero oranges in the basket, or zero miniature unicorns. Since the basket is empty, it could have potentially had anything if we're just going to imagine things in baskets. But we don't enumerate over all possible zero items in the basket. And anyway, the basket isn't really empty since it has N air molecules, N fibers or whatever.
The pedantic point I'm making is that counting at zero is a convention we developed for mathematical reasoning when appropriate, but not a starting place for counting things in everyday language.
If an ancient Greek philosopher had three apples in his basket, and I took them away while he wasn't looking, oh, he would definitely find "no apples" something worthy of being denoted.
If I ask you to count the number of red balls in a bag with only 3 yellow balls, then the initial count in your head is 0, you inspect the balls one by one, never encountering a red ball, and thus never incrementing the count.
And then you pronounce your final count of 0.
So that's counting starting from 0.
What you call "starting at 1" is not so much the start as it is the first increment, which need not arise.
I have used 1-based in BASIC/Pascal and 0-based in pretty much everything else. With 0-based I make effectively no off-by-one mistakes; with 1-based they were common. Most idioms, including forward and reverse iteration, work better and are easier to remember with the inclusive/exclusive pattern.
Other than that, binary AND and power-of-2 sized arrays are the backbone of any hashtable. Overall, modulus (via binary AND) is actually useful.
It has nothing to do with pointers and everything to do with basic computer arithmetic. You can constrain the range of an integer with a bitwise AND operation providing a cheap modulus by power of two. In this regime, zero-based indexing is the natural result. You have to make an adjustment for 1-based. There are whole host of other operations that are simpler with 0-based indexing.
The problem is that most people, even programmers, don't understand how computer arithmetic works and have fantasies of mathematical number lines that the hardware only partially simulates. You see this consistently in the post-Java crowd who think that unsigned integers are some sort of unholy aberration, because the languages they've grown up with went further to maintain the fictional number line semantics.
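The masking trick can be sketched in a few lines (the table capacity and hash values are arbitrary examples):

```python
CAP = 8  # table capacity, a power of two

def slot(h):
    # Bitwise AND with (CAP - 1) is a cheap modulus for powers of two,
    # and it yields 0-based slots 0..CAP-1 directly.
    return h & (CAP - 1)

assert slot(13) == 13 % CAP
assert all(0 <= slot(h) < CAP for h in range(1000))
# A 1-based table would need an extra "+ 1" after every mask.
```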
Humans make a lot of inconsistent things: we usually speak of the 1st floor, and of the basement as the 1st underground floor (-1), but then the floor numbering jumps from 1 to -1!
Also, the time jumps from 11 AM to 12 PM to 1 PM!
So I think "more human friendly" sometimes means "more confusing".
That's true. But as for floor numbering, specifically, it depends on the country: some places have a "ground floor" between the 1st underground floor and the 1st "above ground" floor.
It's like English pronunciation, there's little logic. I gave up finding logic in these things at an early age, and resorted to memorizing everything instead.
Doing the reversal test, if programming languages had all been 1-indexed, then I doubt we'd hear much from people, in 2022, saying "I think the first element should be 0, and the second 1".
> The 1 makes a lot of sense in a human world, when we count, we start at 1
As a kid I learned "one one-thousand, two one-thousand, three one-thousand" when counting time out loud, but at some point I realized this was incorrect. The prefix is the start of the nth second but it isn't complete yet, so for example stopping in the middle of saying "two one-thousand" you actually haven't reached two seconds yet.
My fix was to move the prefix to the end, so I say "one-thousand one, one-thousand two, one-thousand three".
In retrospect maybe I should have used "zero one-thousand, one one-thousand, two one-thousand".
I don't think it's necessarily incorrect to start at one there. If someone asks you to count out 3 seconds, you're going to say "one one-thousand, two one-thousand, three one-thousand", and only at the end of "three one-thousand" will you consider the 3 seconds to have actually elapsed. Basically you're already accounting for the time it takes you to say it. Which to me seems better than doing it the other way, because it gives a better heads-up as to when that second has been reached.
Ah, yes. Cardinal numbers and ordinal numbers both end in the word "numbers" so they're the same thing. No need to distinguish between having three apples and having the third apple.
And if someone did say that to me, I would pass them no apple, since that's how we use everyday language. And if I knew they were a programmer, I'd first ask them in which programming language they'd wish me to pass the apple.
Yes, but "human-focused" 1-based indexing comes at a cost since at the end of the day the CPU has to add (index * element-size) to the array base address to get the address of the indexed element. With a 1-based index there's additional overhead in either having to subtract 1 from the index or to have a wasted "element 0" to avoid the need to do that.
That's not possible - the subtraction and the multiply are dependent operations (the index must be adjusted before it's multiplied by the element size), so even if it were a single instruction it would still take longer than a multiply that didn't have to wait for a preceding subtraction.
The only way to avoid the speed penalty would be either to have a wasted element at offset 0, or to maintain the array base address as (address - (1 * element-size)) to avoid having to subtract 1 from the index when accessing. In the latter case for dynamically allocated arrays the code would still have to do a subtraction to adjust the pointer returned by the memory allocator, but at least that would be a 1-time penalty rather than per-element-access.
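A minimal C sketch of the wasted-element approach (the helper names are mine, not from any real codebase): allocate one extra slot so that 1-based indices land directly in the buffer with no per-access subtraction.

```c
#include <stdlib.h>

/* Allocate an int array meant for 1-based indexing: one extra element
 * is reserved so that indices 1..n map straight onto the buffer. */
int *alloc_one_based(size_t n) {
    return malloc((n + 1) * sizeof(int)); /* slot 0 is simply never used */
}

/* Sum a[1..n]: the loop body contains no "- 1" adjustment at all. */
int sum_one_based(const int *a, size_t n) {
    int sum = 0;
    for (size_t i = 1; i <= n; i++)
        sum += a[i];
    return sum;
}
```

The cost is one unused element per array; whether that trade is worth it depends on the element size, as noted elsewhere in the thread.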
Of course this is supposing a high-level language where an array is an abstraction, not one explicitly aliased to a chunk of memory such as C, where an array and a pointer to its first element are interchangeable.
That goes against my intuition. Multiplication in hardware to this day relies on addition. Is one adder going to add an extra cycle? Or would that time be amortized? Take a look at slides 45-46 here. https://acg.cis.upenn.edu/milom/cis371-Spring08/lectures/04_...
Do you know the answer to that question? (I don't, but if someone does, it will settle this issue).
I don't know, but looking at that 3-input add makes me think you may be right and the extra addition/subtraction could perhaps be combined into the multiplication.
OTOH, for arrays whose elements have size 2^n (char, short, int, long) I'm sure the generated code isn't using a multiply in the first place.
Anyways, an optimizing compiler could certainly remove much of any overhead added by 1-based indexing: for an array access in a for loop it could, if necessary, calculate the "base - 1" address once at the start of the loop.
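As a sketch of that loop optimization (both functions are hypothetical), here is a naive 1-based loop next to the pointer-walking form a compiler can strength-reduce it to:

```c
/* Naive 1-based summation: pays a "- 1" on every access. */
int sum_naive(const int *a, int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += a[i - 1];
    return sum;
}

/* What an optimizer effectively produces: the indexed accesses become a
 * walking pointer, and the per-iteration subtraction disappears. */
int sum_hoisted(const int *a, int n) {
    int sum = 0;
    for (const int *p = a; p != a + n; p++)
        sum += *p;
    return sum;
}
```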
Personally, having grown up with assembler and C, and still using C++ today, I'm quite happy with 0-based.
There are a lot of cases where a zero index makes sense. In physics and math there are cases where you start the index at 0 and others where you start at 1. And even some where you start at -1. Which you use depends on what is most convenient for the problem at hand.
A particularly topographically challenged building in Paris has exits to two enclosing streets end up on different floors. The elevator is numbered -2, -1, rez-de-rue [exit north], rez-de-chaussée [exit south], 1, 2, ... .
Don't know French, but I wonder if the meaning of "étage" is similar to Polish "piętro", which literally means something like "elevation". So, basically, in Polish we have a specific word for the ground level, and then we count how far above the ground the current level is elevated. That's why the "1st floor" is the one elevated one level above the ground.
In English you count "floors" and floor is a usable, hard surface on which you can put something, like a chair. That's why a floor on the ground level is treated the same as the floor above it - it is equally good on accommodating chairs, beds, and other stuff.
Etymologically, étage comes from the Greek στέγω (and gave the English word “stage”); it is a typically wooden cover. Since the first floor was often instead a continuation of the outside road (way back!), it was not considered a “stage”.
Yeah, most or all of Europe does that. In English it is called ground floor, first floor, etc. In Swedish the zero numbering makes total sense because instead of numbering the floors we say ground floor, "1 stair", "2 stairs", etc.
That depends on your language and culture: "floor" is not easily translated. Some languages have a word describing the layers added to the base layer, so "1. sal" (Danish, as an example) actually means "the first layer added on top of the base house".
Again, this is more a spoken language/culture thing, and this goes back to what premises we use to communicate with the machines, ours or the machines...
No elevator I've seen has yet taken a cue from UI design: simply put the buttons within an outline of the building, along with the local numbering scheme.
No more visits to the serial killer lurking in the basement.
The players in this market evidently have been operating at T'ump levels of intelligence. /s
There were a number of systems-oriented languages that, while Algol-like in syntax, used zero-based array indexing, predating C. BCPL was intended to be a generic systems-oriented language, but Burroughs had ESPOL for their mainframe architecture, and HP wrote their operating systems and compilers in SPL, which was specific to their stack-oriented HP3000 architecture. All of these used zero-based indexing, because for low-level, memory-conscious manipulation it's much more natural than one-based. But nothing is universal: IBM's systems-oriented languages (BPL, PL/S, and PLS/II) were more an evolution of IBM assembler (and later of Wirth's Pascal), and used one-based array indexing.
Personally, I've always found zero-based more intuitive and satisfying, absent arbitrary indexing (in which case I'd choose zero-based for most purposes where there wasn't a compelling solution-based reason to use a different index base). But then I spent years writing SPL ...
On a cursory look, I'd say one-based is more intuitive, for the sole fact that a[len(a)] is the last element. Actually, I remember that when I started programming in assembly it took me a while to get used to reasoning about the end of the array.
Much of intuition is made, not innate, at least for things as abstract and arbitrary as languages and notation. If you come at this question from a point of view where an array is a block of memory words indexed by a pointer - which is pretty much how system programmers thought back in the day when I did such things, 50 years ago - then zero-based seems natural and obvious. The pointer to the array is the array; the first element is the pointer plus zero, the second the pointer plus one, and so on. Once you start thinking that way, the arithmetic for multiple dimensions just falls out as well.
But I don't think the fact that many of us developed that intuition makes it inherently better. It just makes it more intuitive for those of us who developed that intuition. I've written oodles of code with both paradigms. Either works. Zero is just more natural, to me.
Unusually for me, I read TFA first. Immediately thought "this is all wrong", then realized 400 commenters had posted the same thing already.
I have Martin Richards book on my shelf, and for a while shared an office with his coworker. I'm also one of the few folks here who has coded in BCPL.
Ok, so that said, this article is obvious nonsense. Zero indexing comes from the CPU itself. It's how you generate an indirect address by adding the offset. Many CPUs have this in hardware. They don't support non-zero offsets. The reason some languages have zero-based indexes is that their developers had the mind set of doing the same thing you do when writing machine code. They probably also wanted to directly interface between their language and libraries written in assembler (and the kernel was typically written in assembler too). Also they wanted to be efficient. Any offset other than zero is not efficient because you can't use the hardware addressing modes.
So then why do some languages _not_ have zero-based indices? The obvious answer is that mathematics by convention always used 1 as the base (or some arbitrary value). Many languages were conceived by mathematicians or people with a strong mathematical background.
I've discussed this a lot in real world in a different field: apartment floors. In Japan (where I live) they are 1-indexed, where the floor on the ground is number 1, while in Spain (where I am from) they are 0-indexed, where the floor on the ground is number 0.
Both have inconsistencies: in Spain you might have a "middle ground" (entresuelo) which is neither 0 nor 1 but sits in between, and is normally commercial or non-livable, reserving 1 for the first floor where people live. I like that system better, though, because it usually goes 1 => 0 => -1 (underground), while in Japan you go 2 => 1 => -1, so I feel like it's missing a floor. You could justify it as 0 being the ground line, so +1 is "the first above the ground" and -1 is "the first below the ground", but I still prefer each floor to be a natural consecutive number.
Arrays in Ada start at the index based on the index type of the array. You can even use an enumeration type as the index:
type Day is (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday);
type Hours is array (Day range <>) of Natural;
V : Hours (Monday .. Friday);
Which index type you should use depends on the problem that you're trying to model. You can find out the lower/higher end of a type/variable with the 'First and 'Last attributes.
IIRC there's an RFC though to force the lower end of the index to a certain value like 1 or any other number/enum.
Wow, I never really looked at Ada code. Been using VHDL for the past 2 years a lot and when I looked at your code I was like: 'huh strange, this looks a lot like VHDL'.
Turns out both were invented by the DoD.
>Due to the Department of Defense requiring as much of the syntax as possible to be based on Ada, in order to avoid re-inventing concepts that had already been thoroughly tested in the development of Ada,[citation needed] VHDL borrows heavily from the Ada programming language in both concept and syntax. - https://en.wikipedia.org/wiki/VHDL
Maybe I should pick up Ada soon. That could be a fun journey! (I really love writing VHDL)
This comes up when giving fractional coordinates into a grid. Is (1,1) the upper-left corner of the upper-left grid box, the center, or the lower-right (or the center of the next grid box)? Alternately, is the upper-left corner of the upper-left grid box (-0.5, -0.5), (0.0, 0.0), (0.5, 0.5), or (1.0, 1.0). I've seen all four conventions used in different places, depending on whether the grid boxes themselves are numbered starting at 0 or 1, and when extended to fractional if it makes more sense to have the corner or the center of the grid box take the value of the box itself.
Worse than 0 or 1 is having the choice. In Visual Basic 6 you had the option of base 0 or 1 for each module. It makes debugging a nightmare, and you can create bugs just by copy/pasting code from one program to another with a different index base.
You also have the choice in Perl, because of course, it is Perl.
The old way of doing it was setting the $[ variable to 1. You can actually set it to any value, in case you prefer to start your arrays at 42. It is now deprecated but you now have Array::Base that offers similar functionality.
Use it if you absolutely hate the person who will read your code, as if having someone read Perl code wasn't hateful enough.
That comes from C’s standard library (and presumably from somewhere else before that). Classic bad design, took a very long time for people to figure out it was unhelpful and dangerous.
Convenience. Look at how the days in the months are stored and accessed. Using 1-based months would introduce an extra calculation (-1) on all searches or an unused value in the 0-index. Also look at how printing is handled for weekday and month names. They, again, take advantage of 0-based indexing.
Day and year are already represented as numbers, so it's natural to keep them as the "correct" (conventional) number as used by most people. Since the months aren't being stored as strings but as an index, this saves them from having useless data (entries in 0) or doing an extra calculation.
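For a concrete illustration, C's struct tm (from <time.h>) mixes exactly these conventions: tm_mon is 0-based so it can index a name table directly, while tm_mday stays 1-based and tm_year is an offset from 1900. A small sketch (format_date and month_name are my own names, not library API):

```c
#include <stdio.h>
#include <time.h>

static const char *month_name[12] = {
    "January", "February", "March",     "April",   "May",      "June",
    "July",    "August",   "September", "October", "November", "December"
};

/* Format a struct tm as e.g. "August 25, 2022". */
void format_date(const struct tm *t, char *out, size_t n) {
    snprintf(out, n, "%s %d, %d",
             month_name[t->tm_mon], /* 0-based month: direct table index */
             t->tm_mday,            /* 1-based day: printed as-is */
             t->tm_year + 1900);    /* year: stored as offset from 1900 */
}
```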
Subtracting 1 from the user provided index isn't that expensive.
Or, if you're willing to give up three bytes, index into "ErrSunMonTueWedThuFriSat".
Or, and this is mildly insane but perhaps in keeping with early C, have your pointer be three bytes before the beginning of "SunMonTueWedThuFriSat" and use one-based indexing.
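The packed-string trick can be sketched like this (day_abbrev is a hypothetical helper; the layout follows the comment above, spending three bytes on an "Err" slot so that days index 1..7 directly):

```c
#include <string.h>

/* Day abbreviations packed into one string; slot 0 is the sentinel. */
static const char daytab[] = "ErrSunMonTueWedThuFriSat";

/* Copy the 3-letter name for day 1..7 into out; 0 (or anything out of
 * range) yields "Err" instead of garbage. */
void day_abbrev(int day, char out[4]) {
    if (day < 0 || day > 7)
        day = 0;
    memcpy(out, daytab + day * 3, 3); /* each abbreviation is 3 bytes */
    out[3] = '\0';
}
```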
It all boils down to the fact that C's memory model is meant to map closely to the physical memory model, and the C and Unix developers preferred simplicity of implementation over simplicity of interface. Leaky abstractions (like that the weekday or months start at 0 so that they can reference into arrays of names) are fine given that prioritization. Not everyone shares that preference.
Dijkstra's answer (linked in the article) is best:
"When dealing with a sequence of length N, the elements of which we wish to distinguish by subscript, the next vexing question is what subscript value to assign to its starting element. Adhering to convention a) yields, when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N. So let us let our ordinals start at zero: an element's ordinal (subscript) equals the number of elements preceding it in the sequence. And the moral of the story is that we had better regard —after all those centuries!— zero as a most natural number."
Closed ranges (with both ends inclusive) are super annoying to work with. You can't represent the empty range (unless you are willing to write [i : i-1]), and they don't compose like half-open ones do: [a : b) + [b : c) = [a : c).
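The composition property is easy to check in code; a tiny sketch with hypothetical helpers:

```c
/* Length of the half-open range [lo, hi). */
int range_len(int lo, int hi) { return hi - lo; }

/* Membership test for [lo, hi). */
int range_contains(int lo, int hi, int x) { return lo <= x && x < hi; }
```

Every point of [a, c) falls in exactly one of [a, b) and [b, c), and the empty range [i, i) exists without any i-1 trickery.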
For some reason I think in terms of closed ranges if not specified otherwise/doing it for myself. [i : i-1] is how I think of empty ranges, and [i : i] if I want to capture element i with a range. Also, [a : b] + [b+1 : c] corresponds more to what I want than [a : b) + [b : c).
I guess the majority of people are not like this, but arguments like "just look at how strange it looks" don't do it for me, because it looks natural to me, and the other way looks complicated.
I'm curious if you would also accept [i : i-2] as an empty list, or in general anything where the right side is smaller than the left?
If I am working with closed ranges, [i:i-1] looks like the list [i, i-1]. Like [5:2] would be [5, 4, 3, 2].
With [b+1:c] I would feel like I needed to insert a check to ensure b+1 <= c. With the closed ranges the invariant "left is <= right" is maintained automatically. Though I guess it doesn't matter so much if you accept any list with left > right as the empty list.
The issues with compositionality become even more noticeable with floats. Then you would need [a:b] + [b + minimum_float : c], or something like that.
Then the delta of both bounds (N – 1) wouldn’t equal the length of the range (N). The inclusive-exclusive convention is used in order for `end = start + length` to hold.
Why does that matter when the starting point is 1? In that case, you don't need the delta because you have the length already. In the case of a 0-based range you also don't need the delta, though you will want to use the inclusive-exclusive convention so that you get the length "for free".
1 <= a <= N -- N = length, no delta
0 <= a < N -- N = length, no delta
You only calculate the length when dealing with other than 0- or 1-based ranges. There, the inclusive-exclusive convention is very handy as you point out. So if we're fixing the initial offset at either 0 or 1, then use the appropriate convention for that offset. If we let the initial offset float, then the inclusive-exclusive makes sense.
It matters whenever you need to process a proper subrange of an array, and that shouldn’t be different from when the subrange happens to be the whole range. It’s simpler if the same convention is used in all cases.
E.g. in Java, a typical example is that OutputStream has the following two methods, where the first can delegate to the second, and the second (which writes the specified subrange of the array) can easily calculate the number of bytes to write:
void write(byte[] bytes)
{
    write(bytes, 0, bytes.length);
}

void write(byte[] bytes, int start, int end)
{
    int length = end - start;
    ...
}
> If we let the initial offset float, then the inclusive-exclusive makes sense.
But you replied to someone using inclusive-inclusive for 1-based arrays and complained about the delta not matching the length, which is a nonsensical complaint: you already have the length, so why would you need to calculate anything?
In case of the subrange you don’t have the length (or you conversely have the length but not the upper bound), and you don’t want to use different conventions based on whether you process the subrange or the whole range. I’m not sure how I can make it clearer.
The mental model of being an offset to a memory address is, I think, part of it.
I'd be surprised if the use of zero vs one actually made any difference, from a number of operations point of view -- from the compiler's point of view, the first element in the array is the first element in the array, no matter what we call it.
I mean, in an extreme edge case, if you are computing an index and it happens to be zero, then maybe some identity related to zero could be exploited in your math (go to element simple_integer * complicated_function(), where simple_integer might be zero), but that seems a bit silly.
How would it not make a difference? If you calculate an index at runtime, then to access the element, 0-based is pointer + index, while 1-based is pointer + index - 1. Clearly there is an extra subtraction there?
In x86 you could probably hide it in the addressing mode, but that does not mean it does not need to be computed.
One option, at least, would be for malloc and friends to return pointer-1, rather than pointer. That said, I'm sure a compilers expert could come up with a much better option.
This is primarily the actual answer. While most languages force you to see the array as an array, in C/C++ (in C++ I'm only referring to the memory allocated variable type, not an "array class") you can see it as a pointer and do your own math to address any part of it that you want. And the calculation for that memory address is idx * sizeof(the thing in your array). So the real answer here is that array semantics needed to match memory address semantics in the languages that provide that.
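A small sketch of that identity (byte_offset is a hypothetical helper): in C, a[i] is defined as *(a + i), and the element at index i sits exactly i * sizeof(element) bytes past the base.

```c
#include <stddef.h>

/* Byte distance between an element and the array base. */
ptrdiff_t byte_offset(const int *base, const int *elem) {
    return (const char *)elem - (const char *)base;
}
```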
3. Less buggy for small systems since the Base Pointer address points directly to a record.
4. Macro expansion is often used to store indexes and other 'magic numbers' in Assembler, and this pattern originated either when Assembly was the primary programming language, or even earlier when humans directly punched out cards with machine instructions. Compilers in any remote sense of the luxury we have today did not exist or were not common.
I'm fine with 0-based, 1-based or anything-based arrays (I recall it being convenient solving 8queen with pascal), but for Rage-Over-A-Lost-Penny sake, music note intervals are always beyond my understanding.
Same-pitched notes are called "interval 1", and then come thirds, fifths, sevenths... All off-by-one in my base-offset-addressing mind... And then major vs. minor, which creates all kinds of "aliased addresses"...
I'd really love a BASE-12 floating number representation. Like 4.00 for the middle C; Chords can be then represented by a tuple of such numbers -- major = [+0.04, +0.07] (some sequencers already do something like that and I'm far better at reading that kind of sequencer data than a sheet)
You do understand that besides thirds, fifths and sevenths, there really are seconds, fourths, sixths, ninths (same as second), elevenths (same as fourth) etc... as well, right? There even are intervals that are not named after a number e.g. the "tritone". The reason the 2nd and the 3rd note in a chord are called third and fifth is because usually chords are made with these intervals instead of dissonant intervals like seconds or fourths. It seems pretty clear you'd already know these things, so can you explain what's your issue with music note intervals?
It's that off-by-one nature of intervals that always bumps me.
The difference between note a and b is (a-b+1).
Calling an octave "an octave" feels to me like calling a numeric system with digits 0x0-0xf as "base 17"
By your logic, an octave/unison would be "seventh"/"zeroth"? (Note that "octave" literally means "eighth" in Latin.) I would think that could work too but the established terminology is not inconsistent. It's just 1-indexed. A lot of things are 1-indexed, in fact I think almost everything in spoken English is 1-indexed. A lot of mathematics, like number theory, is also 1-indexed. I think software engineers are a little too obsessed with 0-indexed things. I understand that things not being standard is annoying to us but pretending like this somehow makes music terminology broken is going too far.
I don't see how 0x0-0xf could be called base 17. 0xf is 15. Did you mean base 15? I think if mathematics terminology developed differently it could be called base 15. The same way binary is 0 and 1 but we call it base 2 because there are 2 digits, but we could totally call it base 1 too, who cares, it's all convention.
I found it interesting that the author phrased their critique so definitively, i.e. "it's NOT this and NOT that", yet concluded the article by saying their whole argument was speculative. That said, I do agree with their premise that it's going to be pretty tough to get a definitive answer as to why 0-indexing won out. And their criticism of others being so ready to declare results is apt, if a little ironic in this case :)
> yet concluded the article by saying their whole argument was speculative
It's possible to have a speculative argument and also refute other arguments that claim certainty. Pointing out another answer as being wrong does not mean one needs to know the correct answer or claim to have a precise answer.
When he talked about BASIC having 1-based arrays, I had to check my memory against online AppleSoft BASIC documentation and indeed, AppleSoft, at least did have 0-based arrays.¹ I also had to check on Pascal which I haven’t written anything significant in for some 20 years and it turns out that by default, Pascal arrays are 1-based, but most of the code that I worked with (which tended to have its roots in Knuth-written Pascal) explicitly specified the range of indices.
⸻
1. As did all the other contemporaneous BASICs that I encountered. For added fun, when creating an array with DIM, you gave the highest index and not the number of elements so, e.g., DIM A(20) created a 21-element array. The 1980 Apple ][ manual I found which discusses both Integer BASIC and AppleSoft doesn’t admit that 0 is a valid index for an array, but elsewhere I saw it indicated as such which leads me to suspect that Integer BASIC has 1-based arrays.
Modern Basic (VBA etc.) still supports 1 based arrays, and there's even Option Base (0|1). This is a global directive for all arrays declared in the same file. And individual arrays can be declared with any starting index you like.
I may think that counting 0 as a natural number makes a lot of math more elegant, but clearly I’m just too dumb to rise to the level of wrong.
fruits = ['apple']  # Hello I have fruits
x = len(fruits)     # Only one kind tho lol
fruits[x]           # OK here is what I have
SUBSCRIPT OUT OF RANGE
I have always hated zero indexing for this reason (other than thinking in assembler where it is beautiful). It is of course useful in many contexts, albeit with tradeoffs; more importantly everyone is used to it and it's predictable. But I was absolutely thrilled to discover Julia defaults to indexing from 1 like a normal person.
Essentially what we have here is a mismatch between the tool (computer that does everything in binary) and the task (mathematical abstraction of quantities). Now it's completely understandable that people work within the limitations of the tool when there is no other choice, but that's the same sort of path dependency that creates technical debt.
If zero indexing were so great, mathematicians would have made it the default centuries if not millennia ago; but mathematicians don't want to do off-by-1 adjustments on all sorts of common operations because it makes things unnecessarily complicated. To be honest, I think this has become a moat to keep people out of programming even though it's a shallow one.
It could be interesting to test this, say by taking two classes of schoolchildren and teaching one Julia and the other Python (or...). By not having to take on the idea of zero-indexing at the same time as the concept of an array, the Julia group can get into collections using their intuitive understanding of the natural numbers. I expect that picking up language elements quickly will have a compounding effect and that at the end of the evaluation period the Julia group will be able to make significantly more complex programs than a control group.
> If zero indexing were so great, mathematicians would have made it the default centuries if not millennia ago
They did. For any context where the index contributes to the math[1] (i.e. is not simply a labeling convention like "the x (first) component"), the index always starts at zero: polynomials, Fourier series, geometric series, Bessel functions (both their order and their series expansions), etc.
[1]: Except, of course, when a divergent 1/0 term shows up.
Also, consider that Python uses -1 for indexing the last element, instead of -0, which means negative indices count 1-based from the end rather than acting as offsets.
Funny, just a few hours ago I asked myself the related question "Should indices start with 0 or 1?" (not for the first time). I pretty much switch my opinion as many times as I think about it.
The first answer nicely retells the Dijkstra argument: integer ranges should be described using half-open intervals, and [n, m) is nicer than (n, m]. Furthermore, [0, n) is nicer than [1, n+1), and that's that.
The second answer makes the observation that there is a difference between indexing and counting, and that even in daily life often the first element is indexed by 0.
Still, I rather like indexing the first element of a sequence with 1.
> Was C ever much more than a veneer on top of assembler?
Maybe in the very early days, but that necessarily had to end as soon as compilers started doing non-trivial optimizations. Today, C is very far from being "portable assembly".
The crucial fact is that if your language supports 0-indexing, you can do that if you like. OR, you can pretend it has 1-indexing, and do that instead.
If your language doesn't support zero-indexing, then you are stuck with 1-indexing, and that's that.
So, zero-indexing is the natural thing to put in a language, as it accommodates everybody.
"It's an offset from the beginning. The first element is 0 slots from the beginning." is what I was taught.
(Of course, it's more complicated than that, with pointer math etc etc, but that works as a general gist)
In the past, I've written roughly half a million lines of Lua. Arrays don't always start at 0. I've regularly shaken my fist at Lua and cursed its wicked ways, but it's really damn useful in a lot of contexts.
Because zero is a concept that can mean "at the beginning"
The symbol changed over time as positional notation (for which zero was crucial), made its way to the Babylonian empire and from there to India, via the Greeks (in whose own culture zero made a late and only occasional appearance; the Romans had no trace of it at all)
Another meaning is "nothing" but we're talking about position, not nothingness
The mathematical zero and the philosophical notion of nothingness are related but are not the same. Nothingness plays a central role very early on in Indian thought (there called sunya), and we find speculation in virtually all cosmogonical myths about what must have preceded the world's creation. So in the Bible's book of Genesis (1:2): "And the earth was without form, and void."
I find 0-based indexing easier to work with when working with (dynamically sized) multi-dimensional arrays.
For example, with 0-based indexing getting an element at (x, y, z) can be done with
a[z * h * w + y * w + x]
But with 1-based indexing you need to subtract one first from each component except the last:
a[(z - 1) * h * w + (y - 1) * w + x]
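Both formulas as a sketch (idx0 and idx1 are my names; w and h are the x- and y-extents). Under the 1-based scheme each index comes out one higher, since a 1-based index is just the 0-based offset plus one:

```c
/* Flattened index for element (x, y, z): 0-based components and array. */
int idx0(int x, int y, int z, int w, int h) {
    return z * h * w + y * w + x;
}

/* Flattened index for 1-based components and a 1-based array: every
 * component except the last needs an explicit "- 1". */
int idx1(int x, int y, int z, int w, int h) {
    return (z - 1) * h * w + (y - 1) * w + x;
}
```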
The 1900's were the twentieth century. Except the year 1900 specifically was still the nineteenth century; the twentieth century started in 1901 and ended at the end of 2000, much to the chagrin of all the chumps who celebrated the millennium a year early.
You know what would make all this confusion go away? Zero-based indexing.
In English, the day of your birth, when you're 0, is also your birthday. So, technically speaking, when you're turning 29, you're having your 30th birthday, even though no-one says it like that. I always assumed 0-based indexing was something like that, similar to the example given about the ground floor being the 0th floor in some countries and the first floor in others.
Another example is if I'm getting directions from someone, and they say, "Walk 3 blocks that way", in my mind I conceptualize that I'm presently on the 0th block and I don't yet count it.
I don't think every day people disagree with these examples so much (well, maybe the birthday one, but that's because we use the same word for the literal day as well as anniversary), and they're all fairly intuitive. I never realized that 0-based indexing required such a deep background to be justified.
> In English, the day of your birth, when you're 0, is also your birthday.
That depends on whether you think of "birthday" as synonymous with "the actual day of my birth" or "a celebration of the day of my birth". I think most of us actually consider it the latter, which is why the first one is after you've been born for a year. If you look up the definition of "birthday" (one word, no spaces) you'll see that there is often some acknowledgement of this in the differing definitions offered.
> "Walk 3 blocks that way", in my mind I conceptualize that I'm presently on the 0th block and I don't yet count it.
You have to decide whether the person is saying something is on the third block from your position, or you need to walk about three blocks of distance. You probably do this automatically based on how close you are to the edge of the current block. If I'm 30 feet from the edge of a block, I'm probably not going to include the current block in that distance. I imagine you probably won't as well.
> they're all fairly intuitive. I never realized that 0-based indexing required such a deep background to be justified.
I think they're only intuitive because we're all context sensitive. The issue is when people don't have the same context, or the context no longer strictly makes sense when the terminology is used in contexts that make less sense.
As an offset, and in programming languages where it's easy to see it's an offset (C), it makes perfect sense. In languages where that's all hidden from you, and you're really just referencing the nth item in a list, the 0th item doesn't make a lot of sense. People in these different contexts will likely have different ideas of what "intuitive" means with regards to this.
In Mathematica/Wolfram Language, part 0 of an expression is reserved for the head of the expression: f[x, y, z][[0]] == f. Or g[f][x, y][[0, 1]] == f.
For me, it was 1-based indexing that made sense.
I started programming in FORTRAN, then COBOL, then (Lord help me) RPG II. All had 1 based arrays. I got paid to program in RPG II eight hours a day, five days a week, for two and a half years.
A common bit of code was to create an array of month names. Then take an input date, say 20220825, and split that into $Year (first four characters) and $MonthNumber (next two) and $Day (next two).
$MonthIndex = strip the leading zero (if present) from $MonthNumber
Then, this code worked: $MonthName = @Month[$MonthIndex]
When I ran into my first language with 0 based arrays, and @Month[1] returned February, I was pretty much convinced you all had gone insane. December is an out of bounds condition now? Really?
Much later, a friend explained the stuff about multiplying the index by the array element length, and easily getting the memory location of the item in that array. It does make sense; but, I've always hated the code I've had to write that says $MonthName = @Month[$MonthNumber - 1]
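For illustration, here's how the two workarounds look in a 0-based language (a Python sketch; the names are mine, not the original RPG code):

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

def month_name(month_number):
    """month_number is the human 1-based value parsed from e.g. '20220825'."""
    return MONTHS[month_number - 1]   # the conversion the comment dislikes

# Alternative: sacrifice slot 0 so human month numbers index directly.
MONTHS_PADDED = [None] + MONTHS

def month_name_padded(month_number):
    return MONTHS_PADDED[month_number]

assert month_name(8) == month_name_padded(8) == "August"
```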
Simple answer: because everything in computers starts at 0. Address 0 is the first byte of memory, etc.
You can now ask, "why does everything start with 0"? I guess because it gives you nice round (base-2) numbers. With 4 bits you can have 16 different "numbers", either 0 to 15 or 1 to 16. However, representing the number "16" in binary requires 5 bits, so you need an additional bit for the same number of elements. So representing 4 bits as 0 to 15 makes more sense
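The bit arithmetic above is easy to check, e.g. in Python:

```python
# With n bits you can represent 2**n distinct values.
# Starting at 0, the largest value still fits in those n bits;
# starting at 1, the largest value (2**n) needs one extra bit.
n = 4
values_zero_based = list(range(0, 2 ** n))      # 0..15
values_one_based = list(range(1, 2 ** n + 1))   # 1..16

assert len(values_zero_based) == len(values_one_based) == 16
assert max(values_zero_based).bit_length() == 4   # 15 -> 0b1111
assert max(values_one_based).bit_length() == 5    # 16 -> 0b10000
```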
The correct starting ordinal is 0. A list with a 0th element has cardinality 1. This is a concept that exists in many places in English, where the naturalness of starting with the zeroth element trumps inertia; but English is wildly inconsistent, so it has both.
Constructing the naturals without including 0 and starting with it is incredibly awkward. Addition doesn't even have an identity.
Look at a clock: is 1 at the top, or is it the next one after the top?
Look at a ruler. Does it start at 1?
Do you want to have to shuffle everything around when you go from whole numbers to halves? Do you start at 0.5 or 0? Depends if it is "halves" or the real number 0.5
This is a recipe for pain and off by ones any time you're dealing with time steps (which is why it's so stupid that matlab and fortran are 1 indexed by default, if anything there's a better argument for C to be 1 indexed than scientific languages).
Now index and compose a bunch of ranges. With 1-indexing, a half-open range of two elements is the incredibly awkward 1 <= i < 3.
So closed ranges go with 1-indexing. Composing and splitting them is incredibly awkward: [a,c] is split into [a,b], [b+1,c] ... that's kind of okay, but not unique, as you could use [a,b-1], [b,c]. Now: do [d,e] and [e,f] compose at a glance?
Half open ranges go with zero indexing and they just work.
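A quick sketch of how half-open ranges split and compose without any ±1 bookkeeping (Python's `range` is half-open):

```python
# Half-open ranges split and compose cleanly:
# [a, c) == [a, b) + [b, c) for any a <= b <= c, and the split is unique.
def split(a, c, b):
    return (range(a, b), range(b, c))

left, right = split(0, 10, 4)
assert list(left) + list(right) == list(range(0, 10))

# Length is just b - a, and an empty range is simply [a, a).
assert len(range(3, 3)) == 0
assert len(range(0, 10)) == 10
```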
I've spent so long working with both 0-indexed and 1-indexed (I work with Lua a LOT at work) that it makes almost no difference to me, and I can switch between the two with usually no hiccups.
Either one is something you get used to with practice. I do prefer 0-based, but 1-based isn't as big of a deal as many people treat it in the vast majority of situations, and almost every situation where you'd be using something like Lua in the first place.
0 is a number, and just because the west ignored it for quite some time (read: "Zero: The Biography of a Dangerous Idea") we have these odd built-in aversions to it but in the end it makes plenty of sense.
0th element is 0, so why don't we just do a better job of educating kids to start counting with 0 and thinking about 0.
in the end - as a CS person, I think it's good - but I also don't mind if you disagree with me. :)
Maybe, but then they'd just become discussions about whether `y` in the `x[y]` syntax "should" be an offset or an index. Which one gets the language priority?
...Of course, the answer is offset, because offsets are beautiful, composable, elegant things that harmonize with bitmasks, modulo, array slices, and everything else, while (one-based) indexes are nothing but "offset plus one", no more useful than "offset plus two".
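A tiny example of that composability: a ring-buffer read with 0-based offsets needs only a modulo, or a bitmask when the size is a power of two (the names here are mine):

```python
n = 8
buf = list(range(n))   # stand-in for a circular buffer of 8 slots
head = 5

# With 0-based offsets, wrap-around is plain modulo; no ±1 corrections.
window = [buf[(head + i) % n] for i in range(4)]
assert window == [5, 6, 7, 0]

# When n is a power of two, the modulo collapses to a bitmask.
assert (head + 3) % n == (head + 3) & (n - 1)
```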
0-based indexing is clearly the best option because it results in more elegant code and it matches what computers actually do.
In fact, it's so fundamentally more elegant that I've come to the conclusion that the real issue is that it's actually the mathematicians and language itself that got indexing wrong. Instead of "first" being associated with 1, it should be associated with 0. We should give athletes 0th place, and talk about the 0th man on the moon etc.
That's where the cognitive dissonance for the 1-based-indexing people is coming from. They can't deal with the fact that their "normal" way of indexing is wrong.
It's a bit like how pi should really be tau (2*pi), or the electron should really be positive. We got it wrong, but we're stuck with it because it's too much of a hassle to change it. Fortunately computing got it right! But now there's a mismatch with everyday life where people are used to the wrong thing.
- makes index == element-wise offset from the first element
- there are algorithms which are (very slightly) easier to implement with 0-indexing and others with 1-indexing; AFAIK more of the "bread and butter" algorithms are easier with 0 than 1 (e.g. indexing of C-style arrays used all the time, heaps, etc.)
- 0 is not special => less bounds checks in typed languages (if you have an unsigned int index you only have to check the upper bounds, with 1-index you also have to check for 0)
- 0 is not special => with signed integer you can have negative indices (from the end) in which case all signed integers are valid (but potential out of bounds), with 1 indexing you have a "gap" in the middle which for some (rare) use cases can be very painful to handle
So, independent of what the history was, it's just much more pragmatic to use 0-indexing in many common use-cases. Especially since 1-indexing produces more special cases for very common operations, it's sub-optimal for both usability and performance.
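Python's negative indexing illustrates the "no gap" point nicely (the `get_1based` helper below is a made-up illustration, not a real API):

```python
a = ["x", "y", "z"]

# 0-based with signed indices: -len(a) .. len(a)-1 are all valid,
# with no unused value in the middle.
assert a[0] == "x" and a[-1] == "z" and a[-3] == "x"

# A 1-based "from the end" scheme leaves a gap at 0: index 1 is the first
# element and -1 the last, but 0 maps to nothing, so a wrapper needs an
# extra check for it.
def get_1based(seq, i):
    if i == 0:
        raise IndexError("index 0 is invalid in a 1-based scheme")
    return seq[i - 1] if i > 0 else seq[i]

assert get_1based(a, 1) == "x"
assert get_1based(a, -1) == "z"
```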
Reminds me of a simple implementation of a Prisoners and Boxes riddle that showed up in my YouTube feed where I had to add a comment so others (and eventually future older me) will understand what I did.
>// if one prisoner fails all are dead (1st try -> tries = 0)
That being said, it seems to be a junior software engineer issue where the brain isn't wired yet for "0-based" indexing. I don't hear such complaints from more experienced developers who have simply learned to appreciate it because like in my example above it is more intuitive to me.
I have added "and eventually future older me" above because I don't do much coding these days, and my career trajectory looks like it's going to be even less in the future.
I can understand the rationale for 0 and 1. I can even understand the rationale for an arbitrary range (from -5 to +10) in the current era, where languages have iterated for many generations.
However, I did not understand why they would have arbitrary ranges in the early days of programming. It would already have been hard to make a language and its compilers; why add arbitrary ranges on top of that?
Early programmers were mostly mathematicians and physicists, so the notation borrowed a lot from their fields, where it's common to index sequences with arbitrary numbers. And it's not particularly complicated to implement either.
What if there was no t0 for the universe, as Hawking and a few other physicists have argued? There was just the first Planck second and no meaningful before.
Well then clearly you'd have to index at -1, for the time state before the point at which there is no meaningful before. I'm not sure why this answer didn't occur to you.
When you index starting at 1, you either have to add/subtract 1 internally, which sometimes but not always can be optimized away by a smart JIT/compiler, or you have to "waste" the first element in memory, trading memory for performance.
Lua always subtracts 1, while LuaJIT instead went with the latter approach.
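A rough Python sketch of the two strategies (an illustration of the trade-off described, not Lua's actual internals; all names are mine):

```python
import struct

# Strategy 1: subtract 1 on every access.
def read_subtracting(buf, index, size=4):
    offset = (index - 1) * size            # extra arithmetic per access
    return struct.unpack_from("<i", buf, offset)[0]

# Strategy 2: waste slot 0 so the 1-based index maps directly:
# one element of memory traded for zero per-access cost.
def read_padded(buf, index, size=4):
    return struct.unpack_from("<i", buf, index * size)[0]

data = struct.pack("<3i", 10, 20, 30)        # elements 1..3, packed tight
padded = struct.pack("<4i", 0, 10, 20, 30)   # dummy slot 0 up front

assert read_subtracting(data, 1) == read_padded(padded, 1) == 10
```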
For me, far more significant than anything else here is having to convert array indices into human readable text describing objects. And the objects are named by mechanical engineers who don't use 0 based indexing. So my code is full of
print("Finger {}'s motor has overheated".format(i+1))
I think that zero indexed is generally better, including that it involves adding the index number to the base address, and other properties that can be helpful.
However, it can also be useful for many purposes to allow arbitrary ranges, so a programming language probably should allow that; if the index of the first element is not zero, then the base address will not be the address of the first element (if the first element index is positive, then the base address may actually point into a different array, or a different variable). Then, if you have the base address and add the index, you will have the proper address of that element, whether or not the first index number is zero.
Zero is not always the most useful starting index; sometimes other numbers (which may be positive or negative) are useful. But, I think in general, zero is better.
> By a general rule, in B the expression
> *(V+i)
> adds V and i, and refers to the i-th location after V. Both BCPL and B each add special notation to sweeten such array accesses; in B an equivalent expression is V[i]
C was designed to be as close to the machine as possible. Prior to C the only programming language for systems programming was assembly. If the compiler-generated code subtracted 1 from the index every time you indexed an array that would have been seen as not as efficient as assembly.
Maybe my thinking is just warped by the language, but I think 0 indexing is the right thing most-- but not all-- of the time. Looking at my code I have relatively few places where I need to -1 to convert a natural 1 index into a zero index.
It would be easy for languages to offer two array access syntaxes, one for 0 and one for 1 like array[i] for 0 index and array{i} for 1 index access, then the accessing code could use the right tool for the job. Unlike defining the indexing type as a property of an array you shouldn't have problems with accidentally using the wrong one at time of use.
... but to really know if it's a good idea you'd have to test it, and sadly very few concepts in programming are subjected to any kind of rigorous scientific study.
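For what it's worth, the idea can be prototyped in languages with operator overloading; a hypothetical Python sketch (the `OneBased` wrapper is made up for illustration, not an existing library):

```python
class OneBased:
    """Hypothetical 1-based view over a list; the underlying storage
    stays 0-based, so both views coexist over the same data."""
    def __init__(self, items):
        self.items = items
    def __getitem__(self, i):
        if i < 1:
            raise IndexError("1-based view: index must be >= 1")
        return self.items[i - 1]

raw = ["a", "b", "c"]    # raw[0] is "a"  (0-based access)
one = OneBased(raw)      # one[1] is "a"  (1-based access)

assert raw[0] == one[1] == "a"
assert raw[2] == one[3] == "c"
```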
Array indices denote offsets from the base address. The first element is at the base address of the array, hence the offset is zero. It avoids a conditional when resolving the address, introducing it would slow down all forms of memory access across the language domain.
After programming in C for a while, I really began to have the feeling that arrays were physical things that took up space in RAM. When I think like this, starting an index at 0 is very intuitive, as 0 implies the origin point of the physical construct.
If you think of arrays just as a kind of abstract list, then beginning at 1 makes much more sense. No one who looks at a to-do list at home talks about the 0th thing on their list to do. But the spacial nature of arrays makes starting at 1 confusing. After I take 1 step, I am no longer at my starting point. And cycling through arrays in code for me has a very similar feeling to taking actual physical steps through a data structure.
I'd claim intuitively that having all bits 0 is a proper starting point and more efficient if there were any constraints on address, which was likely the case historically.
I'm happy to hear any dissenting opinions if this is inaccurate.
What? Is this some kind of trolling challenge? If you don't care about math, just say so. Most people don't. But making up a nonexistent convention is just silly.
A related question: why is a rectangle containing a single pixel defined as (x,y):(x+1,y+1) instead of (x,y):(x,y)?
Here's a discussion from some years ago that starts with some of Guido's thoughts on array indexing and moves on to the QuickDraw coordinate plane and points and rectangles, including how grid lines and points are infinitely thin/small, but pixels occupy the space between the grid lines.
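Treating the edges as half-open intervals makes the arithmetic come out cleanly; a small sketch (the function name is mine):

```python
# With half-open edges, a rectangle is [x0, x1) x [y0, y1),
# so a single pixel at (x, y) is the rectangle (x, y)-(x+1, y+1).
def area(x0, y0, x1, y1):
    return (x1 - x0) * (y1 - y0)

assert area(5, 5, 6, 6) == 1      # one pixel
assert area(5, 5, 5, 5) == 0      # (x,y)-(x,y) would be empty
# Adjacent rectangles tile without overlap or gaps:
assert area(0, 0, 4, 4) == area(0, 0, 2, 4) + area(2, 0, 4, 4)
```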
The general term of arithmetic and geometric sequences seems simpler when indexing from 0 rather than from 1. I do not think that '1' is more human-focused for anything than '0'.
It's an absolute pain in my arse. I honestly don't know what problems I would have with an array system that started at 1, but since the length method sometimes starts at 0 and sometimes 1 (I'm thinking Perl vs JS), I'm always forgetting whether to put = or <= into for loops, and when to add one for the read-out or subtract 1 when looking at the array again. I mean, in what universe should [2]+[3]=7?
The first element having the index zero may be natural or unnatural depending on the context. In mathematics, we count the rows in a matrix starting from 1, but we count the monomials in a polynomial starting (or ending) with 0. Generally, there is less need to use zero as the starting index: the Common Era does not have "the year zero" (there, -1 is followed by +1), etc.
Easy: I have an array with 256 elements. And I want to store the indexes of various elements of the array off in another data structure. So many of them, in fact, that it’s important that the index values each fit into a byte. So, I want the index values to be 0..255, and not 1..256, since 256 doesn’t fit into a byte.
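A quick check of that constraint (Python sketch; `struct.pack("B", …)` enforces the unsigned-byte range):

```python
import struct

# 256 elements: 0-based indexes 0..255 all fit in one unsigned byte,
# but the 1-based top index 256 does not.
assert struct.pack("B", 0) == b"\x00"
assert struct.pack("B", 255) == b"\xff"

try:
    struct.pack("B", 256)
    assert False, "256 should not fit in a byte"
except struct.error:
    pass
```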
You can still fit that information into one byte and convert it at compile time or runtime :)
In a minecraft clone I made, each chunk consists of 16 subchunks so that I only need 12 bits to represent the block position in the shader. Technically, the block position ranges from INT32_MIN -> INT32_MAX, but since each subchunk only consists of 16x16x16 blocks I can get away with storing the position in 12 bits and then passing in the subchunk's world position once as a uniform.
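A sketch of the packing described, assuming 4 bits per axis laid out x|y|z (the exact bit layout is my guess, not necessarily the poster's):

```python
# A block position inside a 16x16x16 subchunk needs only 4 bits per
# axis, 12 bits total. 0-based local coordinates use every bit pattern
# 0..4095 with no waste; 1-based coordinates (1..16) would not fit in
# 4 bits per axis.
def pack_pos(x, y, z):
    assert 0 <= x < 16 and 0 <= y < 16 and 0 <= z < 16
    return (x << 8) | (y << 4) | z

def unpack_pos(p):
    return (p >> 8) & 0xF, (p >> 4) & 0xF, p & 0xF

assert pack_pos(15, 15, 15) == 0xFFF
assert unpack_pos(pack_pos(15, 15, 15)) == (15, 15, 15)
```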
Arrays start at 0 because they're really just pointers. "[x]" is just shorthand for "+ x * sizeof". The first element is the one stored at the pointer address (0 sizeofs ahead of the pointer), the next element is stored 1 sizeof ahead of the pointer, etc.
In C, "a[x]" is effectively syntactic sugar for just "*(a+x)" (conversion rules for "+" means you don't need the sizeof). It also means you can reverse it and write e.g. "5[somearray]" if you really want to make developers want to throw rocks at you.
That's just convention though. There's no reason "[x]" couldn't have been defined to be "+ (x - 1) * sizeof". There would be no runtime cost to this, and the compile time penalty would be tiny even on ancient systems.
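The underlying address formula can be emulated outside C too; a hedged Python sketch reading int32 elements out of a flat byte buffer (all names are mine):

```python
import struct

ELEM_SIZE = 4  # sizeof(int32) in this sketch

def elem_at(buf, base, i):
    """Read element i of an int32 array laid out flat starting at byte
    offset `base`: address = base + i * ELEM_SIZE, with i 0-based."""
    return struct.unpack_from("<i", buf, base + i * ELEM_SIZE)[0]

buf = struct.pack("<4i", 7, 8, 9, 10)
assert elem_at(buf, 0, 0) == 7     # first element lives at offset 0
assert elem_at(buf, 0, 3) == 10
# A 1-based scheme would have to compute base + (i - 1) * ELEM_SIZE.
```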
That answers something like "what are the technical benefits to zero-based arrays" but not "why are zero-based arrays such a strong convention in programming languages." For that you'd have to read the article I guess.
Perhaps also worth adding that the linked article (the one titled "Citation Needed") itself originally had hundreds of comments beneath it, many quite argumentative, which seem to be now missing - I guess lost in a site update. It was a very contentious article.
I'm sympathetic to its author's approach - we talk about indexing and it's a strangely fascinating topic but the history of it hasn't been all that well dug out. We do indeed tend to respond to articles like this by going "well obviously [x]-based is good because" - you can see that at work in this discussion here - but those are not necessarily the true historical reasons. But I think the author made a leap too far, the article was a bit too brittle, and it landed in a slightly too argumentative spot.
Because indexing happens with integers. Back then space was a concern, so you'd use unsigned integers, and those start with zero. Internally they'll all be using offsets, and it's easier to not do a conversion than to do one.
0-based or any arbitrary offset of your choice so that your data model can more closely map to what you need it to. And use eachindex instead of making assumptions and you can work with arrays using 1-based, 0-based, or arbitrary-based indexing.
For this particular problem, having a choice is even worse than 1-based indexing. Code becomes a giant PITA to read because you have to switch back and forth.
Looking at my own coding experience, I certainly favor 0 over 1. When I examine why, I can't help but see my aversion to & distaste of Visual Basic as a driver.
VB is the only 1-indexed language I have used.
VB is bad.
Therefore 1-indexed languages must be bad.
The index of an array is an integer, and integer values start at zero, so it seems like an obvious choice as to where the starting point should be. It's only outside of computer science that it seems to be counter-intuitive.
Actually in not all languages do arrays start at 0. In ColdFusion for example arrays start at 1. I've never been able to receive a reason as to why CF's original author, J.J. Allaire, did it that way.
Came here to say this about CF.
I work in Coldfusion but coming from a background of other languages I would always reference 0 as the start of an array. This was the source of countless errors for me until I got used to it.
My manager thinks it was done to make the language easier to understand for people just getting into programming.
The argument that "BCPL doesn't have pointers, so it's not about pointers" fails immediately because every language has pointers, but some of them don't expose them to the developer.
Lua's arrays are 1-indexed, and it is a terrific interpreted language! Array[0] is nil, makes sense. v5.1 has a solid JIT, and it's got a great 2D game engine called Löve2D.
Because back when computers were young, all bit representations were needed to represent something and having 000 not be used as a valid value wastes a huge percentage of available space.
In defense of starting at 1: if you're counting things, starting at 1 ensures that the last number you said is equal to the number of things you've counted so far.
I always assumed that it was because arrays and pointers would've been interchangeable in the start, and that adding a zero index times the object size gives you the first element
Welcome to pagination and virtualization, where the physical address is different from what your program thinks it is. Also welcome to hypervisors, where entire operating systems think they own an entire physical machine, only to have something like Spectre/Meltdown say otherwise.
Wow, we are still talking about this, are we? Honestly, linear algebra texts should just give in and adopt zero based indexing for matrices at this point.
So what's the actual reason you'd want to have zero based arrays and such in your language (In 2020, not 1970)? In a language from scratch that doesn't care about being like previous ones, and doesn't care about micro optimization.
I've been annoyed recently by this obnoxious neckbeard behavior when they make things zero based for no reason (other than to conform to their supposedly existent philosophy), like Ratpoison and Screen selecting windows by a zero based index, which means you have to move your hand to the far right of the keyboard to select the first window with the 0 key, the far left to select the second window, with the 1 key, and one to the right to select the third window, with the 2 key.
I so happen to have been trying to work out a tangible explanation on paper of which system is better on trips a few weeks ago, and could not come up with one (and none are given in the article nor the top comments in this thread).
On a side note, the moment you say, "zeroeth", you can no longer be sure anyone knows what you're talking about because at any moment you may count in English or this pretentious UN*Xtard dialect of English. Seems like another pedagogical stumbling block held on to boomers and boomer-wannabes who just want to have their little counter counter counter culture or whatever.
This is false. Pascal arrays start at whatever you want. As for what is happening behind closed doors (aka the linker), those start at zero, but with an offset kept separately (that's why C blew Pascal out of the water in terms of speed when looping through an array). And in Pascal you cannot create a dynamic array with anything other than 0-based.
You are confusing arrays with strings. Those used to start at 1, but since Unicode took over (~2007) strings are implemented as zero-based too (even though you can still think of them as 1-based when writing code).
Also in Delphi, when creating a cross-platform application, strings are treated as 0-based, with even multiple locations in documentation stressing this importance due to having enough differences between classic VCL and the new kid on the block (aka FireMonkey).
Except VB.NET, which is 0-based only, but you can just shut your eyes and imagine it's 1-based, because you declare an array by specifying its upper bound, which equals its number of elements in 1-based terms.
This is my take and when working directly with computer memory/addresses, 0-based is most convenient. For working with data/stuff at a higher level, 1-based is more natural and convenient.
Zero-based indexing is always wrong; an ordered collection of things does not have a zeroth thing.
Sometimes you have a data structure called an "array", which means "a bunch of things that are the same size next to each other in memory". You can store the address of the whole array as the address of its first element, and offset into it by an offset; clearly, the offset of the first element is zero, but it's not the "zeroth" element.
And finally, sometimes you have cargo cult behavior by people who think that the way offsets into arrays work is because "numbering starts at zero in computers!" or some similarly wrong rationalization. This is when you get stupid things like (nth 0 list) in a lisp.
I hate to break it to you but natural spoken language is not a fixed concept, words can have many meanings and interpretations, and those meanings and interpretations can mean different things to different people at different times.
Because you predominantly think of indexing as the first definition as shown here[1], does not mean everyone else does. Be careful when using absolutes, especially when talking about languages.
An ordered collection of things starts no things from the beginning.
A journey starts 0 metres from the origin.
If you count the number of metres you are along your journey and start with 1, then at the second metre you are at the 101st centimetre.
When someone says how many dragons have you slain? Just as you leave and haven't heard of any dragons yet, you don't say "I'm slaying my first dragon now"
Every construction of the natural numbers I have seen starts with 0.
The first hour of the day has passed when the clock strikes one. The hour preceding that is the same day. English calls it 12 because it was invented before we had a firm grasp of zero, and English is inconsistent. 24-hour clocks call it 00.
1 based indexing can be correct, but 1 based ordered discrete sets map incredibly poorly to any ordered set with a different number of elements (such as the reals). Additionally dealing with ranges is less easy to make consistent.
This reminds me of arguments about male -> female and man -> woman (and human etc). Without offending anyone's delicate sensibilities - you may note that there is a short version and long version of those words, suggesting that the longer version is derivative of the short version.
However, when you examine the origin of the words, that is not the derivation. The derivation and path to English is quite complicated. You may feel a sense of righteous indignation or righteous repudiation on this topic. I would gently suggest deferring that feeling because I think the story is very interesting to follow.
That is not the end of the story though. The reason those pairs of words stuck around is almost certainly because it makes a consistent mental picture.
Bringing it back to the topic at hand - "Why we use 0" - because it works. There are many times when it has been evaluated, and at least to the people who design languages, it is more mentally appealing.
> "Bringing it back to the topic at hand - "Why we use 0" - because it works. There are many times when it has been evaluated, and at least to the people who design languages, it is more mentally appealing."
The linked post is not just about a technical decision, it's about a social history.
The conclusion of the linked post:
> "Lessons
> Things can have more than one cause. Don’t trust easy explanations.
> Always look for information that refutes your theory, not just information that supports it. Hoye stopped as soon as he had a satisfying answer and didn’t keep researching.
> Don’t be a dick.
> It is tragically easy to trick me into doing free research."