What I always find a bit missing in such articles is what the n actually is. The author writes "If you consider "addition" to be 1 operation", which seems intuitive enough and is correct if we're talking about normal machine integers.
But adding two arbitrary integers might be somewhat linear in bit-width. And there we have it: with a fixed bit-width, this becomes a constant term.
So you might not want to talk only about the number of input terms as n, but also about the width of your terms (and Python integers are arbitrary precision, btw).
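A quick sketch of that in CPython (absolute timings are machine-dependent and purely illustrative): a single '+' gets slower as the operands get wider, because it has to touch every machine word of both operands.

    import timeit

    # Time a single '+' on n-bit operands: the cost grows with the bit-width,
    # so "addition is 1 operation" only holds for fixed-width machine integers.
    for bits in (64, 10_000, 1_000_000):
        a = (1 << bits) - 1
        b = (1 << bits) - 3
        per_add = timeit.timeit(lambda: a + b, number=10_000) / 10_000
        print(f"{bits:>9}-bit operands: {per_add:.2e} s per addition")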
So yeah, this is an upper bound on how many "steps" you do for every input, but often enough it's not really clear what a step is, especially if you have several "steps" that relate to different forms of data retrieval and organization (which are often the culprits for bad performance once you're done optimizing the number of loops). Sometimes you can hide behind some guarantees that your hashset has constant lookup. But did you factor in whether the hash function is actually constant (or even fast, for that matter)?
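And for the hash part, a rough sketch of what I mean (again CPython, numbers only illustrative): hashing a tuple walks all of its elements, and unlike str a tuple doesn't cache its hash, so even the "constant" hash step of a set/dict lookup is linear in the size of the key.

    import timeit

    # hash() on a tuple combines the hashes of all elements, so the hash step
    # of a set/dict lookup scales with the key size, not just the collection size.
    for n in (10, 1_000, 100_000):
        key = tuple(range(n))
        per_hash = timeit.timeit(lambda: hash(key), number=1_000) / 1_000
        print(f"tuple of length {n:>7}: {per_hash:.2e} s per hash()")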
> often enough it's not really clear what a step is
The steps are basic CPU operations such as load/store and basic arithmetic operations performed on fixed-width types, i.e. things a CPU can do.
> And there we have it: with a fixed bit-width, this becomes a constant term.
It makes sense because that is how the CPU works: arithmetic operations on fixed-width types are performed in constant time in the CPU's ALU. Adding anything below 2^32 requires the same number of cycles.
It is only confusing when you confuse basic CPU operations with basic python instructions (an arbitrary-precision addition is certainly not a simple operation from the CPU's perspective).
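You can see why a Python '+' can't map to a single ADD: the operands themselves are variable-sized objects. A small sketch (the exact sizes are CPython implementation details, so treat the numbers as illustrative):

    import sys

    # Python ints are arbitrary precision: larger values occupy more machine
    # words, so an addition may have to touch arbitrarily many of them --
    # unlike an ADD on a fixed-width register value.
    for e in (10, 30, 64, 1_000, 1_000_000):
        n = 2**e
        print(f"2**{e:<9} -> {n.bit_length():>9} bits, {sys.getsizeof(n):>9} bytes")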
> The steps are basic CPU operations such as load/store
Only if you consider the result of a comparison as a "store" into a register. But again, comparing two objects might not be constant. The compiler I'm developing for my PhD (written in Haskell), for example, uses somewhat nested sets, and quite a few operations are dominated by the equality check.
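To put that in the thread's language (Python rather than Haskell), here is a rough sketch: '==' on two equal-but-distinct sets has to compare elements, so any analysis that counts a comparison as one step is hiding a loop inside that step.

    import timeit

    # Comparing two equal but distinct frozensets has to check every element,
    # so the "single" comparison step is really linear in the set size
    # (and worse once the elements are containers themselves).
    for n in (1_000, 100_000):
        a = frozenset(range(n))
        b = frozenset(range(n))
        per_cmp = timeit.timeit(lambda: a == b, number=100) / 100
        print(f"|set| = {n:>7}: {per_cmp:.2e} s per '=='")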
> It is only confusing when you confuse basic CPU operations with basic python instructions
But the author did exactly that, when they considered addition to be constant, which is a good approximation for small integers but wrong in Python. And you really do have to consider those intricacies; that's why I chose this example. Sets and maps in Python are basically hashmaps, so you can assume constant lookup, for example, if your hash function works in constant time. And again, this begs the question of what n is. Usually we would take n to be the number of items in a collection for its retrieval operations, but you actually also have to consider the item being retrieved.
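A rough sketch of the "which n" problem (CPython, numbers purely illustrative): dict lookup time stays roughly flat as the number of items grows, but it does grow with the size of the key being looked up, because both the hash and the equality check walk the key.

    import timeit

    def per_lookup(d, probe, number=10_000):
        """Average time of a single 'probe in d' membership test."""
        return timeit.timeit(lambda: probe in d, number=number) / number

    # Vary the number of items, keep the key small: roughly constant.
    small_key = (1, 2, 3)
    for n_items in (1_000, 100_000):
        d = {(i,): i for i in range(n_items)}
        d[small_key] = 0
        print(f"{n_items:>7} items, tiny key   : {per_lookup(d, small_key):.2e} s")

    # Keep the collection tiny, vary the key size: grows with the key,
    # because hash() and '==' both have to walk the whole tuple.
    d = {}
    for key_len in (10, 10_000):
        d[tuple(range(key_len))] = 0
        probe = tuple(range(key_len))    # equal but distinct object
        print(f"tiny dict, key length {key_len:>6}: {per_lookup(d, probe):.2e} s")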
> But the author did exactly that, when they considered addition to be constant.
Ok, I agree the article is imprecise. It should say fixed-width integer addition instead of simply addition, since the latter can refer either to a simple ADD CPU instruction or to the much more complex '+' operator in Python.
However, once you agree on this, the concept of a step is clear enough: it is an ADD instruction (or LOAD+ADD+STORE if you wish, still constant time for fixed-width types). Obviously, this does not mean that every occurrence of '+' in Python can be computed in constant time.
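The bytecode makes that gap visible: from Python's point of view the '+' is a single instruction, it's just that this one "step" can stand for anything from a machine ADD to a multi-word big-int addition or an arbitrary __add__ overload. (A sketch; on CPython 3.11+ the '+' shows up as BINARY_OP, on older versions as BINARY_ADD.)

    import dis

    # The '+' compiles to one bytecode instruction, regardless of whether the
    # operands are small ints, huge ints, or objects with an overloaded __add__.
    dis.dis("a + b")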
This is exactly my point. In every somewhat Lisp-like language, or really in any language with some form of operator overloading (via interfaces, for example), the concept of a step becomes unclear.
Say instead of the for-loop the author used something like 'for idx in range(len(ls)):' and then accessed items with [idx]. I think it's obvious that in order to know the runtime complexity, you would need to know what kind of access [] provides (linked list in linear time, array in constant time, tree list or skip list in log time). That's why I said it's easy to hide behind implementation details. And if you do count them, with all their intricacies, it gets quite complex.
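A concrete Python version of that: the exact same indexing loop is linear over a list (array-backed, [] is O(1)) but roughly quadratic over a collections.deque, whose indexed access is documented as O(n) towards the middle. Timings are illustrative only.

    import timeit
    from collections import deque

    def index_all(ls):
        # Sum all items via explicit indexing, as in the example above.
        total = 0
        for idx in range(len(ls)):
            total += ls[idx]
        return total

    n = 20_000
    as_list = list(range(n))
    as_deque = deque(range(n))    # linked blocks: [] is O(n) in the middle
    print("list :", timeit.timeit(lambda: index_all(as_list), number=10))
    print("deque:", timeit.timeit(lambda: index_all(as_deque), number=10))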
We could now look at what a Turing machine implementing that algorithm would do, since no "shortcuts" are allowed there. And computational complexity is strongly bound to that kind of computation (specifically, it is unknown whether the number of derivation steps in the lambda calculus translates to the number of steps on a Turing machine).