I think what bjoli is getting at is that in an ideal world with lazy sequences you don't have any iteration over any of the intermediate "collections" until some step occurs that requires the entire collection. I put collections in quotes because they aren't really collections, they're generators that produce an element on demand.
So you never really iterate over more than one collection; you only have one iteration where you successively ask each generator to produce a single element and then apply a function to it. For example, if you only asked for the first element of a lazy sequence formed by a series of maps, you would (in theory, in practice see my note about chunks) never iterate over any of the intermediate sequences and only ever examine the first element of each of them.
However, the act of asking a generator to produce an element (that is unwrapping a thunk) has overhead of its own and that's the overhead that a transducer would be removing (not iteration itself in the case of lazy sequences). This can have far more overhead than a procedure call because of bad cache locality (in the absence of fusion optimizations you're pointer chasing and potentially generating garbage for each thunk). Clojure tries to get around that by often (but not always) chunking its thunks, in which case we do have multiple rounds of iteration on smaller chunks, but never the entire sequence (unless the entire sequence is smaller than a chunk).
So you never really iterate over more than one collection; you only have one iteration where you successively ask each generator to produce a single element and then apply a function to it. For example, if you only asked for the first element of a lazy sequence formed by a series of maps, you would (in theory, in practice see my note about chunks) never iterate over any of the intermediate sequences and only ever examine the first element of each of them.
However, the act of asking a generator to produce an element (that is unwrapping a thunk) has overhead of its own and that's the overhead that a transducer would be removing (not iteration itself in the case of lazy sequences). This can have far more overhead than a procedure call because of bad cache locality (in the absence of fusion optimizations you're pointer chasing and potentially generating garbage for each thunk). Clojure tries to get around that by often (but not always) chunking its thunks, in which case we do have multiple rounds of iteration on smaller chunks, but never the entire sequence (unless the entire sequence is smaller than a chunk).