Under the hood I'm going to use Rust `Vec<T>`s to represent lists, so it'd be trivial to provide a FFI operation to append something to the end in constant time. However, we can treat the list like a linked list, and, using recursion, here's a tail-call-optimized implementation of append in linear time:
append = list value -> {
loop = acc rem -> match rem {
[] -> acc,
[head & tail] -> loop [head & acc] tail
}
loop value (List::reverse list)
}
I'm thinking of using something other than the `[a & b]` syntax, but that's besides the point. Note that `loop` does not close over any values other than itself, so it doesn't need to be inside `append`:
loop = acc rem -> ...
append = list value -> ...
The function really starts when we reach this line:
loop value' (List::reverse list')
I added `'` to indicate that this is the last occurrence of both `list` and `value` in this scope, which means that they're passed to `loop` without making a CoW reference. Let's look at `loop`:
loop = acc rem -> match rem {
[] -> acc,
[head & tail] -> loop [head & acc] tail
}
This is where things get complicated, depending on the definition of match. If it were defined as macro, it'd be laborious to write out how it all works by hand. But if we treat `loop` as a function with multiple heads, it's a bit easier.
Here, we have two arms. If the remaining list is empty, we return `acc`. note that each arm is independent, so this usage of `acc` is the last in the scope.
If the list is not empty, we match `rem` against `[head & tail]`. This is equivalent to, roughly:
[head & tail] = rem'
Note how this is the last usage of rem in this scope, meaning it's passed by actual value. This list is then split in-place, with head taking on the value of the first element of the list and tail taking on the value of the rest. Once passed to loop, note that all the values used are the last occurrence of the value in the scope:
loop [head' & acc'] tail'
This means that the actual value is passed down to the next level. Looking at this as a whole, we can see that nothing is ever passed by CoW reference, which means that this operation is performed in-place. And because it's done in a tail-recursive manner, `append` only uses constant space.
So to answer your question specifically: what if append is called with `xs` being a CoW reference?
append &xs y
When we reach the line:
[head & tail] = &rem'
`rem` is mutated, i.e. split in two, so a copy of it is made. This copy is it's own value, so head no tail are no longer CoW. we then call:
loop [head & acc] tail
Note that head and tail are now passed by value and not CoW - same goes for acc. Because nothing is CoW anymore, this tail-recursive implementation operates on the new list, mutating it in place.
In short: the first iteration in `append` makes a copy to the list, but in subsequent iterations the copy is used, and the list is mutated in place.
That was a big one, I'm practically writing the blog post out here, haha! Does this help clear things up?
> When we reach the line: [head & tail] = &rem' `rem` is mutated, i.e. split in two, so a copy of it is made.
Huh. That surprises me, that you consider destructuring a form of mutation. But if you do, then I see how this works out because it forces a copy.
However, this raises more doubts. If I write something like
first_two xs -> match xs {
[] -> [],
[head & tail] -> [head & first tail]
}
first xs -> match xs {
[] -> [],
[head & _] -> [head]
}
and call first_two with a CoW reference to a very long list, will it copy the entire list on the first destructuring, even though we really only want to copy two elements? For that matter, will a list search function, that is only supposed to read the list, called with a CoW reference actually copy the list?
If the answer is "well, lists are special", I guess you could consider the same questions on trees. In a functional language, insertion into a binary tree only copies nodes along one path in the tree. Would your language copy the entire tree instead, at potentially exponentially higher cost?
(I'm also slightly confused by your definition of append -- the accumulator seems to be a list, but then the initial call of loop would need to use [value] instead of value, no?)
Thanks for your excellent questions. They're really helping me fuzz through my approach - I wish I had people I could do this with more often, haha. :P
Anyway, something I realized after I posted that question is that assignment doesn't have to force a copy; for instance, this could just take a CoW slice reference to the original list:
[head & tail] = &rem'
The only requirement is that the value returned from a function is an owned value, not CoW. To avoid talking myself into circles, I'll show that `append` still works under this rule. It just moves the copy to the point where we construct the new list, rather that at the point where we de-structure it:
loop [head' & acc'] tail'
So in `[head' & acc']`.
For the code sample you provided, a reference would be passed down, rather than a copy of the entire list, so just the first two elements would be copied. Do you have any further questions regarding this? I'd like to be able to prove the soundness of this system, but I'm not sure going back and forth on example after example is the best strategy in the long run.
> In a functional language, insertion into a binary tree only copies nodes along one path in the tree. Would your language copy the entire tree instead, at potentially exponentially higher cost?
Most functional languages use persistent data-types, because all data in those languages are immutable. I haven't thought about how they'd work into Passerine, but if you mutate data that will be used later, a copy of that data will be made. So yes, mutating a tree makes a copy of it if the original tree is used later.
This is less of a problem then you'd think. It's common to write programs as series of transformations on data, so note that if you do something like this on a big million-element list
children
.map { child -> (child.name, child.age, child.hobby) }
.filter { (_, age, _) -> age > 10 }
.map { (name, _, hobby) -> (first_name name, hobby) }
.filter { (_, hobby) -> is_fun hobby }
.fold "" { (string, (name, hobby)) -> string + name + " enjoys " + hobby + "\n" }
.print
No copies will be made, because each step is the last usage of the data-structure produced in the previous step.
> (I'm also slightly confused by your definition of append -- the accumulator seems to be a list, but then the initial call of loop would need to use [value] instead of value, no?)
I wrote a long response, to this, but it ended up being a couple thousand words, so I'm trying to trim it down. (I went off the deep end, with stack diagrams, explaining how cycles get created, etc, etc.) TL;DR for now is that you can't have a list of references, each object is only mutably 'owned' by one other. Once I trim down the response, I'll post it - should be sometime later today.
Thank you again for your answers. I must admit that at this point I'm thoroughly confused by references, CoW, Passerine being pass-by-value, ownership (new information today!), and fairly advanced rules for when something needs to be copied. It probably makes sense to leave you to write that blog post where you can give a consistent view of the whole rather than iterating on small examples. I'm intrigued, and I'm looking forward to reading more, though I'm also getting a nagging feeling that there's a lot of ceremony here just to avoid garbage collection...
Here, we have two arms. If the remaining list is empty, we return `acc`. note that each arm is independent, so this usage of `acc` is the last in the scope.
If the list is not empty, we match `rem` against `[head & tail]`. This is equivalent to, roughly:
Note how this is the last usage of rem in this scope, meaning it's passed by actual value. This list is then split in-place, with head taking on the value of the first element of the list and tail taking on the value of the rest. Once passed to loop, note that all the values used are the last occurrence of the value in the scope: This means that the actual value is passed down to the next level. Looking at this as a whole, we can see that nothing is ever passed by CoW reference, which means that this operation is performed in-place. And because it's done in a tail-recursive manner, `append` only uses constant space.So to answer your question specifically: what if append is called with `xs` being a CoW reference?
When we reach the line: `rem` is mutated, i.e. split in two, so a copy of it is made. This copy is it's own value, so head no tail are no longer CoW. we then call: Note that head and tail are now passed by value and not CoW - same goes for acc. Because nothing is CoW anymore, this tail-recursive implementation operates on the new list, mutating it in place.In short: the first iteration in `append` makes a copy to the list, but in subsequent iterations the copy is used, and the list is mutated in place.
That was a big one, I'm practically writing the blog post out here, haha! Does this help clear things up?