
I'm not super familiar with interior mutability, but I don't think this is the same "interior mutability" as used in Rust.

Interior mutability:

A, immutable, points to B.

B, immutable, points to C.

C, mutable, points to D.

D, immutable, points to E.

Even though A and B are immutable -- B points to C mutably, so you can modify its contents. Some "interior" part of A (and of B) is mutable. You can't change which C is pointed to by B (because B is immutable, or rather we only have an immutable reference to it), but it's open season on the contents of C.
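A condensed Rust sketch of that chain (the struct names mirror the letters above, and the D/E tail is collapsed into a plain value, so this is illustrative only):

  use std::cell::Cell;

  struct C { value: Cell<i32> }   // the "mutable" link in the chain
  struct B { c: C }
  struct A { b: B }

  fn poke(a: &A) {
      // a, a.b and a.b.c are all reachable only through a shared
      // (immutable) reference, yet the Cell inside can be changed:
      a.b.c.value.set(42);
  }
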

What this article describes:

We have 2 operations.

We are passed in Array A.

We can either:

- Apply both operations to A, modifying it even though the "owner" may not want this.

- Copy before each operation (or as part of them), which leaves A untouched, but requires triple the memory usage.

- Copy before (or as part of) the first operation only. The second operation can be applied in-place on the copy. This leaves A untouched and only requires double the memory.

They're totally different concepts. I don't know what you'd call the second concept -- I think the important part is that A remains untouched, so... "Avoiding side effects while reducing memory usage", or "How to make performant pure functions" ?
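
For what it's worth, a small Rust-flavoured sketch of the third option (the function and the two operations are invented for illustration): one copy up front, then both operations applied in place on the copy, so the caller's data is never touched and peak memory stays around double.

  // Hypothetical two-step pipeline: rescale to [0, 1], then clamp.
  fn process(a: &[f64], low: f64, high: f64) -> Vec<f64> {
      let mut out = a.to_vec();            // the single copy
      for x in out.iter_mut() {
          *x = (*x - low) / (high - low);  // op 1, in place on the copy
      }
      for x in out.iter_mut() {
          *x = x.clamp(0.0, 1.0);          // op 2, also in place
      }
      out                                  // caller's `a` is left untouched
  }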




The second notion, where it's clear that the memory space can safely be reused, is why there's interest in linear and affine type systems. These systems describe types whose values can be read at most once and, in the linear case, must be read exactly once.

On the surface that can seem a bit stupid, but the practical outcome is that compilers can determine when an array's contents can be mutated in place rather than copied into a fresh array.

A variable you read over and over can be thought of theoretically as one you duplicate, then read one copy of. Since the act of reading the copy destroys it, a compiler implements this by merely reading the value directly.

Passing an array into a subroutine, however, means the compiler either copies it or forbids further access after passing it in. If it's the latter, the memory space is available for re-use inside the subroutine.

Rust uses affine type theory for its borrow system to provide memory safety. Functional language research is looking at linear and affine types for a variety of applications, including performance.
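
A rough Rust illustration of that last point (the function here is invented for the sketch): taking the array by value moves ownership into the function, the caller can no longer touch it, and the buffer can safely be reused in place with no copy.

  fn normalize(mut data: Vec<f64>, low: f64, high: f64) -> Vec<f64> {
      // Ownership of `data` moved in, so the buffer can be reused
      // in place with zero copies.
      for x in data.iter_mut() {
          *x = (*x - low) / (high - low);
      }
      data
  }

  // let v = vec![3.0, 5.0, 7.0];
  // let v = normalize(v, 3.0, 7.0);
  // The original `v` was moved and can't be read afterwards,
  // which is exactly what makes the in-place reuse safe.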


For the read case, do you mean that in nonbranching code, a read can be considered a destroy+create that cancel out? How is that legal if the value read is also written into other variables, or copied into an array/tuple and returned to the caller? That seems a clear violation of linearity.

Also, that doesn't seem true for branching structures, but I guess technically in an immutable language loop iterations are recursive functions.


That's also the impression I have; the article really has nothing to do with Rust's interior mutability. And Rust wouldn't have the issue in the first place, because whether it takes the parameter by reference, mutable reference, or value clearly spells out the "contract" to the caller.

It does feel a bit closer to Clojure's transients, which I guess is what it's referring to.

However, there are also a number of differences with Clojure:

* Clojure can not completely prevents the caller modifying callee data (at least using "normal" clojure collections, which are persistent)

* creating and reifying transients is O(1)

* the operation is a fairly trivial mapping of #(/ (- % low) (- high low)), which is likely what you'd have done in Clojure


This definitely made me think of Clojure transients. For reference: https://clojure.org/reference/transients

> If a pure function mutates some local data in order to produce an immutable return value, is that ok?

The idea in Clojure is that data is immutable by default but there are function-local mutation APIs specifically for optimization. The contract between caller and callee is still the same, a function never mutates your data, but transients provide improved performance and memory characteristics invisible to the caller.


What is callee data that a caller can access while the callee exists? Is this multithreaded code?


I don't know if it was autocorrect screwing up or a brainfart / fucky fingers (likely the latter, I fear), but what I meant to write was that Clojure can completely prevent the callee from modifying caller data, since it uses persistent collections, and persistent collections are naturally immutable.

Sadly by the time I realised the screwup I couldn't edit the comment anymore (one of the really annoying things about HN alongside stripping random characters from comments and the half-assed markup).


Yeah, this is just "don't create large temporaries unnecessarily". I'm not sure it warrants a name.


It does if you want research done in this space, so that compilers can implement the results and optimise away unnecessary copies without introducing errors.


This class of automatic, best-effort optimizations is actually bad!

Say the compiler discovers an optimization like "we don't have to copy this giant array". Now either you care or you don't. If you don't care, it was wasted work. If you do care, now you have some fragile, implicit set of preconditions to maintain the optimization. How do you document these, how do you test it?

Instead, in a high performance language, we should:

1. Make the language do the algorithmic heavy lifting. Rust's annoyingly explicit affine type system enables its deterministic memory management and various safety guarantees. Its generics are explicitly monomorphized, which enables aggressive inlining, etc.

2. The compiler's optimizer sweeps up just the details. Register allocation, constant propagation, inlining, all the classics. But not inter-procedural stream-fusing tail-calling Haskell-heroism.

The idea is to make the important optimizations explicit, and allow the compiler to make things faster on the margins and avoid the obviously wasted work.


Explicitness is nice when you are building spacecraft, but it has a real human cost. I'd rather not write all that stuff, and let my static or dynamic analyzer warn me about potential hotspots and non-optimizable code.

Best-effort is fine if it's run in an ecosystem where code changes are tested before deployment, which is everywhere performance matters.


Let's say someone includes a line

48362/37277295;

That's a no-op. (Some rational value you're not using - it's not even assigned to a variable name).

The compiler might warn you about it.

But do you think it should be allowed to just remove it - not even create the op?

Regardless of what the standard says, I'd expect the answer to be "sure."

What do you think?


I also see this as a bit of a premature optimization. The triple-memory-usage one may be significantly faster than the in-place one -- and the memory usage is temporary, so unless you're in a constrained environment that's not a concern.

Avoiding unintentional or unexpected side effects is a good thing. Pure functions are great. Do whatever you can to keep your visualization helper thing pure and avoid bugs.

Then, once it works, if you have performance issues, do some profiling.

If the profiling shows this helper method as a hotspot, then it's worth optimizing.

At that point, you'd want to follow standard Python, numpy, and pandas optimization strategies -- which I'm not familiar with.

Aimless optimization like this is a waste of time. Good program structure (data is available when you need it without tonnes of calls), the right choice of data structures and algorithms (hashmap vs list), the right choice of underlying tools (Julia vs Python) and targeted optimization (profiling) are the ways to go.


The slowest thing CPUs do by far is access memory. I don't see any possible way that, for the given example, using more memory would ever be faster. At best 3*N all fits in cache and it's not as bad, but even then it's still at least one more cache miss, which means it will still be slower even if it's not by much. Temporary memory is the worst type of memory. It's cache misses & pipeline stalls for no point.

"Premature optimization" is not an excuse to write obviously bad code. Fixing this simple thing now is much faster, and simpler, than digging a big hole, jumping in it, and then hoping you can find something to pull yourself back out later.

In the same way it's important to scatter asserts about and verify that arguments do in fact conform to the documented contract, it's also important to always keep an eye out for openly wasteful code.


> it's also important to always keep an eye out for openly wasteful code.

No, that's the point. It's really not. If your "openly wasteful code" does its job well and is readable and maintainable, it DOES NOT MATTER that it is "wasteful".

If, on the other hand, the wasteful code causes your program to execute in 60 minutes instead of 10 seconds every time you run it, then it does matter and you should optimize it.

Why is this such a difficult concept?


If you're trying to use Knuth's quote from 1974 as your justification for that concept then you've misunderstood his argument and should read the full thing, not the single phrase.

This article is not advocating premature optimization as Knuth was discussing nor does the article's advice result in the evil that Knuth was warning against.

If you have some other evidence or paper justifying why we can ignore obvious inefficiencies please provide a source. That's a rather radical claim to make, though, which may be why you find people struggle with the concept.


My experience finds that the in-place optimization can indeed speed up numpy code. Even if you're not memory constrained, fewer arrays in play means less memory bandwidth needed.

>> Many NumPy APIs include an out keyword argument, allowing you to write the results to an existing array, often including the original one.

I'd wager that saving memory or memory bandwidth is the impetus for expanding the API to include this option.


The article starts with "You have a large chunk of data — a NumPy array, or a Pandas DataFrame — and you need to do a series of operations on it."

So it's suggesting to use these techniques when you already know that you are working with a large chunk of data, either because it was obvious from the beginning or because profiling highlighted it. In scientific computing it is common to store gigabytes of data in a single array. In fact it is common that even the compromise solution suggested in this article, which leads to 2x memory instead of 3x, would be too much extra overhead.

But the compromise solution is nice if you can afford 2x memory overhead, so I upvoted the article.


I think an easier way to think about Rust's interior mutability is via an example (Cell is a mechanism for interior mutability):

  use std::cell::Cell;

  struct S {
      a: Cell<u64>,
  }

  struct T {
      b: u64,
  }
To mutate field `b` on `T`, we need a mutable binding or reference. However, we can mutate field `a` on `S` via an immutable reference. Typically this is used to encapsulate some kind of interior state that isn't visible outside of the struct's methods, and it helps in certain situations where using a mutable reference might cause some hair-pulling with the borrow checker.
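
A minimal usage sketch on top of those structs (the function names here are made up):

  fn bump(s: &S) {
      // Shared reference, yet the Cell's contents can still be changed.
      s.a.set(s.a.get() + 1);
  }

  fn bump_b(t: &mut T) {
      // For `T`, a mutable reference is required.
      t.b += 1;
  }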


> B points to C mutably

I think you mean that B points to D mutably because C is mutable. The pointer from B to C is locked in place because of the immutability of B. However, C is free to change its pointers to some other D.


Reminds me of copy-on-write. An operation can reuse its input buffer if it holds the sole reference to it. In the scenario you described, you manually ensure and implement this behavior for the rest of the pipeline.
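
In Rust terms, a small sketch of that pattern is `Arc::make_mut`, which only reuses the buffer in place when the reference is unique (the function below is made up):

  use std::sync::Arc;

  fn scale(buf: &mut Arc<Vec<f64>>, factor: f64) {
      // If this Arc holds the sole reference, make_mut mutates the Vec
      // in place; otherwise it clones it first (copy-on-write).
      let data = Arc::make_mut(buf);
      for x in data.iter_mut() {
          *x *= factor;
      }
  }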


Original author here: to reduce confusion I changed the article to use a new term, "hidden mutability".


Thanks, that seems like a much better term.

With Rust-style interior mutability, any code accessing the value will see that it was mutated, but here the caller still sees a purely functional call, since the mutation is hidden.


I've heard the term "immutable facade" before for this type of thing (though granted, usually while talking about a class interface rather than standalone functions).



