Hacker News
Finding and fixing standard misconceptions about program behavior (brownplt.org)
77 points by vector_spaces 5 months ago | 21 comments



> We also believe that the terms “call-by-value” and “call-by-reference” are so hopelessly muddled at this point (between students, instructors, blogs, the Web…) that finding better terminology overall would be helpful.

Maybe that shouldn't even be exposed to the programmer. The programmer-level questions are, is it copyable, is it mutable, and is it alias-free? Whether it's passed as a copy or a pointer is really an issue for the compiler. If you're passed a read-only copy of something guaranteed to not be aliased, you can't tell the difference from a reference. Some Modula compilers made that decision automatically, based on object size.
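
A rough Python sketch of that last point, assuming a made-up immutable `Vec3` type (not something from the article): when the callee can neither mutate the argument nor observe aliasing, handing it the original object or a fresh copy gives the same answer.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after construction
class Vec3:
    x: float
    y: float
    z: float


def length_squared(v: Vec3) -> float:
    # Read-only use of the argument; nothing here can change v.
    return v.x * v.x + v.y * v.y + v.z * v.z


v = Vec3(1.0, 2.0, 3.0)
by_sharing = length_squared(v)          # callee sees the caller's object
by_copy = length_squared(replace(v))    # callee sees a separate, equal copy
print(by_sharing == by_copy)            # True
```

Python always shares the object here, so this only illustrates why the choice could be left to a compiler. It is not how any particular compiler decides.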

Rust compilers have the info to do this. I'm sometimes asking myself whether I should pass, say, an array of 3 32-bit floats in graphics code by reference or by value. The compiler knows better than the programmer what the hardware can copy fast, and that may differ with the platform.


> Maybe that shouldn't even be exposed to the programmer. The programmer-level questions are, is it copyable, is it mutable, and is it alias-free? Whether it's passed as a copy or a pointer is really an issue for the compiler.

The problem here is the definition of "call by reference". In C++ that means being able to change the value outside of the function taking the reference.

    #include <iostream>

    void setByReference(float& v) { v = 123; }  // writes to the caller's variable

    int main() {
        float v = 0;
        setByReference(v);
        std::cout << v;  // prints 123
    }
That feature of being able to pass by reference doesn't exist in, say, JavaScript. You can only pass by value in JavaScript. The types of values in JS are undefined, null, boolean, number, string, reference-to-function, reference-to-object. You can never pass anything by reference, only by value.

And that's where it gets confusing. If you have a variable whose value is a reference-to-object, you pass the value, the value being "reference-to-object".

To reiterate:

    const n = 1;     fn(n);  // call by value, type of value = number
    const s = 'abc'; fn(s);  // call by value, type of value = string
    const o = {};    fn(o);  // call by value, type of value = reference-to-object
In C++ though, you pass a reference to the variable itself (in the example above). That's called call-by-reference.

I can see why the OP feels it's muddled.
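
Python behaves like the JavaScript case, so here is a minimal sketch of it (the `rebind` function is made up for illustration): assigning to the parameter never changes what the caller's variable holds, whatever the type of the value.

```python
def rebind(param):
    # Assigning to the parameter rebinds the callee's local name only.
    param = "replaced"


n = 1
s = "abc"
o = {"k": 1}

rebind(n)
rebind(s)
rebind(o)

print(n, s, o)  # 1 abc {'k': 1}
```

With a C++ `float&` parameter, the equivalent assignment would have been visible to the caller.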


Yeah, this is one of those things that, while it may be more technically correct, causes a lot of unnecessary confusion. I remember being confused by this as a C++ programmer learning Java: resources claimed that Java was always pass by value, yet the actual behaviour in almost all cases (due to Java going almost all-in on objects) is what a C++ programmer would expect from pass by reference.

I still see even relatively experienced programmers who don't understand this, particularly working with Unity where a lot of programmers came from C++ to C# and don't realise for example that a C# function that takes a List 'by value' and modifies it is actually modifying the same instance that was passed in by the caller.
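
The same trap, sketched in Python rather than C# (both function names are hypothetical): the callee receives the caller's list instance, so an in-place change is visible outside unless the callee copies first.

```python
def append_marker(items):
    items.append("added by callee")   # mutates the caller's list in place


def append_marker_copy(items):
    result = list(items)              # shallow copy; the caller's list is left alone
    result.append("added by callee")
    return result


shared = ["a"]
append_marker(shared)
print(shared)       # ['a', 'added by callee']

untouched = ["a"]
extended = append_marker_copy(untouched)
print(untouched)    # ['a']
print(extended)     # ['a', 'added by callee']
```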


No, Java really is pass by value. You can rely on this in Java:

  String s = "hello";
  foo(s);
  assert s == "hello";
In C++, if the function takes a non-const reference, you can't.

Yes, Java always passes pointers to objects. But you can pass pointers by value. And passing a pointer by value is not the same as passing by reference!

I think the origin of the confusion around a function taking a list by value and the like is the implicitness of pointers in Java and its cousins. This Java method:

  void foo(List<String> strings)
Is the equivalent of this C++ method:

  void foo(shared_ptr<List<string>> strings)
True systems languages make the pointers explicit.


That's only because strings are immutable in Java. It's not true for reference types in general.

In C++ passing a pointer by value is effectively the same as passing a reference, the only real difference being that the syntax for accessing the underlying value is more implicit for a reference.


No, in Java, this is true for reference types in general. The method receiving the pointer can mutate the object, but it can't change which object the original variable points to.

This is also true when passing a pointer variable by value in C or C++. It is not true when passing a reference in C++ - the receiving method can change which object the original variable points to.
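
One way to see the distinction being drawn, as a Python sketch. Python has no call-by-reference, so the `Box` class below is a made-up stand-in for a reference to the caller's variable, not a language feature.

```python
class Box:
    """Hypothetical one-slot holder standing in for a reference to a variable."""

    def __init__(self, value):
        self.value = value


def reseat_plain(obj):
    obj = ["new"]          # only the callee's local name changes


def reseat_boxed(box):
    box.value = ["new"]    # the caller's slot now refers to a different object


original = ["old"]

plain = original
reseat_plain(plain)
print(plain is original)         # True

boxed = Box(original)
reseat_boxed(boxed)
print(boxed.value is original)   # False
print(boxed.value)               # ['new']
```

Passing the pointer by value corresponds to `reseat_plain`. The extra indirection in `reseat_boxed` is what lets the callee change which object the caller ends up with.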


Ok, that's not really what your example showed though. You seem to be relying on string interning to have two different "hello" literals refer to the same underlying object and therefore be equal? Coming from other languages, and specifically C++, I tend to see `==` as value rather than reference equality so that wasn't immediately obvious to me.

The equivalent code in C++ has different semantics, but a function that takes a non-const reference in C++ cannot change what the reference refers to (references are immutable in that sense in C++; they can only ever refer to one object). What a non-const reference allows in C++ is for the called function to change the value of the object referred to, and since strings are not immutable in C++, the value of string s could change, but not the object identity.

With pointers to pointers, or references to pointers in C++ you can further change the object pointed to / referred to but not with references (there's no such thing as a reference to a reference in C++).


> The compiler knows better than the programmer what the hardware can copy fast

This is true, but aliasing makes it extremely difficult to automatically decide between reference and copy. For example, suppose you have a function that takes your array, and also a mutable slice:

  fn f(a: [f32; 3], out: &mut [f32])
And now you call it with the same argument twice:

    let mut a = [1., 2., 3.];
    f(a, &mut a);
The compiler cannot choose to pass `a` by reference here, since that would create simultaneous immutable and mutable references, which is forbidden by Rust semantics.
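
Python has no borrow checker, which makes it handy for showing why the choice is observable. In this sketch (the function `scale_by_first` is invented for illustration), the result depends on whether the input aliases the output or is a separate copy.

```python
def scale_by_first(a, out):
    # Writes a[0] * a[i] into out[i]; it reads a while writing out.
    for i in range(len(a)):
        out[i] = a[0] * a[i]


aliased = [2.0, 3.0, 4.0]
scale_by_first(aliased, aliased)           # input and output are the same list
print(aliased)                             # [4.0, 12.0, 16.0]

separate = [2.0, 3.0, 4.0]
scale_by_first(list(separate), separate)   # input is a copy, as if passed by value
print(separate)                            # [4.0, 6.0, 8.0]
```

In the aliased call, `a[0]` is overwritten on the first iteration, so every later element is scaled by the new value.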


The problem is that the existing language concepts that implement these ideas are ambiguous in many languages. Does int *x imply that x is an optional value, or that it’s intended to be shared with/from others, or both of the above?


> The way we informally talk about programming concepts (like “pass a variable”), and the syntactic choices our languages make (like return x), are almost certainly also sources of confusion. The former can naturally lead students to believe variables are being aliased, and the latter can lead them to believe the variable, rather than its value, is being returned.

There's the problem right there: any informal talk about programming concepts is bound to lead to confusion because informal language is open to interpretation (no pun intended). That's why we have formal definitions for these terms, as well as language specs which define how pass-by-value and pass-by-reference (if applicable) and pass-by-address (if applicable) work in a given language.


Deep value equality can also be expensive to check, so some languages default to reference equality even for immutable values.


I used to think having both == and .equals in Java was specifically dumb, but the more I think about it, the more I dislike this kind of sneaky reference equality in the general case.

Comparing references isn't even an optimised version of comparing values, because when the references don't match but the values do, equality will be false.

I struggle to think of when reference equality would be useful to me. Like if I were writing a standard for-loop over an array, it's like I'd be comparing 'i' to 'i'.

If == gives me a fast 'yes' response, I feel like I (or my algorithm) should have already known they were going to be the same.

If == gives me a fast 'no' response, then do I not actually care about the value?
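
For what it's worth, one place identity comparison does earn its keep is sentinel values, where the question really is "is this that exact object?" and not "does it have the same value?". A minimal Python sketch, with a made-up `lookup` helper:

```python
_MISSING = object()  # unique sentinel; no other object is identical to it


def lookup(table, key, default=_MISSING):
    # The identity check separates "no default supplied" from every real
    # value a caller might pass, including None.
    if key in table:
        return table[key]
    if default is _MISSING:
        raise KeyError(key)
    return default


print(lookup({"a": 1}, "a"))         # 1
print(lookup({"a": 1}, "b", None))   # None
```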


I think Python got this right, where "x is y" is identity (not often used but still concise) and "x == y" is value equality (a method you can override).
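
A small sketch of the two operators side by side, using a hypothetical `Point` class that overrides `__eq__`:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        # Value equality: same class and same coordinates.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Keep hashing consistent with the overridden __eq__.
        return hash((self.x, self.y))


p = Point(1, 2)
q = Point(1, 2)

print(p == q)   # True
print(p is q)   # False
print(p is p)   # True
```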


This is great. It's good to see someone attempting to identify a mistaken mental model and then correcting it.

In both formal and informal education, it is rare to see teachers/parents/mentors identifying misunderstandings and correcting them. All too often the one teaching reiterates the lesson material instead of asking the learner about their understanding and correcting misunderstandings.


I self-taught basic algebra with a "choose your own adventure" book that advanced to the next topic for correct answers; for incorrect ones, it went through pages explaining the common misconceptions, then returned to the question.

(to be fair, class sizes were ~30 when I was in school, so asking the learner about their understanding and correcting misunderstandings would not have been likely: assuming 40 minutes lecture time for a 50 minute class, that's 20 seconds per student, and at most ~2 minutes if no lecture time whatsoever)


I get that the syntax options are just to adapt to different language users but it’s a little bit confusing that some claims are actually wrong in the specific language. In the first test, they state that adding two bools gives an error, even though this is perfectly valid in Python.
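
For reference, this is the Python behaviour in question. `bool` is a subclass of `int` there, so the addition is ordinary integer arithmetic.

```python
total = True + True
mixed = True + False

print(total)                   # 2
print(mixed)                   # 1
print(issubclass(bool, int))   # True
```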


Nice, but needs some more attention to variant (but equivalent) responses.

For example, for one of the questions in the first module I selected "Other", then typed "Error" (in the free-form response box). The answer it was looking for was "error" (lower-case e).

Either it should accept both, or if you're really insistent on distinguishing the two, it should be made more clear at the beginning that system messages (rather than explicit results) are going to be case-sensitive. Counting "ABC" wrong when the expected answer is "abc" would be fair game. Counting "Error" wrong for a pseudo-language that hasn't even been formally defined is not, or so I see it.

Looking at some of the languages on which this claims to be based:

In JavaScript, typing abc when it's undefined produces: Uncaught ReferenceError: abc is not defined at <anonymous>:1:1 (anonymous) @ VM33:1

(note capital E in ReferenceError)

In Python3 you get:

NameError: name 'abc' is not defined. Did you mean: 'abs'?

(again, note capital E in NameError)

In Racket 8.9 you get:

abc: undefined; cannot reference an identifier before its definition

(no 'error' of any kind!)

I didn't install OCaml, because I need another "package manager" like I need a hole in the head.

Java: does not compile, although to be fair the compiler error message does have an 'error with lower-case e' in it.

C#: Yeah, not gonna install that either.

This may seem extremely nit-picky (and it is), but when you're teaching people who've never programmed before, and who may have never even encountered the concept of case sensitivity before ("But Google doesn't care even if I type in all caps!"), nit-picky is the way it needs to be.

As I said, I do like this, but it needs more attention paid to parsing free-form responses. The problem is not unique to this system, of course. The free-form response is where most systems of this general type tend to fail.
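
A minimal sketch of the kind of leniency I mean, in Python. The function names and the grading rule are my own assumptions; I have no idea how the actual system checks answers.

```python
def normalize(answer: str) -> str:
    # Trim surrounding whitespace and fold case.
    return answer.strip().casefold()


def matches_error(answer: str) -> bool:
    # Assumed rule: any casing of "error" counts as the error outcome.
    return normalize(answer) == "error"


def matches_literal(answer: str, expected: str) -> bool:
    # Program output such as "abc" stays case-sensitive.
    return answer.strip() == expected


print(matches_error("Error"))          # True
print(matches_error("  error "))       # True
print(matches_literal("ABC", "abc"))   # False
```

System messages are matched loosely, while explicit program results keep their exact case.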


    > In Racket 8.9 you get:
    > abc: undefined; cannot reference an identifier before its definition
    > (no 'error' of any kind!)
But that is an error.

If you try this in the terminal, you will see that the message is colored red to show it is an error.

If you try it in DrRacket, you will see the message written in red text with icons leading to the stack trace.

If you try it in Emacs, you will see the message in red indicating it is an error.


There was an example directly before that question which shows an example output when dividing by zero. There you can see what format they expect the error to be.


These are supposed to be beginning students. You can't just assume that they know that there's even any difference between "Error" and "error".


Oooh, this is awesome. I love the formalization presented here, and the formalization of a notional machine into something more powerful and executable. Can't wait to hear more about this. Great name too! SMoL!



