Patterns with Rust Types (shuttle.rs)
116 points by mre on Sept 1, 2023 | 55 comments



As was already mentioned in the Reddit thread on /r/rust (https://old.reddit.com/r/rust/comments/wb10zi/patterns_with_...): It's weird to take a platform-dependent integer for a database ID. Use `u32` or `u64` instead of `usize`.


I find `usize` in Rust is like the default unsigned integer, used for nearly everything and especially everything index-related.

I do think in practice using `u64` instead of `usize` is meaningless, since there are so few 32-bit systems today. The nice thing with the newtype pattern though is that if for whatever reason your program is going to run on a 32-bit instance and you need 64-bit IDs, it’s a 1-line change.
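
A minimal sketch of that newtype, assuming a hypothetical `UserId` wrapper and a `describe` helper (neither name comes from the article). The integer width appears only in the wrapper's definition and its constructor/accessor, so code that just passes `UserId` around does not care which width is chosen:

```rust
/// Hypothetical newtype for a database ID.
/// Changing the width means editing this definition (and the two small
/// methods below); code that only passes `UserId` around is untouched.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct UserId(u64);

impl UserId {
    pub fn new(raw: u64) -> Self {
        UserId(raw)
    }

    pub fn get(self) -> u64 {
        self.0
    }
}

/// Callers name `UserId`, never the underlying integer type.
fn describe(id: UserId) -> String {
    format!("user #{}", id.get())
}

fn main() {
    let id = UserId::new(42);
    println!("{}", describe(id));
}
```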


I tend to take the opposite approach in my Rust code. Unless I have a specific reason to use `usize` (e.g. using a type to index a standard library type that requires it), I always try to pick an exact size. It's pretty rare that I run into something where I'm neither completely fine limiting the size to 32 bits nor explicitly in need of supporting something larger than that. Moreover, I'd still have to consider what happens at the limits of the integer type even if I did use `usize` for more general cases to make sure that I properly handle overflow (generally by using saturating or checked arithmetic in my use cases). Coincidentally, using `usize` only for stuff like indexing vectors and slices makes it much rarer that I end up needing to explicitly deal with overflow for it, because any time I have to make something dynamically sized, I already need to ensure that I'm putting reasonable limits on how much I'm willing to allocate, and that limit is inevitably much smaller than `usize::MAX`.
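
For reference, a small sketch of the checked and saturating arithmetic mentioned above. `next_id` and `bump_counter` are hypothetical helpers made up for illustration; `checked_add` and `saturating_add` are the standard library methods on integer types:

```rust
/// Hypothetical helper: returns the next ID, or `None` if the
/// fixed-width type would overflow.
fn next_id(current: u32) -> Option<u32> {
    current.checked_add(1)
}

/// Hypothetical helper: a counter that clamps at `u32::MAX`
/// instead of wrapping or panicking.
fn bump_counter(count: u32, by: u32) -> u32 {
    count.saturating_add(by)
}

fn main() {
    println!("{:?}", next_id(41));
    println!("{:?}", next_id(u32::MAX));
    println!("{}", bump_counter(u32::MAX - 1, 10));
}
```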


  Moreover, I'd still have to consider what happens at the limits of the
  integer type even if I did use `usize` for more general cases to make
  sure that I properly handle overflow (generally by using saturating or
  checked arithmetic in my use cases).
That's one of those things I don't particularly like about Rust. The tersest means of converting between numeric types doesn't do any bounds checking.

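  use std::convert::TryFrom;
  
  fn main() {
    let a = usize::MAX;
    // `as` silently truncates: on a 64-bit target, `b` ends up as u32::MAX.
    let b = a as u32;
    println!("{a} == {b}");
    // `try_from` checks the range and returns Err when `a` does not fit.
    let c = u32::try_from(a).expect("this will always fail");
    println!("{a} == {c}");
  }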


Yeah, ultimately I think having `as` conversions on numeric types was a mistake, and I suspect that's not a super uncommon opinion.


That's why clippy has as_conversions
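
A sketch of how that lint might be applied, assuming a hypothetical `narrow` helper. The `clippy::as_conversions` lint is only evaluated when the code is run through Clippy; a plain `rustc` build accepts the attribute and does nothing with it:

```rust
use std::convert::TryFrom;

/// Hypothetical helper: narrows a `u64` to `u32` without using `as`.
/// With this attribute, Clippy rejects any `as` cast inside the function.
#[deny(clippy::as_conversions)]
fn narrow(value: u64) -> Option<u32> {
    // `try_from` returns Err when the value is out of range for `u32`.
    u32::try_from(value).ok()
}

fn main() {
    println!("{:?}", narrow(7));
    println!("{:?}", narrow(u64::from(u32::MAX) + 1));
}
```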


> there are so few 32-bit systems today

Some of the most widely used MCUs for embedded Rust are 32-bit. But typically you want to be explicit about your sizes when doing embedded anyways.

I think using usize is mostly a convenience that lets you fit values into standard library functions.


I see `usize` as the "memory indexing type" for the platform I'm currently running on. Storing it or sending it over some middleware should be a rare fringe case IMHO.


It was explicitly renamed away from `uint` to make it clear that it's not the default integer. Rust tries to encourage using explicit bit widths.


Builder patterns are also super helpful for libraries. Allows you to create APIs with lots of optional fields without having call sites that pass in a dozen None values.
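
A minimal sketch of such a builder, assuming a hypothetical `Request` type with one required and two optional fields (all names here are invented for illustration):

```rust
/// Hypothetical config type with one required and two optional fields.
#[derive(Debug, Default, PartialEq)]
struct Request {
    url: String,
    timeout_secs: Option<u64>,
    retries: Option<u32>,
}

/// Builder that records only the options the caller actually sets.
struct RequestBuilder {
    request: Request,
}

impl RequestBuilder {
    fn new(url: &str) -> Self {
        RequestBuilder {
            request: Request {
                url: url.to_string(),
                ..Request::default()
            },
        }
    }

    fn timeout_secs(mut self, secs: u64) -> Self {
        self.request.timeout_secs = Some(secs);
        self
    }

    fn retries(mut self, count: u32) -> Self {
        self.request.retries = Some(count);
        self
    }

    fn build(self) -> Request {
        self.request
    }
}

fn main() {
    // The call site mentions only the options it cares about.
    let quick = RequestBuilder::new("https://example.com")
        .timeout_secs(5)
        .build();
    let patient = RequestBuilder::new("https://example.com")
        .retries(3)
        .build();
    println!("{:?}\n{:?}", quick, patient);
}
```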

There's also the type state pattern where you have a struct take a generic so you can allow certain methods depending on the generic type. Like say you have a struct with a field that you want to be optional, but you know statically that at a certain point in the code it'll be set.

You can do:

    struct User<IdType> {
       id: IdType
    }

    impl User<()> {
       fn create() -> User<u64> {
         ...
       }
    }

    impl User<u64> {
       fn id(&self) -> u64 {
         self.id
       }
    }

That way you can have some methods where you can be certain that the ID is set and some methods where it's not set.
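
A fuller, runnable sketch of the same idea. The `name` field, the `new` constructor, and the `assigned_id` parameter are assumptions added for illustration; `assigned_id` stands in for whatever a database would hand back:

```rust
/// A user whose `id` is `()` until it has been persisted.
#[derive(Debug)]
struct User<IdType> {
    name: String,
    id: IdType,
}

impl User<()> {
    fn new(name: &str) -> Self {
        User { name: name.to_string(), id: () }
    }

    /// Consumes the unsaved user and returns one that carries an ID.
    fn create(self, assigned_id: u64) -> User<u64> {
        User { name: self.name, id: assigned_id }
    }
}

impl User<u64> {
    /// Only available once the type says the ID is set.
    fn id(&self) -> u64 {
        self.id
    }
}

fn main() {
    let draft = User::new("alice");
    // draft.id(); // would not compile: `User<()>` has no `id` method
    let saved = draft.create(1);
    println!("{} has id {}", saved.name, saved.id());
}
```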


I love that structs can be strictly non-copyable.

This way I don't need to heap allocate objects to ensure their uniqueness. I can still modify them in place without worrying there will be any out-of-date copies that weren't updated. Or I can give out simple ints or even zero-sized structs that can be handles/tokens that grant exclusive or single-use access.
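
A sketch of a zero-sized, single-use token, assuming hypothetical `Console` and `LaunchToken` types. Because the token implements neither `Copy` nor `Clone` and is taken by value, the compiler rejects a second use:

```rust
/// Zero-sized token; it derives neither `Copy` nor `Clone`.
struct LaunchToken(());

struct Console {
    launches: u32,
}

impl Console {
    /// Hands out exactly one token together with the console.
    fn new() -> (Console, LaunchToken) {
        (Console { launches: 0 }, LaunchToken(()))
    }

    /// Takes the token by value, so the caller cannot present it twice.
    fn launch(&mut self, _token: LaunchToken) {
        self.launches += 1;
    }
}

fn main() {
    let (mut console, token) = Console::new();
    console.launch(token);
    // console.launch(token); // would not compile: use of moved value `token`
    println!("launches: {}", console.launches);
    println!("token size: {}", std::mem::size_of::<LaunchToken>());
}
```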


Unfortunately I haven't been able to find a way to make structs completely noncopyable except via Pin. The compiler is reference tracking, and if it's sure that there aren't references it'll copy the value around. So for instance full RVO isn't guaranteed, and you can catch the bit pattern of &self changing out from under you even in a struct that doesn't implement Copy.

I'd kill for a proper copy constructor to cover the edge cases.


You're expecting a stable address, but that's a different thing.

Rust will move structs to a new address if it needs to, but still enforce that there's semantically a single instance (its old address becomes inaccessible).


You’re complaining that something else is copyable.


I'm complaining that the object I've built will have literal memcpy(3) called on it.


At a certain point, the compiler can’t save you and you have to actually know the limitations of the data you’re working with.


I do know how, as I listed how to in my original comment: using Pin. It's just been very unergonomic in a couple of cases because I don't actually care in some of those cases if the struct is copied, I just want to know when &self's bit pattern changes and there's no way in Rust to know that.
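
For readers unfamiliar with the Pin approach, a minimal sketch assuming a hypothetical `Anchored` type. The value lives in a pinned heap allocation, so moving the `Pin<Box<_>>` handle around does not change the address that `&self` would see:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical type that wants a stable address.
struct Anchored {
    value: u32,
    // Opts out of `Unpin`, so safe code cannot move the value out of its pin.
    _pin: PhantomPinned,
}

/// Address of the value, as an integer, for comparison.
fn address_of(anchored: &Anchored) -> usize {
    anchored as *const Anchored as usize
}

fn pinned(value: u32) -> Pin<Box<Anchored>> {
    Box::pin(Anchored { value, _pin: PhantomPinned })
}

fn main() {
    let first = pinned(7);
    let before = address_of(&first);
    // This moves only the pointer; the heap allocation stays where it is.
    let second = first;
    let after = address_of(&second);
    println!("value {} address unchanged: {}", second.value, before == after);
}
```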


Does anyone know of any sites that list out articles like this by language? I always find it tough to locate more advanced or "meta" articles like this that aren't just explaining basic concepts.


The no-foreign-traits-on-foreign-types rule and the terrible ergonomics behind the newtype "pattern" combine to make one of the worst development experiences I've ever seen in a language. It basically means you can only use one large library, ever. (With caveats.) For example, if I'm writing a game, I include a math library and pull in a Vec3 type, then add Vec3 as a member to all my objects, and then later I want to pull in a serialization library that defines, for example, a Serializable trait, I can't, unless:

1. I write my own Vec3 class, not fun.

2. I write my own serialization library, could be fun depending on who you are but surely a waste of time.

3. I make a newtype, implement Serializable on it, find-and-replace all Vec3s with my own SerializableVec3, fix about 500 errors one by one by adding .0 everywhere, implement wrappers for all the Vec3 methods I want to use, rinse and repeat for every other foreign type I want to serialize.

4. Hope the math library and the serialization library know each other and the math library implements all the traits you need. In practice this is what happens, but this makes a weird situation where serde is the de facto standard/monopoly because it is literally impossible to use anything else. Which may be fine for now, serde is the standard precisely because it's so extensible, but if you ever want to use another library with traits, well you're SOL and it's back to step 3.

How they managed to ship a language with dev ergonomics this bad is beyond me. Especially when the solution is so simple. Allow foreign traits on foreign types for executable projects only. Put it behind a compiler flag for all I care. -fallow-foreign-traits-I-dont-care-if-this-breaks-things. Just let me implement my damn traits, the current state is ridiculous.


I won’t deny that the newtype pattern leaves a lot to be desired, but I think in practice this isn’t nearly as big a deal as you make it out to be.

All that aside,

> Especially when the solution is so simple.

This is completely unwarranted and presupposes incompetency or apathy on the Rust dev team, either of which couldn’t be farther from the truth.

Designing programming languages is hard and full of tradeoffs. While we can of course disagree about those tradeoffs, “simple” changes like these are never actually simple in practice and involve complex sets of tradeoffs that invariably have been discussed to obscene lengths. Even “simple” workarounds often commit to implementation guarantees that language designers are hesitant to make until they’re certain it won’t box them into a corner in the future.

Put bluntly, anyone who asserts that a programming language design change is “simple” is only highlighting their own ignorance on the subject.


I didn't mean any offence when I called the change simple, though I do now see how it can be interpreted that way. When I call it simple it's because I am ignorant about programming language design changes, because, well, I don't really care. I'm not a PL designer. The change is simple to me because it: (a) seems arbitrary and easily changed, (b) solves a class of problem that I've never run into in other languages nor can ever really envision happening in a realistic scenario, and (c) if there are complex sets of tradeoffs involved in enabling this feature then they haven't been communicated well to the end-user at all. Especially with how bad the current state of affairs is, my calling the change simple should be a sign to the Rust team that if this really is the best possible decision for the language, they should put a clear statement in the compiler saying why. Because as it stands, from my end-user perspective, I still haven't seen a great reason why this shouldn't be implemented.


Another example: https://crates.io/crates/deepsize

You can implement that for your types. Deepsize crate implements the trait for a few popular libraries.

You can derive the trait in your structures and it will work for the few structures that happen to contain only your structures and the few hand picked structures supported by the deepsize crate.

And then? What about the other 99% of cases? Adding a trait impl is acceptable. Wrapping all the other types, not so much.


I agree with you. I'd like to see Rust change to allow defining traits on foreign types, somehow.

I've discussed this a number of times with folks on the Rust community Discord. There are unfortunately obstacles to making this happen. One obstacle is that if you implement a trait for a type, then that implementation applies to it everywhere, including in all other library code using the type. By implementing a trait you change its behavior in surprising and likely conflicting ways. Additionally, if you were to implement `Serialize` for some struct, then that would also directly conflict with the struct owner trying to do the same thing.

My proposed solution to this problem is to conceptualize trait implementations as something that can be `use`d. The basic idea is that, if I implement `Serialize` for `Foo` (and `Foo` comes from another namespace) then the type-trait implementation remains private to my crate (or perhaps, private to my namespace). That's not how Rust works today, but I'd be curious whether it could work.

I realize this will create a new set of challenges. It will mean `Foo` comes with one set of behavior everywhere else, and another set of behavior in my namespaces where the trait is implemented for it. I don't know enough about Rust to foresee what kind of problems this would cause, but it seems tempting to explore. Usually the kinds of traits that you'd want to implement are not ones that will cause problems for other code.


Sometimes I just wish that it were possible to do something akin to "pub(crate) impl Trait for Type", that would a) only make the trait implemented for the type in the current crate b) possibly override any other impl of the trait for said type.


I can see the convenience of sort of importing implementations of traits into scope. One difficulty is that traits are used for core functionality, like dereferencing and comparison operators. It would be inconvenient to have to import the implementations to do these basic things and confusing if some commonly used types could have their basic behavior dramatically changed. Maybe that could be solved by choosing some traits that are more central and preventing them from being switched out by importation, so you couldn't for instance change what `==` does to a pair of `f64`s.


What if the author of the trait explicitly marks it to allow that?


This isn’t an issue with the “newtype” pattern, it’s an issue with orphan instances. Newtypes have other uses (e.g. contracts, type-safety for integers and other data which has specific meaning). This is just using the newtype pattern to get around the fundamental issue, which is that you can’t define a third-party instance for a third-party type.

Personally, I’d like Rust to have “selectable” implementations, which are named and must be explicitly imported, and explicitly “applied” if there are multiple in scope. Selectable implementations always override regular implementations, so you can also replace the library’s provided implementation if it’s buggy or not what you wanted. I don’t even think there’s an RFC for this though…

Aside: newtype isn’t even Rust-specific. It goes to show how popular Rust is and how much Rustaceans love type safety, that when you search “newtype pattern”, the first results are all Rust. The keyword “newtype” comes from Haskell, which also has the orphan rule and associated issues, but at least lets you disable it with a GHC rule. And a zero-cost wrapper is something you can do in Swift, Kotlin, C++, and even C, and the general newtype pattern (although not necessarily zero-cost) is something you can do in practically any typed language, and even in untyped ones like JavaScript if you consider runtime exceptions ok (use a struct with the custom type name as the field name).


> I make a newtype, implement Serializable on it, find-and-replace all Vec3s with my own SerializableVec3, fix about 500 errors one by one by adding .0 everywhere, implement wrappers for all the Vec3 methods I want to use, rinse and repeat for every other foreign type I want to serialize.

You could also implement Deref (and, if appropriate, DerefMut) on the newtype, though there seems to be controversy over this pattern; for the specific case of a newtype that exists for the specific purpose of enabling foreign trait implementation, it seems to be a fairly straightforward and effective way of dealing with things.
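
A sketch of that Deref-on-newtype approach. The `math` and `ser` modules below are stand-ins for external crates (inside a single crate the orphan rule would not actually apply), and `Vec3`, `length_squared`, and `Serializable` are hypothetical names:

```rust
use std::ops::Deref;

/// Stand-in for an external math crate.
mod math {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Vec3 {
        pub x: f32,
        pub y: f32,
        pub z: f32,
    }

    impl Vec3 {
        pub fn length_squared(&self) -> f32 {
            self.x * self.x + self.y * self.y + self.z * self.z
        }
    }
}

/// Stand-in for an external serialization crate.
mod ser {
    pub trait Serializable {
        fn serialize(&self) -> String;
    }
}

use ser::Serializable;

/// Local newtype, so a foreign trait may be implemented on it.
struct SerializableVec3(math::Vec3);

impl Deref for SerializableVec3 {
    type Target = math::Vec3;

    fn deref(&self) -> &math::Vec3 {
        &self.0
    }
}

impl Serializable for SerializableVec3 {
    fn serialize(&self) -> String {
        // Field access goes through `Deref`, so no `.0` is needed here.
        format!("[{},{},{}]", self.x, self.y, self.z)
    }
}

fn main() {
    let v = SerializableVec3(math::Vec3 { x: 1.0, y: 2.0, z: 3.0 });
    // Methods of the wrapped type are reachable without writing wrappers.
    println!("{} {}", v.length_squared(), v.serialize());
}
```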



The anti-pattern described as Deref Polymorphism is not the same as using Deref with the newtype pattern, in order to allow using the wrapped type transparently. In the latter case, the Target type of the Deref trait is always going to be perfectly clear. In case of the antipattern described on the linked page, it is not perfectly clear what the Target type is going to be.

In short, a Deref impl for some type T signals that T represents some level of indirection, and following/dereferencing that indirection can always be done in an unsurprising and trivial manner.


You're missing:

5. Write a function `serialize_vec3(vec: Vec3) -> String`, and call it in the places you want to serialize a Vec3. The body of the function can convert it to a new type that derives Serializable, or it can implement the Serializable trait directly.


It's definitely annoying. But to say it's

> one of the worst development experiences I've ever seen in a language

is pure hyperbole. All languages have issues at least as big as this, and most have far far bigger issues.

Name a language and I'll tell you a much worse issue.


Okay, I was being a bit hyperbolic and I'd rather not go back and forth, but I do really hate the newtype pattern. I'm not sure why they don't introduce something like what Go has, like `type UserID = uint32`.


Rust has that, it's called a type alias: https://doc.rust-lang.org/reference/items/type-aliases.html

I'm not familiar with that Go feature, but in Rust it's just introducing another name for a type. I've usually seen it used to introduce simpler names for complex types.


Type aliases and new types are two subtly different features. The new type pattern allows you to provide new impls for the underlying type (it is a new type after all) while the type alias is pure sugar where type T = U allows the string T to be substituted for U in any place where T is imported. If U is a foreign type you cannot provide new trait impls for it.
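
A short sketch of that difference, using hypothetical `AliasId`, `UserId`, and `next_raw` names. The alias is interchangeable with `u32`; the newtype is a distinct local type that can carry its own trait impls:

```rust
use std::fmt;

/// Alias: just another name for `u32`.
type AliasId = u32;

/// Newtype: a distinct type that happens to hold a `u32`.
struct UserId(u32);

// Allowed: `UserId` is a local type, so it can implement a foreign trait.
impl fmt::Display for UserId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "user:{}", self.0)
    }
}

// impl fmt::Display for AliasId { ... } // rejected: `AliasId` is just `u32`

fn next_raw(raw: u32) -> u32 {
    raw + 1
}

fn main() {
    let alias: AliasId = 41;
    // An alias is accepted anywhere a `u32` is expected.
    println!("{}", next_raw(alias));

    let id = UserId(41);
    // next_raw(id); // would not compile: expected `u32`, found `UserId`
    println!("{}", id);
}
```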


> Name a language and I'll tell you a much worse issue.

Not sure if the offer was only open to OP but I'll bite. How about Java?


1. Java does not allow implementing new interfaces for classes without modifying their signature - much more limiting than Rust's foreign trait rule. You cannot make a foreign class implement your new interface.

2. Java does not allow creating wrappers without incurring a significant memory overhead. A wrapper will consume at least 32 bytes of memory on 64-bit systems just for itself, when in Rust this can be 0.

3. Even if you define a wrapper, it would be way less ergonomic than in Rust - there is nothing like AsRef / Deref / From / Into etc machinery in Java.


I haven't used Java for some time, but when I did it didn't have non-nullable types. I think collections were also type erased which is pretty bad.

JNI is a nightmare.

Strings are UTF-16.

Dealing with installing the JVM, whatever the hell "classpath" is, tweaking GC parameters etc. is a big downside.

Despite that it's definitely one of the saner languages out there.


Boxed Integers will compare with == for values -128 to 127 but not other valid ints.

    Integer.valueOf(5) == Integer.valueOf(5)
    true

    Integer.valueOf(200) == Integer.valueOf(200)
    false


That's a quirk of the language, sure, but it's barely an issue, and definitely not a worse issue. Don't get me wrong, Java has many issues, but this barely qualifies.

You don't compare non-primitives (which boxed integers are) with == in Java, you use the equals method.


You definitely shouldn't compare them that way. But it still allows you to. That becomes a fairly big issue where you have the behavior working when you test it out and suddenly it doesn't work when the values become larger. Sure, an experienced Java developer will know this. How about a developer that is new to the language? Not as likely. I've personally lost over 6 hours to that one years and years ago.


CPython has the same thing, although with reverse syntax: you can use is to compare -5 to 256 rather than ==. I assume Java is also doing some caching of that int range for speed reasons.


What I really like in the space is what OCaml and Smalltalk do, where ints are full objects unto themselves with essentially a bit in the object header that says "interpret the rest of this object header as an int and if you need to do object style ops, you don't need a pointer to the class as the class is int", so there's no boxed versus unboxed int distinction in the first place. So primitive types versus class types exists at the VM level, but not the language level.

Obviously Java doesn't have the luxury of that decision at this point. But a lot of goofy parts of languages are a string of fairly rational choices with unintended consequences.


You're not supposed to, but I've certainly found code out there that does use == to compare boxed integers because the test cases they used happen to work.

And in fact when I googled to find the exact range where it switches from cached integers to truly creating new objects, the top link was at best highly misleading. https://www.tutorialspoint.com/check-two-numbers-for-equalit...


.valueOf() will sometimes give different instance, sometimes not.

If you want a new instance, I recommend new :)


Well, unfortunately autoboxing will silently call valueOf(). So

    Integer val1 = 5;
    Integer val2 = 5;
    Integer val3 = 200;
    Integer val4 = 200;
will have you ending up with val1 and val2 being equal with ==, but val3 and val4 won't be.

The general reply is 'you're holding it wrong', but that doesn't make it any less absurd. That's what the PHPers say about "123" < "456A" < "78" < "123".


IDK it makes sense to me.

Unless you call `new` there's no guarantee that you have a different object.


I strongly disagree that these are big problems in practice. There are macros you can use to make newtypes more ergonomic to use and reduce boilerplate, etc. The advantage it gives is forced adherence to many implicit assumptions of library authors, which means that it's a lot easier (read: preventing subtle bugs) to compose different libraries. Meanwhile, allowing it might split and completely break the ecosystem.
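
As an illustration of the kind of boilerplate such macros remove, here is a hand-rolled `macro_rules!` sketch. It is not taken from any particular crate; the `newtype!` macro and the `Username` and `Scores` types are hypothetical:

```rust
/// Hypothetical macro: declares a newtype plus `From` and `Deref` impls.
macro_rules! newtype {
    ($name:ident, $inner:ty) => {
        #[derive(Debug, Clone, PartialEq)]
        pub struct $name(pub $inner);

        impl From<$inner> for $name {
            fn from(inner: $inner) -> Self {
                $name(inner)
            }
        }

        impl std::ops::Deref for $name {
            type Target = $inner;

            fn deref(&self) -> &$inner {
                &self.0
            }
        }
    };
}

newtype!(Username, String);
newtype!(Scores, Vec<u32>);

fn main() {
    let name = Username::from(String::from("alice"));
    // `len` comes from `String`, reached through the generated `Deref`.
    println!("{}", name.len());

    let scores: Scores = vec![1, 2, 3].into();
    println!("{}", scores.iter().sum::<u32>());
}
```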


Can you give examples of those macros? Very interested in using them.


The effort involved in (3) can be substantially reduced with some AsRef/AsMut/Deref/etc implementations on the wrapper type.

If I had to add 500 instances of ".0" to my code, I'd ask myself if I'm taking the right approach to the problem I'm solving. .0 basically means "I don't care about the abstraction the newtype provides, I need the thing inside of it." It will be necessary from time to time, sure, but 500 times? Maybe instead, I can make something like

    struct Wrapper<'a>(&'a Vec3);
Then, I only put a Vec3 inside the Wrapper when I actually need to go serialize something.
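
A sketch of that borrowing wrapper in use. `Vec3` here is a stand-in for the foreign type, and `std::fmt::Display` stands in for the foreign serialization trait; both substitutions are assumptions made so the example is self-contained:

```rust
use std::fmt;

/// Stand-in for a type from an external math crate.
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

/// Borrowing wrapper: created only at the point of serialization.
struct Wrapper<'a>(&'a Vec3);

impl fmt::Display for Wrapper<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {}, {})", self.0.x, self.0.y, self.0.z)
    }
}

fn main() {
    // The rest of the program keeps using a plain `Vec3`.
    let position = Vec3 { x: 1.0, y: 2.0, z: 3.0 };
    // The wrapper exists only for the duration of this expression.
    let text = Wrapper(&position).to_string();
    println!("{}", text);
}
```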

"Allowing foreign traits on foreign types for executable projects only" relies on the assumption that every Rust file is either part of a library, or part of an executable. Never both. This assumption is already false because hybrid crates exist, but it's particularly faulty when you consider that the dichotomy of executables and libraries can be extremely blurry in some domains, like in the case of loadable kernel modules or embedded firmware. A systems language cannot make validity choices based on assumptions about underlying ABI/binary formats without kneecapping the language's usefulness.

The monopoly in serialization was inevitable regardless of their design choices about trait coherence. It's far more sane for everyone to agree on a single implementation rather than have 5 different feature flags so everyone can choose their favorite serialization lib. Imagine if you imported a library and discovered that it includes a bunch of functions that return johns_cool_library::Vec instead of std::Vec. Do we complain about std having a monopoly on vectors and strings? No.


The majority of Rust code isn’t kernel extensions or embedded firmware, and it’s not like those use cases suffer any downsides by allowing the majority some more leeway. Therefore this strikes me as perfect being the enemy of good, but I guess that is Rust’s culture.


Most languages don't even have the anatomy of features that lead to this problem in the first place.

The real solution is to open a PR on your math library and add the derives yourself, or fork it. Which is coincidentally what you would have to do in almost any other language, since you can't usually derive an interface for a class when you don't own the interface and the class.


I think I'd implement Serialize on a different type and implement conversions from the foreign type to my serializable type.

That would mean that the locations where I want to serialize the foreign type would have to be marked with `.into()`, but that would be the sole maintenance overhead.
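
A sketch of that conversion-based approach. `Vec3` and `Serialize` below are stand-ins for the foreign type and the foreign trait, and `Vec3Record` and `save` are hypothetical local names:

```rust
/// Stand-in for a type from an external math crate.
#[derive(Debug, Clone, Copy)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

/// Stand-in for a trait from an external serialization crate.
trait Serialize {
    fn serialize(&self) -> String;
}

/// Local type that exists only to be serialized.
struct Vec3Record {
    x: f32,
    y: f32,
    z: f32,
}

// Allowed by the orphan rule because `Vec3Record` is local.
impl From<Vec3> for Vec3Record {
    fn from(v: Vec3) -> Self {
        Vec3Record { x: v.x, y: v.y, z: v.z }
    }
}

impl Serialize for Vec3Record {
    fn serialize(&self) -> String {
        format!("{} {} {}", self.x, self.y, self.z)
    }
}

fn save(record: Vec3Record) -> String {
    record.serialize()
}

fn main() {
    let position = Vec3 { x: 1.0, y: 2.0, z: 3.0 };
    // The `.into()` marks the one place where the foreign type is converted.
    println!("{}", save(position.into()));
}
```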


Would phantom types help in this case? I know that Rust supports them.


3. should be one click in your IDE.



