Hacker News
Untapped potential in Rust's type system (jakobmeier.ch)
201 points by lukastyrychtr on June 14, 2021 | 33 comments



Note, you have to be quite careful with typeid - you can end up with types not matching because you have two different versions of a crate.

Normally this would be a compile time error, but using typeid can turn this into a runtime error.

https://docs.rs/assert-type-eq/0.1.0/assert_type_eq/ allows you to assert that two types are the same, forcing a compile time error for this scenario.
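
To make the runtime versus compile-time distinction concrete, here is a minimal sketch using only the standard library. It is not the implementation of the linked crate; `same_type_at_runtime`, `SameAs` and `assert_same_type` are hypothetical names chosen for illustration.

```rust
use std::any::TypeId;

/// Runtime check: returns true only if `T` and `U` are the very same type.
/// A mismatch (for example two versions of one crate) only shows up when this runs.
fn same_type_at_runtime<T: 'static, U: 'static>() -> bool {
    TypeId::of::<T>() == TypeId::of::<U>()
}

/// Hypothetical helper trait, implemented only for the pair (T, T).
trait SameAs<T> {}
impl<T> SameAs<T> for T {}

/// Compile-time check: a call to this function only compiles when `T` and `U`
/// are the same type, so a mismatch becomes a build error.
fn assert_same_type<T, U>()
where
    T: SameAs<U>,
{
}
```

Calling `assert_same_type::<u32, i32>()` is rejected by the compiler, while `same_type_at_runtime::<u32, i32>()` compiles and simply returns `false`.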


Off-topic mostly, but is this sort of "single line" function/macro typical for Rust crates? This seems like something I would expect from a Node library, not a serious programming language community that cares about robust software.


Macros can be quite hard to understand, so having a simple crate wrapping it with extensive documentation and tests can be quite beneficial. And yes, this also applies to node. The attack is entirely unnecessary.


This crate has no tests, and no documentation beyond a usage example; it would do much better as a copy/paste than an entire dependency, in any ecosystem. It would be great to have as part of some extended "utils" package if it can't be included in the standard library (something like C++'s boost or Java's Apache Commons).


I think this is a valid question. C++ provides std::is_same<U, T>, why do Rust users need an external dependency (especially given the fact that the type system is one of Rust's major strengths)?

My best guess, going by a comment in the source code, is that this macro will not be necessary in the future, and therefore makes less sense to put in the standard library:

> Until RFC 1977 (public dependencies) is accepted, the situation where multiple different versions of the same crate are present is possible.

https://github.com/rust-lang/rfcs/pull/1977


Rust is pretty conservative with the standard library, and macros have a really high bar historically, because they were not namespaced.


Well, being conservative[a] with the standard library is why JavaScript’s is so small (leading to things like the `is-array` package)

[a]: Not so much conservative, but just not caring


That isn't a single line macro. Did you mean a single macro crate? Most crates in Rust have more than a single macro/function, though some don't. Regardless, it's not at all related to how "robust" or not the language ecosystem is.


It's 17 lines of code (11 if you don't count the white space). If you find it somewhere and copy it to your utils package, that's great! If you add a dependency for every little utility you may need, you're going to end up with huge space for ecosystem attacks, as we're seeing in the Node ecosystem. Especially for something like this, where people can add it as a dependency because they don't want to understand macro code (there is another comment saying just that).


Take a look at the code: https://github.com/Metaswitch/assert-type-eq-rs/blob/master/...

I'd rather not write that myself.


I've looked at it, that is why I knew it is very little code. It's definitely something I'd copy off StackOverflow if I really needed it, but to include it as a dependency is just weird.


It is relatively common to have narrow scope crates in Rust.


An interesting, yet somewhat unrelated discussion could be had about cases where one might want the type ID to not change when the type changes structurally. There are many structures in many ABI interfaces that are structurally different between releases, yet still binary compatible. This particular characteristic has always been defined ad hoc, and I do wonder what a more formal system might mean. This almost feels like the issue of identity in distributed systems.


> There are many structures in many ABI interfaces that are structurally different between releases, yet still binary compatible.

These should be defined in the API as simple newtype wrappers over some basic binary-level type (generally either unsigned binary word or u8 array) with conversions to and from safer higher-level types defined in Rust code, leaving it to the compiler to optimize these to no-ops whenever possible. This ensures that "binary compatible" types also keep the same structural identity, and conversely, that binary incompatible types are automatically detected as well.
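
A minimal sketch of that idea, assuming a made-up 32-bit wire value: `RawMode` and `Mode` are hypothetical names, and the numeric encoding is invented for the example.

```rust
use std::convert::TryFrom;

/// Binary-level type exposed in the API: a plain 32-bit word.
/// `repr(transparent)` gives it the same layout as the wrapped `u32`.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RawMode(pub u32);

/// Safer higher-level type used inside Rust code.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Off,
    On,
    Auto,
}

impl From<Mode> for RawMode {
    fn from(mode: Mode) -> Self {
        RawMode(match mode {
            Mode::Off => 0,
            Mode::On => 1,
            Mode::Auto => 2,
        })
    }
}

impl TryFrom<RawMode> for Mode {
    type Error = RawMode;

    /// Unknown values are handed back unchanged instead of being guessed at.
    fn try_from(raw: RawMode) -> Result<Self, Self::Error> {
        match raw.0 {
            0 => Ok(Mode::Off),
            1 => Ok(Mode::On),
            2 => Ok(Mode::Auto),
            _ => Err(raw),
        }
    }
}
```

A later release could add variants to `Mode` without touching `RawMode`, so the type that crosses the binary boundary keeps its identity.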


Code generation probably works the best here, although it makes the build system more complex. You would have the typeid mappings as an actually existing file that you can commit into your git repo.

Or if your language supports metaprogramming (something like Nim, Zig, or Jai), you could write compile-time code that maps each of your annotated types into deterministically-determined ids. (An example of this in Nim: https://gist.github.com/PhilipWitte/dd6c670fca3baf573490)
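
Something similar can be approximated in Rust with a `const fn`. The following is only a sketch: it hashes a hand-written name with FNV-1a at compile time, and the `StableId` trait and the `myapp::...` names are assumptions made up for the example.

```rust
/// 64-bit FNV-1a hash, usable in constant expressions.
const fn fnv1a_64(name: &str) -> u64 {
    let bytes = name.as_bytes();
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
        i += 1;
    }
    hash
}

/// Hypothetical trait: the id depends only on the chosen name,
/// not on the type's layout or on the compiler version.
trait StableId {
    const NAME: &'static str;
    const ID: u64 = fnv1a_64(Self::NAME);
}

struct Ping;
struct Pong;

impl StableId for Ping {
    const NAME: &'static str = "myapp::Ping";
}

impl StableId for Pong {
    const NAME: &'static str = "myapp::Pong";
}
```

The trade-off is that uniqueness now rests on the names, and hash collisions are possible in principle, so a real registry would probably want to check for duplicate ids.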


I am currently working on a configuration manager (to get rid of Ansible for my needs). To support running code as different users, I spawn multiple processes. I want to be able to call functions across processes without limitations, and be forward-compatible with a system where the processes could actually be on different machines.

I basically had similar goals and implemented it with a design similar to the one described in the last part. I had the same questions about what should be captured by a universal type id. In my own code, I decided to punt on the question and require the user code to pick a globally unique name. I initially planned on using Typetag [0], a library achieving the same result as `UniversalId` from the article. The reason I couldn't use `Typetag` in the end is the lack of support for generics. Generics are a tough problem to deal with when deriving an id because it is not clear how they should influence it.

In my case I have a common pattern where structs are defined as:

    struct Message<S: AsRef<str>> {
        message: S,
    }

This allows me to define methods on both borrowed (`Message<&str>`) and owned (`Message<String>`) versions without issues. When remaining inside the process, the borrowed version is passed around, and when sent across process boundaries I can deserialize it to an owned version. This pattern prevents me from using Typetag and I am still not sure how it should be solved.

A related problem is how the registry is built on the receiving end. In my project the registry is built manually, similarly to the example in the article. There are also crates such as Inventory [1] and Linkme [2], which allow marking types at their definition point and then collecting them in order to register them.

[0]: https://github.com/dtolnay/typetag

[1]: https://github.com/dtolnay/inventory

[2]: https://github.com/dtolnay/linkme
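
For illustration, a rough sketch of the "user picks a globally unique name" approach with a manually built registry. This is not the API of Typetag, Inventory, Linkme or the article; `Named`, `Registry`, `decode_message` and the name string are hypothetical, and the "deserialization" is reduced to copying a string.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical trait: each type picks its own globally unique name.
trait Named {
    const NAME: &'static str;
}

struct Message<S: AsRef<str>> {
    message: S,
}

// One name for every instantiation, so `Message<&str>` and `Message<String>`
// are treated as the same type on the wire.
impl<S: AsRef<str>> Named for Message<S> {
    const NAME: &'static str = "example.org/Message";
}

/// Turns a received payload into a boxed value of the registered type.
type Decoder = fn(&str) -> Box<dyn Any>;

struct Registry {
    decoders: HashMap<&'static str, Decoder>,
}

impl Registry {
    fn new() -> Self {
        Registry { decoders: HashMap::new() }
    }

    /// Manual registration, done once per type on the receiving end.
    fn register<T: Named>(&mut self, decode: Decoder) {
        self.decoders.insert(T::NAME, decode);
    }

    fn decode(&self, name: &str, payload: &str) -> Option<Box<dyn Any>> {
        self.decoders.get(name).map(|decode| decode(payload))
    }
}

/// The receiving side always builds the owned version.
fn decode_message(payload: &str) -> Box<dyn Any> {
    Box::new(Message { message: payload.to_owned() })
}
```

Here the generic parameter simply does not influence the id. That happens to suit the borrowed/owned case, but it would be wrong for generics that change the wire format.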


I've written a lot of structs that have "maybe borrowed" contents. I usually reach for std::borrow::Cow for this. It's not exactly the same: we need to branch at runtime instead of having multiple generated codepaths.

    use std::borrow::Cow;

    struct Message<'a> {
        message: Cow<'a, str>,
    }
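
A slightly fuller sketch of how this might be used, with hypothetical helper methods (`borrowed`, `into_owned`, `is_borrowed`) added for illustration:

```rust
use std::borrow::Cow;

struct Message<'a> {
    message: Cow<'a, str>,
}

impl<'a> Message<'a> {
    /// Cheap construction from a borrowed string, no allocation.
    fn borrowed(text: &'a str) -> Self {
        Message { message: Cow::Borrowed(text) }
    }

    /// Detaches the message from the borrowed data, copying only if needed.
    fn into_owned(self) -> Message<'static> {
        Message { message: Cow::Owned(self.message.into_owned()) }
    }

    /// The runtime branch mentioned above: which variant do we hold?
    fn is_borrowed(&self) -> bool {
        matches!(self.message, Cow::Borrowed(_))
    }
}
```

There is a single `Message` type here, so nothing generic is left to influence a derived id.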


I agree that in this case I can move to branching at runtime (the runtime cost is irrelevant for my use case), but I still feel that it illustrates how generics complicate deriving a universal type id.

The more general issue is that removing generics requires the lib to pick the implementation and removes choice from the consumer. Sometimes it's not that important, sometimes it matters more. Another example from my project: I have a trait describing a `User` (with e.g. `get_name`), which may be implemented as a `LinuxUser` or `WindowsUser`, each providing extra fields. How do you generate a universal id for a struct generic over a `User`?


I've always wondered if it's possible to use Rust's ownership chains to attribute memory usage? I.e. calculate total bytes retained by an object?


At my work, we have developed `loupe`, a Rust crate that does precisely that: https://github.com/wasmerio/loupe/. I'm the main author of it.

`loupe` provides the `MemoryUsage` trait, which lets you find out the size of a value in bytes, recursively. It traverses most types and their fields or variants as deep as possible. Fortunately, it tracks already visited values so that it doesn't enter an infinite loupe loop.

We are using it inside Wasmer, a WebAssembly runtime.


I don't think ownership really lets you do anything more than what "sizeof" tells you in C. In particular:

- Some types have dynamic memory allocation (Vec, HashMaps etc...), so those would have to be computed at runtime and can change at any moment.

- Some types have shared ownership (Rc/Arc), so it's unclear how you would measure memory usage then.

- Some types, especially in foreign interfaces, will effectively just hold a pointer to some black box data, you'd need a special API to figure out how much memory it hides. For instance what's the memory usage of a database handle or a JPEG compression library context?

- When you care about memory usage things like fragmentation are usually very important, and the amount of memory used by a given object can be misleading. If you have a string that takes up 12 bytes but it's the only object left in the middle of a 4KiB page, then just counting "12 bytes" for this object is misleading because you have a huge fragmentation overhead.

The only advantage of the borrow checker is that safe Rust forces you to make ownership relationships explicit but not all Rust code is safe and there are many escape hatches that muddy the water (like Rc/Arc mentioned above, but also threads and a few other things).
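
For the shared ownership point, one possible convention is to count each shared allocation once by remembering which pointers were already visited. The sketch below assumes that convention and is not taken from any particular crate; `retained_bytes` is a hypothetical function, and it ignores the reference-count header that `Rc` stores next to the value.

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::rc::Rc;

/// Sums the bytes behind a set of `Rc<Vec<u8>>` handles,
/// counting every distinct allocation only once.
fn retained_bytes(handles: &[Rc<Vec<u8>>]) -> usize {
    let mut seen: HashSet<*const Vec<u8>> = HashSet::new();
    let mut total = 0;
    for handle in handles {
        // `insert` returns false if this allocation was already visited.
        if seen.insert(Rc::as_ptr(handle)) {
            // The `Vec` header stored inside the `Rc`, plus its heap buffer.
            total += size_of::<Vec<u8>>() + handle.capacity();
        }
    }
    total
}
```

Whether shared data should be charged once, split between owners, or charged to each owner in full is a policy decision. The code only shows that the bookkeeping itself is simple.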


> - Some types have dynamic memory allocation (Vec, HashMaps etc...), so those would have to be computed at runtime and can change at any moment.

I thought that's exactly the question that was being asked? Something like a htop for a rust program. This could be helpful in improving code efficiency.

> - Some types have shared ownership (Rc/Arc), so it's unclear how you would measure memory usage then.

Same way we measure the memory usage of programs running on systems that support shared libraries and mmapped data. Seeking perfection here is the obstacle; the goal is to attribute memory usage to an object (and preferably, a function call) so that we can improve its characteristics.

> - Some types, especially in foreign interfaces, will effectively just hold a pointer to some black box data, you'd need a special API to figure out how much memory it hides. For instance what's the memory usage of a database handle or a JPEG compression library context?

It's true that if you integrate with external systems, you don't get the benefits of the system you chose to make your home. This doesn't mean you shouldn't try to get as much benefit as possible.

> - When you care about memory usage things like fragmentation are usually very important, and the amount of memory used by a given object can be misleading. If you have a string that takes up 12 bytes but it's the only object left in the middle of a 4KiB page, then just counting "12 bytes" for this object is misleading because you have a huge fragmentation overhead.

This is interesting to me, and I don't know much about it. It sounds like it's not an actual limitation to the utility of the tool, but it should certainly guide how it is built and how its results are interpreted.

> The only advantage of the borrow checker is that safe Rust forces you to make ownership relationships explicit but not all Rust code is safe and there are many escape hatches that muddy the water (like Rc/Arc mentioned above, but also threads and a few other things).

In conclusion, I don't think that renders the activity pointless. The fact that hazy information is hazy and requires careful interpretation doesn't make it useless. People find test cases and static type checks useful even though they don't answer the question "is my code correct". And we rely all the time on half measures that answer parts of the questions here (I once developed a program and used the load averages from uptime to tell me if it was performant enough :/). The pursuit of perfection is the enemy of improvement. It might be that there is some fatal flaw, but I don't think you've mentioned any here.


I don't think the ownership system gives you that directly. But I think you could make a trait with a #[derive] macro which could add up the owned memory of an object. And for complex objects, recursively calculate the owned memory for all of an object's fields.
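
A hand-written sketch of what such a trait might look like. `HeapSize` and `Account` are hypothetical names, and the impl for `Account` is the part a derive macro would generate.

```rust
use std::mem::size_of;

/// Hypothetical trait: bytes owned on the heap, excluding the value itself.
trait HeapSize {
    fn heap_size(&self) -> usize;

    /// Inline size of the value plus everything it owns on the heap.
    fn total_size(&self) -> usize
    where
        Self: Sized,
    {
        size_of::<Self>() + self.heap_size()
    }
}

impl HeapSize for u32 {
    fn heap_size(&self) -> usize {
        0
    }
}

impl HeapSize for String {
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Vec<T> {
    fn heap_size(&self) -> usize {
        // The buffer itself, plus whatever each element owns in turn.
        self.capacity() * size_of::<T>()
            + self.iter().map(|item| item.heap_size()).sum::<usize>()
    }
}

struct Account {
    id: u32,
    name: String,
    tags: Vec<String>,
}

// What a #[derive] macro would generate: sum over all fields.
impl HeapSize for Account {
    fn heap_size(&self) -> usize {
        self.id.heap_size() + self.name.heap_size() + self.tags.heap_size()
    }
}
```

This counts requested capacity only. Allocator overhead and fragmentation are not visible at this level.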


Shouldn't that be possible with any memory management system? I believe Java has a tree behind the scenes in memory for management.


The Java model ties one object to another. The Rust model ties consumed memory to a function. What about a debug log output that shows how much memory a function call required, in the same way it shows how much time it used?
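
One way to approximate that today is a counting global allocator. The following is only a sketch: `CountingAlloc` and `bytes_allocated_by` are hypothetical names, and the counter is process-wide, so allocations made by other threads during the call are included.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Total number of bytes ever requested from the allocator.
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

/// Wraps the system allocator and counts every requested byte.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and reports how many bytes were requested while it ran.
/// Frees are not subtracted, so this is allocation volume, not retained memory.
fn bytes_allocated_by<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCATED.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOCATED.load(Ordering::Relaxed);
    (result, after - before)
}
```

A call such as `bytes_allocated_by(|| vec![0u8; 1024])` could then be logged next to the elapsed time of the same call.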


It didn't use ownership, and I don't think that's the paradigm, but I did something like this in D by tagging allocating members with a user defined attribute which counted the allocated memory in bytes, then summed it for the whole tree (structure).

Not worth the effort but it can be done in not much code


I think the plan is to support this kind of thing with per-collection allocators - https://github.com/rust-lang/wg-allocators


Heh, this sounds just like the global event system I made for my D window manager: https://github.com/weltensturm/flatman/blob/master/common/ev... (which has some extra jingles with method annotations and event filters for convenience)


Interesting article, but I think the key to writing idiomatic Rust is not to stretch what the type system can do but rather to be happy with what it can express and avoid unnecessary abstraction. The compile-time guarantees that we have to prove in Rust also serve as a hint for when not to abstract.


I think pushing the boundaries of what is idiomatic can sometimes be valuable. Look how much idiomatic C++ has changed since 1995. Or what is idiomatic in JavaScript now.


"sometimes" is doing a lot of heavy lifting here.


Sometimes life hands you lemons, and sometimes it hands you Box<dyn Fruit>.


This article is about curiosity, not best practices.



