Show HN: A first project in Rust – in-memory order book (github.com/connorwstein)
91 points by cstein2 on Feb 22, 2020 | 33 comments



Heh, I also released my first project in Rust yesterday. An ncurses gopher client. I probably missed many patterns and idioms as well. Starting with Rust is quite hard, even as an experienced developer. I spent hours "fighting" the compiler. Code available at https://github.com/jansc/ncgopher/


Could you talk a bit about the kinds of things you had to fight the compiler on?

Can you identify any idioms or approaches that seem normal or natural in other languages, but which are problematic in Rust?


Two examples: The application has two main components, a controller and a struct containing the ui implementation. Callbacks in the cursive library are called with a reference to Cursive, but I had a hard time figuring out how to pass the controller to the callback. In the end I decided to use a message channel (mpsc::Receiver and mpsc::Sender) to let the two components communicate. After some struggling I saw that cursive has a function to set user data which is accessible inside the callbacks, so I pass the channels through user data. Not sure if this is the best architecture. I never managed to pass a reference to the controller to the ui callbacks: the compiler complained about the lifetime of the reference, even though I knew that both outlive the application runtime.
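
Roughly, the shape I ended up with looks like this (a minimal sketch rather than my actual code; the UiMessage type is made up, and it assumes a cursive version with set_user_data/user_data):

    use std::sync::mpsc;
    use cursive::Cursive;

    // Hypothetical messages the UI callbacks send to the controller.
    enum UiMessage {
        OpenUrl(String),
    }

    fn main() {
        let (tx, rx) = mpsc::channel::<UiMessage>();

        let mut siv = Cursive::default();
        // Store the sender as cursive user data so callbacks can reach it
        // without holding a reference to the controller.
        siv.set_user_data(tx);

        siv.add_global_callback('g', |s: &mut Cursive| {
            // Fetch the sender back inside the callback and post a message.
            if let Some(tx) = s.user_data::<mpsc::Sender<UiMessage>>() {
                let _ = tx.send(UiMessage::OpenUrl("gopher://example".into()));
            }
        });

        // The controller would drain `rx` in its own loop and drive the UI;
        // siv.run() and that loop are omitted here.
        let _ = rx;
    }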

A similar situation occurred in the asynchronous communication with gopher servers. I use a thread to fetch a resource from another server while keeping the ui responsive. Passing the result from the thread back did not work out in the beginning.

I don’t think the idioms are so different in Rust, but when passing data between components, threads and the like, you have to be very specific in your code, using Arcs, RwLocks, Mutexes, etc.
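
For example, just handing a result from a worker thread back to the main thread already means spelling the sharing out explicitly (a toy sketch using only std):

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // The shared slot has to be wrapped explicitly before it crosses
        // the thread boundary.
        let response = Arc::new(Mutex::new(String::new()));

        let handle = {
            let response = Arc::clone(&response);
            thread::spawn(move || {
                // Pretend this is the gopher fetch happening off the ui thread.
                *response.lock().unwrap() = String::from("fetched menu contents");
            })
        };

        handle.join().unwrap();
        println!("{}", response.lock().unwrap());
    }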

On the bright side, when the code compiles, it usually works as intended. In C or C++, there would be more race conditions, null pointer dereferences and the like.


Feedback welcome! I'm sure I'm missing loads of Rust patterns and idioms...


Some comments:

Async/await syntax is poorly documented across the ecosystem, but it has landed now and is useful; without it, codebases will look outdated soon enough. hyper itself has some of the best docs.

lazy_static is a popular way to use globals like that, but I don't really bother with it anymore. The paradigms are a bit different in this language; you are discouraged from such things for a reason, and there are certainly ways around using globals.

You don't need extern crate with the 2018 edition, and macros are automatically imported.


Instead of lazy_static, use once_cell’s Lazy type[0]! It is a replacement that does away with the macro-based approach, made possible by improvements in const fn. This also leads to a better IDE experience.

[0] https://docs.rs/once_cell/1.3.1/once_cell/
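
A minimal sketch of what that looks like (the TICK_SIZES map is just a made-up example of a global):

    use std::collections::HashMap;
    use once_cell::sync::Lazy;

    // A lazily initialised global, no macro required.
    static TICK_SIZES: Lazy<HashMap<&'static str, f64>> = Lazy::new(|| {
        let mut m = HashMap::new();
        m.insert("AAPL", 0.01);
        m
    });

    fn main() {
        println!("{:?}", TICK_SIZES.get("AAPL"));
    }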


Thanks! What would you suggest instead of a global/lazy static in this situation though? I recall struggling to pass some dynamically allocated structure into the hyper service instantiation.
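
For context, the usual alternative to a global here, as far as I understand it, is to clone an Arc into hyper's service closures (a sketch only, assuming hyper 0.13-style make_service_fn/service_fn and a made-up AppState standing in for the book):

    use std::sync::Arc;
    use hyper::service::{make_service_fn, service_fn};
    use hyper::{Body, Request, Response, Server};

    // Hypothetical shared state standing in for the order book.
    struct AppState {
        greeting: String,
    }

    #[tokio::main]
    async fn main() {
        let state = Arc::new(AppState { greeting: "hello from shared state".to_string() });

        // Clone the Arc once per connection and once per request,
        // instead of reaching for a global.
        let make_svc = make_service_fn(move |_conn| {
            let state = state.clone();
            async move {
                Ok::<_, hyper::Error>(service_fn(move |_req: Request<Body>| {
                    let state = state.clone();
                    async move {
                        Ok::<_, hyper::Error>(Response::new(Body::from(state.greeting.clone())))
                    }
                }))
            }
        });

        let addr = ([127, 0, 0, 1], 3000).into();
        Server::bind(&addr).serve(make_svc).await.unwrap();
    }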


If this is intended as anything more than a toy, you don't want to store the full OpenLimitOrder in the book data structure.

The symbol should be allocated once on the book. The side is already implied by which of the two sets of vecs the order sits in, and the price is defined by the vec itself. The queue should be a vec of IDs and open size. The real struct should only be constructed lazily as absolutely needed.
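
Something like this is what I mean, as a rough sketch (type and field names are made up):

    // The book owns the symbol once, each level owns its price once,
    // and the per-order entry carries only what matching needs.
    struct BookEntry {
        order_id: u64, // or an index into a separate order arena
        open_qty: u64,
    }

    struct PriceLevel {
        price: u64,            // in ticks
        queue: Vec<BookEntry>, // time priority = insertion order
    }

    struct Book {
        symbol: String,
        bids: Vec<PriceLevel>, // side is implied by which vec a level sits in
        asks: Vec<PriceLevel>,
    }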


I suspect it will remain a toy, but agreed on those optimizations


I don’t do Rust but that was nicely readable. One comment: using Enums for stock symbols presumably forces a recompile on each new symbol. What would be the idiomatic representation in Rust for things like that? Some kind of interned string?


A string works. If a numerical/in-place representation is wanted for efficiency, you could rely on the fact that stock symbols are always less than 8 characters, so we can parse it into an array of 8 bytes.
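
Something along these lines (a sketch; it simply refuses anything longer than 8 ASCII bytes):

    // Pack a short ticker into a fixed-size, Copy-able key.
    fn pack_symbol(sym: &str) -> Option<[u8; 8]> {
        let bytes = sym.as_bytes();
        if bytes.len() > 8 || !sym.is_ascii() {
            return None;
        }
        let mut packed = [0u8; 8];
        packed[..bytes.len()].copy_from_slice(bytes);
        Some(packed)
    }

    fn main() {
        assert_eq!(pack_symbol("AAPL"), Some(*b"AAPL\0\0\0\0"));
        assert!(pack_symbol("TOOLONGSYM").is_none());
    }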


Is wrapping a String in a semi-opaque newtype (or a type alias at least) good to encode the semantic meaning of the string type?
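
Something as small as this is what I have in mind (a sketch of the newtype):

    // A distinct type the compiler can check, so a symbol can't be
    // mixed up with any other String.
    #[derive(Debug, Clone, PartialEq, Eq, Hash)]
    struct Symbol(String);

    impl Symbol {
        fn new(s: impl Into<String>) -> Self {
            Symbol(s.into())
        }
    }

    fn main() {
        let sym = Symbol::new("AAPL");
        println!("{:?}", sym);
    }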


Actually they are definitely not always <= 8 characters, but usually they are.


Great stuff! Since we work on things like that daily, you have inspired me to do something like that in Elixir :)


Why use a Vec instead of a B-Tree?


I was imagining inserts at an unoccupied price level would be relatively rare in normal order book operation, so a Vec just seemed simpler


They aren't that unusual on an absolute basis but relative to the number of orders in a stock they are quite rare (iirc - it's been a while since I was directly involved in this).

This is a cool project btw - I'm looking for my first chance to really sink my teeth into Rust, so I found it interesting. A few non-Rust observations (you probably know all this and kept your order book simple, which is totally cool, but just in case):

An actual order book has a "tick ladder" which specifies which price points are valid for a given ticker on a given venue. This means you actually know all the possible prices ahead of time. If you ever wanted to productionize this you could probably take advantage of that and make a custom data structure that used the fact that you had all the keys preset and did the insert and lookup in O(1).
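
As a very rough sketch of that idea (made-up names; it assumes a single constant tick size, whereas real ladders change tick size by price band):

    // All valid prices are known up front, so levels live in a flat
    // array and insert/lookup is plain indexing.
    struct Ladder {
        min_price_ticks: u64,
        tick_size_ticks: u64,
        levels: Vec<Vec<u64>>, // order ids (or id + open size) per price level
    }

    impl Ladder {
        fn index(&self, price_ticks: u64) -> Option<usize> {
            if price_ticks < self.min_price_ticks {
                return None;
            }
            let offset = price_ticks - self.min_price_ticks;
            if offset % self.tick_size_ticks != 0 {
                return None; // not a valid price on this ladder
            }
            let idx = (offset / self.tick_size_ticks) as usize;
            if idx < self.levels.len() { Some(idx) } else { None }
        }

        fn insert(&mut self, price_ticks: u64, order_id: u64) -> bool {
            match self.index(price_ticks) {
                Some(i) => { self.levels[i].push(order_id); true }
                None => false,
            }
        }
    }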

Secondly, in real life the matching logic is quite a bit more complex because of order flags and venue logic. If you're just covering US venues you don't need to worry about venues because of Reg NMS, but anywhere else you'll want the order books to actually be keyed by venue and symbol, not just symbol.

Order flags interfere with matching in a number of ways. Firstly a normal limit order will partially match up to the limit and any remaining quantity will sit on the book waiting to be matched. You can flag the order as "fill or kill" to say fill the whole order or nothing and you can flag it as "immediate or cancel" to say don't put any remaining quantity passively on the book. There are many other order flags but those are probably the most important ones for matching.

Secondly on many venues orders can have an iceberg order type which means that only the "tip" quantity gets shown on the book but they will actually match up to a larger quantity at that price.

Finally you can add a "minimum executed quantity" on any passive orders to prevent pinging (information leakage). This will only match with incoming orders bigger than the MXQ, so it messes with the normal price/time ordering: orders behind that order could potentially match first. MXQ is more important for dark venues or order types, so maybe you don't need to consider it.
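
If it helps, the flags above might map onto something like this (purely a sketch, names made up):

    // The behaviours described above, as data on an order.
    enum TimeInForce {
        // Match what can be matched; the remainder rests on the book.
        Limit,
        // Fill the entire quantity or match nothing at all.
        FillOrKill,
        // Match what can be matched; cancel the remainder instead of resting.
        ImmediateOrCancel,
    }

    struct OrderFlags {
        time_in_force: TimeInForce,
        // Iceberg: only this much is displayed at the level.
        display_qty: Option<u64>,
        // Minimum executed quantity: a resting order only matches
        // incoming orders at least this large.
        min_exec_qty: Option<u64>,
    }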


Thanks for the info! I definitely did not know all of that :). I do work at a crypto exchange, but on the custody/stablecoin side, not the exchange team.

Can you elaborate a little on the "pinging" process to extract information? Sounds interesting


You are most welcome. I'd be somewhat surprised if crypto exchanges supported all the same stuff I mentioned - it's based on my history on equity exchanges and I'd expect crypto to be different.

Anyway the way MXQ and pinging works is imagine you have an orderbook where passive orders are not visible to other participants (ie the venue is dark or you have a grey venue where some orders can be flagged as dark). It's possible fro participants to pay to learn what orders are in the book by making some very small aggressive orders and seeing where they match. In the world of equities this would be buy 1 share agressively, sell 1 share agressively with a limit in each case. This is called "pinging" because it's like ICMP ping discovering whether hosts are alive. The concept in crypto (if it applies) would be the smallest quantity the exchange allows you to buy or sell.

Assuming the venue supports it, you can prevent your dark orders matching these pings by setting an MXQ. If your MXQ is 100 say, they would need to buy 100 shares for your order to match. You would set the size to a point where you don't mind giving up the information about where your order level is because the match is meaningful enough.


Also, why use Uuids instead of a pair of user_id: u64, order_id: u64?


Leaking the number of orders seemed like a bad idea


Are you planning to hook your order book up to a data feed like, for example, binance[1] to see how your book performs?

[1] https://github.com/binance-exchange/binance-official-api-doc...


Yeah I was planning on playing around with the performance/optimizing it. Didn't know about that binance data feed though, thanks.


Not the OP, but presumably to minimise the amount of jumping around in memory. Best orderbook is a small matrix, if you can get away with it (for trading you often don't need the whole orderbook, or at least don't need the whole thing on the hot path, but for an exchange obviously there's less scope for approximations).


Assuming you're doing some sort of strategy where the order book is important, like market making or hedging another position, it seems kind of dangerous not to keep a full book, no? If the market starts to move and your limited book starts to represent a lack of bids or offers, that doesn't seem to be a great idea at first glance.


In a product with a big tick size, to react to any single event you'll probably only need to know what's happening on the top two or three price levels (it's very unlikely to move more than that in a single market data update). Then after reacting you can update your big, slow book and generate a new fast, approximate book from that, in preparation for the next market data update.
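
Concretely, the "fast" side can be as simple as a couple of fixed-size arrays rebuilt from the full book between updates (a sketch):

    #[derive(Clone, Copy, Default)]
    struct Level {
        price_ticks: u64,
        qty: u64,
    }

    // Three levels per side is usually enough to react to one update;
    // regenerate this from the big, slow book after reacting.
    #[derive(Clone, Copy, Default)]
    struct TopOfBook {
        bids: [Level; 3], // best bid first
        asks: [Level; 3], // best ask first
    }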


I agree that can make sense for something like ES, but what about crypto? ;)

Edit: maybe non-penny-pilot options trading above $3 is a better example of a product with a large tick size.


For crypto the network overhead and variance in the exchange probably make the performance of the order book implementation irrelevant, unless it's horribly slow, haha.


Some exchanges are doing colocation and are investing resources to greatly improve matching engine performance. It's become a little better than when they were all written in Ruby and hosted on a cheap cloud service.


Ah that's cool. Still I'd be surprised if there's many that are deterministic enough that a microsecond or two of response latency makes a big difference.


Why a B-tree if it's supposed to be in-memory? A heap would be easier to implement and more efficient.


Rust provides a built-in B-tree[1] and it offers this over a binary search tree to avoid pointer chasing. With respect to heaps, are you thinking that a BinaryHeap[2] could be used here? I don't see how such a structure allows you to arbitrarily remove/update orders in the book. Looks like you can just insert arbitrary orders by price (using push(x)) and then get the best bid or offer (using pop()). Am I missing something? If the API was changed to be more like a binary search tree, don't you think the pointer chasing would reduce performance for larger books?

[1] https://doc.rust-lang.org/std/collections/struct.BTreeMap.ht...

[2] https://doc.rust-lang.org/std/collections/struct.BinaryHeap....
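
To make the contrast concrete, here is the kind of access pattern the book needs and that BTreeMap gives you directly (prices in ticks, quantities made up):

    use std::collections::BTreeMap;

    fn main() {
        // Price level -> resting quantity, for one side of the book.
        let mut bids: BTreeMap<u64, u64> = BTreeMap::new();

        // Arbitrary inserts and updates by price...
        bids.insert(10_050, 300);
        bids.insert(10_040, 150);
        *bids.entry(10_050).or_insert(0) += 100;

        // ...arbitrary removal when a level empties...
        bids.remove(&10_040);

        // ...and best-bid lookup without popping anything off.
        if let Some((price, qty)) = bids.iter().next_back() {
            println!("best bid: {} x {}", price, qty);
        }
    }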


I missed the fact that it was supposed to be more than just a priority queue. I would have said binary search tree instead of heap, but it makes sense now I know that B-tree is built in.



