Question is how to do these in Rust. An example might be a browser DOM: each node has a parent pointer, a list of child pointers, left and right sibling pointers, maybe a CSS node pointer, etc. Inserting or deleting nodes has to repair the pointers to and from the neighboring nodes as needed.
I know this is doable since obviously Servo (Rust’s initial driving application) has to do it. I hope the answer doesn’t involve the word “unsafe”. But I am quite unclear about how to create such a structure under Rust’s rules of pointer ownership, and also how to reliably reclaim storage when nodes and trees/subtrees are deleted. Plus there will be thread safety rules that should be statically enforced if possible.
I’ve heard that doubly linked lists in Rust are handled by an unsafe library, yet this seems even worse. Thanks.
Basically, if your tree is static or you only add nodes, the easiest option is to store all nodes in a Vec and have the child/parent links be indices. I recommend the typed-index-collection crate if you go that route.
If you need to move/delete nodes a lot then either Rc<Refcell< or you can wrap the Vec in something that takes care of the admin of fixing up pointers.
E.g. see this crate https://crates.io/crates/orx-tree
If you want to get really fancy you can even do things like compacting GC. You’re basically implementing a memory allocator at that point.
Note that you might say “what’s the point of Rust if I just have to implement a memory allocator to bypass the borrow checker?” but that’s silly. You still get the benefits of Rust for the rest of your code, and Rust still prevents things like type confusion and UB.
Wow, this crate looks like the most feature-rich tree crate I’ve ever seen!
It seems very underrated (only ~1000 downloads and one star on GitHub (by me)).
Thank you for the suggestion!😊
The safe, fast and easy way to do trees is by using Rc<RefCell<T>>. Rc/Arc allows data to be owned multiple times. You want this because this way a node can be referenced by its parent and its child at the same time. However, Rc makes the inner type inmutable. And you probably will want to mutate it in a tree, that’s what RefCell is for. With RefCell you do the borrow checking at run-time instead of at compile-time. This allows you to mutate T even though Rc only gives you an inmutable reference. This is called interior mutability.
RefCell doesn’t eliminate the borrow checker though, you must still follow its rules. If you try to get 2 mutable references to the inner type of RefCell, it will panic.
I know you don’t want to read unsafe, but you gotta hear about the alternative. Just use pointers. Pointers don’t have the borrow checker to restrict them. And self-referencing structures with interior mutability are not easy to borrow-check automatically. You can have the raw pointers as private fields of the struct so the code that is actually unsafe will be a few very small functions.
Here’s why the other options worse than pointers:
Rc<RefCell<T>> will clutter your code with boilerplate and it’s a pain to deal with. Pointers are not too ergonomic in rust (mainly because there is no
->
operator), but they need way less boilerplate. Also, you already need to manually check the mutability rules, why not all the rules.Another option that I’ve seen is “have a hashmap with all the nodes, and just store the id of the node instead of a reference”. This is the same as “have all the nodes on a Vector and store the index”. Think about this for a second. You have a pool of memory and a number that identifies what part of that pool is the memory you want. Seen it yet? That is exactly what a pointer is! If you do that, you’re just disabling the borrow-checker anyway. You just created your own memory allocator and will have to manage your memory manually, at that point just use pointers, it will be the same except with fewer boilerplate and indirection.
Thanks, that is interesting about using a refcell. It hadn’t occurred to me that a refcell was considered immutable for purposes of Rc. What about the issue of refcounting cylic structures though? I thought the route around that was weakrefs per someone else’s suggestion.
By pointers, do you mean unsafe pointers that aren’t borrow checked? I guess that’s not really worse than writing a little bit of the program in C so it may be the right approach pragmatically. So I’m again wondering about Servo. I guess eventually Rust will get verification tools that let you check the soundness of “unsafe” code too. So the discomfort might only be temporary.
Sure, hashmap or numeric indices are abstractly equivalent to pointers, but Rust is supposed to be a low level language that can deal with machine primitives, and pointers are machine primitives. Simulating them through extra levels of tables is unsatisfying. Also, the uses of those pseudo-pointers are no longer checked by the compler, so your program is vulnerable to many (not all) of the same old-fashioned pointer errors that Rust was supposed to rescue us from.
RefCell is neither mutable nor immutable. It’s a type like any other. What is special about RefCell is that it has a method like:
fn borrow_mut(&self) -> &mut T
Which means you can get a mutable reference to its contents by only having a normal reference to the RefCell.
By pointers I mean raw pointers. The pointers themselves are not unsafe. They are just normal pointers like you would have in C.
Rc can be used to generate weak refs. Which is what you want for your tree.
I don’t know about servo. So I can’t tell you much about it.
Don’t hope too much about the temporary unsafe thing. It’s not practical (maybe impossible) to make a safety checker that checks the entire program. The practical way is to make safe abstractions using unsafe code. For example rust itself is built upon plenty of unsafe code, however, it provides to you the appropriate abstractions so what you do is safe.
In this case. You can have a bit of unsafe code on your tree, so that the users of that tree have a safe API to do something that you needed unsafe code for.
For example one of the cases where you cannot automatically check the safety is in dereferencing a raw pointer returned by a FFI function call to a dynamic library. Your automatic safety checker would need to be able to read the library’s documentation. And at that point it’s not a real safety checker because the documentation may lie or have bugs.
One way of solving this is to structure all of your nodes into a HashMap with the node ID as the key and the node type as the value. The underlying node type could have node IDs for referencing purposes. You lose the ability to reference the parent/child/sibling directly, but you avoid direct circular dependencies. That said, now you need to manage dangling references for when the node is removed from the main HashMap collection.
One way to do this is to use reference-counting pointers such as
std::rc::Rc
orstd::sync:Arc
. The parent node can hold a strong reference to each child node and each child node has aWeak
reference to its parent.One way of solving this is to structure all of your nodes into a HashMap with the node ID as they key and the node type as the value. The underlying node type could have node IDs for referencing purposes. You lose the ability to reference the parent/child/sibling directly, but you avoid direct circular dependencies. That said, now you need to manage dangling references for when the node is removed from the main HashMap collection.
Doubly linked list is one of std’s collections. And safety in Rust is built on top of unsafely, because there is no way around that.
Did you try to actually look up literally anything before asking?! Because simply checking out
std::collections
docs would have given you some answers.