Rust Pointers for C Programmers
I’ve been eyeing Rust for about a year now. Here and there, I tried to use it to make a silly little program, or to implement some simple function in it to see for myself how ergonomic it really was, and what sort of machine code rustc spit out. But last weekend I found a need for a tool to clean up some preprocessor mess, and so instead of hacking together some combination of shell and Python, I decided to write it in Rust.
From my earlier attempts, I knew that there are a lot of different “pointers” but I found all the descriptions of them lacking or confusing. Specifically, Rust calls itself a systems programming language, yet I found no clear description of how the different pointers map to C—the systems programming language. Eventually, I stumbled across The Periodic Table of Rust Types, which made things a bit clearer, but I still didn’t feel like I truly understood.
During my weekend expedition to Rust land, I think I’ve grokked things enough to write this explanation of how Rust does things. As always, feedback is welcome.
I’ll describe what happens in terms of C. To keep things simple, I will:
- assume that you are well-versed in C
- assume that you can read Rust (any intro will teach you enough)
- not bother with const for the C snippets
- not talk about mutability
In the following text, I assume that we have some struct T. The actual contents don’t matter. In other words:
struct T { /* some members */ };
With that out of the way, let’s dive in!
*const T and *mut T
These are raw pointers. In general, you shouldn’t use them since only unsafe code can dereference them, and the whole point of Rust is to write as much safe code as possible.
Raw pointers are just like what you have in C. If you make a pointer, you end up using sizeof(struct T *) bytes for the pointer. In other words:
struct T *ptr;
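A minimal sketch of the same thing seen from the Rust side. The 100-byte `T` and the `store_via_raw` helper are stand-ins made up for illustration, and the size checks describe what current rustc targets do rather than a promise about every platform.

```rust
use std::mem::size_of;

// Stand-in for the article's `struct T`; the 100-byte payload is an assumption.
#[allow(dead_code)]
struct T {
    stuff: [u8; 100],
}

/// Hypothetical helper: stores `value` through a raw pointer,
/// much like `*p = value;` would in C.
fn store_via_raw(target: &mut usize, value: usize) {
    // Making a raw pointer is safe; the reference simply coerces to one.
    let p: *mut usize = target;
    // Dereferencing it requires `unsafe`. This is sound here because `p`
    // was just derived from a live, exclusive reference.
    unsafe {
        *p = value;
    }
}

fn main() {
    // A raw pointer to a sized type takes one pointer-sized word.
    assert_eq!(size_of::<*const T>(), size_of::<usize>());
    assert_eq!(size_of::<*mut T>(), size_of::<usize>());

    let mut n = 0usize;
    store_via_raw(&mut n, 5);
    assert_eq!(n, 5);
}
```

Note that only the dereference sits inside the `unsafe` block; creating and passing around the pointer needs no special treatment.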
&T and &mut T
These are borrowed references. They use the same amount of space as raw pointers and behave the exact same way in the generated machine code. Consider this trivial example:
#[no_mangle]
pub fn raw(p: *mut usize) {
    unsafe {
        *p = 5;
    }
}

#[no_mangle]
pub fn safe(p: &mut usize) {
    *p = 5;
}
A rustc invocation later, we have:
raw()
    raw:       55                  pushq  %rbp
    raw+0x1:   48 89 e5            movq   %rsp,%rbp
    raw+0x4:   48 c7 07 05 00 00   movq   $0x5,(%rdi)
               00
    raw+0xb:   5d                  popq   %rbp
    raw+0xc:   c3                  ret

safe()
    safe:      55                  pushq  %rbp
    safe+0x1:  48 89 e5            movq   %rsp,%rbp
    safe+0x4:  48 c7 07 05 00 00   movq   $0x5,(%rdi)
               00
    safe+0xb:  5d                  popq   %rbp
    safe+0xc:  c3                  ret
Note that the two functions are bit-for-bit identical.
The only differences between borrowed references and raw pointers are:
- references will never point at bogus addresses (i.e., they are never NULL or uninitialized),
- the compiler doesn’t let you do arbitrary pointer arithmetic on references,
- the borrow checker will make you question your life choices for a while.
(The last one gets better over time.)
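Here is a small sketch that checks the size claim and shows a reference and the raw pointer made from it holding the same address. The 100-byte `T` and the `address_of` helper are assumptions for illustration. The `Option<&T>` line is a side effect of the "never NULL" rule: since no valid reference is null, Rust uses the null value to encode `None`.

```rust
use std::mem::size_of;

// Stand-in for the article's `struct T`; the 100-byte payload is an assumption.
#[allow(dead_code)]
struct T {
    stuff: [u8; 100],
}

/// Hypothetical helper: returns the address a reference holds, as an integer.
fn address_of(r: &T) -> usize {
    r as *const T as usize
}

fn main() {
    // References take the same space as raw pointers.
    assert_eq!(size_of::<&T>(), size_of::<*const T>());
    assert_eq!(size_of::<&mut T>(), size_of::<*mut T>());

    // A reference is never null, so `None` can reuse the null value
    // and the Option costs no extra space.
    assert_eq!(size_of::<Option<&T>>(), size_of::<&T>());

    // Turning a reference into a raw pointer keeps the address.
    let t = T { stuff: [0; 100] };
    let r: &T = &t;
    let p: *const T = r;
    assert_eq!(address_of(r), p as usize);
}
```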
Box<T>
These are owned “pointers”. If you are a C++ programmer, you are already familiar with them. Never having truly worked with C++, I had to think about this a bit until it clicked, but it is really easy.
No matter what all the documentation and tutorials out there say, Box<T> is not a pointer but rather a structure containing a pointer to heap allocated memory just big enough to hold T. The heap allocation and freeing is handled automatically. (Allocation is done in the Box::new function, while freeing is done via the Drop trait, but that’s not relevant as far as the memory layout is concerned.) In other words, Box<T> is something like:
struct box_of_T {
    struct T *heap_ptr;
};
Then, when you make a new box, you end up putting only what amounts to sizeof(struct T *) on the stack, and it magically starts pointing somewhere on the heap. In other words, Rust code like this:
let x = Box::new(T { ... });
is roughly equivalent to:
struct box_of_T x;

/* Box::new: allocate room for one T on the heap */
x.heap_ptr = malloc(sizeof(struct T));
if (!x.heap_ptr)
    oom();

/* initialize the heap copy */
*x.heap_ptr = ...;
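To see the "struct wrapping a heap pointer" picture from the Rust side, here is a rough sketch. The 100-byte `T` and the `boxed_roundtrip` helper are made up for illustration. `Box::into_raw` and `Box::from_raw` are the standard library calls that expose and re-wrap the pointer inside the box.

```rust
use std::mem::size_of;

// Stand-in for the article's `struct T`; the 100-byte payload is an assumption.
struct T {
    stuff: [u8; 100],
}

/// Hypothetical helper: boxes a T, takes the heap pointer out, and puts it back.
fn boxed_roundtrip() -> u8 {
    // Allocates space for one T on the heap and moves the value into it.
    let b: Box<T> = Box::new(T { stuff: [7; 100] });

    // Hands over the heap pointer without freeing the allocation.
    let raw: *mut T = Box::into_raw(b);

    // Rebuilds the box from that pointer so that dropping it frees the memory.
    // Sound because `raw` came from `Box::into_raw` and is used exactly once.
    let b = unsafe { Box::from_raw(raw) };

    b.stuff[0]
    // `b` goes out of scope here and the heap allocation is freed.
}

fn main() {
    // On the stack, the box is just the pointer.
    assert_eq!(size_of::<Box<T>>(), size_of::<*mut T>());
    assert_eq!(boxed_roundtrip(), 7);
}
```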
&[T] and &mut [T]
These are borrowed slices. This is where things get interesting. Even though it looks like they are just references (which, as stated earlier, translates into a simple C-style pointer), they are much more. These types of references use fat pointers—that is, a combination of a pointer and a length.
struct fat_pointer_to_T {
    struct T *ptr;
    size_t nelem;
};
This is incredibly powerful, since it allows bounds checking at runtime and getting a subset of a slice is essentially free!
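A quick sketch of the fat pointer in action, using `u32` elements instead of `T` to keep it short. The `parts` helper is made up for illustration. The two-word size is what rustc does today rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Hypothetical helper: splits a slice reference into the two halves of
/// the fat pointer, the data pointer and the element count.
fn parts(s: &[u32]) -> (*const u32, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    // Pointer plus length: two words.
    assert_eq!(size_of::<&[u32]>(), 2 * size_of::<usize>());

    let data = [10u32, 20, 30, 40, 50];
    let whole: &[u32] = &data;
    // Sub-slicing only computes a new pointer and a new length.
    let middle = &whole[1..4];

    let (p, n) = parts(whole);
    let (q, m) = parts(middle);
    assert_eq!(n, 5);
    assert_eq!(m, 3);
    // The sub-slice starts exactly one element past the start of the array.
    assert_eq!(q as usize, p as usize + size_of::<u32>());

    // The stored length is what makes runtime bounds checks possible.
    assert_eq!(middle.get(2), Some(&40));
    assert_eq!(middle.get(3), None);
}
```

Indexing with `middle[3]` would panic instead of returning `None`; both forms compare the index against the length carried in the fat pointer.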
&[T; n] and &mut [T; n]
These are borrowed references to arrays. They are different from borrowed slices. Since the length of an array is a compile-time constant (the compiler will yell at you if n is not a constant), the compiler already knows the bound: indexing with a constant can be checked statically, and any remaining runtime check compares against that constant. Therefore, there is no need to pass around the length in a fat pointer, and they are passed around as plain ol’ pointers.
struct T *ptr;
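A small sketch contrasting the two kinds of reference, again with `u32` elements for brevity. The `as_slice` helper is made up for illustration. Converting an array reference into a slice reference is the moment the compile-time length gets attached as runtime data.

```rust
use std::mem::size_of;

/// Hypothetical helper: turns an array reference into a slice reference.
/// The compiler fills in the length (2) from the array type.
fn as_slice(pair: &[u32; 2]) -> &[u32] {
    pair
}

fn main() {
    // Thin pointer: the length lives in the type, not in the pointer.
    assert_eq!(size_of::<&[u32; 2]>(), size_of::<usize>());
    // Fat pointer: the length travels with the pointer.
    assert_eq!(size_of::<&[u32]>(), 2 * size_of::<usize>());

    let arr = [1u32, 2];
    let slice = as_slice(&arr);
    assert_eq!(slice.len(), 2);
    // Both point at the same first element.
    assert_eq!(slice.as_ptr(), arr.as_ptr());
}
```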
T, [T; n], and [T]
While these aren’t pointers, I thought I’d include them here for completeness’s sake.
T
Just like in C, a struct uses as much space as its type requires (i.e., sum of the sizes of its members plus padding).
[T; n]
Just like in C, an array of structs uses n times the size of the struct.
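A short sketch to check both claims. The 100-byte `T` and the `Padded` struct are made up for illustration. `Padded` is marked `#[repr(C)]` so that it follows C layout rules; without that attribute, Rust is free to reorder fields.

```rust
use std::mem::size_of;

// Stand-in for the article's `struct T`; the 100-byte payload is an assumption.
#[allow(dead_code)]
struct T {
    stuff: [u8; 100],
}

// Hypothetical struct that follows C layout rules, to show padding.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

fn main() {
    // One 100-byte member, byte alignment, so no padding.
    assert_eq!(size_of::<T>(), 100);

    // On common targets u32 is 4-byte aligned: 1 byte for `a`,
    // 3 bytes of padding, then 4 bytes for `b`.
    assert_eq!(size_of::<Padded>(), 8);

    // An array is n elements back to back.
    assert_eq!(size_of::<[T; 2]>(), 2 * size_of::<T>());
}
```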
[T]
The simple answer here is that you cannot make a [T]. That actually makes perfect sense when you consider what that type means. It is saying that we have some variable sized slice of memory that we want to access as elements of type T. Since this is variable sized, the compiler cannot possibly reserve space for it at compile time and so we get a compiler error.
The more complicated answer involves the Sized trait, which I’ve skillfully managed to avoid thus far and so you are on your own.
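For what it is worth, here is a tiny sketch of what "variable sized" means in practice, with `u32` elements for brevity. The `bytes_in` helper is made up for illustration. `std::mem::size_of_val` reports the size of the `[u32]` behind a reference at runtime, which is exactly the information the compiler lacks at compile time.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical helper: number of bytes occupied by the `[u32]` behind `s`.
fn bytes_in(s: &[u32]) -> usize {
    size_of_val(s)
}

fn main() {
    // let x: [u32]; // rejected: the size of `[u32]` is not known at compile time

    let data = [1u32, 2, 3];

    // The same type, `[u32]`, has a different size depending on the value.
    assert_eq!(bytes_in(&data), 3 * size_of::<u32>());
    assert_eq!(bytes_in(&data[..1]), size_of::<u32>());
}
```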
Summary
That was a lot of text, so I decided to compact it and make the following table. In the table, I assume that our T struct is 100 bytes in size. In other words:
/* Rust */
struct T {
    stuff: [u8; 100],
}

/* C */
struct T {
    uint8_t stuff[100];
};
Now, the table in its full glory:
| | Rust | C | Size on ILP32/LP64 (bytes) |
|---|---|---|---|
| Value | let x: T; | struct T x; | 100/100 |
| Raw pointer | let x: *const T; let x: *mut T; | struct T *x; | 4/8 |
| Reference | let x: &T; let x: &mut T; | struct T *x; | 4/8 |
| Box | let x: Box<T>; | struct box_of_T { struct T *heap_ptr; }; struct box_of_T x; | 4/8 |
| Array of 2 | let x: [T; 2]; | struct T x[2]; | 200/200 |
| Reference to an array of 2 | let x: &[T; 2]; | struct T *x; | 4/8 |
| A slice | let x: [T]; | struct T x[]; | unknown at compile time |
| A reference to a slice | let x: &[T]; | struct fat_ptr_to_T { struct T *ptr; size_t nelem; }; struct fat_ptr_to_T x; | 8/16 |
A word of caution: I assume that the sizes of the various pointers are actually implementation details and shouldn’t be relied on to be that way. (Well, with the exception of raw pointers: without those being fixed, FFI would be unnecessarily complicated.)
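Given that caveat, it may be safer to ask the compiler than to trust the table. The sketch below prints the size of every sized row for whatever target it is built for. The `table_sizes` helper and the row labels are made up for illustration.

```rust
use std::mem::size_of;

// The 100-byte T used by the table.
#[allow(dead_code)]
struct T {
    stuff: [u8; 100],
}

/// Hypothetical helper: (row label, size in bytes) for each sized row of
/// the table, as measured on the target this is compiled for.
fn table_sizes() -> Vec<(&'static str, usize)> {
    vec![
        ("Value", size_of::<T>()),
        ("Raw pointer", size_of::<*const T>()),
        ("Reference", size_of::<&T>()),
        ("Box", size_of::<Box<T>>()),
        ("Array of 2", size_of::<[T; 2]>()),
        ("Reference to an array of 2", size_of::<&[T; 2]>()),
        ("A reference to a slice", size_of::<&[T]>()),
    ]
}

fn main() {
    for (label, bytes) in table_sizes() {
        println!("{label:<28} {bytes:>3}");
    }
}
```

The bare slice row is left out on purpose, since `size_of::<[T]>()` does not compile.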
I didn’t cover str, &str, String, and Vec<T> since I don’t consider them fundamental types, but rather convenience types built on top of slices, structs, references, and boxes.
Anyway, I hope you found this useful. If you have any feedback (good or bad), let me know.
Comment by Michael "notriddle" Howell — January 28, 2018 @ 17:00
Comment by JeffPC — January 28, 2018 @ 18:12
Comment by Ted Mielczarek — January 29, 2018 @ 00:04
Comment by JeffPC — January 29, 2018 @ 03:33
Small nitpick, if you just read "let x: &T;", you cannot quite say that the size of x is 4/8 or that it corresponds to "struct T *x;".
If T happens to be a Trait, x is a Trait Object, which is also represented as a fat pointer in rust. And so its size will be 8/16 and its representation will be a struct with a pointer to data and a vtable pointer.
Comment by Anonymous — June 30, 2018 @ 15:04
Comment by anp — June 30, 2018 @ 15:57
Comment by Frans-Willem — June 30, 2018 @ 20:44
Anonymous: I have to admit that I haven't looked enough at traits to understand them at a level beyond the trivial thing you see on the surface. I'm sure that trait object pointers are useful for something, but I can't say that I can think of any. Anyway, thanks for the warning about trait objects being fat pointers.
anp: Right. I meant that in the sense that you cannot just declare one statically. It is still a valid type, and you can certainly have code that produces one. (Also, unsafe code can do just about anything, so no surprise there.)
Frans-Willem: Agreed. Once I understood all the rust pointer types, I remembered that C++ has these fancy pointers, which much like Box<T> in rust are not real pointers but rather structs/classes wrapping a pointer. I just happen to not write C++ and to not think in C++ terms. :)
Comment by Jeffpc — July 2, 2018 @ 11:54
Comment by Guillaume — July 8, 2018 @ 10:47
Comment by JeffPC — July 25, 2018 @ 21:34
Comment by Adam Schabtach — June 22, 2021 @ 15:24