Rust Pointers for C Programmers


— JeffPC @ January 28, 2018 15:47

I’ve been eyeing Rust for about a year now. Here and there, I tried to use it to make a silly little program, or to implement some simple function in it to see for myself how ergonomic it really was, and what sort of machine code rustc spit out. But last weekend I found a need for a tool to clean up some preprocessor mess, and so instead of hacking together some combination of shell and Python, I decided to write it in Rust.

From my earlier attempts, I knew that there are a lot of different “pointers” but I found all the descriptions of them lacking or confusing. Specifically, Rust calls itself a systems programming language, yet I found no clear description of how the different pointers map to C—the systems programming language. Eventually, I stumbled across The Periodic Table of Rust Types, which made things a bit clearer, but I still didn’t feel like I truly understood.

During my weekend expedition to Rust land, I think I’ve grokked things enough to write this explanation of how Rust does things. As always, feedback is welcome.

I’ll describe what happens in terms of C. To keep things simple, I will:

assume that you are well-versed in C
assume that you can read Rust (any intro will teach you enough)
not bother with const for the C snippets
not talk about mutability

In the following text, I assume that we have some struct T. The actual contents don’t matter. In other words:

struct T {
	/* some members */
};

With that out of the way, let’s dive in!

*const T and *mut T

These are raw pointers. In general, you shouldn’t use them since only unsafe code can dereference them, and the whole point of Rust is to write as much safe code as possible.

Raw pointers are just like what you have in C. If you make a pointer, you end up using sizeof(struct T *) bytes for the pointer. In other words:

struct T *ptr;
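To make the “only unsafe code can dereference them” point concrete, here is a minimal sketch. The read_via_raw helper and the fill values are made up for illustration. It shows that creating a raw pointer is safe, that it is one machine word like struct T *, and that both pointer arithmetic (add) and dereferencing need an unsafe block.

```rust
use std::mem::size_of;

/// Stand-in for the article's `struct T`; the contents are arbitrary.
struct T {
    stuff: [u8; 100],
}

/// Hypothetical helper: reads `arr[idx]` through a raw pointer, C style.
/// The explicit check is what keeps the unsafe read in bounds.
fn read_via_raw(arr: &[u32], idx: usize) -> u32 {
    assert!(idx < arr.len(), "index out of bounds");
    // Creating a raw pointer is safe: it is just an address, like `uint32_t *`.
    let p: *const u32 = arr.as_ptr();
    // Pointer arithmetic (`add`) and dereferencing both require `unsafe`.
    unsafe { *p.add(idx) }
}

fn main() {
    let t = T { stuff: [7; 100] };
    let p: *const T = &t; // `&T` coerces to `*const T`

    // A raw pointer is one machine word, exactly like `struct T *`.
    assert_eq!(size_of::<*const T>(), size_of::<usize>());

    // Reading through it requires an `unsafe` block.
    let first = unsafe { (*p).stuff[0] };
    println!("first = {}, third = {}", first, read_via_raw(&[10, 20, 30], 2));
}
```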

&T and &mut T

These are borrowed references. They use the same amount of space as raw pointers and behave exactly the same way in the generated machine code. Consider this trivial example:

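#[no_mangle]
pub fn raw(p: *mut usize) {
    // A safe fn that dereferences an arbitrary raw pointer is unsound; real
    // code would declare this `pub unsafe fn`. Here it only serves to compare codegen.
    unsafe {
        *p = 5;
    }
}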

#[no_mangle]
pub fn safe(p: &mut usize) {
    *p = 5;
}

A rustc invocation later, we have:

raw()
    raw:     55                 pushq  %rbp
    raw+0x1: 48 89 e5           movq   %rsp,%rbp
    raw+0x4: 48 c7 07 05 00 00  movq   $0x5,(%rdi)
             00 
    raw+0xb: 5d                 popq   %rbp
    raw+0xc: c3                 ret    

safe()
    safe:     55                 pushq  %rbp
    safe+0x1: 48 89 e5           movq   %rsp,%rbp
    safe+0x4: 48 c7 07 05 00 00  movq   $0x5,(%rdi)
              00 
    safe+0xb: 5d                 popq   %rbp
    safe+0xc: c3                 ret

Note that the two functions are bit-for-bit identical.

The only differences between borrowed references and raw pointers are:

1. references will never point at bogus addresses (i.e., they are never NULL or uninitialized),
2. the compiler doesn’t let you do arbitrary pointer arithmetic on references,
3. the borrow checker will make you question your life choices for a while.

(#3 gets better over time.)
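Here is a small sketch of the first two differences, assuming a 100-byte T like the one in the summary below. It reuses the article’s safe() function (minus #[no_mangle]); addr_of_ref is a hypothetical helper. Because a reference can never be NULL, Rust guarantees that Option<&T> is still a single word, with None stored as the null address, which is exactly how a nullable struct T * works in C.

```rust
use std::mem::size_of;

/// A 100-byte `T`; the contents do not matter here.
#[allow(dead_code)]
struct T {
    stuff: [u8; 100],
}

/// The article's `safe()` without `#[no_mangle]`: write through a `&mut`.
fn safe(p: &mut usize) {
    *p = 5;
}

/// Hypothetical helper: returns the address behind a reference.
/// `&T` coerces to `*const T`, and creating a raw pointer needs no `unsafe`.
fn addr_of_ref(r: &T) -> *const T {
    r
}

fn main() {
    // Same size as a raw pointer: one machine word.
    assert_eq!(size_of::<&T>(), size_of::<*const T>());
    assert_eq!(size_of::<&mut T>(), size_of::<*mut T>());

    // A reference is never NULL, so `None` can be encoded as the null address:
    // `Option<&T>` is still one word, like a nullable `struct T *`.
    assert_eq!(size_of::<Option<&T>>(), size_of::<&T>());

    let t = T { stuff: [0; 100] };
    let r = &t;
    assert_eq!(addr_of_ref(r), &t as *const T);
    // `r.add(1)` would not compile: there is no pointer arithmetic on references.

    let mut x = 0usize;
    safe(&mut x);
    assert_eq!(x, 5);
}
```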

Box<T>

These are owned “pointers”. If you are a C++ programmer, you are already familiar with them. Never having truly worked with C++, I had to think about this a bit until it clicked, but it is really easy.

No matter what all the documentation and tutorials out there say, Box<T> is not a pointer but rather a structure containing a pointer to heap-allocated memory just big enough to hold T. The heap allocation and freeing are handled automatically. (Allocation is done in the Box::new function, while freeing is done via the Drop trait, but that’s not relevant as far as the memory layout is concerned.) In other words, Box<T> is something like:

struct box_of_T {
	struct T *heap_ptr;
};

Then, when you make a new box, you end up putting only what amounts to sizeof(struct T *) on the stack, and it magically starts pointing somewhere on the heap. In other words, Rust code like this:

let x = Box::new(T { ... });

is roughly equivalent to:

struct box_of_T x;

x.heap_ptr = malloc(sizeof(struct T));
if (!x.heap_ptr)
	oom();

*x.heap_ptr = ...;
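The “box is just a struct holding heap_ptr” view can be checked from Rust itself. This is a sketch: leak_to_raw is a hypothetical helper that uses Box::into_raw to pull the bare heap pointer out, and Box::from_raw then takes ownership back so that Drop frees the memory, i.e., the free() you would otherwise write by hand in C.

```rust
use std::mem::size_of;

/// A 100-byte `T`, as in the summary table.
struct T {
    stuff: [u8; 100],
}

/// Hypothetical helper: builds a boxed `T` and hands back the bare heap pointer.
fn leak_to_raw(fill: u8) -> *mut T {
    let b = Box::new(T { stuff: [fill; 100] }); // allocate + initialize
    Box::into_raw(b) // give up ownership: nothing will free it automatically now
}

fn main() {
    // The box itself is only the `heap_ptr` member: one word on the stack.
    assert_eq!(size_of::<Box<T>>(), size_of::<*mut T>());

    let p = leak_to_raw(9);
    // SAFETY: `p` came from `Box::into_raw` and has not been freed yet.
    let b: Box<T> = unsafe { Box::from_raw(p) };
    assert_eq!(b.stuff[99], 9);
    // `b` goes out of scope here, and its `Drop` impl frees the heap memory.
}
```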

&[T] and &mut [T]

These are borrowed slices. This is where things get interesting. Even though they look like plain references (which, as stated earlier, translate into a simple C-style pointer), they are much more. These references are fat pointers, that is, a combination of a pointer and a length.

struct fat_pointer_to_T {
	struct T *ptr;
	size_t nelem;
};

This is incredibly powerful, since the stored length allows bounds checking at runtime, and taking a subset of a slice is essentially free: it is just a new pointer and length into the same memory, with no copying!
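A short sketch of both properties, using u32 elements instead of T for brevity; the middle function is a hypothetical example. The sub-slice shares memory with the original (its data pointer is just one element further in), and get() uses the stored length to return None instead of reading past the end.

```rust
use std::mem::size_of;

/// Hypothetical example: the slice minus its first and last element.
/// No copying happens; the result is just a new (pointer, length) pair.
fn middle(s: &[u32]) -> &[u32] {
    if s.len() < 2 {
        &s[..0]
    } else {
        &s[1..s.len() - 1]
    }
}

fn main() {
    // A slice reference is two words: data pointer + element count.
    assert_eq!(size_of::<&[u32]>(), 2 * size_of::<usize>());

    let data = [1u32, 2, 3, 4, 5];
    let all: &[u32] = &data;
    let mid = middle(all);
    assert_eq!(mid, &[2, 3, 4]);

    // The sub-slice points into the same memory, one element further in.
    assert_eq!(mid.as_ptr(), all.as_ptr().wrapping_add(1));

    // The stored length makes runtime bounds checks possible.
    assert_eq!(all.get(4), Some(&5));
    assert_eq!(all.get(5), None); // `all[5]` would panic instead of reading garbage
}
```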

&[T; n] and &mut [T; n]

These are borrowed references to arrays, and they are different from borrowed slices. Since the length of an array is a compile-time constant (the compiler will yell at you if n is not a constant), the length is part of the type itself. Bounds checks against constant indices can be performed statically, and runtime indices are checked against a length the compiler already knows. Therefore, there is no need to pass the length around in a fat pointer, and these references are passed around as plain ol’ pointers.

struct T *ptr;
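The difference from slices shows up directly in the sizes. In this sketch (sum4 is a hypothetical function, u32 stands in for T), &[u32; 4] is one word, and coercing it to &[u32] attaches the length, which is taken from the type, producing a two-word fat pointer.

```rust
use std::mem::size_of;

/// Hypothetical example: sums exactly four elements. The length is part of
/// the type, so no length has to travel alongside the pointer.
fn sum4(a: &[u32; 4]) -> u32 {
    // Constant, in-bounds indices: the compiler can drop the runtime checks.
    a[0] + a[1] + a[2] + a[3]
}

fn main() {
    // Thin pointer: one word, like `struct T *ptr`.
    assert_eq!(size_of::<&[u32; 4]>(), size_of::<usize>());

    let arr = [1u32, 2, 3, 4];
    assert_eq!(sum4(&arr), 10);

    // Coercing to a slice reference adds the length, giving a fat pointer.
    let s: &[u32] = &arr;
    assert_eq!(s.len(), 4);
    assert_eq!(size_of::<&[u32]>(), 2 * size_of::<usize>());
}
```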

T, [T; n], and [T]

While these aren’t pointers, I thought I’d include them here for the sake of completeness.

T

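Just like in C, a struct uses as much space as its type requires (i.e., the sum of the sizes of its members plus padding). One caveat: unless the struct is marked #[repr(C)], rustc is free to reorder the members to reduce padding, so the size and layout may not match an otherwise-equivalent C struct.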

[T; n]

Just like in C, an array of structs uses n times the size of the struct.
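Both rules can be checked with std::mem::size_of. This sketch uses the 100-byte T from the summary plus a hypothetical Padded struct marked #[repr(C)], so that its fields stay in declaration order and get C-style padding: 1 + 4 bytes of members become 8 bytes once b is aligned to 4.

```rust
use std::mem::{align_of, size_of};

/// The summary's 100-byte `T`.
#[allow(dead_code)]
struct T {
    stuff: [u8; 100],
}

/// Hypothetical struct with C-compatible layout: declaration order, C padding.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,  // offset 0, followed by 3 bytes of padding
    b: u32, // offset 4
}

fn main() {
    assert_eq!(size_of::<T>(), 100);
    assert_eq!(size_of::<[T; 2]>(), 200); // n * size_of::<T>()

    // Sum of members (1 + 4) plus padding to keep `b` 4-byte aligned.
    assert_eq!(size_of::<Padded>(), 8);
    assert_eq!(align_of::<Padded>(), 4);
    assert_eq!(size_of::<[Padded; 3]>(), 3 * size_of::<Padded>());
}
```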

[T]

The simple answer here is that you cannot declare a variable of type [T]. That actually makes perfect sense when you consider what the type means: some variable-sized slice of memory that we want to access as elements of type T. Since it is variable-sized, the compiler cannot possibly reserve space for it at compile time, and so we get a compiler error.

The more complicated answer involves the Sized trait, which I’ve skillfully managed to avoid thus far and so you are on your own.
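As one of the comments below points out, a [T] can still exist on the heap, for example via Vec::into_boxed_slice. This sketch (boxed is a hypothetical wrapper, u32 stands in for T) shows that the [u32] itself has a size only known at run time, while the Box<[u32]> handle to it is a two-word fat pointer, just like &[T].

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical wrapper: turns a Vec into an owned, exactly-sized heap slice.
fn boxed(v: Vec<u32>) -> Box<[u32]> {
    v.into_boxed_slice()
}

fn main() {
    // let s: [u32] = ...;  // does not compile: size unknown at compile time

    let b: Box<[u32]> = boxed(vec![1, 2, 3]);
    // The `[u32]` lives on the heap; its size is computed at run time.
    assert_eq!(size_of_val(&*b), 3 * size_of::<u32>());
    // The handle to it is a fat pointer: data pointer + length.
    assert_eq!(size_of::<Box<[u32]>>(), 2 * size_of::<usize>());
    assert_eq!(b.len(), 3);
}
```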

Summary

That was a lot of text, so I decided to compact it and make the following table. In the table, I assume that our T struct is 100 bytes in size. In other words:

/* Rust */
struct T {
    stuff: [u8; 100],
}

/* C */
struct T {
	uint8_t stuff[100];
};

Now, the table in its full glory:

Kind	Rust	C	Size on ILP32/LP64 (bytes)
Value	let x: T;	struct T x;	100/100
Raw pointer	let x: *const T; / let x: *mut T;	struct T *x;	4/8
Reference	let x: &T; / let x: &mut T;	struct T *x;	4/8
Box	let x: Box<T>;	struct box_of_T { struct T *heap_ptr; }; struct box_of_T x;	4/8
Array of 2	let x: [T; 2];	struct T x[2];	200/200
Reference to an array of 2	let x: &[T; 2];	struct T *x;	4/8
Slice	let x: [T];	struct T x[];	unknown at compile time (cannot be declared)
Reference to a slice	let x: &[T];	struct fat_pointer_to_T { struct T *ptr; size_t nelem; }; struct fat_pointer_to_T x;	8/16
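If you want to see what your own compiler and target do, a sketch like this prints the sized rows of the table. The table function and its labels are made up for illustration; the numbers come straight from std::mem::size_of.

```rust
use std::mem::size_of;

/// The summary's 100-byte struct.
#[allow(dead_code)]
struct T {
    stuff: [u8; 100],
}

/// Hypothetical helper: (row label, size in bytes) for each sized table row.
fn table() -> Vec<(&'static str, usize)> {
    vec![
        ("T", size_of::<T>()),
        ("*const T / *mut T", size_of::<*const T>()),
        ("&T / &mut T", size_of::<&T>()),
        ("Box<T>", size_of::<Box<T>>()),
        ("[T; 2]", size_of::<[T; 2]>()),
        ("&[T; 2]", size_of::<&[T; 2]>()),
        ("&[T]", size_of::<&[T]>()),
    ]
}

fn main() {
    for (label, bytes) in table() {
        println!("{:<20} {:>3} bytes", label, bytes);
    }
}
```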

A word of caution: I assume that the sizes of the various pointers are implementation details and shouldn’t be relied on. (Raw pointers are the exception: if their size weren’t fixed, FFI would be unnecessarily complicated.)

I didn’t cover str, &str, String, and Vec<T> since I don’t consider them fundamental types, but rather convenience types built on top of slices, structs, references, and boxes.

Anyway, I hope you found this useful. If you have any feedback (good or bad), let me know.

Comments (11)


This is in your table:
```
    let x: Box;
```
It should probably be:
```
    let x: Box<T>;
```
Comment by Michael "notriddle" Howell — January 28, 2018 @ 17:00
You're right. I did have <T> there, but because of how I created the table my blahging software passed it along as raw HTML. Anyway, I fixed it. Thanks!
Comment by JeffPC — January 28, 2018 @ 18:12
Nice post! One thing to note: unless you explicitly use #[repr(C)] on a struct, the memory layout of its members will not necessarily match that of a C struct--the Rust compiler is free to rearrange members to minimize the need for padding, so the size may not match an otherwise-equivalent C struct. If you do use #[repr(C)] then it will match, naturally.
Comment by Ted Mielczarek — January 29, 2018 @ 00:04
Thanks! Good point about the structure member ordering. I want to play more with FFI—specifically, implementing functions in Rust that C applications can use directly. Maybe I'll blahg about it at some point ;)
Comment by JeffPC — January 29, 2018 @ 03:33
Small nitpick, if you just read "let x: &T;", you cannot quite say that the size of x is 4/8 or that it corresponds to "struct T *x;".

If T happens to be a Trait, x is a Trait Object, which is also represented as a fat pointer in rust. And so its size will be 8/16 and its representation will be a struct with a pointer to data and a vtable pointer.

Comment by Anonymous — June 30, 2018 @ 15:04
cannot make a [T]
There's a little nuance here, as unsafe code is able to create fat pointers to those slices without references: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.into_boxed_slice.
Comment by anp — June 30, 2018 @ 15:57
I've always considered Box<T> equivalent to C++'s std::unique_ptr<T>, although that might not help old-school ansi-C programmers ;-)
Comment by Frans-Willem — June 30, 2018 @ 20:44
Anonymous: I have to admit that I haven't looked enough at traits to understand them at a level beyond the trivial thing you see on the surface. I'm sure that trait object pointers are useful for something, but I can't say that I can think of any. Anyway, thanks for the warning about trait objects being fat pointers.

anp: Right. I meant that as a you cannot just declare one statically. It is still a valid type, and you can certainly have code that produces one. (Also, unsafe code can do just about anything, so no surprise there.)

Frans-Willem: Agreed. Once I understood all the rust pointer types, I remembered that C++ has these fancy pointers, which much like Box<T> in rust are not real pointers but rather structs/classes wrapping a pointer. I just happen to not write C++ and to not think in C++ terms. :)

Comment by Jeffpc — July 2, 2018 @ 11:54
A quick comment to let you know that I have published your article into the #77 embedded systems newsletter. Thank you for this good article https://embedsysweekly.com/embedded-systems-issue77/
Comment by Guillaume — July 8, 2018 @ 10:47
Cool! Thanks for spreading the word!
Comment by JeffPC — July 25, 2018 @ 21:34
Thanks very much for this post. I have a good deal of experience with C/C++ but I'm at the about-to-write-a-silly-little-program stage with Rust. Your article clarified and solidified a number of things on my mind after reading the Rust book.
Comment by Adam Schabtach — June 22, 2021 @ 15:24


Josef “Jeff” Sipek