Josef “Jeff” Sipek

2020-05-06

OpenMCT — While I’m not a fan of web-based UIs, this is a rather neat “dashboard” framework by NASA.

Wideband spectrum received in JO32KF — Over 5 years of HF spectrum waterfall in Enschede, NL.

10 Most(ly dead) Influential Programming Languages

Wikipedia article: PACELC theorem — An extension of the Wikipedia article: CAP theorem.

Learn Rust the Dangerous Way — Finally a Rust tutorial that speaks to people comfortable in C.

Interferometry and Synthesis in Radio Astronomy — An open access book.

Aviation Formulary — Great circle math applied to various aviation problems for those too lazy to derive the formulas themselves.

Papírová platidla Československa 1918-1993, České republiky a Slovenské republiky 1993-2016 — Complete list of all bank notes used in Czechoslovakia, Czech Republic, and Slovak Republic.

NOAA GOES Image ViewerWikipedia article: GOES weather satellite imagery.

Rust Pointers for C Programmers

I’ve been eyeing Rust for about a year now. Here and there, I tried to use it to make a silly little program, or to implement some simple function in it to see for myself how ergonomic it really was, and what sort of machine code rustc spit out. But last weekend I found a need for a tool to clean up some preprocessor mess, and so instead of hacking together some combination of shell and Python, I decided to write it in Rust.

From my earlier attempts, I knew that there are a lot of different “pointers” but I found all the descriptions of them lacking or confusing. Specifically, Rust calls itself a systems programming language, yet I found no clear description of how the different pointers map to C—the systems programming language. Eventually, I stumbled across The Periodic Table of Rust Types, which made things a bit clearer, but I still didn’t feel like I truly understood.

During my weekend expedition to Rust land, I think I’ve grokked things enough to write this explanation of how Rust does things. As always, feedback is welcomed.

I’ll describe what happens in terms of C. To keep things simple, I will:

  • assume that you are well-versed in C
  • assume that you can read Rust (any intro will teach you enough)
  • not bother with const for the C snippets
  • not talk about mutability

In the following text, I assume that we have some struct T. The actual contents don’t matter. In other words:

struct T {
	/* some members */
};

With that out of the way, let’s dive in!

*const T and *mut T

These are raw pointers. In general, you shouldn’t use them since only unsafe code can dereference them, and the whole point of Rust is to write as much safe code as possible.

Raw pointers are just like what you have in C. If you make a pointer, you end up using sizeof(struct T *) bytes for the pointer. In other words:

struct T *ptr;

&T and &mut T

These are borrowed references. They use the same amount of space as raw pointers and behave same exact way in the generated machine code. Consider this trivial example:

#[no_mangle]
pub fn raw(p: *mut usize) {
    unsafe {
        *p = 5;
    }

}

#[no_mangle]
pub fn safe(p: &mut usize) {
    *p = 5;
}

A rustc invocation later, we have:

raw()
    raw:     55                 pushq  %rbp
    raw+0x1: 48 89 e5           movq   %rsp,%rbp
    raw+0x4: 48 c7 07 05 00 00  movq   $0x5,(%rdi)
             00 
    raw+0xb: 5d                 popq   %rbp
    raw+0xc: c3                 ret    

safe()
    safe:     55                 pushq  %rbp
    safe+0x1: 48 89 e5           movq   %rsp,%rbp
    safe+0x4: 48 c7 07 05 00 00  movq   $0x5,(%rdi)
              00 
    safe+0xb: 5d                 popq   %rbp
    safe+0xc: c3                 ret    

Note that the two functions are bit-for-bit identical.

The only differences between borrowed references and raw pointers are:

  1. references will never point at bogus addresses (i.e., they are never NULL or uninitialized),
  2. the compiler doesn’t let you do arbitrary pointer arithmetic on references,
  3. the borrow checker will make you question your life choices for a while.

(#3 gets better over time.)

Box<T>

These are owned “pointers”. If you are a C++ programmer, you are already familiar with them. Never having truly worked with C++, I had to think about this a bit until it clicked, but it is really easy.

No matter what all the documentation and tutorials out there say, Box<T> is not a pointer but rather a structure containing a pointer to heap allocated memory just big enough to hold T. The heap allocation and freeing is handled automatically. (Allocation is done in the Box::new function, while freeing is done via the Drop trait, but that’s not relevant as far as the memory layout is concerned.) In other words, Box<T> is something like:

struct box_of_T {
	struct T *heap_ptr;
};

Then, when you make a new box you end up putting only what amounts to sizeof(struct T *) on the stack and it magically starts pointing to somewhere on the heap. In other words, the Rust code like this:

let x = Box::new(T { ... });

is roughly equivalent to:

struct box_of_t x;

x.heap_ptr = malloc(sizeof(struct T));
if (!x.heap_ptr)
	oom();

*x.heap_ptr = ...;

&[T] and &mut [T]

These are borrowed slices. This is where things get interesting. Even though it looks like they are just references (which, as stated earlier, translates into a simple C-style pointer), they are much more. These types of references use fat pointers—that is, a combination of a pointer and a length.

struct fat_pointer_to_T {
	struct T *ptr;
	size_t nelem;
};

This is incredibly powerful, since it allows bounds checking at runtime and getting a subset of a slice is essentially free!

&[T; n] and &mut [T; n]

These are borrowed references to arrays. They are different from borrowed slices. Since the length of an array is a compile-time constant (the compiler will yell at you if n is not a constant), all the bounds checking can be performed statically. And therefore there is no need to pass around the length in a fat pointer. So, they are passed around as plain ol’ pointers.

struct T *ptr;

T, [T; n], and [T]

While these aren’t pointers, I thought I’d include them here for completeness’s sake.

T

Just like in C, a struct uses as much space as its type requires (i.e., sum of the sizes of its members plus padding).

[T; n]

Just like in C, an array of structs uses n times the size of the struct.

[T]

The simple answer here is that you cannot make a [T]. That actually makes perfect sense when you consider what that type means. It is saying that we have some variable sized slice of memory that we want to access as elements of type T. Since this is variable sized, the compiler cannot possibly reserve space for it at compile time and so we get a compiler error.

The more complicated answer involves the Sized trait, which I’ve skillfully managed to avoid thus far and so you are on your own.

Summary

That was a lot of text, so I decided to compact it and make the following table. In the table, I assume that our T struct is 100 bytes in size. In other words:

/* Rust */
struct T {
    stuff: [u8; 100],
}

/* C */
struct T {
	uint8_t stuff[100];
};

Now, the table in its full glory:

Rust C Size on
ILP32/LP64
(bytes)
Value
let x: T;
struct T x;
100/100
Raw pointer
let x: *const T;
let x: *mut T;
struct T *x;
4/8
Reference
let x: &T;
let x: &mut T;
struct T *x;
4/8
Box
let x: Box<T>;
struct box_of_T {
	struct T *heap_ptr;
};

struct box_of_T x;
4/8
Array of 2
let x: [T; 2];
struct T x[2];
200/200
Reference to
an array of 2
let x: &[T; 2];
struct T *x;
4/8
A slice
let x: [T];
struct T x[];
unknown at
compile time
A reference
to a slice
let x: &[T];
struct fat_ptr_to_T {
	struct T *ptr;
	size_t nelem;
};

struct fat_ptr_to_T x;
8/16

A word of caution: I assume that the sizes of the various pointers are actually implementation details and shouldn’t be relied on to be that way. (Well, with the exception of raw pointers - without those being fixed FFI would be unnecessarily complicated.)

I didn’t cover str, &str, String, and Vec<T> since I don’t consider them fundamental types, but rather convenience types built on top of slices, structs, references, and boxes.

Anyway, I hope you found this useful. If you have any feedback (good or bad), let me know.

2017-04-22

The CRAPL: An academic-strength open source license

How the PC Industry Screws Things Up

C to Rust translator

Csmith — a random generator of C programs

ACME Mapper — high-precision general purpose mapping application

The On-line Verb Conjugator

Powered by blahgd