iTranslated by AI
Introducing the nonmax crate providing NonMax types
Overview
This article introduces the nonmax (0.5.5) crate for Rust (1.89.0), demonstrates an implementation of Dijkstra's algorithm using it, and provides a simple performance comparison.
The nonmax Crate
The nonmax crate is similar to the NonZero type in the standard library. Therefore, I will first introduce the NonZero type.
NonZero types are implemented as NonZero(integer type), such as NonZeroU32 or NonZeroIsize. These types can be used when a value is known to be non-zero, enabling memory layout optimization.
use std::num::NonZeroU64;
assert_eq!(std::mem::size_of::<u64>(), 8);
assert_eq!(std::mem::size_of::<Option<u64>>(), 16);
assert_eq!(std::mem::size_of::<Option<NonZeroU64>>(), 8);
On the other hand, there are cases where a value might be 0 but is known never to take its maximum value. In such cases, the NonMax type can be used.
use nonmax::NonMaxI32;
assert_eq!(std::mem::size_of::<i32>(), 4);
assert_eq!(std::mem::size_of::<Option<i32>>(), 8);
assert_eq!(std::mem::size_of::<Option<NonMaxI32>>(), 4);
Internal Implementation
I will also briefly explain the internal implementation.
In short, NonMaxT stores the bitwise XOR of T and T::MAX inside a NonZeroT. This allows the constraints of NonMaxT to be represented by the constraints of NonZeroT.
Taking u64 as an example, it can be written as follows:
#[derive(Debug, Clone, Copy)]
struct NonMaxU64(NonZeroU64);
impl NonMaxU64 {
fn new(value: u64) -> Option<Self> {
match NonZeroU64::new(value ^ u64::MAX) {
Some(value) => Some(Self(value)),
None => None,
}
}
unsafe fn new_unchecked(value: u64) -> Self {
Self(unsafe { NonZeroU64::new_unchecked(value ^ u64::MAX) })
}
fn get(&self) -> u64 {
self.0.get() ^ u64::MAX
}
}
Checking the Assembly of NonZero Types
Let's use Compiler Explorer to briefly examine the assembly of the NonZero type. We will analyze the assembly for the following code.
use std::hint::black_box;
use std::num::NonZeroUsize;
let a = black_box(NonZeroUsize::new(5));
match a {
Some(_) => {
println!("Non Zero!")
}
None => {
println!("None!")
}
}
Looking at the assembly up to the match expression, it looks like this:
sub rsp, 56
mov qword ptr [rsp], 5
mov rax, rsp
cmp qword ptr [rsp], 0
je .LBB0_1
We can see that in the Option<NonZeroUsize> type, 0 is treated as None. In this way, memory optimization is performed by using 0 as a flag to check whether the value is None. This is known as "niche optimization."
This means that the NonMax type also benefits from this memory layout optimization by utilizing the NonZero type internally.
Details regarding compilation options and other specifics can be found here.
Generated assembly
example::a::h9680fd3bb6034c0f:
sub rsp, 56
mov qword ptr [rsp], 5
mov rax, rsp
cmp qword ptr [rsp], 0
je .LBB0_1
lea rax, [rip + .L__unnamed_1]
jmp .LBB0_3
.LBB0_1:
lea rax, [rip + .L__unnamed_2]
.LBB0_3:
mov qword ptr [rsp + 8], rax
mov qword ptr [rsp + 16], 1
mov qword ptr [rsp + 24], 8
xorps xmm0, xmm0
movups xmmword ptr [rsp + 32], xmm0
lea rdi, [rsp + 8]
call qword ptr [rip + std::io::stdio::_print::he9dfbe767523a89e@GOTPCREL]
add rsp, 56
ret
.L__unnamed_3:
.ascii "None!\n"
.L__unnamed_2:
.quad .L__unnamed_3
.asciz "\006\000\000\000\000\000\000"
.L__unnamed_4:
.ascii "Non Zero!\n"
.L__unnamed_1:
.quad .L__unnamed_4
.asciz "\n\000\000\000\000\000\000"
Using the NonMax Type for Dijkstra's Algorithm Implementation
The competitive programming problem A64 - Shortest Path 2 is a task that uses Dijkstra's algorithm, so we aim to implement a program to solve it. I have extracted the necessary information from the problem description below.
Problem Statement
Please solve the shortest path problem for a weighted undirected graph. Specifically, given a graph like the one described below, find the shortest path length from vertex 1 to each vertex.
- The number of vertices is
, and the number of edges is N . M - The
-th edge connects vertex i and vertex A_i , and its length is B_i . C_i On the
-th line, output the shortest path length from vertex 1 to vertex k . However, if vertex k is unreachable, output k instead. -1 Constraints
2 \leq N \leq 100000
1 \leq M \leq \min(100000, N(N-1)/2)
1 \leq C_i \leq 10000
All inputs are integers.
When implementing Dijkstra's algorithm, there are several common patterns for the type representing the shortest path length to vertex T.
- Represent the shortest path length to vertex
ask Option<T>, where unreachability is represented byNone. - Represent the shortest path length to vertex
ask T, where unreachability is represented by a value that can never actually occur, such asT::MAX.
Pattern 1 is a natural approach that leverages the type system, but it doesn't allow for memory layout optimization. Pattern 2 allows for memory layout optimization, but having to embed specific magic values feels less than ideal.
In such cases, using NonMaxT allows you to benefit from the type system while also gaining the advantages of memory layout optimization.
So, let's implement Dijkstra's algorithm using NonMaxT. That said, the nonmax crate is not available on AtCoder, and since the NonMaxT type in the nonmax crate does not support Add, I have implemented only the necessary parts myself.
use std::fmt::Display;
use std::num::NonZeroU64;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NonMaxU64(NonZeroU64);
impl NonMaxU64 {
fn new(value: u64) -> Option<Self> {
match NonZeroU64::new(value ^ u64::MAX) {
Some(value) => Some(Self(value)),
None => None,
}
}
fn get(&self) -> u64 {
self.0.get() ^ u64::MAX
}
fn checked_add(&self, other: Self) -> Option<Self> {
Self::new(self.get() + other.get())
}
}
impl Display for NonMaxU64 {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
self.get().fmt(f)
}
}
impl PartialOrd for NonMaxU64 {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
self.get().partial_cmp(&other.get())
}
}
impl Ord for NonMaxU64 {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.partial_cmp(other).unwrap()
}
}
Using this, you can implement a typical Dijkstra's algorithm as follows. [1]
fn dijkstras_algorithm(
size: usize,
edges: &[(usize, usize, NonMaxU64)],
start: usize,
) -> Vec<Option<NonMaxU64>> {
let adjacency_list: Vec<Vec<(usize, NonMaxU64)>> = {
let mut adjacency_list = vec![vec![]; size];
for &(u, v, w) in edges {
adjacency_list[u].push((v, w));
adjacency_list[v].push((u, w));
}
adjacency_list
};
let mut dist = vec![None; size];
dist[start] = NonMaxU64::new(0);
let mut hq = BinaryHeap::new();
hq.push(Reverse((NonMaxU64::new(0).unwrap(), start)));
while let Some(Reverse((d, v))) = hq.pop() {
if let Some(dm) = dist[v]
&& d > dm
{
continue;
}
for &(nxt, w) in adjacency_list[v].iter() {
let d = d.checked_add(w).unwrap();
match dist[nxt] {
Some(dm) if d >= dm => {}
_ => {
hq.push(std::cmp::Reverse((d, nxt)));
dist[nxt] = Some(d);
}
}
}
}
return dist;
}
Comparison
- The submission using
NonMaxU64had an execution time of and used30 \text{ms} of memory. [2]19388 \text{KiB}
Submitted code
use std::io::{self, prelude::*};
use std::{cmp::Reverse, collections::BinaryHeap, fmt::Display, num::NonZeroU64};
fn main() {
let (stdin, stdout) = (io::read_to_string(io::stdin()).unwrap(), io::stdout());
let (mut stdin, mut buffer) = (stdin.split_whitespace(), io::BufWriter::new(stdout.lock()));
macro_rules! input {
(Usize1) => {
stdin.next().unwrap().parse::<usize>().unwrap() - 1
};
($t: ty) => {
stdin.next().unwrap().parse::<$t>().unwrap()
};
}
let n = input!(usize);
let m = input!(usize);
let edges = (0..m)
.map(|_| {
(
input!(Usize1),
input!(Usize1),
NonMaxU64::new(input!(u64)).unwrap(),
)
})
.collect::<Vec<_>>();
let dist = dijkstras_algorithm(n, &edges, 0);
for d in dist {
match d {
Some(d) => {
let _ = writeln!(buffer, "{}", d);
}
None => {
let _ = writeln!(buffer, "-1");
}
}
}
}
fn dijkstras_algorithm(
size: usize,
edges: &[(usize, usize, NonMaxU64)],
start: usize,
) -> Vec<Option<NonMaxU64>> {
let adjacency_list: Vec<Vec<(usize, NonMaxU64)>> = {
let mut adjacency_list = vec![vec![]; size];
for &(u, v, w) in edges {
adjacency_list[u].push((v, w));
adjacency_list[v].push((u, w));
}
adjacency_list
};
let mut dist = vec![None; size];
dist[start] = NonMaxU64::new(0);
let mut hq = BinaryHeap::new();
hq.push(Reverse((NonMaxU64::new(0).unwrap(), start)));
while let Some(Reverse((d, v))) = hq.pop() {
if let Some(dm) = dist[v]
&& d > dm
{
continue;
}
for &(nxt, w) in adjacency_list[v].iter() {
let d = d.checked_add(w).unwrap();
match dist[nxt] {
Some(dm) if d >= dm => {}
_ => {
hq.push(std::cmp::Reverse((d, nxt)));
dist[nxt] = Some(d);
}
}
}
}
return dist;
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NonMaxU64(NonZeroU64);
impl NonMaxU64 {
fn new(value: u64) -> Option<Self> {
match NonZeroU64::new(value ^ u64::MAX) {
Some(value) => Some(Self(value)),
None => None,
}
}
fn get(&self) -> u64 {
self.0.get() ^ u64::MAX
}
fn checked_add(&self, other: Self) -> Option<Self> {
Self::new(self.get() + other.get())
}
}
impl Display for NonMaxU64 {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
(self.get()).fmt(f)
}
}
impl PartialOrd for NonMaxU64 {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
self.get().partial_cmp(&other.get())
}
}
impl Ord for NonMaxU64 {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.partial_cmp(other).unwrap()
}
}
- The submission using
u64had an execution time of and used38 \text{ms} of memory.20228 \text{KiB}
Submitted code
use std::io::{self, prelude::*};
use std::{cmp::Reverse, collections::BinaryHeap};
fn main() {
let (stdin, stdout) = (io::read_to_string(io::stdin()).unwrap(), io::stdout());
let (mut stdin, mut buffer) = (stdin.split_whitespace(), io::BufWriter::new(stdout.lock()));
macro_rules! input {
(Usize1) => {
stdin.next().unwrap().parse::<usize>().unwrap() - 1
};
($t: ty) => {
stdin.next().unwrap().parse::<$t>().unwrap()
};
}
let n = input!(usize);
let m = input!(usize);
let edges = (0..m)
.map(|_| (input!(Usize1), input!(Usize1), input!(u64)))
.collect::<Vec<_>>();
let dist = dijkstras_algorithm(n, &edges, 0);
for d in dist {
match d {
Some(d) => {
let _ = writeln!(buffer, "{}", d);
}
None => {
let _ = writeln!(buffer, "-1");
}
}
}
}
fn dijkstras_algorithm(
size: usize,
edges: &[(usize, usize, u64)],
start: usize,
) -> Vec<Option<u64>> {
let adjacency_list: Vec<Vec<(usize, u64)>> = {
let mut adjacency_list = vec![vec![]; size];
for &(u, v, w) in edges {
adjacency_list[u].push((v, w));
adjacency_list[v].push((u, w));
}
adjacency_list
};
let mut dist = vec![None; size];
dist[start] = Some(0);
let mut hq = BinaryHeap::new();
hq.push(Reverse((0, start)));
while let Some(Reverse((d, v))) = hq.pop() {
if let Some(dm) = dist[v]
&& d > dm
{
continue;
}
for &(nxt, w) in adjacency_list[v].iter() {
let d = d + w;
match dist[nxt] {
Some(dm) if d >= dm => {}
_ => {
hq.push(std::cmp::Reverse((d, nxt)));
dist[nxt] = Some(d);
}
}
}
}
return dist;
}
Regarding execution speed, the submission using NonMaxU64 was slightly faster.
Regarding memory usage, the memory size difference is
Afterword
While reading a book on mathematical optimization, I came across a chapter on shortest path problems. As I thought about the implementation, I suddenly felt that it would be nice to have something like a NonMax type similar to the NonZero type, which led me to find the nonmax crate.
When writing programs in Rust, I want to enjoy the benefits of the type system, but it feels sad if performance is sacrificed, even if only slightly. In that sense, it feels great when you can achieve both like this.
Links
-
nonmax - crates.io
- The crate introduced in this article.
-
A64 - Shortest Path 2
- Used to examine the performance of Dijkstra's algorithm.
Discussion