iTranslated by AI

The content below is an AI-generated translation. This is an experimental feature, and may contain errors. View original article
🦀️

SipHash and Its Implementation in the Rust Standard Library

に公開

This article is for Day 14 of the Rust Advent Calendar 2023.

Introduction

SipHash is the hash algorithm used by default in places like std::collections::HashMap in the Rust standard library.
It is resistant to HashDoS attacks and suitable as a default algorithm, but since I personally tend to casually switch to other (non-resistant) algorithms when I want to speed up a HashMap, I decided to investigate it properly this time.

What is SipHash

Overview

SipHash is a type of PRF that works as follows:

It generates a 64-bit hash value from a 128-bit secret key and an input of arbitrary length.
As long as the contents of the secret key remain unknown, it is impossible to predict what hash value an input will produce, which is why it is resistant to HashDoS attacks.

Secret Key Generation in Rust

In the Rust standard library, the secret key is generated by obtaining a random number from the OS only at the beginning, and thereafter, that value is maintained in thread_local and incremented by one each time.

https://github.com/rust-lang/rust/blob/1a3aa4ad149438a9b11f758c16df9e222de51f91/library/std/src/hash/random.rs#L55-L77

The comments mention that obtaining a random number from the OS every time was slow.

Algorithm

Let's examine the SipHash algorithm along with its implementation in the Rust standard library.

  1. Initialize the internal state with two u64 values, k0 and k1 (a total of 128 bits for the secret key), and fixed constants.

https://github.com/rust-lang/rust/blob/1a3aa4ad149438a9b11f758c16df9e222de51f91/library/core/src/hash/sip.rs#L204-L213

The constants being XORed come from the ASCII codes of b'somepseudorandomgeneratedbytes'.
It seems the goal is simply to ensure each state v0-3 starts with a different value.

# python
>>> ''.join(['%x' % c for c in b'somepseudorandomgeneratedbytes'])
'736f6d6570736575646f72616e646f6d67656e6572617465646279746573'
  1. Read the input 8 bytes at a time (until less than 8 bytes of input remain) and update the internal state.

mi receives the input in 64-bit segments.
What it does is simply XOR the input into the internal state before and after calling c_rounds.

https://github.com/rust-lang/rust/blob/1a3aa4ad149438a9b11f758c16df9e222de51f91/library/core/src/hash/sip.rs#L286-L297

As for the crucial c_rounds, the Rust standard library uses a variant called SipHash-1-3, so it calls compress! only once.
The "1" in SipHash-1-3 indicates that compress! is called exactly once for each 8-byte chunk of input.

https://github.com/rust-lang/rust/blob/1a3aa4ad149438a9b11f758c16df9e222de51f91/library/core/src/hash/sip.rs#L363-L375

compress! is defined as a macro (perhaps they didn't trust the compiler's optimization?).
In any case, it updates the internal state by repeating three types of operations: wrapping_add (Add), rotate_left, and XOR.
Functions that use only these three types of operations are called ARX, named after their initials.
They have the advantage of running quickly even on low-end CPUs because they don't require special instructions (like AES instructions).

https://github.com/rust-lang/rust/blob/1a3aa4ad149438a9b11f758c16df9e222de51f91/library/core/src/hash/sip.rs#L75-L93

ARX Network
Image cited from SipHash: a fast short-input PRF Jean-Philippe Aumasson and Daniel J. Bernstein

The number of rotations for each step was apparently determined through mechanical testing.

  1. Final Output

When less than 8 bytes of input remain, it updates the internal state by incorporating input length % 255, then XORs 0xff into v2, calls d_rounds to update the state, and finally XORs v0-3 together for the output.
Since it's SipHash-1-3, d_rounds calls compress! three times.

https://github.com/rust-lang/rust/blob/1a3aa4ad149438a9b11f758c16df9e222de51f91/library/core/src/hash/sip.rs#L314-L329

SipHash Variants

The notation SipHash-c-d represents the number of rounds in SipHash. It calls compress! c times for each 8-byte chunk and d times at the end.
While the paper recommends SipHash-2-4, the Rust standard library uses SipHash-1-3.
According to SipHash designer Jean-Philippe Aumasson[1], SipHash-1-3 appears to be sufficient (https://github.com/rust-lang/rust/issues/29754).

Comparison with Other Algorithms

SipHash is faster than HMAC-SHA1, which was its obvious competitor at the time, and it is particularly optimized for short inputs, making it suitable for use in data structures like HashMap.
While HMAC-SHA1 is secure, the reasoning seems to be that since a secret key is used anyway, security can be maintained even with less computational effort than SHA1 (broad paraphrase).

History

脚注
  1. [author of Serious Cryptography] ↩︎

GitHubで編集を提案

Discussion