iTranslated by AI
What I Got Wrong About ULID Timestamps and Sorting
Introduction
ULID is a system that allows for sorting by attaching a string based on a timestamp and ultimately provides a unique random string.
I had this understanding.
Therefore, I thought that as long as I used ULID for numbering, they would be sortable.
However, that understanding was partially incorrect.
To verify this, please look at the following code.
import { ulid } from 'ulid'
const ids = [1, 2, 3, 4, 5, 6, 7].map(v => ({ id: v, ulid: uli() }))
console.log(ids.sort((a, b) => {
if (a.ulid > b.ulid) {
return -1
}
if (a.ulid < b.ulid) {
return 1
}
return 0;
}))
ulid() is provided by the ulid package and is a function that generates a ULID.
I am creating an array with these as values.
Now, please try executing this.
According to my original understanding of ULID, the object with the id value of 7 should be at the beginning of the array, followed by 6, 5, 4... in descending order.
However, if you actually run it about three times, the results will look like this:
/** 1st execution */
[
{ id: 3, ulid: "01J6VWSHZJYTCEMWNQ9V5VXQ80" },
{ id: 6, ulid: "01J6VWSHZJMBRT6225XZ7J5WR4" },
{ id: 5, ulid: "01J6VWSHZJG8TDKD2SXD64QZV5" },
{ id: 7, ulid: "01J6VWSHZJ90VFMF45DPHJBCRD" },
{ id: 4, ulid: "01J6VWSHZJ31QSMCV7MSSTJ3QG" },
{ id: 2, ulid: "01J6VWSHZHW2D3PVCSGX4W211X" },
{ id: 1, ulid: "01J6VWSHZH0X8MB18EEC1BH2M3" },
][
/** 2nd execution */
({ id: 2, ulid: "01J6VWSE7DXAA35D718XHGFKTH" },
{ id: 3, ulid: "01J6VWSE7DM4HRXBZ6X5DNFMGJ" },
{ id: 7, ulid: "01J6VWSE7DBDTYFXJ6SYTRJ8T3" },
{ id: 6, ulid: "01J6VWSE7D9B0CJGMGMJA5391Z" },
{ id: 4, ulid: "01J6VWSE7D6N4985P1XWA5SKVX" },
{ id: 5, ulid: "01J6VWSE7D4Q29SY2QMV4SA4TX" },
{ id: 1, ulid: "01J6VWSE7CG76MK61F2689J96W" })
][
/** 3rd execution */
({ id: 7, ulid: "01J6VWS9NSGWKWD7Q0GEN69T23" },
{ id: 6, ulid: "01J6VWS9NS8YMWN7Y16GR881H2" },
{ id: 3, ulid: "01J6VWS9NRSR0S97EDXGJ03ZCN" },
{ id: 4, ulid: "01J6VWS9NRQ8SCWJHDAW2HTNWB" },
{ id: 2, ulid: "01J6VWS9NRQ7M81GXMJDSQ9Z0W" },
{ id: 5, ulid: "01J6VWS9NR6C43JR6TZJYME1YB" },
{ id: 1, ulid: "01J6VWS9NQNFWKRC5RMT4S2HT2" })
];
The order is not stable at all.
This feels strange if we consider the fact that ULIDs are sortable because they use timestamps.
However, this is normal behavior.
In this article, we will examine why this ULID behavior is normal and look at how to sort even in cases like the sample code.
First, Let's Confirm What ULID Is
To determine if the ULID's behavior is correct, we need to confirm what a ULID actually is.
According to the documentation, a ULID is composed as follows:
01AN4Z07BY 79KA1307SR9X4MV3
|----------| |----------------|
Timestamp Randomness
48bits 80bits
The structure is described as follows:
Components
Timestamp
- 48 bit integer
- UNIX-time in milliseconds
- Won't run out of space 'til the year 10889 AD.
Randomness
- 80 bits
- Cryptographically secure source of randomness, if possible
The first 48 bits represent the UNIX time in milliseconds when the ULID was generated.
The remainder is formed from a random string that is as cryptographically secure as possible.
Regarding UNIX time, as mentioned above, it is guaranteed to work correctly until the year 10889.
And, because it will be relevant later, let me emphasize that ULID outputs an encoded version of the UNIX time in milliseconds.
Now that I've described the structure of ULID, let's also look at its characteristics.
However, since this part has little direct relation to the main theme of this article, you can skip it if you like.
Checking the documentation again, the features of ULID are as follows:
- 128-bit compatibility with UUID
- 1.21e+24 unique ULIDs per millisecond
- Lexicographically sortable!
- Canonically encoded as a 26 character string, as opposed to the 36 character UUID
- Uses Crockford's base32 for better efficiency and readability (5 bits per character)
- Case insensitive
- No special characters (URL safe)
- Monotonic sort order (correctly detects and handles the same millisecond)
While a UUID is 36 characters, a ULID is represented by 26 characters.
However, both have a data size of 128 bits, maintaining compatibility.
It can guarantee uniqueness for up to 1.21E+24 (≈1.21x10^24) generations per millisecond.
It is encoded using Base 32, which is 5 bits per character.
The reason why 48 bits were represented by 10 characters in the structure explanation is that they were encoded in Base32.
Finally, ULIDs can be sorted lexicographically.
This is because the first 10 characters of a ULID are based on the UNIX time when it was generated, and the string is generated in correspondence with the generation time.
ULIDs are sortable because they are designed to follow the sorting order of this UNIX time.
The fact that they can be sorted according to this timestamp was exactly as I had understood.
This concludes the overview of ULID.
Digression: Checking How Sorting Is Handled in ULID Libraries
Earlier, I mentioned that ULIDs are sortable because the first 10 characters are a string created based on UNIX time. Now, let's examine how real-world libraries satisfy this requirement of being sortable.
We will look at ulid/javascript. Specifically, we will check the encodeTime function, which converts UNIX time into a string. Note that I have modified the code slightly to make it easier to understand. The code is as follows:
const ENCODING = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";
export function encodeTime(now: number): string {
let mod;
let str = "";
for (let i = 10; i > 0; i--) {
/**
* Ensure the remainder is always between 0 and 31.
* This allows us to retrieve a character from Base32.
*/
mod = now % 32;
/** Retrieve the matching character */
str = ENCODING.charAt(mod) + str;
/** Reassign after removing the remainder */
now = (now - mod) / 32;
}
return str;
}
The function divides the input numerical value by 32 and obtains the remainder. Based on the remainder's value, it retrieves and assigns characters in sequence. Once a character is retrieved, it subtracts the remainder from the original number and divides it by 32. In other words, it retrieves the character matching the last digit in base 32.
Additionally, the characters are set such that the higher their index in the encoding string, the later they appear in the sort order. Furthermore, since UNIX time is a value that inevitably increases as time progresses, the value will always be larger for later times. From this, we can see that ULID is designed to be sortable according to the timestamp. This is because UNIX time is a continuously increasing number, and by repeatedly dividing by 32, a larger initial value will always yield a larger mod value at each step.
As a test, try passing two arbitrary values into the function below. The larger number will always end with a larger value when viewed from the right.
function hoge(num: number) {
let localNum = num;
let str = "";
for (let i = 0; i < 10; i++) {
const mod = localNum % 32;
str = `${mod}` + str;
localNum = (localNum - mod) / 32;
}
console.log(str);
}
I've also provided a playground, so feel free to experiment. ULID takes advantage of the fact that larger numbers translate to strings that appear later lexicographically. Thus, even though ULIDs are strings, they can be sorted by timestamp.
Why Sorting Did Not Work Well with ULID
So far, we have looked at the overview of ULID. Based on that, let's examine why the code in the "Introduction" section could not be sorted correctly.
However, as I have been emphasizing this point, you may have already guessed the reason. That's right, it's because the timestamp ULID refers to is in milliseconds.
Since the timestamp is based on milliseconds, if the process finishes faster than a millisecond, the first 10 characters will be the same value. Therefore, the sorting target becomes the remaining characters. However, since the remaining characters are generated randomly, they do not guarantee any sorting order. This is why the sorting did not work well.
By the way, cases where processing finishes faster than a millisecond can be easily reproduced by using a loop, such as a for statement. For example, if you execute the following code, the output values will sometimes be the same.
for (let i = 0; i < 5; i++) {
console.log(Date.now());
}
/** Output values */
// 1725971252215
// 1725971252217
// 1725971252219
// 1725971252220
// 1725971252220
The last two have the same value. As shown here, in loop processing, the process may complete within the same millisecond.
Looking at the factory function code in ulid/javascript that we saw earlier, it defaults to outputting the millisecond value using Date.now() without exception.
export function factory(currPrng?: PRNG): ULID {
if (!currPrng) {
currPrng = detectPrng();
}
return function ulid(seedTime?: number): string {
if (isNaN(seedTime)) {
seedTime = Date.now();
}
return encodeTime(seedTime, TIME_LEN) + encodeRandom(RANDOM_LEN, currPrng);
};
}
Since the values become the same, sorting is determined by the trailing characters, which do not guarantee any sort order. This is why the initial code failed to sort correctly. It would have been obvious if I had correctly understood that the string generation is based on milliseconds, but assuming it's sortable simply because it uses a timestamp can lead to pitfalls like this.
Even so, If You Want to Sort Within the Same Millisecond
We have confirmed that when using ULID, sorting does not work correctly for generations within the same millisecond. Looking at the definition, the reason is quite understandable. However, there are likely situations where you still want sorting to work even for generations within the same millisecond.
ULID accounts for this case, and the documentation describes how to handle it. This is covered in the Monotonicity section. It states:
When generating a ULID within the same millisecond, we can provide some guarantees regarding sort order. Namely, if the same millisecond is detected, the
randomcomponent is incremented by 1 bit in the least significant bit position (with carrying).
In short, it says that the sorting order within the same millisecond can be guaranteed. Specifically, if the same millisecond is detected, it mentions generating the ULID by incrementing the last generated random string portion by 1 bit.
It's easier to visualize with code, so let's check the sample code for the monotonic generation function provided by ulid/javascript.
import { monotonicFactory } from "ulid";
const ulid = monotonicFactory();
// Strict ordering for the same timestamp, by incrementing the least-significant random bit by 1
ulid(150000); // 000XAL6S41ACTAV9WEVGEMMVR8
ulid(150000); // 000XAL6S41ACTAV9WEVGEMMVR9
ulid(150000); // 000XAL6S41ACTAV9WEVGEMMVRA
ulid(150000); // 000XAL6S41ACTAV9WEVGEMMVRB
ulid(150000); // 000XAL6S41ACTAV9WEVGEMMVRC
// Even if a lower timestamp is passed (or generated), it will preserve sort order
ulid(100000); // 000XAL6S41ACTAV9WEVGEMMVRD
When generated within the same millisecond, you can see that the suffix of the previously created ULID becomes the next character in Base32. Otherwise, you can confirm that the strings are identical. In this way, if you want to create a ULID that can be sorted even within the same millisecond, you can achieve your requirements by generating it with Monotonicity.
However, as a point of caution, this introduces patterns into the ULIDs, making them easier to guess than before. For this reason, they become less suitable as keys for retrieving sensitive information, so you need to use them selectively depending on the situation.
Also, as mentioned in the documentation, this method only guarantees sorting for up to 2^80 generations within a single millisecond.
If, in the extremely unlikely event that, you manage to generate more than 280 ULIDs within the same millisecond, or cause the random component to overflow with less, the generation will fail.
Generation will fail if you exceed this limit, so please be careful when executing processes at ultra-high speeds. Although, to be honest, I can't really imagine a situation where an application would reach that limit...
Lastly, there is a note regarding the library itself rather than ULID itself. Regarding the monotonicFactory function in ulid/javascript, if you do not set the timestamp argument, sorting is not guaranteed unless you have pre-executed the monotonicFactory function to create a generator instance. In terms of code, using it in the following way will not guarantee sorted generation:
import { monotonicFactory } from "ulid";
for (let i = 0; i < 10; i++) {
console.log(monotonicFactory()());
}
The reason for this becomes clear when checking the source code of the monotonicFactory function:
export function monotonicFactory(currPrng?: PRNG): ULID {
if (!currPrng) {
currPrng = detectPrng();
}
let lastTime: number = 0;
let lastRandom: string;
return function ulid(seedTime?: number): string {
if (isNaN(seedTime)) {
seedTime = Date.now();
}
if (seedTime <= lastTime) {
const incrementedRandom = (lastRandom = incrementBase32(lastRandom));
return encodeTime(lastTime, TIME_LEN) + incrementedRandom;
}
lastTime = seedTime;
const newRandom = (lastRandom = encodeRandom(RANDOM_LEN, currPrng));
return encodeTime(seedTime, TIME_LEN) + newRandom;
};
}
Since let lastTime: number = 0 is defined inside the factory, executing it as monotonicFactory()() means it always runs with lastTime at 0. As a result, the branch responsible for handling same-millisecond generation will always evaluate to false. Therefore, it won't behave as intended.
The fix is to extract the ULID generation function first, like this:
import { monotonicFactory } from "ulid";
const generateUlid = monotonicFactory();
for (let i = 0; i < 10; i++) {
console.log(generateUlid());
}
By doing this, lastTime doesn't start from 0 every time; it gets updated with each execution of the generateUlid function. Therefore, it becomes Monotonic when generated within the same millisecond. This isn't directly about the ULID spec, but I shared it because I struggled when it didn't work as expected due to not understanding the internal logic.
Conclusion
In this article, we looked specifically at ULID timestamps and sorting. I had been using it somewhat casually and ran into trouble when things didn't go as expected. However, I'm glad I took the time to research and understand it properly.
I hope this serves as a helpful reference when using ULID as a unique ID that guarantees sorting. Thank you for reading to the end.
Discussion