iTranslated by AI

The content below is an AI-generated translation. This is an experimental feature, and may contain errors. View original article
🦔

Detecting CPU Features at Runtime: x86

に公開
1

Series:


CPUs sold on the market do more than just improve in performance over generations; they also increase the number of available instructions. In the case of x86, we've seen SSE, then AVX, then AVX-512 (which was then removed), and so on. Arm architecture also adds instructions as its minor versions increase.

If you want to use these newly added instructions in your program but call them directly, that program will no longer run on older CPUs. One approach is to switch based on compilation options, but this is problematic if you want the same binary to run on multiple CPUs and achieve optimal performance. Therefore, a method of detecting CPU features at runtime and branching within the program can be considered. In pseudo-code, it looks like this:

bool has_AVX2 = ...; // Detect whether AVX2 can be used by some method
if (has_AVX2) {
    // Perform processing using AVX2
} else {
    // Perform processing without using AVX2
}

In the case of x86 systems, the cpuid instruction is used to detect CPU features at runtime. However, using the cpuid instruction requires reading and writing specific registers, and accessing it from a high-level language like C requires compiler-dependent techniques. Here, I will introduce how to use the cpuid instruction for each compiler.

cpuid can be interpreted as a function that receives two integers (often called leaf or function) representing the genre of the feature to be queried and returns four integers. In pseudo-code, it looks like this:

function cpuid(leaf: u32, subleaf: u32) -> (eax: u32, ebx: u32, ecx: u32, edx: u32)

Depending on the first argument, the second argument might be ignored. Therefore, some intrinsic functions are provided that only take one argument.

For example, the presence of AVX can be checked as follows:

(_, _, c, _) = cpuid(0x01, 0); // subleaf is ignored
avx: bool = (c & (1 << 28)) != 0;
(_, b, _, _) = cpuid(0x07, 0);
avx2: bool = (b & (1 << 5)) != 0;
avx512f: bool = (b & (1 << 16)) != 0;

[Addendum] Even if the CPU supports AVX, the OS might not support the YMM registers or others, so this pseudo-code is insufficient if you really want to use AVX. For a proper method, please refer to the Intel SDM or similar documents, or use a library that wraps it nicely (including the "GCC/Clang built-in functions" described later). [/Addendum]

For details on the cpuid instruction, please refer to the Intel SDM or AMD APM (since AMD-specific instructions are not in the Intel SDM, you need to check the AMD manual). Here, we will look at how to use cpuid from each compiler.

Using <cpuid.h>

GCC and Clang provide <cpuid.h>, which includes the __cpuid macro (a wrapper for the cpuid instruction), the __cpuid_count macro, the __get_cpuid_max function, the __get_cpuid function, the __get_cpuid_count function, and constants corresponding to various features.

#include <cpuid.h>
void __cpuid(unsigned int leaf, [out] unsigned int eax, [out] unsigned int ebx, [out] unsigned int ecx, [out] unsigned int edx); // Macro
void __cpuid_count(unsigned int leaf, unsigned int count, [out] unsigned int eax, [out] unsigned int ebx, [out] unsigned int ecx, [out] unsigned int edx); // Macro
unsigned int __get_cpuid_max(unsigned int leaf, unsigned int *sig);
int __get_cpuid(unsigned int leaf, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx);
int __get_cpuid_count(unsigned int leaf, unsigned int subleaf, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx);

The functions starting with __get_ seem to check whether the leaf is within range when calling the cpuid instruction.

An example of use is as follows.

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    __cpuid(0x01, eax, ebx, ecx, edx);
    printf("AVX: %d\n", (ecx & bit_AVX) != 0);
    __cpuid_count(0x07, 0, eax, ebx, ecx, edx);
    printf("AVX2: %d\n", (ebx & bit_AVX2) != 0);
    printf("AVX-512F: %d\n", (ebx & bit_AVX512F) != 0);
}

[Addendum] If you want to check if you can really use AVX, this code is insufficient. Refer to the previous warning. [/Addendum]

Execution example:

AVX: 1
AVX2: 1
AVX-512F: 1

Using GCC/Clang Built-in Functions

GCC and Clang provide built-in functions that wrap cpuid.

void __builtin_cpu_init(void);
int __builtin_cpu_is(const char *cpuname);
int __builtin_cpu_supports(const char *feature);

You can check for a feature by passing its name as a string to __builtin_cpu_supports.

__builtin_cpu_init must be called if you use __builtin_cpu_supports before the main function. If it is within or after the main function, there is no need to call it explicitly.

Example of use:

#include <stdio.h>

int main(void)
{
    printf("AVX: %d\n", !!__builtin_cpu_supports("avx"));
    printf("AVX2: %d\n", !!__builtin_cpu_supports("avx2"));
    printf("AVX-512F: %d\n", !!__builtin_cpu_supports("avx512f"));
}

Using MSVC Built-in Functions

MSVC also provides built-in functions. Note that while the names are the same as the macros in <cpuid.h>, the arguments are different.

void __cpuid(int cpuInfo[4], int function_id);
void __cpuidex(int cpuInfo[4], int function_id, int subfunction_id);

Example of use:

#include <stdio.h>

int main(void)
{
    int result[4]; // {eax, ebx, ecx, edx}
    __cpuid(result, 0x01);
    printf("AVX: %d\n", (result[2] & (1 << 28)) != 0);
    __cpuidex(result, 0x07, 0);
    printf("AVX2: %d\n", (result[1] & (1 << 5)) != 0);
    printf("AVX-512F: %d\n", (result[1] & (1 << 16)) != 0);
}

[Addendum] If you want to check if you can really use AVX, this code is insufficient. Refer to the previous warning. [/Addendum]

Using Intel Built-in Functions

Are there built-in functions for cpuid in everyone's favorite Intel Intrinsics Guide? Actually, there are.

#include <immintrin.h>
int _may_i_use_cpu_feature(unsigned __int64 a);
int _may_i_use_cpu_feature_ext(unsigned __int64 a, unsigned page);
int _may_i_use_cpu_feature_str(string literal feature, ...);

Well, from what I've tried, GCC, Clang, and MSVC do not seem to support these, so they seem to be specific to the Intel C compiler.

Example of use:

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    printf("AVX: %d\n", !!_may_i_use_cpu_feature(_FEATURE_AVX));
    printf("AVX2: %d\n", !!_may_i_use_cpu_feature(_FEATURE_AVX2));
    printf("AVX-512F: %d\n", !!_may_i_use_cpu_feature(_FEATURE_AVX512F));
}
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    printf("AVX: %d\n", !!_may_i_use_cpu_feature_str("avx"));
    printf("AVX2: %d\n", !!_may_i_use_cpu_feature_str("avx2"));
    printf("AVX-512F: %d\n", !!_may_i_use_cpu_feature_str("avx512f"));
}

[Addendum] I haven't checked whether Intel's built-in functions consider "whether the OS supports YMM, etc." If you seriously want to test AVX availability with these, you should check that as well. [/Addendum]

Using Inline Assembly

In GCC and Clang, you can also call the cpuid instruction using inline assembly.

#include <stdio.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;
    asm volatile("cpuid" : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx) : "0"(0x01), "2"(0));
    printf("AVX: %d\n", (ecx & (1 << 28)) != 0);
    asm volatile("cpuid" : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx) : "0"(0x07), "2"(0));
    printf("AVX2: %d\n", (ebx & (1 << 5)) != 0);
    printf("AVX-512F: %d\n", (ebx & (1 << 16)) != 0);
}

[Addendum] If you want to check if you can really use AVX, this code is insufficient. Refer to the previous warning. [/Addendum]

Note that on x86_64, since rbx is a callee-saved register, Clang's <cpuid.h> appears to save and restore rbx around the cpuid call. However, in my testing, GCC and Clang seem to output code that saves rbx in the function prologue and epilogue even without manual saving. There might be some subtle reasons involved that I'm not aware of, but...

Notes on detecting AVX-512 on macOS

According to the Intel SDM, to detect AVX-512 support, in addition to checking the CPUID bits, you must check XCR0 to confirm that the OS has enabled the AVX-512 state. However, with this method, even on macOS machines where AVX-512 is available, it will be determined that "AVX-512 is not supported."

For the reason why this happens, please refer to the following:

Roughly speaking, it seems to be designed so that programs that do not use AVX-512 do not need to save the AVX-512 state. The judgment by XCR0 that "AVX-512 cannot be used" is correct in a sense; an exception occurs when a program tries to use AVX-512, but the kernel catches that exception and enables the AVX-512 state for that program.

In any case, detecting AVX-512 on macOS cannot be done with regular cpuid, so you need to use an OS-specific method. Specifically, you use sysctl. Please also see Detecting CPU features at runtime: Arm edition.

An example code is as follows:

#include <stdbool.h>
#include <stdio.h>
#include <sys/sysctl.h>

bool query_cpu_feature(const char *name)
{
    int result = 0;
    size_t len = sizeof(result);
    int ok = sysctlbyname(name, &result, &len, NULL, 0);
    // Returns 0 on success
    return ok == 0 && result != 0;
}

int main(void)
{
    printf("AVX512F: %d\\n", (int)query_cpu_feature("hw.optional.avx512f"));
    printf("AVX512CD: %d\\n", (int)query_cpu_feature("hw.optional.avx512cd"));
    printf("AVX512DQ: %d\\n", (int)query_cpu_feature("hw.optional.avx512dq"));
    printf("AVX512BW: %d\\n", (int)query_cpu_feature("hw.optional.avx512bw"));
    printf("AVX512VL: %d\\n", (int)query_cpu_feature("hw.optional.avx512vl"));
    printf("AVX512IFMA: %d\\n", (int)query_cpu_feature("hw.optional.avx512ifma"));
    printf("AVX512VBMI: %d\\n", (int)query_cpu_feature("hw.optional.avx512vbmi"));
}

Example output:

AVX512F: 1
AVX512CD: 1
AVX512DQ: 1
AVX512BW: 1
AVX512VL: 1
AVX512IFMA: 1
AVX512VBMI: 1

Well, checking only AVX512F with sysctl and determining the rest with regular cpuid should be fine.

Discussion