Translated by AI
setlocale() on Windows and Linux
Introduction
In C/C++, the standard way to handle character encodings is through locales. This article briefly compares how setlocale(), which sets and queries the locale, behaves on Windows and Linux from the perspective of character encoding.
0. Source Code Location
Refer to the README for build instructions.
1. Setting the default locale with setlocale
This program sets the locale to the environment's default and displays the resulting locale name.
1-1. Windows (Visual Studio) Execution Results
The system locale configured in the registry is read, applied, and displayed.
Because the source code is saved in UTF-8, the build option /execution-charset:.932 is specified so that string literals are compiled into Shift JIS (code page 932).
C:\...\Release>setlocale_default
現在のロケール: Japanese_Japan.932
C:\...\Release>
The displayed name reflects the system locale's character encoding: code page 932 (Shift JIS).
1-2. Windows (MSYS2) Execution Results
The system locale configured in the registry is read, applied, and displayed.
Compiled as-is, the string literals are encoded in UTF-8, which produces garbled text when the system locale uses the default Shift JIS. Therefore -fexec-charset is specified at compile time to encode the string literals in the system locale's character encoding (Shift JIS).
$ ./setlocale_default
現在のロケール: Japanese_Japan.932
$
The displayed name reflects the system locale's character encoding: code page 932 (Shift JIS).
1-3. Linux Execution Results
The locale is determined from environment variables such as LANG, then applied and displayed.
$ ./setlocale_default
現在のロケール: ja_JP.UTF-8
$
The locale name indicates UTF-8.
1-4. Summary
| | Japanese Windows (VS) | Japanese Windows (MSYS2) | Linux |
|---|---|---|---|
| Result | System locale character encoding (Shift JIS) | System locale character encoding (Shift JIS) | UTF-8 |
| Source Data | Registry | Registry | Environment Variables |
| String Literal Encoding | System locale character encoding (Shift JIS) | System locale character encoding (Shift JIS) | UTF-8 |
| Compilation Options | /execution-charset:.932 (needed because the source code encoding was changed from Shift JIS to UTF-8) | -fexec-charset set to the system locale's encoding (Shift JIS) | Default |
2. Setting setlocale to a fixed UTF-8 locale
This method sets UTF-8 unconditionally, regardless of environment variables or the registry.
*Note: On Windows you can specify just ".UTF8" and keep the current language/region. Linux does not accept this, so the full name (ja_JP.UTF-8) is used here.
2-1. Windows (Visual Studio) Execution Results
C:\...\Release>setlocale_utf8
現在のロケール: ja_JP.UTF-8
C:\...\Release>
However, Japanese text is not handled correctly when the output is piped into existing Windows commands.
C:\...\Release>setlocale_utf8 | findstr ロケール
C:\...\Release>
The program does output "ロケール" (locale), yet findstr finds no match. The program writes UTF-8, while findstr searches for the Shift JIS bytes of the search string.
2-2. Windows (MSYS2) Execution Results
$ ./setlocale_utf8
現在のロケール: ja_JP.UTF-8
$
The terminal's output encoding remains Shift JIS, but UCRT automatically converts the output from UTF-8 to Shift JIS. The text is therefore not garbled, but characters that do not exist in Shift JIS cannot be output.
Emoji, for example, have no Shift JIS representation. In the session below, the program first prints "??". After the terminal's output encoding is switched to UTF-8 with chcp, the emoji is displayed, and the code page is then restored to 932.
$ ./utf8_emoji
??
$ chcp.com 65001
Active code page: 65001
$ ./utf8_emoji
😀
$ chcp.com 932
現在のコード ページ: 932
$
2-3. Linux Execution Results
The output is unchanged. The only difference is that the environment variables are no longer consulted.
$ ./setlocale_utf8
現在のロケール: ja_JP.UTF-8
$
2-4. Summary
| | Japanese Windows (VS) | Japanese Windows (MSYS2) | Linux |
|---|---|---|---|
| Result | UTF-8 | UTF-8 | UTF-8 |
| Source Data | Fixed value | Fixed value | Fixed value |
| String Literal Encoding | UTF-8 | UTF-8 | UTF-8 |
| Compilation Options | Default (no option is needed now that the source code is UTF-8 instead of Shift JIS) | Default | Default |
- Note: Output cannot be piped correctly into existing Windows commands
- Note: If the terminal encoding is Shift JIS, characters not included in Shift JIS cannot be displayed (when output goes through UCRT)
3. Sample: converting from the default locale's encoding to UTF-8
This program sets the default locale, reads text from standard input, converts it to UTF-8, and writes it to standard output.
You can insert it into a pipe between MSYS2 commands and a program that outputs Shift JIS, such as hoge.c on a system whose locale is Shift JIS. This prevents garbled text.
$ ./setlocale_default | cat
▒▒▒݂̃▒▒P▒[▒▒: Japanese_Japan.932
$ ./setlocale_default | ./def2utf8 | cat
現在のロケール: Japanese_Japan.932
$
A utf82def converter (UTF-8 to the default locale's encoding) can easily be written in the same way.