Character Set Changes in MySQL 8.0.24
Here are the things that caught my attention in MySQL 8.0.24, which was released on 2021/4/20.
Well, they're pretty much all about character sets.
utf8 is now displayed as utf8mb3
Client applications and test suite plugins now report utf8mb3 rather than utf8 when writing character set names. (Bug #32164079, Bug #32164125)
Important Note: When a utf8mb3 collation was specified in a CREATE TABLE statement, SHOW CREATE TABLE, DEFAULT CHARSET, the values of system variables containing character set names, and the binary log all subsequently displayed the character set as utf8 which is becoming a synonym for utf8mb4. Now in such cases, utf8mb3 is shown instead, and CREATE TABLE raises the warning 'collation_name' is a collation of the deprecated character set UTF8MB3. Please consider using UTF8MB4 with an appropriate collation instead. (Bug #27225287, Bug #32085357, Bug #32122844)
I also wrote about this in an earlier article (whoa, has it already been three years since then...?), where I said:
Also, even though we're told to use utf8mb3, it's a bit disappointing that it is displayed as utf8 when you look it up. If you use the output of SHOW CREATE TABLE as is, you get a warning...
That was how things stood, but at last utf8mb3 now appears in SHOW CREATE TABLE as well.
8.0.23
mysql> create table t (i int) charset utf8mb3;
Query OK, 0 rows affected, 1 warning (0.14 sec)
mysql> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `i` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.02 sec)
8.0.24
mysql> create table t (i int) charset utf8mb3;
Query OK, 0 rows affected, 1 warning (0.11 sec)
mysql> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `i` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
1 row in set (0.03 sec)
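Both sessions above report "1 warning" without showing it. As a minimal sketch, the warning can be read back right after the statement, and the table can then be moved to utf8mb4 as the release note suggests. The table name t2 is hypothetical, and the output is omitted because the warning text differs between versions.

```sql
-- Create a table with the deprecated charset name and read back the warning.
CREATE TABLE t2 (i INT) CHARSET utf8mb3;
SHOW WARNINGS;

-- Switch the table (default charset and any character columns) to utf8mb4.
-- Assumption: on a real table, check index key lengths first, because
-- utf8mb4 needs up to 4 bytes per character instead of 3.
ALTER TABLE t2 CONVERT TO CHARACTER SET utf8mb4;
SHOW CREATE TABLE t2\G
```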
This looks like preparation for the day utf8 becomes an alias for utf8mb4 (maybe in 8.1 or 9.0? I'd like to hope it won't happen within 8.0.x).
I think it could have been done much sooner...
If you set up replication with a version earlier than 8.0.24 as the master and a version in which utf8 has become an alias for utf8mb4 as the slave, a table that is utf8mb3 on the master would presumably become utf8mb4 on the slave, since the older master writes the name utf8 to the binary log. You should probably be careful about that.
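To get a sense of how much of a schema would be affected, something like the following query could be used to list the columns that still use the 3-byte character set. This is only a sketch: it checks for both names because which one information_schema reports depends on the server version, and the list of excluded system schemas is an assumption you may want to adjust.

```sql
-- List character columns that use the 3-byte UTF-8 charset,
-- whichever name (utf8 or utf8mb3) this server version reports.
SELECT TABLE_SCHEMA,
       TABLE_NAME,
       COLUMN_NAME,
       CHARACTER_SET_NAME,
       COLLATION_NAME
  FROM information_schema.COLUMNS
 WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3')
   -- Assumption: skip the built-in system schemas.
   AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema',
                            'performance_schema', 'sys')
 ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;
```

Columns only show the charset they actually use, so table and schema defaults would need a similar check against information_schema.TABLES and information_schema.SCHEMATA.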
Fixed an issue where Japanese could not be entered with mysql --default-character-set=utf8mb4
For builds compiled using the libedit library, if the mysql client was invoked with the --default-character-set=utf8 option, libedit rejected input of multibyte characters. (Bug #32329078, Bug #32583436, Bug #102806)
I wrote something similar in an earlier article, but that one was about not being able to enter Japanese when the locale wasn't set, so this is a bit different.
It's more like this tweet
8.0.23
% LC_ALL=C.UTF-8 mysql --default-character-set=utf8mb4
mysql> (← Japanese cannot be entered)
8.0.24
% LC_ALL=C.UTF-8 mysql --default-character-set=utf8mb4
mysql> select 'あ';
+-----+
| あ  |
+-----+
| あ  |
+-----+
1 row in set (0.00 sec)
8-bit characters are now treated as invalid in the ascii charset
It was possible to insert illegal ASCII values (outside 7-bit range) into character columns that used the ascii character set. This is now prohibited. (Bug #24847620)
ASCII is a 7-bit character set, but MySQL's ascii charset used to accept 8-bit characters as well.
I occasionally used it for purposes like "I want it to be case-sensitive but also want to include 8-bit data," but such bytes are now finally treated as invalid characters.
In 8.0.23, the data was stored as is, although a warning was issued.
mysql> create table t (c varchar(10)) charset ascii;
Query OK, 0 rows affected (0.15 sec)
mysql> insert into t value (0x414243ff303132);
Query OK, 1 row affected, 1 warning (0.03 sec)
mysql> show warnings;
+---------+------+------------------------------------------------+
| Level   | Code | Message                                        |
+---------+------+------------------------------------------------+
| Warning | 1300 | Invalid ascii character string: 'ABC\xFF01...' |
+---------+------+------------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from t;
+---------+
| c       |
+---------+
| ABC?012 |
+---------+
1 row in set (0.00 sec)
mysql> select c,hex(c) from t;
+---------+----------------+
| c       | hex(c)         |
+---------+----------------+
| ABC?012 | 414243FF303132 |
+---------+----------------+
1 row in set (0.00 sec)
In 8.0.24 the same insert fails with an error. If you relax sql_mode, the row can be stored with a warning, but the value is truncated at the first 8-bit byte.
mysql> create table t (c varchar(10)) charset ascii;
Query OK, 0 rows affected (0.15 sec)
mysql> insert into t value (0x414243ff303132);
ERROR 1366 (HY000): Incorrect string value: '\xFF012' for column 'c' at row 1
mysql> set sql_mode='';
Query OK, 0 rows affected (0.00 sec)
mysql> insert into t value (0x414243ff303132);
Query OK, 1 row affected, 1 warning (0.03 sec)
mysql> show warnings;
+---------+------+-----------------------------------------------------------+
| Level   | Code | Message                                                   |
+---------+------+-----------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xFF012' for column 'c' at row 1 |
+---------+------+-----------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select c,hex(c) from t;
+------+--------+
| c    | hex(c) |
+------+--------+
| ABC  | 414243 |
+------+--------+
1 row in set (0.00 sec)
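Since existing rows may already contain such bytes, it might be worth looking for them before upgrading. The query below is a sketch that reuses the table t and column c from the example above: it inspects the hex dump two digits (one byte) at a time and matches any row containing a byte whose high nibble is 8 through F, that is, a byte outside the 7-bit range.

```sql
-- Find rows whose value contains at least one byte >= 0x80.
-- '^([0-9A-F]{2})*' consumes whole bytes, so the following
-- '[89A-F]' is always tested against the high nibble of a byte.
SELECT c, HEX(c) AS hex_c
  FROM t
 WHERE HEX(c) REGEXP '^([0-9A-F]{2})*[89A-F]';
```

This does a full table scan, so on a large table it is probably best run against a replica or a copy.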
I wonder whether this should really be treated as a bug fix; it feels more like an incompatible change. Well, I suppose it's the user's fault for using the charset in a strange way.