fix(mysql): explicitly set charset=utf8mb4 to resolve connection failure with mysql-connector-python >= 8.0.30 #2656
…odadata#2583)

When using mysql-connector-python >= 8.0.30, the connector internally remaps charset 'utf8' to 'utf8mb4'. Because no charset was set explicitly in the connect() call, this remapping was triggered automatically and caused the following error on MySQL servers older than 5.5.3: "Character set 'utf8' unsupported".

Fix: explicitly pass charset="utf8mb4" and use_unicode=True to the mysql.connector.connect() call. This bypasses the internal remapping logic entirely and works correctly across all supported MySQL versions. utf8mb4 has been the recommended charset since MySQL 5.5.3 and is fully backwards compatible with utf8 data.

Fixes sodadata#2583
Hi @bmarinovic @tomassatka @Niels-b, I have submitted this fix for issue #2583, where mysql-connector-python >= 8.0.30 silently remaps charset utf8 → utf8mb4, causing connection failures on MySQL servers older than 5.5.3. All checks are passing. Could someone please review when you get a chance? I'm happy to make any changes based on feedback. Thank you!
Hi @bmarinovic @tomassatka @Niels-b, friendly bump on PR #2656. There is also a workflow awaiting maintainer approval before CI can fully run. Thank you!
Hi @bmarinovic @tomassatka @Niels-b, hope you are doing well!



Problem
When using soda-core-mysql with mysql-connector-python >= 8.0.30,
connecting to a MySQL data source fails with:
"Encountered a problem while trying to connect to mysql:
Character set 'utf8' unsupported"
Root Cause
The connect() method in mysql_data_source.py creates a MySQL connection
without specifying a charset parameter:
    mysql.connector.connect(
        user=self.username,
        password=self.password,
        host=self.host,
        port=self.port,
        database=self.database,
    )
Starting with mysql-connector-python 8.0.30, when no charset is
specified, the connector defaults to 'utf8' and then silently remaps
utf8 → utf8mb4. This remapping fails on MySQL servers older than 5.5.3,
which do not support utf8mb4.
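The root cause above hinges on two version thresholds: the connector release (8.0.30) and the server release that introduced utf8mb4 (5.5.3). The sketch below encodes both as pure functions so the conditions can be checked without a database. The helper names (`parse_version`, `connector_remaps_utf8`, `server_supports_utf8mb4`) are hypothetical and are not part of soda-core or mysql-connector-python; in a live setup the inputs could plausibly come from `mysql.connector.__version__` and `MySQLConnection.get_server_info()`, but treat that wiring as an assumption.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn strings like '8.0.30' or '5.5.2-log' into a comparable (major, minor, patch)."""
    # Drop build suffixes such as '-log', then keep the first three numeric fields.
    numbers = [int(n) for n in re.findall(r"\d+", version.split("-", 1)[0])][:3]
    # Pad short versions ('8.0' -> (8, 0, 0)) so tuple comparison behaves sensibly.
    numbers += [0] * (3 - len(numbers))
    return (numbers[0], numbers[1], numbers[2])


def connector_remaps_utf8(connector_version: str) -> bool:
    """Hypothetical check: is this a connector release affected per this PR (>= 8.0.30)?"""
    return parse_version(connector_version) >= (8, 0, 30)


def server_supports_utf8mb4(server_version: str) -> bool:
    """Hypothetical check: does this MySQL server know utf8mb4 (introduced in 5.5.3)?"""
    return parse_version(server_version) >= (5, 5, 3)
```

Plain tuple comparison is used instead of a third-party version library to keep the sketch dependency-free; it only handles the simple `major.minor.patch` strings shown here.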
Fix
Explicitly pass charset="utf8mb4" and use_unicode=True:
    mysql.connector.connect(
        user=self.username,
        password=self.password,
        host=self.host,
        port=self.port,
        database=self.database,
        # Added by this fix: explicit charset, decoded str results.
        charset="utf8mb4",
        use_unicode=True,
    )
This bypasses the connector's internal remapping logic entirely.
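A minimal sketch of the same fix, with the connection arguments factored into a hypothetical `build_connect_kwargs` helper plus a hypothetical `connection_charset` check that asks the server which character set the session actually negotiated. Only the `charset` and `use_unicode` arguments of `mysql.connector.connect()` and the MySQL system variable `@@character_set_connection` are real; the helpers are illustrative and assume a DB-API style connection object with `cursor()`, `execute()`, `fetchone()`, and `close()`.

```python
from typing import Any, Dict, Optional


def build_connect_kwargs(
    username: str, password: str, host: str, port: int, database: str
) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for mysql.connector.connect()."""
    return {
        "user": username,
        "password": password,
        "host": host,
        "port": port,
        "database": database,
        # Explicit charset so the connector does not fall back to its own default.
        "charset": "utf8mb4",
        # Keep text results decoded as Python str.
        "use_unicode": True,
    }


def connection_charset(conn: Any) -> Optional[str]:
    """Hypothetical check: ask the server which charset this session is using."""
    cursor = conn.cursor()
    try:
        cursor.execute("SELECT @@character_set_connection")
        row = cursor.fetchone()
    finally:
        cursor.close()
    return row[0] if row else None


# Usage (needs a reachable server and mysql-connector-python installed):
#   import mysql.connector
#   conn = mysql.connector.connect(
#       **build_connect_kwargs("soda", "secret", "localhost", 3306, "analytics")
#   )
#   print(connection_charset(conn))  # expected to report utf8mb4
```

Keeping the arguments in one dict makes the charset choice testable without a live server; the runtime check is a cheap way to confirm the session really ended up on utf8mb4 after connecting.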
Impact
References