Consider converting from wchar_t to char32_t, everywhere. #18

Closed
opened 2026-03-30 14:25:14 +00:00 by simon · 4 comments
Owner

So far I've used `wchar_t` as the C type for characters, everywhere. Its width is ambiguous: it may be 16 bits or 32 bits, depending on the platform. I have allowed for 32 bits, everywhere.

Today I learned that there's an alternative type, `char32_t`, which is unambiguously 32 bits wide.

This is currently only defined (as far as I can see) in headers intended for use with C++, but it might be worth using.
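For what it's worth, C11 also provides `char32_t` in the C header `<uchar.h>`, as a typedef of `uint_least32_t` (so strictly it is *at least* 32 bits), together with `U"..."` string literals. A minimal sketch comparing the two types, assuming a C11 compiler and a C library that ships `<uchar.h>` (glibc does; not every platform's C library does). The helper and variable names are hypothetical.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t, wchar_t */
#include <uchar.h>   /* char32_t (C11) */

/* Hypothetical helper: width of wchar_t in bits.
   This is implementation-defined: typically 16 on Windows, 32 on Linux. */
size_t wchar_width_bits(void) {
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Hypothetical helper: width of char32_t in bits.
   C11 defines char32_t as uint_least32_t, so this is at least 32. */
size_t char32_width_bits(void) {
    return sizeof(char32_t) * CHAR_BIT;
}

/* A char32_t string literal. When the implementation defines
   __STDC_UTF_32__, each element is one UTF-32 code unit (one code point):
   here 'h', U+00E9, then the terminating zero. */
const char32_t sample32[] = U"h\u00e9";
```

On a typical Linux build both helpers return 32, which is why the ambiguity of `wchar_t` is easy to miss there.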

Author
Owner

Still considering.

The ambiguity of `wchar_t` is a problem, but a `wchar_t` is never bigger than a `char32_t`. Don't know. With `char32_t` we're explicitly hanging on to 32 bits. Probably good.
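The "never bigger" assumption can be checked at compile time instead of being left implicit. A minimal sketch, assuming C11 (`_Static_assert`, `<uchar.h>`); the assertion message and the `wchar_holds_code_points` helper are hypothetical. `__STDC_ISO_10646__` is a standard predefined macro that an implementation defines when `wchar_t` values are ISO/IEC 10646 (Unicode) code points.

```c
#include <stddef.h>  /* wchar_t */
#include <uchar.h>   /* char32_t */
#include <wchar.h>   /* WCHAR_MAX */

/* Refuse to build on a platform where wchar_t is wider than char32_t,
   which would break the assumption that any wchar_t value fits in a char32_t. */
_Static_assert(sizeof(wchar_t) <= sizeof(char32_t),
               "wchar_t is wider than char32_t on this platform");

/* Hypothetical helper: returns 1 if wchar_t values are Unicode code points
   (the implementation defines __STDC_ISO_10646__) and wchar_t can hold
   every code point up to U+10FFFF; returns 0 otherwise, e.g. for a
   16-bit wchar_t. */
int wchar_holds_code_points(void) {
#if defined(__STDC_ISO_10646__) && WCHAR_MAX >= 0x10FFFF
    return 1;
#else
    return 0;
#endif
}
```

Note that fitting in a `char32_t` only says the *value* survives widening; whether it still denotes the same character is a separate question.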

simon added the Architecture change label 2026-04-19 22:43:10 +00:00
Author
Owner

OK, this may have been a mistake. `fwprintf` and friends, and consequently `url_fwprintf` and friends, expect `wchar_t`, so they are throwing compiler warnings. It compiles, and there's no obvious bad behaviour, so my strings seem to be understood correctly; but there may be problems in future or on other platforms. I am going to leave this ticket open for now and may revert the change. Fortunately, reverting is easy.
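One way to avoid handing `char32_t` strings to `fwprintf`'s `%ls` conversion (which is specified to take a `wchar_t` pointer, hence the warnings) is to convert them to the current locale's multibyte encoding with C11's `c32rtomb` and print the result as an ordinary `char` string. A minimal sketch under that assumption; `c32s_to_mbs` is a hypothetical name, and it assumes a stateless encoding such as UTF-8 (no closing shift sequence is emitted).

```c
#include <limits.h>  /* MB_LEN_MAX */
#include <stddef.h>  /* size_t */
#include <string.h>  /* memcpy, memset */
#include <uchar.h>   /* char32_t, c32rtomb, mbstate_t */

/* Hypothetical helper: convert the NUL-terminated char32_t string src into
   the current locale's multibyte encoding, writing at most cap bytes
   (including the terminating NUL) to dst.
   Returns the number of bytes written excluding the NUL, or (size_t)-1 if a
   character is not representable in the locale or dst is too small. */
size_t c32s_to_mbs(char *dst, size_t cap, const char32_t *src) {
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial conversion state */
    size_t used = 0;

    for (; *src != U'\0'; ++src) {
        char buf[MB_LEN_MAX];
        size_t n = c32rtomb(buf, *src, &state);
        if (n == (size_t)-1)           /* not representable (EILSEQ) */
            return (size_t)-1;
        if (used + n + 1 > cap)        /* keep room for the NUL */
            return (size_t)-1;
        memcpy(dst + used, buf, n);
        used += n;
    }
    if (cap == 0)
        return (size_t)-1;
    dst[used] = '\0';
    return used;
}
```

A caller would typically select the user's locale once with `setlocale(LC_ALL, "")` and then print the converted buffer with `fprintf(stream, "%s", buf)`. The stream must be byte-oriented: C does not allow byte and wide output to be mixed on the same stream.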

Author
Owner

Yeah, this one's a bust. You can't reliably cast between `wchar_t` and `char32_t`, and a given character in one type is not equal to the same character in the other. Going to have to roll this change back. :-(
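The C standard does not promise that a `wchar_t` and a `char32_t` holding the same character have the same value. The values only line up when the implementation defines both `__STDC_ISO_10646__` (`wchar_t` values are Unicode code points) and `__STDC_UTF_32__` (`char32_t` values are UTF-32), and `wchar_t` is wide enough for U+10FFFF. A minimal sketch of a cast that refuses to proceed otherwise; `C32_WCHAR_SAME_VALUES` and `c32_to_wchar_cast` are hypothetical names.

```c
#include <stdbool.h> /* bool */
#include <uchar.h>   /* char32_t */
#include <wchar.h>   /* wchar_t, WCHAR_MAX */

/* 1 when a plain cast is guaranteed to preserve the character, 0 otherwise. */
#if defined(__STDC_ISO_10646__) && defined(__STDC_UTF_32__) \
    && WCHAR_MAX >= 0x10FFFF
#define C32_WCHAR_SAME_VALUES 1
#else
#define C32_WCHAR_SAME_VALUES 0
#endif

/* Hypothetical helper: store c in *out and return true only when a direct
   cast is known to denote the same character; otherwise return false and
   leave *out untouched, so the caller can fall back to a real conversion.
   Values above U+10FFFF and UTF-16 surrogates are never valid characters. */
bool c32_to_wchar_cast(char32_t c, wchar_t *out) {
    if (!C32_WCHAR_SAME_VALUES || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return false;
    *out = (wchar_t)c;
    return true;
}
```

The point of the guard is that "it happens to work here" becomes an explicit, checked condition rather than an assumption that silently breaks on a platform with a 16-bit `wchar_t`.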

simon added the Won't fix label 2026-05-06 22:22:54 +00:00
Author
Owner

This doesn't work. A lot of libraries I depend on need `wchar_t` and don't have `char32_t` equivalents (or I haven't found them). The same character in different encodings is not equal, and I have not found convenient functions to convert between them. So we need to use just one encoding everywhere, and that encoding has to be `wchar_t`.
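C11 has no function that converts directly between `wchar_t` and `char32_t`, but it can convert each of them to and from the locale's multibyte encoding (`wcrtomb`/`mbrtowc` for `wchar_t`, `c32rtomb`/`mbrtoc32` for `char32_t`), so a portable, if roundabout, bridge goes through a multibyte buffer. A minimal single-character sketch, assuming the current locale can represent the character (for example a UTF-8 locale selected with `setlocale`) and uses a stateless encoding; `c32_to_wc` and `wc_to_c32` are hypothetical names, and characters that would need more than one `wchar_t` are rejected rather than split.

```c
#include <limits.h>  /* MB_LEN_MAX */
#include <stdbool.h> /* bool */
#include <string.h>  /* memset */
#include <uchar.h>   /* char32_t, c32rtomb, mbrtoc32 */
#include <wchar.h>   /* wchar_t, wcrtomb, mbrtowc, mbstate_t */

/* Hypothetical helper: char32_t -> multibyte -> wchar_t.
   Returns false if c is not representable in the current locale or does
   not come back as exactly one wchar_t. */
bool c32_to_wc(char32_t c, wchar_t *out) {
    char mb[MB_LEN_MAX];
    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t n = c32rtomb(mb, c, &st);
    if (n == (size_t)-1)
        return false;

    memset(&st, 0, sizeof st);
    size_t used = mbrtowc(out, mb, n, &st);
    /* mbrtowc returns 0 for the null character, (size_t)-1 or -2 on error */
    return used == n || (used == 0 && c == 0);
}

/* Hypothetical helper: wchar_t -> multibyte -> char32_t. */
bool wc_to_c32(wchar_t w, char32_t *out) {
    char mb[MB_LEN_MAX];
    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t n = wcrtomb(mb, w, &st);
    if (n == (size_t)-1)
        return false;

    memset(&st, 0, sizeof st);
    size_t used = mbrtoc32(out, mb, n, &st);
    return used == n || (used == 0 && w == 0);
}
```

This costs two conversions per character, which is part of why settling on a single character type, as concluded here, is the simpler option.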

simon closed this issue 2026-05-06 22:26:34 +00:00
Reference: simon/post-scarcity#18