Consider converting from wchar_t to char32_t, everywhere. #18
Reference: simon/post-scarcity#18
So far I've used wchar_t as the C type for characters, everywhere. Its width is ambiguous: depending on the platform it may be 16 bits or 32 bits. I have allowed 32 bits, everywhere.
Today I learned that there's an alternative definition, char32_t, which is unambiguously 32 bit. This is currently only defined (as far as I can see) in headers intended for use with C++, but it might be worth using.
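To make the width question concrete, here is a minimal sketch that reports how wide the two types are on the machine it is compiled on. It assumes a C11 toolchain whose <uchar.h> provides char32_t (C11 declares it there, though not every platform ships the header). The helper names wchar_bits, char32_bits and wchar_holds_any_code_point are made up for illustration and are not part of post-scarcity.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <uchar.h>    /* char32_t (C11; assumed to be available) */
#include <wchar.h>    /* wchar_t, WCHAR_MAX */

/* C11 makes char32_t the same type as uint_least32_t, so it always
 * has room for at least 32 bits. */
static_assert(sizeof(char32_t) * CHAR_BIT >= 32,
              "char32_t must hold at least 32 bits");

/* Hypothetical helper: width of wchar_t in bits on this platform. */
size_t wchar_bits(void) {
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Hypothetical helper: width of char32_t in bits on this platform. */
size_t char32_bits(void) {
    return sizeof(char32_t) * CHAR_BIT;
}

/* Hypothetical helper: nonzero if a single wchar_t can hold every
 * Unicode code point (U+0000 to U+10FFFF). */
int wchar_holds_any_code_point(void) {
    return WCHAR_MAX >= 0x10FFFF;
}
```

Calling wchar_bits() on each target platform shows which interpretation of wchar_t that compiler has chosen; where wchar_holds_any_code_point() returns 0, a wchar_t is too narrow for characters outside the Basic Multilingual Plane.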
Still considering.
The ambiguity of a wchar_t is a problem, but it is never bigger than a char32_t. Don't know. With char32_t we're explicitly hanging on to 32 bits. Probably good.
OK, this may have been a mistake. fwprintf and friends, and consequently url_fwprintf and friends, expect wchar_t, and consequently are throwing compiler warnings. It compiles, and there's no obvious bad behaviour, so my strings are being understood correctly; but there may be problems in future or on other platforms. I am going to leave this ticket open for now and may revert the change. Fortunately, the revert is easy to make.
Yeah, this one's a bust. You can't reliably cast between wchar_t and char32_t, and a given character in the one is not equal to the same character in the other. Going to have to roll this change back. :-(
This doesn't work. A lot of libraries I depend on need wchar_t and don't have (or I haven't found) char32_t equivalents. The same character in different encodings is not equal, and I have not found convenient conversion functions. So we need to use just one encoding everywhere, and that encoding has to be wchar_t.
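For the record, here is a rough sketch of what an explicit conversion might look like, and why it cannot be relied on everywhere. It only does an element-by-element copy when the compiler and C library advertise, via the __STDC_ISO_10646__ and __STDC_UTF_32__ macros, that both types hold Unicode code point values, and when wchar_t is wide enough; on any other platform it simply reports failure. The function name c32_to_wcs is hypothetical and not part of post-scarcity or of any library.

```c
#include <assert.h>
#include <stddef.h>   /* size_t, NULL */
#include <uchar.h>    /* char32_t (C11; assumed to be available) */
#include <wchar.h>    /* wchar_t, WCHAR_MAX */

/* Hypothetical helper: copy the NUL-terminated char32_t string `src`
 * into `dst`, which has room for `dst_len` wchar_t elements including
 * the terminator. Returns the number of characters written (excluding
 * the terminator), or (size_t)-1 on failure. */
size_t c32_to_wcs(wchar_t *dst, size_t dst_len, const char32_t *src) {
#if defined(__STDC_ISO_10646__) && defined(__STDC_UTF_32__) && WCHAR_MAX >= 0x10FFFF
    size_t i;

    if (dst == NULL || src == NULL || dst_len == 0) {
        return (size_t)-1;
    }
    for (i = 0; src[i] != 0; i++) {
        if (i + 1 >= dst_len) {
            return (size_t)-1;   /* no room left for the terminator */
        }
        if (src[i] > 0x10FFFF) {
            return (size_t)-1;   /* not a valid Unicode code point */
        }
        /* Both types hold code point values here, so the value carries over. */
        dst[i] = (wchar_t)src[i];
    }
    dst[i] = L'\0';
    return i;
#else
    /* No guarantee that the two encodings agree: refuse to guess. */
    (void)dst;
    (void)dst_len;
    (void)src;
    return (size_t)-1;
#endif
}
```

On a platform where the guard holds, the result can be handed to fwprintf with %ls without a type warning. Where it does not hold (a 16-bit wchar_t, for instance) the function always fails, which is the portability problem described above and the reason for keeping a single encoding throughout.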