**Josh Jersild** @JoshJers · May 23, 2023 *

Question for the masses: What's a programming language you've seen that handles text strings really well? Specifically, great support for Unicode (i.e. cleanly handles the distinction between code points and code units, stuff like that)? Is there anything out there that you think fits that bill?

Edited to add: the main thing I'm really curious about is whether any language has elegantly handled the problem where a Unicode character might be stored as multiple code units (i.e. one or more 'char's). It's an easy thing to solve if you just say "all my strings are UTF-32 and thus hold the whole space", but then suddenly your strings are gigantic in memory, so I think it's untenable in practice (although that's what we did at my current job for simplicity).
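To make the code point versus code unit distinction concrete, here is a minimal Python sketch that counts both for one string under UTF-8, UTF-16, and UTF-32. The sample string and the `unit_counts` helper are illustrative assumptions, not anything from the thread.

```python
def unit_counts(text: str) -> dict:
    """Count code points and per-encoding code units for `text` (hypothetical helper)."""
    return {
        "code_points": len(text),                      # Python's str is indexed by code point
        "utf8": len(text.encode("utf-8")),             # 1-byte code units
        "utf16": len(text.encode("utf-16-le")) // 2,   # 2-byte code units
        "utf32": len(text.encode("utf-32-le")) // 4,   # 4-byte code units
    }


# 'a', U+00E9 (é), U+D55C (a Hangul syllable), U+1F600 (an emoji outside the BMP)
sample = "a\u00e9\ud55c\U0001F600"
counts = unit_counts(sample)
print(counts)
```

For this sample the four code points take 10 UTF-8 units, 5 UTF-16 units (the emoji needs a surrogate pair), and 4 UTF-32 units. UTF-32 is the only one where units and code points always match, at the cost of 16 bytes here versus 10 for either UTF-8 or UTF-16.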

**Eniko Fox** @eniko · May 23, 2023

@JoshJers the problem is that there's no such thing as a Unicode character, especially now that you have joining characters for emoji that can join an arbitrary number of different emojis into a new one. Having researched this, the best idea I found is UTF-8 everywhere, and then you can iterate over the code points. But really I think no language solves this because it's not strictly solvable.
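As a rough illustration of the joining-character point, the Python sketch below builds one emoji sequence from three emoji glued together with U+200D (ZERO WIDTH JOINER). The particular sequence is an assumption chosen for the example; whether it renders as a single glyph depends on the font.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Three emoji code points joined into one sequence (illustrative choice).
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"

# Iterating a Python str walks code points, not user-perceived characters.
for cp in family:
    print(f"U+{ord(cp):04X}", unicodedata.name(cp, "<unnamed>"))

code_point_count = len(family)
utf8_byte_count = len(family.encode("utf-8"))
joiner_count = family.count(ZWJ)
```

Code point iteration sees five items here, and nothing in the count says where the visible character ends. Grouping them takes the grapheme cluster rules from Unicode's text segmentation annex, which Python's standard library does not implement.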

**Eniko Fox** @eniko · May 23, 2023

@JoshJers I did write iterators over code points in C# (forward and backward) but that still doesn't give you actual characters so *farting noises*
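For readers who want to see what such iterators can look like, here is a hedged Python sketch of forward and backward code point iteration over UTF-8 bytes. It is not the C# code mentioned above; the function names are made up for illustration, and the input is assumed to be valid UTF-8.

```python
from typing import Iterator


def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def code_points_forward(data: bytes) -> Iterator[str]:
    """Yield code points front to back; the lead byte gives the sequence length."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length = 1          # 0xxxxxxx
        elif lead >> 5 == 0b110:
            length = 2          # 110xxxxx
        elif lead >> 4 == 0b1110:
            length = 3          # 1110xxxx
        else:
            length = 4          # 11110xxx (valid input assumed)
        yield data[i:i + length].decode("utf-8")
        i += length


def code_points_backward(data: bytes) -> Iterator[str]:
    """Yield code points back to front by skipping over continuation bytes."""
    end = len(data)
    while end > 0:
        start = end - 1
        while start > 0 and is_continuation(data[start]):
            start -= 1
        yield data[start:end].decode("utf-8")
        end = start


encoded = "a\u00e9\ud55c\U0001F600".encode("utf-8")
forward = list(code_points_forward(encoded))
backward = list(code_points_backward(encoded))
```

Backward iteration works because UTF-8 is self-synchronizing: a continuation byte can never be mistaken for a lead byte. As the post says, both directions still yield code points, not visible characters.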

**Josh Jersild** @JoshJers · May 23, 2023

@eniko yeah and there's not even a really great way to HANDLE that without, like, "here's an array of utf-32 chars that represent the whole visible thing, lol have fun" at every step

At work I wrote the code that does canonical string comparison (i.e. if you have é written as e + accent or as a single character, they'll compare as identical) and it was a nightmare of tables and clever table compression
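The é example can be reproduced in a few lines of Python. This sketch leans on the standard library's `unicodedata.normalize` rather than hand-built compressed tables, so it shows only the behaviour being described, not the implementation from the post; `canonically_equal` is a hypothetical helper name.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

raw_equal = precomposed == decomposed                  # plain code point comparison
canon_equal = canonically_equal(precomposed, decomposed)
print(raw_equal, canon_equal)  # False True
```

Normalizing both sides to NFD would work just as well; what matters is that both strings go through the same form before comparing.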

**Josh Jersild** @JoshJers@peoplemaking.games

@eniko (I actually first became aware of how annoying the problem is because @AbuhRae was working on a thing where she had a Korean filename. Korean can be represented in multiple ways, and one OS was dealing with filenames in composed form while another was dealing with them in decomposed form, and god help you transferring files from one to the other.)
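The Korean case can be sketched in Python as well. Precomposed Hangul syllables decompose into jamo by plain arithmetic (the constants below come from the Unicode Standard's Hangul syllable decomposition), so the same filename can be two very different code point sequences. The filename and the helper names are assumptions for illustration, and the thread does not say which operating systems were involved.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables


def decompose_hangul(syllable: str) -> str:
    """Split one precomposed syllable into its jamo (hypothetical helper)."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        return syllable  # not a precomposed Hangul syllable
    lead = chr(L_BASE + index // N_COUNT)
    vowel = chr(V_BASE + (index % N_COUNT) // T_COUNT)
    t_index = index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""
    return lead + vowel + trail


def same_filename(a: str, b: str) -> bool:
    """Treat names as equal when their NFC forms match (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed_name = "\ud55c\uae00.txt"                            # two syllables + ".txt"
decomposed_name = unicodedata.normalize("NFD", composed_name)  # six jamo + ".txt"
print(len(composed_name), len(decomposed_name))  # 6 10
```

A byte-for-byte or code point comparison treats these as different files, which is why a tool moving files between systems would need to pick one normalization form and apply it consistently.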

May 23, 2023, 10:32 PM

**Eniko Fox** @eniko · May 23, 2023

@JoshJers @AbuhRae *despair activated*
