Question for the masses: What's a programming language you've seen that handles text strings really well? Specifically, one with great support for Unicode (e.g. cleanly handling the distinction between code points and code units, stuff like that)? Is there anything out there that you think fits that bill?
Edited to add: the main thing I'm really curious about is whether any language has elegantly handled the problem where a single Unicode character might be stored as multiple code units (i.e. more than one 'char'). It's easy to solve if you just say "all my strings are UTF-32 and thus cover the whole code point space", but then suddenly your strings are gigantic in memory, so I think it's untenable in practice (although that's what we did at my current job, for simplicity).
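To make the code point vs. code unit tradeoff concrete, here's a small Python sketch (the helper name `encoding_sizes` is just made up for illustration). Python's `str` is indexed by code point, so `len(s)` counts code points, while encoding to each form exposes the code-unit count:

```python
# Count code points vs. code units for the same text in each Unicode encoding form.
# Python's str is indexed by code point, so len(s) is the code-point count.

def encoding_sizes(s):
    """Return the code-point count and the code-unit count in UTF-8, UTF-16 and UTF-32."""
    return {
        "code_points": len(s),
        "utf8_units": len(s.encode("utf-8")),             # 1-byte units
        "utf16_units": len(s.encode("utf-16-le")) // 2,   # 2-byte units ("-le" avoids a BOM)
        "utf32_units": len(s.encode("utf-32-le")) // 4,   # 4-byte units
    }

# Escapes keep the samples unambiguous: "hello", precomposed e-acute, a Hangul syllable, an emoji.
for sample in ["hello", "\u00e9", "\ud55c", "\U0001F600"]:
    print(ascii(sample), encoding_sizes(sample))
```

"hello" is 5 bytes in UTF-8 but 20 bytes in UTF-32 (5 units of 4 bytes), which is the memory blowup. Meanwhile U+1F600 is one code point but two UTF-16 units (a surrogate pair), which is exactly where a 16-bit `char` stops being "a character".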
@JoshJers the problem is that there's no such thing as a "Unicode character", especially now that joining characters for emoji can combine an arbitrary number of different emoji into a new one. Having researched this, the best idea I found is UTF-8 everywhere, and then you can iterate over the code points. But really I think no language solves this, because it's not strictly solvable.
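A quick Python illustration of why "code point" still isn't "character": the family emoji below renders as one glyph on most platforms but is five code points (three emoji joined by ZERO WIDTH JOINER) and 18 bytes of UTF-8. The helper `describe_code_points` is a hypothetical name; the only library calls are `bytes.decode` and `unicodedata.name`:

```python
import unicodedata

# "Family: man, woman, girl": three emoji glued together by ZERO WIDTH JOINER (U+200D).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
utf8 = family.encode("utf-8")  # "UTF-8 everywhere": these bytes are what you'd store

def describe_code_points(data):
    """Decode UTF-8 bytes and list each code point with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in data.decode("utf-8")]

print(len(utf8), "bytes")
for line in describe_code_points(utf8):
    print(line)
```

Iterating code points gets you MAN, ZERO WIDTH JOINER, WOMAN, ZERO WIDTH JOINER, GIRL; nothing at the code-point level tells you those five belong to one visible thing.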
@JoshJers I did write iterators over code points in C# (forward and backward), but that still doesn't give you actual characters, so *farting noises*
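The C# code isn't shown, so this is only a rough Python sketch of the same idea: stepping through UTF-8 bytes one code point at a time, in both directions. The function names are hypothetical, and it assumes the input is already valid UTF-8 (no checks for overlong forms, truncated sequences, or stray continuation bytes). Going backward works because every continuation byte looks like `10xxxxxx`, so you just skip back over those until you hit a lead byte:

```python
def next_code_point(buf, i):
    """Decode the code point whose lead byte is at index i; return (code_point, next_index)."""
    b0 = buf[i]
    if b0 < 0x80:                      # 0xxxxxxx: ASCII, 1 byte
        return b0, i + 1
    if b0 >> 5 == 0b110:               # 110xxxxx: 2-byte sequence
        length, cp = 2, b0 & 0x1F
    elif b0 >> 4 == 0b1110:            # 1110xxxx: 3-byte sequence
        length, cp = 3, b0 & 0x0F
    elif b0 >> 3 == 0b11110:           # 11110xxx: 4-byte sequence
        length, cp = 4, b0 & 0x07
    else:
        raise ValueError(f"not a UTF-8 lead byte at index {i}")
    for j in range(i + 1, i + length):
        cp = (cp << 6) | (buf[j] & 0x3F)   # append 6 payload bits from each continuation byte
    return cp, i + length

def prev_code_point_start(buf, end):
    """Given the index just past a code point, return the index of its lead byte."""
    i = end - 1
    while i > 0 and buf[i] & 0xC0 == 0x80:  # skip continuation bytes (10xxxxxx)
        i -= 1
    return i

def iter_forward(buf):
    i = 0
    while i < len(buf):
        cp, i = next_code_point(buf, i)
        yield cp

def iter_backward(buf):
    end = len(buf)
    while end > 0:
        start = prev_code_point_start(buf, end)
        yield next_code_point(buf, start)[0]
        end = start

buf = "a\u00e9\ud55c\U0001F600".encode("utf-8")
print([hex(cp) for cp in iter_forward(buf)])
print([hex(cp) for cp in iter_backward(buf)])
```

Either direction hands you code points, not user-perceived characters, which is the point being made above.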
@eniko yeah, and there's not even a really great way to HANDLE that without, like, "here's an array of UTF-32 chars that represent the whole visible thing, lol have fun" at every step
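Here's roughly what "an array of UTF-32 chars per visible thing" looks like in practice. Python's standard library doesn't offer grapheme-cluster segmentation (the real rules are Unicode's UAX #29), so this is a deliberately crude heuristic under stated assumptions: it only glues on combining marks (including variation selectors), ZERO WIDTH JOINER sequences, and emoji skin-tone modifiers, and it ignores flags, Hangul jamo sequences, and everything else. `rough_clusters` is a hypothetical name:

```python
import unicodedata

ZWJ = 0x200D

def rough_clusters(s):
    """Group code points into approximate user-perceived characters (NOT full UAX #29)."""
    clusters = []  # each cluster is a list of code points, i.e. "an array of UTF-32 chars"
    for ch in s:
        cp = ord(ch)
        extends = (
            unicodedata.category(ch).startswith("M")   # combining marks, variation selectors
            or cp == ZWJ                               # a joiner attaches to what came before
            or 0x1F3FB <= cp <= 0x1F3FF                # emoji skin-tone modifiers
            or (clusters and clusters[-1][-1] == ZWJ)  # whatever follows a joiner joins too
        )
        if extends and clusters:
            clusters[-1].append(cp)
        else:
            clusters.append([cp])
    return clusters

text = "e\u0301" + "\U0001F468\u200D\U0001F469\u200D\U0001F467" + "a"
for cluster in rough_clusters(text):
    print([f"U+{cp:04X}" for cp in cluster])
```

Three visible things come back as three variable-length arrays, and every string operation downstream now has to deal with lists of lists.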
At work I wrote the code that does canonical string comparison (i.e. if you have é written either as e + combining accent or as a single precomposed character, they compare as identical), and it was a nightmare of tables and clever table compression.
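For comparison, here's what canonical comparison looks like when a language ships the tables for you. Python's `unicodedata.normalize` does the heavy lifting; normalizing both sides to the same form (NFD here, though NFC works just as well for an equality test) makes canonically equivalent strings compare equal. `canonically_equal` is a hypothetical helper name:

```python
import unicodedata

def canonically_equal(a, b):
    """True if a and b are canonically equivalent (same text after NFD normalization)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

composed = "caf\u00e9"      # e-acute as one precomposed code point (U+00E9)
decomposed = "cafe\u0301"   # plain e followed by COMBINING ACUTE ACCENT (U+0301)

print(len(composed), len(decomposed))          # different code-point counts
print(composed == decomposed)                  # naive comparison
print(canonically_equal(composed, decomposed)) # canonical comparison
```

The table-and-compression nightmare described above is basically what's hiding behind that one `normalize` call.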
@eniko (I actually first became aware of how annoying the problem is because @AbuhRae was working on something with a Korean filename (Korean text can be stored either as precomposed syllables or as sequences of individual jamo), and one OS was handling filenames in composed form while another used decomposed form, and God help you transferring files from one to the other.)
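Hangul is an interesting case because composing and decomposing precomposed syllables is pure arithmetic rather than a table lookup; the constants below are the ones the Unicode Standard defines for Hangul syllables. This Python sketch (function names hypothetical) decomposes a syllable into leading consonant, vowel, and optional trailing consonant jamo, and cross-checks against `unicodedata` to show it matches NFD:

```python
import unicodedata

# Hangul syllable constants from the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables in total

def decompose_syllable(ch):
    """Split one precomposed Hangul syllable into its jamo; other characters pass through."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT          # 0 means "no trailing consonant"
    return lead + vowel + (chr(T_BASE + t_index) if t_index else "")

def compose_syllable(lead, vowel, trail=None):
    """Inverse of decompose_syllable for a leading consonant, vowel, and optional trailing jamo."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

name = "\ud55c\uae00"  # the word "Hangul" as two precomposed syllables
decomposed = "".join(decompose_syllable(c) for c in name)
print(len(name), len(decomposed))                           # 2 code points vs. 6
print(decomposed == unicodedata.normalize("NFD", name))     # matches Python's NFD
```

Same filename, same on-screen text, but one form is 2 code points and the other is 6, so a byte-for-byte filename comparison between the two OSes fails.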