Josh Jersild

Question for the masses: what's a programming language you've seen that handles text strings really well? Specifically, one with great support for Unicode (e.g. it cleanly handles the distinction between code points and code units, stuff like that)? Is there anything out there that you think fits that bill?

Edited to add: the main thing I'm really curious about is whether any language has elegantly handled the problem where a Unicode character might be stored as multiple code units (i.e. one or more 'char's). It's an easy thing to solve if you just say "all my strings are UTF-32 and thus hold the whole space," but then suddenly your strings are gigantic in memory, so I think it's untenable in practice (although that's what we did at my current job for simplicity).
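
To make the code point vs. code unit distinction above concrete, here is a minimal Python sketch. The helper name `unit_counts` is made up for illustration; it only relies on `str.encode` and on the fact that a Python `str` is indexed by code point.

```python
def unit_counts(text: str) -> dict:
    """Count code points and the code units needed in each encoding form."""
    return {
        "code_points": len(text),                           # Python str is a sequence of code points
        "utf8_units": len(text.encode("utf-8")),            # 1-byte code units
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # 2-byte code units
        "utf32_units": len(text.encode("utf-32-le")) // 4,  # 4-byte code units
    }


if __name__ == "__main__":
    # 'a', e with acute (precomposed), euro sign, grinning face emoji
    for sample in ("a", "\u00e9", "\u20ac", "\U0001F600"):
        print(ascii(sample), unit_counts(sample))
```

UTF-32 always uses one unit per code point, which is the simplicity mentioned above, but every unit costs four bytes: a five-letter ASCII word takes 5 bytes in UTF-8 and 20 in UTF-32.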

Eniko Fox

@JoshJers the problem is that there's no such thing as a Unicode character, especially now that you have joiner characters for emoji that can combine an arbitrary number of different emoji into a new one. Having researched this, the best idea I found is UTF-8 everywhere, and then you can iterate over the code points. But really, I think no language solves this because it's not strictly solvable.
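
A small Python sketch of the joiner point above: a family emoji that usually renders as one glyph is really several code points glued together with ZERO WIDTH JOINER. The names `FAMILY` and `describe` are invented for this example, and the sketch only lists code points; it does not attempt to group them into user-visible characters.

```python
import unicodedata

# Man + ZWJ + woman + ZWJ + girl: typically drawn as a single family emoji.
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F467"


def describe(text: str) -> list:
    """List each code point in `text` with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


if __name__ == "__main__":
    print(len(FAMILY), "code points")
    for line in describe(FAMILY):
        print(line)
```

Iterating by code point therefore gives five items for what a reader sees as one thing, which is why code point iteration alone does not yield "characters".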

Eniko Fox

@JoshJers I did write iterators over code points in C# (forward and backward) but that still doesn't give you actual characters so *farting noises*
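
For reference, a rough Python sketch of forward and backward code point iteration over UTF-8 bytes. This is not the C# code mentioned above, just an illustration of the idea; the function names are made up, and the input is assumed to be valid UTF-8 (invalid input raises an exception from `bytes.decode`).

```python
from typing import Iterator


def _is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def iter_code_points(data: bytes) -> Iterator[int]:
    """Yield code points front to back. Assumes `data` is valid UTF-8."""
    i = 0
    while i < len(data):
        j = i + 1
        # Extend the slice over the continuation bytes that follow the lead byte.
        while j < len(data) and _is_continuation(data[j]):
            j += 1
        yield ord(data[i:j].decode("utf-8"))
        i = j


def iter_code_points_reversed(data: bytes) -> Iterator[int]:
    """Yield code points back to front. Assumes `data` is valid UTF-8."""
    end = len(data)
    while end > 0:
        start = end - 1
        # Walk backwards over continuation bytes until the lead byte is found.
        while start > 0 and _is_continuation(data[start]):
            start -= 1
        yield ord(data[start:end].decode("utf-8"))
        end = start


if __name__ == "__main__":
    data = "a\u00e9\u20ac\U0001F600".encode("utf-8")
    print([hex(cp) for cp in iter_code_points(data)])
    print([hex(cp) for cp in iter_code_points_reversed(data)])
```

Backward iteration works because UTF-8 is self-synchronizing: lead bytes and continuation bytes never share a bit pattern, so the start of a sequence can be found from any position.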

Josh Jersild

@eniko yeah, and there's not even a really great way to HANDLE that without, like, "here's an array of UTF-32 chars that represent the whole visible thing, lol have fun" at every step

At work I wrote the code that does canonical string comparison (i.e. if you have é written as e + combining accent or as a single precomposed character, they'll compare as identical), and it was a nightmare of tables and clever table compression
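
As a sketch of what canonical comparison means in practice (not the table-driven implementation described above), Python's standard `unicodedata.normalize` can bring both strings to the same normalization form before comparing. The function name `canonically_equal` is made up for this example.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    precomposed = "\u00e9"   # e with acute as a single code point
    decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
    print(precomposed == decomposed)                   # raw comparison differs
    print(canonically_equal(precomposed, decomposed))  # equal after normalization
```

Normalization also puts combining marks into a canonical order, so `"a\u0323\u0307"` (dot below, then dot above) and `"a\u0307\u0323"` compare as equal too. The decomposition and ordering data behind this are the tables the post refers to.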

Josh Jersild

@eniko (I actually first became aware of how annoying the problem is because @AbuhRae was working on a thing where she had a Korean filename (Korean can be represented in multiple ways). One OS was dealing with filenames in composed form and another was dealing with filenames in decomposed form, and god help you transferring files from one to the other.)
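
A hedged Python illustration of the Korean filename problem: a precomposed Hangul syllable and its decomposed jamo sequence look identical but are different code point sequences, so a naive filename lookup misses. The filename and the helper `find_name` are hypothetical, and no particular OS behavior is modeled here.

```python
import unicodedata
from typing import Iterable, Optional


def find_name(listing: Iterable[str], wanted: str) -> Optional[str]:
    """Return the entry in `listing` that matches `wanted` up to canonical equivalence."""
    target = unicodedata.normalize("NFC", wanted)
    for name in listing:
        if unicodedata.normalize("NFC", name) == target:
            return name  # returned exactly as it was stored in the listing
    return None


if __name__ == "__main__":
    composed = "\ud55c\uae00.txt"                         # two precomposed Hangul syllables
    decomposed = unicodedata.normalize("NFD", composed)   # same text as six conjoining jamo
    print(len(composed), len(decomposed))                 # different lengths
    print(decomposed in [composed])                       # naive lookup fails
    print(find_name([composed], decomposed) is not None)  # normalized lookup succeeds
```

Normalizing both sides to one form at the boundary (here NFC, an arbitrary choice) is one way to make names from the two systems comparable.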

Eniko Fox

@JoshJers @AbuhRae *despair activated*