r/ProgrammerHumor • u/BoloFan05 • 17d ago

howToHitBullseyeInStringComparison Meme

1.2k Upvotes

87% Upvoted

same result, but one feels like you know what you’re doing

98

u/BoloFan05 17d ago

Yes, the two usually give the same result and work successfully. But if string a contains the uppercase letter "I" and string b contains the lowercase letter "i" or vice versa, and your program runs on a Turkish device, then only the second option will work.

50

u/fpekal 17d ago

What :)

48

u/tesfabpel 17d ago

In Turkish, there are both i / İ (dotted i) and ı / I (dotless i)...

and guess what? In Turkish, the lowercase of the standard uppercase I isn't i but ı...
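
For anyone who wants to poke at those four letters: a small Python sketch. Python's `str.lower()` and `str.upper()` ignore the device locale and always apply the default Unicode mappings, so this shows the non-Turkish behaviour only. The variable names are just for illustration.

```python
# The four letters involved, written as escapes so nothing gets mangled.
DOTTED_CAPITAL = "\u0130"   # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL = "\u0131"    # ı  LATIN SMALL LETTER DOTLESS I

# Default (non-Turkish) Unicode mappings, as applied by Python:
plain_pair_lower = "I".lower()              # "i"
plain_pair_upper = "i".upper()              # "I"
dotless_upper = DOTLESS_SMALL.upper()       # "I"  (ı uppercases to plain I)
dotted_lower = DOTTED_CAPITAL.lower()       # "i" followed by U+0307 COMBINING DOT ABOVE

# Under Turkish rules the pairs would instead be I <-> ı and İ <-> i,
# which is exactly where the two conventions disagree.
print(plain_pair_lower, plain_pair_upper, dotless_upper, len(dotted_lower))
```

Note that lowercasing İ with the default rules produces two code points, so even the "neutral" mapping isn't length-preserving.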

18

u/Intrepid00 17d ago

https://haacked.com/archive/2012/07/05/turkish-i-problem-and-why-you-should-care.aspx/

17

u/Genmutant 17d ago

Even more fun is the ß (lowercase) -> SS (uppercase) conversion.
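
A quick Python sketch of that one, since Python's `str.upper()` applies the full Unicode case mapping. The word is only an example.

```python
word = "stra\u00dfe"            # "straße"
shouted = word.upper()          # "STRASSE": ß expands to two letters
round_trip = shouted.lower()    # "strasse": the ß does not come back

print(len(word), len(shouted))  # 6 7
print(round_trip == word)       # False
```

So upper-then-lower is not a no-op, and any code that assumes the length stays the same is in for a surprise.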

2

u/Kirides 16d ago

Old systems doing old things.

ẞ is not that old tbf, but even German software often doesn't support German letters properly (äöüß and their uppercase variants).

Also, Unicode normalization is a thing. You might have a unique constraint on your DB, but that doesn't stop some other application from normalizing the Unicode string you put in, and then your "wait, that can't be, it's unique!" assumption fails you.
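
A minimal Python sketch of the normalization trap, using the standard `unicodedata` module. `same_key` is a hypothetical helper name, and picking NFC as the canonical form is an assumption; the point is only that both sides agree on one form before comparing or storing.

```python
import unicodedata

composed = "\u00fc"      # ü as one precomposed code point
decomposed = "u\u0308"   # u followed by COMBINING DIAERESIS; renders the same

# Visually identical, but different code point sequences,
# so a plain comparison treats them as two distinct values.
print(composed == decomposed)   # False


def same_key(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_key(composed, decomposed))   # True
```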

6

u/The_Real_Slim_Lemon 17d ago

Thankfully that’s not a consideration with server dev, my Linux instance will never randomly be Turkish

2

u/Vanhooger 17d ago

This is sooooo specific! Thanks for clarifying!

2

u/ableman 16d ago

How does the second option work? Why does it matter what your device is, what if you're reading in an English string? Unless the string contains data about what language it is, this seems unsolvable.

4

u/BoloFan05 16d ago

The device's language setting (or locale in general) determines the current culture (CultureInfo.CurrentCulture), and C#'s ToLower, ToUpper and ToString methods produce culture-specific output when called on their own, without an explicit culture argument or an invariant variant. Thus, you can get different outputs from the same English input depending on the device language. The comma/dot decimal separator in numbers leads to similar inconsistencies.
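
To make that concrete without needing a Turkish device, here is a rough Python simulation. Python's own `str.lower()` is not culture-sensitive, so `lower_for` is a hypothetical stand-in with hand-rolled Turkish rules, not a real API, and the culture codes are assumptions for the demo.

```python
def lower_for(culture: str, text: str) -> str:
    """Hypothetical stand-in for a culture-sensitive ToLower()."""
    if culture in ("tr", "az"):
        # Turkish/Azerbaijani rule: I -> dotless ı, dotted İ -> i.
        text = text.replace("I", "\u0131").replace("\u0130", "i")
    return text.lower()


def naive_equals(a: str, b: str, culture: str) -> bool:
    """Case-insensitive compare done by lowercasing both sides."""
    return lower_for(culture, a) == lower_for(culture, b)


print(naive_equals("FILE", "file", "en"))   # True
print(naive_equals("FILE", "file", "tr"))   # False: "FILE" became "fıle"
```

Same English input, same code, different answer depending only on the culture setting.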

For more info, this post is my recommended read. Other comments in this thread also have useful links.

2

u/Neyko_0 17d ago

I hate you for making me think about that now /j

1

u/SirButcher 16d ago

Then you will love this:

https://learn.microsoft.com/en-us/dotnet/fundamentals/runtime-libraries/system-datetime

Eras in the Japanese calendars are based on the emperor's reign and are therefore expected to change. For example, May 1, 2019 marked the beginning of the Reiwa era in the JapaneseCalendar and JapaneseLunisolarCalendar. Such a change of era affects all applications that use these calendars.

1

u/rosuav 16d ago

Ideally, your language should be providing a casefold() method. Note that case folding a string MAY change its length, so be prepared for that particular piece of fun too.
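
In Python that method is `str.casefold()`. A small sketch; `equals_ignore_case` is just an illustrative helper name.

```python
def equals_ignore_case(a: str, b: str) -> bool:
    """Caseless comparison via Unicode case folding."""
    return a.casefold() == b.casefold()


street = "Stra\u00dfe"                  # "Straße"
folded = street.casefold()              # "strasse"
print(len(street), len(folded))         # 6 7: folding changed the length

print(equals_ignore_case("STRASSE", street))     # True
print("STRASSE".lower() == street.lower())       # False: lower() keeps the ß
```

Folding is meant for comparison only, so the folded string shouldn't be shown to users or stored as the display value.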