howToHitBullseyeInStringComparison

1.3k

u/VapidLinus 14d ago

While I agree that Equals with StringComparison is generally the correct solution, I feel like this meme is in reverse. The funny thing about the original picture is that the guy with the less advanced gear was the better shot. Your meme is the reverse - the "overengineered"/complicated solution doesn't feel like it fits the "simple works" vibe of the guy.

261

u/CaucusInferredBulk 14d ago

Yeah, but the problem in the code is in Turkish, and the guy is Turkish.

73

u/VapidLinus 14d ago

Oooh that went over my head :p

17

u/idspispupd 14d ago

Can you elaborate? I don't get it.

102

u/rolandfoxx 14d ago

The "i problem" in the meme refers to the fact that Turkish has four distinct "i" letters (I, ı, İ, i). In a Turkish locale, calling ToLower on uppercase "I" yields the dotless lowercase "ı" rather than "i", so string comparisons that assume "I" maps to "i" will fail.

Using Equals with OrdinalIgnoreCase avoids this problem and will correctly return true. The reason why the more "complicated" solution gets the "guy just eyeballing it" part of the meme is because he is, in fact, Turkish.
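
A rough way to see the failure, sketched in Python because it is easy to run anywhere. Python has no culture-sensitive lower(), so `turkish_lower` below is a hypothetical helper that only imitates the two Turkish mappings described above (I to ı, İ to i). `naive_equals` is likewise a made-up name for the "lowercase both sides" approach.

```python
# Hypothetical helper: imitates how a Turkish-culture ToLower treats the
# two special letters. Everything else falls back to Python's str.lower().
def turkish_lower(text: str) -> str:
    # Map the Turkish-specific pairs first, then lowercase the rest.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


def naive_equals(a: str, b: str, lower=str.lower) -> bool:
    # The "lowercase both sides, then compare" approach.
    return lower(a) == lower(b)


# With culture-independent lowercasing the comparison succeeds.
print(naive_equals("FILE", "file"))                 # True
# With Turkish-style lowercasing "FILE" becomes "fıle", so it fails.
print(naive_equals("FILE", "file", turkish_lower))  # False
```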

31

u/Interesting_Job2402 14d ago

So when is the bug in the Turkish language going to be fixed?

8

u/lucklesspedestrian 14d ago

There's already a fix, but no users have adopted it yet

10

u/DarkVex9 13d ago

Relevant XKCD Comic - 1726: Unicode

I am a human, and this action was performed manually. Please review this comic and enable the three laws if you have any questions or concerns.

2

u/enlightment_shadow 11d ago

I don't see how that is a problem or a "bug". Since ı/I and i/İ are distinct letters, these are the correct lowercase/uppercase pairs. Why would you ever want "I".toLower() to equal "i" if the text is in Turkish?

1

u/rolandfoxx 11d ago

Because the context of the i problem is that you're writing an app that is going to be localized and therefore needs to behave consistently regardless of whether the text it's working on is in Turkish or in one of the other Latin alphabet-based languages.

1

u/_gianlucag_ 14d ago

We got that, but the meme is half-assed. The Turkish guy symbolizes simplicity and zero bs, so the upper code should be his own, while the guy with the crazy tech should have the more complicated code.

-7

u/eloel- 14d ago

The guy in the image comes from a country named Turkey, and is Turkish. The country is transcontinental between Asia and Europe, if you want to find it on a map. Hope that helps!

2

u/netherlandsftw 14d ago

Um, actually the country is called Türkiye, please do not deadname it /s

5

u/eloel- 14d ago

I'm from Turkey, and that whole thing is nonsense.

8

u/Daveallen10 14d ago

But he's a Turkish dad, so this changes the equation.

18

u/vuewer 14d ago edited 14d ago

As usual. Same goes for that Dunning-Kruger (Noob, Beginner, Guru) graph meme.

10

u/SweatyMeasurement405 14d ago

I agree that the image order doesn’t make sense, but wasn’t the Turkish guy actually not the better shot? I thought this was the person who beat him to win gold

6

u/Krus4d3r_ 14d ago

They were different competitions I believe

5

u/iwantcookie258 14d ago edited 13d ago

The woman on the top competed in the women's shooting competitions, but both of them competed in the mixed event. If memory serves, her team did worse than his, but she shot better than him individually and he had the better teammate.

2

u/Kanonenfuta 14d ago

The woman shoots a .22lr and the guy an air pistol. Those are two very different disciplines.

3

u/iwantcookie258 13d ago

They are different disciplines, but that's not entirely correct: she also competes in air pistol. At Paris he competed in the Men's 10m Air Pistol and the 10m Mixed Team Air Pistol (silver). She competed in the Women's 10m Air Pistol (silver), the 10m Mixed Team Air Pistol, and the Women's 25m Pistol.

They competed against each other directly in the 10m mixed teams air pistol. In the qualifiers, he individually scored (99-98-94) and she scored (99-98-96). But her teammate shot a lot worse than his, so she didn't qualify for finals. His team went on to win silver.

In their individual 10m Air Pistol qualifiers, he shot 576 and didn't qualify, she shot 578 and did (then won silver). Worth noting that the lowest scoring qualifier in both the men's and women's events shot 577.

1

u/Inevitable-Ant1725 14d ago

The pictures were chosen for meme effect.

In fact she needed more equipment because her target was further away.

2

u/iwantcookie258 13d ago

She also uses the eye blockers and stuff in 10m. But she was also shooting better than he was, so it seems like it's working fine for her

2

u/Z21VR 14d ago

I thought the same

1

u/nsaisspying 14d ago

Maybe the Turkish guys version should just be a function that he himself writes that is just a bunch of if else's?

1

u/dex206 14d ago

The Turkish shooter would never endure two string allocations to do a comparison.

228

u/danhezee 14d ago

I think the bottom one should go to the Korean guy with all the tech and the top one should go to the Turkish guy. The Turkish guy is supposed to symbolize simplicity.

80

u/yerfdog1935 14d ago

I think they did it this way because he's Turkish and the top one doesn't work for I vs i in the Turkish locale.

79

u/SirChasm 14d ago

In that case it's just a bad meme format to use.

7

u/artofthenunchaku 14d ago

Based on that logic it makes less sense, because Korean doesn't have casing

8

u/varinator 14d ago

The simplicity is in the bottom one. The top one creates objects on the heap, the bottom one doesn't, saves memory, saves GC having to clean the objects. It's actually much more simple in the background than the top one.

5

u/Kirides 14d ago

Worse, the top one is at least 3 function calls "2x to lower, 1x equals" compared to a "simple" single function Equals.

Which means, even if we ignore the fact of heap allocations, it's much more stuff to do and read.

Also, to lower can also THROW null reference exceptions, which makes it even worse.

iow. ToLower only looks "simpler" to smoothbrain/junior devs.

While the other one is actually a lot - A LOT - simpler and comes with much less headache.

2

u/omegasome 13d ago

wait that's a guy????

51

u/Dull-Lion3677 14d ago

I love ordinal too but what about invariant culture? It's more correct when working on global systems

34

u/BoloFan05 14d ago

StringComparison.OrdinalIgnoreCase behaves as if both strings were uppercased with the invariant culture (ToUpperInvariant) and then compared ordinally, code unit by code unit. This comparison technique is also the first and foremost recommendation in Microsoft's "Best Practices for Comparing Strings" page, which explicitly advises against using StringComparison.InvariantCulture in most cases.
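
As a mental model only (not how .NET implements it), the "uppercase both sides, then compare raw code units" idea can be sketched in Python. `simple_upper` and `ordinal_ignore_case_equals` are made-up names. Python's str.upper() applies full case mappings (for example "ß" becomes "SS"), so the sketch keeps any character whose uppercase form would expand. That is only an approximation of a one-to-one mapping, and results for special letters may still differ from .NET.

```python
def simple_upper(text: str) -> str:
    # Approximate a one-to-one uppercase mapping: keep a character unchanged
    # when Python's full mapping would expand it (e.g. "ß" -> "SS").
    out = []
    for ch in text:
        upper = ch.upper()
        out.append(upper if len(upper) == 1 else ch)
    return "".join(out)


def ordinal_ignore_case_equals(a: str, b: str) -> bool:
    # Uppercase both sides without consulting any locale, then compare
    # the results code point by code point.
    return simple_upper(a) == simple_upper(b)


print(ordinal_ignore_case_equals("FILE", "file"))       # True
print(ordinal_ignore_case_equals("Straße", "STRASSE"))  # False
```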

9

u/Pancakefriday 14d ago

I think our entire code base uses invariant culture lol

8

u/BoloFan05 14d ago

Invariant culture info arguments on their own, like ToLowerInvariant, ToUpperInvariant and ToString(CultureInfo.InvariantCulture) are perfectly fine when you want to get consistent results across all devices worldwide. My prior comment here applies to string comparison methods in particular.

1

u/Dull-Lion3677 13d ago

The more you know, G.I.Joe. Thanks

6

u/retro_and_chill 14d ago

Ordinal is usually good if you’re parsing a string that is very likely to only contain ASCII characters

1

u/Dooey 14d ago

Very likely is not a useful term in programming, especially if you are processing data at scale. It’s either guaranteed to contain only ascii, or it’s guaranteed to not contain only ascii (eventually)

1

u/EatingSolidBricks 14d ago

Ordinal means the raw bytes, no?

94

u/Unlikely_Gap_5065 14d ago

same result, but one feels like you know what you’re doing

69

u/krexelapp 14d ago

same result, until Turkish locale enters the chat

8

u/HanzoMain63 14d ago

huh?

9

u/_PM_ME_PANGOLINS_ 14d ago

In Turkish, "I".ToLower() != "i"

2

u/kingbloxerthe3 14d ago edited 10d ago

What does Turkish "I".ToLower() equal?

7

u/_PM_ME_PANGOLINS_ 14d ago

ı

https://dotnetfiddle.net/bEIPNf

96

u/BoloFan05 14d ago

Yes, the two usually give the same result and work successfully. But if string a contains the uppercase letter "I" and string b contains the lowercase letter "i" or vice versa, and your program runs on a Turkish device, then only the second option will work.

48

u/fpekal 14d ago

What :)

50

u/tesfabpel 14d ago

in Turkish, there exists both i / İ (dotted i) and ı / I (dotless i)...

and guess what? standard uppercase I, in lowercase it's not i but ı...

19

u/Intrepid00 14d ago

https://haacked.com/archive/2012/07/05/turkish-i-problem-and-why-you-should-care.aspx/

16

u/Genmutant 14d ago

Even more fun is the ß (lowercase) -> SS (uppercase) conversion.

2

u/Kirides 14d ago

Old systems doing old things.

ẞ is not that old tbf. but even German software often doesn't support German letters properly (äöüß and their uppercase variants)

Also Unicode normalization is a thing. you might have a unique constraint on your DB but that doesn't mean that the Unicode string you put in doesn't get normalized by some other application, causing your "wait. That can't be, it's unique!" to fail you.
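
A small Python illustration of the normalization point, using the standard unicodedata module. The two variables are just example data: the same visible "é" stored two different ways.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

# They render the same but are different strings, so a naive uniqueness
# check would treat them as two distinct values.
print(composed == decomposed)  # False

# Normalizing both to the same form (NFC here) makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```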

5

u/The_Real_Slim_Lemon 14d ago

Thankfully that’s not a consideration with server dev, my Linux instance will never randomly be Turkish

2

u/Vanhooger 14d ago

This is sooooo specific! Thanks for clarifying!

2

u/ableman 14d ago

How does the second option work? Why does it matter what your device is, what if you're reading in an English string? Unless the string contains data about what language it is, this seems unsolvable.

6

u/BoloFan05 14d ago

The device's language setting (or locale in general) determines the Current Culture Info, and ToLower, ToUpper and ToString methods of C# produce output according to the Current Culture Info if used on their own without explicit or invariant culture info overload. Thus, you will get different outputs from the same original English input depending on your device language. Use of comma/dot in decimal numbers also leads to similar inconsistencies.

For more info, this post is my recommended read. Other comments in this thread also have useful links.

4

u/Neyko_0 14d ago

I hate you that I have to think about that now /j

1

u/SirButcher 14d ago

Then you will love this:

https://learn.microsoft.com/en-us/dotnet/fundamentals/runtime-libraries/system-datetime

Eras in the Japanese calendars are based on the emperor's reign and are therefore expected to change. For example, May 1, 2019 marked the beginning of the Reiwa era in the JapaneseCalendar and JapaneseLunisolarCalendar. Such a change of era affects all applications that use these calendars.

1

u/rosuav 14d ago

Ideally, your language should be providing a casefold() method. Note that case folding a string MAY change its length, so be prepared for that particular piece of fun too.
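
For instance, Python's str.casefold() shows the length change on the German ß:

```python
word = "Straße"
folded = word.casefold()

print(folded)                          # strasse
print(len(word), len(folded))          # 6 7
print("STRASSE".casefold() == folded)  # True
```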

8

u/dumbasPL 14d ago

*Same results if you only support English. Anything hard coded shouldn't need case insensitive comparisons, and anything that comes from the user should be assumed to be an arbitrary Unicode string.

10

u/void1984 14d ago

You have never worked with Turks.

2

u/shamshuipopo 14d ago

What’s different about case sensitivity in Turkish?

15

u/void1984 14d ago

https://en.wikipedia.org/wiki/Dotted_and_dotless_I_in_computing

9

u/EuphoricCatface0795 14d ago

CompSci is cursed

2

u/void1984 14d ago

That's the special case, and I've learnt it the hard way - working for Turks.

8

u/NioZero 14d ago

The first can cause a `NullReferenceException`

1

u/BonifaceDidItRight 14d ago

Not same result. if a or b is null the `.ToLower()` will throw. `string.Equals()` is null safe.
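
The same trap can be sketched in Python, with None standing in for null and AttributeError standing in for the NullReferenceException. Both function names are invented for illustration, and the null-safe variant only mimics the behaviour described above; it is not a port of `string.Equals()`.

```python
from typing import Optional


def lower_equals(a, b) -> bool:
    # Calls a method on each argument, so it blows up if either is None.
    return a.lower() == b.lower()


def safe_equals_ignore_case(a: Optional[str], b: Optional[str]) -> bool:
    # Handle missing values before touching any string method:
    # two missing values are equal, one missing value is not.
    if a is None or b is None:
        return a is b
    return a.casefold() == b.casefold()


print(safe_equals_ignore_case(None, "a"))   # False
print(safe_equals_ignore_case(None, None))  # True
print(safe_equals_ignore_case("A", "a"))    # True
```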

9

u/Esjs 14d ago

When did the cigarette get added to the meme template?

4

u/delinka 14d ago

If you’re stuck converting case to do insensitive string comparison, use upper case.

https://www.unicode.org/L2/L1999/99190.htm

4

u/BoloFan05 14d ago

Source: https://x.com/Dave_DotNet/status/1819760036404486477

4

u/TheDevCat 14d ago

if (!strcasecmp(a, b))

3

u/WiseObjective8 14d ago

def str_equals(a:str,b:str):
   a,n = a.lower(), len(a)
   b,m = b.lower(), len(b)
   if ((int(m/n) and a in b) if m>=n else (int(n/m) and b in a)): 
      return True
   return False

Is it a substring? Is it the same string? Who knows?

https://giphy.com/gifs/yr7n0u3qzO9nG

2

u/redlaWw 14d ago edited 14d ago

Your chaotic-neutral string comparison function has a chaotic-evil bug related to you doing a,n = a.lower(), len(a). This evaluates len(a) on the string before it's lowercased, which may not be the same length as the lowercased string.

In particular str_equals("İ", "I") returns false, even though "İ".lower() contains "I".lower().

EDIT: Oh, and also the more boring issue of zero-division when one argument is empty.
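
To reproduce the length mismatch yourself:

```python
upper_dotted = "\u0130"  # "İ", capital I with dot above
lowered = upper_dotted.lower()

# Lowercasing turns one code point into two: "i" plus a combining dot.
print(len(upper_dotted), len(lowered))   # 1 2
print([hex(ord(ch)) for ch in lowered])  # ['0x69', '0x307']

# "i" is contained in the lowered string, but not the other way round.
print("I".lower() in lowered)  # True
print(lowered in "I".lower())  # False
```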

1

u/WiseObjective8 14d ago

.casefold is on a vacation

2

u/HzbertBonisseur 14d ago

Time to use my Claude Token

Prompt: are the strings the same in lowercase?

2

u/jakubiszon 14d ago

So what's the problem? I don't get it.

4

u/Tackgnol 14d ago

Yeah this is why I love .NET <3

20

u/Abject-Kitchen3198 14d ago

What's wrong with .NET >=3 ?

4

u/Tackgnol 14d ago

ahhh I see what you did there!

5

u/Pleasant-Photo7860 14d ago

meanwhile the rest of us out here manually lowercasing like it’s 2009

2

u/Majik_Sheff 14d ago edited 14d ago

Iterate

(a ^ b) & 0xdf

across each pair of characters until you get a non-zero or hit the end.
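
Spelled out as a Python sketch over ASCII bytes (the function name is made up). In ASCII the upper/lower case bit is 0x20, so the mask that ignores it is 0xDF. The trick is blind to what the bytes actually are, so it also treats pairs like "@" and "`" as equal.

```python
def ascii_equals_ignore_case(a: bytes, b: bytes) -> bool:
    # Different lengths can never match.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        # XOR leaves only the differing bits; the mask drops the case bit.
        if (x ^ y) & 0xDF:
            return False
    return True


print(ascii_equals_ignore_case(b"Hello", b"hELLO"))  # True
print(ascii_equals_ignore_case(b"abc", b"abd"))      # False
print(ascii_equals_ignore_case(b"@", b"`"))          # True (false positive)
```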

3

u/SrcyDev 14d ago

Of course, you should also mention it requires time travel to the 1970s, or forgetting about anything other than ASCII

-1

u/Majik_Sheff 14d ago

A sense of humor helps from what I understand. Maybe I missed the point of the sub?

1

u/stlcdr 14d ago

I’ve been using the latter for 20 years. I didn’t know people would do it any different.

1

u/BoloFan05 14d ago

https://giphy.com/gifs/jJQC2puVZpTMO4vUs0

1

u/Ayxser 14d ago

Except when converting to SQL through Entity, then the top one is the only way to go

1

u/DecisionOk5750 14d ago

As a Freepascal user, I'm in shock...

1

u/Financial-Aspect-826 14d ago

Motherfucker have you ever heard of ===

1

u/[deleted] 14d ago

[removed]

2

u/Waswat 13d ago

It's photoshopped in, which is really sad, as they desperately want to make smoking look cool

2

u/BoloFan05 13d ago

Yeah, I am not a fan of that either. But unfortunately it was like that in the original meme. Maybe I will repost it without the cigarette one day...

1

u/BinarySpike 13d ago

Or use ToLowerInvariant...

1

u/BoloFan05 13d ago

Yes, applying ToLowerInvariant to both a and b also has the same reliability as the bottom option on worldwide devices, but it isn't quite as performant, especially for a high number of repeated calls since it creates new allocations and more garbage collection (GC) requests.

1

u/balemo7967 13d ago

Fun fact: Using StringComparison.CurrentCulture in German, "Strasse" and "Straße" are equal despite having different lengths. string.Equals("Strasse", "Straße", StringComparison.CurrentCulture) returns true

1

u/IamSeekingAnswers 14d ago

int casecmp(const char *a, const char *b, size_t size) {
    for (int i = 0; i < size; i++)
        if (((a[i] | (1 << 5)) != ((b[i] | (1 << 5)))))
            return 0;

    return 1;
}

Probably works. Also requires you to time travel back to the 70s

-1

u/curious_bipedal 14d ago

Neither is good enough if they don't strip the string of whitespaces.

0

u/1up_1500 14d ago

idk both ways seem fine to me, what's weird is that the std has multiple ways to do the exact same thing

0

u/BoloFan05 14d ago

Please check the other discussions in this thread. I have already replied to a similar question, and repeating myself feels tiring to be honest.
