What are the groups of four dashes in the .NET reference source code?

I was browsing the source of the PluralizationService when I noticed something odd. In the class there are a couple of private dictionaries reflecting different pluralisation rules. For example:

    private string[] _uninflectiveWordList =
        new string[] { 
            "bison", "flounder", "pliers", "bream", "gallows", "proceedings", 
            "breeches", "graffiti", "rabies", "britches", "headquarters", "salmon", 
            "carp", "----", "scissors", "ch----is", "high-jinks", "sea-bass", 
            "clippers", "homework", "series", "cod", "innings", "shears", "contretemps", 
            "jackanapes", "species", "corps", "mackerel", "swine", "debris", "measles", 
            "trout", "diabetes", "mews", "tuna", "djinn", "mumps", "whiting", "eland", 
            "news", "wildebeest", "elk", "pincers", "police", "hair", "ice", "chaos",
            "milk", "cotton", "pneumonoultramicroscopicsilicovolcanoconiosis",
            "information", "aircraft", "scabies", "traffic", "corn", "millet", "rice", 
            "hay", "----", "tobacco", "cabbage", "okra", "broccoli", "asparagus", 
            "lettuce", "beef", "pork", "venison", "mutton",  "cattle", "offspring", 
            "molasses", "shambles", "shingles"};

What are the groups of four dashes in the strings? I did not see them handled in the code, so they're not some kind of template. The only thing I can think of is that they are censored expletives ('ch----is' would be 'chassis'), which in this case actually hurts readability. Has anyone else come across this? If I were interested in the actual full list, how would I view it?

Eellike answered 23/11, 2015 at 14:8 Comment(7)
Don't know for certain, but my guess would be that it's some kind of placeholder acting as a wildcard (e.g. a pattern consisting of ch, then 4 characters, then is would match). – Eyecup
"pneumonoultramicroscopicsilicovolcanoconiosis" I'm guessing the tester who found that one got a good laugh out of the bug report, and the developer who fixed it laughed back... (it's the longest word in the English language according to Wikipedia) – Oarsman
My best guess would be a pattern match where the letters themselves don't matter but the length does. For example, cat, hat and bat, if they didn't match the other cases, could be lumped together under the dash pattern and pluralized the same way. Just a guess though. – Vinia
That's WAY more grammar-related code than I ever expected to find in an ORM library. – Cynthea
I can only think of one word (trapezium) that matches t----zium, which is another word from the same file, so it does look like it is censoring certain words. – Cancellation
#30632126 – Nix
"Two cabbages" is really found to be less likely to be correct than "two cabbage"? – Palenque

From using Reflector to look at the decompiled code I can verify that the compiled version doesn't have "----" in there and it does indeed seem to be some kind of censorship somewhere along the way. The decompiled code has this in the constructor:

this._uninflectiveWordList = new string[] { 
    "bison", "flounder", "pliers", "bream", "gallows", "proceedings", "breeches", "graffiti", "rabies", "britches", "headquarters", "salmon", "carp", "herpes", "scissors", "chassis", 
    "high-jinks", "sea-bass", "clippers", "homework", "series", "cod", "innings", "shears", "contretemps", "jackanapes", "species", "corps", "mackerel", "swine", "debris", "measles", 
    "trout", "diabetes", "mews", "tuna", "djinn", "mumps", "whiting", "eland", "news", "wildebeest", "elk", "pincers", "police", "hair", "ice", "chaos", 
    "milk", "cotton", "pneumonoultramicroscopicsilicovolcanoconiosis", "information", "aircraft", "scabies", "traffic", "corn", "millet", "rice", "hay", "hemp", "tobacco", "cabbage", "okra", "broccoli", 
    "asparagus", "lettuce", "beef", "pork", "venison", "mutton", "cattle", "offspring", "molasses", "shambles", "shingles"
 };

As you can see, the censored words are "herpes", "chassis" and "hemp" (if I've followed along correctly). I personally don't think any of them needs censoring, which suggests that some kind of automated system is doing it. I would assume that the original source contains the full words rather than having them added in some kind of precompile merge (if nothing else, because "----" really isn't enough for anything to determine what it should be replaced with). I'd imagine that, for some reason, they get censored on the reference source website.
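To double-check that I followed along correctly, the two lists can be compared position by position. Below is a minimal sketch in Python (used here for brevity; the original code is C#). The lists are only excerpts around the masked entries, and `masked_entries` is a helper name made up for this illustration.

```python
# Excerpts only: the entries surrounding each masked string, copied from the
# reference source (censored) and from the decompiled assembly (original).
censored = ["carp", "----", "scissors", "ch----is", "high-jinks",
            "rice", "hay", "----", "tobacco"]
decompiled = ["carp", "herpes", "scissors", "chassis", "high-jinks",
              "rice", "hay", "hemp", "tobacco"]


def masked_entries(published, original):
    """Pair the two lists by position and keep the entries that differ."""
    if len(published) != len(original):
        raise ValueError("lists must be the same length to compare by position")
    return [(p, o) for p, o in zip(published, original) if p != o]


for shown, actual in masked_entries(censored, decompiled):
    print(f"{shown!r} was originally {actual!r}")
```

Comparing by position only works because both lists keep the same order; the length check guards against a misaligned paste.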

In the comments, Hans Passant also linked to an answer to a very similar question: What does ----s mean in the context of StringBuilder.ToString()? It explains that "The source code for the published Reference Source is pushed through a filter that removes objectionable content from the source".
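To illustrate how such a filter could produce these strings, here is a rough sketch of a naive substring filter in Python (used for brevity, not the language of the original code). The blocklist, the `MASK` constant and the `censor` function are all assumptions inferred from the masked entries in this thread. The real filter's rules are not published here and evidently differ.

```python
import re

# Hypothetical blocklist, inferred from the masked strings in this thread.
# It is NOT taken from the real Reference Source filter.
BLOCKED = ["ass", "herpes", "hemp"]
MASK = "----"  # every match becomes four dashes, regardless of its length

_pattern = re.compile("|".join(re.escape(w) for w in BLOCKED), re.IGNORECASE)


def censor(text: str) -> str:
    """Replace every blocked substring with the fixed four-dash mask."""
    return _pattern.sub(MASK, text)


for word in ["chassis", "herpes", "hemp", "sea-bass", "molasses"]:
    print(word, "->", censor(word))
```

This reproduces "ch----is" and the two bare "----" entries, but it would also mangle "sea-bass" and "molasses", both of which appear intact in the published list. So whatever the real filter does, it is more selective than a plain substring replace.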

Confess answered 23/11, 2015 at 14:49 Comment(6)
ass, not chassis. It will probably make somebody blush. – Nix
You are right that "ass" is what was removed. I was referring to what the full words were. – Confess
@PeterM: Seems that way, but without insider knowledge of the process they use I'm not really in a position to say for sure what they did or why it made such a mess of it. It does seem likely that it is an automated process being rubbish. At least it reassures me that Skynet isn't likely to be a problem for a while. ;-) – Confess
@PeterM do you mean "cl----ic" filtering? – Divergent
@RobLang Ironically (and by that I mean clbuttic being mentioned on SO) blog.codinghorror.com/… – Carnassial
I remember the chat for an online browser game running a filter that just stripped obscenity. Referring to an assassin (or "an in") became rather hard, which was troublesome since a unit in the game was called an assassin... – Confess
