How are strings passed in .NET?

3

151

When I pass a string to a function, is a pointer to the string's contents passed, or is the entire string passed to the function on the stack like a struct would be?

Spandex answered 29/5, 2012 at 3:3 Comment(0)
342

A reference is passed; however, it's not technically passed by reference. This is a subtle, but very important distinction. Consider the following code:

void DoSomething(string strLocal)
{
    strLocal = "local";
}
void Main()
{
    string strMain = "main";
    DoSomething(strMain);
    Console.WriteLine(strMain); // What gets printed?
}

There are three things you need to know to understand what happens here:

  1. Strings are reference types in C#.
  2. They are also immutable, so any time you do something that looks like you're changing the string, you aren't. A completely new string gets created, the reference is pointed at it, and the old one becomes eligible for garbage collection once nothing else refers to it.
  3. Even though strings are reference types, strMain isn't passed by reference. It's a reference type, but the reference itself is passed by value. Any time you pass a parameter without the ref keyword (or one of the related out and in modifiers), you've passed something by value.

So that must mean you're...passing a reference by value. Since it's a reference type, only the reference was copied onto the stack. But what does that mean?

Passing reference types by value: You're already doing it

C# variables are either reference types or value types. C# parameters are either passed by reference or passed by value. Terminology is a problem here; these sound like the same thing, but they're not.

If you pass a parameter of ANY type, and you don't use the ref keyword, then you've passed it by value. If you've passed it by value, what you really passed was a copy. But if the parameter was a reference type, then the thing you copied was the reference, not whatever it was pointing at.

Here's the first line of the Main method:

string strMain = "main";

We've created two things on this line: a string with the value main stored off in memory somewhere, and a reference variable called strMain pointing to it.

DoSomething(strMain);

Now we pass that reference to DoSomething. We've passed it by value, so that means we made a copy. It's a reference type, so that means we copied the reference, not the string itself. Now we have two references that each point to the same value in memory.
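
One way to check that the caller and the callee really hold two references to the same string is object.ReferenceEquals. This is only a minimal sketch: the ReferenceCopyDemo class and the PointsAtSameString helper are hypothetical names made up for illustration and are not part of the original example.

```csharp
using System;

public static class ReferenceCopyDemo
{
    // strLocal receives a copy of the caller's reference (passed by value).
    // 'original' is a second copy, passed only so the two can be compared.
    public static bool PointsAtSameString(string strLocal, string original)
    {
        return object.ReferenceEquals(strLocal, original);
    }

    public static void Main()
    {
        string strMain = "main";

        // Prints True: the reference was copied, not the characters.
        Console.WriteLine(PointsAtSameString(strMain, strMain));
    }
}
```

Because only the reference is copied, the cost of the call does not depend on the length of the string.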

Inside the callee

Here's the top of the DoSomething method:

void DoSomething(string strLocal)

No ref keyword, so strLocal and strMain are two different references pointing at the same value. If we reassign strLocal...

strLocal = "local";   

...we haven't changed the stored value; we took the reference called strLocal and aimed it at a brand new string. What happens to strMain when we do that? Nothing. It's still pointing at the old string.

string strMain = "main";    // Store a string, create a reference to it
DoSomething(strMain);       // Reference gets copied, copy gets re-pointed
Console.WriteLine(strMain); // The original string is still "main" 

Immutability

Let's change the scenario for a second. Imagine we aren't working with strings, but some mutable reference type, like a class you've created.

class MutableThing
{
    public int ChangeMe { get; set; }
}

If you follow the reference objLocal to the object it points to, you can change its properties:

void DoSomething(MutableThing objLocal)
{
    objLocal.ChangeMe = 0;
}

There's still only one MutableThing in memory, and both the copied reference and the original reference still point to it. The properties of the MutableThing itself have changed:

void Main()
{
    var objMain = new MutableThing();
    objMain.ChangeMe = 5; 
    Console.WriteLine(objMain.ChangeMe); // it's 5 on objMain

    DoSomething(objMain);                // now it's 0 on objLocal
    Console.WriteLine(objMain.ChangeMe); // it's also 0 on objMain   
}

Ah, but strings are immutable! There's no ChangeMe property to set. You can't do strLocal[3] = 'H' in C# like you could with a C-style char array; you have to construct a whole new string instead. The only way to change strLocal is to point the reference at another string, and that means nothing you do to strLocal can affect strMain. The value is immutable, and the reference is a copy.
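
To make the contrast with a char array concrete, here is a rough sketch that passes a char[] and a string to two similar methods. The class and method names (ArrayVersusStringDemo, OverwriteFirst, TryOverwriteFirst) are hypothetical and exist only for this illustration.

```csharp
using System;

public static class ArrayVersusStringDemo
{
    // A char[] is a mutable reference type: writing through the copied
    // reference changes the one array that the caller also sees.
    public static void OverwriteFirst(char[] chars)
    {
        chars[0] = 'H';
    }

    // The string indexer is read-only, so the closest equivalent is to
    // build a new string and re-point the local copy of the reference.
    public static void TryOverwriteFirst(string text)
    {
        // text[0] = 'H';  // does not compile: the indexer has no setter
        text = "H" + text.Substring(1);
    }

    public static void Main()
    {
        char[] chars = { 'h', 'i' };
        OverwriteFirst(chars);
        Console.WriteLine(new string(chars)); // Hi

        string text = "hi";
        TryOverwriteFirst(text);
        Console.WriteLine(text);              // hi
    }
}
```

Both parameters are passed by value. The difference in the output comes entirely from the array being mutable and the string not.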

Passing a reference by reference

To prove there's a difference, here's what happens when you pass a reference by reference:

void DoSomethingByReference(ref string strLocal)
{
    strLocal = "local";
}
void Main()
{
    string strMain = "main";
    DoSomethingByReference(ref strMain);
    Console.WriteLine(strMain);          // Prints "local"
}

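This time, strMain in Main really does change. The string object "main" itself is still untouched, but because you passed the variable by reference instead of passing a copy of its reference, the assignment in the callee re-points strMain at the new string.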

So even though strings are reference types, passing them by value means whatever goes on in the callee won't affect the string in the caller. But since they are reference types, you don't have to copy the entire string in memory when you want to pass it around.
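
The same rule holds for mutable reference types, which shows that immutability is not what makes the difference here. The following sketch reuses the MutableThing class from earlier; the ReassignDemo class and its Replace and ReplaceByReference methods are hypothetical names introduced only for this comparison.

```csharp
using System;

public class MutableThing
{
    public int ChangeMe { get; set; }
}

public static class ReassignDemo
{
    // By value: only the local copy of the reference is re-pointed.
    public static void Replace(MutableThing objLocal)
    {
        objLocal = new MutableThing { ChangeMe = 99 };
    }

    // By reference: the caller's variable itself is re-pointed.
    public static void ReplaceByReference(ref MutableThing objLocal)
    {
        objLocal = new MutableThing { ChangeMe = 99 };
    }

    public static void Main()
    {
        var objMain = new MutableThing { ChangeMe = 5 };

        Replace(objMain);
        Console.WriteLine(objMain.ChangeMe); // 5

        ReplaceByReference(ref objMain);
        Console.WriteLine(objMain.ChangeMe); // 99
    }
}
```

Reassigning a parameter never reaches the caller unless the parameter is declared with ref. Mutating the object behind the reference is a separate matter, and it is only possible when the type is mutable.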


Wulfe answered 29/5, 2012 at 3:45 Comment(10)
@TheLight - Sorry, but you're incorrect here when you say: "A reference type is passed by reference by default." By default, all parameters are passed by value, but with reference types, this means that the reference is passed by value. You're conflating reference types with reference parameters, which is understandable because it's a very confusing distinction. See the Passing Reference Types by Value section here. Your linked article is quite correct, but it actually supports my point. - Wulfe
@JustinMorgan Not to bring up a dead comment thread, but I think TheLight's comment makes sense if you think in C. In C, data is just a block of memory. A reference is a pointer to that block of memory. If you pass the entire block of memory to a function, that's called "passing by value". If you pass the pointer it's called "passing by reference". In C#, there is no notion of passing in the entire block of memory, so they redefined "passing by value" to mean passing the pointer in. That seems wrong, but a pointer is just a block of memory too! To me, the terminology is pretty arbitrary. - Formaldehyde
@roliu - The problem is that we're not working in C, and C# is extremely different despite its similar name and syntax. For one thing, references are not the same as pointers, and thinking of them that way can lead to pitfalls. The biggest problem, though, is that "passing by reference" has a very specific meaning in C#, requiring the ref keyword. To prove that passing by reference makes a difference, see this demo: rextester.com/WKBG5978 - Wulfe
@JustinMorgan I agree that mixing C and C# terminology is bad, but, while I enjoyed Lippert's post, I don't agree that thinking of references as pointers particularly fogs up anything here. The blog post describes how thinking of a reference as a pointer gives it too much power. I'm aware that the ref keyword has utility; I was just trying to explain why passing a reference type by value in C# might seem like the "traditional" (i.e. C) notion of passing by reference (and why passing a reference type by reference in C# seems more like passing a reference to a reference by value). - Formaldehyde
@roliu - You're right, they have similarities, and it's hard to avoid the comparison when coming (as many of us did) from C/C++ to C#. I think we agree on most things except the importance of terminology. For less confusion, we could talk about Call-by-Sharing (C#) vs Call-by-Reference (C). Actually, this question is a perfect example of why it's important: With call-by-reference semantics, the original string would be changed by operations in the callee. - Wulfe
@adamnationx, if you read this - I saw your suggested edit and fixed the typo you found. Great catch, thank you. - Wulfe
You are correct, but I think @roliu was referencing how a function such as Foo(string bar) could be thought of as Foo(char* bar) whereas Foo(ref string bar) would be Foo(char** bar) (or Foo(char*& bar) or Foo(string& bar) in C++). Sure, it's not how you should think of it every day, but it actually helped me finally understand what is happening under the hood. - Spandex
Actually I see no difference between passing a string as a parameter and passing any other reference type. I can't find anything special about it in the specification or in Lippert's blog. As stated by Lippert, there is a third kind of value: references. "We see that references and instances of value types are essentially the same thing as far as their storage is concerned; they go on either the stack, in registers, or the heap depending on whether the storage of the value needs to be short-lived or long-lived." - Melodramatize
Actually, learn.microsoft.com (learn.microsoft.com/en-us/dotnet/csharp/…) implies that "passing by value" is just the name for passing without ref, out etc. - Melodramatize
@КоеКто - You're correct, there's nothing special about strings when passed as a parameter. I didn't mean to imply otherwise. It's just that strings' immutability makes it easier to reason about them in this case. - Wulfe
28

Strings in C# are immutable reference objects. This means that references to them are passed around (by value), and once a string is created, you cannot modify it. Methods that produce modified versions of the string (substrings, trimmed versions, etc.) create modified copies of the original string.
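
As a small illustration of that last point, the sketch below calls two standard string methods and shows that the original is left alone. The NewStringDemo class and its Shout helper are hypothetical names used only for this example.

```csharp
using System;

public static class NewStringDemo
{
    // Trim and ToUpperInvariant each return a new string;
    // the string that 'input' refers to is left as it was.
    public static string Shout(string input)
    {
        return input.Trim().ToUpperInvariant();
    }

    public static void Main()
    {
        string original = "  hello  ";
        string shouted = Shout(original);

        Console.WriteLine("[" + original + "]"); // [  hello  ]
        Console.WriteLine("[" + shouted + "]");  // [HELLO]
    }
}
```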

Blat answered 29/5, 2012 at 3:5 Comment(0)
14

Strings are special cases. Each instance is immutable. When you change the value of a string you are allocating a new string in memory.

So only the reference is passed to your function, but when the string is edited it becomes a new instance and doesn't modify the old instance.

Anew answered 29/5, 2012 at 3:6 Comment(7)
Strings are not a special case in this aspect. It is very easy to create immutable objects which could have the same semantics. (That is, an instance of a type which does not expose a method to mutate it...) - Polypropylene
Strings are special cases - they are effectively immutable reference types that appear to be mutable in that they behave like value types. - Anew
StringBuilder is a mutable string class that allows fast modification of the string being built without allocating new strings in memory for each modification. - Anew
@Anew By that logic then Uri (class) and Guid (struct) are also special cases. I do not see how System.String acts like a "value type" any more than other immutable types... of either class or struct origins. - Polypropylene
@pst - Strings have special creation semantics - unlike Uri & Guid - you can just assign a string-literal value to a string variable. The string appears to be mutable, like an int being reassigned, but it's creating an object implicitly - no new keyword. - Anew
String is a special case, but that has no relevance to this question. Value type, reference type, whatever type will all act the same in this question. - Adulation
The only thing that makes strings a special case is that C# supports writing them as literals, and as @KirkBroadhurst points out, that's not relevant. Everything else, including their "value-type-like" behavior (by which I assume you mean things like == comparing them by value) can be easily replicated in user-defined types. I would not describe them as behaving like value types. - Wulfe
