Assign this keyword in C#

The main question is: what are the implications, in terms of usefulness and memory, of allowing the this keyword to be modified, and why is this allowed by the C# language specification?

The other questions/sub-parts can be answered or not, if you choose to do so. I thought answers to them would help clarify the answer to the main question.

I ran across this as an answer to "What's the strangest corner case you've seen in C# or .NET?":

public struct Teaser
{
    public void Foo()
    {
        this = new Teaser();
    }
}

I've been trying to wrap my head around why the C# language specification would even allow this. Sub-part 1. Is there anything that would justify having this be modifiable? Is it ever useful?

One of the comments to that answer was

From CLR via C#: The reason they made this is because you can call the parameterless constructor of a struct in another constructor. If you only want to initialize one value of a struct and want the other values to be zero/null (default), you can write public Foo(int bar){this = new Foo(); specialVar = bar;}. This is not efficient and not really justified (specialVar is assigned twice), but just FYI. (That's the reason given in the book, I don't know why we shouldn't just do public Foo(int bar) : this())

Sub-part 2. I'm not sure I follow that reasoning. Can someone clarify what he meant? Maybe a concrete example of how it would be used?

EDIT: (Disregard stack vs. heap; the main point concerns memory release or garbage collection. Instead of the int[], you could substitute 262144 public int fields.) Also, from my understanding, structs are created on the stack as opposed to the heap. Suppose this struct had a 1 MB int array field initialized like so:

public int[] Mb = new int[262144];

Sub-part 3. Does this ever get removed from the stack when Foo is called? To me it seems that, since the struct never went out of scope, it would not be removed from the stack. I don't have time tonight to create a test case, but maybe I will for this one tomorrow.

In the code below:

Teaser t1 = new Teaser();
Teaser tPlaceHolder = t1;
t1.Foo();

Sub-part 4. Are t1 and tPlaceHolder occupying the same or different address space?

Sorry to bring up a 3-year-old post, but this one has really got me scratching my head.

FYI, this is my first question on Stack Overflow, so if I got something wrong with the question, kindly post a comment and I will edit it.

After 2 days I'll put a bounty of 50 on this question, even if I already have a winner in mind, as I think the answer will require a reasonable amount of work to explain the questions.

Youmans answered 6/4, 2012 at 2:54 Comment(14)
4 questions in 1 are not appropriate for Stack Overflow. One question should be one question. (Examination)
@AnthonyPegram - In this case I think it's acceptable, since these questions are sub-questions toward the main question, which is how structs really work. (Lanti)
My apologies, I'm not sure how to reword it all into one. The main question, I suppose, can be summed up as: what are the side effects of allowing the this keyword to be modified? Questions 3-4 are side effects that I thought might be a possibility. If anyone besides me has the power to edit the question and thinks they can ask it better, please feel free to do so. Or leave a comment suggesting a better question and I'll start from scratch. (Youmans)
That is also why I was going to put a bounty on it regardless: I feel it deserves the extra points for being a multi-part question. (Youmans)
Be careful making assumptions about what goes on "the stack" vs. "the heap", as those assumptions are often wrong: blogs.msdn.com/b/ericlippert/archive/2009/04/27/… (Bogtrotter)
omg, I see the question police are out in force again. This is a perfectly legitimate and real question that has concrete and objective answers, even if it includes four questions. (Curium)
1. It's because this needs to be a ref parameter; otherwise you could never modify the current instance of a struct by passing it to a method that mutates it, because structs are passed by value. So we need to be able to pass the current instance by ref, and to do that, this has to be a ref parameter. In fact, the spec says it is. Once it's ref, you can assign to it. 2. No, that's not right. The reason I gave in 1. is the reason. (Curium)
@Jason I know; this was my first question as well and I really wanted an answer. I'm off to bed but will reword the question tomorrow morning so that it fits within the policies. (Youmans)
3. Your understanding of struct/stack vs. reference/heap is wrong. Short-lived objects go in short-term memory (stack, registers) and long-lived objects go in long-term memory (heap). Sometimes structs are short-lived, sometimes they are not (if they are fields of a class, for example). Just because a struct has a field that is long-lived does not mean that the struct itself is long-lived. Don't confuse scope and lifetime; that's an abuse of the terminology. "Scope" of a named variable means "the region of the program where I can refer to that variable by its simple name." (Curium)
4. Assignments create copies in C#. Therefore tPlaceHolder and t1 are different storage locations. (Curium)
@Jason yeah, the two other answers that got in before this was closed mentioned that. I'm still confused about whether, if you only have one instance t1 and you call t1.Foo, the original memory of t1 gets cleaned up, but I'll try to work that into my question tomorrow, carefully not asking more than one question :) (Youmans)
Oh, missed that question. Sorry. Since this is a ref to the storage for the current instance, the space is reused when you assign to this in an instance method. (Curium)
BTW, is it acceptable to use a template for my question giving the relevant details as a whole, and then ask a new Stack Overflow question for each of my sub-parts with the relevant detail for each? Or will the police flag that as a duplicate question? (Youmans)
Let us continue this discussion in chat. (Youmans)

First of all, I think you should start by examining whether you're even asking the right question. Perhaps we should be asking, "Why would C# not allow assignment to this in a struct?"

Assigning to the this keyword in a reference type is potentially dangerous: you would be overwriting the reference to the object whose method you are running, possibly even within the constructor that is initializing that reference. It's not clear what the behavior of that ought to be. To avoid having to figure that out, and since it is not generally useful, the spec (and therefore the compiler) does not allow it: inside a class, this is read-only.

Assigning to the this keyword in a value type, however, is well defined. Assignment of value types is a copy operation: the value of each field is copied from the right-hand side to the left-hand side (recursively, for fields that are themselves structs). This is a perfectly safe operation on a structure, even in a constructor, because the original storage for the structure is still present; you are just changing its data. It is exactly equivalent to manually setting each field in the struct. Why should the spec or compiler forbid an operation that is well defined and safe?

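This, by the way, answers one of your sub-questions. Value type assignment copies the value, not a reference: every field is copied, including the fields of nested structs. (Fields of reference type, such as an array, are copied as references, so it is not a deep copy of the objects they point to.) Given this code: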

Teaser t1 = new Teaser();
Teaser tPlaceHolder = t1;
t1.Foo();

You have allocated two copies of your Teaser structure and copied the values of the fields in the first into the fields in the second. Calling t1.Foo() afterwards overwrites only t1; tPlaceHolder keeps its own data. This is the nature of value types: two values that have identical fields are identical, just like two int variables that both contain 10 are identical, regardless of where they are "in memory".
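A minimal runnable sketch of that copy behavior. It reuses the question's Teaser struct but adds a hypothetical int field named Value (not in the original) so there is something to observe, plus a hypothetical CopyDemo helper that replays the three lines above:

```csharp
// Teaser from the question, with a hypothetical Value field added for illustration.
public struct Teaser
{
    public int Value;

    public void Foo()
    {
        // Overwrites the storage location that 'this' refers to (here: t1 only).
        this = new Teaser();
    }
}

// Hypothetical helper that replays the question's three lines.
public static class CopyDemo
{
    public static (int T1, int PlaceHolder) Run()
    {
        Teaser t1 = new Teaser { Value = 42 };
        Teaser tPlaceHolder = t1;  // assignment copies the value into a second variable
        t1.Foo();                  // resets t1's fields to their defaults
        return (t1.Value, tPlaceHolder.Value);
    }
}
```

Run() returns (0, 42): t1.Foo() overwrites only t1's storage, and tPlaceHolder still holds the copy made before the call.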

Also, this is important and worth repeating: be careful about making assumptions about what goes on "the stack" vs. "the heap". Value types end up on the heap all the time, depending on the context in which they are used (as fields of a class, array elements, boxed values, or captured variables). Short-lived (locally scoped) structs that are not closed over or otherwise lifted out of their scope are quite likely to be allocated on the stack. But that is an unimportant implementation detail that you should neither care about nor rely on. The key is that they are value types and behave as such.

As for how useful assignment to this really is: not very. Specific use cases have been mentioned already. You can use it to initialize a structure mostly with default values while specifying only a few fields. Since (before C# 11) you are required to assign every field before your constructor returns, this can save a lot of redundant code:

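public struct Foo
{
  // Fields etc. here, for example:
  private int a;
  private int b;
  private string c;

  public Foo(int a)
  {
    this = new Foo(); // sets every field to its default (a = 0, b = 0, c = null)
    this.a = a;       // then overwrite the one field we actually want to set
  }
}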
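To make the comment quoted in the question concrete, here is a sketch with two hypothetical structs (ViaAssignment and ViaChaining are illustrative names, not from the original) that reach the same state in the two ways the comment mentions: assigning this = new ...() inside the constructor, and chaining to the parameterless constructor with : this().

```csharp
// Hypothetical structs; each has several fields but only A is set explicitly.
public struct ViaAssignment
{
    public int A;
    public int B;
    public string C;

    public ViaAssignment(int a)
    {
        this = new ViaAssignment(); // zero every field (B = 0, C = null)...
        A = a;                      // ...then set the one we care about
    }
}

public struct ViaChaining
{
    public int A;
    public int B;
    public string C;

    // ': this()' runs the parameterless constructor first, which also zeroes every field.
    public ViaChaining(int a) : this()
    {
        A = a;
    }
}
```

Constructed with 7, both end up with A = 7, B = 0 and C = null; the chained form simply avoids spelling out the zeroing step in the constructor body.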

It can also be used to perform a quick swap operation:

public void SwapValues(ref MyStruct other)
{
  var temp = other;
  other = this;
  this = temp;
}
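A runnable sketch of that swap, assuming a hypothetical MyStruct with a single int field. The parameter has to be passed by ref: with a by-value parameter, other is only a local copy inside the method, so the caller's variable would never see the swapped value.

```csharp
// Hypothetical struct used only to illustrate swapping via 'this'.
public struct MyStruct
{
    public int Value;

    public MyStruct(int value) { Value = value; }

    // 'other' is passed by ref so that the caller's variable is actually updated.
    public void SwapValues(ref MyStruct other)
    {
        var temp = other;
        other = this;   // copy this instance into the caller's variable
        this = temp;    // overwrite this instance with the old 'other'
    }
}
```

After a.SwapValues(ref b), a holds b's old value and b holds a's old value.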

Beyond that, it's just an interesting side effect of the language and of the way structures and value types are implemented, one that you will most likely never need to know about.

Bogtrotter answered 6/4, 2012 at 4:9 Comment(2)
+1 for attempting before it was closed. I'll try to reword this question sometime tomorrow. (Youmans)
I decided to accept this as the answer rather than open a new question. The copy-by-value explanation is what finally made it sink in. In my mind I had always thought of "this" as a reference, and wasn't thinking of "this" as referring to a value type when dealing with structs. I don't think I've even used the this keyword in any of the structs I have written. (Youmans)

Having this assignable allows for 'advanced' corner cases with structs. One example I found was a swap method:

struct Foo 
{
    void Swap(ref Foo other)
    {
         Foo temp = this;
         this = other;
         other = temp;
    }
}

I would strongly argue against this use, since it violates what is usually considered the desired nature of a struct: immutability. The reason for having this option around is arguably unclear.

Now, when it comes to structs themselves, they differ from classes in a few ways:

  • They can live on the stack rather than the managed heap.
  • They can be marshaled back to unmanaged code.
  • They cannot be assigned null (unless wrapped in Nullable<T>).

For a complete overview, see: http://www.jaggersoft.com/pubs/StructsVsClasses.htm

Relevant to your question is whether your struct lives on the stack or the heap. This is determined by where the struct is allocated. If the struct is a member of a class, it will be allocated on the heap. Otherwise, if a struct is allocated directly (for example as a local variable), it will typically be allocated on the stack. (Actually this is only part of the picture. The whole thing gets more complex once closures, introduced in C# 2.0, come into play, but for now it's sufficient to answer your question.)

An array in .NET is by default allocated on the heap (this does not hold when using unsafe code and the stackalloc keyword). Going back to the explanation above, that means struct instances stored in an array are also allocated on the heap. In fact, an easy way of demonstrating this is to allocate a 1 MB array and observe that no StackOverflowException is thrown.
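To connect this to the 1 MB int array field from the question, here is a sketch with a hypothetical Holder struct whose only field is an int[]. The struct itself stores just a reference to the array, so copying the struct copies that reference, and both copies see the same heap-allocated array:

```csharp
// Hypothetical struct whose only field is a reference to a heap-allocated array.
public struct Holder
{
    public int[] Data;

    public Holder(int length) { Data = new int[length]; }
}

public static class ArrayFieldDemo
{
    public static (bool SameArray, int SeenThroughOriginal) Run()
    {
        var original = new Holder(262144); // 262144 ints = 1 MB of array data on the heap
        var copy = original;               // copies the struct, i.e. only the array reference
        copy.Data[0] = 99;                 // writes through the shared reference
        return (object.ReferenceEquals(original.Data, copy.Data), original.Data[0]);
    }
}
```

Run() returns (true, 99): the write made through the copy is visible through the original, because there is only one array.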

The lifetime of an instance on the stack is determined by its scope. This is different from an instance on the managed heap, whose lifetime is determined by the garbage collector (and whether there are still references to that instance). You can rely on anything on the stack living as long as it's within scope. Allocating an instance on the stack and calling a method won't deallocate that instance until it goes out of scope (by default, when the method in which the instance was declared ends).

A struct can't have managed references pointing to it (pointers are possible in unsafe code). When working with structs on the stack in C#, you basically have a label for an instance rather than a reference. Assigning one struct to another simply copies the underlying data. You can think of references as structs: naively put, a reference is nothing more than a struct containing a pointer to a certain part of memory. When assigning one reference to another, the pointer data gets copied.

// declare 2 references to instances on the managed heap
var c1 = new MyClass();
var c2 = new MyClass();

// declare 2 labels to instances on the stack
var s1 = new MyStruct();
var s2 = new MyStruct();

c1 = c2; // copies the reference data, which is internally a pointer; c1 and c2 now point to the same instance
s1 = s2; // copies the data, which is the struct itself; s1 and s2 each hold their own instance with the same data
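A self-contained version of the snippet above, using hypothetical SampleClass and SampleStruct types (stand-ins for MyClass and MyStruct, each with one int field) so the difference can actually be observed:

```csharp
// Hypothetical types with one field each, used to contrast reference and value assignment.
public class SampleClass { public int X; }
public struct SampleStruct { public int X; }

public static class AssignmentDemo
{
    public static (int C1, int S1) Run()
    {
        var c1 = new SampleClass();
        var c2 = new SampleClass();
        var s1 = new SampleStruct();
        var s2 = new SampleStruct();

        c1 = c2;   // c1 and c2 now refer to the same object
        s1 = s2;   // s1 receives a copy of s2's data

        c2.X = 5;  // visible through c1 as well
        s2.X = 5;  // s1 keeps its own copy, so s1.X stays 0

        return (c1.X, s1.X);
    }
}
```

Run() returns (5, 0): the change made through c2 is visible through c1, while s1 kept its own copy.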
Lanti answered 6/4, 2012 at 3:41 Comment(5)
I guess, regardless of whether it's copied to the stack or to the heap, when calling Foo will the GC eventually clean up the memory associated with the original struct? Just to clarify, I am speaking strictly of managed code. Oh, and by Foo I meant the Foo in the Teaser struct, not your Foo struct. (Youmans)
The GC will only clean up instances allocated on the managed heap. Stack-based instances are not really cleaned up (hence a struct can't contain a destructor). Their label simply becomes invalid once out of scope. See Eric Lippert's answer in this post: #6441718 (Lanti)
Structs are value types; they are never explicitly garbage collected. The memory for a value type is reclaimed when the value type leaves scope: if it's a local variable or parameter, this happens when the function returns. If it's a field, it happens when its containing object is GC'd. Etc. (Bogtrotter)
Refer to my edit about replacing the int[] with N public int fields. When calling t1.Foo, does the memory for those ints ever get cleaned up / deallocated / freed from the stack? (Youmans)
+1 for attempting before it was closed. I'll try to reword this question sometime tomorrow. (Youmans)

You can take advantage of this to mutate an "immutable" structure:

using System.Diagnostics;

public struct ImmutableData
{
    private readonly int data;
    private readonly string name;

    public ImmutableData(int data, string name)
    {
        this.data = data;
        this.name = name;
    }

    public int Data { get => data; }
    public string Name { get => name; }

    public void SetName(string newName)
    {
        // this won't compile (CS0191: a readonly field can only be assigned in a constructor)
        // this.name = newName;

        // but this will
        this = new ImmutableData(this.data, newName);
    }

    public override string ToString() => $"Data={data}, Name={name}";
}

class Program
{
    static void Main(string[] args)
    {
        var X = new ImmutableData(100, "Jane");
        X.SetName("Anne");

        Debug.WriteLine(X);
        // "Data=100, Name=Anne"
    }
}
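One caveat, sketched below: SetName does not change the value everywhere, it overwrites one storage location. A copy taken before the call keeps the old name, and when the struct sits in a readonly field the compiler invokes SetName on a temporary defensive copy, so the field does not change at all. The code repeats a trimmed ImmutableData and adds a hypothetical Owner class for illustration.

```csharp
// Trimmed copy of ImmutableData from the answer above.
public struct ImmutableData
{
    private readonly int data;
    private readonly string name;

    public ImmutableData(int data, string name) { this.data = data; this.name = name; }

    public string Name => name;

    // Replaces the whole instance; only the storage location it is called on changes.
    public void SetName(string newName) => this = new ImmutableData(data, newName);
}

// Hypothetical class holding the struct in a readonly field.
public class Owner
{
    public readonly ImmutableData Field = new ImmutableData(1, "Jane");

    // Field is readonly, so the compiler calls SetName on a temporary copy of it.
    public void Rename() => Field.SetName("Anne");
}
```

So the "mutation" is really an overwrite of one variable; copies and readonly fields keep the value they had.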

This is useful because you can implement IXmlSerializable and keep the robustness of an immutable structure while still allowing serialization (which happens one property at a time).

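Only two methods need real work to achieve this in the above example (the struct must also be declared as public struct ImmutableData : IXmlSerializable, and the interface's third member, GetSchema, can simply return null):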

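    public void ReadXml(XmlReader reader)
    {
        var data = int.Parse(reader.GetAttribute("Data"));
        var name = reader.GetAttribute("Name");
        reader.Skip(); // consume the whole element, as IXmlSerializable.ReadXml is expected to

        // Replace the whole (otherwise immutable) instance in one assignment.
        this = new ImmutableData(data, name);
    }
    public void WriteXml(XmlWriter writer)
    {
        writer.WriteAttributeString("Data", data.ToString());
        writer.WriteAttributeString("Name", name);
    }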

which creates the following XML file:

<?xml version="1.0" encoding="utf-8"?>
<ImmutableData Data="100" Name="Anne" />

and it can be read back with:

        var xs = new XmlSerializer(typeof(ImmutableData));
        ImmutableData Y;
        using (var fs = File.OpenText("Store.xml"))
        {
            Y = (ImmutableData)xs.Deserialize(fs);
        }
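For a self-contained round trip, the sketch below puts the pieces together and serializes to an in-memory string instead of Store.xml (the XmlRoundTrip helper is hypothetical). It assumes, as the answer describes, that XmlSerializer accepts the struct through IXmlSerializable; GetSchema returns null as the interface documentation recommends, and ReadXml calls reader.Skip() so the whole element is consumed.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// ImmutableData from the answer, declared as implementing IXmlSerializable.
public struct ImmutableData : IXmlSerializable
{
    private readonly int data;
    private readonly string name;

    public ImmutableData(int data, string name) { this.data = data; this.name = name; }

    public int Data => data;
    public string Name => name;

    // The IXmlSerializable documentation says to return null here.
    public XmlSchema GetSchema() => null;

    public void ReadXml(XmlReader reader)
    {
        var data = int.Parse(reader.GetAttribute("Data"));
        var name = reader.GetAttribute("Name");
        reader.Skip();                        // move past the whole element
        this = new ImmutableData(data, name); // replace every field at once
    }

    public void WriteXml(XmlWriter writer)
    {
        writer.WriteAttributeString("Data", data.ToString());
        writer.WriteAttributeString("Name", name);
    }
}

// Hypothetical helper: serialize to a string and read it back.
public static class XmlRoundTrip
{
    public static ImmutableData Run(ImmutableData value)
    {
        var xs = new XmlSerializer(typeof(ImmutableData));
        var sw = new StringWriter();
        xs.Serialize(sw, value);
        using var sr = new StringReader(sw.ToString());
        return (ImmutableData)xs.Deserialize(sr);
    }
}
```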
Chaves answered 19/12, 2019 at 21:16 Comment(0)

I came across this when I was looking up how System.Guid was implemented, because I had a similar scenario.

Basically, it does this (simplified):

struct Guid
{
    Guid(string value)
    {
        this = Parse(value);
    }
}

Which I think is a pretty neat solution.
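The same pattern on a hypothetical Point2D struct (the type and its "x,y" text format are assumptions for illustration, not how System.Guid actually parses): the string constructor hands all the parsing to a static Parse method and assigns the result to this, so the parsing logic lives in one place.

```csharp
using System;
using System.Globalization;

// Hypothetical struct illustrating a constructor that delegates to Parse via 'this ='.
public struct Point2D
{
    public int X;
    public int Y;

    public Point2D(int x, int y) { X = x; Y = y; }

    // Delegate all parsing to Parse, then copy the result into this instance.
    public Point2D(string value)
    {
        this = Parse(value);
    }

    // Accepts text such as "3,4"; throws FormatException on malformed input.
    public static Point2D Parse(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        var parts = value.Split(',');
        if (parts.Length != 2) throw new FormatException("Expected \"x,y\".");
        return new Point2D(
            int.Parse(parts[0].Trim(), CultureInfo.InvariantCulture),
            int.Parse(parts[1].Trim(), CultureInfo.InvariantCulture));
    }
}
```

new Point2D("3, 4") and Point2D.Parse("3, 4") produce the same value, because the constructor is just a thin wrapper around Parse.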

Jesselyn answered 28/6, 2022 at 10:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.