How to prove "copy-on-write" on String type in Swift

As the title says, I tried to prove to myself that COW (copy-on-write) is supported for String in Swift, but I could not find a proof. I did prove COW for Array and Dictionary with the following code:

import Foundation  // needed for String(format:)

func address(of object: UnsafeRawPointer) -> String {
    let addr = Int(bitPattern: object)
    return String(format: "%p", addr)
}

var xArray = [20, 30, 40, 50, 60]
var yArray = xArray

// These two addresses were the same
address(of: xArray) 
address(of: yArray)

yArray[0] = 200
// The address of yArray got changed
address(of: yArray)

But for the String type, this did not work.

var xString = "Hello World"
var yString = xString

// These two addresses were different
address(of: xString)
address(of: yString)

I also copied this test helper from the official Swift code repo.

func _rawIdentifier(s: String) -> (UInt, UInt) {
    let triple = unsafeBitCast(s, to: (UInt, UInt, UInt).self)
    let minusCount = (triple.0, triple.2)
    return minusCount
}

But this function seems to cast only the actual value pointed to, not the address. So two different String variables with the same value have the same rawIdentifier, which still does not prove COW to me.

var xString = "Hello World"
var yString = "Hello" + " World" 

// These two rawIdentifiers were the same
_rawIdentifier(s: xString)
_rawIdentifier(s: yString)

So how does COW work on String type in Swift?

Garderobe answered 14/10, 2017 at 17:26 Comment(7)
You could just look at the source code: github.com/apple/swift/blob/master/stdlib/public/core/… – Commonage
Apparently the compiler recognizes that "Hello World" and "Hello" + " World" are the same string literal, and creates only a single storage for them. Try the same with different strings. – Spermophyte
The compiler does constant folding even in unoptimised builds (it's a so-called "guaranteed optimisation"), so as Martin says, "Hello" + " World" is folded into "Hello World". You could also do var yString = "Hello"; yString += " World" to notice a difference (the buffer will also gain an owner in that case, as it's now dynamically, rather than statically, allocated). – Violante
A word of caution about _rawIdentifier: Using unsafeBitCast bypasses retain count operations, so it is fully possible for you to be looking at dangling pointers (i.e. the string buffer gets released while you're still looking at the string value); attempting to dereference those would then be undefined behaviour. In the code you took it from, the callers used _fixLifetime in order to guarantee this couldn't happen. You can use withExtendedLifetime(_:_:) to ensure this too (or use an UnsafeMutablePointer and rebind the memory). – Violante
@Hamish: But _rawIdentifier() does not dereference the pointer, or does it? – Spermophyte
@MartinR It doesn't; but OP should be careful what he does with the result (i.e. don't use the values as bit patterns for UnsafePointers and then try to inspect the pointees). It was more of a pre-emptive warning :) – Violante
@Commonage It's a deep blue sea; could you please point out which code to look at? – Rust

The compiler creates only a single storage for both "Hello World" and "Hello" + " World".

You can verify that for example by examining the assembly code obtained from

swiftc -emit-assembly cow.swift

which defines only a single string literal

    .section    __TEXT,__cstring,cstring_literals
L___unnamed_1:
    .asciz  "Hello World"

As soon as the string is mutated, the address of the string storage buffer (the first member of that "magic" tuple, actually _baseAddress of struct _StringCore, defined in StringCore.swift) changes:

var xString = "Hello World"
var yString = "Hello" + " World"

print(_rawIdentifier(s: xString)) // (4300325536, 0)
print(_rawIdentifier(s: yString)) // (4300325536, 0)

yString.append("!")
print(_rawIdentifier(s: yString)) // (4322384560, 4322384528)
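The tuple cast above depends on String's in-memory layout, which is an implementation detail and has changed between Swift releases. As a rough sketch that avoids the bit cast, and assuming Swift 5.1 or later, the same experiment can be tried by comparing the base address of the contiguous UTF-8 storage. The helper name storageAddress(of:) is made up for this illustration, and the test string is built at run time and kept longer than 15 bytes on the assumption that it then lives in a heap buffer rather than inline or in static data.

```swift
// Hypothetical helper: address of the string's contiguous UTF-8 storage,
// or nil if the string has no such storage (e.g. some bridged strings).
func storageAddress(of s: String) -> UInt? {
    // The integer is only compared, never turned back into a pointer.
    return s.utf8.withContiguousStorageIfAvailable {
        UInt(bitPattern: $0.baseAddress)
    }
}

// Assumption: a run-time string longer than 15 UTF-8 bytes uses a heap buffer.
let xString = String(repeating: "Hello World ", count: 4)
var yString = xString

let before = (storageAddress(of: xString), storageAddress(of: yString))
let sharedBeforeWrite = before.0 != nil && before.0 == before.1

yString.append("!")   // first write to the copy

let after = (storageAddress(of: xString), storageAddress(of: yString))
let xUntouched = after.0 == before.0
let yMoved = after.1 != nil && after.1 != before.1

print(sharedBeforeWrite, xUntouched, yMoved)
```

If all three flags come out true, the copy shared the original's buffer until the first write and only then received its own, which is the copy-on-write behaviour in question. xString is read again after the append, so its buffer stays alive for the comparison.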

And why does your

func address(of object: UnsafeRawPointer) -> String

function show the same values for xArray and yArray, but not for xString and yString?

Passing an array to a function taking an unsafe pointer passes the address of the first array element, which is the same for both arrays if they share the storage.

Passing a string to a function taking an unsafe pointer passes a pointer to a temporary UTF-8 representation of the string. That address can be different in each call, even for the same string.

This behavior is documented in the "Using Swift with Cocoa and Objective-C" reference for UnsafePointer<T> arguments, but apparently works the same for UnsafeRawPointer arguments.
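The difference between the two cases can be sketched without relying on the implicit pointer conversion for arrays. The following is only an illustration: elementAddress(of:) and roundTrip(_:) are names invented here, and withUnsafeBufferPointer is used to ask for the element storage explicitly. The string part only shows that the callee receives a NUL-terminated UTF-8 buffer; it makes no claim about where that buffer lives.

```swift
// Hypothetical helper: address of the first element's storage.
func elementAddress(of array: [Int]) -> UInt {
    // The integer is only compared, never dereferenced.
    return array.withUnsafeBufferPointer { UInt(bitPattern: $0.baseAddress) }
}

let xArray = [20, 30, 40, 50, 60]
var yArray = xArray

// Both arrays still refer to one shared buffer.
let sharedBefore = elementAddress(of: xArray) == elementAddress(of: yArray)

yArray[0] = 200   // first write: yArray gets its own buffer

let sharedAfter = elementAddress(of: xArray) == elementAddress(of: yArray)
print(sharedBefore, sharedAfter)

// A string passed to a pointer parameter arrives as C-string bytes that are
// only valid for the duration of the call, so read them inside the callee.
func roundTrip(_ p: UnsafePointer<CChar>) -> String {
    return String(cString: p)
}
let viaPointer = roundTrip("Hello World")
print(viaPointer)
```

Comparing addresses as integers keeps the pointers from escaping their closures, which sidesteps the dangling-pointer concern raised in the comments on the question.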

Spermophyte answered 14/10, 2017 at 18:18 Comment(2)
For anyone who's interested in the second number in the tuple, and why it changes on appending: that's a pointer to the string buffer's owner. The buffer has an owner when it's dynamically allocated (but not when statically allocated, as is the case with a string literal). The owner is just a class instance that takes care of the reference counting for the buffer. When the owner is deallocated, the buffer is released. – Violante
I believe under the hood for native storage, a single ManagedBuffer instance is used for a dynamic string buffer; so the raw buffer pointer (first element of tuple) just points to the start of the body of that instance, and the owner points to the start of the header. – Violante
