What exactly are strings in Nim?
From what I understand, strings in Nim are basically mutable sequences of bytes that are copied on assignment.

Given that, I assumed that sizeof would tell me (like len) the number of bytes, but instead it always gives 8 on my 64-bit machine, so it seems to be holding a pointer.

Given that, I have the following questions...

  • What was the motivation behind copy on assignment? Is it because they're mutable?

  • Is there ever a time when it isn't copied when assigned? (I assume non-var function parameters don't copy. Anything else?)

  • Are they optimized such that they only actually get copied if/when they're mutated?

  • Is there any significant difference between a string and a sequence, or can the answers to the above questions be equally applied to all sequences?

  • Anything else in general worth noting?

Thank you!

Psychoactive answered 1/4, 2015 at 20:15 Comment(4)
for completeness, the nim string definition lives on github which may help explain why it reports what it reports. - Grajeda
@Mike'Pomax'Kamermans: Thanks for the link, but I don't know what string* {.magic: String.} is trying to tell me. ;-) - Psychoactive
yeah, not too sure myself either, just figured it might be good info to have in the question as a possible lead for people better at discovering what's up than us =) - Grajeda
Ah, I see what you're getting at. :-) - Psychoactive

The definition of strings is actually in system.nim, just under another name:

type
  TGenericSeq {.compilerproc, pure, inheritable.} = object
    len, reserved: int
  PGenericSeq {.exportc.} = ptr TGenericSeq
  UncheckedCharArray {.unchecked.} = array[0..ArrayDummySize, char]
  # len and space without counting the terminating zero:
  NimStringDesc {.compilerproc, final.} = object of TGenericSeq
    data: UncheckedCharArray
  NimString = ptr NimStringDesc

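So a string is a raw pointer to an object with len, reserved, and data fields, which is why sizeof reports the size of a pointer (8 on a 64-bit machine) rather than the number of bytes in the string. The procs for strings are defined in sysstr.nim.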
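
As a small illustration of the difference between sizeof and len, here is a minimal sketch. The variable names are made up for the example, and the exact value sizeof returns is an assumption that depends on the platform, compiler version, and memory manager, so the code only checks that it does not vary with the string's contents:

```nim
# Two strings of very different lengths (names are illustrative only).
let brief = "a"
let lengthy = "a much longer string than the first one"

# len reports the number of bytes stored in the string.
echo brief.len
echo lengthy.len

# sizeof reports the size of the string handle itself,
# so it is the same for both strings regardless of their contents.
echo sizeof(brief)
echo sizeof(lengthy)

doAssert brief.len != lengthy.len
doAssert sizeof(brief) == sizeof(lengthy)
```

Use len whenever the amount of data in the string is what matters.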

String assignment follows the same semantics that Nim uses by default for all value types (anything that is not a ref or ptr), so you can assume that an assignment creates a copy. When a copy is unnecessary, the compiler can leave it out, but I'm not sure how often that happens so far. Passing strings into a proc doesn't copy them. There is no copy-on-write optimization: the copy is made at assignment, not deferred until the string is mutated. Sequences behave in the same way.
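
The value semantics described above can be sketched as follows. The variable names and the byteCount proc are hypothetical and exist only for this example:

```nim
# Strings: assignment produces an independent copy.
var a = "foo"
var b = a
b.add("bar")            # mutating b leaves a untouched
doAssert a == "foo"
doAssert b == "foobar"

# Sequences follow the same rule.
var xs = @[1, 2, 3]
var ys = xs
ys.add(4)
doAssert xs == @[1, 2, 3]
doAssert ys == @[1, 2, 3, 4]

# A non-var parameter is read-only inside the proc, so the
# caller's string can be passed without making a copy.
proc byteCount(s: string): int =
  s.len

doAssert byteCount(a) == 3
```

Because the two variables never alias each other after the assignment, a later change to one is never visible through the other.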

You can change the default assignment behaviour of strings and seqs by marking them as shallow. After that, no copy is made on assignment:

var s = "foo"
shallow s
No answered 2/4, 2015 at 11:11 Comment(2)
Thanks def-. That's very helpful and it seems shallow will be useful. - Psychoactive
Besides shallow, which enables the "copy as pointer" behavior for all future copies, there is also a shallowCopy operation that can be used only in specific places. Under the hood, strings managed this way will share the same ref-counted memory allocation. - Yellowish
