Unpack multiple variables from sequence

I am expecting the code below to print chr7.

import strutils

var splitLine = "chr7    127471196  127472363  Pos1  0  +".split()
var chrom, startPos, endPos = splitLine[0..2]
echo chrom

Instead it prints @[chr7, 127471196, 127472363].

Is there a way to unpack multiple values from sequences at the same time?

And what would the tersest way to do the above be if the elements weren't contiguous? For example:

var chrom, startPos, strand = splitLine[0..1, 5]

Gives the error:

read_bed.nim(8, 40) Error: type mismatch: got (seq[string], Slice[system.int], int literal(5))
but expected one of:
system.[](a: array[Idx, T], x: Slice[system.int])
system.[](s: string, x: Slice[system.int])
system.[](a: array[Idx, T], x: Slice[[].Idx])
system.[](s: seq[T], x: Slice[system.int])

  var chrom, startPos, strand = splitLine[0..1, 5]
                                         ^
Eddra asked 11/8, 2015 at 17:22

This can be accomplished using macros.

import macros

macro `..=`*(lhs: untyped, rhs: tuple|seq|array): auto =
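  # Check that the lhs is a tuple of identifiers. Newer Nim versions
  # parse (x, y) as nnkTupleConstr rather than nnkPar, so accept both.
  expectKind(lhs, {nnkPar, nnkTupleConstr})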
  for i in 0..len(lhs)-1:
    expectKind(lhs[i], nnkIdent)
  # Result is a statement list starting with an
  # assignment to a tmp variable of rhs.
  let t = genSym()
  result = newStmtList(quote do:
    let `t` = `rhs`)
  # assign each component to the corresponding
  # variable.
  for i in 0..len(lhs)-1:
    let v = lhs[i]
    # skip assignments to _.
    if not v.eqIdent("_"):
      result.add(quote do:
        `v` = `t`[`i`])

macro headAux(count: int, rhs: seq|array|tuple): auto =
  let t = genSym()
  result = quote do:
    let `t` = `rhs`
    ()
  for i in 0..count.intVal-1:
    result[1].add(quote do:
      `t`[`i`])

template head*(count: static[int], rhs: untyped): auto =
  # We need to redirect this through a template because
  # of a bug in the current Nim compiler when using
  # static[int] with macros.
  headAux(count, rhs)

var x, y: int
(x, y) ..= (1, 2)
echo x, y
(x, _) ..= (3, 4)
echo x, y
(x, y) ..= @[4, 5, 6]
echo x, y
let z = head(2, @[4, 5, 6])
echo z
(x, y) ..= head(2, @[7, 8, 9])
echo x, y

The ..= macro unpacks tuple or sequence assignments. You can accomplish the same with var (x, y) = (1, 2), for example, but ..= works for seqs and arrays, too, and allows you to reuse variables.
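For comparison, here is a minimal sketch of the built-in form mentioned above (plain Nim, no macros). A var/let tuple declaration introduces new variables and needs a tuple on the right-hand side, so the wanted elements of a seq have to be wrapped in a tuple by hand:

```nim
# Built-in tuple unpacking in a declaration introduces new variables.
var (x, y) = (1, 2)
echo x, " ", y   # 1 2

# A seq is not a tuple, so the wanted elements are wrapped in a tuple first.
let s = @[4, 5, 6]
let (a, b) = (s[0], s[1])
echo a, " ", b   # 4 5
```

The ..= macro saves exactly this wrapping step: it accepts the seq or array directly and assigns to variables that already exist.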

The head template/macro extracts the first count elements from a tuple, array, or seq and returns them as a tuple (which can then be used like any other tuple, e.g. for destructuring with let or var).

Keloid answered 12/8, 2015 at 21:29
Paradiddle: Very nice solution. This issue with static[int] and macros was exactly why I failed to come up with other solutions. Good to see a workaround for that.
Scolopendrid: In at least v2.0.0 of Nim you need to use nnkTupleConstr instead of nnkPar, but everything else works.

For anyone that's looking for a quick solution, here's a nimble package I wrote called unpack.

You can do sequence and object destructuring/unpacking with syntax like this:

someSeqOrTupleOrArray.lunpack(a, b, c)
[a2, b2, c2] <- someSeqOrTupleOrArray

{name, job} <- tim

tom.lunpack(job, otherName = name)
{job, name: yetAnotherName} <- john
Conjecture answered 5/12, 2018 at 6:30

Currently, pattern matching in Nim only works with tuples. This makes sense, because pattern matching requires a statically known arity. For instance, what should happen in your example if the seq does not have a length of three? The length of a seq can only be determined at runtime, so the compiler cannot know whether it is actually possible to extract three variables.

Therefore I think the solution linked by @def- was going in the right direction. That solution uses arrays, which do have a statically known size, so the compiler knows the tuple arity, i.e., the extraction is well defined.
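A minimal sketch of why a statically known size helps (the helper name toTuple3 is hypothetical and not taken from the linked solution): a proc that only accepts an array of exactly three elements can always return a 3-tuple, and passing an array of any other size is rejected at compile time instead of failing at runtime.

```nim
# Hypothetical helper: only arrays of exactly three elements type-check,
# so the arity of the resulting tuple is guaranteed at compile time.
proc toTuple3[T](a: array[3, T]): (T, T, T) =
  (a[0], a[1], a[2])

let fields = ["chr7", "127471196", "127472363"]
let (chrom, startPos, endPos) = toTuple3(fields)
echo chrom     # chr7
echo endPos    # 127472363

# toTuple3(["chr7", "127471196"])  # would not compile: array[2, string] is not array[3, T]
```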

If you want an alternative (maybe convenient but unsafe) approach you could do something like this:

import macros

macro extract(args: varargs[untyped]): typed =
  ## assumes that the first expression is an expression
  ## which can take a bracket expression. Let's call it
  ## `arr`. The generated AST will then correspond to:
  ##
  ## let <second_arg> = arr[0]
  ## let <third_arg>  = arr[1]
  ## ...
  result = newStmtList()
  # the first vararg is the "array"
  let arr = args[0]
  var i = 0
  # all other varargs are now used as "injected" let bindings
  for arg in args.children:
    if i > 0:
      var rhs = newNimNode(nnkBracketExpr)
      rhs.add(arr)
      rhs.add(newIntLitNode(i-1))

      let assign = newLetStmt(arg, rhs) # could be replaced by newVarStmt
      result.add(assign)
    i += 1
  #echo result.treerepr


let s = @["X", "Y", "Z"]

s.extract(a, b, c)
# this essentially produces:
# let a = s[0]
# let b = s[1]
# let c = s[2]

# check if it works:
echo a, b, c

I have not included a check for the seq length yet, so you would simply get an out-of-bounds error if the seq does not have the required length. Another warning: if the first expression is not a simple variable (for example, a proc call), it is evaluated several times, once per extracted variable.
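Both caveats could be addressed by binding the first expression to a hidden temporary exactly once and checking its length before indexing. The following is only a sketch of such a variant; the name extractChecked is hypothetical, and the length check uses doAssert, so a too-short seq fails with a clear assertion message instead of an index error:

```nim
import std/macros

# Hypothetical variant of extract: evaluates `arr` once and checks its length.
macro extractChecked(arr: untyped, names: varargs[untyped]): untyped =
  let tmp = genSym(nskLet, "tmp")
  let n = names.len
  result = newStmtList()
  # let tmp = <arr>; doAssert tmp.len >= n
  result.add quote do:
    let `tmp` = `arr`
    doAssert `tmp`.len >= `n`, "not enough elements to unpack"
  for i in 0 ..< n:
    # let <name> = tmp[i]
    result.add newLetStmt(names[i], nnkBracketExpr.newTree(tmp, newLit(i)))

var calls = 0
proc makeSeq(): seq[string] =
  inc calls
  @["X", "Y", "Z"]

makeSeq().extractChecked(a, b, c)
echo a, b, c    # XYZ
echo calls      # 1
```

Because the call result is stored in the temporary, makeSeq runs only once even though three variables are extracted.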

Note that the _ literal is allowed in let bindings as a placeholder, which means that you could do things like this:

s.extract(a, b, _, _, _, x)

This would address your splitLine[0..1, 5] example, which, by the way, is not valid indexing syntax.
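Without any macro, the same non-contiguous selection can be written by indexing each position explicitly and wrapping the results in a tuple. This sketch assumes splitWhitespace rather than the question's split(), because split() produces empty fields for runs of spaces:

```nim
import std/strutils

let splitLine = "chr7    127471196  127472363  Pos1  0  +".splitWhitespace()
# Pick positions 0, 1 and 5, wrap them in a tuple, and unpack it in one let.
let (chrom, startPos, strand) = (splitLine[0], splitLine[1], splitLine[5])
echo chrom     # chr7
echo startPos  # 127471196
echo strand    # +
```

As with extract, there is no length check here: a line with fewer than six fields raises an index error at runtime.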

Paradiddle answered 12/8, 2015 at 11:05
Eddra: Okay, thanks. Writing the extract for sequences of length a billion does not sound like fun ;) I wonder if something like Python's wonderful itemgetter is possible in Nim...
Paradiddle: That's basically the point here: if you have a billion fields, classic pattern matching doesn't make much sense, because you do not want a billion variables in your code anyway (and binding variables is what pattern matching is for). In such cases you will most likely select specific fields anyway, say indices 3, 17, and 12382, and that's what random-access indexing is for. Note that you could extend the idea to write something like s.extract(a, 3, b, 17, c, 12382), but you won't gain much over direct indexing, e.g. let (a, b, c) = (s[3], s[17], s[12382]).
Eddra: Ah. Perhaps the a, b, c = (s[3]... ) example should introduce your reply? If that is indeed idiomatic Nim... Perhaps also something about whether a, b, c = (s[3..4], s[17]) would work...

Yet another option is the definesugar package:

import strutils, definesugar

# need to use splitWhitespace instead of split to prevent empty string elements in sequence
var splitLine = "chr7    127471196  127472363  Pos1  0  +".splitWhitespace()
echo splitLine

block:
  (chrom, startPos, endPos) := splitLine[0..2]
  echo chrom # chr7
  echo startPos # 127471196
  echo endPos # 127472363

block:
  (chrom, startPos, strand) := splitLine[0..1] & splitLine[5] # splitLine[0..1, 5] not supported
  echo chrom
  echo startPos
  echo strand # +

# alternative syntax
block:
  (chrom, startPos, *_, strand) := splitLine
  echo chrom
  echo startPos
  echo strand

See https://forum.nim-lang.org/t/7072 for a recent discussion.
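The comment in the code above about splitWhitespace versus split can be checked directly. A small sketch comparing the two std/strutils procs on a shortened version of the question's line:

```nim
import std/strutils

let line = "chr7    127471196  127472363"
# split() with its default separator set (Whitespace) splits at every single
# whitespace character, so runs of spaces yield empty strings.
let withEmpties = line.split()
# splitWhitespace() treats each run of whitespace as one separator.
let fields = line.splitWhitespace()
echo withEmpties  # @["chr7", "", "", "", "127471196", "", "127472363"]
echo fields       # @["chr7", "127471196", "127472363"]
```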

Eggshaped answered 12/11, 2020 at 9:06
import std/strscans
var line = "chr7    127471196  127472363  Pos1  0  +"
var (_, chrom, startPos, endPos) = line.scanTuple("$w$s$i$s$i")
echo chrom # chr7

The _ receives the success flag (a bool) that indicates whether the scan succeeded; it is discarded in this demo.

Pattern markers: $w matches a word (an ASCII identifier), $s skips optional whitespace, and $i matches an integer.
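If the success flag should not be discarded, it can be bound to a name and checked before the values are used. A sketch with the same pattern as above; since $i yields int values, the coordinates can be used in arithmetic directly:

```nim
import std/strscans

let line = "chr7    127471196  127472363  Pos1  0  +"
# Bind the success flag instead of discarding it.
let (ok, chrom, startPos, endPos) = line.scanTuple("$w$s$i$s$i")
if ok:
  # $i produces int values, so the coordinates support arithmetic directly.
  echo chrom, " ", endPos - startPos   # chr7 1167
else:
  echo "line did not match the pattern"
```

Checking ok avoids processing a non-matching line with whatever values the scan left behind.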

Debouch answered 22/9 at 20:46
