How do you use ranges in D?
Asked Answered
M

4

7

Whenever I try to use ranges in D, I fail miserably.

What is the proper way to use ranges in D? (See inline comments for my confusion.)

void print(R)(/* ref? auto ref? neither? */ R r)
{
    foreach (x; r)
    {
        writeln(x);
    }

    // Million $$$ question:
    //
    // Will I get back the same things as last time?
    // Do I have to check for this every time?

    foreach (x; r)
    {
        writeln(x);
    }
}

void test2(alias F, R)(/* ref/auto ref? */ R items)
{
    // Will it consume items?
    // _Should_ it consume items?
    // Will the caller be affected? How do I know?
    // Am I supposed to?
    F(items);
}
Mistook answered 25/6, 2012 at 13:21 Comment(15)
The range is an interface for a collection, like an iterator in other languages. While you can always iterate over an iterator (or a range) you cannot effectively make broad generalizations about other behavior. My understanding is that the range itself is always a reference type but the underlying collection could be anything that supports the interface, so if you are going to modify the elements of the range, you should use ref and you should probably use a constraint to ensure that the underlying collection is writable. If you aren't going to modify the elements, then you don't need ref.Interrelate
Sorry, I should clarify: if you are going to modify the elements of the range, you should use ref in the foreach loop (in case the elements of the collection are value types), I don't think you would ever need a ref qualifier for the function, as the range should always be a reference, but I could be overlooking something.Interrelate
@Tim: I think you're misunderstanding the question (or I'm not understanding what you mean). I'm talking about modifying the range, not the items. (Like, think: LockingTextReader(stdin) as opposed to an array. One of them is single-pass, the other is multi-pass.)Mistook
Ahh, sorry, you're right I did misunderstand. Inside the function the range should be treated as an interface (a reference type) even though your template will sometimes allow special cases (arrays, or struct collections). In most cases, I would think this reference will have been purpose built to be passed into a function, so it doesn't really make sense to try and change it that way. Consider, if I call myFunc(myArray[10..20]) there is no qualifier that would make it acceptable for the function to reassign the range on my end, so ref makes no sense. Even less so for non-array collections.Interrelate
@Tim: "even though your template will sometimes allow special cases"... How will it?! :O I think that's the question. :-) If I don't know what's being passed to me, how do I know if it's one of those "special" cases (slices, Arrays, proxies for these, etc.) that I have to treat differently, after, say, a foreach?Mistook
Well, you don't have to treat them differently, the purpose of the range is to allow you to treat them all the same so long as you restrict yourself to the range's defined operations.Interrelate
@Tim: "Defined"? foreach is defined in either case... it just behaves differently in each case!Mistook
In your test2 above, for example, because you are placing no restrictions on the type R, you cannot guarantee that F will not modify its state. If items is a forward only file buffer, it may well be empty when F returns. If that's the case though, items will still be in a valid (empty) state in the caller test2 because the range will be a reference. If items is a dynamic array, F might consume its range, without affecting test2.Interrelate
@Tim: Now you're just repeating the problem back to me. :) The question is, what's the solution?Mistook
It isn't the foreach that is behaving differently, it's just whether the underlying collection is shared between calls or unique, which is, not being a defined part of the range interface, not defined without some additional contract. What I've been trying to say is that you need some additional contract if you expect to extract anything more definite from a range. The interface isn't that specific.Interrelate
For instance, you could use a const qualifier if you wanted to ensure that you don't receive any range that could be unintentionally consumed.Interrelate
@Tim: Oh hmmmm I see what you're saying now. So what contract(s) might I write in my code? I'm kinda confused since I can't find anything that tells you whether foreach modifies a range or not. (Btw, const ranges never work. Ever. :-) )Mistook
Well, ok, so I've had a lot of trouble getting DMD to enforce the guarantees that const and immutable are supposed to make, but in theory... Anyway, std.range.isForwardRange is the built in 'repeatable' range contract.Interrelate
@Tim: D'oh. Yeah, forward ranges would solve my problem, but now I realize there's a different issue involved -- my code needs to work both with slices and with console input (which is not a forward range). Looks like I have some reworking to do, but I think this clarifies things. Thanks a lot!Mistook
No problem, thanks for the discussion, it's helped me to clarify my own thoughts quite a bit :DInterrelate
I
4

Sorry, I can't fit this into a comment :D. Consider if Range were defined this way:

interface Range {
    void doForeach(void delegate() myDel);
}

And your function looked like this:

void myFunc(Range r) {
    doForeach(() {
        //blah
    });
}

You wouldn't expect anything strange to happen when you reassigned r, nor would you expect to be able to modify the caller's Range. I think the problem is that you are expecting your template function to be able to account for all of the variation in range types, while still taking advantage of the specialization. That doesn't work. You can apply a contract to the template to take advantage of the specialization, or use only the general functionality. Does this help at all?

Edit (what we've been talking about in comments):

void funcThatDoesntRuinYourRanges(R)(R r)
if (isForwardRange(r)) {
    //do some stuff
}

Edit 2 std.range It looks like isForwardRange simply checks whether save is defined, and save is just a primitive that makes a sort of un-linked copy of the range. The docs specify that save is not defined for e.g. files and sockets.

Interrelate answered 25/6, 2012 at 15:31 Comment(1)
+1 yeah, I guess I need a forward range after all, took me a while to realize that. :-) Guess I'll have to make a wrapper for console input..Mistook
D
7

You should probably read this tutorial on ranges if you haven't.

When a range will and won't be consumed depends on its type. If it's an input range and not a forward range (e.g if it's an input stream of some kind - std.stdio.byLine would be one example of this), then iterating over it in any way shape or form will consume it.

//Will consume
auto result = find(inRange, needle);

//Will consume
foreach(e; inRange) {}

If it's a forward range and it's a reference type, then it will be consumed whenever you iterate over it, but you can call save to get a copy of it, and consuming the copy won't consume the original (nor will consuming the original consume the copy).

//Will consume
auto result = find(refRange, needle);

//Will consume
foreach(e; refRange) {}

//Won't consume
auto result = find(refRange.save, needle);

//Won't consume
foreach(e; refRange.save) {}

Where things get more interesting is forward ranges which are value types (or arrays). They act the same as any forward range with regards to save, but they differ in that simply passing them to a function or using them in a foreach implicitly saves them.

//Won't consume
auto result = find(valRange, needle);

//Won't consume
foreach(e; valRange) {}

//Won't consume
auto result = find(valRange.save, needle);

//Won't consume
foreach(e; valRange.save) {}

So, if you're dealing with an input range which isn't a forward range, it will be consumed regardless. And if you're dealing with a forward range, you need to call save if you want want to guarantee that it isn't consumed - otherwise whether it's consumed or not depends on its type.

With regards to ref, if you declare a range-based function to take its argument by ref, then it won't be copied, so it won't matter whether the range passed in is a reference type or not, but it does mean that you can't pass an rvalue, which would be really annoying, so you probably shouldn't use ref on a range parameter unless you actually need it to always mutate the original (e.g. std.range.popFrontN takes a ref because it explicitly mutates the original rather than potentially operating on a copy).

As for calling range-based functions with forward ranges, value type ranges are most likely to work properly, since far too often, code is written and tested with value type ranges and isn't always properly tested with reference types. Unfortunately, this includes Phobos' functions (though that will be fixed; it just hasn't been properly tested for in all cases yet - if you run into any cases where a Phobos function doesn't work properly with a reference type forward range, please report it). So, reference type forward ranges don't always work as they should.

Divertissement answered 25/6, 2012 at 18:8 Comment(0)
I
4

Sorry, I can't fit this into a comment :D. Consider if Range were defined this way:

interface Range {
    void doForeach(void delegate() myDel);
}

And your function looked like this:

void myFunc(Range r) {
    doForeach(() {
        //blah
    });
}

You wouldn't expect anything strange to happen when you reassigned r, nor would you expect to be able to modify the caller's Range. I think the problem is that you are expecting your template function to be able to account for all of the variation in range types, while still taking advantage of the specialization. That doesn't work. You can apply a contract to the template to take advantage of the specialization, or use only the general functionality. Does this help at all?

Edit (what we've been talking about in comments):

void funcThatDoesntRuinYourRanges(R)(R r)
if (isForwardRange(r)) {
    //do some stuff
}

Edit 2 std.range It looks like isForwardRange simply checks whether save is defined, and save is just a primitive that makes a sort of un-linked copy of the range. The docs specify that save is not defined for e.g. files and sockets.

Interrelate answered 25/6, 2012 at 15:31 Comment(1)
+1 yeah, I guess I need a forward range after all, took me a while to realize that. :-) Guess I'll have to make a wrapper for console input..Mistook
H
1

The short of it; ranges are consumed. This is what you should expect and plan for.

The ref on the foreach plays no role in this, it only relates to the value returned by the range.

The long; ranges are consumed, but may get copied. You'll need to look at the documentation to decide what will happen. Value types get copied and thus a range may not be modified when passed to a function, but you can not rely on if the range comes as a struct as the data stream my be a reference, e.g. FILE. And of course a ref function parameter will add to the confusion.

Heartfelt answered 25/6, 2012 at 15:0 Comment(3)
"This is what you should expect and plan for" -> then does that mean I can't use foreach? Or do I popFrontN afterwards? How do I "look at the documentation" when it's a generic type that's just been passed to me?Mistook
Also, my bad about the ref on the foreach... I'm aware they aren't related to this issue; I had another issue on my mind as well and happened to put them there. I've removed them.Mistook
@Mehrdad: some ranges are inherently destructive in that they read from something that doesn't support moving the current location in revers (e.g. an un-buffered network stream). For those, no matter how you read them, you will modify all alias to the lower level thing.Quintal
Y
1

Say your print function looks like this:

void print(R)(R r) {
  foreach (x; r) {
    writeln(x);
  }
}

Here, r is passed into the function using reference semantics, using the generic type R: so you don't need ref here (and auto will give a compilation error). Otherwise, this will print the contents of r, item-by-item. (I seem to remember there being a way to constrain the generic type to that of a range, because ranges have certain properties, but I forget the details!)

Anyway:

auto myRange = [1, 2, 3];
print(myRange);
print(myRange);

...will output:

1
2
3
1
2
3

If you change your function to (presuming x++ makes sense for your range):

void print(R)(R r) {
  foreach (x; r) {
    x++;
    writeln(x);
  }
}

...then each element will be increased before being printed, but this is using copy semantics. That is, the original values in myRange won't be changed, so the output will be:

2
3
4
2
3
4

If, however, you change your function to:

void print(R)(R r) {
  foreach (ref x; r) {
    x++;
    writeln(x);
  }
}

...then the x is reverted to reference semantics, which refer to the original elements of myRange. Hence the output will now be:

2
3
4
3
4
5
Yorktown answered 25/6, 2012 at 15:2 Comment(1)
I think you misunderstood the question -- I'm asking about modification of the range itself, not the items. Some ranges are really the same under the hood, so modifying one modifies all of them. Some of them (like slices) aren't. The question is, how do I deal with this?Mistook

© 2022 - 2024 — McMap. All rights reserved.