How does a streaming operator differ from deferred execution?

Go To StackoverFlow.com

7

In LINQ Where is a streaming operator. Where-as OrderByDescending is a non-streaming operator. AFAIK, a streaming operator only gathers the next item that is necessary. A non-streaming operator evaluates the entire data stream at once.

I fail to see the relevance of defining a Streaming Operator. To me, it is redundant with Deferred Execution. Take the example where I have written a custom extension and consumed it using the where operator and and orderby.

public static class ExtensionStuff
{
    public static IEnumerable<int> Where(this IEnumerable<int> sequence, Func<int, bool> predicate)
    {
        foreach (int i in sequence)
        {
            if (predicate(i))
            {
                yield return i;
            }
        }
    }
}

    public static void Main()
    {
        TestLinq3();
    }

    private static void TestLinq3()
    {
        int[] items = { 1, 2, 3,4 };

        var selected = items.Where(i => i < 3)
                            .OrderByDescending(i => i);

        Write(selected);
    }



    private static void Write(IEnumerable<int> selected)
    {
        foreach(var i in selected)
            Console.WriteLine(i);
    }

In either case, Where needs to evaluate each element in order to determine which elements meet the condition. The fact that it yields seems to only become relevant because the operator gains deferred execution.

So, what is the importance of Streaming Operators?

2012-04-05 21:08
by P.Brian.Mackey
Try that again with about 2 billion ints in items - cHao 2012-04-05 21:11
@cHao or an infinite sequence, or a sequence derived from an open network stream - Marc Gravell 2012-04-05 21:13
More specific exampleL.B 2012-04-05 21:29
There are cases where Linq can't defer execution. Some operations require slurping the IEnumerable. Not Where(), that's one at a time. But definitely OrderBy, you can't sort a collection unless you know all the collection items. My favorite one is Enumerable.Reverse() the one that breaks the IEnumerable rule. Shockingly unoptimized with O(n) storage and O(n) execution on a IList<>. Sloppy - Hans Passant 2012-04-05 21:32


12

There are two aspects: speed and memory.

The speed aspect becomes more apparent when you use a method like .Take() to only consume a portion of the original result set.

// Consumes ten elements, yields 5 results.
Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
    .Take(5)
    .ToList();

// Consumes one million elements, yields 5 results.
Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
    .OrderByDescending(i => i)
    .Take(5)
    .ToList();

Because the first example uses only streaming operators before the call to Take, you only end up yielding values 1 through 10 before Take stops evaluating. Furthermore, only one value is loaded into memory at a time, so you have a very small memory footprint.

In the second example, OrderByDescending is not streaming, so the moment Take pulls the first item, the entire result that's passed through the Where filter has to be placed in memory for sorting. This could take a long time and produce a big memory footprint.

Even if you weren't using Take, the memory issue can be important. For example:

// Puts half a million elements in memory, sorts, then outputs them.
var numbers = Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
    .OrderByDescending(i => i);
foreach(var number in numbers) Console.WriteLine(number);

// Puts one element in memory at a time.
var numbers = Enumerable.Range(1, 1000000).Where(i => i % 2 == 0);
foreach(var number in numbers) Console.WriteLine(number);
2012-04-05 21:11
by StriplingWarrior


2

The fact that it yields seems to only become relevant because the operator gains deferred execution.

So, what is the importance of Streaming Operators?

I.e. you could not process infinite sequences with buffering / non-streaming extension methods - while you can "run" such a sequence (until you abort) just fine using only streaming extension methods.

Take for example this method:

public IEnumerable<int> GetNumbers(int start)
{
    int num = start;

    while(true)
    {
        yield return num;
        num++;
    }
}

You can use Where just fine:

foreach (var num in GetNumbers(0).Where(x => x % 2 == 0))
{
    Console.WriteLine(num);
}

OrderBy() would not work in this case since it would have to exhaustively enumerate the results before emitting a single number.

2012-04-05 21:12
by BrokenGlass
Nitpick, but it is infinite, it's just that each result isn't unique. It will eventually overflow, but it wraps, so it can overflow an infinite number of times. (As long as you're not in a checked block.) You could just have while(true)yield return 4; if you wanted - Servy 2012-04-05 21:23
Aah that's right ;- - BrokenGlass 2012-04-05 21:25


2

Just to be explicit; in the case you mentioned there's no advantage to the fact that where streams, since the orderby sucks the whole thing in anyway. There are however times where the advantages of streaming are used (other answers/comments have given examples), so all LINQ operators stream to the best of their ability. Orderby streams as much as it can, which happens to be not very much. Where streams very effectively.

2012-04-05 21:26
by Servy
Are there LINQ operators whose streaming lies somewhere between Where and OrderBy - Zev Spitz 2016-09-29 01:35
@ZevSpitz I suppose it would depend on how you define your terms. SkipWhile could qualify, depending on how you define what the quantity of "streaming" is - Servy 2016-09-29 13:19
MSDN defines streaming as being able to parse the results one at a time (e.g. Select), whereas non-streaming requires all the values in order to evaluate the first value (e.g. OrderBy). By that definition, why should SkipWhile be any less streaming than Select? The first iteration doesn't require all the values - Zev Spitz 2016-09-29 13:27
@ZevSpitz You've used a strictly binary definition, in which something either streams or it doesn't, rather than an analog definition in which something can stream to a given degree. SkipWhile doesn't need to consume all of the input sequence in order to return the first item, but it needs to consume an arbitrary percentage of it, at which point it then switches to consuming one item from the input sequence for each result yielded. So again, this is why I say it depends on how you define your terms; there is no single well defined definition in this context - Servy 2016-09-29 13:31
By your logic, Where should be thought of as intermittently streaming and non-streaming. But some condition that determines if the value is passed to the next operator / foreach doesn't fundamentally change the behavior of the query; each value can be evaluated precisely when it's needed to pass onto other operators. OrderBy is very different -- the first result requires an evaluation of all the results - Zev Spitz 2016-09-30 00:33
@ZevSpitz Again, you're using a strictly binary definition, and using it to say that Where does or doesn't stream. I'm using a analog definition of the term to describe the degree to which something streams its data. Where doesn't stream the data perfectly, the way something like Select does. It streams the data quite a lot more than other operators, such as OrderBy. It's a spectrum. Where is rather close to the "perfect streaming" site without being quite there, OrderBy is very close to the other side. If you want to draw a line on that spectrum you can - Servy 2016-09-30 13:07
Shouldn't the degree of streaming be a fundamental property of the specific method? But Where can span the entire gamut of the degree of streaming -- Where(x=>true) is fully streaming like Select, Where(x=>false) is nonstreaming like OrderBy, and the degree of streamin in var rnd = new Random(); var query = data.Where(x=>x>rnd.Next()) depends on the algorithm of the psuedorandom number generator. This means that it is pointless to categorize methods in terms of their degree of streaming, because it depends on the arguments passed to the method - Zev Spitz 2016-10-01 16:47
You're defining streaming as a function of when the first value is passed to the next operator. I'm defining streaming as a function of when the next value is evaluated relative to when it is passed to the next operator (if ever) - Zev Spitz 2016-10-01 16:52
Ads