Use 'yield return' in async methods to achieve pipelining
I'm building a UWP app that gets a list of files from a folder, does some processing on them, then gets rid of the files.

This was working fine:

List<StorageFile> files;
public MainPage()
{
     this.InitializeComponent();
     files = new List<StorageFile>();
}

private async Task<List<StorageFile>> GetFiles(StorageFolder folder)
{
     var items = await folder.GetItemsAsync();          
     foreach (var item in items)
     {
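         // GetItemsAsync returns IStorageItem, so match on the concrete type to get a typed reference
         if (item is StorageFile file)
             files.Add(file);
         else if (item is StorageFolder subfolder)
             await GetFiles(subfolder);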
     }

     return files;
}


private async void GetFilesBtn_Click(object sender, RoutedEventArgs e)
{
     // opening folder picker, then selecting a folder
     var files = await GetFiles(folder);    
     // process files
     ProcessFiles(files);    
     // dispose
     DisposeFiles(files);
}

However, when working with a large number of files, memory consumption got really high (obviously).

So what came to mind is to use yield return file and process each file as it came, then once I'm done with that file I can dispose of it, and start working on the next file and so on.

What I've tried to do is this:

public async Task<IEnumerable<StorageFile>> GetFiles(StorageFolder folder)
{
       var items = await folder.GetItemsAsync();    
       foreach (var item in items)
       {
            if (item.GetType() == typeof(StorageFile))
                yield return item;
            else
               await GetFiles(item as StorageFolder);
       }
}

Then:

foreach (var file in GetFiles(folder))
{
      // process file
      ProcessFile(file);
      // dispose
      DisposeFile(file);
} 

When doing this I'm getting:

The body of 'GetFiles(StorageFolder)' cannot be an iterator block because 'Task<IEnumerable<StorageFile>>' is not an iterator interface type.

I've never used yield return before so I'm not sure how to accomplish this.

Studied answered 7/4, 2018 at 10:52 Comment(1)
You need to look at reactive extensions (the IObservable interface and related stuff). It can solve this problem in a nice way, and is a generally useful technique. - Orfield
F
13

As of C# 8, this can now be accomplished using IAsyncEnumerable.

You just need to change your return type from Task<IEnumerable<StorageFile>> to IAsyncEnumerable<StorageFile>, then call the method using await foreach instead of foreach.

So your example will now look like this:

public async IAsyncEnumerable<StorageFile> GetFiles(StorageFolder folder)
{
    var items = await folder.GetItemsAsync();
    foreach (var item in items)
    {
        // items are IStorageItem, so pattern-match to get typed references
        if (item is StorageFile file)
        {
            yield return file;
        }
        else if (item is StorageFolder subfolder)
        {
            // flatten the nested async sequence into this one
            await foreach (var nested in GetFiles(subfolder))
                yield return nested;
        }
    }
}

Then:

await foreach (var file in GetFiles(folder))
{
      // process file
      ProcessFile(file);
      // dispose
      DisposeFile(file);
}
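
For anyone who wants to try the pattern outside UWP, here is a minimal console-friendly sketch of the same recursive async iterator. It assumes C# 8 on .NET Core 3.0 or later and swaps StorageFolder/StorageFile for plain System.IO paths. FileWalker, WalkAsync and the Task.Yield() stand-in for real async I/O are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class FileWalker
{
    // Hypothetical helper: lazily yields every file path under root, one at a time.
    public static async IAsyncEnumerable<string> WalkAsync(string root)
    {
        // Stand-in for a real asynchronous call such as folder.GetItemsAsync().
        await Task.Yield();

        foreach (var file in Directory.EnumerateFiles(root))
            yield return file;

        foreach (var dir in Directory.EnumerateDirectories(root))
        {
            // Flatten the nested async sequence into this one.
            await foreach (var file in WalkAsync(dir))
                yield return file;
        }
    }
}
```

It is consumed the same way as above: await foreach (var path in FileWalker.WalkAsync(root)) { ... }.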
Farmstead answered 30/7, 2021 at 17:26 Comment(0)
H
4

You sure know how to make life difficult for yourself: async, yield and recursion! Unfortunately, async/await and yield are not compatible in .NET at the moment (this was written in 2018, before C# 8).

I would advise taking a different approach: instead of having your recursive function build up a list, pass it an action to apply to each file as it goes, something like:

public async Task ProcessFiles(StorageFolder folder, Action<StorageFile> process)
{
    var items = await folder.GetItemsAsync();
    foreach (var item in items)
    {
        if (item is StorageFile file)
            process(file);
        else if (item is StorageFolder subfolder)
            await ProcessFiles(subfolder, process); // pass the callback down to subfolders
    }
}

await ProcessFiles(folder, file => {
    ProcessFile(file);
    DisposeFile(file);
});

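You may wish to make ProcessFile and DisposeFile asynchronous. In that case, declare the process parameter as Func<StorageFile, Task> instead of Action<StorageFile> and await it inside ProcessFiles (an async lambda passed as an Action becomes async void, so nothing would wait for it):

await ProcessFiles(folder, async file => {
    await ProcessFile(file);
    await DisposeFile(file);
});

If you want to define your callback separately, you do it like this:

Func<StorageFile, Task> processor = async file => {
    await ProcessFile(file);
    await DisposeFile(file);
};

await ProcessFiles(folder, processor);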
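
Here is a self-contained sketch of the callback approach with an awaited Func<string, Task> callback. It is only an illustration: it assumes plain System.IO paths in place of StorageFolder, and FolderProcessor and ForEachFileAsync are hypothetical names.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class FolderProcessor
{
    // Hypothetical helper: walks root recursively and awaits the callback for each
    // file before moving on, so files are handled one at a time.
    public static async Task ForEachFileAsync(string root, Func<string, Task> process)
    {
        foreach (var file in Directory.EnumerateFiles(root))
            await process(file);

        foreach (var dir in Directory.EnumerateDirectories(root))
            await ForEachFileAsync(dir, process);
    }
}
```

Because each callback is awaited before the next file is visited, nothing has to be collected into a list first.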
Herve answered 7/4, 2018 at 11:51 Comment(0)
W
2

The async keyword tells the C# compiler to rewrite the method as a state machine that works asynchronously (aka Task).

The yield return keyword tells the C# compiler to rewrite the method as a state machine that generates results lazily (aka Enumerator).

What you're trying to do is combine the two approaches, which will make the C# compiler very sad, as it is currently not able to decide how to generate both state machines from one method. There is an open issue to support this feature in C# at dotnet/csharplang: Champion "Async Streams" (including async disposable)

There is however a different approach you can use, Task.WhenAll, described in this question: Is it possible to "await yield return DoSomethingAsync()"

Note that Task.WhenAll will resolve all the intermediate results into memory, so you might end up with a solution that is more complex and more memory-consuming than your original if you're not careful.
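
To make that caveat concrete, here is a small sketch. Task.WhenAll completes only once every task has finished and hands back an array holding all the results at once. WhenAllDemo, LoadAsync and the 1 KB buffers are made-up stand-ins, not code from the linked question.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-in for loading one file's contents asynchronously.
    private static async Task<byte[]> LoadAsync(int index)
    {
        await Task.Delay(10);
        return new byte[1024];
    }

    public static async Task<int> LoadAllAsync(int count)
    {
        // ToArray() starts every task up front...
        Task<byte[]>[] tasks = Enumerable.Range(0, count).Select(i => LoadAsync(i)).ToArray();

        // ...and WhenAll completes only when all of them have finished,
        // so all 'count' buffers are alive in memory at the same time.
        byte[][] results = await Task.WhenAll(tasks);
        return results.Sum(r => r.Length);
    }
}
```

With thousands of large files, that results array is exactly the kind of allocation the question is trying to avoid.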

Next, you also have a bug in your code:

foreach (var item in items)
{
    if (item.GetType() == typeof(StorageFile))
        yield return item;
    else
        await GetFiles(item as StorageFolder); // <---- no return here
}

In the else branch, you do not return the retrieved value. So even if that code compiled, you would soon find out that it's not working correctly.

However, if you added the return, your method would need to both generate a state machine for StorageFile and return the entire sequence in the case of StorageFolder. This is not possible, so you would need to take a different approach called flattening, simply by adding another foreach (note that the asynchronicity was removed for simplicity):

foreach (var item in items)
{
    if (item.GetType() == typeof(StorageFile))
    {
        yield return item;
    }
    else
    {
       foreach (var file in GetFiles(item as StorageFolder))
       {
           yield return file;
       }
    }
}
Workmanship answered 7/4, 2018 at 12:3 Comment(0)
S
2

This is a perfect opportunity to go Reactive!

I created this simple program, which you can easily edit to use StorageFolder and StorageFile instead of string as path:

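using System;
using System.IO;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Threading.Tasks;

class Program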
{

    static void Main(string[] args)
    {
        Task.Run(async () =>
        {
            GetFilesFromDirectory(Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData)).Subscribe(
                file =>
                {
                    Console.WriteLine(file);
                });


            var files = await GetFilesFromDirectory(Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData))
                .ToArray(); // you can also do this

            foreach (var file in files)
            {
                Console.WriteLine(file);
            }

            Console.ReadLine();
        }).Wait();
    }


    static IObservable<string> GetFilesFromDirectory(string path)
    {
        var files = new Subject<string>();
        var directories = new Subject<string>();

        // Each directory pushed into "directories" emits its files
        // and pushes its subdirectories back into the same subject.
        directories.Select(x => new DirectoryInfo(x)).Subscribe(dir =>
        {
            foreach (var fileInfo in dir.GetFiles())
            {
                files.OnNext(fileInfo.FullName);
            }

            foreach (var directoryInfo in dir.GetDirectories())
            {
                directories.OnNext(directoryInfo.FullName);
            }
        }, () =>
        {
            // no more directories to visit, so the file stream is complete
            files.OnCompleted();
        });

        // start the walk from the root path on a background thread
        Task.Run(() =>
        {
            directories.OnNext(path);
            directories.OnCompleted();
        });

        return files;
    }
}

There is also an overload of DirectoryInfo.GetFiles (and of Directory.GetFiles) that does a recursive search for you:

var fileList = new DirectoryInfo(sDir).GetFiles("*", SearchOption.AllDirectories);

Skeleton answered 7/4, 2018 at 12:51 Comment(0)
S
0

I have a somewhat similar example of using yield. Basically, replace GetRandomStringWDelayAsync with your GetFile code. Locking is not really necessary.

        object lockObj = new object();

        for (int i = 0; i < callCount; i++)
        {
            int j = i; // copy the loop variable; the lambda would otherwise capture the shared "i"

            yield return Task.Run(async delegate {
                var pair = await StringService.RandomValues.GetRandomStringWDelayAsync(j);

                if (pair.Value != null)
                {
                    lock (lockObj)
                        dictionary[j] = pair.Value;
                }
            });
        }

Full code here: https://github.com/sergeklokov/AsynchronousTasksDemo

Stambul answered 25/8, 2019 at 5:7 Comment(0)
H
-1

This happens because yield only supports methods whose return type is IEnumerable<T>, and GetFiles() returns Task<IEnumerable<T>>.
Check this out:

    public IEnumerable<StorageFile> GetFiles(StorageFolder folder)
    {
        // your code here
    }

    public Task<IEnumerable<StorageFile>> GetFilesAsync(StorageFolder folder)
    {
        return Task.Run(() => GetFiles(folder));
    }

And then you can do
foreach (var file in GetFiles(folder)) or foreach (var file in await GetFilesAsync(folder))

Edit: Ah, you're talking about IAsyncEnumerable, which did not exist on the date you posted the answer

Helianthus answered 7/4, 2018 at 11:50 Comment(1)
This will only await once and will not work with yield. - Ophthalmoscopy
