Sponsored Links


Resources

.NET Research Library
Get .NET related white papers, case studies and webcasts

Iterators With C#2Iterators With C#2Iterators With C#2 Discuss DiscussDiscuss Printer friendly Printer friendlyPrinter friendly
Iterators With C#2

February 18, 2005

Introduction

As with anonymous methods, iterators are a non-obvious but powerful feature introduced in the version 2 of C#. Iterators can be qualified as syntactic sugar since they don’t imply any CLR/IL instruction set changes compared to .NET 1.x. Just like anonymous methods, all the magic is in the compiler. To understand the underlying motivations of C#2 iterators, we’ll first dig into C#1 iterators. Then, we’ll cover the basics of this feature in version 2, following with an analysis of the compiler’s work. Finally, we’ll drill into advanced uses of C#2 iterators.

C#1 iterators

Enumerables, enumerators and the iterator design pattern

It is necessary to get a strong sight on what are enumerables and enumerators, before covering C#1 and C#2 enumerators. An object is an enumerable if it contains or references a collection of objects and if elements of this collection can be enumerated with the C# keyword foreach . Instances of classes that implement System.Collections.IEnumerable are enumerables. As we’ll see, there exists some enumerables that don’t fulfill this condition.

An object is an enumerator if its class implements the interface System.Collections.IEnumerator . Here are definitions of these interfaces:

A client that wishes to enumerate items over an enumerable asks for an enumerator to the enumerable with the method IEnumerator IEnumerable.GetEnumerator(). Then, the client uses methods IEnumerator.MoveNext() and IEnumerator.Reset() on the enumerator to iterate over items of the collection. An enumerator is a stateful object that keeps tracks of an index on the collection. A client calls the get accessor of the IEnumerator.Current property to access the item referenced currently by the index of the iterator. If we try to represent enumerables, enumerators and their clients in an UML diagram it would look like this:

For those who are acquainted with Design Patterns of the Gang of Four, it becomes obvious that iterators of C# implements the design pattern named iterator.

An example

Here is an example of implementing iterators in C#1. The Personn es class is the enumerable class. The PersonsEnumerator class is the enumerator class. Note that the class Person s could have played both roles of enumerable and enumerator if it would have implemented both interfaces. However, we’ll see that it is always preferable that each role are implemented in a dedicated class (after all, we follow a central object oriented programming tenet : one responsibility/one class).

This program outputs:

Michel
Christine
Mathieu
Julien

It's a fairly long program to code such a simple functionality. Most of lines implement the index's behaviour of the enumerator. Hopefully, we'll see that C#2 dramatically reduces the amount of code to express the same need.

Note that the C# compiler has generated something like the following to interpret the foreach keyword:

Several enumerators for a single enumerable

An enumerable class is not obliged to implement the IEnumerable interface. This responsibility can be delegated to a third class PersonsEnumerable , for instance:

This way, it is easy to implement several enumerator classes that target the same enumerable class. It can be quite useful to have an enumerator Reverse that yields elements in the reverse order or an enumerator EvenPosOnly that yields in order only items at even positions.

Drawbacks of C#1 iterators

Clearly, the C#1 syntax for iterators is too heavy compared to the functionality provided. For this reason, most developers don’t use it. Moreover, with this syntax, trying to enumerate over items of a barely exotic collection (such as a binary tree) quickly becomes a nightmare.

C#2 iterators

The keyword yield return

C#2 has been enhanced with the yield return keyword to implement the iterator pattern seamlessly. Concretely, it relieves developers from the burden of implementing enumerator and enumerable classes. Here is our previous example rewritten:

The behaviour of the yield return keyword might stump you. It looks like the yield return keyword is returning a string but the type of the return value of the method GetEnumerator() is IEnumerator. Moreover, which class implements the IEnumerator returned since we don’t explicitly supply such an implementation? As this article unfolds we’ll thoroughly shed light on these mysteries but it’s still time to dig in C#2 iterators basics.

Notice that a method can call several times the yield return keyword. For instance:

This program has the same output as previous examples:

Michel
Christine
Mathieu
Julien

It seems that each time the thread calls GetEnumerator() , it branches itself just after the previous call to yield return . We’ll check soon that this hunch is the right one.

Generics and iterators

In a language that supports generics, it would be a drag to still use the object IEnumerator.Current property. Therefore, interfaces IEnumerable and IEnumerator are both provided in a generic form in the .NET framework 2.0:

Notice that the interface IEnumerator<T> implements the interface IDisposable . Notice also that the method IEnumerator.Reset() has been removed. Thanks to generics, we can now tell the compiler that our enumerable is a collection of strings instead of a collection of object:

Several enumerators for a single enumerable

C#2 iterator is also a good mean to implement concisely several iterators on a single enumerable. For instance :

This program outputs:

-->Iterator Reverse
Julien
Mathieu
Christine
Michel
-->Iterator PositionsPaires
Michel
Mathieu
-->Iterator Concat
Julien
Mathieu
Christine
Michel
Michel
Mathieu

The keyword yield break

You might wish to enumerate a subset of an enumerable. In this case, the keyword yield break is the right way to tell the client that it should stop looping.

This program outputs:

Michel
Christine

As a consequence, the code written after a yield break instruction is not reachable. The C# compiler emits a warning when meeting such unreachable code.

Syntactic constraints on yield return and yield break keywords

  • yield break and yield return keywords can be used only inside the body of a method, the body of a property accessor or the body of an operator.
  • Whatever the kind of method that use yield break or yield return keywords, it must return one the following interfaces System.Collections.Generic.IEnumerable<T> , System.Collections.IEnumerable , System.Collections.Generic.IEnumerator<T> or System.Collections.IEnumerator .
  • yield break and yield return keywords can’t be used in the body of an anonymous method.
  • yield break and yield return keywords can’t be used in a finally block.
  • yield break and yield return keywords can’t be used in a try block that has at least a catch block.
  • yield break and yield return keywords can’t be used in a method that has ref or out arguments. More globally, a method that contains at least one of these keywords should not return any other information than the yielded item.

A recursive iterator example

The following example shows the powerfulness of C#2 iterators to enumerate items of a non-flat collection, such as a binary tree:

A representation of the binary tree built by the Main () method is:

This programs outputs:

B
A
D
C
E

The C#2 compiler and iterators

If you are a curious developer, you might be delighted by the work done by the C#2 compiler to interpret yield break and yield return keywords . This is the subject of the current section. Keep in mind that iterators are syntactic sugar. Unlike generics, no IL instructions have been added to implement iterators.

Enumerator classes are automatically built and used by the compiler

A method that uses at least one of the yield break or yield return keywords must return an enumerable or an enumerator (generic or not). In any cases, you can consider that an enumerator object is returned since the only purpose to implement an enumerable interface is to return an enumerator. If you’ve read the previous article concerning anonymous methods, you might have guessed that for each method that contains at least one of the yield break or yield return keywords, the compiler builds a class that can be seen as a lexical environment. This generated class implements the four interfaces System.Collections.Generic.IEnumerable<T>, System.Collections.IEnumerable, System.Collections.Generic.IEnumerator<T> and System.Collections.IEnumerator if the concerned method returns an enumerable object. The generated class implements only the two enumerator interfaces if the concerned method returns an enumerator object. Moreover, such a method is not directly compiled. The logic contained in its body is in the MoveNext() method of the generated class. Let’s check all these facts on a small example:

The following assembly is the compiled version of the previous program (the assembly is viewed with the tool Reflector provided freely by Lutz Roeder which supports .NET2 assemblies). :

To make things clear, here is the decompiled code of the Main() method. We can then check that an instance of the <AnIterateur>d__0 class is created. In the previous article concerning anonymous methods, we said that such an instance is a closure. Thus in C#2, an iterator is a special kind of closure.

Pay attention to the automatic and implicit use of the dispose pattern that surrounds the creation of the enumerator:

Here is another example that deserves some decompilation:

The following assembly is the compiled version of the previous program:

Notes on generated classes

By looking at the bodies of MoveNext() methods of the previous example, it seems that a state machine is built by the compiler when compiling a method that contains a yield break or yield return keyword. The state of the machine is held by the two instance fields <>1__state and <>2__current of the generated class. Remember that we observe that a thread executing an iterator is able to branch itself just after the previous call to yield return. This magic is possible thanks to the <>1__state field. A switch instruction on <>1__state is inserted at the beginning of the MoveNext() method while the value of <>1__state is properly set each time the running thread is about to leave the MoveNext() method. The field <>2__current holds the value computed by the previous call to MoveNext(). Thus the type of <>2__current is the type of enumerated elements (i.e int32 in both previous examples).

If the method that contains a yield instruction is an instance method, the generated class has an instance field called <>4__this . This field references the enumerable object.

Notice that if the method that contains a yield instruction has some local variables or some arguments, they are captured by the compiler. For instance, in the second example the local variable i is captured. As a consequence, the generated class has a field named <i>5__1 .

Advanced uses of C#2 iterators

From the previous compilation analysis, we can infer two interesting differences between C#1 and C#2 iterators:

  • By design, C#2 iterators support the 'lazy evaluation' pattern. Concretely, the value of a yielded item is computed only when the client asks for it. When dealing with C#1 iterators, the values of all elements of an enumerable must have been computed before looping on it with a foreach instruction.
  • C#2 iterators have no bound for the number of items yielded. When dealing with C#1 iterators, the number of items had to be fixed before any looping with a foreach instruction.

Definitions: coroutine and continuation

A coroutine is a function that resumes its execution at the point it has stopped last time it was called, as if nothing had happened between invocations. A subroutine is a function that starts its execution at the beginning of its body each time it is called. Clearly, all C# methods are subroutines except methods that contain a yield instruction that can be considered as coroutine.

Coroutines are a specialization of closures. Indeed, values of local variables and arguments must be captured by an object between invocations to make possible the magic of resuming at the point we quit last time. Notice also that the offset of the instruction in a coroutine body where the thread will resume its course, must also be captured between invocations. The closure that captures local variables values and the captured instruction offset is called a continuation. Thus, a continuation is an object created behind your back. Notice that a continuation is semantically equivalent to a thread's stack frame. This last remark will be soon relevant.

Harness the power of continuations and coroutines with iterators

From the definition, by coding two coroutines A and B where A calls B and B calls A, we can readily implement an infinite recursion that won't bloat the running thread's stack. This facility can be useful to develop a simplified chess game where the white treatment calls the black treatment and vice-versa. Suppose that some information must be stored by each treatment. In C#1, you would certainly design a class to store this information. In C#2 you can harness the power of continuations and coroutines to code this paradigm:

This programs outputs endlessly:

white move, whiteTreatmentInfo=0
black move
white move, whiteTreatmentInfo=1
black move
white move, whiteTreatmentInfo=2
...

In this example, the role of the static fields black and white is essential. Each time the White() method is called a new continuation is created. Consequently the White() method must be called only one time. The field white is used to reference the unique continuation created by the method White() . The same remark also works with the field black and the method Black() .

In C#, a goto instruction and the label it references must lie in the same method body. This example shows that you can consider coroutines/continuations as a mean to implement ‘goto’ between methods, without any risk to bloat the thread stack.

The Pipeline pattern

Iterators are perfect to implement the pipeline pattern, the one that you have used so many times in your shell windows. For instance:

This program outputs:

-4
-3
-2
6
12
18

Values -4,-3 and -2 are readily understandable. Here is a schema that sheds light on values 6, 12 and 18:

…you get the following output:

Yield:-4
-4
Yield:-3
-3
Yield:-2
-2
Yield:1
Yield:2
Yield:3
6
Yield:4
Yield:5
Yield:6
12
Yield:7
Yield:8
Yield:9
18
Yield:10

This experience exposes the fact that items yield by an iterator are never stored somehow. As soon as an integer is yield by PipelineIntegerRange , it is consumed by chained pipelines.

Iterators, functors and anonymous methods

In the previous article concerning anonymous methods, we have seen that anonymous methods are a good means to implement the C++ notion of functors. Functors mainly allow a seamless syntax to manipulate elements of a collection. Thanks to iterators, you can now use this syntax both for your pipelines and for your own collections. For instance:

This program outputs:

Hello:6
Hello:12
Hello:18

Continuation vs. Threading

The pipeline pattern can be seen as a way to implement the producer/consumer paradigm. We generally use this paradigm in a multithreaded environment. This is the second time that threads are quoted in this article. Indeed, we saw that information bundled in a continuation is the same as those contained in a thread stack frame.

The point is that continuations can help to implement some concurrency pattern. In the following example, a thread producer is computing Fibonacci numbers while the main thread is a consumer that prints yielded numbers on the console:

Notice that the state of the producer thread (i.e values of i1 and i2 ) is permanently stored on its own stack.

Here is the same problematic implemented with a single thread and an iterator. This time, values of i1 and i2 are permanently stored in the continuation that is created when calling the Fibo() method.

A limitation of C#2 iterators

Clearly, the limitation that we underline in the present section won’t hamper you in most of your algorithms. I found it while I was trying to do something like a ‘recursive iterator’. The idea was to use an iterator that calls itself to implement the sieve of Eratosthène. This sieve is an algorithm that allows computing prime numbers. It is explained by the following schema:

The first number of each step is a prime. We remove this first number and its multiples to reach the next step. Each step yields a prime number.

The idea is to use as many pipelines as prime number we want. The first element of the pipeline will yield integers between 2 and N. In order to asses the number of prime numbers between 1 and N, we use the following theorem that was conjectured by Gauss and that was proved by de la Vallée Poussin. This theorem says that if P(N) is the number of prime numbers smaller than N then we have the approximation:

Here is the program:

I left as an exercise to modify this program to have an iterator at the end of a chain of iterators that yields all prime numbers…

The limitation of C#2 iterators that is pinpointed here is the impossibility for an iterator object to reference and call itself. We can’t use the keyword this in the body of the method that contains a yield instruction. In the case of an instance method, the keyword this references the current object. In the case of a static method, the compiler emits an error. It’s like if the developer is not supposed to know what happens behind the scene. We could use the reflection but this solution wouldn’t be satisfying.

Conclusion

In this series of two articles, we walked through basics of anonymous methods and iterators. We discovered that these features are more complex but more useful than expected at first glance. In both cases, a class is built and instantiated under the hood. It is pure syntactic sugar since these classes could have been coded with C#1. However, we found out that it is elegant syntactic sugar that may allow developers to code a complex algorithm with a couple of lines. We also covered that it is appealing to distract these feature to ‘magically’ implement some algorithms such as the pipeline pattern. Nevertheless, you should use such code carefully because it might entangle the logic of the application and stump developers in charge of maintenance. Moreover, you should avoid overusing iterators when dealing with performance critical issues since some heap based objects are created behind your back.

References:

Design Patterns by GoF (Gang of Four) Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides
The C# programming language by Anders Hejlsberg, Scott Wiltamuth, Peter Golde
Generic algorithms for C# 2.0 using iterators and anonymous methods by Jonhathan de Halleux
Iterators, Not Just for Iteration Wesner Moise’s blog
Implementation of Iterators in C# 2.0 (Part 5) Roshan James’ blog
Create Elegant Code with Anonymous Methods, Iterators, and Partial Classes by Juval Lowy
Fun with Iterators and state machines Matt Pietrek’s blog
Delegate based APIs Krzysztof Cwalina’s blog
More Fun With Generics: Action, Converter, Comparison Scott Allen’s blog
More iterator fun with the producer/consumer pattern Joe Duffy’s blog
External Iterators c2 Wiki
CoRoutines c2 Wiki
Continuation Explanation c2 Wiki
Continuations And Coroutines c2 Wiki
Call With Current Continuation c2 Wiki

Authors

Patrick Smacchia is a .NET MVP involved in software development for over 15 years. After graduating in mathematics and computer science from the ENSEEIHT school, he has worked on software in a variety of fields including stock exchange at Société Générale, airline ticket reservation system at Amadeus as well as a satellite base station at Alcatel. He's currently a software consultant and trainer on .NET technologies as well as the author of the freeware NDepend which provides numerous metrics and caveats on any compiled .NET application.

News | Blogs | Discussions | Tech talks | White Papers | Downloads | Articles | Media kit | About
All Content Copyright ©2007 TheServerSide Privacy Policy
Site Map