
February 18, 2005
Introduction
As with anonymous methods, iterators are a non-obvious but powerful feature
introduced in the version 2 of C#. Iterators can be qualified as syntactic sugar
since they don’t imply any CLR/IL instruction set changes compared to
.NET 1.x. Just like anonymous methods, all the magic is in the compiler. To
understand the underlying motivations of C#2 iterators, we’ll first dig
into C#1 iterators. Then, we’ll cover the basics of this feature in version
2, following with an analysis of the compiler’s work. Finally, we’ll
drill into advanced uses of C#2 iterators.
It is necessary to get a strong sight on what are enumerables and enumerators, before covering C#1 and C#2 enumerators. An object is an enumerable if it contains or references a collection of objects and if elements of this collection can be enumerated with the C# keyword foreach . Instances of classes that implement System.Collections.IEnumerable are enumerables. As we’ll see, there exists some enumerables that don’t fulfill this condition.
An object is an enumerator if its class implements the interface System.Collections.IEnumerator . Here are definitions of these interfaces:

A client that wishes to enumerate items over an enumerable asks for an enumerator
to the enumerable with the method IEnumerator IEnumerable.GetEnumerator().
Then, the client uses methods IEnumerator.MoveNext()
and IEnumerator.Reset() on the enumerator to iterate
over items of the collection. An enumerator is a stateful object that keeps
tracks of an index on the collection. A client calls the get accessor of the
IEnumerator.Current property to access the item
referenced currently by the index of the iterator. If we try to represent enumerables,
enumerators and their clients in an UML diagram it would look like this:

For those who are acquainted with Design Patterns of the Gang of Four, it becomes obvious that iterators of C# implements the design pattern named iterator.
An example
Here is an example of implementing iterators in C#1. The Personn es class is the enumerable class. The PersonsEnumerator class is the enumerator class. Note that the class Person s could have played both roles of enumerable and enumerator if it would have implemented both interfaces. However, we’ll see that it is always preferable that each role are implemented in a dedicated class (after all, we follow a central object oriented programming tenet : one responsibility/one class).

This program outputs:
Michel
Christine
Mathieu
Julien
It's a fairly long program to code such a simple functionality. Most of lines
implement the index's behaviour of the enumerator. Hopefully, we'll see that
C#2 dramatically reduces the amount of code to express the same need.
Note that the C# compiler has generated something like the following to interpret the foreach keyword:

An enumerable class is not obliged to implement the IEnumerable interface. This responsibility can be delegated to a third class PersonsEnumerable , for instance:

This way, it is easy to implement several enumerator classes that target the same enumerable class. It can be quite useful to have an enumerator Reverse that yields elements in the reverse order or an enumerator EvenPosOnly that yields in order only items at even positions.
Clearly, the C#1 syntax for iterators is too heavy compared to the functionality provided. For this reason, most developers don’t use it. Moreover, with this syntax, trying to enumerate over items of a barely exotic collection (such as a binary tree) quickly becomes a nightmare.
C#2 has been enhanced with the yield return keyword to implement the iterator pattern seamlessly. Concretely, it relieves developers from the burden of implementing enumerator and enumerable classes. Here is our previous example rewritten:

The behaviour of the yield return keyword might
stump you. It looks like the yield return keyword is returning a string but
the type of the return value of the method GetEnumerator()
is IEnumerator. Moreover, which class implements
the IEnumerator returned since we don’t explicitly
supply such an implementation? As this article unfolds we’ll thoroughly
shed light on these mysteries but it’s still time to dig in C#2 iterators
basics.
Notice that a method can call several times the yield
return keyword. For instance:

This program has the same output as previous examples:
Michel
Christine
Mathieu
Julien
It seems that each time the thread calls GetEnumerator() , it branches itself just after the previous call to yield return . We’ll check soon that this hunch is the right one.
In a language that supports generics, it would be a drag to still use the object IEnumerator.Current property. Therefore, interfaces IEnumerable and IEnumerator are both provided in a generic form in the .NET framework 2.0:

Notice that the interface IEnumerator<T> implements the interface IDisposable . Notice also that the method IEnumerator.Reset() has been removed. Thanks to generics, we can now tell the compiler that our enumerable is a collection of strings instead of a collection of object:

C#2 iterator is also a good mean to implement concisely several iterators on a single enumerable. For instance :

This program outputs:
-->Iterator Reverse
Julien
Mathieu
Christine
Michel
-->Iterator PositionsPaires
Michel
Mathieu
-->Iterator Concat
Julien
Mathieu
Christine
Michel
Michel
Mathieu
You might wish to enumerate a subset of an enumerable. In this case, the keyword yield break is the right way to tell the client that it should stop looping.

This program outputs:
Michel
Christine
As a consequence, the code written after a yield break instruction is not reachable. The C# compiler emits a warning when meeting such unreachable code.
- yield break and yield return keywords can be used only inside the body of a method, the body of a property accessor or the body of an operator.
- Whatever the kind of method that use yield break or yield return keywords, it must return one the following interfaces System.Collections.Generic.IEnumerable<T> , System.Collections.IEnumerable , System.Collections.Generic.IEnumerator<T> or System.Collections.IEnumerator .
- yield break and yield return keywords can’t be used in the body of an anonymous method.
- yield break and yield return keywords can’t be used in a finally block.
- yield break and yield return keywords can’t be used in a try block that has at least a catch block.
- yield break and yield return keywords can’t be used in a method that has ref or out arguments. More globally, a method that contains at least one of these keywords should not return any other information than the yielded item.
The following example shows the powerfulness of C#2 iterators to enumerate items of a non-flat collection, such as a binary tree:

A representation of the binary tree built by the Main () method is:

This programs outputs:
B
A
D
C
E
If you are a curious developer, you might be delighted by the work done by the C#2 compiler to interpret yield break and yield return keywords . This is the subject of the current section. Keep in mind that iterators are syntactic sugar. Unlike generics, no IL instructions have been added to implement iterators.
A method that uses at least one of the yield break
or yield return keywords must return an enumerable
or an enumerator (generic or not). In any cases, you can consider that an enumerator
object is returned since the only purpose to implement an enumerable interface
is to return an enumerator. If you’ve read the previous article concerning
anonymous methods, you might have guessed that for each method that contains
at least one of the yield break or yield
return keywords, the compiler builds a class that can be seen as a lexical
environment. This generated class implements the four interfaces System.Collections.Generic.IEnumerable<T>,
System.Collections.IEnumerable, System.Collections.Generic.IEnumerator<T>
and System.Collections.IEnumerator if the concerned
method returns an enumerable object. The generated class implements only the
two enumerator interfaces if the concerned method returns an enumerator object.
Moreover, such a method is not directly compiled. The logic contained in its
body is in the MoveNext() method of the generated
class. Let’s check all these facts on a small example:

The following assembly is the compiled version of the previous program (the
assembly is viewed with the tool
Reflector provided freely by Lutz Roeder which supports .NET2 assemblies).
:

To make things clear, here is the decompiled code of the Main()
method. We can then check that an instance of the <AnIterateur>d__0
class is created. In the previous article concerning anonymous methods, we said
that such an instance is a closure. Thus in C#2, an iterator
is a special kind of closure.
Pay attention to the automatic and implicit use of the dispose pattern that surrounds the creation of the enumerator:

Here is another example that deserves some decompilation:

The following assembly is the compiled version of the previous program:

By looking at the bodies of MoveNext() methods
of the previous example, it seems that a state machine is built by the compiler
when compiling a method that contains a yield break or yield return keyword.
The state of the machine is held by the two instance fields <>1__state
and <>2__current of the generated class. Remember
that we observe that a thread executing an iterator is able to branch itself
just after the previous call to yield return. This magic is possible thanks
to the <>1__state field. A switch instruction
on <>1__state is inserted at the beginning
of the MoveNext() method while the value of <>1__state
is properly set each time the running thread is about to leave the MoveNext()
method. The field <>2__current holds the value
computed by the previous call to MoveNext(). Thus the type of <>2__current
is the type of enumerated elements (i.e int32 in both previous examples).
If the method that contains a yield instruction is an instance method, the generated class has an instance field called <>4__this . This field references the enumerable object.
Notice that if the method that contains a yield instruction has some local variables or some arguments, they are captured by the compiler. For instance, in the second example the local variable i is captured. As a consequence, the generated class has a field named <i>5__1 .
From the previous compilation analysis, we can infer two interesting differences
between C#1 and C#2 iterators:
- By design, C#2 iterators support the 'lazy evaluation' pattern. Concretely,
the value of a yielded item is computed only when the client asks for it.
When dealing with C#1 iterators, the values of all elements of an enumerable
must have been computed before looping on it with a foreach instruction.
- C#2 iterators have no bound for the number of items yielded. When dealing
with C#1 iterators, the number of items had to be fixed before any looping
with a foreach instruction.
A coroutine is a function that resumes its execution at the
point it has stopped last time it was called, as if nothing had happened between
invocations. A subroutine is a function that starts its execution
at the beginning of its body each time it is called. Clearly, all C# methods
are subroutines except methods that contain a yield
instruction that can be considered as coroutine.
Coroutines are a specialization of closures. Indeed, values
of local variables and arguments must be captured by an object between invocations
to make possible the magic of resuming at the point we quit last time. Notice
also that the offset of the instruction in a coroutine body where the thread
will resume its course, must also be captured between invocations. The closure
that captures local variables values and the captured instruction offset is
called a continuation. Thus, a continuation is an object created
behind your back. Notice that a continuation is semantically equivalent to a
thread's stack frame. This last remark will be soon relevant.
From the definition, by coding two coroutines A and B where A calls B and B calls A, we can readily implement an infinite recursion that won't bloat the running thread's stack. This facility can be useful to develop a simplified chess game where the white treatment calls the black treatment and vice-versa. Suppose that some information must be stored by each treatment. In C#1, you would certainly design a class to store this information. In C#2 you can harness the power of continuations and coroutines to code this paradigm:

This programs outputs endlessly:
white move, whiteTreatmentInfo=0
black move
white move, whiteTreatmentInfo=1
black move
white move, whiteTreatmentInfo=2
...
In this example, the role of the static fields black and white is essential. Each time the White() method is called a new continuation is created. Consequently the White() method must be called only one time. The field white is used to reference the unique continuation created by the method White() . The same remark also works with the field black and the method Black() .
In C#, a goto instruction and the label it references must lie in the same method body. This example shows that you can consider coroutines/continuations as a mean to implement ‘goto’ between methods, without any risk to bloat the thread stack.
Iterators are perfect to implement the pipeline pattern, the one that you have used so many times in your shell windows. For instance:

This program outputs:
-4
-3
-2
6
12
18
Values -4,-3 and -2 are readily understandable. Here is a schema that sheds light on values 6, 12 and 18:

…you get the following output:
Yield:-4
-4
Yield:-3
-3
Yield:-2
-2
Yield:1
Yield:2
Yield:3
6
Yield:4
Yield:5
Yield:6
12
Yield:7
Yield:8
Yield:9
18
Yield:10
This experience exposes the fact that items yield by an iterator are never stored somehow. As soon as an integer is yield by PipelineIntegerRange , it is consumed by chained pipelines.
In the previous article concerning anonymous methods, we have seen that anonymous
methods are a good means to implement the C++ notion of functors.
Functors mainly allow a seamless syntax to manipulate elements of a collection.
Thanks to iterators, you can now use this syntax both for your pipelines and
for your own collections. For instance:

This program outputs:
Hello:6
Hello:12
Hello:18
The pipeline pattern can be seen as a way to implement the producer/consumer
paradigm. We generally use this paradigm in a multithreaded environment. This
is the second time that threads are quoted in this article. Indeed, we saw that
information bundled in a continuation is the same as those contained in a thread
stack frame.
The point is that continuations can help to implement some concurrency pattern. In the following example, a thread producer is computing Fibonacci numbers while the main thread is a consumer that prints yielded numbers on the console:

Notice that the state of the producer thread (i.e values of i1 and i2 ) is permanently stored on its own stack.
Here is the same problematic implemented with a single thread and an iterator. This time, values of i1 and i2 are permanently stored in the continuation that is created when calling the Fibo() method.

Clearly, the limitation that we underline in the present section won’t hamper you in most of your algorithms. I found it while I was trying to do something like a ‘recursive iterator’. The idea was to use an iterator that calls itself to implement the sieve of Eratosthène. This sieve is an algorithm that allows computing prime numbers. It is explained by the following schema:

The first number of each step is a prime. We remove this first number and its multiples to reach the next step. Each step yields a prime number.
The idea is to use as many pipelines as prime number we want. The first element of the pipeline will yield integers between 2 and N. In order to asses the number of prime numbers between 1 and N, we use the following theorem that was conjectured by Gauss and that was proved by de la Vallée Poussin. This theorem says that if P(N) is the number of prime numbers smaller than N then we have the approximation:

Here is the program:

I left as an exercise to modify this program to have an iterator at the end of a chain of iterators that yields all prime numbers…
The limitation of C#2 iterators that is pinpointed here is the impossibility for an iterator object to reference and call itself. We can’t use the keyword this in the body of the method that contains a yield instruction. In the case of an instance method, the keyword this references the current object. In the case of a static method, the compiler emits an error. It’s like if the developer is not supposed to know what happens behind the scene. We could use the reflection but this solution wouldn’t be satisfying.
In this series of two articles, we walked through basics of anonymous methods
and iterators. We discovered that these features are more complex but more useful
than expected at first glance. In both cases, a class is built and instantiated
under the hood. It is pure syntactic sugar since these classes could have been
coded with C#1. However, we found out that it is elegant syntactic sugar that
may allow developers to code a complex algorithm with a couple of lines. We
also covered that it is appealing to distract these feature to ‘magically’
implement some algorithms such as the pipeline pattern. Nevertheless, you should
use such code carefully because it might entangle the logic of the application
and stump developers in charge of maintenance. Moreover, you should avoid overusing
iterators when dealing with performance critical issues since some heap based
objects are created behind your back.
References:
Design Patterns by GoF (Gang of Four) Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides
The C# programming language by Anders Hejlsberg, Scott Wiltamuth, Peter Golde
Generic algorithms for C# 2.0 using iterators and anonymous methods by Jonhathan de Halleux
Iterators, Not Just for Iteration Wesner Moise’s blog
Implementation of Iterators in C# 2.0 (Part 5) Roshan James’ blog
Create Elegant Code with Anonymous Methods, Iterators, and Partial Classes by Juval Lowy
Fun with Iterators and state machines Matt Pietrek’s blog
Delegate based APIs Krzysztof Cwalina’s blog
More Fun With Generics: Action, Converter, Comparison Scott Allen’s blog
More iterator fun with the producer/consumer pattern Joe Duffy’s blog
External Iterators c2 Wiki
CoRoutines c2 Wiki
Continuation Explanation c2 Wiki
Continuations And Coroutines c2 Wiki
Call With Current Continuation c2 Wiki
Authors
 |
Patrick Smacchia is a .NET MVP involved in software development for over 15 years. After graduating in mathematics and computer science from the ENSEEIHT school, he has worked on software in a variety of fields including stock exchange at Société Générale, airline ticket reservation system at Amadeus as well as a satellite base station at Alcatel. He's currently a software consultant and trainer on .NET technologies as well as the author of the freeware NDepend which provides numerous metrics and caveats on any compiled .NET application.
|
|