Geeks With Blogs
.NET Nomad What I've learned along the way

In part zero I stated my intentions; now it is time to act.

If you've ever programmed in C (no, I didn't forget the #) you may have had a function prototype lying around similar to:

int Deposit(struct account *acct, double amnt);

If you were to rewrite this today in C#, you'd probably have a class to represent accounts, and your method definition would just be:

public void Deposit(double amount)

You would drop the int return value because you can throw an exception if there is an issue, and you no longer need to pass in a pointer (i.e. reference) to the account because the account object holds all the state for us.  In reality, what is the difference here? Well, it more or less comes down to syntax.  To execute this code in C I would have to say:

Deposit(&acct, 350.75);

In C# I'd write:

acct.Deposit(350.75);

The great thing about the C# version in terms of syntax is that there are no funky operators to deal with, and it reads more like English, left to right.  In terms of flexibility, however, I have to give the edge to the C version.  Why? Well, because if I need to add a new operation on the account data type in C, I can do it anywhere that I want.  All I need access to is the declaration of the account struct and I can introduce the following:

int Withdraw(struct account *acct, double amnt);

In C#, I simply can't add a "third party" method to a class that I don't have the source code to.  Sure, partial classes allow me to add methods, but every part must be in the same namespace and assembly in order to work, since partial classes are a compile-time construct.  I could also just take the C approach and say something like:

static void Withdraw(ref Account acct, double amnt)

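That, however, doesn't clean up the syntax issue from the caller's perspective.  (Strictly speaking, ref is only needed if Account is a struct, since a class is already a reference type, but dropping it doesn't make the call any prettier.)  The caller would now have to pass in the Account object like so: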

Helper.Withdraw(ref acct, 350.75);

Crap! It is actually more to type than the C version.

Enter extension methods.  An extension method is a static method, declared in a static class, whose first parameter is decorated with the overloaded 'this' keyword.  For example:

public static void Withdraw(this Account acct, double amnt)

Alright, still not much shorter for me as the extension method's author, but how does it look when a piece of client code calls it?

acct.Withdraw(350.75);

As you can see, the call syntax is exactly the same as with the C# version of Deposit.  Under the hood, all that is going on is that the compiler sees our use of 'this' in front of the first parameter and says, "OK, I know now that I can allow this to be called on Account objects".  It then compiles acct.Withdraw(350.75) into an ordinary call to the static method, passing acct as the first argument.  Further, if Account were a base class of another type, e.g. BusinessBankAccount, then the extension method would also work on BusinessBankAccount objects.  Do I need to mention that the same holds true for extensions defined to operate on interfaces?  Heck, you can even make an extension method that makes use of generics!
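To make the mechanics concrete, here is a minimal, self-contained sketch. The bodies of Account and BusinessBankAccount, the Balance property, and the AccountExtensions class name are assumptions made up for illustration; the post itself only shows the method signatures. It shows the extension call next to the equivalent plain static call, and the same extension being picked up by a derived type.

```csharp
using System;

// Hypothetical account type; only the method signatures come from the post.
public class Account
{
    public double Balance { get; set; }

    public void Deposit(double amount)
    {
        Balance += amount;
    }
}

// Hypothetical derived type, used to show that the extension applies to it too.
public class BusinessBankAccount : Account { }

// Extension methods must be declared in a static class (name assumed here).
public static class AccountExtensions
{
    public static void Withdraw(this Account acct, double amnt)
    {
        acct.Balance -= amnt;
    }
}

public static class Program
{
    public static void Main()
    {
        var acct = new Account();
        acct.Deposit(500.00);
        acct.Withdraw(100.00);                     // extension syntax
        AccountExtensions.Withdraw(acct, 100.00);  // same method, plain static call

        var business = new BusinessBankAccount();
        business.Deposit(1000.00);
        business.Withdraw(250.00);                 // works on the derived type as well

        Console.WriteLine(acct.Balance);      // prints 300
        Console.WriteLine(business.Balance);  // prints 750
    }
}
```

Both Withdraw calls on acct end up running the same static method; the extension form is only a different way of writing the call.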

Another cool aspect of extension methods is that they are fully supported by IntelliSense in Visual Studio 2008 (and even in Visual Studio 2005 if you have the LINQ preview CTP installed).

There are a couple of restrictions that should be obvious, but I'll list them here anyway.  First, since the extension method is still technically a member of a different class, it will only have access to the public members exposed by the class being extended.  In other words, our Withdraw extension method cannot access the private and protected members of the Account class.  This restriction places a definite limitation on what can be achieved through extension methods, but is necessary to avoid violating encapsulation.  Second, a consumer of the extension method needs to reference the assembly in which the extension lives, and import its namespace with a using directive, or the compiler won't see it.
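Both restrictions can be seen in a small sketch. Everything named here (the Banking and Banking.Extensions namespaces, the private balance field, and the CanWithdraw method) is hypothetical and chosen only to illustrate the point; it is not code from the post.

```csharp
using System;
using Banking.Extensions; // without this directive, acct.CanWithdraw(...) does not compile

namespace Banking
{
    public class Account
    {
        private double balance; // not reachable from an extension method

        public double Balance { get { return balance; } }

        public void Deposit(double amount) { balance += amount; }
    }
}

namespace Banking.Extensions
{
    public static class AccountExtensions
    {
        // Hypothetical extension that needs only the public surface of Account.
        public static bool CanWithdraw(this Account acct, double amnt)
        {
            // return acct.balance >= amnt;  // would not compile: 'balance' is private
            return acct.Balance >= amnt;
        }
    }
}

namespace Client
{
    public static class Program
    {
        public static void Main()
        {
            var acct = new Banking.Account();
            acct.Deposit(350.75);

            Console.WriteLine(acct.CanWithdraw(100.00)); // prints True
            Console.WriteLine(acct.CanWithdraw(500.00)); // prints False
        }
    }
}
```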

LINQ Tie In

Now that we know what extension methods are, let's take a look at how they are utilized by LINQ.  LINQ is designed to extend query capability to .NET types using extension methods.  The standard query operators of LINQ operate on any type that implements IEnumerable<T>.  Other technologies in the LINQ family provide their own sets of extension methods; for example, LINQ to DataSet adds extensions such as AsEnumerable for DataTable, so that the same query operators can be used on the data in a DataSet.

If we focus for now on vanilla LINQ, i.e. LINQ to Objects, the one that operates on in-memory collections, we can think of at least three ways it could have been constructed.

  1. methods that accept IEnumerable<T> parameters using existing, i.e. pre 3.0, syntax
  2. add methods to the IEnumerable<T> interface for things like Select, Join, etc
  3. provide extension methods for the IEnumerable<T> interface

Item one would work, but requires the clunky syntax we saw before.  The syntax could be hidden behind new C# keywords, but that would have required more work at the compiler level and would have made any type of dynamic LINQ an undue burden on the developer.

Item two is obviously out the window.  Adding members to a core interface like IEnumerable<T> would break every existing class that implements it, forcing so many rewrites, not only in the framework but in third-party code as well, that Microsoft would have had a developer mutiny on their hands.

Item three is what they ultimately went with, and now that we know how extension methods actually work, we can see that it is basically the same as item one.  The advantage is the cleaner syntax offered to developers.
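The equivalence of items one and three can be checked against the real LINQ operators. The sketch below calls Enumerable.Where from System.Linq both ways; the numbers list and the isEven delegate are made up for the example, and an anonymous delegate is used so that no lambda syntax is needed yet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        Func<int, bool> isEven = delegate(int n) { return n % 2 == 0; };

        // Item one: an ordinary static method call.
        IEnumerable<int> viaStatic = Enumerable.Where(numbers, isEven);

        // Item three: the very same method, called with extension syntax.
        IEnumerable<int> viaExtension = numbers.Where(isEven);

        Console.WriteLine(string.Join(", ", viaStatic));    // prints 2, 4, 6
        Console.WriteLine(string.Join(", ", viaExtension)); // prints 2, 4, 6
    }
}
```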

Naive, LINQ-like Extension

There are probably a million blogs out there that have the information you've already seen here so far.  So, I am not going to go over the actual LINQ syntax right now.  Instead I am going to show the method by which LINQ is constructed in a very limited, naive case. 

Let's say LINQ doesn't exist and your team is on a project where you are constantly searching through collections of objects using lots of basic criteria.  We could introduce an extension to the IEnumerable<T> type that allows us to specify our criteria and get back a new collection of items that match it.  Our method might look something like:

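using System;
using System.Collections.Generic;

public static class SelectExtension
{
    // Returns a new collection holding the items of source that pass the filter.
    public static IEnumerable<TResult> SelectWhere<TResult>(
        this IEnumerable<TResult> source,
        Func<TResult, bool> filter)
    {
        var results = new List<TResult>();

        foreach (var s in source)
        {
            if (filter(s))
                results.Add(s);
        }

        return results;
    }
}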

The above code defines an extension method named SelectWhere that accepts a generic type parameter called TResult.  TResult is used to flesh out the method's parameters as well as its return type.

The return type and first parameter (the one decorated with 'this') are obvious to us at this point.  The second parameter, Func<TResult, bool>, is simply a generic delegate type.  For a method to match the delegate's signature it must accept a single TResult parameter and return a bool.
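As a quick illustration of what "matching the delegate's signature" means, here is a minimal sketch using Func<int, bool>. The IsPositive method is a made-up example and not part of the post's code.

```csharp
using System;

public static class Program
{
    // Matches Func<int, bool>: takes a single int and returns a bool.
    public static bool IsPositive(int value)
    {
        return value > 0;
    }

    public static void Main()
    {
        // The method name alone is converted to the delegate type.
        Func<int, bool> filter = IsPositive;

        Console.WriteLine(filter(5));  // prints True
        Console.WriteLine(filter(-3)); // prints False
    }
}
```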

If we examine the implementation it is pretty straightforward.  We simply iterate over the entire collection and call the delegate for each item to determine if we need to save it.

Now, how can we call this from client code?  There are a few ways, each of which is legal so let's start with the most explicit and work our way towards the sugar.

First we need a method that matches the delegate to act as our filter criteria:

public static bool PersonIsRich(Account acct)
{
     return acct.Balance >= 10000D;
}

Your definition of a rich person may be different than mine, but I am sure you get the idea.  We return true if the account has 10,000 or more.

Next we call the method on a collection of Accounts:

var richPeople = SelectExtension.SelectWhere<Account>(allPeople, PersonIsRich);

As we can see, the above is our most verbose usage and takes us back to the C function from before.  We need to pass in both the collection we are operating on and our filter.

var richPeople = allPeople.SelectWhere<Account>(PersonIsRich);

In this call we've used the extension method syntax and are just specifying the generic argument and the filter.

var richPeople = allPeople.SelectWhere(PersonIsRich);

At this point, we are beyond extension methods.  What just happened above is that the C# compiler is smart enough to figure out what our generic argument has to be in order to satisfy the extension method and doesn't bother making us put it there ourselves.  Logically speaking, if I am calling SelectWhere on an IEnumerable<Account>, then there is only one type that TResult can be, i.e. Account.  This is called 'type inference' and is the same reason that I can use the 'var' keyword instead of spelling out the type of richPeople explicitly.  The variable is still statically typed; the compiler simply determines the type for me at compile time.
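Putting the pieces together, the three call styles can be run side by side. This is a sketch under assumptions: the Account class with its Owner and Balance properties and the sample data are invented for the example, while SelectWhere and PersonIsRich follow the code shown earlier in the post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical account type; the post only relies on a Balance member.
public class Account
{
    public string Owner { get; set; }
    public double Balance { get; set; }
}

public static class SelectExtension
{
    public static IEnumerable<TResult> SelectWhere<TResult>(
        this IEnumerable<TResult> source,
        Func<TResult, bool> filter)
    {
        var results = new List<TResult>();
        foreach (var s in source)
        {
            if (filter(s))
                results.Add(s);
        }
        return results;
    }
}

public static class Program
{
    public static bool PersonIsRich(Account acct)
    {
        return acct.Balance >= 10000D;
    }

    public static void Main()
    {
        // Made-up sample data.
        var allPeople = new List<Account>
        {
            new Account { Owner = "Ann", Balance = 25000D },
            new Account { Owner = "Bob", Balance = 150D },
            new Account { Owner = "Cy", Balance = 10000D }
        };

        // Most explicit: plain static call with the generic argument spelled out.
        var explicitCall = SelectExtension.SelectWhere<Account>(allPeople, PersonIsRich);

        // Extension syntax, generic argument still spelled out.
        var extensionCall = allPeople.SelectWhere<Account>(PersonIsRich);

        // Extension syntax with the generic argument inferred by the compiler.
        var inferredCall = allPeople.SelectWhere(PersonIsRich);

        foreach (var acct in inferredCall)
            Console.WriteLine(acct.Owner); // prints Ann, then Cy
    }
}
```

All three variables end up holding the same two accounts; only the amount of typing at the call site differs.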

Summary

We have now seen what an extension method is and how it relates to LINQ.  We also now know that they are not magic and that we could get the same behavior from a normal, static method.  The new syntax helps add clarity though, and in development, clarity is a good thing.

As a teaser for the next part, take a look at this:

var richPeople = allPeople.SelectWhere(acct => acct.Balance >= 10000D);

This is the final magic.  Where did our delegate function go? What is that crazy syntax?  Well, that is a lambda expression, and lambda expressions will be the topic of my next post.

Download Solution - LinqOverview.zip

Posted on Monday, November 12, 2007 9:08 AM | LINQ


Comments on this post: LINQ Overview, part one (Extension Methods)

# re: LINQ Overview, part one (Extension Methods)
I'd previously read part 3 (Lamda Expression) and thought it very helpfull. This is of the same ilk - the divide and conquer of complexity. Thank you.
Left by debo on Dec 11, 2009 11:33 AM

# re: LINQ Overview, part one (Extension Methods)
Very good info.The way u have illustrated here is fantastic.Teaching should be like this.what touching me is the picturize of comparing the basic C and the elegants of C#
Thanks a lot
Left by Joseph Jelaskar on Sep 20, 2011 4:45 AM


Copyright © newman | Powered by: GeeksWithBlogs.net