Entity Framework 4.1: Deep Fetch vs Lazy Load (3)


This is part of a series of blog posts about Entity Framework 4.1.  The past blog entries are:

In this article, I’ll cover how to control what gets loaded in queries.

EF 4.1 is able to manage relations.  Now which relations get loaded when you do a query?  Everything “visible” to the object?  In some cases that might make sense (e.g. when you query an entity that has only one child entity), but in many cases you would end up loading a good portion of the database, or at least much more data than you would like.

By default, EF 4.1 loads only the entities in the query, but it supports two features to help you control what is loaded:

  • Deep Fetch
  • Lazy Load

—————————-

UPDATE:  Typically, deep fetch is referred to as “Eager loading”.  Sorry for inventing a term!

—————————-

I came up with the name deep fetch; I’m not sure there is an established one for that mechanism.  If there is, feel free to tell me.  Anyhow, this mechanism allows you to specify related entities you would like to be loaded along with the query:

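using (var context = new MyDomainContext())
{
  // Include("OrderDetails") asks EF to load each order's details with the order.
  var orders = from o in context.Orders.Include("OrderDetails")
               where o.CustomerName == "Mac"
               select o;

  // ... enumerate or inspect orders here, while the context is still alive
}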

Here I specify that I want to load certain orders (the ones with “Mac” as the customer name) and that I want the order details of those orders to be loaded along the way.

You can actually look at the generated SQL query:

Console.WriteLine(orders.ToString());

EF 4.1 doesn’t generate easy-to-read queries and they quickly become impossible to decipher, but this one you should be able to read, and you’ll see that the order details are loaded with it.

Now this brings up a general issue with deep fetch in EF 4.1:  query efficiency.  If you give a graduate the exercise of writing a query to retrieve the orders and order details, chances are they’re going to write something functionally equivalent to the generated query, that is, a single join.  If you’re lucky they might be smarter and write a query returning two result sets:  one for the orders and one for the details.  This would be much more efficient since you wouldn’t repeat all the order information for each order detail.

For some reason EF never supported that.  Probably because it isn’t supported across all database systems and EF would be forced to generate two SQL queries for one LINQ query, which would probably open a can of worms.  Anyway, keep this in mind as it can easily get ugly on the performance side.
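To make the repetition concrete, here is a minimal sketch using plain LINQ to Objects rather than EF.  The OrderRow and DetailRow types and the JoinedRows helper are hypothetical stand-ins I made up for the two tables; the only point is the shape of a single joined result set compared to two separate ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the Orders and OrderDetails tables.
public class OrderRow { public int OrderID; public string CustomerName; }
public class DetailRow { public int OrderID; public string Product; }

public static class JoinShapeDemo
{
    // Flattens orders and details into one result set, the way a single join does:
    // one row per detail, with the order columns repeated on every row.
    public static List<string> JoinedRows(List<OrderRow> orders, List<DetailRow> details)
    {
        return (from o in orders
                join d in details on o.OrderID equals d.OrderID
                select o.OrderID + " | " + o.CustomerName + " | " + d.Product).ToList();
    }

    public static void Main()
    {
        var orders = new List<OrderRow>
        {
            new OrderRow { OrderID = 1, CustomerName = "Mac" }
        };
        var details = new List<DetailRow>
        {
            new DetailRow { OrderID = 1, Product = "Keyboard" },
            new DetailRow { OrderID = 1, Product = "Mouse" },
            new DetailRow { OrderID = 1, Product = "Screen" }
        };

        // Single joined result set: the order data travels three times.
        foreach (var row in JoinedRows(orders, details))
            Console.WriteLine(row);

        // Two result sets: the order data travels once.
        Console.WriteLine("orders: {0} row(s), details: {1} row(s)",
                          orders.Count, details.Count);
    }
}
```

With one order and three details, the joined shape carries the order columns three times, while the two-result-set shape carries them once.  The wider the parent entity and the more children it has, the bigger that gap gets.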

Anyhow, you can request that more than one sub-collection be brought along:

var orders = from o in context.Orders.Include("OrderDetails").Include("Businesses")
             where o.CustomerName == "Mac"
             select o;

The other feature you can use to control what’s brought along is lazy loading.  By default, lazy loading is enabled.  If you want to disable it, you need to do it explicitly.  The best place would be in the constructor of your db context:

public MyDomainContext()
{
    this.Configuration.LazyLoadingEnabled = false;
}

Now lazy loading works as you would expect:  you request an entity-set, it gets loaded and if you try to access a sub-collection of an entity, it gets loaded on the fly, auto-magically!

How does EF know you’re trying to access a sub-collection?  Your collections are POCO collections (e.g. List<EntityType>), so no events are raised when you access them.  Anyone?  It generates a dynamic type deriving from your entity and overrides your sub-collection access properties.  Yes it does.  This is why you need to mark your sub-collection access properties with the keyword virtual in order for the magic to operate:

public class Order
{
    public int OrderID { get; set; }
    public string OrderTitle { get; set; }
    public string CustomerName { get; set; }
    public DateTime TransactionDate { get; set; }
    public virtual List<OrderDetail> OrderDetails { get; set; }
    public virtual List<Business> Businesses { get; set; }
}
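To show why virtual matters, here is a hand-written sketch of the kind of derived class described above.  This is not the code EF emits; OrderProxy, its loader delegate and the trimmed-down Order and OrderDetail classes are all hypothetical, and the delegate simply stands in for a database query.

```csharp
using System;
using System.Collections.Generic;

public class OrderDetail { public string Product { get; set; } }

public class Order
{
    public int OrderID { get; set; }
    // virtual is what lets a derived class intercept the getter.
    public virtual List<OrderDetail> OrderDetails { get; set; }
}

// Hypothetical hand-written equivalent of a runtime-generated proxy.
public class OrderProxy : Order
{
    private readonly Func<int, List<OrderDetail>> _loader; // stands in for a DB query
    private bool _loaded;

    public OrderProxy(Func<int, List<OrderDetail>> loader) { _loader = loader; }

    public override List<OrderDetail> OrderDetails
    {
        get
        {
            if (!_loaded)
            {
                _loaded = true;
                base.OrderDetails = _loader(OrderID); // load on first access only
            }
            return base.OrderDetails;
        }
        set
        {
            _loaded = true; // an explicit assignment counts as loaded
            base.OrderDetails = value;
        }
    }
}

public static class ProxyDemo
{
    public static void Main()
    {
        int queries = 0;
        // Calling code only ever sees the type Order.
        Order order = new OrderProxy(id =>
        {
            queries++;
            return new List<OrderDetail> { new OrderDetail { Product = "Keyboard" } };
        }) { OrderID = 1 };

        Console.WriteLine(queries);                  // nothing loaded yet
        Console.WriteLine(order.OrderDetails.Count); // first access triggers the load
        Console.WriteLine(order.OrderDetails.Count); // second access reuses the list
        Console.WriteLine(queries);
    }
}
```

If OrderDetails were not virtual, the override would not compile and the getter on Order would just return whatever is stored in the property, so nothing could trigger a load behind your back.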

Let’s give some characteristics of the two mechanisms.  For deep fetch:

  • Reduces Latency (it fetches all data in one trip to the DB server)
  • You need to know in advance what you’re going to need and be explicit about it

Lazy Loading:

  • Very forgiving:  since it just loads the data on request, you do not need to plan in advance
  • Could kill performance because of latency (think of a loop on parent entities and lazy load of the children of one parent in the body of the loop)
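The latency trade-off in the two lists above can be sketched by counting round trips.  FakeDb below is a hypothetical stand-in that only increments a counter instead of talking to a database, so the numbers describe the access pattern, not real EF behaviour or timings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical fake data source that counts round trips instead of talking to SQL.
public class FakeDb
{
    public int RoundTrips { get; private set; }

    public List<int> LoadOrderIds(int count)
    {
        RoundTrips++;
        return Enumerable.Range(1, count).ToList();
    }

    public List<string> LoadDetailsFor(int orderId)
    {
        RoundTrips++;
        return new List<string> { "detail of order " + orderId };
    }

    public Dictionary<int, List<string>> LoadOrdersWithDetails(int count)
    {
        RoundTrips++;
        return Enumerable.Range(1, count)
            .ToDictionary(id => id, id => new List<string> { "detail of order " + id });
    }
}

public static class RoundTripDemo
{
    // Lazy-load pattern: one trip for the parents, then one more per parent in the loop.
    public static int LazyStyle(int orderCount)
    {
        var db = new FakeDb();
        foreach (var id in db.LoadOrderIds(orderCount))
            db.LoadDetailsFor(id);
        return db.RoundTrips;
    }

    // Deep-fetch pattern: parents and children come back in a single trip.
    public static int EagerStyle(int orderCount)
    {
        var db = new FakeDb();
        db.LoadOrdersWithDetails(orderCount);
        return db.RoundTrips;
    }

    public static void Main()
    {
        Console.WriteLine(LazyStyle(100));  // 101
        Console.WriteLine(EagerStyle(100)); // 1
    }
}
```

For 100 parents, the loop with a lazy load in its body makes 101 trips where the deep fetch makes one.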

Now, when should we use which mechanism?  I’ll give you my guidelines here; feel free to bring up other ones.  I would use lazy loading except when you have loops with a lazy load in the body of the loop.  Lazy loading might create 2-3 server queries instead of one, but that is still acceptable, especially given the shortcomings of the deep fetch query mechanism.


12 thoughts on “Entity Framework 4.1: Deep Fetch vs Lazy Load (3)”

  1. Yah… nHibernate has the same problem with its eager loading. They do a join to fetch all the data instead of N queries, where N is the number of interconnected entities in your query graph. That pretty much sucks. At the time, I wrote an ORM that would do the exact opposite: build N queries, propagate the WHERE clause into each query, execute them in parent-to-child order, and do a single pass on each result set. For each result set, instantiate one object per row, and put it in a hashtable so that it can be retrieved by foreign key by its children. When you do the same pass on one of the child result sets, retrieve the parent object from the hashtable, add the child to the parent’s collection, and optionally assign the parent to the child. Put the mapping in an XML file, generate IL code at run-time, and boom, you’ve got the best ORM, one that does CRUD on any graph in O(n)!!! :-) Well, enough self-contemplation… In all cases, EF and nHibernate just don’t do it that way, and that’s about it. Deal with it ;-P

  2. My guideline is to let EF build the queries when doing simple data access where you need flexibility, but to write stored procedures for complex data fetching operations. Make sure that they return a model type or an elaborate view type that you will map to, so you can benefit from EF mapping.

    That gives you (almost) the best of both worlds: flexibility and speed on the bulk of the code, and performance and control on the specific data bottlenecks of your application.
