Introduction to TPL Dataflow

In October 2010, Microsoft released a white paper on an upcoming library for the .NET Framework, the Task Parallel Library Dataflow (TPL Dataflow).

The dataflow library is built on top of the Task Parallel Library (TPL) included in .NET 4.0.  Basically, the TPL provides the building blocks for task-oriented frameworks.  The dataflow library also integrates with the new language support for tasks in C# (e.g. async).  Last but not least, the library inherits the work done in the Concurrency and Coordination Runtime (CCR), from Microsoft Robotics.

The library attempts to model flows of data.  The building blocks are source and target components.  Basically, you pump data into a source and it pushes that data asynchronously to its network of targets.
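To make the source and target idea more concrete, here is a minimal sketch of a three-block network.  It assumes the block types as they later shipped in the System.Threading.Tasks.Dataflow package (BufferBlock, TransformBlock, ActionBlock); the CTP discussed in this post may expose different names or signatures.  DataflowSketch, RunAsync, square and sink are names I made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // assumption: the shipped NuGet package, not the CTP

public static class DataflowSketch
{
    // Hypothetical helper: builds source -> transform -> target,
    // pumps the inputs through and returns what the target received.
    public static async Task<List<int>> RunAsync(IEnumerable<int> inputs)
    {
        var received = new List<int>();

        // Source: buffers whatever is posted to it.
        var source = new BufferBlock<int>();

        // Both a target and a source: squares each message it receives.
        var square = new TransformBlock<int, int>(n => n * n);

        // Final target: processes one message at a time by default,
        // so adding to the list here needs no lock.
        var sink = new ActionBlock<int>(n => received.Add(n));

        // Link the blocks and let completion flow down the chain.
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        source.LinkTo(square, linkOptions);
        square.LinkTo(sink, linkOptions);

        // Pump data into the source; the network pushes it along asynchronously.
        foreach (var n in inputs)
        {
            source.Post(n);
        }

        // Signal that no more data is coming, then wait for the last block to drain.
        source.Complete();
        await sink.Completion;

        return received;
    }

    public static async Task Main()
    {
        var result = await RunAsync(new[] { 1, 2, 3 });
        Console.WriteLine(string.Join(", ", result)); // prints "1, 4, 9"
    }
}
```

Notice that the caller never touches a thread or a lock: it only posts messages and awaits completion, and the blocks share nothing but the messages that flow between them.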

    In many ways, it promotes an actor or agent-oriented design.  This design style facilitates parallelism because state flows from agent to agent asynchronously, as messages, and no state is shared.
    For now, only a Community Technology Preview (CTP) is available.  From the looks of it, it seems extremely code-intensive.  I believe it will require a designer to make it useful.  It’s a bit like Workflow Foundation without the designer right now:  a lot of good ideas, but the amount of work needed to use it is way too high to make it worthwhile.
    The foundation is great though.  The parallel team at Microsoft obviously has a clear vision of where they want to go in tomorrow’s multi-core world.  They are laying the building blocks for a new way to develop applications, one that makes it easy to leverage the multiple cores of modern devices.
    Today, parallelism is hard.  Sure, you can throw in a couple of calls to TaskFactory or your beloved thread pool.  But how do you handle I/O parallelism?  For instance, you’re making a call to your DB server on an ASP.NET thread.  Do you use the async page model?  Have you tried doing that?  Not trivial, eh?  Well, unless you do that (and 99.99% of ASP.NET applications don’t), you’re blocking an ASP.NET thread where you could simply wait on an I/O completion port.  Just for that type of scenario, it’s clear we need new tools to develop parallel code.
    The Task Parallel Library goes a long way toward making it easy to spread work across threads, but it doesn’t go the whole way to alleviate the pain of synchronizing that work.  This is where the CCR made a lot of headway:  it didn’t focus on threads, or even on work; it focused on the coordination of work.  This is what TPL Dataflow inherited.
