DocumentDB Async Querying & Streaming

UPDATE (31-08-2017): This article is superseded by the new article Cosmos DB Async Querying & Streaming.

Working with the .NET client SDK of Azure DocumentDB, I couldn’t find a way to query the store asynchronously.

***This post relates to the version 0.9.1-preview of Microsoft Azure DocumentDB Client Library. If you work with another major version, this might not be relevant.***

That seemed odd, since the rest of the SDK is asynchronous. When it came to querying, though, you could only form a LINQ query, and as soon as you iterated over it or called ToArray or ToList, your thread would block synchronously.

I was only half surprised, since asynchrony isn’t built into LINQ and usually has to be bolted on, more or less elegantly. I searched the web without finding a solution, so I ended up working one out myself. Most of you have probably done the same, but for those who haven’t yet, here it is.

Why Async?

Before I dive into the solution, I want to explain why you would want asynchronous querying in the first place. I keep finding signs on the web that people do not understand what asynchrony is for in .NET, so I think it is always worth discussing.

Let’s try the reverse psychology approach. Here is what asynchrony doesn’t bring you:

Asynchrony allows you to… SCALE your server code. It lets you multiplex your server, serving more concurrent requests with the same resources. If you do not have scaling issues, you might not need asynchrony.

Why does it let you scale? When you await an I/O call (e.g. a remote call to DocumentDB), the current thread is released and can serve other requests until the call completes. You therefore serve more requests with fewer threads and less memory.

The solution

A LINQ query to DocumentDB would look something like this:

```csharp
var query = from doc in _client.CreateDocumentQuery<MyDoc>(documentsLink)
            where doc.MyProperty == "My Criteria"
            select doc;

// ToArray enumerates the query synchronously, blocking the calling thread
var documents = query.ToArray();
```

Here, _client is an instance of DocumentClient. If you can’t find the CreateDocumentQuery method on that object, that is normal; read this post to understand why.

As mentioned previously, the ToArray call blocks synchronously. So how do we make this asynchronous? The full solution fits in this helper method:

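```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Linq; // AsDocumentQuery() extension method

private static async Task<IEnumerable<T>> QueryAsync<T>(IQueryable<T> query)
{
    // Wrap the LINQ query to get access to its asynchronous paging methods
    var docQuery = query.AsDocumentQuery();
    var batches = new List<IEnumerable<T>>();

    do
    {
        // One round-trip to DocumentDB, returning one batch (page) of results
        var batch = await docQuery.ExecuteNextAsync<T>();
        batches.Add(batch);
    } while (docQuery.HasMoreResults);

    // Flatten the batches into a single sequence
    return batches.SelectMany(b => b);
}
```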

You can pass the query variable from the previous snippet to this method, since it is an IQueryable<MyDoc>, and replace the blocking ToArray call with await QueryAsync(query).

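The key is the AsDocumentQuery method. It returns an IDocumentQuery<T>, which exposes asynchronous paging: ExecuteNextAsync fetches the next batch of results from the service, and HasMoreResults tells you whether another batch remains.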
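To see that paging contract in isolation, here is a minimal sketch that runs without a DocumentDB account. IPagedQuery<T>, InMemoryPagedQuery<T>, Paging.DrainAsync and RoundTrips are hypothetical names invented for illustration; they only mimic the HasMoreResults / ExecuteNextAsync shape of IDocumentQuery<T> (in the SDK, ExecuteNextAsync<T>() returns a FeedResponse<T> rather than a plain IEnumerable<T>).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for IDocumentQuery<T> (NOT the SDK type): it only mirrors
// the HasMoreResults / ExecuteNextAsync paging shape used by QueryAsync.
public interface IPagedQuery<T>
{
    bool HasMoreResults { get; }
    Task<IEnumerable<T>> ExecuteNextAsync();
}

// In-memory fake: each ExecuteNextAsync call returns the next page and counts a "round-trip".
public class InMemoryPagedQuery<T> : IPagedQuery<T>
{
    private readonly Queue<T[]> _pages;
    public int RoundTrips { get; private set; }

    public InMemoryPagedQuery(IEnumerable<T[]> pages) { _pages = new Queue<T[]>(pages); }

    public bool HasMoreResults => _pages.Count > 0;

    public async Task<IEnumerable<T>> ExecuteNextAsync()
    {
        await Task.Yield(); // simulate the asynchronous service call
        RoundTrips++;
        return _pages.Dequeue();
    }
}

public static class Paging
{
    // Same draining idea as QueryAsync: one call per page, then flatten in page order.
    public static async Task<IEnumerable<T>> DrainAsync<T>(IPagedQuery<T> query)
    {
        var batches = new List<IEnumerable<T>>();
        while (query.HasMoreResults)
        {
            batches.Add(await query.ExecuteNextAsync());
        }
        return batches.SelectMany(b => b);
    }
}

public static class Program
{
    public static async Task Main()
    {
        var query = new InMemoryPagedQuery<string>(new[]
        {
            new[] { "doc1", "doc2" },
            new[] { "doc3" },
        });

        var docs = (await Paging.DrainAsync(query)).ToList();
        Console.WriteLine($"{docs.Count} docs in {query.RoundTrips} round-trips");
    }
}
```

Running it prints "3 docs in 2 round-trips": each awaited call fetches one page, and the flattened result keeps the documents in page order.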

The beauty of this helper method is that it works not only for querying documents (CreateDocumentQuery) but also for querying document collections (CreateDocumentCollectionQuery) and databases (CreateDatabaseQuery).


As a bonus, the generic helper method could easily be modified to stream your results. This is useful if your query returns many documents that you do not want to keep in memory at the same time. Basically, you would only keep one batch of documents (one service call to DocumentDB) in memory at a time.
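One way to write that streaming variant is to hand each batch to a callback instead of accumulating it. The sketch below assumes a hypothetical IPagedQuery<T> stand-in (plus an in-memory fake) so it runs offline; against DocumentDB you would loop on the IDocumentQuery<T> returned by AsDocumentQuery and call ExecuteNextAsync<T>() exactly as the helper above does. StreamAsync and onBatch are illustrative names, not SDK APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for IDocumentQuery<T> (NOT the SDK type), same paging shape.
public interface IPagedQuery<T>
{
    bool HasMoreResults { get; }
    Task<IEnumerable<T>> ExecuteNextAsync();
}

// In-memory fake that serves pre-built pages, one per call.
public class InMemoryPagedQuery<T> : IPagedQuery<T>
{
    private readonly Queue<T[]> _pages;

    public InMemoryPagedQuery(IEnumerable<T[]> pages) { _pages = new Queue<T[]>(pages); }

    public bool HasMoreResults => _pages.Count > 0;

    public async Task<IEnumerable<T>> ExecuteNextAsync()
    {
        await Task.Yield(); // simulate an asynchronous service call
        return _pages.Dequeue();
    }
}

public static class QueryStreaming
{
    // Streams results batch by batch: each batch is passed to onBatch and then
    // dropped, so this method only references one batch at any time.
    public static async Task StreamAsync<T>(IPagedQuery<T> query, Func<IEnumerable<T>, Task> onBatch)
    {
        while (query.HasMoreResults)
        {
            var batch = await query.ExecuteNextAsync();
            await onBatch(batch);
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        var query = new InMemoryPagedQuery<string>(new[]
        {
            new[] { "doc1", "doc2" },
            new[] { "doc3" },
        });

        await QueryStreaming.StreamAsync(query, batch =>
        {
            Console.WriteLine("Batch: " + string.Join(", ", batch));
            return Task.CompletedTask;
        });
    }
}
```

Memory stays bounded by the batch size as long as the callback does not hold on to the batches itself. On C# 8 and later, an IAsyncEnumerable<T> iterator would be another way to express the same idea.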


4 responses

  1. Bill Forney 2015-05-08 at 02:20

    You could rewrite this to return an IObservable and push the results out as they come in. You can then use the IObservable as an enumerable or subscribe to it and react. Just a thought.

  2. James Alexander 2015-08-30 at 15:02

    How would you change this to support streaming? I’ve tried a couple different things and can’t seem to get a handle on it.

  3. Anonymous 2015-12-22 at 16:15

    gjob mate

  4. Eric Pohl 2017-07-20 at 05:53

    Very helpful! I was having the exact same question and ran across your blog searching for an answer.
