DocumentDB Async Querying & Streaming
UPDATE (31-08-2017): This article is superseded by the new article Cosmos DB Async Querying & Streaming.
Working with the .NET client SDK of Azure DocumentDB, I couldn’t find a way to query the store asynchronously.
***This post relates to version 0.9.1-preview of the Microsoft Azure DocumentDB Client Library. If you work with another major version, this might not be relevant.***
That seemed odd since the rest of the SDK is asynchronous, but when it came to querying, you could only form a LINQ query, and once you either iterated on it or called ToArray or ToList, your process would block in a synchronous manner.
I was only half surprised, since asynchrony isn't built into LINQ and must usually be bolted on more or less elegantly. I looked around on the web and couldn't find a solution, so I ended up finding it by myself. Most of you probably did too, but for those who haven't yet, here is the solution.
Why Async?
Before I dive into the solution, I want to explain why you would want asynchrony in querying. I keep finding bits on the web indicating that people do not understand what asynchrony is for in .NET, so I always think it's worthwhile to discuss it.
Let’s try the reverse psychology approach. Here is what asynchrony doesn’t bring you:
- It doesn't make your client (e.g. browser) asynchronous; for instance, if you implement it in a service call, it doesn't make the caller asynchronous (e.g. Ajax)
- It doesn't bring you performance per se
- It doesn't make your code run on multiple threads at once
Asynchrony allows you to… SCALE your server code. It allows you to multiplex your server, to serve more concurrent requests. If you do not have scaling issues, you might not need asynchrony.
Why does it allow you to scale? When you await an I/O call (e.g. a DocumentDB remote call), the current thread is freed to be used by another request until the call comes back, allowing you to serve more requests with fewer threads and less memory.
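To make that point concrete, here is a minimal sketch. It assumes Task.Delay is a fair stand-in for a remote call (no DocumentDB call is made), and the names AsyncScalingSketch, SimulatedRemoteCallAsync and RunManyAsync are hypothetical, chosen for illustration only.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncScalingSketch
{
    // Hypothetical stand-in for an I/O-bound call such as a DocumentDB request.
    // Task.Delay does not hold a thread while it waits, much like awaited network I/O.
    public static async Task<int> SimulatedRemoteCallAsync(int id)
    {
        await Task.Delay(200);
        return id;
    }

    // Starts `count` simulated calls at once and awaits them all.
    // No thread is blocked per pending call, so a few threads can carry many calls.
    public static async Task<int[]> RunManyAsync(int count)
    {
        var pending = Enumerable.Range(0, count)
                                .Select(id => SimulatedRemoteCallAsync(id));

        // Task.WhenAll returns the results in the same order as the input tasks.
        return await Task.WhenAll(pending);
    }
}
```

Had each call blocked instead (for instance with Thread.Sleep), every pending call would have pinned a thread for its whole duration, which is exactly the cost asynchrony avoids on a busy server.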
The solution
A LINQ query to DocumentDB would look something like this:
var query = from doc in _client.CreateDocumentQuery<MyDoc>(documentsLink)
            where doc.MyProperty == "My Criteria"
            select doc;
var documents = query.ToArray();
Here _client is an instance of DocumentClient. If you can't find the method CreateDocumentQuery on that object, that is normal. Read this post to understand why.
As previously mentioned, the ToArray method call will block synchronously. So how do we modify this to be asynchronous? The full solution is embodied in this helper method:
private static async Task<IEnumerable<T>> QueryAsync<T>(IQueryable<T> query)
{
    // Expose the asynchronous paging API of the LINQ query
    var docQuery = query.AsDocumentQuery();
    var batches = new List<IEnumerable<T>>();

    do
    {
        // Each call fetches one batch without blocking the thread
        var batch = await docQuery.ExecuteNextAsync<T>();

        batches.Add(batch);
    } while (docQuery.HasMoreResults);

    // Flatten the batches into a single sequence
    var docs = batches.SelectMany(b => b);

    return docs;
}
You can pass the query variable from the previous code snippet to this method since it is an IQueryable<MyDoc>.
The key is in the AsDocumentQuery method. This returns an instance of IDocumentQuery<T> which has asynchronous methods on it.
The beauty of this helper method is that it works for querying documents (CreateDocumentQuery) but also for querying document collections (CreateDocumentCollectionQuery) and databases (CreateDatabaseQuery).
Streaming
As a bonus, the generic helper method could easily be modified to allow you to stream your results. This could be useful if your query returns a lot of documents that you do not want to keep in memory at the same time. Basically, you would only keep the documents of one batch (one service call to DocumentDB) in memory at a time.
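One possible shape for that modification is sketched below. It is not the only way to do it: the class name DocumentQueryStreaming, the method name StreamAsync and the onBatch callback are hypothetical names introduced for illustration. The sketch assumes the same AsDocumentQuery, ExecuteNextAsync and HasMoreResults members used in the helper above, and that the extension method lives in the Microsoft.Azure.Documents.Linq namespace, which may differ in your SDK version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Linq; // assumed namespace of AsDocumentQuery

public static class DocumentQueryStreaming
{
    // Hypothetical variation of QueryAsync: instead of accumulating every batch,
    // hand each batch to a caller-supplied callback, then move on to the next one.
    public static async Task StreamAsync<T>(
        IQueryable<T> query,
        Func<IEnumerable<T>, Task> onBatch)
    {
        var docQuery = query.AsDocumentQuery();

        do
        {
            // One service call to DocumentDB per iteration
            var batch = await docQuery.ExecuteNextAsync<T>();

            // Only the current batch is referenced here; once the callback
            // completes, nothing in this method keeps earlier batches alive
            await onBatch(batch);
        } while (docQuery.HasMoreResults);
    }
}
```

The callback receives the documents of a single batch, so whatever it does with them (write them to a response stream, aggregate a counter, and so on) determines how much stays in memory. The memory benefit only holds if the callback does not itself store every batch.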
Enjoy!