Tag Archives: .NET

Development using the Microsoft .NET platform.

Beyond 2 concurrent connections in .NET

I’m going to document this once and for all.

The Problem

You want to call an endpoint multiple times in parallel. Or maybe you want to call multiple endpoints under the same domain name. For instance, you might want to drill an API with multiple requests because the API doesn’t support batch mode.

The problem is that .NET, by default, allows only 2 concurrent connections to the same host. If you fire off a bunch of asynchronous web calls, it simply queues them and runs them two at a time. So you won’t scale much.

Surely this limit of two isn’t hardcoded in .NET and you can change it, can’t you!?

The Solution

Yes, we can override that number.  It is driven by what is called the Service Point Manager in System.Net.

There are two ways to override it:  by configuration or by code.  I suggest using the configuration route if your needs are static, and code if you need to change the limit based on an input.

Configuration

Here’s an example of how to override it in configuration:

<configuration>
 <system.net>
  <connectionManagement>
   <add address="myapi.com" maxconnection="12"/>
  </connectionManagement>
 </system.net>
</configuration>

Here I specify to have a maximum number of connections of 12 instead of 2 on the domain myapi.com only.

I could specify different rules for different domains:

<configuration>
 <system.net>
  <connectionManagement>
   <add address="myapi.com" maxconnection="12"/>
   <add address="yourapi.com" maxconnection="8"/>
   <add address="hisapi.com" maxconnection="4"/>
  </connectionManagement>
 </system.net>
</configuration>

Or I could do a blanket statement:

<configuration>
 <system.net>
  <connectionManagement>
   <add address="*" maxconnection="15"/>
  </connectionManagement>
 </system.net>
</configuration>

Code

In code, the easiest way is to do a blanket statement on all domains, using the static ServicePointManager.DefaultConnectionLimit property:

ServicePointManager.DefaultConnectionLimit = 15;

In order to go by domain, I would go through ServicePoint objects, BUT I NEVER TRIED IT. Note that FindServicePoint takes a full URI rather than a bare host name; the http scheme below is an assumption, so use whichever scheme you actually call:

var myApiServicePoint =
  ServicePointManager.FindServicePoint(new Uri("http://myapi.com"));

myApiServicePoint.ConnectionLimit = 12;

var yourApiServicePoint =
  ServicePointManager.FindServicePoint(new Uri("http://yourapi.com"));

yourApiServicePoint.ConnectionLimit = 8;

var hisApiServicePoint =
  ServicePointManager.FindServicePoint(new Uri("http://hisapi.com"));

hisApiServicePoint.ConnectionLimit = 4;

The Silver Bullet

Now that we have this solution, we feel all empowered, right?

I mean, we can now bombard our favorite APIs limitlessly.  Typically the heavy lifting is done on the API side so if we invoke APIs asynchronously, we can stream a lot of activity from a low-compute server, right?

The caveat is…  will the API let you?  If you hammer an API, there are two typical outcomes:

  • You’re going to crash it
  • It’s going to throttle you, or worse, it’s going to blacklist you for a while and actively refuse your connections

Basically the second outcome comes from an API owner who didn’t want the first outcome to happen 😉

And I know it, because I’ve done it!  I’ve tried that trick on the Azure Active Directory Graph API: performance climbed for a few seconds then dropped drastically.  I got throttled.  Worse:  I got throttled for an hour.  For an entire hour the performance sucked because my IP got blacklisted.

I’ve done it with IMDB last week.  I was trying to download its entire catalog by hitting every movie page using Azure Batch and I got blacklisted (again!).

So be mindful about that and wield the connection limit sword carefully 😉
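
One way to be careful is to cap concurrency on the client side as well, whatever the connection limit is set to. Here is a minimal sketch of that idea; Throttle.RunAsync is a hypothetical helper (not part of .NET), and the right cap is something you would have to find out for each API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Hypothetical helper: runs 'work' for every item, never more than
    // 'maxConcurrency' at the same time, and returns results in input order.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        int maxConcurrency,
        Func<TItem, Task<TResult>> work)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (work == null) throw new ArgumentNullException(nameof(work));
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                // Wait for a free slot before starting the call.
                await gate.WaitAsync();
                try { return await work(item); }
                finally { gate.Release(); }
            }).ToArray();

            // Completes only once every task has finished, so the gate
            // is no longer in use when it gets disposed.
            return await Task.WhenAll(tasks);
        }
    }
}
```

You would pass your list of URLs (or IDs) as items and your web call as work, e.g. with a cap of 4; the queuing then happens in your code rather than at the API’s door.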


Major upgrade to Azure DocumentDB LINQ provider

In early September 2015, Microsoft announced a major upgrade to the LINQ provider of the DocumentDB .NET SDK.

I know it does appear a bit confusing since when DocumentDb was released (a year ago now), it was said that it supported SQL.  Well, it supported some SQL.

Now the surface area of SQL it supports has increased.  In order for us to take advantage of this within a .NET application, the LINQ Provider must be upgraded to translate more operations into that SQL.

You see, DocumentDB’s SDK works the same way as Entity Framework or LINQ to SQL in that respect:  your C# LINQ query gets translated into an expression tree by the compiler, then the LINQ provider (an implementation of IQueryable) translates the expression tree into a SQL string (at runtime).

The SQL is what is sent to the DocumentDb service.
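
To make the expression tree part a bit more concrete, here is a small sketch that uses an in-memory array instead of DocumentDB (so no SQL is generated here). The class and method names are made up for the illustration; the point is only that an IQueryable carries a description of the query that a provider can inspect at runtime.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Builds a query over an in-memory array. Because of AsQueryable(), the
    // compiler emits an expression tree for the lambda instead of a delegate.
    public static IQueryable<string> BuildQuery()
    {
        var names = new[] { "Vincent", "Alice", "Victor" };

        return names.AsQueryable().Where(n => n.StartsWith("V"));
    }

    // Returns the name of the outermost LINQ operator recorded in the tree,
    // or null when the query has no operator applied yet. A real provider
    // walks the whole tree this way to produce its query string.
    public static string OutermostOperator(IQueryable<string> query)
    {
        var call = query.Expression as MethodCallExpression;

        return call == null ? null : call.Method.Name;
    }
}
```

Calling OutermostOperator on the query above returns the name of the Where operator, without ever running the query.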

Today the LINQ provider allows string manipulation, array manipulation (e.g. concatenation), order by and some hierarchical manipulation too.

So download the latest NuGet package of DocumentDb client (i.e. 1.4.1) and try to expand your LINQ queries.

 

Enjoy!

DocumentDB Async Querying & Streaming

UPDATE (31-08-2017):  This article is superseded by the new article Cosmos DB Async Querying & Streaming.

Working with the .NET client SDK of Azure DocumentDB, I couldn’t find a way to query the store asynchronously.

***This post relates to the version 0.9.1-preview of Microsoft Azure DocumentDB Client Library. If you work with another major version, this might not be relevant.***

That seemed odd since the whole SDK is asynchronous, but when it came to querying, you could only form a LINQ query, and once you either iterated on it or called ToArray or ToList, your process would block in a synchronous manner.

I was half surprised since asynchrony isn’t built into LINQ and must usually be bolted on more or less elegantly. I looked around on the web and couldn’t find a solution. I ended up finding it by myself. Most of you probably did too, but for those who haven’t yet, here is the solution.

Why Async?

Just before I dive into the solution, I want to explain why you would want to implement asynchrony in querying. I keep finding bits on the web indicating that people do not understand what asynchrony is for in .NET, so I always think it’s worthwhile to discuss it.

Let’s try the reverse psychology approach. Here is what asynchrony doesn’t bring you:

  • It doesn’t make your client (e.g. browser) asynchronous; for instance, if you implement it in a service call, it doesn’t make the caller asynchronous (e.g. Ajax)
  • It doesn’t bring you performance per se
  • It doesn’t make your code run on multiple threads at once

Asynchrony allows you to… SCALE your server code. It allows you to multiplex your server, to serve more concurrent requests at the same time. If you do not have scaling issues, you might not need asynchrony.

Why does it allow you to scale? When you async / await on an I/O call (e.g. a DocumentDB remote call), it frees the current thread to be used by another request until the call comes back, allowing you to serve more requests with fewer threads and less memory.
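
Here is a rough sketch of that effect, with Task.Delay standing in for a remote call (an assumption made to keep the example self-contained; the names are invented for the demo). Many simulated calls are in flight at once, yet no thread sits blocked while they wait.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncScalingDemo
{
    // Stand-in for a remote call; Task.Delay holds no thread while it waits.
    private static async Task<int> FakeIoCallAsync(int id)
    {
        await Task.Delay(100);

        return id;
    }

    // Starts 'count' simulated calls at once and returns the elapsed time.
    public static async Task<TimeSpan> RunAsync(int count)
    {
        var watch = Stopwatch.StartNew();

        await Task.WhenAll(Enumerable.Range(0, count).Select(i => FakeIoCallAsync(i)));

        return watch.Elapsed;
    }
}
```

With 50 calls, the elapsed time should stay close to the duration of a single call rather than 50 times that, because the waits overlap instead of each tying up a thread.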

The solution

A LINQ query to DocumentDB would look something like this:

var query = from doc in _client.CreateDocumentQuery<MyDoc>(documentsLink)
            where doc.MyProperty == "My Criteria"
            select doc;
var documents = query.ToArray();

Where _client is an instance of DocumentClient. Now if you can’t find the method CreateDocumentQuery on that object, that is normal. Read this post to understand why.

As previously mentioned, the ToArray method call will block synchronously. So how do we modify this to be asynchronous? The full solution is embodied in this helper method:

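private static async Task<IEnumerable<T>> QueryAsync<T>(IQueryable<T> query)
{
    // Switch from the LINQ view of the query to the SDK's pageable view
    var docQuery = query.AsDocumentQuery();
    var batches = new List<IEnumerable<T>>();

    do
    {
        // One service call per batch, awaited without blocking the thread
        var batch = await docQuery.ExecuteNextAsync<T>();

        batches.Add(batch);
    }
    while (docQuery.HasMoreResults);

    // Flatten the batches into a single sequence of documents
    var docs = batches.SelectMany(b => b);

    return docs;
}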

You can pass the query variable from previous code snippet to this method since it is an IQueryable<MyDoc>.

The key is in the AsDocumentQuery method. This returns an instance of IDocumentQuery<T> which has asynchronous methods on it.

The beauty of this helper method is that it works for querying documents (CreateDocumentQuery) but also for querying document collections (CreateDocumentCollectionQuery) and databases (CreateDatabaseQuery).

Streaming

As a bonus, the generic helper method could easily be modified to allow you to stream your results. This could be useful if your query returns a lot of documents that you do not want to keep in memory at the same time. Basically you would only keep the documents of one batch (one service call to DocumentDB) in memory at a time.
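
Here is a minimal sketch of what such a streaming variant could look like. To keep it runnable without the SDK, it works against a hypothetical IPagedQuery<T> interface that mimics the HasMoreResults / ExecuteNextAsync pair; all the names below (IPagedQuery, StreamingQuery, InMemoryPagedQuery) are made up for the illustration, and I haven’t run this against DocumentDB itself.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the paging surface of IDocumentQuery<T>
// (HasMoreResults / ExecuteNextAsync), so the sketch runs without the SDK.
public interface IPagedQuery<T>
{
    bool HasMoreResults { get; }
    Task<IEnumerable<T>> ExecuteNextAsync();
}

public static class StreamingQuery
{
    // Hands each batch to 'onBatch' as soon as it arrives, so this method
    // never references more than one batch at a time.
    // Returns the number of batches processed.
    public static async Task<int> ForEachBatchAsync<T>(
        IPagedQuery<T> query,
        Func<IEnumerable<T>, Task> onBatch)
    {
        var batchCount = 0;

        while (query.HasMoreResults)
        {
            var batch = await query.ExecuteNextAsync();

            await onBatch(batch);
            ++batchCount;
        }

        return batchCount;
    }
}

// In-memory implementation used only to exercise the sketch.
public sealed class InMemoryPagedQuery<T> : IPagedQuery<T>
{
    private readonly Queue<IEnumerable<T>> _pages;

    public InMemoryPagedQuery(IEnumerable<IEnumerable<T>> pages)
    {
        _pages = new Queue<IEnumerable<T>>(pages);
    }

    public bool HasMoreResults { get { return _pages.Count > 0; } }

    public Task<IEnumerable<T>> ExecuteNextAsync()
    {
        return Task.FromResult(_pages.Dequeue());
    }
}
```

The callback decides what to do with each batch (write it to a file, push it to a queue, etc.); as long as it doesn’t hold on to the documents, memory stays bounded by the batch size.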

Enjoy!

Full Outer Join with LINQ to objects

Quite a few times I have found myself looking for a way to perform a full outer join using LINQ to objects.

To give a general enough example of where it is useful, I would say ‘sync’. If you want to synchronize two collections (e.g. two collections of employees), then an outer join gives you a nice collection to work with.

Basically, a full outer join returns you a collection of pairs. Every time you have both items in the pair, you are facing an update: i.e. the item was present in both collections so you need to update it to synchronize. If only the first item of the pair is available, you have a creation while if only the second item is you have a delete (I’m saying first and second, but it actually really depends on how you formulated the query but you get the meaning).

Whatever the reason (a sync is the best example I could find), here is the best way I found to do it. It is largely inspired by an answer I found on Stack Overflow.

public static IEnumerable<TResult> FullOuterJoin<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, TInner, TResult> resultSelector,
    IEqualityComparer<TKey> comparer)
{
    if (outer == null)
    {
        throw new ArgumentNullException("outer");
    }
    if (inner == null)
    {
        throw new ArgumentNullException("inner");
    }
    if (outerKeySelector == null)
    {
        throw new ArgumentNullException("outerKeySelector");
    }
    if (innerKeySelector == null)
    {
        throw new ArgumentNullException("innerKeySelector");
    }
    if (resultSelector == null)
    {
        throw new ArgumentNullException("resultSelector");
    }
    if (comparer == null)
    {
        throw new ArgumentNullException("comparer");
    }

    // Build both lookups with the same comparer that is used for the key union,
    // otherwise keys the comparer treats as equal would not be found below.
    var innerLookup = inner.ToLookup(innerKeySelector, comparer);
    var outerLookup = outer.ToLookup(outerKeySelector, comparer);

    // Every key present on either side, each one only once
    var allKeys = (from i in innerLookup select i.Key).Union(
        from o in outerLookup select o.Key,
        comparer);

    // A missing side yields default(T) (null for reference types)
    var result = from key in allKeys
                 from innerElement in innerLookup[key].DefaultIfEmpty()
                 from outerElement in outerLookup[key].DefaultIfEmpty()
                 select resultSelector(outerElement, innerElement);

    return result;
}

So here it is and it works.
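
To tie it back to the sync scenario, here is a small sketch that classifies each pair as an update, a creation or a delete. It carries a compact copy of the join (null checks left out) so that it stands on its own; SyncDemo and Plan are names invented for the example, and it uses plain strings matched case-insensitively as a stand-in for real employee objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SyncDemo
{
    // Compact copy of the full outer join (null checks omitted for brevity).
    public static IEnumerable<TResult> FullOuterJoin<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, TInner, TResult> resultSelector,
        IEqualityComparer<TKey> comparer)
    {
        var innerLookup = inner.ToLookup(innerKeySelector, comparer);
        var outerLookup = outer.ToLookup(outerKeySelector, comparer);
        var allKeys = outerLookup.Select(g => g.Key)
            .Union(innerLookup.Select(g => g.Key), comparer);

        return from key in allKeys
               from o in outerLookup[key].DefaultIfEmpty()
               from i in innerLookup[key].DefaultIfEmpty()
               select resultSelector(o, i);
    }

    // Classifies each pair: both sides present = update, only the source
    // side = create, only the target side = delete.
    public static List<string> Plan(IEnumerable<string> source, IEnumerable<string> target)
    {
        return source.FullOuterJoin(
            target,
            s => s,
            t => t,
            (s, t) => s == null ? "delete " + t
                    : t == null ? "create " + s
                    : "update " + s,
            StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

Note that this only works as is for reference types: with value types, a missing side comes back as default(T) (e.g. 0) rather than null, so you would need another way to tell "missing" apart from a real value.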

You can easily simplify the signature by specializing for special cases (e.g. dropping the comparer, considering two collections of the same type hence requiring only one key selector, etc.).

For performance, I didn’t bother… but I wonder if creating those two lookups isn’t actually slower than doing a cross product (double loop) over the items of both collections and checking for key equality. My gut feeling is that it’s probably wasteful for small collections and worth it for big ones; hence if you optimized it, you would be doing it for small collections, which do not have performance problems anyway.

Enjoy!

ePub Factory NuGet package

I’ve published this NuGet package.

Ok, so why do yet another ePub library on NuGet when there are already a few?

Well, there aren’t that many actually and none are Portable Class Libraries (PCL).

So I’ve built an ePub library portable to both Windows 8+ & .NET 4.5.1. Why not Windows Phone? My library is based on System.IO.Compression.ZipArchive which isn’t available on Silverlight in general. That being said, what would be a use case to generate an ePub archive on a smart phone?

I have in my possession a Kobo Touch (yes, my Canadian fiber got involved when I chose the Kobo). I love to read on it: it is SO much more relaxing for my eyes than a tablet. It’s like reading a book but where I can change the content all the time. You see I use it to read a bunch of technical articles on public transport, so I upload new stuff all the time.

I wanted to automate parts of it and hence I needed an ePub library. I would like to embed that code in a Windows App at some point (this is mostly pedagogical for me you see) so I needed something PCL.

Anywho, two technical things to declare:

1. ePub is complicated!

If you ever want to handcraft an ePub, use an ePub validator such as the excellent http://www.epubconversion.com/ePub-validator-iBook.jsp. Otherwise the ePub just doesn’t work and ePub tools (either eReader or Windows App) are quite silent about the problems.

The biggest annoyance for me was the part of the spec that says the content of your first file should start at byte 38. That file holds the ePub mime type and is meant to be a sort of check: a client doesn’t need to open the archive (an ePub is a zip file underneath); it can simply go to byte 38 and check that the ePub mime type is there to validate that it has a valid ePub in hand.
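
For illustration, here is roughly what that check looks like from the client side. This is a sketch rather than code from the library: EpubSniffer and LooksLikeEpub are made-up names, and it assumes the standard zip layout where a 30-byte local file header is followed by the 8-character entry name "mimetype".

```csharp
using System;
using System.Text;

public static class EpubSniffer
{
    private const string MimeType = "application/epub+zip";

    // 30-byte zip local file header + 8 bytes for the entry name "mimetype"
    private const int Offset = 38;

    // Returns true when the bytes at offset 38 spell the ePub mime type.
    // That only holds if the "mimetype" entry comes first in the archive,
    // is stored uncompressed and has no extra field in its header.
    public static bool LooksLikeEpub(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (data.Length < Offset + MimeType.Length) return false;

        return Encoding.ASCII.GetString(data, Offset, MimeType.Length) == MimeType;
    }
}
```

You would feed it the first 58 bytes (or more) of the file; it says nothing about the rest of the archive, so a validator is still needed for that.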

Well, for that you need to write the mime type file first AND not compress it. Apparently that’s too much for System.IO.Compression.ZipArchive. I really needed that library since it works in async mode. So I did a ‘prototype’ epub file with only the mime type using another zip library (the excellent DotNetZip) and used that prototype as the starting point of any future ePub!

2.  My first NuGet package

Yep! So I went easy on myself and downloaded a graphical tool, NuGet Package Explorer.

I didn’t use many NuGet features besides embedding the XML comment file in the NuGet package.

Quite neat!

It’s quite cool to handle packages the NuGet way. You can update them at will completely independently…

The Missing Windows 8 Instructional Video

Scott Hanselman has produced a high-quality video on YouTube.

The video is a nice and comprehensive introduction to Windows 8 for everyone (i.e. not only for geeks).

My experience with Windows 8 is that once you’ve figured out a few things (e.g. how to activate contextual Search), you can start appreciating the product.  Before that, it just looks and feels weird and annoying.

I’ve learned the ropes by myself and through all the blogs I’m reading.  But something tells me my wife won’t have that patience.  I’ll try that video on her!

Entity Framework with Asynchronous behaviours

They finally did it:  the future release of Entity Framework (version 6) will sport asynchronous behaviour based on .NET 4.0 Task Parallel Library (TPL).

The API is pretty neat.  First the SaveChanges gets an async brother SaveChangesAsync returning a Task.  So we can now write things like:

await context.SaveChangesAsync();

The more complicated topic is the queries.  LINQ was designed before TPL and doesn’t have the notion of asynchrony.  They got around it in a clever fashion:  LINQ describes the queries while EF allows you to enumerate the result in an asynchronous fashion:

var q = from e in context.Employees
        where e.Name.StartsWith("V")
        select e;

await q.ForEachAsync(e => Console.WriteLine(e.FirstName));

So the entire enumeration is done asynchronously hence Entity Framework can manage the moment when it needs to fetch the DB for new objects.

This new feature is quite powerful since DB access is typically a place where your thread blocks, waiting for something external.  For instance, a web service doing a query and returning data is typically written synchronously, with the thread blocking while it waits for the DB server.  Using this new asynchronous mode, we can just as easily write an asynchronous version, which is much more scalable since no threads are blocking, hence more threads can be used to process requests.