Docker Containers on Windows Server

If you had any doubts about the increased pace of IT innovation, look at Docker Containers.  The project was open sourced in March 2013 as a container technology for Linux and 1.5 years later, in October 2014, Microsoft announced they were integrating that technology into Windows Server 2016!

That’s 1.5 years from toe in the water to major influence.  Impressive!


The first Windows Server Container Preview was announced in August 2015 as part of Technical Preview 3 of Windows Server.  The preview also comes with Visual Studio integration, in the form of Visual Studio Tools for Docker.

Mark Russinovich also published a very good technical post about Docker containers on Windows:  what they are, what their advantages are & the scenarios where they apply nicely.

Basically, Docker Containers are standard packages used to deploy a solution on a host.  The main advantages of Docker Containers are their small footprint, which results in a higher density of applications on a given host, and a very quick startup time, compared to a Virtual Machine where the entire OS must be loaded in memory and booted.

In Windows, hosts will come in two flavours:  Windows Server host & Hyper-V host.  The former will maximize resource utilization and container density on a host while the latter maximizes isolation.

At first the Hyper-V container sounds like it defeats the purpose of having Docker Containers in the first place since it basically implements the container as an entire VM.  But if you think about it, in the long run it makes perfect sense.  The first version of Docker Containers on Windows will likely have security holes in them.  Therefore if you have a scenario with ‘hostile multi-tenants’, you’ll probably want to stick to Hyper-V.  But in time, the security of Docker on Windows will tighten and you’ll be able to move to normal containers with a configuration change.

Service Fabric

We can imagine that once Windows Server 2016 rolls out, we’ll see Docker Containers appearing in Azure.  I wouldn’t be surprised to see them fuse with App Services shortly after that.

They are also very likely to be part of the upcoming Azure Service Fabric, Microsoft’s offering for rapidly building microservices.

Integration with Azure Service Bus


I’ve been consulting for 1.5 years with a customer embarking on a journey to leverage Microsoft Azure as an enterprise platform, helping them rethink their application portfolio.

Characteristics of that customer:

  • Lots of Software as a Service (SaaS) third parties
  • Business is extremely dynamic, in terms of requirements, transitions, partnerships, restructuring, etc.
  • Medium operational budget:  they needed to get it pretty much right the first time
  • Little transaction volume

One of the first things we did was to think about the way different systems would integrate together given the different constraints of the organization’s IT landscape.

We settled on using Azure Service Bus to do a lot of the integrations.  Since then, I have worked to help them actually implement that in their applications, all the way down to the details of operationalization.

Here I wanted to share my lessons learned on what worked well and what didn’t.  Hopefully this will prove useful to others out there setting out on similar integration programs.

Topics vs Queues

The first thing we decided was to use Topics & Subscriptions as opposed to queues.  Event Hubs didn’t exist when we started so it wasn’t considered.

They work in similar ways with one key difference:  a topic can have many subscribers.

This ended up being a really good decision.  It costs nearly nothing:  configuring a subscription takes seconds longer than just configuring a queue.  But it bought us the flexibility to add subscribers along the way as we evolved without disrupting existing integrations.

A big plus.

Meta Data

In order to implement a meaningful publish / subscribe mechanism, you need a way to filter messages.  In Azure Service Bus, subscriptions filter topic messages on metadata, for instance:

  • Content Type
  • Label
  • To
  • Custom Properties

If you want your integration architecture to have long-term value and what you build today to be forward compatible, i.e. you want to avoid rework when implementing new solutions, you need to make it possible for future consumers to filter today’s messages.

It’s hard to know what future consumers will need but you can try populating the obvious.  Also, make sure your consumers don’t mind if new meta data is added along the way.

For instance, you want to be able to publish new types of messages.  A topic might start out with orders published on it, but with time you might want to publish price-correction messages.  If a subscription just takes everything from the topic, it will swallow the price corrections and potentially blow up the consumer.

One thing we standardized was the use of content-type.  The content type tells what type of message the content is about.  The content-type would actually contain the major version of the message.  This way an old consumer wouldn’t break when we changed a message version.

We used labels to identify the system publishing a message.  This was often useful to stop a publishing loop:  if a system subscribes to a topic where it itself publishes, you don’t want it to consume its own messages and potentially re-publish information.  This field allowed us to filter those messages out.

Custom Properties were more business specific and the hardest to guess in advance.  They should probably contain the main attributes contained in the message itself.  For an order message, the product ID, product category ID, etc. should probably be in it.
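To make this concrete, here is a minimal sketch of a publisher stamping that metadata on a message.  It assumes the WindowsAzure.ServiceBus SDK of that era (Microsoft.ServiceBus.Messaging); the topic path, content type and property names are just placeholders, not values from the actual project:

using System.IO;
using System.Text;
using Microsoft.ServiceBus.Messaging;

// Hypothetical publisher:  topic path, content type & properties are placeholders.
var connectionString = "<service bus connection string>";
var topicClient = TopicClient.CreateFromConnectionString(connectionString, "orders");

var body = "{ \"orderId\": 1234, \"productId\": 42, \"productCategoryId\": 7 }";
var message = new BrokeredMessage(new MemoryStream(Encoding.UTF8.GetBytes(body)), true)
{
    ContentType = "application/vnd.contoso.order.v1+json",  // message type + major version
    Label = "OrderingSystem"                                 // publishing system, used to break loops
};

// Custom properties:  the main business attributes, usable in subscription filters.
message.Properties["ProductId"] = 42;
message.Properties["ProductCategoryId"] = 7;

await topicClient.SendAsync(message);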

Filtering subscriptions

Always filter subscriptions!  This is the only way to ensure future compatibility.  Make sure you specify what you want to consume.

Also, and I noticed this only too late while going into production:  filtering gives you a massive efficiency boost under load.

One of the biggest integrations we developed did a lot of filtering on the consumer side, i.e. the consumer C# code reading messages would discard messages based on criteria that could have been implemented in the filters.  That caused the subscriptions to catch way more messages than they should and to take way more time to process.

Filtering is cheap on Azure Service Bus.  It takes minutes more to configure but accelerates your solution.  Use it!
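As an illustration, here is what a filtered subscription could look like with the same SDK, reusing the connection string from the sketch above.  This is a hedged example:  the topic, subscription and filter values simply mirror the placeholder metadata shown earlier:

using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);

// Only take major version 1 of order messages and ignore what this system published itself.
var filter = new SqlFilter(
    "sys.ContentType LIKE 'application/vnd.contoso.order.v1%' " +
    "AND sys.Label <> 'FulfillmentSystem' " +
    "AND ProductCategoryId = 7");

if (!namespaceManager.SubscriptionExists("orders", "fulfillment"))
{
    namespaceManager.CreateSubscription("orders", "fulfillment", filter);
}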

Message Content

You had better standardize the format of the messages you’re going to carry around.  Is it XML, JSON, .NET binary serialization?

Again you want your systems to be decoupled so having a standard message format is a must.

Automatic Routing

There is a nice feature in Azure Service Bus:  Forward To.  This is a property of a subscription where you specify which topic (or queue) you want every message getting into the subscription to be routed to.

Why would you do that?

Somebody had a very clever idea that turned out to pay lots of dividends down the road.  You see, you may want to replay messages when they fail and eventually fall into the dead-letter queue.  The problem with a publish / subscribe model is that when you replay a message, you replay it in the topic and all subscriptions get it.  Now if you have a topic with, say, 5 subscriptions and only one subscription struggles with a message and you replay it (after, for instance, changing the code of the corresponding consumer), then 4 subscriptions (which previously processed the message successfully) will receive it again.

So the clever idea was to forward messages from every subscription to another topic where they could be replayed.  Basically we had two ‘types’ of topics:  topics to publish messages to and topics to consume messages from.
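Here is a hedged sketch of how that wiring could be set up with the SDK:  a subscription on the ‘publish’ topic that forwards everything it matches to a ‘consume’ topic.  Both topics are assumed to already exist and all names are placeholders:

using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);

// Subscription on the 'publish' topic; everything it matches is forwarded to a
// consumer-specific topic where messages can be replayed without touching anyone else.
var description = new SubscriptionDescription("orders-publish", "to-fulfillment")
{
    ForwardTo = "orders-fulfillment"   // the 'consume' topic (must already exist)
};

namespaceManager.CreateSubscription(
    description,
    new SqlFilter("sys.ContentType LIKE 'application/vnd.contoso.order.%'"));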

Semantic of Topics

While you are at it, you probably want to define what your topics represent.

Why not put all messages under one topic?  Well, performance for one thing but probably management at some point.  At the other end of the spectrum, why not one topic per message type?

Order.

Service Bus guarantees order within the same topic, i.e. messages will be presented in the order they were sent.  That is because you can choose to consume your messages (on your subscription) one by one.  But if messages are in different topics, you’ll consume them in different subscriptions and the order can be altered.

If order is important for some messages, group them under the same topic.

We ended up segmenting topics along enterprise data domains and it worked fine.  It really depends on what type of data transits on your bus.

Multiplexing on Sessions

A problem we faced early on was actually due to caring a bit too much about order.

We consumed one message at a time.  That could have caused performance issues, but the volume wasn’t big, so it didn’t hit us.

The problems start when you encounter a poison message, though.  What do you do with it?  If you let it reach the dead-letter queue then you’ll process the next message and violate order.  So we put a huge retry count on it so this would never happen.

But then that meant blocking the entire subscription until somebody got tired and looked into it.

A suggestion came from the Microsoft Azure Service Bus product team itself.  You can assign a session-ID to a message.  Messages with the same session-ID are grouped together and ordered properly, while messages from different sessions can be processed independently.  Your subscription needs to be session-ful for this to work.

This allowed a single session to fail while messages in the other sessions kept being processed.

Now how do you choose your session-ID?  You need to group messages that depend (order-wise) on each other together.  That typically boils down to the identifier of an entity in the message.

This can also speed up message processing since you are no longer bound to one-by-one processing.

After that, failing messages will keep failing, but that only holds up correlated messages.  That is a nice “degraded service level” as opposed to completely failing.
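A rough sketch of both sides, still with the older SDK and reusing the topicClient and connection string from the earlier sketches.  The session key, topic and subscription names are placeholders, and the subscription is assumed to have been created as session-ful (RequiresSession = true):

using System;
using Microsoft.ServiceBus.Messaging;

// Publisher side:  group messages that must stay ordered under the same session,
// typically keyed on the entity identifier.
var message = new BrokeredMessage("...payload...")
{
    SessionId = "order-1234"
};
await topicClient.SendAsync(message);

// Consumer side:  the subscription must be session-ful.
var subscriptionClient = SubscriptionClient.CreateFromConnectionString(
    connectionString, "orders-fulfillment", "processor");

var session = await subscriptionClient.AcceptMessageSessionAsync();   // locks one session
BrokeredMessage received;

while ((received = await session.ReceiveAsync(TimeSpan.FromSeconds(5))) != null)
{
    // Messages within a session arrive in order; other sessions are processed
    // independently, so one poison session no longer blocks the whole subscription.
    await received.CompleteAsync();
}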

Verbose Message Content

One of the things we changed midway was the message content we passed.  At first we used the bus to really send data, not only events.

There are advantages in doing so:  you really are decoupled since the consumer gets the data with the message and the publishing system doesn’t even need to be up when the consumer processes the message.

It has one main disadvantage when you use the bus to synchronize or duplicate data though:  the bus becomes this train of data and any time you disrupt the train (e.g. failing message, replaying a message, etc.) you run the risk of breaking things.  For instance, if you swap two updates, you’ll end up having old data updated in your target system.  It sounds far-fetched but in operation it happens all the time.

Our solution was to simply send identifiers with the message.  The consumer would interrogate the source system to get the real data.  This way the data it got would always be up to date.
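For illustration, the consumer side then becomes little more than a lookup.  This is only a sketch:  OrdersApiClient and UpdateTargetSystemAsync are hypothetical stand-ins for the source and target systems, and ‘received’ is the BrokeredMessage picked up from the subscription:

// The message carries identifiers only; the consumer fetches fresh data from the
// source system, so it never applies stale values.
var orderId = (int)received.Properties["OrderId"];
var order = await ordersApiClient.GetOrderAsync(orderId);   // hypothetical source-system client

await UpdateTargetSystemAsync(order);                        // hypothetical target-system update
await received.CompleteAsync();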

I wouldn’t recommend using that approach all the time since you lose a lot of benefits from the publish / subscribe mechanism.  For instance, if your message represents an action you want another system to perform (e.g. process order), then having all the data in the message is fine.

Summary

These were the key points I learned from working with Azure Service Bus.

I hope they can be useful to you & your organization.  If you have any questions or comments, do not hesitate to hit the comments section!

Nuget WordPress REST API – Authentication

I use WordPress.com as my blog platform.  It hosts the WordPress CMS software and adds a few goodies.

I was curious about their API after noticing that my blog app (Windows Live Writer) tended to create duplicates of pictures, leaving lots of unused assets in my Media library.  This really is a personal pet peeve since I’m still at less than 5% of my asset quota after 5 years.

There happen to be two APIs in WordPress.com:  the old XML-RPC API, actually used by Windows Live Writer, and the new REST API.

The new API is what people would call a modern API:  its authentication is OAuth based, it is RESTful and has JSON payloads.

Surprisingly there didn’t seem to be any .NET client for it.  So I thought…  why not build one?

Enter the WordPress REST API Nuget package.  So far, I’ve implemented the authentication, a get-user and a part of a search-post.

For the search-post, I took the not-so-easy path of implementing an IQueryable&lt;T&gt; adapter in order to expose the Post API as a Linq interface.  I’ll write about that, but as a heads-up:  not trivial, but it works and is convenient for the client.

I will release the source code soon, but for the moment you can definitely access the Nuget package.

You can try the client on a site I’m assembling at https://wordpress-client.azurewebsites.net/.  Warning:  I do not do web UI so the look-and-feel is non-existent ;)

Here I’ll give a quick how-to using the client.

Authentication

WordPress.com has the concept of an application.  If you’re steeped in claims-based authentication, this is what is typically referred to as a relying party.  It is also equivalent to an application in Azure Active Directory.

You set up applications at https://developer.wordpress.com/apps/.  The three key pieces of information you need in order to get a user to authorize your application to access WordPress.com are:

  1. Client ID:  provided by WordPress.com, the identifier of your application
  2. Client Secret:  also provided by WordPress.com, a secret it expects you to pass around
  3. Redirect URL:  provided by you, where WordPress will send the user back after consent is given

Here is the authorization flow:

image

  1. The user clicks a ‘sign in’ link on your web site.
  2. Your web site redirects the user’s browser to a WordPress.com page, passing the client-ID of your application and the return-url you’ve configured.  The URL will be:  https://public-api.wordpress.com/oauth2/authorize?client_id=<your value>&redirect_uri=<your value>&response_type=code
  3. Assuming the user consents to your application using WordPress.com, the user’s browser is redirected to the Redirect URL you provided to WordPress.com.  In the query string, your application is given a code.  This code is temporary and unique to that transaction.
  4. Your application can now contact the WordPress.com API directly (without the browser) to complete the transaction.  You POST a request to https://public-api.wordpress.com/oauth2/token, passing the code, the client-ID and other arguments.
  5. The API returns a token you can use for future requests.
  6. For any future request to the API, you pass the token in the HTTP request.

Now, this is all encapsulated in the WordPress REST API Nuget package.  You still need to do a bit of work to orchestrate calls.

The link to the authorization page you need to redirect the end-user to can be given by:

static string WordPressClient.GetUserAuthorizeUrl(string appClientID, string returnUrl)

You pass the client-ID of your application and its return-url, and the method returns the URL you need to redirect the user to (step 2).

Then on the return-url page, you need to take the code query string parameter and call

static Task<WordpressClient> WordPressClient.GetTokenAsync(string clientID, string clientSecret, string redirectURL, string code)

This method is async.  All methods interacting with the WordPress API are async.  The method returns you an instance of the WordPressClient class.  This is the gateway class for all APIs.

That was steps 4 & 5, basically.
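To put it together, here is a hedged sketch of how those two calls could be orchestrated in an ASP.NET MVC site.  The controller, client-ID, secret and redirect URL are placeholders; only the two static methods shown above come from the package:

using System.Threading.Tasks;
using System.Web.Mvc;

public class WordPressAuthController : Controller
{
    // Values from your application registration at https://developer.wordpress.com/apps/.
    private const string ClientId = "<client-id>";
    private const string ClientSecret = "<client-secret>";
    private const string RedirectUrl = "https://myapp.example.com/WordPressAuth/Callback";

    // Step 2:  send the user to the WordPress.com authorization page.
    public ActionResult SignIn()
    {
        var url = WordPressClient.GetUserAuthorizeUrl(ClientId, RedirectUrl);

        return Redirect(url);
    }

    // Steps 3 to 5:  WordPress.com redirects back here with a one-time code.
    public async Task<ActionResult> Callback(string code)
    {
        var client = await WordPressClient.GetTokenAsync(ClientId, ClientSecret, RedirectUrl, code);

        // 'client' is now authenticated; persist client.Token for later requests (see below).
        return RedirectToAction("Index", "Home");
    }
}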

Rehydrating a WordPress Client between requests

That is all well and good until your user comes back.  You do not want them to authorize your application at every request.

The typical solution is to persist the token in the user’s cookies so that at each request you can recreate a WordPressClient object.

For that you can access the token information in

TokenInfo WordPressClient.Token { get; }

When you want to recreate a WordPressClient, simply use its constructor:

WordPressClient(TokenInfo token)
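A sketch of that round trip inside an MVC controller, assuming TokenInfo serializes cleanly to JSON (an assumption on my part; in a real site you would also encrypt or otherwise protect the cookie):

using System.Web;
using Newtonsoft.Json;

// After GetTokenAsync succeeds:  persist the token so the user isn't asked to
// authorize again on every request.
var cookie = new HttpCookie("wp-token", JsonConvert.SerializeObject(client.Token))
{
    HttpOnly = true,
    Secure = true
};
Response.Cookies.Add(cookie);

// On a later request:  rebuild the gateway object from the persisted token.
var tokenCookie = Request.Cookies["wp-token"];

if (tokenCookie != null)
{
    var token = JsonConvert.DeserializeObject<TokenInfo>(tokenCookie.Value);
    var rehydratedClient = new WordPressClient(token);
    // ... use 'rehydratedClient' for further API calls
}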

Getting user information

Just as an example of how to use the API beyond authorization, let’s look at how to get information about the user.

Let’s say the variable client is a WordPressClient instance, then the following line of code

var user = await client.User.GetMeAsync();

gets you a bunch of information about your end-user’s profile on WordPress.com, such as their display name, the date the user joined the site, their email, etc.  This method wraps the API operation https://developer.wordpress.com/docs/api/1.1/get/me/.

Summary

This was a quick run through this new WordPress REST API Nuget package I just created.  I’ll put it on Codeplex soon if you want to contribute.

Corporate Cultures

It is said that Netflix represents the new I.T. corporation well.


If you are interested in seeing what their corporate culture looks like, have a look at the slide deck they show to their job candidates.

It has all the flair of the typical Silicon Valley shop with their “we know better than those bunch of twits” attitude, but their critique of and response to “normal” corporations’ values is a good read.

 

But if you want something even more drastic, check out Valve’s.  Valve Corporation (the video game developer and distributor behind, e.g., Half-Life) implements a drastic departure from the normal corporation:  a flat organization.  Their employee manual has even more of a pamphlet feel to it.

 

In the same vein, but from a Brazilian company unrelated to IT, a good watch is the following TED Talk video.  Semco CEO Ricardo Semler gives a very inspiring talk about how he deconstructed the “boarding school” aspects of his company by taking arrival times, desk locations, even salaries, out of management’s hands, with apparently great success:  his company’s revenue grew manyfold under his lead.

 

Enjoy!

AzureML – Polynomial Regression with SQL Transformation

I meant to illustrate over fitting (discussed in a past blog) with AzureML.  An easy way to illustrate it is to fit a bunch of sample points near perfectly and the best tool for that is Polynomial Regression.

I was surprised to see that AzureML doesn’t support Polynomial Regression natively.  But…  while thinking about it, you can implement it using a linear regression.  In order to do that, I’ll introduce a useful module of AzureML:  Apply SQL Transformation.

So I will keep over fitting for a future blog and concentrate on polynomial regression for today!

Data Set

But first, let’s construct a data set.  I want something that looks like a linear pattern with a bit of a wave on top of it, to simulate a bit of noise.

Let’s build it in Excel.  Very small data set:  20 points.  Two columns:  X & Y.  Column X goes from 1 to 20.  Column Y is a formula:  =SIN(3.5*(A2-1)/20*2*PI())+A2/2.  So basically, I add a sinusoid to a linear formula.  This gives the following data set:

X Y
1 0.5
2 2.526591
3 2.540285
4 2.092728
5 2.118427
6 2.365618
7 4.31284
8 5.546013
9 5.359401
10 5.086795
11 4.71177
12 5.876326
13 7.551295
14 8.585301
15 7.922171
16 7.470766
17 8.179313
18 9.332662
19 10.66634
20 11.06512

If you plot that in Excel:

image

As I’ve shown in a previous blog, we can take that data and import it in AzureML as a Data Set.  I’ll skip the details for that here.

From polynomial to linear

So AzureML doesn’t support polynomial regression.  What can we do?

If you think about it for 2 minutes, a polynomial regression is polynomial only in terms of the observed data, not the parameters.  A polynomial regression, in one dimension, is

f(x) = a_0 + a_1*x + a_2*x^2 + a_3*x^3 + … + a_n*x^n

which looks like a linear regression with n dimensions in input and one in output.  The input vector would be (x, x^2, …, x^n).  The regression becomes:

f(x) = a_0 + (a_1, a_2, …, a_n) * (x, x^2, …, x^n)

where the multiplication here is the scalar (dot) product of the two vectors.  Therefore we are back to a linear regression!

The trick is simply to augment the data set for it to contain the square, cube, etc.  of the observed (independent) variable.

Apply SQL Transformation

We could add the extra columns directly in the data set but that’s a bit clunky as it pollutes the data set.  Ideally we would do it “dynamically”, i.e. within an experiment.  Enter Apply SQL Transformation.

Let’s start a new experiment and drop the data set we just created on it.  Then, let’s drop an Apply SQL Transformation module (you can search for SQL) and link the two together:

image

Apply SQL Transformation has three entry points but only one is mandatory.  The entry points are like SQL tables you would feed it.  In our case, we only have one data set.

If you select the module you’ll see it takes an SQL expression as a parameter:

image

t1 in the SQL expression refers to the first entry point of the module.  You could also use t2 and t3 if you would connect the other entry points.

The documentation says it understands SQLite.  For our needs the limitations of SQLite vs T-SQL won’t be a problem.

We will input this SQL:

SELECT
X,
X*X AS X2,
Y
FROM t1

Basically we do a typical SQL projection using the SQL syntax.  This is quite powerful and can easily replace Project Columns and Metadata Editor modules in one go.

You can run the experiment and then look at the results by right clicking at the bottom of the SQL module.

image

You can see that the column we added is there with the squared values.

image

Doing the linear regression

We can then continue and drop a linear regression, train model & score model on the experiment and connect them like this:

image

By selecting the train model module, we can click its “Launch column selector” button and select the Y variable.  That is the variable the linear regression will predict.

We can now run the experiment and then look at the result of the Train Model module (not the score model one).

image

This is an interesting result.  AzureML is assigning a very weak weight to x^2.  That means it isn’t really using it.  Why?

Well, if you think about it, a polynomial of degree 2 is a parabola and a parabola isn’t much better than a line at predicting our sample set.  Therefore AzureML reverts to a linear predictor even though we gave it the power to use more degrees of freedom!

It’s important you develop the reflex to read results like this.  In my toy sample, we can visualize it all and deduce it at a glance, but with typical data sets, the dimension is high and you can’t visualize it.

Here AzureML is telling us:  your data is linear!

Over fitting

Remember I talked about over fitting?  That is, the tendency of a learning algorithm to try to fit the sample data perfectly if it can.  This typically happens when the learning algorithm has a lot of parameters and the sample data set has little information, i.e. it’s either small or contains records that do not add information (e.g. once you’ve given two points on a line, giving another thousand on the same line doesn’t add information about the data).

Here my data roughly has 8 tops and bottoms, if you will.  So if I go with a polynomial of degree 9, we should be able to match the curve more closely.

Let’s go back to the Apply SQL Transformation module and change the SQL expression to

SELECT
X,
X*X AS X2,
X*X*X AS X3,
X*X*X*X AS X4,
X*X*X*X*X AS X5,
X*X*X*X*X*X AS X6,
X*X*X*X*X*X*X AS X7,
X*X*X*X*X*X*X*X AS X8,
X*X*X*X*X*X*X*X*X AS X9,
Y
FROM t1

Let’s run the experiment again and look at the result of the Train Model module.

image

Now we see that AzureML used the parameters we made available.  It is normal that the weight values go down since the feature values go up (i.e. x^9 >> x for x > 1).

Could we visualize the result?

Visualizing

The method I found for visualizing the prediction of an algorithm is quite clunky, so if you have a better one, please let me know in the comments.

Basically, you drop a Writer module that you connect to the output of the Score Model module.

image

Then you can configure the writer to write to a blob container as CSV.  You then take the CSV and paste the last column (the score column) into Excel next to the input data.  As I said…  clunky.

Anyway, if you plot that you’ll get:

image

The blue dots are the sample set and the orange dots are the predicted points.  As you see, the learning algorithm is getting closer to the training data.

Is that a good thing?  It depends!  If your data was really linear with noise in it, you are training AzureML to predict noise, which is rarely possible or useful.  That is over fitting.  If your data really has those bumps in it, then there you go.

Summary

It is possible to implement a Polynomial Regression using Linear Regression and Apply SQL Transformation modules.

The latter module is quite powerful and can replace both the Project Columns and Metadata Editor modules.  You could even do some data clean-up with it (via a where clause).

SOA vs Mobile APIs

I recently read an article from Bill Appleton of Dream Factory with the provocative title SOA is not a Mobile Backend.

It raised quite a few good points that were in the back of my mind for more than a year.

Basically, what is the difference between SOA and API?

To an extent it is largely the domain of the buzzword department, but as you think about it, it is more profound.

SOA really is an Enterprise creature.  It’s a system integration strategy (despite what SOA purists will tell you).  As Bill mentions in his article, SOA also typically comes with its heavy Enterprise artillery:  Enterprise Service Bus, XML message translation, Service Locator, etc.  But it also comes with a set of useful practices:  domain knowledge, reusable services, etc.

API is an internet beast.  How do you interact with a service in the cloud?  You talk to its API.  APIs are typically simpler in terms of protocols:  HTTP, REST, JSON, simple messages, etc.  They are also messy:  is an API about manipulating a specific entity or about offering a consistent set of functionalities?

To me, they spawn from the same principles, i.e. a standard interface to exchange information / commands in a loosely coupled way between remote components.  SOA is the Enterprise & earlier result of those principles.  API is the internet / mobile later result.

SOA was tried by some big enterprises, forged by committee with expensive consultants and executives trying to solve the next 10 years’ problems.

API was put forward by a myriad of small companies and consumed by even more entities.  They figured out the simplest way to expose / consume services quickly and learned from each other.  In a few years a set of practices was observed and documented and standards are even emerging.

Bill, in his article, contrasts the approaches in a way that reminds me of the old SOA debate of top-down vs bottom-up approaches, that is, do you discover your services by laying down your business processes and drilling down to discover a need for services, or by looking at your applications and the services they expose and hoping that one day you can reuse them?

There is a lot of that in the issues brought up by Bill around APIs.  Like in SOA, if you just spawn new APIs ‘on demand’, you’ll end up with a weird mosaic with overlapping concepts or functionalities.  I agree that practices developed for SOA can definitely help.  Service taxonomy, for instance, forces you to think of how your services will align and where their boundaries will be drawn before you start.

But for an organization, I believe it is nearly a forced therapy to implement one or two APIs and run them in full operation before you can start having serious discussions around other SOA aspects.  Once you’ve tried it, you can have a much more informed discussion about what changes in a service and at what pace (while discussing versioning), what type of security rules make sense and a bunch of other aspects.

Otherwise you fall victim to the good old analysis paralysis and will host meeting after meeting to decide on something because everyone has a different a priori perspective on it.

 

So my suggestion is yes, APIs are a little messier, but experimenting with them, even if you end up with a lot of rework, will bring much value to your organization.  So go, create simple APIs, expose them and operate them!