SQL Azure Federation: Automating your database partitioning (sharding)


Microsoft is about to introduce the federation concept into SQL Azure.  You can read about it on Microsoft SQL Azure Program Manager Cihan Biyikoglu’s blog:

The feature addresses scale-out scenarios for databases in the cloud.  For context, when you want to increase the storage capacity of a system, you can either scale-up (buy a bigger database server connected on a bigger SAN) or you can scale-out (buy another cheap database server and put it next to the old one).

There are many pros & cons for both approach.  Basically with scale-up:

  • You increase capacity without changing much on the applications:  they still connect to one server.
  • The cost structure is geometric and passed a limit there’s just no hardware available.

With scale-out:

  • Depending on how you do it, you can theoretically scale forever and cheaply.
  • Your applications need to change:  they need to know which server to interrogate depending on what data they are manipulating.

In the cloud in general as with Microsoft Cloud offering specifically, the philosophy is scale-out:  massive amount of commodity hardware.

With Microsoft offering we had two scenarios for storage so far:

  • Azure Storage (Table, Queue & Blobs), 100 Tb of capacity limit.
  • SQL Azure, 50 GB of capacity limit.

The former is build for scale but is a new paradigm to use and in all fairness, it doesn’t have the maturity of the SQL Server, it offers basic storage operations.

SQL Azure is easy to use, it’s basically using SQL Server in the cloud if you forget a couple of caveat (such as transaction management & connection stability).  But SQL Azure is your good old relational database and hence is scale-up in nature:  you start at 1GB with the cheapest web edition and end at 50GB with the most expensive business edition.

SQL Azure Federation blurs the cards a little.  It offers a scale-out option to SQL Azure.

Now, don’t get too hyped up:  it’s not like a federation switch you turn on and get infinite scale automatically.  You still have to change your application.  But it’s a huge improvement from managing partitioning yourself.

Let’s look at what it takes to scale-out a database doing horizontal partitioning (or sharding), which is what SQL Azure Federation manages for you.

imageBasically, we partition the data in different partition database, according to a partition key.  A typical example is to take a customer table and partition it by region (regionID would be a partition key) so that all the customers of a given region are in the same database.

Now we need a partition directory in order to map a partition key to a partition database.  Using the same customer example, if we only need 3 partition database, each database will contain more than one region, so we need a mechanism to map the partition key to the partition database.

Complexity emerges when you want to add, remove or rebalance partitions.  You need to shove data from one partition-DB to another and keep the partition directory up-to-date in order to maintain the data integrity of your system.

Partition!

You can do that today and actually, this was Microsoft recommendation with SQL Azure until the introduction of SQL Azure Federation.  What does that product / feature brings to the table?  Basically, it manages the partition:  you specify the partition key range for a given partition and SQL Azure Federation will create a partition DB, move data into it and keep the directory up-to-date.  What it doesn’t do?  It doesn’t determine the partition key, you have to specify them.  It doesn’t determined to partitions (partition key ranges) by itself, you have to do that.  Finally, when you want to do SQL commands, you first have to issue a «USE FEDERATION» command where you basically tell which federation key to use, which routes you (via a partition directory) to the right DB.

So not only you need to do maintenance on the partitions by monitoring the partition DB size but you also need to modify your application to issue this partition-routing command before each query.  A side effect of those constraints is that you can’t do a query spanning more than one partition key, which can mean complexity in your application.

On the other hand, SQL Azure Federation takes care of all the data movement and partition directory for you.  This is no small task.

So if you started to think about how you would manage your sharding strategy by yourself, SQL Azure Federation is a life saver.  If you were wondering how you could have a more than 50GB SQL Azure DB, then SQL Azure Federation must sound pretty complicated to you!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s