Digest: DocumentDB Resource Model and Concepts


Azure DocumentDB was released a few weeks ago, and with it came an early batch of documentation: small in quantity but good in quality.

One of those articles is DocumentDB Resource Model and Concepts. It goes through the different concepts of DocumentDB's inner model.

That article sheds some light on the product but also reveals the extent of the announced features and their limitations. It’s definitely recommended reading if you want to understand the product.

Here I’m going to focus on a few key points I found important.

For starters, here is the concept map of DocumentDB.

Self Links

Did you notice the partial URIs under each box in the diagram (e.g. /dbs/{id}, /users/{id}, etc.)? Those are part of the self link.

Each object in DocumentDB is addressable via a self link, a URI. For instance, from an account base URI you can reach the stored procedure MyProc in the collection MyCollection in the database MyDatabase by the URI <account base URI>/dbs/MyDatabase/colls/MyCollection/sprocs/MyProc.

This is of course a reflection of the fact that DocumentDB exposes a REST API. The SDK reflects the REST API faithfully, so you do not need to traverse the object model to get to a sproc: you can reach it directly by its self link.
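As a small illustration of how such a link is composed, here is a Python sketch that joins the path segments from the diagram. The function name build_link and the example account URI are hypothetical; the code only assembles a string and does not call DocumentDB or its SDK.

```python
from urllib.parse import quote


def build_link(account_base_uri: str, *segments: tuple[str, str]) -> str:
    """Join (resource type, id) pairs into a link under the account base URI."""
    # Each pair becomes one "/<type>/<id>" step, e.g. ("dbs", "MyDatabase").
    path = "".join(f"/{kind}/{quote(ident, safe='')}" for kind, ident in segments)
    return account_base_uri.rstrip("/") + path


link = build_link(
    "https://myaccount.example.com",  # hypothetical account base URI
    ("dbs", "MyDatabase"),
    ("colls", "MyCollection"),
    ("sprocs", "MyProc"),
)
print(link)
```

The point is only that the link mirrors the hierarchy in the concept map: database, then collection, then stored procedure.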

Capacity Units: Account

We configure the number of capacity units at the account level. Each unit comes with CPU and storage; basically, think of them as VMs.

So why would you configure multiple accounts in an architecture? To isolate capacity between workloads. For instance, if you have two workloads requiring a lot of torque that shouldn’t interfere with each other, put them in two different DocumentDB accounts.

Scaling Unit: Collection

A DocumentDB collection is a scaling unit. It is the ultimate transaction boundary: a transaction can’t span two collections (let alone two databases). It is also the one with the size limit: 10 GB in the preview.

We can guess (although it isn’t explicitly stated as such) that a collection is contained within one VM only, hence its capacity to hold a transaction efficiently and its finite size. Collections are likely replicated across capacity units, but one replica of a collection can’t span two capacity units.

Hence if you want more storage, add collections… and start managing partitions yourself, unfortunately.

And there comes my first product request: collections with an eventual consistency policy shouldn’t have a size limit, and the sharding should be managed by DocumentDB itself (hidden from the consumer).
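To give an idea of what managing partitions yourself involves, here is a minimal Python sketch of hash-based routing: a partition key is hashed and mapped to one of several collections. The function pick_collection, the collection names and the key are all hypothetical, and this is only one possible scheme, not something DocumentDB provides.

```python
import hashlib


def pick_collection(partition_key: str, collections: list[str]) -> str:
    """Map a partition key to one collection using a stable hash."""
    # hashlib gives the same result across processes, unlike the built-in hash().
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(collections)
    return collections[index]


collections = ["orders-0", "orders-1", "orders-2"]  # hypothetical collection names
target = pick_collection("customer-42", collections)
print(target)
```

Since a transaction can’t span two collections, documents that must be updated together need to share the same partition key. Note also that with this simple modulo scheme, adding a collection changes where most keys land, so existing documents would have to be moved.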

SSD-backed Document Storage

The documentation mentions in a few places that the storage is backed by solid-state drives (SSDs). There is no mention of tiering, so does it mean the entire DB is stored on SSD?

Automatic but configurable indexing

You do not need to tell DocumentDB how to construct its indexes. By default it indexes documents automatically, without you having to define a schema or secondary indexes.

One thing you can do, though, is set indexing policies. For instance, you could tell DocumentDB it’s alright to update its indexes on a collection asynchronously, hence boosting write performance.

Features like this make DocumentDB look quite sophisticated for a V1 product.

JavaScript as the language

Yes, JavaScript is a popular language these days. But in the case of DocumentDB it serves a purpose other than following fashion.

Its documents are made of JSON, which is basically JavaScript objects. Hence JavaScript is the natural language to manipulate those objects, removing any mismatch between the data and the language manipulating it. Compare this with C#, for instance, where manipulating JSON objects would have meant string manipulation.

Attachments

This was unclear in the initial brochure, but DocumentDB can store more than just JSON. It can attach Binary Large Objects (blobs) to documents. The document then acts as metadata for the attachment.

Users… more roles than users

Users in DocumentDB are aggregations of permissions. As a user you do not authenticate per se against DocumentDB. Hence the concept is more akin to a role.
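To picture a user as a bag of permissions, here is a purely illustrative Python model. The class, its fields and the "read" / "all" mode names are assumptions made for this sketch; it is not the DocumentDB SDK and it talks to no service.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """A user modelled as nothing more than a set of permissions."""
    id: str
    # Maps a resource link to the access mode granted on it (assumed: "read" or "all").
    permissions: dict[str, str] = field(default_factory=dict)

    def can(self, resource_link: str, mode: str) -> bool:
        """Return True if this user holds the requested mode on the resource."""
        granted = self.permissions.get(resource_link)
        return granted == "all" or granted == mode


reporting = User("reporting", {"/dbs/MyDatabase/colls/MyCollection": "read"})
print(reporting.can("/dbs/MyDatabase/colls/MyCollection", "read"))
```

There is no password or identity in this model, only what the holder is allowed to do, which is why it reads more like a role than a user.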

Conclusion

DocumentDB is quite a complete and elegant product. Despite being in preview mode and being a “0.9” version, its feature set makes it feel like a strong v1 or even v2.

I hope this digest gave you a few pointers.
