Tag Archives: NoSQL

Not Only SQL (NoSQL); Azure DocumentDB, HDInsight HBase & Table Storage

Invoking a Stored Procedure from a partitioned CosmosDB collection from Logic Apps

I struggled a little to make this work, so I thought I would share what I learned to accelerate your future endeavours.

I was looking at a way to populate a CosmosDB quickly with random data.

Stored Procedures came to mind since they skip client-server latency.  We can call a stored procedure that creates hundreds of documents with random data.

Each Stored Procedure runs in a partition, so we need something external to the stored procedure to loop and decide on the partition key.

Enter Logic Apps:  cheap to run and quick to set up.

Stored Procedure

Something important to realize is that some portal features aren’t supported when we deal with a partitioned collection.

One of them is updating the content of a stored procedure (the same goes for triggers).  We therefore need to delete it and re-create it.

Here is the stored procedure we used:

function createRecords(recordCount) {
    var context = getContext();
    var collection = context.getCollection();
    var createdIds = [];

    for (var i = 0; i < recordCount; i++) {
        var documentToCreate = { part: "abc", name: "sample" + i };
        var accepted = collection.createDocument(
            collection.getSelfLink(),
            documentToCreate,
            function (err, documentCreated) {
                if (err) {
                    throw new Error('Error' + err.message);
                }
                else {
                    createdIds.push(documentCreated.id);
                }
            });

        // createDocument returns false when the procedure is about
        // to run out of time; stop queuing creations at that point
        if (!accepted) {
            break;
        }
    }

    context.getResponse().setBody(createdIds);
}

We take the number of documents to create as a parameter, then loop & create the documents.  We return the created document IDs as a list in the response body.

The documents we create are trivial:  no random data.
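The stored procedure above could easily be extended to generate random content.  As a sketch, here is a plain JavaScript factory that could replace the documentToCreate literal; the part property matches the stored procedure above, while the name pool and age range are hypothetical:

```javascript
// Sketch of a random-document factory for the stored procedure above.
// "part" mirrors the partition key property used in the procedure;
// the name pool and age range are made up for illustration.
function createRandomDocument(partitionKey, index) {
    var names = ["Alice", "Bob", "Carol", "David"];
    var randomName = names[Math.floor(Math.random() * names.length)];

    return {
        part: partitionKey,
        name: randomName + "-" + index,
        age: Math.floor(Math.random() * 100)
    };
}
```

Inside the loop, documentToCreate would then simply become createRandomDocument("abc", i).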

Logic App

On the canvas, let’s type Cosmos in the search box for actions.


Let’s choose Execute stored procedure.

We are prompted to create a new Cosmos DB connection.  We need to:

  • Type a name for the connection (purely for readability, can be anything)
  • Select an existing Cosmos DB collection

We can then pick the database ID, the collection ID & the stored procedure ID.


Stored Procedure parameters are expressed as a JSON array.  For instance, here we want to pass 1000 as the recordCount parameter, so we type [1000]:  no parameter name and always square brackets.  With multiple parameters, we would simply list them in order, e.g. [1000, "abc"].

If we ran the app now, we would get an error stating the operation requires the partition key.

In order to set the partition key, we need to Show advanced options.


In order to specify the partition key value, we simply type its value:  no square brackets, no quotes.

Now we can run the Logic App:  it executes the stored procedure and captures its result in the action’s output.

Summary

Invoking a Cosmos DB stored procedure from a Logic App isn’t rocket science, but there are a few items to get straight in order for it to work properly.


Hacking: changing Cosmos DB Portal experience from Graph to SQL

In the last article, we looked at how we could access a graph using the SQL (aka DocumentDB) API.

Here we’ll explore how we can switch the Portal experience from one to the other.

Portal Experience

The Portal Experience refers to the way the portal lets us interact with Cosmos DB Data.  It’s basically the Data Explorer experience.

Here we have the Cosmos DB Graph experience:


The Data Explorer lets us access the Graph using Gremlin and displays results in a Graph UI experience (i.e. showing vertices & edges).

Let’s compare this to the Cosmos DB SQL (aka DocumentDB) experience:


Here we query collections using SQL queries and results are shown as JSON documents.

CosmosDB in ARM

The schema for the CosmosDB Database Account JSON ARM template is documented here.

There are two important properties for the Cosmos DB model (i.e. SQL, Graph, Table or MongoDB):  kind and defaultExperience (on the fourth and seventh lines of the template below, respectively).

{
  "apiVersion": "2015-04-08",
  "type": "Microsoft.DocumentDB/databaseAccounts",
  "kind": "[parameters('kind')]",
  "name": "[parameters('databaseAccountName')]",
  "tags": {
    "defaultExperience": "[parameters('experience')]"
  },
  "location": "[resourceGroup().location]",
  "properties": {
    "name": "[parameters('databaseAccountName')]",
    "databaseAccountOfferType": "[variables('offerType')]",
    "consistencyPolicy": {
      "defaultConsistencyLevel": "[parameters('consistencyLevel')]",
      "maxStalenessPrefix": "[parameters('maxStalenessPrefix')]",
      "maxIntervalInSeconds": "[parameters('maxIntervalInSeconds')]"
    }
  }
}

Kind takes one of the following values:  GlobalDocumentDB, MongoDB & Parse.  It defines how the database engine is configured.  This property must be supplied at creation time and can’t be changed afterwards.

DefaultExperience takes one of the following values:  DocumentDB, MongoDB, Graph & Table.  It influences only how the portal behaves.  This property is optional and can be changed in any update deployment.

When creating a Cosmos DB account in the Portal, here is the mapping of the values.  The API column refers to the drop-down value selected in the portal at account creation.

API               | Kind             | Default Experience
------------------|------------------|-------------------
SQL (DocumentDB)  | GlobalDocumentDB | DocumentDB
MongoDB           | MongoDB          | MongoDB
Gremlin (graph)   | GlobalDocumentDB | Graph
Table (key-value) | GlobalDocumentDB | Table

We notice the Kind value Parse isn’t mapped to any model yet.  It is used for the Parse Server offering.

Changing the experience

With all that said, we can easily change the default experience from one ARM deployment to another.  The template is available on GitHub.

Also, since the experience is a simple tag, it can be changed using PowerShell or even the Portal.


Summary

Although the fundamental database engine is set at the creation of the account, the portal experience can be changed.

Therefore, if it is convenient to change the experience in order to execute some tasks, it is possible to do so without impacting the underlying database.

Hacking: accessing a graph in Cosmos DB with SQL / DocumentDB API

Azure Cosmos DB is Microsoft’s globally distributed multi-model database service.

At this point in time (August 2017) there are four supported models:  DocumentDB (also named SQL because the query language is similar to T-SQL), MongoDB, Table & Gremlin.

We’ve seen how to use Cosmos DB with Gremlin in a past article.

Now here’s a little secret:  although we choose the “model” (e.g. Gremlin) at the Cosmos DB account level, we can use other models to query the data.

Not all combinations are possible, but many are.  Specifically, we can query a Gremlin graph using the DocumentDB / SQL query language.

The graph is then projected into documents.

We will explore that in this article.

Why is that interesting?  Because there are a lot of tools out there we might be familiar with to manipulate DocumentDB (or MongoDB).  Having the possibility to look at a graph with other APIs extends our toolset beyond Gremlin-based ones.

Creating a simple graph in Gremlin

Let’s create a simple graph in a Cosmos DB using Gremlin.  In a past article we’ve looked at how to setup Gremlin with Cosmos DB.


gremlin> :remote connect tinkerpop.server conf/remote-secure.yaml

gremlin> :> g.addV('person').property('id', 'Alice').property('age', 42).property('department', 'stereotype')

gremlin> :> g.addV('person').property('id', 'Bob').property('age', 24).property('department', 'support character')

gremlin> :> g.V('Alice').addE('communicatesWith').property('id', 'AliceToBob').property('language', 'English').to(g.V('Bob'))

The first line is there to connect to the remote server we configured in remote-secure.yaml.  For details see the setup article.

We now have a toy graph with two vertices connected with one edge.  Nothing too fancy but that will be enough for our purpose.

image

We can note the following:

  • We provided the ids of the objects; this isn’t always possible in graph databases but is with Cosmos DB (if we don’t provide one, a randomly generated GUID is used)
  • We provided a custom property (i.e. language) on the edge
  • The graph’s partition key is department, hence we provided it for each vertex

Document Query

The code is available on GitHub, more specifically in the Program.cs file.

Here we build on the code from the Cosmos DB async streaming article.  We simply read all the documents in the graph with DocumentDB API and output them in JSON format:


private async static Task ListAllDocumentsAsync(
    DocumentClient client,
    Uri collectionUri)
{
    var query = client.CreateDocumentQuery(
        collectionUri,
        new FeedOptions
        {
            EnableCrossPartitionQuery = true
        });
    var queryAll = query.AsDocumentQuery();
    var all = await GetAllResultsAsync(queryAll);

    Console.WriteLine($"Collection contains {all.Length} documents:");

    foreach (var d in all)
    {
        var json = GetJson(d);

        // Clean up the edge created later in this article so the
        // sample can be re-run from scratch
        if (d.Id == "CarolToAlice")
        {
            await client.DeleteDocumentAsync(
                d.SelfLink,
                new RequestOptions
                {
                    PartitionKey = new PartitionKey(d.GetPropertyValue<string>("department"))
                });
        }

        Console.WriteLine(json);
    }

    Console.WriteLine();
}

The output should be the following:


{
  "id": "Bob",
  "_rid": "smp9AKyqeQADAAAAAAAABA==",
  "_self": "dbs/smp9AA==/colls/smp9AKyqeQA=/docs/smp9AKyqeQADAAAAAAAABA==/",
  "_ts": 1504096168,
  "_etag": "\"00001c04-0000-0000-0000-59a6afad0000\"",
  "label": "person",
  "age": [
    {
      "_value": 24,
      "id": "88a659bf-84d1-4c13-8450-ee57b426b7b3"
    }
  ],
  "department": "support character"
}
{
  "id": "Alice",
  "_rid": "smp9AKyqeQAKAAAAAAAABg==",
  "_self": "dbs/smp9AA==/colls/smp9AKyqeQA=/docs/smp9AKyqeQAKAAAAAAAABg==/",
  "_ts": 1504096164,
  "_etag": "\"0000ed09-0000-0000-0000-59a6afa60000\"",
  "label": "person",
  "age": [
    {
      "_value": 42,
      "id": "78109dc8-587f-4d87-9d2e-e4a1731dec2b"
    }
  ],
  "department": "stereotype"
}
{
  "id": "AliceToBob",
  "_rid": "smp9AKyqeQALAAAAAAAABg==",
  "_self": "dbs/smp9AA==/colls/smp9AKyqeQA=/docs/smp9AKyqeQALAAAAAAAABg==/",
  "_ts": 1504096178,
  "_etag": "\"0000ee09-0000-0000-0000-59a6afb40000\"",
  "label": "communicatesWith",
  "language": "English",
  "_sink": "Bob",
  "_sinkLabel": "person",
  "_sinkPartition": "support character",
  "_vertexId": "Alice",
  "_vertexLabel": "person",
  "_isEdge": true,
  "department": "stereotype"
}

We can learn a lot from this projection:

  • Vertices are pretty close to simple DocumentDB documents; the properties starting with an underscore (_) are the usual DocumentDB metadata (e.g. _self)
  • Vertex properties (e.g. age) are represented as an array of complex sub-structures (_value and an id); this is because in Gremlin a vertex’s (or edge’s) property can have multiple values
  • Edges are more complex
    • A metadata property _isEdge appears to be the discriminator between a vertex and an edge
    • _vertexId & _vertexLabel identify the “source” of the edge (the starting point)
    • _sink, _sinkLabel & _sinkPartition identify the “target” of the edge (the destination point)
    • The partition of the edge is the same as the “source” vertex’s, even if we didn’t specify it in Gremlin
    • The custom property language is a flat property, not a complex one with arrays as in the vertices
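Given those observations, a small helper can split a raw document feed into vertices and edges.  This is a sketch relying only on the _isEdge discriminator noted above:

```javascript
// Split a list of raw graph documents into vertices and edges,
// using the _isEdge discriminator observed in the projection above.
function splitGraphDocuments(documents) {
    var vertices = documents.filter(function (d) { return !d._isEdge; });
    var edges = documents.filter(function (d) { return d._isEdge === true; });

    return { vertices: vertices, edges: edges };
}
```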

Given that information, we can easily write queries, for instance, to list only vertices:


private class MinimalDoc
{
    public string id { get; set; }
    public bool? _isEdge { get; set; }
}

private async static Task ListOnlyVerticesAsync(
    DocumentClient client,
    Uri collectionUri)
{
    var query = client.CreateDocumentQuery<MinimalDoc>(
        collectionUri,
        new FeedOptions
        {
            EnableCrossPartitionQuery = true
        });
    var queryVertex = (from d in query
                        where !d._isEdge.HasValue
                        select d).AsDocumentQuery();
    var all = await GetAllResultsAsync(queryVertex);

    Console.WriteLine($"Collection contains {all.Length} documents:");

    foreach (var d in all)
    {
        Console.WriteLine(d.id);
    }

    Console.WriteLine();
}

This should list Alice & Bob but not the edge between them.

Can we write?

Querying is all nice and good, but what about writing?

Let’s try to simply add a document in the graph:


private async static Task AddTrivialVertexAsync(
    DocumentClient client,
    Uri collectionUri)
{
    var response = await client.CreateDocumentAsync(
        collectionUri,
        new
        {
            id = "Carol",
            label = "person",
            department = "support character"
        });
    var json = GetJson(response.Resource);

    Console.WriteLine(json);
}

If we use the Gremlin Console to look at it:


gremlin> :> g.V("Carol")

==>[id:Carol,label:person,type:vertex,properties:[department:[[id:Carol|department,value:support character]]]]

Hence we see the new document as a vertex.  That makes sense since we’ve seen that vertices are projected as simple documents.

If we add other properties as flat values (like we did with label), it will not work:  those properties won’t show up in Gremlin.  That is because, as we’ve seen, Gremlin properties are always collections.  We can do this instead:


private async static Task AddVertexWithPropertiesAsync(
    DocumentClient client,
    Uri collectionUri)
{
    var response = await client.CreateDocumentAsync(
        collectionUri,
        new
        {
            id = "David",
            label = "person",
            age = new[] {
                new
                {
                    id = Guid.NewGuid().ToString(),
                    _value = 48
                }
            },
            department = "support character"
        });
    var json = GetJson(response.Resource);

    Console.WriteLine(json);
}

and in Gremlin:


gremlin> :> g.V("David").valueMap()

==>[age:[48],department:[support character]]

So it appears we can successfully write vertices in a graph using the DocumentDB API.
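To avoid writing that property wrapping by hand, a helper can produce the array-of-{id, _value} shape for us.  This is a sketch; the simplified random id stands in for the GUID Cosmos DB would normally generate:

```javascript
// Wrap a flat value into the [{ id, _value }] shape that Gremlin
// expects for vertex properties. The id is normally a GUID; a
// simplified random id is used here for illustration.
function wrapVertexProperty(value) {
    return [{
        id: Math.random().toString(36).substring(2),
        _value: value
    }];
}

// Example: building a "David"-like vertex document, with department
// as the partition key property (as in this article's graph).
function makeVertexDocument(id, label, partitionKey, customProperties) {
    var doc = { id: id, label: label, department: partitionKey };

    for (var name in customProperties) {
        doc[name] = wrapVertexProperty(customProperties[name]);
    }

    return doc;
}
```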

This is obviously useful to mass-import graphs, since there are a lot of tools out there that can import into DocumentDB.

Writing an edge

We can write vertices.  That is only half the equation for importing data in a graph.  What about edges?

It turns out we simply have to mimic what we’ve seen with existing edges:


private static async Task AddEdgeAsync(DocumentClient client, Uri collectionUri)
{
    var response = await client.CreateDocumentAsync(
        collectionUri,
        new
        {
            _isEdge = true,
            id = "CarolToAlice",
            label = "eavesdropOn",
            language = "English",
            department = "support character",
            _vertexId = "Carol",
            _vertexLabel = "person",
            _sink = "Alice",
            _sinkLabel = "person",
            _sinkPartition = "stereotype"
        });
    var json = GetJson(response.Resource);

    Console.WriteLine(json);
}

It is important for the edge’s partition to be the same as the source vertex, otherwise the edge won’t be seen by Gremlin.
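A sketch of a helper enforcing that constraint:  it builds the edge document from the two vertex documents themselves, copying the partition key (department in this graph) from the source vertex:

```javascript
// Build an edge document from its source and target vertex documents.
// The edge's partition key (department in this graph) is copied from
// the source vertex, which is required for Gremlin to see the edge.
function makeEdgeDocument(id, label, sourceVertex, targetVertex, customProperties) {
    var edge = {
        _isEdge: true,
        id: id,
        label: label,
        department: sourceVertex.department,
        _vertexId: sourceVertex.id,
        _vertexLabel: sourceVertex.label,
        _sink: targetVertex.id,
        _sinkLabel: targetVertex.label,
        _sinkPartition: targetVertex.department
    };

    for (var name in customProperties) {
        edge[name] = customProperties[name];
    }

    return edge;
}
```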

We can validate the edge is now present in Gremlin:


gremlin> :> g.E()

==>[id:CarolToAlice,label:eavesdropOn,type:edge,inVLabel:person,outVLabel:person,inV:Alice,outV:Carol,properties:[language:English]]
==>[id:AliceToBob,label:communicatesWith,type:edge,inVLabel:person,outVLabel:person,inV:Bob,outV:Alice,properties:[language:English]]

gremlin> :> g.V("Carol").out("eavesdropOn")

==>[id:Alice,label:person,type:vertex,properties:[age:[[id:78109dc8-587f-4d87-9d2e-e4a1731dec2b,value:42]],department:[[id:Alice|department,value:stereotype]]]]

Summary

We’ve seen it is possible to both read and write to a Cosmos DB graph using the DocumentDB API.

It would also be possible to do so using the MongoDB API.

An obvious use is to leverage DocumentDB (or MongoDB) tools to manipulate a graph, e.g. for an initial load.

Cosmos DB Async Querying & Streaming

I wrote an article back in January 2015 about async querying Azure DocumentDB using the .NET SDK.

The service was still in preview back then.

Since then DocumentDB has been superseded by Azure Cosmos DB and the SDK has changed a bit so I thought I would rewrite that article.  Here it is.

LINQ was built into .NET / C# before async was.  That is probably the #1 reason why running LINQ queries on an asynchronously fetched data source is so awkward today.  This will likely change one day but until then…

Why Async?

Before we dive into the solution, let’s see why we would want to implement asynchrony in querying.

This was true in 2015 and I hope it is less so today:  a lot of people do not understand what asynchrony is for in .NET.  I always think it’s worthwhile to discuss it.

Let’s try the reverse psychology approach. Here is what asynchrony doesn’t bring us:

  • It doesn’t make our client (e.g. browser) asynchronous; for instance, if we implement it in a service call, it doesn’t make the caller asynchronous (e.g. Ajax)
  • It doesn’t bring us performance per se
  • It doesn’t make our code run on multiple threads at once

Asynchrony allows us to… SCALE our server code.  It allows us to multiplex our server, to serve more concurrent requests.  If we do not have scaling issues, we might not need asynchrony.

The reason it allows us to scale is that when we async / await on an I/O call (e.g. a Cosmos DB remote call), it frees the current thread to be used by another request until the call comes back, allowing us to serve more requests with fewer threads and less memory.

Solution

The code is available on GitHub, more specifically in the Program.cs file.

The important part is to recognize that the query object (IDocumentQuery<T>) from the SDK is an asynchronous interface.  It fetches new results in batches.  So we can write a method to fetch all the results like this one:

private async static Task<T[]> GetAllResultsAsync<T>(IDocumentQuery<T> queryAll)
{
    var list = new List<T>();

    while (queryAll.HasMoreResults)
    {
        var docs = await queryAll.ExecuteNextAsync<T>();

        foreach (var d in docs)
        {
            list.Add(d);
        }
    }

    return list.ToArray();
}

Or one that allows us to process all the items in the query with an action:

private async static Task<int> ProcessAllResultsAsync<T>(
    IDocumentQuery<T> queryAll,
    Action<T> action)
{
    int count = 0;

    while (queryAll.HasMoreResults)
    {
        var docs = await queryAll.ExecuteNextAsync<T>();

        foreach (var d in docs)
        {
            action(d);
            ++count;
        }
    }

    return count;
}

We can create a query object with no fancy LINQ expression, i.e. basically querying the entire collection, like this:

var client = new DocumentClient(new Uri(SERVICE_ENDPOINT), AUTH_KEY);
var collectionUri = UriFactory.CreateDocumentCollectionUri(DATABASE, COLLECTION);
var query = client.CreateDocumentQuery(
    collectionUri,
    new FeedOptions
    {
        EnableCrossPartitionQuery = true
    });
var queryAll = query.AsDocumentQuery();

That code queries the entire collection and returns an array of Document objects.

We could also serialize into a custom object and filter the query:

var query = client.CreateDocumentQuery<MinimalDoc>(
    collectionUri,
    new FeedOptions
    {
        EnableCrossPartitionQuery = true
    });
var queryNoDog = (from d in query
                    where d.id != "Dog"
                    select d).AsDocumentQuery();

In the code sample there are 4 examples using different variations.

Summary

Asynchrony is a powerful way to scale server-side code.

Cosmos DB allows us to do that easily, as demonstrated in this article.

Cosmos DB & Graph with Gremlin – Getting Started

Azure Cosmos DB is Microsoft’s globally distributed multi-model database service.

One of the paradigms it supports is Graph:  Cosmos DB can be used to store and query graphs.

At the time of this writing, it supports one interface, Gremlin, which is part of the Apache TinkerPop project.

This means we can use any Gremlin Console to connect to a Cosmos DB graph.

That is well documented, so I won’t reproduce the steps here.  Instead, I’m going to point to the documentation.

Understanding Gremlin

First, let’s understand Gremlin.  Gremlin is to graph data what SQL is to relational data; it is a graph traversal language.  Except the debate hasn’t fully settled in the graph world, and Gremlin has meaningful competition (e.g. Cypher).

The TinkerPop project site contains very good documentation for getting started with Gremlin.  Its sales pitch is “learn it in 30 minutes” and that’s pretty accurate.

Once we’ve absorbed that, we can go deeper with the online exhaustive documentation.

Gremlin with Cosmos DB

Azure documentation has a good guide to both create a Cosmos DB graph and connect to it with a Gremlin Console.

We can download the Gremlin Console from TinkerPop’s site.  It contains both Windows & Unix consoles.

Personally, I’ve installed it in the Linux subsystem on Windows 10 (when in Rome…).

The only trick is that it isn’t an apt-get package, and we need Java 1.8 to run it.  See Oracle’s instructions to install it properly.  There seems to have been a split between versions 1.7 and 1.8, and the package for 1.7 doesn’t upgrade to 1.8.

Using Gremlin on Cosmos DB

It is pretty straightforward by following the instructions.

The only counterintuitive aspect is that we need to prefix every Gremlin command with :> in order to reach Cosmos DB (or any remote service in general) from within the Gremlin Console.

Summary

Cosmos DB supports Gremlin as an interface to command & query its graphs.

This article was meant to simply list the links to quickly get started in that scenario.

DocumentDB protocol support for MongoDB

Microsoft announced, in the wake of many DocumentDB announcements, that DocumentDB would support the MongoDB protocol.

What does that mean?

It means you can now swap a MongoDB for a DocumentDB and the client (e.g. your web application) will work the same.

This is huge.

It is huge because Azure, and the cloud in general, have few Database-as-a-Service offerings.

Azure has SQL Database, SQL Data Warehouse, Redis Cache, Search & DocumentDB.  You could argue that Azure Storage (Blob, Table, queues & files) is also one.  HBase under HDInsight could be another.  Data Lake Store & Data Lake Analytics too.

Still, compare that to any list of the main players in NoSQL and fewer than 10 services isn’t much.  For all the other options, you need to build them on VMs.  Since those are database workloads, optimizing their performance can be tricky.

MongoDB is a leader in the document-oriented NoSQL databases space.

With the recent announcement, this means all MongoDB clients can potentially / eventually run on Azure with much less effort.

And this is why this is huge news.

A different account

For the time being, DocumentDB supports MongoDB through a different type of DocumentDB account.

You need to create your DocumentDB account as a DocumentDB – Protocol Support for MongoDB.

You’ll notice the portal interface is different for such accounts.

You can then access those accounts using familiar MongoDB tools such as MongoChef.

But you can still use DocumentDB tools to access your account too.

Summary

In a way you could say that Azure now has MongoDB as a Service.

A big caveat is that the supported protocol surface isn’t 100%.  CRUD operations are supported, and the rest is being prioritized and worked on.

Yet, the data story in Azure keeps growing.

UPDATE:  To get started, check out:

 

Azure DocumentDB Demo

On December 1st, 2015, I’m doing a presentation to a Montreal user group, MS DEV MTL. Here is the script of each demo.  Enjoy!

UPDATE:  You can see the presentation slides here.

 

Account Creation & Adding Documents

For the creation of an Azure DocumentDB account, allow me to refer to my earlier post, Creating an Azure DocumentDB account.

In order to add a database, in your DocumentDB account blade, click “Add Database”, name it Demo-DB.

Select that database; that will open the database blade. Click “Add Collection”, name it demo. Change the price tier to S1.

Select the collection you just created. That will open the collection blade. We are going to create two documents. For that, click “Create Document” on top of the collection blade. First document:

{
  "firstName" : "Vincent-Philippe",
  "lastName" : "Lauzon",
  "office" : "MTL"
}

{
  "office" : "MTL",
  "address" :
  {
    "streetNumber" : 2000,
    "streetName" : "McGill College",
    "streetType" : "Avenue",
    "inBuilding" : "Suite 500",
    "postalCode" : "H3A 3H3"
  }
}

Now, let’s look at those documents within the collection. In the collection blade, click “Document Explorer” (at the bottom). You will notice a few things:

  • Both documents had an id property added, containing a generated GUID
  • The two documents didn’t have the same schema
  • JSON types (string and number) were used

Let’s add a third document:

{
  "firstName" : "John",
  "lastName" : "Smith",
  "office" : "Calgary",
  "id" : "emp-john-smith",
  "phoneNumber" : "123-456-7890"
}

You can go ahead and look at the document and observe that:

  • We manually inserted the id of the document here; DocumentDB used it instead of generating one
  • The schema was slightly different than the other employee documents’

Simple Querying

For querying, in the collection blade, click “Query Explorer”. Leave the query as is, i.e.

 

SELECT * FROM c

 

Let’s observe a few things:

  • In the query, c stands for the collection. It is a variable name: you can replace c with whatever identifier you fancy
  • The result is a JSON array containing the original documents
  • The documents have more “metadata”, i.e. properties starting with _, such as _ts, the timestamp

Let’s try something slightly less trivial:

 

SELECT *
FROM c
WHERE c.firstName != null

 

Now we have only the employees, i.e. we skipped the MTL office document.

The following query does a projection or a JSON transformation:

 

SELECT
{"firstName":c.firstName, "lastName":c.lastName} AS name,
c.office
FROM c
WHERE c.firstName!=null

 

This yields the following results:

[
  {
    "name": {
      "firstName": "Vincent-Philippe",
      "lastName": "Lauzon"
    },
    "office": "MTL"
  },
  {
    "name": {
      "firstName": "John",
      "lastName": "Smith"
    },
    "office": "Calgary"
  }
]

 

This demonstrates how DocumentDB merges the power of T-SQL with the JavaScript language seamlessly.
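To make the projection semantics concrete, here is the equivalent JavaScript mapping over the raw documents (a sketch using the demo's property names):

```javascript
// JavaScript equivalent of the SQL projection above: filter out
// documents without a firstName, then reshape each one into a
// { name: { firstName, lastName }, office } object.
function projectEmployees(documents) {
    return documents
        .filter(function (c) { return c.firstName != null; })
        .map(function (c) {
            return {
                name: { firstName: c.firstName, lastName: c.lastName },
                office: c.office
            };
        });
}
```

Running the three demo documents through this function yields the same two-element array as the query.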

To explore more about querying, go to the querying playground where you can explore interactively (web browser).

Indexing Policy

To look at the current indexing policy of a collection, in the collection blade, click “Indexing Policy”. Typically, you’ll see the following:

 

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Hash",
          "dataType": "String",
          "precision": 3
        },
        {
          "kind": "Spatial",
          "dataType": "Point"
        }
      ]
    },
    {
      "path": "/\"_ts\"/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Hash",
          "dataType": "String",
          "precision": 3
        }
      ]
    }
  ],
  "excludedPaths": []
}

 

where you can observe

  • Indexing is consistent (done synchronously with changes)
  • Indexing is automatic
  • Includes all properties
  • Numbers have range indexes, strings have hash indexes and points have spatial indexes
  • The timestamp (_ts) has both range & hash indexes
  • No paths are excluded

Looking at consistency level

Go into your DocumentDB account blade; at the bottom, under “Configuration”, click “Default consistency”.

You can actually see the definitions of each level in the portal.

SDK Demo

Start up a new Console App project. Get the NuGet package Microsoft.Azure.DocumentDB.

Everything orbits around the DocumentClient component. To instantiate one, you need information from your DocumentDB account. In the account blade, click the key icon.

You’ll need:

  • URI (serviceEndPoint in the SDK)
  • Primary key (authKey in the SDK)

In the code, simply instantiate it as:

private static readonly DocumentClient _docClient = new DocumentClient(
    new Uri(ENDPOINT),
    AUTH_KEY,
    ConnectionPolicy.Default,
    ConsistencyLevel.Session);

Here you see that you can override the connection policy (see this post for details) and the consistency level for the connection.

The rest of the code will use the method “QueryAsync” defined in this post.

First, let’s find our collection, in a purely scalable way:

 

private async static Task<DocumentCollection> GetCollectionAsync()
{
    var dbQuery = from db in _docClient.CreateDatabaseQuery()
                  where db.Id == DB_NAME
                  select db;
    var database = (await QueryAsync(dbQuery)).FirstOrDefault();
    var collectionQuery = from col in _docClient.CreateDocumentCollectionQuery(database.AltLink)
                          where col.Id == COLLECTION_NAME
                          select col;
    var collection = (await QueryAsync(collectionQuery)).FirstOrDefault();

    return collection;
}

 

What we do here is basically search for our database among the databases of the account by querying the database list, then do the same thing with the collection.

The interesting point to notice here is that we do everything async, including querying. There is nothing blocking here.

Let’s define an employee object, a PONO:
public class Employee
{
    [JsonProperty("id")]
    public string ID { get; set; }

    [JsonProperty("firstName")]
    public string FirstName { get; set; }

    [JsonProperty("lastName")]
    public string LastName { get; set; }

    [JsonProperty("office")]
    public string Office { get; set; }

    [JsonProperty("phoneNumber")]
    public string PhoneNumber { get; set; }
}
Here we use attributes to map property names, bridging the gap between JavaScript and C# naming conventions, i.e. the fact that JavaScript properties typically start with a lowercase letter while C# properties start with an uppercase one. Other approaches could have been used.

Let’s define a method to find me:

 

private async static Task<Employee> QueryVinceAsync(DocumentCollection collection)
{
    var employees = from e in _docClient.CreateDocumentQuery<Employee>(collection.SelfLink)
                    where e.FirstName == "Vincent-Philippe"
                    select e;
    var vincent = (await QueryAsync(employees)).FirstOrDefault();

    return vincent;
}

 

Here, we again do a query, this time on the documents within a collection. We strongly type the query with the employee type. That doesn’t filter out non-employees though; the filter on the query does that: it searches for documents having a firstName property equal to Vincent-Philippe. Documents without such a property obviously fail that filter.

Then we can look at the code of the demo:

 

private static async Task DemoAsync()
{
    var collection = await GetCollectionAsync();
    var vincent = await QueryVinceAsync(collection);
    var newEmployee = new Employee
    {
        FirstName = "Jessica",
        LastName = "Jones",
        Office = "Hell's Kitchen",
        PhoneNumber = "Unknown"
    };
    var newEmployeeResponse =
        await _docClient.CreateDocumentAsync(collection.SelfLink, newEmployee);

    // ID of the created employee document
    Console.WriteLine(newEmployeeResponse.Resource.Id);
}

 

The interesting point here is the return type of the document creation method. Since the SDK is a thin wrapper around REST calls, the response exposes everything returned by the REST call. Of interest: newEmployeeResponse.RequestCharge. Here it is 6.1, expressed in Request Units (RUs). This helps you figure out the pricing tier you need.