Hacking: accessing a graph in Cosmos DB with SQL / DocumentDB API

pexels-photo-264635[1]Azure Cosmos DB is Microsoft’s globally distributed multi-model database service.

At this point in time (August 2017) there are four supported models:  DocumentDB (also named SQL because the query language is similar to T-SQL), MongoDB, Tabular & Gremlin.

We’ve seen how to use Cosmos DB with Gremlin in a past article.

Now here’s a little secret:  although we choose the “model” (e.g. Gremlin) at the Cosmos DB account level, we can use other models to query the data.

Not all combination are possible, but many are.  Specifically, we can query a Gremlin graph using DocumentDB / SQL query language.

The graph is then projected into documents.

We will explore that in this article.

Why is that interesting?  Because there are a lot of tools out there we might be familiar with to manipulate DocumentDB (or MongoDB).  Having to possibility to look at a Graph with other APIs extends our toolset from Gremlin-based ones.

UPDATE (01-11-2017):  A deeper discussion about changing the underlying API can be found in the article Hacking: accessing a graph in Cosmos DB with SQL / DocumentDB API.

Creating a simple graph in Gremlin

Let’s create a simple graph in a Cosmos DB using Gremlin.  In a past article we’ve looked at how to setup Gremlin with Cosmos DB.

[code language=”groovy”]

gremlin> :remote connect tinkerpop.server conf/remote-secure.yaml

gremlin> :> g.addV(‘person’).property(‘id’, ‘Alice’).property(‘age’, 42).property(‘department’, ‘stereotype’)

gremlin> :> g.addV(‘person’).property(‘id’, ‘Bob’).property(‘age’, 24).property(‘department’, ‘support character’)

gremlin> :> g.V(‘Alice’).addE(‘communicatesWith’).property(‘id’, ‘AliceToBob’).property(‘language’, ‘English’).to(g.V(‘Bob’))

[/code]

The first line is there to connect to the remote server we configured in remote-secure.yaml.  For details see the setup article.

We now have a toy graph with two vertices connected with one edge.  Nothing too fancy but that will be enough for our purpose.

image

We can note the following:

Document Query

The code is available on GitHub, more specifically in the Program.cs file.

Here we build on the code from the Cosmos DB async streaming article.  We simply read all the documents in the graph with DocumentDB API and output them in JSON format:

[code language=”groovy”]

private async static Task ListAllDocumentsAsync( DocumentClient client, Uri collectionUri) { var query = client.CreateDocumentQuery( collectionUri, new FeedOptions { EnableCrossPartitionQuery = true }); var queryAll = query.AsDocumentQuery(); var all = await GetAllResultsAsync(queryAll);

Console.WriteLine($"Collection contains {all.Length} documents:");

foreach (var d in all)
{
    var json = GetJson(d);

    if (d.Id == "CarolToAlice")
    {
        await client.DeleteDocumentAsync(
            d.SelfLink,
            new RequestOptions
            {
                PartitionKey = new PartitionKey(d.GetPropertyValue<string>("department"))
            });
    }

    Console.WriteLine(json);
}

Console.WriteLine(); }

[/code]

The output should be the following:

[code language=”javascript”]

{ "id": "Bob", "_rid": "smp9AKyqeQADAAAAAAAABA==", "_self": "dbs/smp9AA==/colls/smp9AKyqeQA=/docs/smp9AKyqeQADAAAAAAAABA==/", "_ts": 1504096168, "_etag": "\"00001c04-0000-0000-0000-59a6afad0000\"", "label": "person", "age": [ { "_value": 24, "id": "88a659bf-84d1-4c13-8450-ee57b426b7b3" } ], "department": "support character" } { "id": "Alice", "_rid": "smp9AKyqeQAKAAAAAAAABg==", "_self": "dbs/smp9AA==/colls/smp9AKyqeQA=/docs/smp9AKyqeQAKAAAAAAAABg==/", "_ts": 1504096164, "_etag": "\"0000ed09-0000-0000-0000-59a6afa60000\"", "label": "person", "age": [ { "_value": 42, "id": "78109dc8-587f-4d87-9d2e-e4a1731dec2b" } ], "department": "stereotype" } { "id": "AliceToBob", "_rid": "smp9AKyqeQALAAAAAAAABg==", "_self": "dbs/smp9AA==/colls/smp9AKyqeQA=/docs/smp9AKyqeQALAAAAAAAABg==/", "_ts": 1504096178, "_etag": "\"0000ee09-0000-0000-0000-59a6afb40000\"", "label": "communicatesWith", "language": "English", "_sink": "Bob", "_sinkLabel": "person", "_sinkPartition": "support character", "_vertexId": "Alice", "_vertexLabel": "person", "_isEdge": true, "department": "stereotype" }

[/code]

We can learn a lot from this projection:

Given that information, we can easily write queries, for instance, to list only vertices:

[code language=”csharp”]

private class MinimalDoc { public string id { get; set; } public bool? _isEdge { get; set; } }

private async static Task ListOnlyVerticesAsync( DocumentClient client, Uri collectionUri) { var query = client.CreateDocumentQuery<MinimalDoc>( collectionUri, new FeedOptions { EnableCrossPartitionQuery = true }); var queryVertex = (from d in query where !d._isEdge.HasValue select d).AsDocumentQuery(); var all = await GetAllResultsAsync(queryVertex);

Console.WriteLine($&quot;Collection contains {all.Length} documents:&quot;);

foreach (var d in all)
{
    Console.WriteLine(d.id);
}

Console.WriteLine(); }

[/code]

This should list Alice & Bob but not the edge between them.

Can we write?

Querying is all nice and good, but what about writing?

Let’s try to simply add a document in the graph:

[code language=”csharp”]

private async static Task AddTrivialVertexAsync( DocumentClient client, Uri collectionUri) { var response = await client.CreateDocumentAsync( collectionUri, new { id = "Carol", label = "person", department = "support character" }); var json = GetJson(response.Resource);

Console.WriteLine(json); }

[/code]

If we use the Gremlin Console to look at it:

[code language=”groovy”]

gremlin> :> g.V("Carol")

==>[id:Carol,label:person,type:vertex,properties:[department:[[id:Carol department,value:support character]]]]

[/code]

Hence we see the new document as a vertex.  That makes sense since we’ve seen that vertices are projected as simple documents.

If we add other simple properties (like we did with label) this will not work.  Those properties won’t show up in Gremlin.  That is because, as we’ve seen, in Gremlin, properties are always collections.  We can do that:

[code language=”csharp”]

private async static Task AddVertexWithPropertiesAsync( DocumentClient client, Uri collectionUri) { var response = await client.CreateDocumentAsync( collectionUri, new { id = "David", label = "person", age = new[] { new { id = Guid.NewGuid().ToString(), _value = 48 } }, department = "support character" }); var json = GetJson(response.Resource);

Console.WriteLine(json); }

[/code]

and in Gremlin:

[code language=”groovy”]

gremlin> :> g.V("David").valueMap()

==>[age:[48],department:[support character]]

[/code]

So it appears we can successfully write vertices in a graph using the DocumentDB API.

This is obviously useful to mass import graphs since there are a lot of tools out there that can import into DocumentDB.

Writing an edge

We can write vertices.  That is only half the equation for importing data in a graph.  What about edges?

It turns out we simply have to mimic what we’ve seen with existing edges:

[code language=”csharp”]

private static async Task AddEdgeAsync(DocumentClient client, Uri collectionUri) { var response = await client.CreateDocumentAsync( collectionUri, new { _isEdge = true, id = "CarolToAlice", label = "eavesdropOn", language = "English", department = "support character", _vertexId = "Carol", _vertexLabel = "person", _sink = "Alice", _sinkLabel = "person", _sinkPartition = "stereotype" }); var json = GetJson(response.Resource);

Console.WriteLine(json); }

[/code]

It is important for the edge’s partition to be the same as the source vertex, otherwise the edge won’t be seen by Gremlin.

We can validate the edge is now present in Gremlin:

[code language=”groovy”]

gremlin> :> g.E()

==>[id:CarolToAlice,label:eavesdropOn,type:edge,inVLabel:person,outVLabel:person,inV:Alice,outV:Carol,properties:[language:English]] ==>[id:AliceToBob,label:communicatesWith,type:edge,inVLabel:person,outVLabel:person,inV:Bob,outV:Alice,properties:[language:English]]

gremlin> :> g.V("Carol").out("eavesdropOn")

==>[id:Alice,label:person,type:vertex,properties:[age:[[id:78109dc8-587f-4d87-9d2e-e4a1731dec2b,value:42]],department:[[id:Alice department,value:stereotype]]]]

[/code]

Summary

We’ve seen it is possible to both read and write to a Cosmos DB graph using the DocumentDB API.

It would also be possible to do so using the MongoDB API.

An obvious use is to leverage DocumentDB (or MongoDB) tools to manipulate a graph, e.g. for an initial load.


4 responses

  1. Razvan Goga (@WhiteRGR) 2017-10-31 at 15:49

    So, if the Graph is actually stored (and queries / written) as JSONs which would be the difference between selecting Graph or SQL when creating a new Cosmo instance (the API setting)? Is there any under the hood process that relies on this? (maybe some indexing…).

    As far as I can tell from the docs you can select SQL and read/write via Gremlin and select Graph and read/write like with any other DocumentDB. Am I missing something?

  2. Vincent-Philippe Lauzon 2017-11-01 at 09:07

    Hi Razvan,

    The main difference would be the Portal experience. In the case of MongoDB, the underlying engine is different.

    Have a look at https://vincentlauzon.com/2017/09/10/hacking-changing-cosmos-db-portal-experience-from-graph-to-sql/ for a more complete discussion.

  3. Razvan Goga (@WhiteRGR) 2017-11-02 at 01:37

    pfff… spot on explanation. Exactly what I wanted to know - this should be part of the CosmosDB docs somehow. The docs (and Channel9 / youtube vids i could find) only hint at this but never clearly explain the issue.

    Thank you very much! Razvan

  4. Vincent-Philippe Lauzon 2017-11-02 at 05:30

    Thanks!

Leave a comment