Setup for populating Cosmos DB with random data using Logic Apps

We recently published an article about Cosmos DB Performance with Geospatial Data.

In this article, we’re going to explain how to set up the environment in order to run those performance tests.

More importantly, we believe this article is interesting on its own as it shows how to use Logic Apps to populate a Cosmos DB collection with random data in a very efficient way.

For this, we will use a stored procedure, as we explored in a past article.

The ARM Template is available on GitHub.

Azure Resources

We want to create three main Azure Resources:

We will also need to create artefacts within the Cosmos DB account.  Namely:

* A collection
* A stored procedure (used to create the random documents)

ARM Template Deployment

Let’s create the Azure resource using the ARM template deployment available on GitHub (see deployment buttons at the bottom of the page).

The template has four parameters.  The first one is mandatory, the other three have default values:

If we leave the defaults as-is, we’ll have 4000 x 300 = 1.2 million documents, with a third of them (i.e. 400 000) having geospatial locations.  This corresponds to what we used for the performance tests.
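As a sanity check, the arithmetic behind those defaults works out as follows.  The parameter names used here are assumptions for illustration; the template’s actual parameter names may differ:

```javascript
// Sanity check on the default data volume.
// "batchCount" and "documentsPerBatch" are assumed names for the
// template parameters; the real names may differ.
const batchCount = 4000;        // number of stored procedure invocations
const documentsPerBatch = 300;  // documents created by each invocation

const totalDocuments = batchCount * documentsPerBatch;
const geoDocuments = totalDocuments / 3;

console.log(totalDocuments);  // 1200000
console.log(geoDocuments);    // 400000
```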

Creating a Collection

Unfortunately, the Cosmos DB resource provider doesn’t expose sub-components in the ARM model, so we can’t create the collection within the ARM template.

We could use the Command Line Interface (CLI) for Cosmos DB.  Here we will use the Portal.

Let’s open the Cosmos DB Account resource created by the ARM template.

Let’s go to the Data Explorer tab.


Let’s then select New Collection and fill in the form.

There are a few important fields in there:

Modifying Index Policy

While still in the Data Explorer, let’s select Scale & Settings on our collection.

At the bottom of the pane, let’s edit the Indexing Policy.

Geospatial data isn’t indexed by default.  We therefore need to add at least the “Point” data type for indexing.

The procedure is explained in the public documentation.

It is important to do this before loading the data so the data is indexed on load instead of asynchronously indexed after a change of policy.
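For reference, a policy that also indexes points looks roughly like this in the indexing-policy format of the time (a sketch only; the paths and precision values should be adapted to the collection):

```json
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        { "kind": "Range", "dataType": "Number", "precision": -1 },
        { "kind": "Range", "dataType": "String", "precision": -1 },
        { "kind": "Spatial", "dataType": "Point" }
      ]
    }
  ],
  "excludedPaths": []
}
```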

Creating Stored Procedure

While still in the Data Explorer, let’s select New Stored Procedure.

Let’s enter createRecords as Stored Procedure Id.

For the body, let’s copy-paste the content of CreateRecords.js.

Click Save.
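The actual body is in CreateRecords.js.  Purely as an illustration of the kind of logic such a stored procedure contains, here is a sketch of a random-document generator; the document shape and property names are assumptions, not the real script:

```javascript
// Illustrative sketch only: the real body is in CreateRecords.js.
// Generates one random document; roughly one third of documents get a
// GeoJSON point, matching the ratio mentioned in the article.
// The property names ("part", "profile", "location") are assumptions.
function createRandomDocument(partitionKey) {
  const doc = {
    part: partitionKey,
    profile: Math.random()  // some random payload
  };
  if (Math.random() < 1 / 3) {
    doc.location = {
      type: "Point",
      coordinates: [
        Math.random() * 360 - 180,  // longitude in [-180, 180)
        Math.random() * 180 - 90    // latitude in [-90, 90)
      ]
    };
  }
  return doc;
}
```

In the real stored procedure, documents like these are created in a loop with the server-side `getContext().getCollection().createDocument(...)` API until the batch size is reached.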

Increase RUs

Before going into the Logic App, let’s beef up the Request Units (RUs) of our collection.


We suggest boosting it to the maximum, i.e. 100 000.

Then click Save.

Executing the Logic Apps

Let’s open the Logic App in the same Azure Resource Group.


The whole point of using Logic Apps here is to have a component that invokes the Cosmos DB stored procedure in parallel in a reliable fashion.

Let’s click on Run Trigger and then manual (in the submenu).

A run calls the stored procedure 4000 times and takes about 5-6 minutes to do so.
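Conceptually, the fan-out that the Logic App performs can be sketched as follows, where `invokeStoredProcedure` is a stand-in for the Cosmos DB connector action (the names and the concurrency control are illustrative, and the connector’s retry behaviour isn’t modelled):

```javascript
// Conceptual sketch of the Logic App's parallel fan-out.
// "invokeStoredProcedure" stands in for the Cosmos DB connector action.
async function populate(invokeStoredProcedure, batchCount, concurrency) {
  let next = 0;
  // Run "concurrency" workers, each pulling the next batch index
  // until all batches have been dispatched.
  const worker = async () => {
    while (next < batchCount) {
      const batchIndex = next++;
      await invokeStoredProcedure(batchIndex);
    }
  };
  await Promise.all(Array.from({ length: concurrency }, worker));
}
```

With 4000 invocations creating 300 documents each, this is where the 1.2 million documents come from.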

Reduce RUs

Do not forget to scale the collection back down.  The provisioned throughput is the main cost driver for a collection.

Summary

Loading random data quickly into Cosmos DB is best done by leveraging stored procedures, as they run close to the data and can create documents very quickly.

Stored procedures run within a single partition.  So we also need something to loop over the partitions, and this is what the Logic App does here.

Logic Apps are also very cost-effective, since they are serverless resources which incur costs only when used.


4 responses

  1. Kannan G 2018-05-10 at 05:25

Hi Vincent-Philippe, is it possible to increase the RUs using Logic Apps?

  2. Vincent-Philippe Lauzon 2018-05-10 at 06:30

I don’t think the Cosmos DB connector exposes any management capabilities. You would need to use the Management API. More likely, you could implement a PowerShell script doing that in Azure Automation and expose the runbook as a webhook, which you could invoke from the Logic App.

    Alternatively, if time is no issue, you can simply pump the documents in Event Hub and then use Stream Analytics to direct the documents to Cosmos DB. Stream Analytics will take care of backing off if Cosmos DB with low RU can’t ingest the documents.

  3. gktouch 2018-05-15 at 23:35

    Thanks for input Vincent… will you be able to help me to create powershell script to increase/decrease the RU’s using automation/function.

  4. Vincent-Philippe Lauzon 2018-05-16 at 07:58

    Excellent.
