How to use Azure Data Lake Storage REST API


Azure Data Lake Storage (ADLS) Generation 2 has been around for a few months now.

That new generation of Azure Data Lake Storage integrates with Azure Storage. This makes it a service available in every Azure region. It also makes it easier to access, as it is built on a foundation well known to Azure users.

Unfortunately, there is no SDK yet (at the time of this writing, mid-May 2019). To add insult to injury, in its current form, the Blob API isn’t compatible with the ADLS API. For instance, we can’t simply create a container using the Blob API and expect to see a file system within the account. This would actually fail.

Until this gets easier and/or the APIs become compatible, we need to use the REST API to automate or programmatically access an account.

Azure Storage Explorer and AzCopy are also ADLS gen 2 aware.

In this article, we’ll show how to use the ADLS gen 2 REST API. We will use Logic Apps with a Managed Service Identity (MSI) for simplicity. We are also going to explore fine-grained access control using Azure AD RBAC.

The same could be done using other compute (e.g. Functions, AKS, VMs). We definitely recommend using MSI (or Pod Identity for AKS). MSI allows the acquisition of a bearer token without the need to store Service Principal secrets anywhere. Of course, the same could be done with a Service Principal.
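To make the token acquisition concrete for the VM case, here is a minimal sketch of the request a VM would send to the Azure Instance Metadata Service (IMDS) to get a token for the storage audience. The function names are hypothetical, and the Logic Apps in this article do this for us behind the scenes; other compute such as Functions exposes a different local endpoint.

```python
import json
import urllib.parse
import urllib.request

# IMDS endpoint, reachable only from inside an Azure VM.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"
# Audience (resource) for Azure Storage, as used in this article.
STORAGE_RESOURCE = "https://storage.azure.com/"


def build_msi_token_request(resource: str = STORAGE_RESOURCE) -> urllib.request.Request:
    """Build (but do not send) the token request for the VM's managed identity."""
    query = urllib.parse.urlencode({"api-version": "2018-02-01", "resource": resource})
    # IMDS rejects requests that lack the "Metadata: true" header.
    return urllib.request.Request(
        f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"}, method="GET"
    )


def fetch_msi_token(resource: str = STORAGE_RESOURCE) -> str:
    """Send the request and return the bearer token (works only on a VM with MSI enabled)."""
    with urllib.request.urlopen(build_msi_token_request(resource), timeout=5) as response:
        return json.load(response)["access_token"]
```

No secret appears anywhere in this code, which is the point of using MSI.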

As usual, code is in GitHub.

Deploying demo

Let’s deploy our demo:

Deploy button

There is one parameter: the storage account name. It must be globally unique, as every storage account name must be.

There should be three deployed resources:

Resources

The storage account is what the two logic apps are going to target.

Both Logic Apps have a Managed Service Identity (MSI) associated with them.

Alongside those three, another resource is deployed: a role assignment. Straight from the ARM template, we give the Storage Blob Data Contributor role to the MSI of create-file-systems-adls-api-app. This will allow the Logic App to create file systems.

Create File Systems

Let’s start by creating file systems using create-file-systems-adls-api-app.

Let’s open the logic app in edit mode. We have two HTTP calls: create-blue and create-red. Like Neo, we’re going to choose between the red pill and the blue pill.

Let’s look at one of them:

create-blue

Here we are using the File System / Create API.

This requires an HTTP PUT. We pass the storage account name as a Logic App parameter. We pass blue as a parameter in the URL. We pass the API version as an HTTP header. We also use the Managed Identity to fetch a token for the https://storage.azure.com audience (or scope, or resource, depending on the OAuth lexicon you look at).
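Outside of Logic Apps, the same call could be sketched as follows. This is a minimal sketch rather than the Logic App definition itself: the function name is hypothetical, the `x-ms-version` value is an assumption (the article does not state which version the demo uses), and the token is assumed to have been acquired beforehand.

```python
import urllib.request

# Assumption: an API version that supports the ADLS gen 2 (dfs) endpoint.
API_VERSION = "2018-11-09"


def build_create_filesystem_request(
    account: str, filesystem: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) the File System / Create request."""
    # The file system name is part of the path; resource=filesystem is required.
    url = f"https://{account}.dfs.core.windows.net/{filesystem}?resource=filesystem"
    headers = {
        "x-ms-version": API_VERSION,
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers, method="PUT")


# Usage (would perform a real call, so it is left commented out):
# with urllib.request.urlopen(build_create_filesystem_request("myaccount", "blue", token)) as r:
#     print(r.status)
```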

The create-red call is virtually identical, except that it passes red as the parameter.

We can run this Logic App. We can see that each HTTP task receives a 201 (Created) status code from the API.

We can validate in Azure Storage Explorer that two file systems were created.

List path

Now if we turn to the list-adls-api-app Logic App, we can see two HTTP tasks again: list-blue & list-red:

List

Those are using the Path / List API.

Focusing on list-blue, an HTTP GET method is used. We pass the blue file system in the URL. We also pass the root path “/” (URL-encoded as %2F). We pass the recursive=false parameter. We again use the managed identity.
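As an illustration, the list call could be built like this outside of Logic Apps. It is a sketch under assumptions: the function name is hypothetical, the root path is assumed to travel in the `directory` query parameter of the Path / List API, and the `x-ms-version` value is an assumed one.

```python
import urllib.parse
import urllib.request

# Assumption: an API version that supports the ADLS gen 2 (dfs) endpoint.
API_VERSION = "2018-11-09"


def build_list_paths_request(
    account: str, filesystem: str, token: str, directory: str = "/", recursive: bool = False
) -> urllib.request.Request:
    """Build (but do not send) the Path / List request."""
    # urlencode percent-encodes "/" as %2F, matching what the Logic App sends.
    query = urllib.parse.urlencode(
        {
            "resource": "filesystem",
            "directory": directory,
            "recursive": str(recursive).lower(),
        }
    )
    url = f"https://{account}.dfs.core.windows.net/{filesystem}?{query}"
    headers = {
        "x-ms-version": API_VERSION,
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")
```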

Since this is another Logic App, it has a different managed identity. That identity didn’t have any role assignment.

If we run that logic app, we’ll see both HTTP calls fail with a 403 status code and the following payload:

{
  "error": {
    "code": "AuthorizationPermissionMismatch",
    "message": "This request is not authorized to perform this operation using this permission.\nRequestId:XYZ\nTime:XYZ"
  }
}

This is because the Logic App managed identity doesn’t have any role or access permission.
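When calling the API from our own code, it can help to tell this specific failure apart from other errors. A minimal sketch, assuming the payload shape shown above; the function name is hypothetical.

```python
import json


def is_permission_mismatch(status_code: int, body: str) -> bool:
    """Return True when the response is the 403 'AuthorizationPermissionMismatch' error."""
    if status_code != 403:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not a JSON body, so not the error payload we are looking for.
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    return isinstance(error, dict) and error.get("code") == "AuthorizationPermissionMismatch"
```

A caller could use this to report "the identity lacks a role assignment or ACL entry" rather than a generic failure.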

Give access permission

Instead of assigning a role to the storage account resource, we’ll give permission at the file system level.

For that, we’ll turn to Azure Storage Explorer. We’ll right-click on the blue file system (we chose the blue pill) and select “Manage Access…”.

First, we’ll need the Object ID of our managed identity. We could get it from the output of the ARM template deployment but an easy way to get it is to go to the Identity pane of the Logic App and pick the Object ID right there.

object-id

We are going to input this ID in the “Add user or group” text box of Azure Storage Explorer and press Add.

We are then going to add the following permissions:

permissions

We need the MSI to be able to read a folder and traverse it (execute).

We need to save those permissions in Azure Storage Explorer.
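For readers who prefer to think in terms of ACL strings, the permission we just granted can be written as a POSIX-style entry. This is a minimal sketch, assuming the `user:<object-id>:<rwx>` short form used by ADLS gen 2 ACLs; the function name and the sample Object ID are hypothetical, and the article itself sets the permission through the Storage Explorer UI.

```python
def build_acl_entry(object_id: str, read: bool, write: bool, execute: bool) -> str:
    """Format a user ACL entry such as 'user:<object-id>:r-x'."""
    permissions = (
        ("r" if read else "-")
        + ("w" if write else "-")
        + ("x" if execute else "-")
    )
    return f"user:{object_id}:{permissions}"


# Read + execute (traverse), no write, for a hypothetical managed identity Object ID.
entry = build_acl_entry("00000000-0000-0000-0000-000000000000", read=True, write=False, execute=True)
print(entry)
```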

We can then run the Logic App again and see that the blue file system is now accessible while red remains inaccessible.

partial success

This is in line with our configuration: we gave access to the blue file system but not the red.

ADLS gen 2 allows us to define access control at a granular level (even blob level).

Summary

We were able to invoke two different APIs from the ADLS gen 2 APIs.

We used an Azure Managed Service Identity (MSI) on Logic Apps.

All APIs can be accessed similarly.

Using a Service Principal instead would be slightly different. We would first need to do an authentication call and then use the bearer token for the authorization header of the ADLS gen 2 API call.
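The two steps with a Service Principal could look like the sketch below. It assumes the Azure AD v1 token endpoint with the client credentials grant; the function names are hypothetical, and the tenant ID, client ID and secret are placeholders to be supplied by the caller.

```python
import urllib.parse
import urllib.request


def build_sp_token_request(
    tenant_id: str,
    client_id: str,
    client_secret: str,
    resource: str = "https://storage.azure.com/",
) -> urllib.request.Request:
    """Step 1: build (but do not send) the authentication call for a Service Principal."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "resource": resource,
        }
    ).encode("ascii")
    # A body makes urllib send this as a form-encoded POST.
    return urllib.request.Request(url, data=form, method="POST")


def authorization_header(token_response: dict) -> dict:
    """Step 2: turn the parsed token response into the header for the ADLS gen 2 call."""
    return {"Authorization": f"Bearer {token_response['access_token']}"}
```

Unlike the MSI approach, the client secret has to be stored somewhere, which is why we recommend MSI when it is available.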

Hopefully this can be useful until an SDK for ADLS gen 2 becomes available.


6 thoughts on “How to use Azure Data Lake Storage REST API”

      1. Thank you! and it would be great if you could also add details related to cross tenant (b2b) authentication – i believe its not supported today by Gen2.

  1. Hi Vincent, We are unable to replicate your example in our domain. It errors out with the message “Managed Identity authentication type is not supported”. Has something changed? I see there is a mandatory header value – content-type= application/json which is not added in your example.
    Let me know.
    Thanks.
