Azure Data Lake Storage Logic App with Managed Identities

Last time we discussed some gotchas with Azure Data Lake Storage (ADLS) and access control. Those intricacies matter when accessing ADLS using Azure AD authentication.

Unfortunately, Azure AD authentication is only a little more than one year old, so a lot of tools still use the good old storage account access keys. Those have major drawbacks, chief among them that they grant access to everything and provide no traceability.

I’m a big Azure Logic Apps fan and use them for many tasks. One of the great features of Azure Logic Apps is Managed Service Identity, where the Logic App is given an identity, a Service Principal, which we can use within the app. Unfortunately, we can’t use that identity with the ADLS connector.

So today, I’m going to show you how to work around that by calling the ADLS REST API from within a Logic App. This is similar to the sample app I built a year ago, but this time the app is reusable as a generic blob-listing app.

To change things a bit, we’re going to use a user-assigned identity instead of a system-assigned one. I find user-assigned identities useful for Logic Apps since multiple apps can share the same identity if they require the same access. In this case we are deploying a list-blob app, but a read-blob app could share the same identity.

As usual, the code is on GitHub.

Deploy the App

Let’s start by deploying the Logic App:

[Image: Deploy to Azure button]

The ARM template doesn’t take any parameters and deploys two resources:

[Image: the two deployed resources]

We have a Managed Identity and a Logic App. The Managed Identity is the User Assigned Managed Identity we discussed in the previous section. It is bound to an Azure AD Service Principal and used by the Logic App.

Logic App

Let’s open the Logic App.

[Image: Logic App]

This app calls the ADLS Path - List REST API. It has the following parameters:

storageAccount (string, required): Name of the storage account we want to list blobs from.
container (string, required): Name of the container, within the storage account, we want to list blobs from.
directory (string, optional): Directory we want to look into. Defaults to the root of the container.
suffix (string, optional): Suffix of the blobs we’re interested in. Only blobs whose names end with this suffix are returned.
doListDirectories (boolean, optional): Whether to include the traversed directories in the output alongside the blobs. Defaults to false.
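To make the mapping from parameters to REST call concrete, here is a minimal Python sketch of the URL the app is assumed to build for one page of the Path - List API. The `dfs.core.windows.net` endpoint and the `resource`, `directory` and `continuation` query parameters come from the ADLS Gen2 REST API; always passing `recursive=true` is an assumption based on the app traversing directories, and `build_list_paths_url` is a hypothetical helper name. The suffix and doListDirectories parameters are not REST parameters in this sketch; they are assumed to be applied to the output afterwards.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_list_paths_url(storage_account: str, container: str,
                         directory: Optional[str] = None,
                         continuation: Optional[str] = None) -> str:
    """Build the URL for one page of the ADLS Gen2 Path - List API (sketch)."""
    query = {
        "resource": "filesystem",  # list paths within the container (file system)
        "recursive": "true",       # assumption: traverse sub-directories
    }
    if directory:
        query["directory"] = directory  # omitted => root of the container
    if continuation:
        query["continuation"] = continuation  # token returned by the previous page
    # urlencode percent-encodes values, e.g. "/" in a directory becomes %2F
    return (f"https://{storage_account}.dfs.core.windows.net/"
            f"{quote(container)}?{urlencode(query)}")
```

For instance, `build_list_paths_url("myaccount", "mycontainer", directory="raw/2019")` targets the `raw/2019` directory of `mycontainer`.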

Besides processing the parameters, the only “complexity” of the app is handling continuation in the REST API: when there are many blobs to return, the API pages its results and the app needs to call it multiple times.
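The continuation handling is the same loop you would write in any client. The sketch below assumes a hypothetical `fetch_page` callable that performs one Path - List call and returns the page’s `paths` together with the value of the `x-ms-continuation` response header (empty or absent on the last page). The loop keeps passing that token back as the `continuation` query parameter until none is returned, which is what the until-continuation shape is assumed to do.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical: performs one Path - List call with the given continuation token
# (None on the first call) and returns (paths on this page, next token or None).
FetchPage = Callable[[Optional[str]], Tuple[List[Dict], Optional[str]]]


def list_all_paths(fetch_page: FetchPage) -> List[Dict]:
    """Accumulate paths across pages until no continuation token is returned."""
    paths: List[Dict] = []
    continuation: Optional[str] = None
    while True:
        page, continuation = fetch_page(continuation)
        paths.extend(page)
        # An empty or missing x-ms-continuation header means this was the last page.
        if not continuation:
            return paths
```

Injecting the page fetcher keeps the paging logic testable without a storage account: a dictionary of canned pages keyed by token is enough to exercise it.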

If we open the until-continuation shape, we find the data-lake-list shape inside it. This is the shape invoking the REST API:

[Image: Authentication settings of the data-lake-list shape]

We can see that the authentication section uses Managed Identity, more specifically the user-assigned managed identity deployed alongside the app.
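Outside Logic Apps, the same call needs a bearer token issued for the Azure Storage audience (`https://storage.azure.com/`) plus an `x-ms-version` header. In Python such a token could come, for example, from `ManagedIdentityCredential(client_id=...).get_token("https://storage.azure.com/.default")` in the `azure-identity` package. The sketch below only assembles the request; the `2018-11-09` API version and the `build_list_request` name are assumptions for illustration.

```python
import urllib.request

# Assumption: any x-ms-version that supports Azure AD bearer tokens and the
# Path - List operation; 2018-11-09 is used here for illustration.
API_VERSION = "2018-11-09"


def build_list_request(url: str, access_token: str) -> urllib.request.Request:
    """Assemble (but do not send) an authenticated Path - List GET request."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            # Token issued to the managed identity for https://storage.azure.com/
            "Authorization": f"Bearer {access_token}",
            # Bearer-token authentication requires an explicit API version
            "x-ms-version": API_VERSION,
        },
    )
```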

Authorizing the identity

Before using the app, we need to authorize its identity to read from the data lake storage account.

The easiest way is to give it the Storage Blob Data Reader role:

[Image: RBAC role assignment]

We reviewed the different intricacies of access control last time.

Testing the API

First, we fetch the trigger URL from the Logic App:

[Image: Logic App trigger URL]

We are going to use Postman, but any tool capable of sending HTTP POST requests will do.

As usual with Logic Apps, it is important to add the following two headers:

Content-Type: application/json
Accept: application/json
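If you prefer scripting to Postman, the following Python sketch builds the same POST request with the standard library only. The trigger URL is a placeholder for the one copied from the Logic App, and `build_logic_app_request` is a hypothetical name.

```python
import json
import urllib.request


def build_logic_app_request(trigger_url: str, body: dict) -> urllib.request.Request:
    """Build the POST request that invokes the Logic App's HTTP trigger."""
    return urllib.request.Request(
        trigger_url,
        data=json.dumps(body).encode("utf-8"),  # JSON body with the app's parameters
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Placeholder: paste the trigger URL copied from the Logic App here.
TRIGGER_URL = "https://example.invalid/replace-with-logic-app-trigger-url"
request = build_logic_app_request(
    TRIGGER_URL, {"storageAccount": "myaccount", "container": "mycontainer"})
```

Sending it is then a matter of `with urllib.request.urlopen(request) as response:` followed by `json.load(response)` to read the listing.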

We can then test listing all the blobs in a given container by posting the following body to the Logic App URL:

{
  "storageAccount": "myaccount",
  "container": "mycontainer"
}

This should return the list of blobs our identity has access to, exactly as the REST API returns them, unedited.

We can also get the directories by changing the body slightly:

{
  "storageAccount": "myaccount",
  "container": "mycontainer",
  "doListDirectories" : true
}

Finally, we can retrieve only the blobs with a given suffix, for instance the Parquet files:

{
  "storageAccount": "myaccount",
  "container": "mycontainer",
  "suffix" : ".parquet"
}
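The suffix and doListDirectories parameters act as filters on the REST API output. Here is a minimal Python sketch of that filtering, assuming directories are flagged by an `isDirectory` property (which the API may return as the string `"true"`) and that the suffix applies only to blobs. The Logic App’s actual expressions may differ, and `filter_paths` is a hypothetical name.

```python
from typing import Dict, List, Optional


def is_directory(path: Dict) -> bool:
    # The API may report isDirectory as the string "true"; files usually omit it.
    return str(path.get("isDirectory", "false")).lower() == "true"


def filter_paths(paths: List[Dict], suffix: Optional[str] = None,
                 do_list_directories: bool = False) -> List[Dict]:
    """Keep blobs matching the suffix, plus directories if requested (assumed semantics)."""
    kept: List[Dict] = []
    for path in paths:
        if is_directory(path):
            if do_list_directories:
                kept.append(path)  # assumption: the suffix is not applied to directories
        elif suffix is None or path["name"].endswith(suffix):
            kept.append(path)
    return kept
```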

Summary

We’ve implemented a simple reusable Logic App encapsulating the ADLS path list REST API.

The key point is that it invokes the API with an Azure AD Managed Identity rather than with storage account access keys.

