Azure Data Explorer (Kusto)

Let’s talk about Azure Data Explorer (ADX), also known as Kusto.

If you ask me, it is the best-kept secret in Azure.

Well, it isn’t exactly a secret but most people do not know about it or if they do, they just think of it as the back-end engine behind Azure Monitor.

ADX is an Azure analytics service. It is great at analyzing large volumes of near-real-time telemetry such as logs and IoT data.

Isn’t that what Azure Datawarehouse is supposed to do? Or Azure Databricks?

In this article, I’ll go over the characteristics of the service: what its strengths are and where it is complemented by other services.

I started with a huge essay trying to cover every aspect, but I was bored writing it, so I guess it wouldn’t have been very exciting reading material. I went with a much lighter version instead. I’ll explore the service further in future articles.

Update 23-06-2020: To see Kusto in action, I recommend the article Exploring a data set with Kusto.

Scale & Performance

The online documentation says it scales to terabytes of data in minutes.

That is true but it is also true of many distributed data services.

The uniqueness comes in what we can do at that scale.

At heart, Azure Data Explorer (ADX) is about… data exploration. It is a real challenge to explore data at the terabyte scale with little data preparation, i.e. no defined indexes and no pre-computed aggregations.
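To make "no pre-computed aggregations" a bit more concrete, here is a minimal sketch in Python that composes the kind of ad hoc Kusto query one might run straight against raw rows. The table name `Logs` and the columns `Timestamp` and `Level` are hypothetical, and the helper functions are mine, not part of any ADX SDK; the script only builds and prints the query text.

```python
from datetime import timedelta


def kql_timespan(delta: timedelta) -> str:
    """Render a timedelta as a Kusto timespan literal such as 5m or 1h."""
    seconds = int(delta.total_seconds())
    if seconds <= 0:
        raise ValueError("timespan must be positive")
    # Use the largest unit that divides the duration evenly.
    for unit_seconds, suffix in ((86400, "d"), (3600, "h"), (60, "m")):
        if seconds % unit_seconds == 0:
            return f"{seconds // unit_seconds}{suffix}"
    return f"{seconds}s"


def build_exploration_query(table: str, lookback: timedelta, bin_size: timedelta) -> str:
    """Compose an ad hoc aggregation over raw rows (hypothetical schema)."""
    return "\n".join([
        table,
        # Keep only recent rows; Timestamp is an assumed column name.
        f"| where Timestamp > ago({kql_timespan(lookback)})",
        # Aggregate at query time; Level is an assumed column name.
        f"| summarize Events = count() by bin(Timestamp, {kql_timespan(bin_size)}), Level",
        "| order by Timestamp asc",
    ])


query = build_exploration_query("Logs", timedelta(hours=1), timedelta(minutes=5))
print(query)
```

The point of the sketch is that the aggregation (`summarize ... by bin(...)`) is expressed at query time over the raw events, rather than read from a table that was rolled up in advance.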

Near real time

Integration

ADX has an impressive gallery of integrations for such a young service:

The list is growing and doesn’t contain only Azure technology. ADX can therefore easily be part of a bigger solution.

What ADX isn’t optimal for / stretch scenarios

The public cloud brought a lot of fragmentation to data services. Although part of the reason is the youth of public cloud technologies, it is also due to inherent characteristics of big data analytics in the cloud:

Since we do not own the hardware the workloads are running on, we do not have to get married to one technology and run everything on it to amortise the cost of said hardware / licence. We can use the best tool for the job.

This is a balancing act as we need to take the skill set of people into account.

Most of the scenarios we are citing here can be done with ADX, but it wouldn’t be the best platform for them.

Scenario: Data warehouse
Why: For starters, ADX is mostly an append-only store. It isn’t transactional, doesn’t have log journals, etc. This is part of the reason it is so fast, but also part of the reason it is a poor fit for a data warehouse. Also, although it is very fast, pre-computed aggregations would be better for dashboards. For the sceptics, the rumors of data warehousing’s death have been greatly exaggerated.
Azure PaaS alternatives: Azure Synapse & Power BI Premium

Scenario: Application back end
Why: As with data warehousing, ADX isn’t built for transactional workloads.
Azure PaaS alternatives: Cosmos DB, Azure SQL DB, Azure PostgreSQL, Azure MySQL, Azure MariaDB

Scenario: Machine Learning (ML) training
Why: Although ADX supports some built-in ML algorithms (mostly clustering algorithms and statistical tools at the time of this writing, i.e. February 2020), it isn’t an ML training platform. It is excellent for running predictions on a pre-trained model though.
Azure PaaS alternatives: Azure ML, Spark (Azure Databricks or Azure HDInsight), Azure Batch & Data Science Virtual Machine (DSVM)

Scenario: Sub-second streaming
Why: ADX can ingest data with a latency as low as a few seconds and still make it available for analytics (i.e. events are still indexed and can be queried). Most “near real time” scenarios fall comfortably within that window. But it isn’t a sub-second streaming platform (e.g. for low-latency trading).
Azure PaaS alternatives: Structured Streaming in Continuous Mode in Spark (Azure Databricks or Azure HDInsight), Kafka Streams on Azure HDInsight, Flink on Azure HDInsight

Concrete scenarios

Here are some scenarios we’ve seen in different industries. This is by no means an exhaustive list, but these are the popular ones.

Quite a few customers are using ADX / Kusto to analyze unified logs, i.e. logs from on-premises systems and different clouds. This is typical log analysis, so it could be for security, reliability engineering, forecasting, etc.

IoT telemetry analysis is quite popular. As customers capture telemetry, they want to mine that data.

We see different businesses using it to analyze transactions (sales) to understand customer behaviours, predict trends or spikes and optimize go-to-market strategy. What if, within days of launching a new product, we could figure out which customer segments are gaining traction and which ones are lagging?

In general, we see customers starting with historical analysis and then moving to more and more real-time analysis as their teams get more comfortable with the service.

Summary

We hope we managed to give a good idea of what ADX can do.

It is also important to note that it is the data platform for other Azure Services:

