Azure Data Lake Analytics Quick Start
UPDATE (19-01-2016): Have a look at the Azure Data Lake series for more posts on Azure Data Lake.
Azure Data Lake (both Storage & Analytics) has been in public preview for a month or two.
It already has surprisingly good documentation:
- Overview of U-SQL: walks you through diverse scenarios, ramping you up quickly
- U-SQL Language Reference: this is the full-on detail, including the context-free grammar of the language!
Azure Data Lake Analytics (ADLA) is a really great technology. It combines the power of Hadoop with the simplicity of the likes of Azure SQL Database. It’s super productive and easy to use while still being pretty powerful.
At the core of this productivity is a new language: U-SQL. U-SQL is based on SCOPE, an internal Microsoft research language, and aims at unifying the declarative power of SQL with the imperative capabilities of C#.
I like to call it Hive for .NET developers.
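To make that SQL-plus-C# blend concrete, here is a minimal sketch of a U-SQL script. The file paths and column names are illustrative only; they follow the shape of the samples in the Get Started documentation.

```
// Declarative, SQL-like extraction from a file in the data lake.
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Query string
    FROM "/Samples/SearchLog.tsv"
    USING Extractors.Tsv();

// C# expressions (string methods, DateTime) mix freely with SQL-style clauses.
@result =
    SELECT UserId,
           Query.ToLowerInvariant() AS NormalizedQuery
    FROM @searchlog
    WHERE Start >= DateTime.Parse("2016-01-01");

// Write the rowset back out as CSV.
OUTPUT @result
    TO "/Output/result.csv"
    USING Outputters.Csv();
```

Note how `Query.ToLowerInvariant()` and `DateTime.Parse` are plain C#, sitting inside otherwise SQL-looking statements.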
It’s the ultimate managed Hadoop: you submit a U-SQL script and the number of processing units you want it to run on, and that’s it. No cluster to configure, no patching, no nodes to take up or down, etc. Nodes are provisioned for you to run your script and returned to a pool afterwards.
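As a sketch of what "submit a script and a number of units" looks like in practice, here is a submission via the Azure PowerShell module of the time. Treat the cmdlet and parameter names as an assumption of that module version; the account name and paths are placeholders.

```powershell
# Hedged sketch: account name, job name and script path are placeholders.
Submit-AzureRmDataLakeAnalyticsJob `
    -Account "myadlaaccount" `
    -Name "MyFirstJob" `
    -ScriptPath "C:\scripts\my-script.usql" `
    -DegreeOfParallelism 5
```

`-DegreeOfParallelism` is the "number of processing units" knob; everything else about the compute is handled for you.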
I would recommend it for the following scenarios:
- Exploration of data sets: load your data in and start running ad hoc queries on it to learn what your data is made of
- Data processing: process (or pre-process) your data into a shape useful for Machine Learning, reporting, search or online algorithms
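A typical exploration query of the first kind might look like the following sketch: profile a log file by aggregating over a column. Again, the file path and columns are illustrative, in the style of the documentation samples.

```
// Load the raw log file.
@queries =
    EXTRACT UserId int,
            Duration int,
            Query string
    FROM "/Samples/SearchLog.tsv"
    USING Extractors.Tsv();

// Ad hoc aggregation: how often does each query occur, and how long does it take?
@profile =
    SELECT Query,
           COUNT(*) AS Hits,
           AVG(Duration) AS AvgDuration
    FROM @queries
    GROUP BY Query;

// Most frequent queries first.
OUTPUT @profile
    TO "/Output/query-profile.csv"
    ORDER BY Hits DESC
    USING Outputters.Csv();
```

A handful of queries like this one is usually enough to get a feel for a data set before building anything more elaborate on top of it.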
I thought I would kick off some posts about more complex scenarios to show what’s possible with this technology.
I won’t cover the very basics, so please read the Logistics / Get Started articles first.