U-SQL Cross Apply

UPDATE (19-01-2016):  Have a look at Azure Data Lake series for more posts on Azure Data Lake.

Azure Data Lake (both Storage & Analytics) has been in public preview for a month or two.

You can get started by reading this.

I thought I would kick some posts about more complex scenarios to display what’s possibile with that technology.

In this post I’ll write about a specific U-SQL operation since its syntax is a bit tricky.

The Problem

U-SQL supports arrays in the form of the .NET type SQL.Array<T>.  This is very convenient in many scenarios, especially when you are parsing inputs.

We often want to take an array and explode it on different rows.  This is useful not only with arrays but with SQL.MAP<T,V>, or other complex types, e.g. XML or JSON.

This is accomplished by using the CROSS APPLY & EXPLODE operators.

The Solution

Let’s look at the solution.

@content =
	SELECT *
	FROM(
		VALUES
		(
			12,
			"a, b, c"
		),
		(
			75,
			"f, g, h, i, j"
		)) AS t([ID], LetterList);

@inArray =
	SELECT [ID],
		SqlArray.Create(LetterList.Split(',')) AS LetterArray
	FROM @content;

@exploded =
	SELECT [ID],
		letter.Trim() AS Letter
	FROM @inArray
	CROSS APPLY
	EXPLODE(LetterArray) AS r(letter);

OUTPUT @exploded
TO "/Outputs/explosion.tsv"
USING Outputters.Tsv();

In the first expression, I create a table from a string:

ID LetterList
12 "a, b, c"
75 "f, g, h, j, k"

In the second expression, I transform the second column into an array:

ID LetterList
12 {"a", " b", " c"}
75 {"f", " g", " h", " j", " k"}

In the third and final expression, I explode the arrays on different rows (and trim the string to remove trailing spaces):

ID Letter
12 "a"
12 "b"
12 "c"
75 "f"
75 "g"
75 "h"
75 "i"
75 "j"

Once you know it the syntax is pretty simple but it isn’t trivial to deduce that from the documentation, hence this post.

Conclusion

Very useful to basically pivot raw arrays to rows, CROSS APPLY with EXPLODE is another tool to help you parse semi-structured data into structured data (tables).


2 responses

  1. dhruba 2016-01-26 at 08:51

    hi Vincent,

    How can I do the reverse thing I create a file of final expression and now I need to create In the first expression

  2. Vincent-Philippe Lauzon 2016-01-26 at 09:30

    I’m not sure I get what you mean by final vs first expression. Tell me a bit more about your context please.

Leave a comment