Azure Databricks – Transforming Data Frames in Spark

In previous weeks, we’ve looked at Azure Databricks, Azure’s managed Spark cluster service. We then looked at Resilient Distributed Datasets (RDDs) & Spark SQL / Data Frames. We wanted to look at Data Frames some more, with a bigger data set and, more precisely, at some transformation techniques. We often say that most of the leg work …

Azure Databricks – Spark SQL – Data Frames

We looked at Azure Databricks a few weeks ago. Azure Databricks is a managed Apache Spark Cluster service. More recently we looked at how to analyze a data set using Resilient Distributed Datasets (RDDs). We used the Social characteristics of the Marvel Universe public dataset, replicating some experiments we did 2 years ago with Azure …

Azure Databricks – RDD – Resilient Distributed Dataset

We looked at Azure Databricks a few weeks ago. Azure Databricks is a managed Apache Spark Cluster service. In this article, we are going to look at & use a fundamental building block of Apache Spark: the Resilient Distributed Dataset, or RDD. We are going to use the Python SDK. It is important to note that …

Azure Databricks – Getting Started

Apache Spark is rising in popularity as a Big Data platform. It has followed an accelerated timeline for such an impactful technology. Think about it: 2009, started as a UC Berkeley project; 2010, open sourced; 2013, donated to the Apache Software Foundation; 2014, became a Top-Level Apache Project. In 2013, the creators of Spark founded Databricks. Databricks has …

Disaster Recovery with VM Scale Sets & Geo-Replicated DBs

Last year we posted an article about different options available in Azure to implement a disaster recovery strategy. We strongly suggest reviewing that article, as it gives good insight into what a disaster recovery strategy is within an already resilient Cloud environment and also clears up a few misconceptions people have around DR-capability of …

Setup for populating Cosmos DB with random data using Logic Apps

We recently published an article about Cosmos DB Performance with Geospatial Data. In this article, we’re going to explain how to set up the environment in order to run those performance tests. More importantly, we believe this article is interesting on its own as it shows how to use Logic Apps to populate a Cosmos DB …