We’ve recently looked at Azure Databricks:
- Getting Started
- Resilient Distributed Dataset
- Spark SQL – Data Frames
- Transforming Data Frames in Spark
- Parsing escaping CSV files in Spark
In most cases we shared the notebooks on GitHub.
Here we want to show how easy it is to import those notebooks.
Choosing a Notebook
First, let’s choose a notebook. We could pick one from our own computer, but we want to show how easy it is to import one from GitHub. GitHub exposes public URLs, which makes it really easy.
Let’s go into the ted folder of our GitHub repo: https://github.com/vplauzon/databricks/tree/master/ted.
From there we can click on ted.ipynb. GitHub actually renders Jupyter notebooks, which is nice. But let’s take the raw version of the file by clicking the Raw button.
This will lead to the raw content of the file (which happens to be JSON based). Let’s copy the URL (https://raw.githubusercontent.com/vplauzon/databricks/master/ted/ted.ipynb).
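If we prefer not to click through, the raw URL can also be derived from the regular GitHub file page URL. Here is a minimal sketch in Python. The helper name `to_raw_url` is ours, and it assumes the usual `github.com/<owner>/<repo>/blob/<branch>/<path>` page layout with a branch name that contains no slash.

```python
from urllib.parse import urlparse


def to_raw_url(blob_url: str) -> str:
    """Map a github.com file page ("blob") URL to its raw.githubusercontent.com URL."""
    parsed = urlparse(blob_url)
    parts = parsed.path.strip("/").split("/")
    # Assumed shape: <owner>/<repo>/blob/<branch>/<path to file>
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub file page URL: {blob_url}")
    owner, repo, _, branch, *file_path = parts
    return "https://raw.githubusercontent.com/" + "/".join(
        [owner, repo, branch, *file_path]
    )


raw_url = to_raw_url(
    "https://github.com/vplauzon/databricks/blob/master/ted/ted.ipynb"
)
print(raw_url)
```

For the ted notebook this should give the same URL we copied from the browser.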
Import in Databricks workspace
In the Databricks portal, let’s first select the Workspace menu.
Let’s then pull down that menu and select Import.
We get an Import Notebooks pop-up. The default configuration imports from File, i.e. a local file. This is where we could import a Jupyter notebook from our local file system.
We want to import from GitHub, so let’s select the URL option.
From there we can paste the notebook’s raw URL from GitHub and click Import.
This imports the notebook file and creates a notebook in our workspace.
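The same import can likely be scripted instead of done through the portal. The sketch below only builds the request body we would expect to POST to the Databricks Workspace API endpoint `/api/2.0/workspace/import`; it does not call the service. The helper name `build_import_payload`, the target path and the tiny sample notebook are placeholders of ours, and the field names should be checked against the Workspace API documentation before relying on them.

```python
import base64
import json


def build_import_payload(notebook_json: str, workspace_path: str) -> dict:
    """Build the JSON body for a Jupyter notebook import (hypothetical helper)."""
    # The API expects the file content as a base64-encoded string.
    content = base64.b64encode(notebook_json.encode("utf-8")).decode("ascii")
    return {
        "path": workspace_path,  # where the notebook should land in the workspace
        "format": "JUPYTER",     # the file is a Jupyter .ipynb document
        "content": content,
        "overwrite": False,      # do not replace an existing notebook
    }


# Placeholder notebook; in practice this would be the text fetched from the raw URL.
sample_notebook = json.dumps(
    {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
)
payload = build_import_payload(sample_notebook, "/Users/someone@example.com/ted")
print(json.dumps(payload, indent=2))
```

Sending the payload would additionally require the workspace URL and a personal access token, which we leave out here.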
It is pretty easy to import a notebook from GitHub or any other public URL. We can also save notebooks on our computer and import them from files.
Databricks allows collaboration within a team via workspaces. It also allows collaboration across teams by importing / exporting notebooks.