In the last few months, we’ve looked at Azure Databricks:
- Getting Started
- Resilient Distributed Dataset
- Spark SQL – Data Frames
- Transforming Data Frames in Spark
- Parsing escaping CSV files in Spark
- Import Notebooks in Databricks
Python 2 vs Python 3
There are a lot of discussions online around Python 2 and Python 3. We won’t try to reproduce them here.
We’ll only refer to the Python wiki’s discussion and quote its short description:
Python 2.x is legacy, Python 3.x is the present and future of the language
In general, we would want to use version 3+. We would fall back on version 2 if we are using legacy packages.
Python Version in Azure Databricks
The Python version running in a cluster is a property of the cluster.
As of this writing, i.e. end-of-March 2018, the default is version 2.
We can also see this by running the following command in a notebook:
import sys
sys.version
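The string returned by sys.version is meant for humans. If we want to branch on the version in code, the standard sys.version_info tuple is easier to work with. Here is a minimal sketch; the helper name python_major_minor is ours, purely for illustration, and is not part of Databricks or Spark:

```python
import sys


def python_major_minor(version_info=None):
    """Return (major, minor) for a version tuple.

    Defaults to the interpreter running this code (hypothetical helper).
    """
    if version_info is None:
        version_info = sys.version_info
    # version_info is indexable: position 0 is major, position 1 is minor
    return (version_info[0], version_info[1])


major, minor = python_major_minor()
print("Running Python {}.{}".format(major, minor))
```

The str.format call is used instead of an f-string so the same cell runs on both a Python 2 and a Python 3 cluster.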
We can change that by editing the cluster configuration. It requires the cluster to restart to take effect.
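Since the version depends on cluster configuration, a notebook written for one version can end up attached to a cluster running the other. One way to catch this early is a guard in the first cell. The following is only a sketch under our own assumptions: require_python_major is a hypothetical helper, and the choice of 3 as the expected version is ours.

```python
import sys


def require_python_major(expected_major, version_info=None):
    """Raise RuntimeError unless the major version matches (hypothetical helper)."""
    if version_info is None:
        version_info = sys.version_info
    actual_major = version_info[0]
    if actual_major != expected_major:
        raise RuntimeError(
            "Expected Python {}, but this cluster runs Python {}".format(
                expected_major, actual_major
            )
        )
    return actual_major


# Assumption: this notebook targets Python 3; adjust to match your cluster.
require_python_major(3)
```

If the cluster is still running version 2, the cell fails immediately with a clear message instead of failing later on a syntax or import error.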
The Python runtime version of a cluster is critical.
We’ve seen here how to check it and how to change it.