Python Version in Databricks

Over the last few months, we’ve looked at Azure Databricks in a series of articles.

In those articles, we used the Python SDK (and a bit of Spark SQL).  In this article, we’ll discuss the version of Python deployed on the cluster.

Python 2 vs Python 3

There is a lot of discussion online about Python 2 and Python 3.  We won’t try to reproduce it here.

We’ll only refer to the Python wiki’s discussion and quote its short description:

Python 2.x is legacy, Python 3.x is the present and future of the language

In general, we want to use version 3+.  We would fall back on version 2 only when we depend on legacy packages.

Python Version in Azure Databricks

The Python version running in a cluster is a property of the cluster.


As of this writing (end of March 2018), the default is version 2.

We can also see this by running the following command in a notebook:

import sys

print(sys.version)
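The version string is convenient to read, but a notebook that needs to branch on the version is usually better served by `sys.version_info`, which is a tuple. The following is a minimal sketch; the helper names `describe_python` and `is_python3` are ours, chosen for illustration, and the code is written to run on both Python 2 and Python 3.

```python
import sys


def describe_python(version_info=sys.version_info):
    """Return a short 'major.minor.micro' string for a version tuple."""
    major, minor, micro = version_info[:3]
    return "{}.{}.{}".format(major, minor, micro)


def is_python3(version_info=sys.version_info):
    """Return True when the major version is 3 or higher."""
    return version_info[0] >= 3


# Report what the current interpreter (for example, the notebook's) is running.
print("Python version: {}".format(describe_python()))
print("Python 3 or later: {}".format(is_python3()))
```

Both helpers accept an explicit tuple, which makes them easy to try out without switching clusters.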


We can change that by editing the cluster configuration.  It requires the cluster to restart to take effect.


The Python runtime version is critical.

Running certain packages requires a specific version.  Even some native language features are bound to the runtime version, so we need to control it.

We’ve seen here how to do that.
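As a complement to the cluster setting, a notebook can also fail fast when it is attached to a cluster running an older interpreter than it expects. The following is only a sketch of that idea; `require_python` is a hypothetical helper of ours, not something provided by Databricks.

```python
import sys


def require_python(minimum, version_info=sys.version_info):
    """Raise RuntimeError if the interpreter is older than `minimum`.

    `minimum` is a tuple such as (3,) or (3, 5); only as many components
    as it contains are compared against the running version.
    """
    minimum = tuple(minimum)
    current = tuple(version_info[:len(minimum)])
    if current < minimum:
        raise RuntimeError(
            "Python {} or later is required, but this cluster runs {}".format(
                ".".join(str(part) for part in minimum),
                ".".join(str(part) for part in version_info[:3]),
            )
        )
    return current


# Example: placed in the first cell of a notebook, this stops execution early
# on an interpreter older than 2.7. Use (3,) to insist on Python 3.
require_python((2, 7))
```

Comparing tuples rather than version strings avoids surprises such as "3.10" sorting before "3.5" as text.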
