Python Version in Databricks

Over the last few months, we’ve looked at Azure Databricks in a series of articles.

In those articles, we used the Python SDK (along with a bit of Spark SQL). In this article, we’ll discuss the version of Python deployed in the cluster.

Python 2 vs Python 3

There is a lot of discussion online about Python 2 and Python 3. We won’t try to reproduce it here.

We’ll only refer to the Python wiki’s discussion and quote its short description:

Python 2.x is legacy, Python 3.x is the present and future of the language

In general, we want to use version 3+. We would fall back on version 2 only when we rely on legacy packages that require it.

Python Version in Azure Databricks

The Python version running in a cluster is a property of the cluster, set in its configuration.


As of this writing, i.e. end of March 2018, the default is version 2.

We can also see this by running the following command in a notebook:

import sys
print(sys.version)
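
The version string is convenient to read, but `sys.version_info` is easier to compare in code. Here is a minimal sketch; `describe_python` is a hypothetical helper name used for illustration, not part of Databricks:

```python
import sys


def describe_python():
    """Return (major, minor, interpreter path) for the running interpreter."""
    info = sys.version_info
    return info[0], info[1], sys.executable


major, minor, executable = describe_python()
# Old-style formatting keeps this runnable on both Python 2 and Python 3.
print("Python %d.%d at %s" % (major, minor, executable))
```

Run in a notebook cell, this reports the interpreter the cluster is actually using.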


We can change that by editing the cluster configuration. The cluster must restart for the change to take effect.
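
The same setting can also be expressed outside the UI. As far as we know, the Databricks Clusters REST API examples of that period select Python 3 through the `PYSPARK_PYTHON` entry of `spark_env_vars`. Treat both the field name and the interpreter path below as assumptions to verify against the current documentation. `with_python3` is a hypothetical helper; it only builds a request body and does not call any API:

```python
import copy

# Assumption: interpreter path as shown in the Databricks Clusters API examples;
# verify it against the documentation for your runtime.
PYTHON3_PATH = "/databricks/python3/bin/python3"


def with_python3(cluster_spec):
    """Return a copy of a cluster spec dict that requests Python 3."""
    spec = copy.deepcopy(cluster_spec)
    env = spec.setdefault("spark_env_vars", {})
    env["PYSPARK_PYTHON"] = PYTHON3_PATH
    return spec


# Hypothetical, incomplete spec used only for illustration.
original = {"cluster_name": "demo-cluster"}
updated = with_python3(original)
print(updated["spark_env_vars"])
```

Working on a copy leaves the original spec untouched, so the two configurations can be compared before anything is submitted.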


The Python runtime version is critical.

Some packages require a specific version, and even some native language features are bound to the runtime version. We need to control the runtime version.
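
One way to make that dependency explicit is to check the runtime in the first cell of a notebook and fail early. Here is a minimal sketch, assuming the notebook needs Python 3; `require_python` is a hypothetical helper, not a Databricks API:

```python
import sys


def require_python(minimum, current=None):
    """Raise RuntimeError if the interpreter is older than `minimum`.

    `minimum` and `current` are (major, minor) tuples; `current` defaults
    to the running interpreter and exists mainly to make testing easy.
    """
    current = tuple(sys.version_info[:2]) if current is None else tuple(current)
    if current < tuple(minimum):
        raise RuntimeError(
            "Python %d.%d or later is required, found %d.%d"
            % (minimum[0], minimum[1], current[0], current[1])
        )
    return current


# Fails immediately on a Python 2 cluster instead of midway through the notebook.
require_python((3, 0))

# Language behaviour differs too: `/` on two ints truncates in Python 2
# and returns a float in Python 3.
print(3 / 2)
```

Placing the check first means a misconfigured cluster is reported with a clear message rather than an obscure syntax or import error later on.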

We’ve seen here how to do that.
