Python Version in Databricks


In the last few months, we’ve looked at Azure Databricks:

In those articles, we used the Python SDK (and a bit of Spark SQL). In this article, we’ll discuss the version of Python deployed in the cluster.

Python 2 vs Python 3

There are many discussions online about Python 2 versus Python 3. We won’t try to reproduce them here.

We’ll only refer to the Python wiki’s discussion and quote its short description:

Python 2.x is legacy, Python 3.x is the present and future of the language

In general, we want to use version 3+. We fall back on version 2 only if we depend on legacy packages that don’t support Python 3.

Python Version in Azure Databricks

The Python version running in a cluster is a property of the cluster, chosen in the cluster configuration.

At the time of this writing (end of March 2018), the default is version 2.

We can also see this by running the following command in a notebook:

import sys

# Print the full version string of the Python interpreter the cluster runs
print(sys.version)
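Beyond reading the version string, a notebook can check the version programmatically with `sys.version_info`, whose leading fields compare like a tuple. The sketch below is a minimal guard, assuming we want a notebook to fail fast when it lands on a Python 2 cluster; the helper name `require_python` is hypothetical.

```python
import sys


def require_python(major=3, minor=0):
    """Hypothetical helper: raise if the interpreter is older than major.minor."""
    # Slicing sys.version_info yields a plain tuple such as (3, 5),
    # which compares element by element against (major, minor).
    current = sys.version_info[:2]
    if current < (major, minor):
        raise RuntimeError(
            "This notebook needs Python %d.%d+, but the cluster runs %s"
            % (major, minor, sys.version.split()[0])
        )
    return current


# Raises on a Python 2 cluster; returns the (major, minor) tuple otherwise.
print(require_python(3))
```

Using `%` formatting rather than f-strings keeps the guard itself parseable under Python 2, so it reports a clear error instead of a `SyntaxError`.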

We can change the version by editing the cluster configuration. The change takes effect only after the cluster restarts.
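The same setting can also be applied outside the UI. As an assumption based on Databricks documentation from this period (not something this article covers), clusters managed through the Clusters API selected Python 3 by setting the `PYSPARK_PYTHON` environment variable to `/databricks/python3/bin/python3` in `spark_env_vars`. The sketch below only builds such a request body and sends nothing; the helper name, cluster ID, runtime version, and node type are hypothetical placeholders, and the fields should be checked against the Clusters API reference for your workspace.

```python
import json


def python3_cluster_spec(cluster_id, spark_version, node_type_id, num_workers):
    """Hypothetical helper: build a Clusters API body that selects Python 3."""
    return {
        "cluster_id": cluster_id,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Assumption: on Databricks runtimes of this era, this variable
        # pointed PySpark at the Python 3 interpreter.
        "spark_env_vars": {"PYSPARK_PYTHON": "/databricks/python3/bin/python3"},
    }


# Hypothetical values; replace them with the real cluster's settings.
spec = python3_cluster_spec(
    cluster_id="0123-456789-abcde123",
    spark_version="4.0.x-scala2.11",
    node_type_id="Standard_DS3_v2",
    num_workers=2,
)
print(json.dumps(spec, indent=2))
```

As with editing in the UI, the new interpreter is used only after the cluster restarts.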

Summary

The Python runtime version is critical.

Some packages require a specific version, and even some native language features depend on the runtime version (for example, print is a statement in Python 2 but a function in Python 3). We therefore need to control which version a cluster runs.
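Division is another native feature bound to the runtime version: with two integers, `7 / 2` evaluates to `3` in Python 2 but `3.5` in Python 3, while `//` floors under both. The snippet below is a small illustration rather than code from the original article; the helper name `division_results` is hypothetical.

```python
import sys


def division_results():
    """Hypothetical helper: show how '/' and '//' behave on this interpreter."""
    return {
        "python": sys.version_info[0],
        "slash": 7 / 2,          # 3 on Python 2 (integer operands), 3.5 on Python 3
        "double_slash": 7 // 2,  # floor division: 3 on both versions
    }


print(division_results())
```

Running the same notebook cell on a Python 2 cluster and on a Python 3 cluster gives different `slash` values, which is exactly the kind of silent difference that makes controlling the version important.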

We’ve seen here how to check the version from a notebook and how to change it in the cluster configuration.
