Training a model to predict failures


Today, a quick entry about a twist on Machine Learning for the predictive maintenance problem.

The Microsoft Cortana Intelligence team wrote an interesting blog post the other day:  Evaluating Failure Prediction Models for Predictive Maintenance.

When you listen to all the buzz around Machine Learning, it sometimes feels as if we’ve solved all the ML problems and you just need to point your engine in the general direction of your data for it to spew some insights.  Not yet.

That article highlights a challenge with predictive maintenance.  Predictive maintenance is about predicting when a failure (of a device, hardware, process, etc.) will occur.  But…  you typically have very few failures:  your data set is completely imbalanced between success samples and failure samples.

If you go head-on and train a model with your data set as-is, a likely outcome is that your model will predict success 100% of the time.  This is because it doesn't get penalized much for ignoring the few failures and gets rewarded for not missing any successes.
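To see this concretely, here is a minimal sketch (scikit-learn, with made-up numbers) of a baseline that always predicts success: on a data set with 1% failures it scores about 99% accuracy while catching exactly zero failures.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))             # hypothetical features
y = (rng.random(10_000) < 0.01).astype(int)  # ~1% failures (label 1)

# A "model" that always predicts the majority class (success).
always_success = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = always_success.predict(X)

print(f"accuracy: {accuracy_score(y, pred):.3f}")  # ~0.99 -- looks great
print(f"recall:   {recall_score(y, pred):.3f}")    # 0.0 -- misses every failure
```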

Remember that Machine Learning is about optimizing a cost function over a sample set (the training set).  I like to draw the comparison with humans in a social system with metrics (e.g. bonuses in a job, laws in society, etc.):  humans will find any loophole in a metric in order to maximize it and reap the reward.  So does Machine Learning.  You therefore have to be on the lookout for those loopholes.

With predictive maintenance, a missed failure often costs a lot of money, typically much more than a false positive (i.e. a non-failure identified as a failure).  For this reason, you don't want to miss failures:  you want to compensate for the imbalance in your data set.
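One way to make that asymmetry explicit is to compare models on their total cost rather than on accuracy.  A minimal sketch, with hypothetical dollar figures:

```python
from sklearn.metrics import confusion_matrix

COST_FN = 10_000  # hypothetical cost of a missed failure (unplanned downtime)
COST_FP = 500     # hypothetical cost of a false alarm (needless inspection)

def maintenance_cost(y_true, y_pred):
    # labels=[0, 1] forces a 2x2 matrix even if one class is never predicted.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fn * COST_FN + fp * COST_FP

# Two missed failures hurt far more than four false alarms: 2*10000 + 4*500.
print(maintenance_cost([1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1]))  # 22000
```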

The article, which I suggest you read in full when you need to apply it, suggests three ways to compensate for the lack of failures in your data set (a rough sketch of all three follows the list):

  1. You can resample the failures to increase their occurrence; for instance, by using the Synthetic Minority Oversampling Technique (SMOTE), readily available in Azure ML.
  2. Tune the hyper-parameters of your model (e.g. by using a parameter sweep, also readily available in Azure ML) to optimize for recall.
  3. Change the metric to penalize false negatives more than false positives.
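Here is a rough sketch of all three options.  I'm assuming scikit-learn plus the third-party imbalanced-learn package for SMOTE (Azure ML exposes SMOTE as a built-in module instead); X and y are hypothetical stand-ins for your features and 0/1 failure labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 5))             # hypothetical features
y = (rng.random(1_000) < 0.05).astype(int)  # ~5% failures (label 1)

# 1. Resample: synthesize extra failure samples until the classes balance.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)

# 2. Parameter sweep scored on recall instead of accuracy, so the sweep
#    favors models that catch failures.
sweep = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 200], "max_depth": [3, 10]},
    scoring="recall",
    cv=3,
)
sweep.fit(X_res, y_res)

# 3. Change the metric: weight the failure class so a false negative
#    costs the learner more than a false positive.
weighted = RandomForestClassifier(class_weight={0: 1, 1: 20}, random_state=0)
weighted.fit(X, y)
```

The 20x class weight is arbitrary here; in practice you would derive it from your actual cost ratio of missed failures to false alarms.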

This is just one of the many twists you have to think of when creating a predictive model.

My basic advice:  make sure you not only ask the right question but do so with the right carrot & stick ;)  If a missed failure costs you more than a success identified as a failure (i.e. a false positive), then factor this into your model.
