
Predict Industrial Equipment Health with AWS SageMaker Algorithms

Raghavan Madabusi

Jan 6, 2023 • 5 min read

Predicting the health of industrial equipment using vibration data from sensors is a common application of machine learning in the industrial sector. By analyzing the vibrations emitted by the equipment, it is possible to identify patterns and anomalies that can indicate the presence of potential problems.

One approach to predicting the health of industrial equipment using vibration data is to use a machine learning model that is trained to classify the equipment as either healthy or unhealthy based on the vibration data.

AWS SageMaker

AWS SageMaker provides a number of built-in algorithms, along with managed infrastructure to build, train, and deploy machine learning models. For predicting the health of industrial equipment from vibration data, two candidates are Linear Learner, which can classify equipment as healthy or unhealthy from extracted features, and DeepAR, which forecasts how a vibration signal will evolve over time.

Linear Learner:

A supervised learning algorithm that is well-suited for tasks such as regression and binary classification. It fits a linear model to a set of input features: linear regression for numeric targets, and a linear classifier such as logistic regression for healthy/unhealthy labels.
To use linear learner for predicting the health of industrial equipment using vibration data, you would first need to prepare the data by extracting relevant features from the vibration data and labeling the data as healthy or unhealthy. You could then use SageMaker’s linear learner algorithm to train a model on this data.
Once the model has been trained, you could use SageMaker’s managed hosting to deploy it to an endpoint in a production environment, where it can make real-time predictions on new vibration data.
You can also use SageMaker’s tools for evaluating the model’s performance and fine-tuning its hyperparameters to ensure that it is accurate and reliable.
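The feature extraction step mentioned above can be sketched as follows. This is a minimal illustration rather than a prescribed pipeline: the function name `extract_features` and the choice of features (RMS, peak, crest factor, kurtosis) are assumptions, picked because they are common time-domain summaries of a vibration window. It uses only the standard library so it runs anywhere; in practice you would likely vectorize it with NumPy.

```python
import math


def extract_features(window):
    """Summarize one window of raw vibration samples as time-domain features."""
    n = len(window)
    mean = sum(window) / n
    # Root mean square: overall vibration energy in the window
    rms = math.sqrt(sum(x * x for x in window) / n)
    # Peak absolute amplitude
    peak = max(abs(x) for x in window)
    # Crest factor: how spiky the signal is relative to its energy
    crest_factor = peak / rms if rms > 0 else 0.0
    # Kurtosis: fourth standardized moment; impulsive faults tend to raise it
    variance = sum((x - mean) ** 2 for x in window) / n
    if variance > 0:
        kurtosis = sum((x - mean) ** 4 for x in window) / (n * variance ** 2)
    else:
        kurtosis = 0.0
    return {"rms": rms, "peak": peak, "crest_factor": crest_factor, "kurtosis": kurtosis}


# Example: a pure 50 Hz sine wave sampled at 1 kHz for one second
sine = [math.sin(2 * math.pi * 50 * t / 1000) for t in range(1000)]
features = extract_features(sine)
print(features)
```

Each window of raw samples becomes one row of features, and a label column such as `failure` can then be attached per window before training.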

import pandas as pd
import sagemaker
from sagemaker import LinearLearner
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# set up the session and the role
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# load the vibration data
df = pd.read_csv('vibration_data.csv')

# define the target variable (equipment failure)
y = df['failure']

# select the features for training the model
X = df.drop(columns=['failure'])

# split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# convert the data to float32 arrays, the type Linear Learner's record format expects
X_train_np = X_train.values.astype('float32')
X_test_np = X_test.values.astype('float32')
y_train_np = y_train.values.astype('float32')
y_test_np = y_test.values.astype('float32')

# create a linear learner estimator (parameter names follow SageMaker Python SDK v2)
linear_learner = LinearLearner(role=role,
                               instance_count=1,
                               instance_type='ml.c4.xlarge',
                               predictor_type='binary_classifier',
                               sagemaker_session=sagemaker_session)

# serialize the arrays to RecordIO-protobuf and upload them to the default S3 bucket
train_records = linear_learner.record_set(X_train_np, labels=y_train_np, channel='train')
test_records = linear_learner.record_set(X_test_np, labels=y_test_np, channel='test')

# fit the model to the data
linear_learner.fit([train_records, test_records])

# deploy the model
linear_learner_predictor = linear_learner.deploy(instance_type='ml.t2.medium', initial_instance_count=1)

# make predictions on the test data (large test sets may need to be sent in batches)
results = linear_learner_predictor.predict(X_test_np)
y_pred = [r.label['predicted_label'].float32_tensor.values[0] for r in results]

# evaluate the model's performance
accuracy = accuracy_score(y_test_np, y_pred)
print('Accuracy:', accuracy)

# delete the endpoint to avoid incurring additional charges
linear_learner_predictor.delete_endpoint()

This code trains a Linear Learner model on the vibration data to predict whether or not the equipment will fail. The model is trained on 80% of the data and tested on the remaining 20%. It is deployed to a SageMaker real-time endpoint, which scores the test set, and the resulting accuracy is printed as a measure of performance. The endpoint is deleted at the end so that it stops accruing charges.
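Accuracy alone can be misleading when failures are rare, because a model that always answers "healthy" still scores well. The sketch below, with a hypothetical helper named `binary_metrics` and made-up toy labels, shows how precision and recall expose that case. It assumes labels are encoded as 1 for failure and 0 for healthy.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels (1 = failure)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data: ten machines, two of which failed
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A degenerate model that never predicts a failure
always_healthy = [0] * 10

metrics = binary_metrics(y_true, always_healthy)
print(metrics)
```

Here accuracy is 0.8 even though recall is 0.0, meaning every failure was missed. For predictive maintenance, recall on the failure class is usually the number to watch.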

DeepAR Forecasting:

Another supervised learning algorithm that is available in SageMaker. It is a deep learning-based algorithm, built on an autoregressive recurrent neural network, that is specifically designed for forecasting time series data.
To use DeepAR for predicting the health of industrial equipment using vibration data, you would prepare the data as time series rather than as labeled rows: each series is a JSON record with a start timestamp and a list of target values, such as a vibration measurement sampled at a fixed interval. DeepAR then forecasts future values of that series, and the forecast can be compared with what healthy operation looks like.
As with Linear Learner, you could deploy the trained DeepAR model to a SageMaker endpoint in a production environment, where it can produce forecasts from new vibration data in real time.
You can also use SageMaker’s tools for evaluating the model’s performance and fine-tuning its hyperparameters to ensure that it is accurate and reliable.
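DeepAR reads its training data as JSON Lines, one time series per line, each with a `start` timestamp and a `target` list of values. The helper below is a minimal sketch of that serialization. The function name `to_deepar_jsonl` and the two example series are hypothetical, and the values are made up for illustration.

```python
import json


def to_deepar_jsonl(series_list):
    """Serialize (start_timestamp, values) pairs into DeepAR's JSON Lines text."""
    lines = []
    for start, values in series_list:
        record = {"start": start, "target": [float(v) for v in values]}
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


# Two hypothetical machines, each with hourly RMS vibration readings
series_list = [
    ("2023-01-01 00:00:00", [0.41, 0.43, 0.40, 0.45]),
    ("2023-01-01 00:00:00", [0.38, 0.39, 0.52, 0.61]),
]

jsonl_text = to_deepar_jsonl(series_list)
print(jsonl_text)
```

Writing `jsonl_text` to a file and uploading it to S3 gives a channel that the DeepAR training job can read. Series from different machines can share one file, which lets the model learn patterns across equipment.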

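import json

import pandas as pd
import sagemaker
from sagemaker import image_uris
from sagemaker.deserializers import JSONDeserializer
from sagemaker.estimator import Estimator
from sagemaker.serializers import JSONSerializer

# set up the session and the role
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# load the vibration data
# ASSUMPTION: the file has an hourly 'timestamp' column and a numeric 'vibration'
# column; both names are placeholders, since DeepAR forecasts a numeric series
# rather than classifying rows by a 'failure' label
df = pd.read_csv('vibration_data.csv', parse_dates=['timestamp']).sort_values('timestamp')

# define the target series and its start time
start = df['timestamp'].iloc[0].strftime('%Y-%m-%d %H:%M:%S')
target = df['vibration'].astype(float).tolist()

# split by time, not at random: hold out the last prediction_length points
prediction_length = 1
train_target = target[:-prediction_length]
actual = target[-prediction_length:]

# convert the data to the JSON Lines format required by DeepAR
with open('train_data.json', 'w') as f:
    f.write(json.dumps({'start': start, 'target': train_target}) + '\n')
with open('test_data.json', 'w') as f:
    f.write(json.dumps({'start': start, 'target': target}) + '\n')

# upload the data to S3
train_s3 = sagemaker_session.upload_data(path='train_data.json', bucket=sagemaker_session.default_bucket(), key_prefix='sagemaker/data')
test_s3 = sagemaker_session.upload_data(path='test_data.json', bucket=sagemaker_session.default_bucket(), key_prefix='sagemaker/data')

# create a DeepAR estimator, looking up the container image for the session's region
image_uri = image_uris.retrieve(framework='forecasting-deepar', region=sagemaker_session.boto_region_name)
deepar = Estimator(image_uri=image_uri,
                   role=role,
                   instance_count=1,
                   instance_type='ml.c4.xlarge',
                   sagemaker_session=sagemaker_session)

# set the hyperparameters for the model
deepar.set_hyperparameters(time_freq='H',
                           epochs=100,
                           early_stopping_patience=10,
                           prediction_length=prediction_length,
                           context_length=24,  # required by DeepAR; 24 hours is an assumed value
                           num_cells=40,
                           num_layers=2,
                           mini_batch_size=32,
                           learning_rate=0.001)

# fit the model to the data
deepar.fit({'train': train_s3, 'test': test_s3})

# deploy the model with JSON request and response handling
deepar_predictor = deepar.deploy(instance_type='ml.t2.medium',
                                 initial_instance_count=1,
                                 serializer=JSONSerializer(),
                                 deserializer=JSONDeserializer())

# forecast the held-out points from the training portion of the series
request = {'instances': [{'start': start, 'target': train_target}],
           'configuration': {'num_samples': 100, 'output_types': ['mean']}}
forecast = deepar_predictor.predict(request)['predictions'][0]['mean']

# evaluate the forecast with the mean absolute error against the held-out points
mae = sum(abs(a - p) for a, p in zip(actual, forecast)) / len(actual)
print('MAE:', mae)

# delete the endpoint to avoid incurring additional charges
deepar_predictor.delete_endpoint()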
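A forecast on its own does not say whether the equipment is healthy, so some rule has to map it to a status. One simple option, sketched below, is to flag a machine when an upper forecast quantile crosses an alarm limit. The `response` dictionary is a hand-written stand-in shaped like DeepAR's JSON output when `mean` and `quantiles` are requested, and the values, the `flag_unhealthy` helper, and the threshold are all assumptions for illustration.

```python
def flag_unhealthy(prediction, threshold, quantile="0.9"):
    """Return True if the chosen forecast quantile exceeds the threshold at any step."""
    return any(value > threshold for value in prediction["quantiles"][quantile])


# Hypothetical response for one series (illustrative numbers, not real results)
response = {
    "predictions": [
        {
            "mean": [0.41, 0.47, 0.55],
            "quantiles": {"0.5": [0.40, 0.46, 0.54], "0.9": [0.52, 0.63, 0.78]},
        }
    ]
}

# Assumed alarm limit, for example taken from the equipment's vibration specification
RMS_ALARM_THRESHOLD = 0.7

statuses = []
for prediction in response["predictions"]:
    unhealthy = flag_unhealthy(prediction, RMS_ALARM_THRESHOLD)
    statuses.append("unhealthy" if unhealthy else "healthy")
print(statuses)
```

Using the 0.9 quantile rather than the mean makes the rule more conservative: it raises an alarm when there is a meaningful chance of exceeding the limit, not only when the expected value does.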

Custom Algorithms:

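Apart from SageMaker’s built-in algorithms, you can also train your own deep learning models on SageMaker. The example below defines and trains a Keras multilayer perceptron. To run it on SageMaker, the same code would go into a training script launched through one of SageMaker’s framework containers.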

# import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import accuracy_score

# load the vibration data
df = pd.read_csv('vibration_data.csv')

# define the target variable (equipment failure)
y = df['failure']

# convert the target variable to a categorical form
y = to_categorical(y)

# select the features for training the model
X = df.drop(columns=['failure'])

# split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# define the model architecture
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(2, activation='softmax'))

# compile the model
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

# set up early stopping
es = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

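# train the model
# note: the test set doubles as the validation set here, so early stopping sees it
# and the accuracy reported below is optimistic
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_test, y_test), callbacks=[es])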

# make predictions on the test data
y_pred = model.predict(X_test)

# evaluate the model's performance
# both arrays are converted from one-hot / probabilities back to class indices
accuracy = accuracy_score(y_test.argmax(axis=1), y_pred.argmax(axis=1))
print('Accuracy:', accuracy)

This code trains a multilayer perceptron on the vibration data to predict whether or not the equipment will fail. The model is trained on 80% of the data and tested on the remaining 20%. The accuracy of the model is then printed as a measure of its performance. The model uses early stopping to prevent overfitting.
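To run a custom model like this as a SageMaker training job, the code is typically wrapped in an entry-point script. In script mode, SageMaker passes hyperparameters as command-line arguments and exposes locations through environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` (for a channel named `train`). The sketch below shows only the argument handling. The hyperparameter names and fallback paths are assumptions chosen to match the example above.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters and SageMaker-provided paths for a training script."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line arguments
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    # Inside the training container these environment variables point to the
    # model output directory and the 'train' input channel; the fallbacks are
    # placeholder local paths for running the script outside SageMaker
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data"))
    # parse_known_args ignores any extra arguments the container may add
    args, _unknown = parser.parse_known_args(argv)
    return args


# Example: override one hyperparameter and keep the other defaults
args = parse_args(["--epochs", "5"])
print(args.epochs, args.batch_size, args.learning_rate)
```

The training code would then read the CSV from `args.train`, fit the model with the parsed hyperparameters, and save it under `args.model_dir` so that SageMaker can package the result.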

Conclusion

Overall, predicting the health of industrial equipment from vibration data with machine learning can help organizations identify and address potential problems before they lead to failures or downtime. This improves the reliability and efficiency of their equipment and reduces the risk of costly disruptions.
