TensorFlow Predictions on test data using dataframe
Introduction
Data scientists train machine learning models, but too often the resulting predictions are never analyzed for bias, fairness, or other insights. Analyzing them makes what-if analysis of model performance possible, and Google’s What-If Tool is very useful for this. That analysis needs predictions on a large sample of test data. In this article I will share a simple way to make batch predictions on test data, starting from a test dataframe.
Convert dataframe and predict
The following function converts training data to a tf.data.Dataset. We will reuse the same function so that the test data is processed exactly like the training data.
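import tensorflow as tf

def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()  # leave the caller's dataframe untouched
    # target_column (the name of the label column) must be defined before calling
    labels = dataframe.pop(target_column)
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds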
You can reuse the function on the test dataframe. If the test data has no target_column, add one with a placeholder value first.
actuals_available = True
if target_column not in test_df.columns:
    test_df[target_column] = 0  # any value works; this placeholder is dropped later
    actuals_available = False
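Now convert the dataframe to a dataset, making sure shuffling is turned off. Passing batch_size=len(test_df) puts the whole test set in a single batch; a smaller batch size also works if memory is limited, because the row order is preserved either way.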
test_ds = df_to_dataset(test_df, shuffle=False, batch_size=len(test_df))
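As a quick sanity check, the sketch below shows that a dataset built with shuffle=False yields rows in the same order as the dataframe, whatever the batch size. It is only an illustration: the toy dataframe and the column names "age" and "label" are made up, and this variant of df_to_dataset takes target_column as an explicit argument instead of reading a global.

```python
import pandas as pd
import tensorflow as tf


def df_to_dataset(dataframe, target_column, shuffle=True, batch_size=32):
    """Same conversion as above, but with target_column passed explicitly."""
    dataframe = dataframe.copy()
    labels = dataframe.pop(target_column)
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    return ds.batch(batch_size)


# Hypothetical toy data; "age" and "label" are illustrative names only.
toy_df = pd.DataFrame({"age": [25, 32, 47, 51, 62], "label": [0, 1, 0, 1, 1]})
toy_ds = df_to_dataset(toy_df, "label", shuffle=False, batch_size=2)

# Read the feature values back out of the batches, in iteration order.
ages_in_dataset_order = [
    int(age) for features, _ in toy_ds for age in features["age"].numpy()
]

# With shuffling off, the dataset order matches the dataframe order.
print(ages_in_dataset_order == toy_df["age"].tolist())
```

Because the order is stable, the array returned by model.predict can be assigned straight back to the dataframe as a new column.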
Here is the prediction code. The predictions must stay in the same row order as test_df, which is why shuffling was turned off when building the dataset.
test_df["pred"] = model.predict(test_ds).ravel()
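The placeholder target added earlier should be removed once the predictions are attached. Here is a minimal sketch of that cleanup, assuming a model with a single output per row. No real model is involved: raw_preds is a hand-written stand-in for what model.predict(test_ds) would return, and the column names "age" and "label" are hypothetical.

```python
import numpy as np
import pandas as pd

target_column = "label"  # hypothetical label column name
test_df = pd.DataFrame({"age": [25, 32, 47]})  # toy test data without actuals

# Same placeholder logic as in the article.
actuals_available = True
if target_column not in test_df.columns:
    test_df[target_column] = 0
    actuals_available = False

# Stand-in for model.predict(test_ds): one output per row, shape (n_rows, 1).
raw_preds = np.array([[0.1], [0.8], [0.4]])

# ravel() flattens (n_rows, 1) to (n_rows,) so it fits a dataframe column.
test_df["pred"] = raw_preds.ravel()

# The placeholder target carries no information, so drop it again.
if not actuals_available:
    test_df = test_df.drop(columns=[target_column])

print(test_df)
```

Keeping the actuals_available flag around also tells later analysis code whether accuracy can be computed at all.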
Conclusion
With a tf.data.Dataset, predictions can be made on a large sample of test data. Having both the actual value and the predicted value in the same dataframe helps us measure accuracy and do what-if analysis on various facets of the data.
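As a rough illustration of that kind of analysis, the sketch below computes overall accuracy and accuracy per facet with pandas alone. Everything in it is assumed for the example: the scored dataframe is made up, "gender" stands in for whatever facet you care about, and the 0.5 threshold presumes a binary classifier that outputs probabilities.

```python
import pandas as pd

# Hypothetical scored test data: actual labels plus predicted probabilities.
scored_df = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "M"],
    "label": [1, 0, 1, 0, 1],
    "pred": [0.9, 0.2, 0.4, 0.1, 0.7],
})

THRESHOLD = 0.5  # assumed cut-off for turning probabilities into classes

scored_df["pred_label"] = (scored_df["pred"] >= THRESHOLD).astype(int)
scored_df["correct"] = scored_df["pred_label"] == scored_df["label"]

# Overall accuracy is the share of rows predicted correctly.
overall_accuracy = scored_df["correct"].mean()

# Accuracy per facet value shows whether the model does worse for some groups.
accuracy_by_facet = scored_df.groupby("gender")["correct"].mean()

print(overall_accuracy)
print(accuracy_by_facet)
```

A large gap between facet values is a cue to look more closely at those rows, for example in the What-If Tool.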