Sentiment Prediction in IMDB Reviews using an LSTM¶
import tempfile
import os
import numpy as np
import torch
import torch.nn as nn
import fastestimator as fe
from fastestimator.dataset.data import imdb_review
from fastestimator.op.numpyop.univariate.reshape import Reshape
from fastestimator.op.tensorop.loss import CrossEntropy
from fastestimator.op.tensorop.model import ModelOp, UpdateOp
from fastestimator.trace.io import BestModelSaver
from fastestimator.trace.metric import Accuracy
from fastestimator.backend import load_model
MAX_WORDS = 10000
MAX_LEN = 500
batch_size = 64
epochs = 10
train_steps_per_epoch = None
eval_steps_per_epoch = None
Building components¶
Step 1: Prepare training & evaluation data and define a Pipeline¶
We load the dataset from tf.keras.datasets.imdb, which contains movie reviews and sentiment labels. Each word has been replaced by an integer that encodes its frequency rank in the corpus. To ensure all sequences have the same length, the input sequences are padded before we define the Pipeline.
train_data, eval_data = imdb_review.load_data(MAX_LEN, MAX_WORDS)
pipeline = fe.Pipeline(train_data=train_data,
eval_data=eval_data,
batch_size=batch_size,
ops=Reshape(1, inputs="y", outputs="y"))
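As a quick sanity check (not part of the original example), we can pull one batch out of the Pipeline and confirm the shapes. This is a minimal sketch assuming the Pipeline.get_results() helper:
batch = pipeline.get_results()  # one batch of training data
print(batch["x"].shape)  # expected: (64, 500) padded token ids
print(batch["y"].shape)  # expected: (64, 1) sentiment labels after Reshape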
Step 2: Create a model and FastEstimator Network¶
First, we define the neural network architecture, and then pass the model definition and an optimizer into fe.build:
class ReviewSentiment(nn.Module):
    def __init__(self, embedding_size=64, hidden_units=64):
        super().__init__()
        self.embedding = nn.Embedding(MAX_WORDS, embedding_size)
        self.conv1d = nn.Conv1d(in_channels=64, out_channels=32, kernel_size=3, padding=1)
        self.maxpool1d = nn.MaxPool1d(kernel_size=4)
        self.lstm = nn.LSTM(input_size=125, hidden_size=hidden_units, num_layers=1)
        self.fc1 = nn.Linear(in_features=hidden_units, out_features=250)
        self.fc2 = nn.Linear(in_features=250, out_features=1)

    def forward(self, x):
        x = self.embedding(x)     # (batch, 500) -> (batch, 500, 64)
        x = x.permute((0, 2, 1))  # -> (batch, 64, 500) so Conv1d sees 64 channels
        x = self.conv1d(x)        # -> (batch, 32, 500)
        x = torch.relu(x)
        x = self.maxpool1d(x)     # -> (batch, 32, 125)
        output, _ = self.lstm(x)  # -> (batch, 32, 64)
        x = output[:, -1]         # sequence output at the last step only: (batch, 64)
        x = torch.tanh(x)
        x = self.fc1(x)           # -> (batch, 250)
        x = torch.relu(x)
        x = self.fc2(x)           # -> (batch, 1)
        x = torch.sigmoid(x)      # probability that the review is positive
        return x
Network is the object that defines the whole training graph, including models, loss functions, optimizers, etc. A Network can contain several different models and loss functions (for example, in GANs). fe.Network takes a series of operators; in this case the basic ModelOp, a loss op, and an UpdateOp will suffice. Note that "y_pred" is the key in the data dictionary under which the predictions will be stored.
model = fe.build(model_fn=lambda: ReviewSentiment(), optimizer_fn="adam")
network = fe.Network(ops=[
ModelOp(model=model, inputs="x", outputs="y_pred"),
CrossEntropy(inputs=("y_pred", "y"), outputs="loss"),
UpdateOp(model=model, loss_name="loss")
])
Step 3: Prepare Estimator and configure the training loop¶
Estimator is the API that wraps the Pipeline, Network, and other training metadata together. An Estimator also contains Traces, which are similar to Keras callbacks.
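To make the callback analogy concrete, here is a minimal sketch of a custom Trace (not used in this example) that prints the evaluation accuracy at the end of each epoch; it assumes the fe.trace.Trace base class and its on_epoch_end hook:
from fastestimator.trace import Trace

class PrintAccuracy(Trace):
    """Hypothetical Trace that reports accuracy after each evaluation epoch."""
    def __init__(self):
        super().__init__(inputs="accuracy", mode="eval")

    def on_epoch_end(self, data):
        # `data` holds the keys produced so far by the ops and traces for this epoch
        print("epoch accuracy:", data["accuracy"])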
In the training loop, we want to measure the validation loss and save the model that has the minimum loss. BestModelSaver is a convenient Trace to achieve this. Let's also measure accuracy over time using another Trace:
model_dir = tempfile.mkdtemp()
traces = [Accuracy(true_key="y", pred_key="y_pred"), BestModelSaver(model=model, save_dir=model_dir)]
estimator = fe.Estimator(network=network,
pipeline=pipeline,
epochs=epochs,
traces=traces,
train_steps_per_epoch=train_steps_per_epoch,
eval_steps_per_epoch=eval_steps_per_epoch)
Training¶
estimator.fit()
    ______           __  ______     __  _                 __
   / ____/___ ______/ /_/ ____/____/ /_(_)___ ___  ____ _/ /_____  _____
  / /_  / __ `/ ___/ __/ __/ / ___/ __/ / __ `__ \/ __ `/ __/ __ \/ ___/
 / __/ / /_/ (__  ) /_/ /___(__  ) /_/ / / / / / / /_/ / /_/ /_/ / /
/_/    \__,_/____/\__/_____/____/\__/_/_/ /_/ /_/\__,_/\__/\____/_/

FastEstimator-Start: step: 1; logging_interval: 100; num_device: 0;
FastEstimator-Train: step: 1; loss: 0.6982045;
FastEstimator-Train: step: 100; loss: 0.69076145; steps/sec: 4.55;
FastEstimator-Train: step: 200; loss: 0.6970146; steps/sec: 5.49;
FastEstimator-Train: step: 300; loss: 0.67406845; steps/sec: 5.6;
FastEstimator-Train: step: 358; epoch: 1; epoch_time: 69.22 sec;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpds6dz9wa/model_best_loss.pt
FastEstimator-Eval: step: 358; epoch: 1; accuracy: 0.6826793843485801; loss: 0.59441286; min_loss: 0.59441286; since_best_loss: 0;
FastEstimator-Train: step: 400; loss: 0.579373; steps/sec: 5.39;
FastEstimator-Train: step: 500; loss: 0.5601772; steps/sec: 4.79;
FastEstimator-Train: step: 600; loss: 0.3669433; steps/sec: 5.2;
FastEstimator-Train: step: 700; loss: 0.5050458; steps/sec: 4.86;
FastEstimator-Train: step: 716; epoch: 2; epoch_time: 71.36 sec;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpds6dz9wa/model_best_loss.pt
FastEstimator-Eval: step: 716; epoch: 2; accuracy: 0.7672230652503793; loss: 0.48858097; min_loss: 0.48858097; since_best_loss: 0;
FastEstimator-Train: step: 800; loss: 0.43962425; steps/sec: 5.57;
FastEstimator-Train: step: 900; loss: 0.33729357; steps/sec: 5.71;
FastEstimator-Train: step: 1000; loss: 0.31596264; steps/sec: 5.23;
FastEstimator-Train: step: 1074; epoch: 3; epoch_time: 77.79 sec;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpds6dz9wa/model_best_loss.pt
FastEstimator-Eval: step: 1074; epoch: 3; accuracy: 0.8103186646433991; loss: 0.4192897; min_loss: 0.4192897; since_best_loss: 0;
FastEstimator-Train: step: 1100; loss: 0.33041656; steps/sec: 3.22;
FastEstimator-Train: step: 1200; loss: 0.41677344; steps/sec: 5.75;
FastEstimator-Train: step: 1300; loss: 0.43493804; steps/sec: 5.68;
FastEstimator-Train: step: 1400; loss: 0.26938343; steps/sec: 5.34;
FastEstimator-Train: step: 1432; epoch: 4; epoch_time: 64.02 sec;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpds6dz9wa/model_best_loss.pt
FastEstimator-Eval: step: 1432; epoch: 4; accuracy: 0.823845653587687; loss: 0.3995199; min_loss: 0.3995199; since_best_loss: 0;
FastEstimator-Train: step: 1500; loss: 0.323763; steps/sec: 5.76;
FastEstimator-Train: step: 1600; loss: 0.21561582; steps/sec: 5.84;
FastEstimator-Train: step: 1700; loss: 0.20746922; steps/sec: 5.59;
FastEstimator-Train: step: 1790; epoch: 5; epoch_time: 63.49 sec;
FastEstimator-Eval: step: 1790; epoch: 5; accuracy: 0.8291784088445697; loss: 0.4008124; min_loss: 0.3995199; since_best_loss: 1;
FastEstimator-Train: step: 1800; loss: 0.2219275; steps/sec: 5.12;
FastEstimator-Train: step: 1900; loss: 0.2188505; steps/sec: 5.11;
FastEstimator-Train: step: 2000; loss: 0.14373234; steps/sec: 5.53;
FastEstimator-Train: step: 2100; loss: 0.20883155; steps/sec: 1.96;
FastEstimator-Train: step: 2148; epoch: 6; epoch_time: 100.15 sec;
FastEstimator-Eval: step: 2148; epoch: 6; accuracy: 0.8313461955343594; loss: 0.41437832; min_loss: 0.3995199; since_best_loss: 2;
FastEstimator-Train: step: 2200; loss: 0.20082837; steps/sec: 5.64;
FastEstimator-Train: step: 2300; loss: 0.22870378; steps/sec: 5.65;
FastEstimator-Train: step: 2400; loss: 0.28569937; steps/sec: 5.7;
FastEstimator-Train: step: 2500; loss: 0.16878708; steps/sec: 5.69;
FastEstimator-Train: step: 2506; epoch: 7; epoch_time: 63.07 sec;
FastEstimator-Eval: step: 2506; epoch: 7; accuracy: 0.8314762627357468; loss: 0.42922923; min_loss: 0.3995199; since_best_loss: 3;
FastEstimator-Train: step: 2600; loss: 0.20338291; steps/sec: 5.77;
FastEstimator-Train: step: 2700; loss: 0.17639604; steps/sec: 5.68;
FastEstimator-Train: step: 2800; loss: 0.12155069; steps/sec: 5.7;
FastEstimator-Train: step: 2864; epoch: 8; epoch_time: 62.75 sec;
FastEstimator-Eval: step: 2864; epoch: 8; accuracy: 0.8294818989811402; loss: 0.46396694; min_loss: 0.3995199; since_best_loss: 4;
FastEstimator-Train: step: 2900; loss: 0.20103803; steps/sec: 5.34;
FastEstimator-Train: step: 3000; loss: 0.10518805; steps/sec: 5.71;
FastEstimator-Train: step: 3100; loss: 0.10425654; steps/sec: 5.64;
FastEstimator-Train: step: 3200; loss: 0.13740686; steps/sec: 5.5;
FastEstimator-Train: step: 3222; epoch: 9; epoch_time: 64.67 sec;
FastEstimator-Eval: step: 3222; epoch: 9; accuracy: 0.8254498157381314; loss: 0.5149529; min_loss: 0.3995199; since_best_loss: 5;
FastEstimator-Train: step: 3300; loss: 0.080922514; steps/sec: 5.77;
FastEstimator-Train: step: 3400; loss: 0.088989146; steps/sec: 5.41;
FastEstimator-Train: step: 3500; loss: 0.1620798; steps/sec: 5.3;
FastEstimator-Train: step: 3580; epoch: 10; epoch_time: 64.87 sec;
FastEstimator-Eval: step: 3580; epoch: 10; accuracy: 0.8214177324951225; loss: 0.5555562; min_loss: 0.3995199; since_best_loss: 6;
FastEstimator-Finish: step: 3580; model_lr: 0.001; total_time: 1124.45 sec;
Inferencing¶
For inferencing, we first have to load the trained model weights. We previously saved the weights corresponding to the minimum loss, and we now load them with load_model():
model_name = 'model_best_loss.pt'
model_path = os.path.join(model_dir, model_name)
load_model(model, model_path)
Let's pick a random sequence from the evaluation data and compare the prediction with the ground truth:
selected_idx = np.random.randint(10000)
print("Ground truth is: ",eval_data[selected_idx]['y'])
Ground truth is: 1
Create a data dictionary for inference. The transform() method of the Pipeline and Network applies all of their operations to the given data:
infer_data = {"x":eval_data[selected_idx]['x'], "y":eval_data[selected_idx]['y']}
data = pipeline.transform(infer_data, mode="infer")
data = network.transform(data, mode="infer")
Finally, print the inference result.
print("Prediction for the input sequence: ", np.array(data["y_pred"])[0][0])
Prediction for the input sequence: 0.91389465
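Since the model ends with a sigmoid, the prediction is the estimated probability that the review is positive. As a small follow-up (not in the original example), it can be turned into a label with the usual 0.5 threshold:
score = float(np.array(data["y_pred"])[0][0])
print("positive" if score > 0.5 else "negative")  # "positive" here, matching the ground truth of 1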
Using your own dataset¶
This example can be adapted to any custom dataset for a sequence-to-vector task. The Pipeline in this code example assumes that each sample is a tokenized sentence (every word represented by an index) with a fixed length (zero-padded).
If your dataset is already tokenized, you can create a Dataset class that produces output similar to this code example. If your dataset is not yet tokenized (i.e. still raw words), you can use a Tokenize Op, similar to this example.
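As a rough illustration (not part of the original example), here is one way such a dataset could be built from already-tokenized, zero-padded sequences. This sketch assumes fe.dataset.NumpyDataset and uses made-up array names:
import numpy as np
import fastestimator as fe

# Hypothetical pre-tokenized data: token_ids is (num_samples, MAX_LEN) of ints,
# labels is (num_samples,) of 0/1 sentiment values.
token_ids = np.zeros((100, 500), dtype="int64")
labels = np.zeros((100,), dtype="uint8")

custom_train_data = fe.dataset.NumpyDataset({"x": token_ids, "y": labels})
# custom_train_data[i] -> {"x": <padded token ids>, "y": <label>}, matching the
# structure produced by imdb_review.load_data in this example.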