Sentiment Prediction in IMDB Reviews using an LSTM¶
import tempfile
import os
import numpy as np
import torch
import torch.nn as nn
import fastestimator as fe
from fastestimator.dataset.data import imdb_review
from fastestimator.op.numpyop.univariate.reshape import Reshape
from fastestimator.op.tensorop.loss import CrossEntropy
from fastestimator.op.tensorop.model import ModelOp, UpdateOp
from fastestimator.trace.io import BestModelSaver
from fastestimator.trace.metric import Accuracy
from fastestimator.backend import load_model
MAX_WORDS = 10000
MAX_LEN = 500
batch_size = 64
epochs = 10
train_steps_per_epoch = None
eval_steps_per_epoch = None
Building components¶
Step 1: Prepare training & evaluation data and define a Pipeline¶
We load the dataset from tf.keras.datasets.imdb, which contains movie reviews and sentiment labels. Each word has been replaced by an integer that encodes its frequency rank in the corpus. To ensure all sequences have the same length, the input sequences are padded before we define the Pipeline.
train_data, eval_data = imdb_review.load_data(MAX_LEN, MAX_WORDS)
pipeline = fe.Pipeline(train_data=train_data,
eval_data=eval_data,
batch_size=batch_size,
ops=Reshape(1, inputs="y", outputs="y"))
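As a quick sanity check (not part of the original example), we can pull one batch out of the Pipeline and confirm the shapes. This is a minimal sketch assuming the Pipeline.get_results() helper:
batch = pipeline.get_results()  # one batch of training data
print(batch["x"].shape)  # expected: (64, 500) padded token ids
print(batch["y"].shape)  # expected: (64, 1) sentiment labels after Reshape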
Step 2: Create a model and FastEstimator Network¶
First, we define the neural network architecture, and then pass the model definition and an optimizer into fe.build:
class ReviewSentiment(nn.Module):
    def __init__(self, embedding_size=64, hidden_units=64):
        super().__init__()
        self.embedding = nn.Embedding(MAX_WORDS, embedding_size)
        self.conv1d = nn.Conv1d(in_channels=64, out_channels=32, kernel_size=3, padding=1)
        self.maxpool1d = nn.MaxPool1d(kernel_size=4)
        self.lstm = nn.LSTM(input_size=125, hidden_size=hidden_units, num_layers=1)
        self.fc1 = nn.Linear(in_features=hidden_units, out_features=250)
        self.fc2 = nn.Linear(in_features=250, out_features=1)

    def forward(self, x):
        x = self.embedding(x)     # (batch, 500) -> (batch, 500, 64)
        x = x.permute((0, 2, 1))  # -> (batch, 64, 500) so Conv1d sees 64 channels
        x = self.conv1d(x)        # -> (batch, 32, 500)
        x = torch.relu(x)
        x = self.maxpool1d(x)     # -> (batch, 32, 125)
        output, _ = self.lstm(x)  # -> (batch, 32, 64)
        x = output[:, -1]         # sequence output at the last step only: (batch, 64)
        x = torch.tanh(x)
        x = self.fc1(x)           # -> (batch, 250)
        x = torch.relu(x)
        x = self.fc2(x)           # -> (batch, 1)
        x = torch.sigmoid(x)      # probability that the review is positive
        return x
Network is the object that defines the whole training graph, including models, loss functions, optimizers, etc. A Network can contain several different models and loss functions (for example, in GANs). fe.Network takes a series of operators; in this case the basic ModelOp, a loss op, and an UpdateOp will suffice. Note that "y_pred" is the key in the data dictionary under which the predictions will be stored.
model = fe.build(model_fn=lambda: ReviewSentiment(), optimizer_fn="adam")
network = fe.Network(ops=[
ModelOp(model=model, inputs="x", outputs="y_pred"),
CrossEntropy(inputs=("y_pred", "y"), outputs="loss"),
UpdateOp(model=model, loss_name="loss")
])
Step 3: Prepare Estimator and configure the training loop¶
Estimator is the API that wraps the Pipeline, Network, and other training metadata together. An Estimator also contains Traces, which are similar to Keras callbacks.
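To make the callback analogy concrete, here is a minimal sketch of a custom Trace (not used in this example) that prints the evaluation accuracy at the end of each epoch; it assumes the fe.trace.Trace base class and its on_epoch_end hook:
from fastestimator.trace import Trace

class PrintAccuracy(Trace):
    """Hypothetical Trace that reports accuracy after each evaluation epoch."""
    def __init__(self):
        super().__init__(inputs="accuracy", mode="eval")

    def on_epoch_end(self, data):
        # `data` holds the keys produced so far by the ops and traces for this epoch
        print("epoch accuracy:", data["accuracy"])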
In the training loop, we want to measure the validation loss and save the model that has the minimum loss. BestModelSaver is a convenient Trace to achieve this. Let's also measure accuracy over time using another Trace:
model_dir = tempfile.mkdtemp()
traces = [Accuracy(true_key="y", pred_key="y_pred"), BestModelSaver(model=model, save_dir=model_dir)]
estimator = fe.Estimator(network=network,
pipeline=pipeline,
epochs=epochs,
traces=traces,
train_steps_per_epoch=train_steps_per_epoch,
eval_steps_per_epoch=eval_steps_per_epoch)
Training¶
estimator.fit()
    ______           __  ______     __  _                 __
   / ____/___ ______/ /_/ ____/____/ /_(_)___ ___  ____ _/ /_____  _____
  / /_  / __ `/ ___/ __/ __/ / ___/ __/ / __ `__ \/ __ `/ __/ __ \/ ___/
 / __/ / /_/ (__  ) /_/ /___(__  ) /_/ / / / / / / /_/ / /_/ /_/ / /
/_/    \__,_/____/\__/_____/____/\__/_/_/ /_/ /_/\__,_/\__/\____/_/

FastEstimator-Start: step: 1; logging_interval: 100; num_device: 0;
FastEstimator-Train: step: 1; loss: 0.6982045;
FastEstimator-Train: step: 100; loss: 0.69076145; steps/sec: 4.55;
FastEstimator-Train: step: 200; loss: 0.6970146; steps/sec: 5.49;
FastEstimator-Train: step: 300; loss: 0.67406845; steps/sec: 5.6;
FastEstimator-Train: step: 358; epoch: 1; epoch_time: 69.22 sec;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpds6dz9wa/model_best_loss.pt
FastEstimator-Eval: step: 358; epoch: 1; accuracy: 0.6826793843485801; loss: 0.59441286; min_loss: 0.59441286; since_best_loss: 0;
FastEstimator-Train: step: 400; loss: 0.579373; steps/sec: 5.39;
FastEstimator-Train: step: 500; loss: 0.5601772; steps/sec: 4.79;
FastEstimator-Train: step: 600; loss: 0.3669433; steps/sec: 5.2;
FastEstimator-Train: step: 700; loss: 0.5050458; steps/sec: 4.86;
FastEstimator-Train: step: 716; epoch: 2; epoch_time: 71.36 sec;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpds6dz9wa/model_best_loss.pt
FastEstimator-Eval: step: 716; epoch: 2; accuracy: 0.7672230652503793; loss: 0.48858097; min_loss: 0.48858097; since_best_loss: 0;
FastEstimator-Train: step: 800; loss: 0.43962425; steps/sec: 5.57;
FastEstimator-Train: step: 900; loss: 0.33729357; steps/sec: 5.71;
FastEstimator-Train: step: 1000; loss: 0.31596264; steps/sec: 5.23;
FastEstimator-Train: step: 1074; epoch: 3; epoch_time: 77.79 sec;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpds6dz9wa/model_best_loss.pt
FastEstimator-Eval: step: 1074; epoch: 3; accuracy: 0.8103186646433991; loss: 0.4192897; min_loss: 0.4192897; since_best_loss: 0;
FastEstimator-Train: step: 1100; loss: 0.33041656; steps/sec: 3.22;
FastEstimator-Train: step: 1200; loss: 0.41677344; steps/sec: 5.75;
FastEstimator-Train: step: 1300; loss: 0.43493804; steps/sec: 5.68;
FastEstimator-Train: step: 1400; loss: 0.26938343; steps/sec: 5.34;
FastEstimator-Train: step: 1432; epoch: 4; epoch_time: 64.02 sec;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpds6dz9wa/model_best_loss.pt
FastEstimator-Eval: step: 1432; epoch: 4; accuracy: 0.823845653587687; loss: 0.3995199; min_loss: 0.3995199; since_best_loss: 0;
FastEstimator-Train: step: 1500; loss: 0.323763; steps/sec: 5.76;
FastEstimator-Train: step: 1600; loss: 0.21561582; steps/sec: 5.84;
FastEstimator-Train: step: 1700; loss: 0.20746922; steps/sec: 5.59;
FastEstimator-Train: step: 1790; epoch: 5; epoch_time: 63.49 sec;
FastEstimator-Eval: step: 1790; epoch: 5; accuracy: 0.8291784088445697; loss: 0.4008124; min_loss: 0.3995199; since_best_loss: 1;
FastEstimator-Train: step: 1800; loss: 0.2219275; steps/sec: 5.12;
FastEstimator-Train: step: 1900; loss: 0.2188505; steps/sec: 5.11;
FastEstimator-Train: step: 2000; loss: 0.14373234; steps/sec: 5.53;
FastEstimator-Train: step: 2100; loss: 0.20883155; steps/sec: 1.96;
FastEstimator-Train: step: 2148; epoch: 6; epoch_time: 100.15 sec;
FastEstimator-Eval: step: 2148; epoch: 6; accuracy: 0.8313461955343594; loss: 0.41437832; min_loss: 0.3995199; since_best_loss: 2;
FastEstimator-Train: step: 2200; loss: 0.20082837; steps/sec: 5.64;
FastEstimator-Train: step: 2300; loss: 0.22870378; steps/sec: 5.65;
FastEstimator-Train: step: 2400; loss: 0.28569937; steps/sec: 5.7;
FastEstimator-Train: step: 2500; loss: 0.16878708; steps/sec: 5.69;
FastEstimator-Train: step: 2506; epoch: 7; epoch_time: 63.07 sec;
FastEstimator-Eval: step: 2506; epoch: 7; accuracy: 0.8314762627357468; loss: 0.42922923; min_loss: 0.3995199; since_best_loss: 3;
FastEstimator-Train: step: 2600; loss: 0.20338291; steps/sec: 5.77;
FastEstimator-Train: step: 2700; loss: 0.17639604; steps/sec: 5.68;
FastEstimator-Train: step: 2800; loss: 0.12155069; steps/sec: 5.7;
FastEstimator-Train: step: 2864; epoch: 8; epoch_time: 62.75 sec;
FastEstimator-Eval: step: 2864; epoch: 8; accuracy: 0.8294818989811402; loss: 0.46396694; min_loss: 0.3995199; since_best_loss: 4;
FastEstimator-Train: step: 2900; loss: 0.20103803; steps/sec: 5.34;
FastEstimator-Train: step: 3000; loss: 0.10518805; steps/sec: 5.71;
FastEstimator-Train: step: 3100; loss: 0.10425654; steps/sec: 5.64;
FastEstimator-Train: step: 3200; loss: 0.13740686; steps/sec: 5.5;
FastEstimator-Train: step: 3222; epoch: 9; epoch_time: 64.67 sec;
FastEstimator-Eval: step: 3222; epoch: 9; accuracy: 0.8254498157381314; loss: 0.5149529; min_loss: 0.3995199; since_best_loss: 5;
FastEstimator-Train: step: 3300; loss: 0.080922514; steps/sec: 5.77;
FastEstimator-Train: step: 3400; loss: 0.088989146; steps/sec: 5.41;
FastEstimator-Train: step: 3500; loss: 0.1620798; steps/sec: 5.3;
FastEstimator-Train: step: 3580; epoch: 10; epoch_time: 64.87 sec;
FastEstimator-Eval: step: 3580; epoch: 10; accuracy: 0.8214177324951225; loss: 0.5555562; min_loss: 0.3995199; since_best_loss: 6;
FastEstimator-Finish: step: 3580; model_lr: 0.001; total_time: 1124.45 sec;
Inferencing¶
For inferencing, we first have to load the trained model weights. We previously saved the weights corresponding to the minimum loss, and we now load them with load_model():
model_name = 'model_best_loss.pt'
model_path = os.path.join(model_dir, model_name)
load_model(model, model_path)
Let's pick a random sequence from the evaluation data and compare the prediction with the ground truth:
selected_idx = np.random.randint(10000)
print("Ground truth is: ",eval_data[selected_idx]['y'])
Ground truth is: 1
Create a data dictionary for inference. The transform() method of the Pipeline and Network applies all of their operations to the given data:
infer_data = {"x":eval_data[selected_idx]['x'], "y":eval_data[selected_idx]['y']}
data = pipeline.transform(infer_data, mode="infer")
data = network.transform(data, mode="infer")
Finally, print the inference result.
print("Prediction for the input sequence: ", np.array(data["y_pred"])[0][0])
Prediction for the input sequence: 0.91389465
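Since the model ends with a sigmoid, the prediction is the estimated probability that the review is positive. As a small follow-up (not in the original example), it can be turned into a label with the usual 0.5 threshold:
score = float(np.array(data["y_pred"])[0][0])
print("positive" if score > 0.5 else "negative")  # "positive" here, matching the ground truth of 1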
Using your own dataset¶
This example can be adapted to any custom dataset for a sequence-to-vector task. The Pipeline in this code example assumes that each sample is a tokenized sentence (every word represented by an index) with a fixed length (zero-padded).
If your dataset is already tokenized, you can create a Dataset class that produces output similar to this code example. If your dataset is not yet tokenized (i.e. still raw words), you can use a Tokenize Op, similar to this example.
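As a rough illustration (not part of the original example), here is one way such a dataset could be built from already-tokenized, zero-padded sequences. This sketch assumes fe.dataset.NumpyDataset and uses made-up array names:
import numpy as np
import fastestimator as fe

# Hypothetical pre-tokenized data: token_ids is (num_samples, MAX_LEN) of ints,
# labels is (num_samples,) of 0/1 sentiment values.
token_ids = np.zeros((100, 500), dtype="int64")
labels = np.zeros((100,), dtype="uint8")

custom_train_data = fe.dataset.NumpyDataset({"x": token_ids, "y": labels})
# custom_train_data[i] -> {"x": <padded token ids>, "y": <label>}, matching the
# structure produced by imdb_review.load_data in this example.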