Preliminary Setup¶
We will first set up a basic MNIST example for the rest of the demonstrations:
import fastestimator as fe
from fastestimator.architecture.tensorflow import LeNet
from fastestimator.dataset.data import mnist
from fastestimator.op.numpyop.univariate import ExpandDims, Minmax
from fastestimator.op.tensorop.loss import CrossEntropy
from fastestimator.op.tensorop.model import ModelOp, UpdateOp
from fastestimator.schedule import cosine_decay
from fastestimator.trace.adapt import LRScheduler
from fastestimator.trace.metric import Accuracy
from fastestimator.trace.io import TensorBoard
train_data, eval_data = mnist.load_data()
test_data = eval_data.split(0.5)
pipeline = fe.Pipeline(train_data=train_data,
                       eval_data=eval_data,
                       test_data=test_data,
                       batch_size=32,
                       ops=[ExpandDims(inputs="x", outputs="x"), Minmax(inputs="x", outputs="x")])
model = fe.build(model_fn=LeNet, optimizer_fn="adam")
network = fe.Network(ops=[
    ModelOp(model=model, inputs="x", outputs="y_pred"),
    CrossEntropy(inputs=("y_pred", "y"), outputs="ce"),
    UpdateOp(model=model, loss_name="ce")
])
traces = [
    Accuracy(true_key="y", pred_key="y_pred"),
    LRScheduler(model=model, lr_fn=lambda step: cosine_decay(step, cycle_length=3750, init_lr=1e-3))
]
Experiment Logging¶
As you may have noticed while using FastEstimator, log messages are printed to the screen during training. If you want to persist these messages for later reference, you can simply pipe them into a file when launching training from the command line, or copy and paste them from the console into a file on disk. FastEstimator also allows logging to be controlled via arguments passed to the Estimator class, as described in Tutorial 7 of the beginner section. Let's see an example which logs every 120 steps:
est = fe.Estimator(pipeline=pipeline, network=network, epochs=1, traces=traces, log_steps=120)
est.fit()
FastEstimator-Warn: No ModelSaver Trace detected. Models will not be saved.
FastEstimator-Start: step: 1; num_device: 0; logging_interval: 120;
FastEstimator-Train: step: 1; ce: 2.310556; model_lr: 0.001;
FastEstimator-Train: step: 120; ce: 0.37298; steps/sec: 144.44; model_lr: 0.0009975198;
FastEstimator-Train: step: 240; ce: 0.23124042; steps/sec: 137.05; model_lr: 0.000990021;
FastEstimator-Train: step: 360; ce: 0.025805598; steps/sec: 126.34; model_lr: 0.0009775789;
FastEstimator-Train: step: 480; ce: 0.053540815; steps/sec: 115.62; model_lr: 0.0009603194;
FastEstimator-Train: step: 600; ce: 0.124904916; steps/sec: 98.48; model_lr: 0.00093841663;
FastEstimator-Train: step: 720; ce: 0.069644645; steps/sec: 102.84; model_lr: 0.00091209175;
FastEstimator-Train: step: 840; ce: 0.054759175; steps/sec: 100.21; model_lr: 0.0008816107;
FastEstimator-Train: step: 960; ce: 0.014191106; steps/sec: 90.53; model_lr: 0.00084728113;
FastEstimator-Train: step: 1080; ce: 0.15409154; steps/sec: 84.12; model_lr: 0.0008094498;
FastEstimator-Train: step: 1200; ce: 0.021797167; steps/sec: 81.62; model_lr: 0.0007684987;
FastEstimator-Train: step: 1320; ce: 0.018265918; steps/sec: 81.65; model_lr: 0.0007248414;
FastEstimator-Train: step: 1440; ce: 0.0836072; steps/sec: 79.25; model_lr: 0.0006789187;
FastEstimator-Train: step: 1560; ce: 0.0046536885; steps/sec: 66.7; model_lr: 0.00063119427;
FastEstimator-Train: step: 1680; ce: 0.0028682733; steps/sec: 82.64; model_lr: 0.0005821501;
FastEstimator-Train: step: 1800; ce: 0.075449295; steps/sec: 77.96; model_lr: 0.00053228147;
FastEstimator-Train: step: 1875; epoch: 1; epoch_time: 22.39 sec;
FastEstimator-Eval: step: 1875; epoch: 1; ce: 0.040944144; accuracy: 0.9868;
FastEstimator-Finish: step: 1875; total_time: 23.1 sec; model_lr: 0.0005009185;
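If you would rather persist these console messages from within Python, one option is to redirect stdout around the fit() call. This is a minimal sketch rather than a FastEstimator API: it assumes the log messages are printed to stdout (as seen above), and the "training.log" file name is arbitrary:
import contextlib

# Hypothetical sketch: write the console log messages to a file instead of the screen.
# Assumes `est` is the Estimator built above; "training.log" is an arbitrary file name.
with open("training.log", "w") as log_file, contextlib.redirect_stdout(log_file):
    est.fit()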
Experiment Summaries¶
Having log messages on the screen can be handy, but what if you want to access them within Python? Enter the Summary class. Summary objects contain information about the training over time, and are automatically generated when the Estimator fit() method is invoked with an experiment name:
est = fe.Estimator(pipeline=pipeline, network=network, epochs=1, traces=traces, log_steps=500)
summary = est.fit("experiment1")
FastEstimator-Warn: No ModelSaver Trace detected. Models will not be saved.
FastEstimator-Start: step: 1; num_device: 0; logging_interval: 500;
FastEstimator-Train: step: 1; ce: 0.0035908362; model_lr: 0.001;
FastEstimator-Train: step: 500; ce: 0.2240004; steps/sec: 167.16; model_lr: 0.000956986;
FastEstimator-Train: step: 1000; ce: 0.017725334; steps/sec: 175.28; model_lr: 0.0008350416;
FastEstimator-Train: step: 1500; ce: 0.033957843; steps/sec: 171.15; model_lr: 0.0006552519;
FastEstimator-Train: step: 1875; epoch: 1; epoch_time: 11.02 sec;
FastEstimator-Eval: step: 1875; epoch: 1; ce: 0.042691424; accuracy: 0.9878;
FastEstimator-Finish: step: 1875; total_time: 11.62 sec; model_lr: 0.0005009185;
Let's take a look at what sort of information is contained within our Summary object:
summary.name
'experiment1'
summary.history
defaultdict(<function fastestimator.summary.summary.Summary.__init__.<locals>.<lambda>()>, {'train': defaultdict(dict, {'num_device': {0: array(0)}, 'logging_interval': {0: array(500)}, 'ce': {1: array(0.00359084, dtype=float32), 500: array(0.2240004, dtype=float32), 1000: array(0.01772533, dtype=float32), 1500: array(0.03395784, dtype=float32)}, 'model_lr': {1: array(0.001, dtype=float32), 500: array(0.00095699, dtype=float32), 1000: array(0.00083504, dtype=float32), 1500: array(0.00065525, dtype=float32)}, 'steps/sec': {500: array(167.16), 1000: array(175.28), 1500: array(171.15)}, 'epoch': {1875: 1}, 'epoch_time': {1875: array('11.02 sec', dtype='<U9')}}), 'eval': defaultdict(dict, {'epoch': {1875: 1}, 'ce': {1875: array(0.04269142, dtype=float32)}, 'accuracy': {1875: array(0.9878)}, 'total_time': {1875: array('11.62 sec', dtype='<U9')}, 'model_lr': {1875: array(0.00050092, dtype=float32)}})})
The history field can appear a little daunting, but it is simply a dictionary laid out as follows: {mode: {key: {step: value}}}. Once you have invoked the .fit() method with an experiment name, subsequent calls to .test() will add their results into the same summary dictionary:
summary = est.test()
FastEstimator-Test: step: 1875; epoch: 1; accuracy: 0.9868;
summary.history
defaultdict(<function fastestimator.summary.summary.Summary.__init__.<locals>.<lambda>()>, {'train': defaultdict(dict, {'num_device': {0: array(0)}, 'logging_interval': {0: array(500)}, 'ce': {1: array(0.00359084, dtype=float32), 500: array(0.2240004, dtype=float32), 1000: array(0.01772533, dtype=float32), 1500: array(0.03395784, dtype=float32)}, 'model_lr': {1: array(0.001, dtype=float32), 500: array(0.00095699, dtype=float32), 1000: array(0.00083504, dtype=float32), 1500: array(0.00065525, dtype=float32)}, 'steps/sec': {500: array(167.16), 1000: array(175.28), 1500: array(171.15)}, 'epoch': {1875: 1}, 'epoch_time': {1875: array('11.02 sec', dtype='<U9')}}), 'eval': defaultdict(dict, {'epoch': {1875: 1}, 'ce': {1875: array(0.04269142, dtype=float32)}, 'accuracy': {1875: array(0.9878)}, 'total_time': {1875: array('11.62 sec', dtype='<U9')}, 'model_lr': {1875: array(0.00050092, dtype=float32)}}), 'test': defaultdict(dict, {'epoch': {1875: 1}, 'accuracy': {1875: array(0.9868)}})})
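Because history follows the {mode: {key: {step: value}}} layout, individual results are easy to look up. For example, using the values shown above:
# Test accuracy recorded at step 1875
print(summary.history["test"]["accuracy"][1875])  # 0.9868

# The last logged training cross-entropy and the step at which it was recorded
train_ce = summary.history["train"]["ce"]
last_step = max(train_ce)
print(last_step, train_ce[last_step])  # 1500 0.03395784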
Even if an experiment name was not provided during the .fit() call, it may be provided during the .test() call. The resulting summary object will, however, only contain information from the test mode.
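For example, a name can be passed to .test() in the same way as to .fit() (the name below is arbitrary):
summary = est.test("experiment1_test")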
Log Parsing¶
Suppose that you have a log file saved to disk and you want to create an in-memory Summary representation of it. This can be done through FastEstimator's logging utilities:
summary = fe.summary.logs.parse_log_file(file_path="../resources/t06a_exp1.txt", file_extension=".txt")
summary.name
't06a_exp1'
summary.history['eval']
defaultdict(dict, {'epoch': {1875: 1.0, 3750: 2.0, 5625: 3.0}, 'ce': {1875: 0.03284014, 3750: 0.02343675, 5625: 0.02382297}, 'min_ce': {1875: 0.03284014, 3750: 0.02343675, 5625: 0.02343675}, 'since_best': {1875: 0.0, 3750: 0.0, 5625: 1.0}, 'accuracy': {1875: 0.9882, 3750: 0.992, 5625: 0.9922}})
Log Visualization¶
While seeing log data as numbers can be informative, visualizations of data are often more useful. FastEstimator provides several ways to visualize log data: from Python using Summary objects or log files, as well as through the command line.
fe.summary.logs.visualize_logs(experiments=[summary])
If you are only interested in visualizing a subset of these log values, it is also possible to whitelist or blacklist values via the 'include_metrics' and 'ignore_metrics' arguments, respectively:
fe.summary.logs.visualize_logs(experiments=[summary], include_metrics={"accuracy", "ce"})
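Conversely, metrics can be blacklisted by name. For example, to hide the learning rate curve (the 'model_lr' key seen in the logs above):
fe.summary.logs.visualize_logs(experiments=[summary], ignore_metrics={"model_lr"})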
It is also possible to compare logs from different experiments, which can be especially useful when fiddling with hyper-parameter values to determine their effects on training:
fe.summary.logs.parse_log_files(file_paths=["../resources/t06a_exp1.txt", "../resources/t06a_exp2.txt"], log_extension=".txt")
All of the log files within a given directory can also be compared at the same time, either by using the parse_log_dir() method (see the sketch below) or via the command line as follows: fastestimator logs --extension .txt --smooth 0 ../resources
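As a sketch, the parse_log_dir() variant might be invoked as follows (the argument names here are assumed to mirror those of the file-based parsers above):
# Parse (and compare) every .txt log file in the resources directory at once.
# Argument names are assumptions based on parse_log_file()/parse_log_files() above.
fe.summary.logs.parse_log_dir(dir_path="../resources", log_extension=".txt", smooth_factor=0)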
TensorBoard¶
Of course, no modern AI framework would be complete without TensorBoard integration. In FastEstimator, all that is required to achieve TensorBoard integration is to add the TensorBoard Trace to the list of traces passed to the Estimator:
import tempfile
log_dir = tempfile.mkdtemp()
pipeline = fe.Pipeline(train_data=train_data,
                       eval_data=eval_data,
                       test_data=test_data,
                       batch_size=32,
                       ops=[ExpandDims(inputs="x", outputs="x"), Minmax(inputs="x", outputs="x")],
                       num_process=0)
model = fe.build(model_fn=LeNet, optimizer_fn="adam")
network = fe.Network(ops=[
    ModelOp(model=model, inputs="x", outputs="y_pred"),
    CrossEntropy(inputs=("y_pred", "y"), outputs="ce"),
    UpdateOp(model=model, loss_name="ce")
])
traces = [
    Accuracy(true_key="y", pred_key="y_pred"),
    LRScheduler(model=model, lr_fn=lambda step: cosine_decay(step, cycle_length=3750, init_lr=1e-3)),
    TensorBoard(log_dir=log_dir, weight_histogram_freq="epoch")
]
est = fe.Estimator(pipeline=pipeline, network=network, epochs=3, traces=traces, log_steps=1000)
est.fit()
FastEstimator-Warn: No ModelSaver Trace detected. Models will not be saved.
FastEstimator-Tensorboard: writing logs to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpb_oy2ihe/20200504-202406
FastEstimator-Start: step: 1; num_device: 0; logging_interval: 1000;
FastEstimator-Train: step: 1; ce: 2.296093; model1_lr: 0.001;
FastEstimator-Train: step: 1000; ce: 0.18865156; steps/sec: 71.25; model1_lr: 0.0008350416;
FastEstimator-Train: step: 1875; epoch: 1; epoch_time: 26.52 sec;
FastEstimator-Eval: step: 1875; epoch: 1; ce: 0.050555836; accuracy: 0.9816;
FastEstimator-Train: step: 2000; ce: 0.052690372; steps/sec: 70.85; model1_lr: 0.00044870423;
FastEstimator-Train: step: 3000; ce: 0.0037323756; steps/sec: 70.63; model1_lr: 9.664212e-05;
FastEstimator-Train: step: 3750; epoch: 2; epoch_time: 26.56 sec;
FastEstimator-Eval: step: 3750; epoch: 2; ce: 0.030163307; accuracy: 0.99;
FastEstimator-Train: step: 4000; ce: 0.063815504; steps/sec: 70.37; model1_lr: 0.0009891716;
FastEstimator-Train: step: 5000; ce: 0.002615007; steps/sec: 73.63; model1_lr: 0.0007506123;
FastEstimator-Train: step: 5625; epoch: 3; epoch_time: 25.93 sec;
FastEstimator-Eval: step: 5625; epoch: 3; ce: 0.030318245; accuracy: 0.9902;
FastEstimator-Finish: step: 5625; total_time: 81.43 sec; model1_lr: 0.0005009185;
Now let's launch TensorBoard to visualize our logs. Note that this call will prevent any subsequent Jupyter Notebook cells from running until you manually terminate it.
#!tensorboard --reload_multifile=true --logdir /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpb_oy2ihe
The TensorBoard display should look something like this: