Adversarial Training Using the Fast Gradient Sign Method (FGSM)¶
[Paper] [Notebook] [TF Implementation] [Torch Implementation]
In this example we will demonstrate how to train a model to resist adversarial attacks constructed using the Fast Gradient Sign Method (FGSM). For more background on adversarial attacks, see the paper linked above.
Import the required libraries¶
import tempfile
import os
import numpy as np
import fastestimator as fe
from fastestimator.architecture.tensorflow import LeNet
from fastestimator.backend import argmax
from fastestimator.dataset.data import cifair10
from fastestimator.op.numpyop.univariate import Normalize
from fastestimator.op.tensorop import Average
from fastestimator.op.tensorop.gradient import Watch, FGSM
from fastestimator.op.tensorop.loss import CrossEntropy
from fastestimator.op.tensorop.model import ModelOp, UpdateOp
from fastestimator.trace.io import BestModelSaver
from fastestimator.trace.metric import Accuracy
from fastestimator.util import BatchDisplay, GridDisplay, to_number
# training parameters
epsilon=0.04 # The strength of the adversarial attack
epochs=10
batch_size=50
train_steps_per_epoch=None
eval_steps_per_epoch=None
save_dir=tempfile.mkdtemp()
train_data, eval_data = cifair10.load_data()
test_data = eval_data.split(0.5)
Prepare the Pipeline¶
We will use a simple pipeline that just normalizes the images:
pipeline = fe.Pipeline(
    train_data=train_data,
    eval_data=eval_data,
    test_data=test_data,
    batch_size=batch_size,
    ops=[
        Normalize(inputs="x", outputs="x", mean=(0.4914, 0.4822, 0.4465), std=(0.2471, 0.2435, 0.2616))
    ])
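To make the normalization step concrete, here is a minimal NumPy sketch of per-channel standardization. It assumes the raw images are uint8 arrays in [0, 255] and that the mean and std values are given on a [0, 1] scale, so they are multiplied by a maximum pixel value of 255 before use. That convention, the max_pixel_value parameter, and the helper name normalize_image are assumptions made for illustration and should be checked against the Normalize op's documentation.

```python
import numpy as np

MEAN = np.array([0.4914, 0.4822, 0.4465], dtype=np.float32)
STD = np.array([0.2471, 0.2435, 0.2616], dtype=np.float32)


def normalize_image(image, mean=MEAN, std=STD, max_pixel_value=255.0):
    """Standardize an HxWxC image channel by channel (hypothetical helper)."""
    image = image.astype(np.float32)
    # Assumption: mean and std are on a [0, 1] scale and must be rescaled to pixel units
    return (image - mean * max_pixel_value) / (std * max_pixel_value)


# A synthetic 32x32 RGB image whose pixels sit at the (rounded) channel means
image = np.tile(np.round(MEAN * 255.0), (32, 32, 1)).astype(np.uint8)
normalized = normalize_image(image)
print(normalized.shape, float(np.abs(normalized).max()))
```

An image sitting at the channel means maps to values very close to zero, which is the point of the operation: after normalization each channel is roughly zero-centered with unit scale.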
model = fe.build(model_fn=lambda: LeNet(input_shape=(32, 32, 3)), optimizer_fn="adam", model_name="adv_model")
Network definition¶
This is where the adversarial attack will be implemented. To perform an FGSM attack, we first need to monitor gradients with respect to the input image. This can be accomplished in FastEstimator using the Watch TensorOp. We then run the model forward once, compute the loss, and pass the loss value into the FGSM TensorOp in order to create an adversarial image. Finally, we run the adversarial image through the model, compute the loss again, and average the two losses together in order to update the model.
network = fe.Network(ops=[
    Watch(inputs="x"),
    ModelOp(model=model, inputs="x", outputs="y_pred"),
    CrossEntropy(inputs=("y_pred", "y"), outputs="base_ce"),
    FGSM(data="x", loss="base_ce", outputs="x_adverse", epsilon=epsilon),
    ModelOp(model=model, inputs="x_adverse", outputs="y_pred_adv"),
    CrossEntropy(inputs=("y_pred_adv", "y"), outputs="adv_ce"),
    Average(inputs=("base_ce", "adv_ce"), outputs="avg_ce"),
    UpdateOp(model=model, loss_name="avg_ce")
])
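The sequence of ops above can be illustrated outside FastEstimator with a tiny linear softmax classifier in NumPy, where the gradient of the cross-entropy with respect to the input is available in closed form. This is only a sketch under stated assumptions: the linear model, the random data, and the helper names (softmax, cross_entropy, fgsm) are hypothetical stand-ins and not the implementation of the FGSM TensorOp. It shows the core update x_adverse = x + epsilon * sign(dLoss/dx) and the averaged loss that the model update would minimize.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true classes
    return float(-np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean())


def fgsm(x, labels, weights, bias, epsilon):
    """Return x + epsilon * sign(dLoss/dx) for a linear softmax classifier."""
    probs = softmax(x @ weights + bias)
    one_hot = np.eye(weights.shape[1])[labels]
    # Per-sample gradient of the loss w.r.t. the input, up to a positive scale factor
    grad_x = (probs - one_hot) @ weights.T
    return x + epsilon * np.sign(grad_x)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 20))         # 8 flattened stand-in "images", 20 features each
labels = rng.integers(0, 3, size=8)  # 3 classes
weights = rng.normal(size=(20, 3))   # hypothetical model parameters
bias = np.zeros(3)
epsilon = 0.04

x_adverse = fgsm(x, labels, weights, bias, epsilon)
base_ce = cross_entropy(softmax(x @ weights + bias), labels)
adv_ce = cross_entropy(softmax(x_adverse @ weights + bias), labels)
avg_ce = (base_ce + adv_ce) / 2.0    # the loss an adversarial training step would minimize
print(base_ce, adv_ce, avg_ce)
```

Each input feature moves by exactly epsilon in the direction that increases the loss, so the perturbation is bounded in the max norm while the adversarial loss is at least as large as the clean loss for this convex toy model. For a deep network the same step is only a first-order approximation of the worst-case perturbation.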
Estimator definition and training¶
In this step, we define the Estimator to connect the Network with the Pipeline and set the traces which will compute accuracy (Accuracy) and save the best model (BestModelSaver) along the way. We will compute accuracy both with respect to the clean input images ('clean accuracy') and with respect to the adversarial input images ('adversarial accuracy'). At the end, we use Estimator.fit to trigger the training.
traces = [
    Accuracy(true_key="y", pred_key="y_pred", output_name="clean_accuracy"),
    Accuracy(true_key="y", pred_key="y_pred_adv", output_name="adversarial_accuracy"),
    BestModelSaver(model=model, save_dir=save_dir, metric="base_ce", save_best_mode="min"),
]
estimator = fe.Estimator(pipeline=pipeline,
                         network=network,
                         epochs=epochs,
                         traces=traces,
                         train_steps_per_epoch=train_steps_per_epoch,
                         eval_steps_per_epoch=eval_steps_per_epoch,
                         monitor_names=["base_ce", "adv_ce"],
                         log_steps=1000)
estimator.fit()
FastEstimator-Start: step: 1; logging_interval: 1000; num_device: 1;
FastEstimator-Train: step: 1; adv_ce: 2.5932403; avg_ce: 2.500473; base_ce: 2.4077055;
FastEstimator-Train: step: 1000; adv_ce: 1.8527539; avg_ce: 1.6701212; base_ce: 1.4874886; steps/sec: 100.77;
FastEstimator-Train: step: 1000; epoch: 1; epoch_time(sec): 10.91;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 133.06;
Eval Progress: 66/100; steps/sec: 117.13;
Eval Progress: 100/100; steps/sec: 74.83;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/adv_model_best_base_ce.h5
FastEstimator-Eval: step: 1000; epoch: 1; adv_ce: 1.6358021; adversarial_accuracy: 0.3874; avg_ce: 1.4667553; base_ce: 1.2977084; clean_accuracy: 0.543; min_base_ce: 1.2977084; since_best_base_ce: 0;
FastEstimator-Train: step: 2000; adv_ce: 1.4819262; avg_ce: 1.2961853; base_ce: 1.1104442; steps/sec: 92.93;
FastEstimator-Train: step: 2000; epoch: 2; epoch_time(sec): 10.78;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 128.45;
Eval Progress: 66/100; steps/sec: 130.29;
Eval Progress: 100/100; steps/sec: 115.47;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/adv_model_best_base_ce.h5
FastEstimator-Eval: step: 2000; epoch: 2; adv_ce: 1.5952863; adversarial_accuracy: 0.4086; avg_ce: 1.3943998; base_ce: 1.1935132; clean_accuracy: 0.5844; min_base_ce: 1.1935132; since_best_base_ce: 0;
FastEstimator-Train: step: 3000; adv_ce: 1.5492551; avg_ce: 1.3347089; base_ce: 1.1201626; steps/sec: 92.02;
FastEstimator-Train: step: 3000; epoch: 3; epoch_time(sec): 10.86;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 126.13;
Eval Progress: 66/100; steps/sec: 120.81;
Eval Progress: 100/100; steps/sec: 124.64;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/adv_model_best_base_ce.h5
FastEstimator-Eval: step: 3000; epoch: 3; adv_ce: 1.5346261; adversarial_accuracy: 0.4336; avg_ce: 1.3141936; base_ce: 1.093761; clean_accuracy: 0.6146; min_base_ce: 1.093761; since_best_base_ce: 0;
FastEstimator-Train: step: 4000; adv_ce: 1.3029621; avg_ce: 1.08198; base_ce: 0.86099803; steps/sec: 95.21;
FastEstimator-Train: step: 4000; epoch: 4; epoch_time(sec): 10.51;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 105.34;
Eval Progress: 66/100; steps/sec: 122.6;
Eval Progress: 100/100; steps/sec: 123.17;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/adv_model_best_base_ce.h5
FastEstimator-Eval: step: 4000; epoch: 4; adv_ce: 1.5180533; adversarial_accuracy: 0.4364; avg_ce: 1.2781447; base_ce: 1.038236; clean_accuracy: 0.6398; min_base_ce: 1.038236; since_best_base_ce: 0;
FastEstimator-Train: step: 5000; adv_ce: 1.3933566; avg_ce: 1.1443093; base_ce: 0.8952619; steps/sec: 93.05;
FastEstimator-Train: step: 5000; epoch: 5; epoch_time(sec): 10.74;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 123.29;
Eval Progress: 66/100; steps/sec: 132.07;
Eval Progress: 100/100; steps/sec: 116.19;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/adv_model_best_base_ce.h5
FastEstimator-Eval: step: 5000; epoch: 5; adv_ce: 1.4809489; adversarial_accuracy: 0.4582; avg_ce: 1.2281222; base_ce: 0.97529566; clean_accuracy: 0.6596; min_base_ce: 0.97529566; since_best_base_ce: 0;
FastEstimator-Train: step: 6000; adv_ce: 1.3028104; avg_ce: 1.0527444; base_ce: 0.8026783; steps/sec: 94.44;
FastEstimator-Train: step: 6000; epoch: 6; epoch_time(sec): 10.59;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 110.63;
Eval Progress: 66/100; steps/sec: 108.65;
Eval Progress: 100/100; steps/sec: 106.96;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/adv_model_best_base_ce.h5
FastEstimator-Eval: step: 6000; epoch: 6; adv_ce: 1.4869825; adversarial_accuracy: 0.4452; avg_ce: 1.226828; base_ce: 0.9666735; clean_accuracy: 0.664; min_base_ce: 0.9666735; since_best_base_ce: 0;
FastEstimator-Train: step: 7000; adv_ce: 1.3652477; avg_ce: 1.1188885; base_ce: 0.8725292; steps/sec: 93.72;
FastEstimator-Train: step: 7000; epoch: 7; epoch_time(sec): 10.68;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 116.1;
Eval Progress: 66/100; steps/sec: 133.15;
Eval Progress: 100/100; steps/sec: 128.46;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/adv_model_best_base_ce.h5
FastEstimator-Eval: step: 7000; epoch: 7; adv_ce: 1.4863496; adversarial_accuracy: 0.4602; avg_ce: 1.2245204; base_ce: 0.96269095; clean_accuracy: 0.6634; min_base_ce: 0.96269095; since_best_base_ce: 0;
FastEstimator-Train: step: 8000; adv_ce: 1.32106; avg_ce: 1.0630496; base_ce: 0.80503905; steps/sec: 93.0;
FastEstimator-Train: step: 8000; epoch: 8; epoch_time(sec): 10.75;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 114.21;
Eval Progress: 66/100; steps/sec: 121.23;
Eval Progress: 100/100; steps/sec: 126.59;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/adv_model_best_base_ce.h5
FastEstimator-Eval: step: 8000; epoch: 8; adv_ce: 1.4645156; adversarial_accuracy: 0.4636; avg_ce: 1.1880443; base_ce: 0.91157293; clean_accuracy: 0.6918; min_base_ce: 0.91157293; since_best_base_ce: 0;
FastEstimator-Train: step: 9000; adv_ce: 1.5193198; avg_ce: 1.2024301; base_ce: 0.8855404; steps/sec: 88.37;
FastEstimator-Train: step: 9000; epoch: 9; epoch_time(sec): 11.32;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 110.76;
Eval Progress: 66/100; steps/sec: 130.51;
Eval Progress: 100/100; steps/sec: 132.61;
FastEstimator-Eval: step: 9000; epoch: 9; adv_ce: 1.4988724; adversarial_accuracy: 0.4574; avg_ce: 1.2090085; base_ce: 0.9191448; clean_accuracy: 0.6834; min_base_ce: 0.91157293; since_best_base_ce: 1;
FastEstimator-Train: step: 10000; adv_ce: 1.451181; avg_ce: 1.1657407; base_ce: 0.8803004; steps/sec: 92.99;
FastEstimator-Train: step: 10000; epoch: 10; epoch_time(sec): 10.75;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 108.18;
Eval Progress: 66/100; steps/sec: 120.07;
Eval Progress: 100/100; steps/sec: 121.8;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/adv_model_best_base_ce.h5
FastEstimator-Eval: step: 10000; epoch: 10; adv_ce: 1.4784583; adversarial_accuracy: 0.4698; avg_ce: 1.1881733; base_ce: 0.89788854; clean_accuracy: 0.696; min_base_ce: 0.89788854; since_best_base_ce: 0;
FastEstimator-Finish: step: 10000; adv_model_lr: 0.001; total_time(sec): 121.55;
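The min_base_ce and since_best_base_ce fields in the training log follow from simple bookkeeping on the evaluation loss. The sketch below replays the ten evaluation base_ce values logged above through that bookkeeping. The function track_best is a hypothetical illustration of a "min" mode saver, not the BestModelSaver source.

```python
def track_best(values):
    """Yield (epoch, best_so_far, epochs_since_best, saved) for a metric to minimize."""
    best = float("inf")
    since_best = 0
    for epoch, value in enumerate(values, start=1):
        if value < best:
            # A new minimum: this is when a checkpoint would be written
            best, since_best, saved = value, 0, True
        else:
            since_best += 1
            saved = False
        yield epoch, best, since_best, saved


# Evaluation base_ce per epoch, copied from the training log above
eval_base_ce = [1.2977084, 1.1935132, 1.093761, 1.038236, 0.97529566,
                0.9666735, 0.96269095, 0.91157293, 0.9191448, 0.89788854]

history = list(track_best(eval_base_ce))
for epoch, best, since_best, saved in history:
    print(epoch, best, since_best, "saved" if saved else "skipped")
```

Epoch 9 is the only epoch in which the loss fails to improve, which matches the log: there is no "Saved model" line for that epoch and since_best_base_ce is reported as 1.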
Model Testing¶
Let's start by re-loading the weights from the best model, since the model may have overfit during training:
model.load_weights(os.path.join(save_dir, "adv_model_best_base_ce.h5"))
estimator.test()
FastEstimator-Test: step: 10000; epoch: 10; adv_ce: 1.4894997; adversarial_accuracy: 0.4588; avg_ce: 1.201717; base_ce: 0.91393447; clean_accuracy: 0.6842;
In spite of our training the network using adversarially crafted images, the adversarial attack is still effective at reducing the accuracy of the network. This does not, however, mean that the efforts were wasted.
Comparison vs Network without Adversarial Training¶
To see whether the adversarial hardening was actually useful, we will compare the hardened network to one which is trained without considering any adversarial images. The setup will be similar to before, but we will only use the adversarial images for evaluation purposes, so the second CrossEntropy Op as well as the Average Op can be omitted.
clean_model = fe.build(model_fn=lambda: LeNet(input_shape=(32, 32, 3)), optimizer_fn="adam", model_name="clean_model")
clean_network = fe.Network(ops=[
    Watch(inputs="x"),
    ModelOp(model=clean_model, inputs="x", outputs="y_pred"),
    CrossEntropy(inputs=("y_pred", "y"), outputs="base_ce"),
    FGSM(data="x", loss="base_ce", outputs="x_adverse", epsilon=epsilon, mode="!train"),
    ModelOp(model=clean_model, inputs="x_adverse", outputs="y_pred_adv", mode="!train"),
    UpdateOp(model=clean_model, loss_name="base_ce")
])
clean_traces = [
    Accuracy(true_key="y", pred_key="y_pred", output_name="clean_accuracy"),
    Accuracy(true_key="y", pred_key="y_pred_adv", output_name="adversarial_accuracy"),
    BestModelSaver(model=clean_model, save_dir=save_dir, metric="base_ce", save_best_mode="min"),
]
clean_estimator = fe.Estimator(pipeline=pipeline,
                               network=clean_network,
                               epochs=epochs,
                               traces=clean_traces,
                               train_steps_per_epoch=train_steps_per_epoch,
                               eval_steps_per_epoch=eval_steps_per_epoch,
                               log_steps=1000)
clean_estimator.fit()
FastEstimator-Start: step: 1; logging_interval: 1000; num_device: 1;
FastEstimator-Train: step: 1; base_ce: 2.3047411;
FastEstimator-Train: step: 1000; base_ce: 1.1804659; steps/sec: 150.06;
FastEstimator-Train: step: 1000; epoch: 1; epoch_time(sec): 7.17;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 104.44;
Eval Progress: 66/100; steps/sec: 115.12;
Eval Progress: 100/100; steps/sec: 110.92;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/clean_model_best_base_ce.h5
FastEstimator-Eval: step: 1000; epoch: 1; adversarial_accuracy: 0.3022; base_ce: 1.1240247; clean_accuracy: 0.6064; min_base_ce: 1.1240247; since_best_base_ce: 0;
FastEstimator-Train: step: 2000; base_ce: 0.8876241; steps/sec: 137.26;
FastEstimator-Train: step: 2000; epoch: 2; epoch_time(sec): 7.3;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 106.95;
Eval Progress: 66/100; steps/sec: 113.4;
Eval Progress: 100/100; steps/sec: 127.38;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/clean_model_best_base_ce.h5
FastEstimator-Eval: step: 2000; epoch: 2; adversarial_accuracy: 0.3038; base_ce: 0.98229593; clean_accuracy: 0.6534; min_base_ce: 0.98229593; since_best_base_ce: 0;
FastEstimator-Train: step: 3000; base_ce: 0.8849621; steps/sec: 132.52;
FastEstimator-Train: step: 3000; epoch: 3; epoch_time(sec): 7.55;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 103.08;
Eval Progress: 66/100; steps/sec: 130.25;
Eval Progress: 100/100; steps/sec: 125.19;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/clean_model_best_base_ce.h5
FastEstimator-Eval: step: 3000; epoch: 3; adversarial_accuracy: 0.2696; base_ce: 0.9033165; clean_accuracy: 0.6916; min_base_ce: 0.9033165; since_best_base_ce: 0;
FastEstimator-Train: step: 4000; base_ce: 0.95111996; steps/sec: 140.33;
FastEstimator-Train: step: 4000; epoch: 4; epoch_time(sec): 7.12;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 109.92;
Eval Progress: 66/100; steps/sec: 122.95;
Eval Progress: 100/100; steps/sec: 128.74;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/clean_model_best_base_ce.h5
FastEstimator-Eval: step: 4000; epoch: 4; adversarial_accuracy: 0.2716; base_ce: 0.8884863; clean_accuracy: 0.691; min_base_ce: 0.8884863; since_best_base_ce: 0;
FastEstimator-Train: step: 5000; base_ce: 0.8336683; steps/sec: 140.4;
FastEstimator-Train: step: 5000; epoch: 5; epoch_time(sec): 7.12;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 101.26;
Eval Progress: 66/100; steps/sec: 129.9;
Eval Progress: 100/100; steps/sec: 118.06;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/clean_model_best_base_ce.h5
FastEstimator-Eval: step: 5000; epoch: 5; adversarial_accuracy: 0.2754; base_ce: 0.82907814; clean_accuracy: 0.7166; min_base_ce: 0.82907814; since_best_base_ce: 0;
FastEstimator-Train: step: 6000; base_ce: 0.5674667; steps/sec: 139.48;
FastEstimator-Train: step: 6000; epoch: 6; epoch_time(sec): 7.17;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 106.19;
Eval Progress: 66/100; steps/sec: 132.69;
Eval Progress: 100/100; steps/sec: 127.4;
FastEstimator-BestModelSaver: Saved model to /var/folders/lx/drkxftt117gblvgsp1p39rlc0000gn/T/tmpv5xrd_mi/clean_model_best_base_ce.h5
FastEstimator-Eval: step: 6000; epoch: 6; adversarial_accuracy: 0.2522; base_ce: 0.81661767; clean_accuracy: 0.72; min_base_ce: 0.81661767; since_best_base_ce: 0;
FastEstimator-Train: step: 7000; base_ce: 0.78401643; steps/sec: 141.15;
FastEstimator-Train: step: 7000; epoch: 7; epoch_time(sec): 7.09;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 110.63;
Eval Progress: 66/100; steps/sec: 134.3;
Eval Progress: 100/100; steps/sec: 132.15;
FastEstimator-Eval: step: 7000; epoch: 7; adversarial_accuracy: 0.2384; base_ce: 0.8509378; clean_accuracy: 0.7172; min_base_ce: 0.81661767; since_best_base_ce: 1;
FastEstimator-Train: step: 8000; base_ce: 0.664769; steps/sec: 142.86;
FastEstimator-Train: step: 8000; epoch: 8; epoch_time(sec): 7.0;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 105.74;
Eval Progress: 66/100; steps/sec: 112.19;
Eval Progress: 100/100; steps/sec: 116.31;
FastEstimator-Eval: step: 8000; epoch: 8; adversarial_accuracy: 0.2188; base_ce: 0.96747774; clean_accuracy: 0.6962; min_base_ce: 0.81661767; since_best_base_ce: 2;
FastEstimator-Train: step: 9000; base_ce: 0.40541548; steps/sec: 134.91;
FastEstimator-Train: step: 9000; epoch: 9; epoch_time(sec): 7.41;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 110.06;
Eval Progress: 66/100; steps/sec: 120.28;
Eval Progress: 100/100; steps/sec: 131.44;
FastEstimator-Eval: step: 9000; epoch: 9; adversarial_accuracy: 0.2194; base_ce: 0.87976956; clean_accuracy: 0.7232; min_base_ce: 0.81661767; since_best_base_ce: 3;
FastEstimator-Train: step: 10000; base_ce: 0.74041855; steps/sec: 135.69;
FastEstimator-Train: step: 10000; epoch: 10; epoch_time(sec): 7.38;
Eval Progress: 1/100;
Eval Progress: 33/100; steps/sec: 96.99;
Eval Progress: 66/100; steps/sec: 124.55;
Eval Progress: 100/100; steps/sec: 118.18;
FastEstimator-Eval: step: 10000; epoch: 10; adversarial_accuracy: 0.2048; base_ce: 0.91465575; clean_accuracy: 0.7202; min_base_ce: 0.81661767; since_best_base_ce: 4;
FastEstimator-Finish: step: 10000; clean_model_lr: 0.001; total_time(sec): 86.08;
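In the clean network above, the FGSM op and the second ModelOp carry mode="!train", which is why the training log contains no adversarial metrics while the evaluation lines do. A minimal sketch of how such a mode specification could be expanded is shown below. It assumes the four execution modes are "train", "eval", "test", and "infer", that None means "all modes", and that a leading "!" excludes a single mode. The helper resolve_modes is hypothetical and is not FastEstimator's own parser.

```python
ALL_MODES = {"train", "eval", "test", "infer"}  # assumed set of execution modes


def resolve_modes(mode):
    """Expand a mode specification into the set of modes in which an op runs."""
    if mode is None:
        return set(ALL_MODES)          # no restriction: run everywhere
    if mode.startswith("!"):
        return ALL_MODES - {mode[1:]}  # run everywhere except the named mode
    return {mode}                      # run only in the named mode


attack_modes = resolve_modes("!train")
print(sorted(attack_modes))
```

Under these assumptions the attack ops are skipped during training, so the clean model never sees an adversarial image while its weights are being updated.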
As before, we will reload the best weights and then test the model:
clean_model.load_weights(os.path.join(save_dir, "clean_model_best_base_ce.h5"))
print("Normal Network:")
normal_results = clean_estimator.test("normal")
print("The whitebox FGSM attack reduced accuracy by {:.2f}".format(list(normal_results.history['test']['clean_accuracy'].values())[0] - list(normal_results.history['test']['adversarial_accuracy'].values())[0]))
print("-----------")
print("Adversarially Trained Network:")
adversarial_results = estimator.test("adversarial")
print("The whitebox FGSM attack reduced accuracy by {:.2f}".format(list(adversarial_results.history['test']['clean_accuracy'].values())[0] - list(adversarial_results.history['test']['adversarial_accuracy'].values())[0]))
print("-----------")
Normal Network:
FastEstimator-Test: step: 10000; epoch: 10; adversarial_accuracy: 0.2556; base_ce: 0.8501647; clean_accuracy: 0.7136;
The whitebox FGSM attack reduced accuracy by 0.46
-----------
Adversarially Trained Network:
FastEstimator-Test: step: 10000; epoch: 10; adv_ce: 1.4894997; adversarial_accuracy: 0.4588; avg_ce: 1.201717; base_ce: 0.91393447; clean_accuracy: 0.6842;
The whitebox FGSM attack reduced accuracy by 0.23
-----------
As we can see, the normal network is significantly less robust against adversarial attacks than the one which was trained to resist them. The downside is that the adversarial network requires more epochs of training to converge, and its training steps are slower (about 1.5 times as long in the logs above, roughly 10.8 s versus 7.2 s per epoch), since each step requires two forward passes plus an extra gradient computation with respect to the input. It is also interesting to note that as the regular model was training, it actually saw progressively worse adversarial accuracy (from about 0.30 after the first epoch to about 0.20 after the tenth). This may be an indication that the network is developing very brittle decision boundaries.
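The accuracy drops printed above can be reproduced directly from the reported test metrics. The short sketch below redoes that arithmetic and also reports the drop relative to each model's clean accuracy. The dictionary layout is purely illustrative; only the four accuracy values come from the test output above.

```python
# Test accuracies copied from the output above
results = {
    "normal": {"clean_accuracy": 0.7136, "adversarial_accuracy": 0.2556},
    "adversarial": {"clean_accuracy": 0.6842, "adversarial_accuracy": 0.4588},
}

drops = {}
for name, metrics in results.items():
    drop = metrics["clean_accuracy"] - metrics["adversarial_accuracy"]
    drops[name] = drop
    relative = drop / metrics["clean_accuracy"]  # fraction of clean accuracy lost
    print("{}: absolute drop {:.2f}, relative drop {:.0%}".format(name, drop, relative))
```

Adversarial training costs about three points of clean accuracy here (0.7136 versus 0.6842) but roughly halves the absolute drop caused by the attack.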
Visualizing Adversarial Samples¶
Let's visualize some images generated by these adversarial attacks to make sure that everything is working as we would expect. The first step is to get some sample data from the pipeline:
class_dictionary = {
0: "airplane", 1: "car", 2: "bird", 3: "cat", 4: "deer", 5: "dog", 6: "frog", 7: "horse", 8: "ship", 9: "truck"
}
batch = pipeline.get_results(mode="test")
Now let's run our sample data through the network and then visualize the results:
batch = clean_network.transform(batch, mode="test")
n_samples = 10
y = np.array([class_dictionary[clazz.item()] for clazz in to_number(batch["y"][0:n_samples])])
y_pred = np.array([class_dictionary[clazz.item()] for clazz in to_number(argmax(batch["y_pred"], axis=1)[0:n_samples])])
y_adv = np.array([class_dictionary[clazz.item()] for clazz in to_number(argmax(batch["y_pred_adv"], axis=1)[0:n_samples])])
GridDisplay([BatchDisplay(image=batch["x"][0:n_samples], title="x"),
             BatchDisplay(image=batch["x_adverse"][0:n_samples], title="x_adv"),
             BatchDisplay(text=y, title="y"),
             BatchDisplay(text=y_pred, title="y_pred"),
             BatchDisplay(text=y_adv, title="y_adv")
             ]).show()
As you can see, the adversarial images appear very similar to the unmodified images, and yet they are often able to change the class predictions of the network. Note that if a network's prediction is already wrong, the attack is unlikely to change the incorrect prediction; instead, it tends to increase the model's confidence in that incorrect prediction.
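To quantify this rather than judge it by eye, one can count how many initially correct predictions the attack flips and how many existing errors it leaves in place. The sketch below does this with NumPy on made-up class indices. In the notebook the three arrays would instead come from batch["y"], the argmax of batch["y_pred"], and the argmax of batch["y_pred_adv"]; the values used here are hypothetical and are not taken from the run above.

```python
import numpy as np

# Hypothetical class indices for ten samples (illustration only)
y_true = np.array([3, 8, 8, 0, 6, 6, 1, 6, 3, 1])
y_clean = np.array([3, 8, 0, 0, 6, 6, 1, 4, 3, 1])  # predictions on clean images
y_adv = np.array([5, 8, 0, 2, 4, 6, 9, 4, 5, 1])    # predictions on adversarial images

correct_before = y_clean == y_true
# Samples the model classified correctly until the attack was applied
flipped = correct_before & (y_adv != y_true)
# Samples that were already wrong and keep the same wrong class after the attack
unchanged_errors = ~correct_before & (y_adv == y_clean)

print("correct before attack:", int(correct_before.sum()))
print("flipped by the attack:", int(flipped.sum()))
print("errors left unchanged:", int(unchanged_errors.sum()))
```

A high ratio of flipped to initially correct samples indicates an effective attack, while the unchanged errors correspond to the case described above in which the attack reinforces an existing mistake.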