Multi-Task Learning using Uncertainty Weighted Loss
[Paper] [Notebook] [TF Implementation] [Torch Implementation]
Multi-task learning is popular in many deep learning applications. For example, in object detection the network performs both classification and localization for each object. As a result, the final loss is a combination of a classification loss and a regression loss. The most common way to combine two losses is simply to add them together:
$loss_{total} = loss_1 + loss_2$
However, a problem emerges when the two losses are on different numerical scales: the larger loss dominates the gradient. To resolve this, practitioners usually hand-tune the weights or determine them experimentally, which is time-consuming and computationally expensive:
$loss_{total} = w_1loss_1 + w_2loss_2$
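This paper presents an interesting idea: make the weights $w_1$ and $w_2$ trainable parameters tied to the uncertainty of each task, so that the network learns how to balance the tasks during training instead of relying on hand-tuned weights. In the usual formulation of this idea, each loss is scaled inversely to its learned uncertainty, so a more uncertain task receives a smaller weight.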
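To make the idea of trainable loss weights concrete, here is a minimal sketch in plain Python. It assumes a commonly used log-variance parameterization, $loss_{total} = \sum_i (e^{-s_i} loss_i + s_i)$, where each $s_i$ is a learnable scalar; this form, the fixed loss values, and the function names are assumptions for illustration and are not necessarily what the implementation later in this tutorial uses.

```python
import math


def uncertainty_weighted_loss(losses, log_vars):
    """Combine per-task losses as sum(exp(-s_i) * L_i + s_i)."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


def grad_log_vars(losses, log_vars):
    """Analytic gradient of the combined loss with respect to each s_i."""
    return [1.0 - math.exp(-s) * loss for loss, s in zip(losses, log_vars)]


# Two hypothetical task losses on very different scales, held fixed here
# so that only the weighting parameters are being optimized.
losses = [0.5, 40.0]
log_vars = [0.0, 0.0]  # s_i = 0 means both weights start at exp(0) = 1

# Plain gradient descent on the s_i values only.
for _ in range(2000):
    grads = grad_log_vars(losses, log_vars)
    log_vars = [s - 0.05 * g for s, g in zip(log_vars, grads)]

# Each weight exp(-s_i) converges toward 1 / L_i.
weights = [math.exp(-s) for s in log_vars]
print(weights)
```

With the losses held fixed, each weight settles near the reciprocal of its loss, so the larger loss ends up with the smaller weight. In real training the losses change as the network learns, and the $s_i$ terms are updated jointly with the network weights by the optimizer.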
import os
import tempfile
import cv2
import torch
import torch.nn as nn
import torch.nn.functional as fn
import numpy as np
from torch.nn.init import kaiming_normal_ as he_normal
from torchvision import models
import fastestimator as fe
from fastestimator.backend import reduce_mean
from fastestimator.op.numpyop import Delete
from fastestimator.op.numpyop.meta import Sometimes
from fastestimator.op.numpyop.multivariate import HorizontalFlip, LongestMaxSize, PadIfNeeded, ReadMat, ShiftScaleRotate
from fastestimator.op.numpyop.univariate import ChannelTranspose, Normalize, ReadImage, Reshape
from fastestimator.op.tensorop import TensorOp
from fastestimator.op.tensorop.loss import CrossEntropy
from fastestimator.op.tensorop.model import ModelOp, UpdateOp
from fastestimator.schedule import cosine_decay
from fastestimator.trace.adapt import LRScheduler
from fastestimator.trace.io import BestModelSaver
from fastestimator.trace.metric import Accuracy, Dice
# parameters
epochs = 25
batch_size = 8
train_steps_per_epoch = None
eval_steps_per_epoch = None
save_dir = tempfile.mkdtemp()
data_dir = None
Building Components
Dataset
We will use the CUB200 2010 dataset by Caltech. It contains 6033 bird images from 200 categories, where each image also has a corresponding mask. Therefore, our task is to classify and segment the bird given the image.
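We use a FastEstimator API to load the CUB200 dataset, then split it into train, evaluation, and test sets: split(0.3) moves 30% of the training data into the evaluation set, and split(0.5) then moves half of the evaluation set into the test set.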
from fastestimator.dataset.data import cub200
train_data = cub200.load_data(root_dir=data_dir)
eval_data = train_data.split(0.3)
test_data = eval_data.split(0.5)
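The two chained splits above leave roughly 70% of the images for training and 15% each for evaluation and testing. The sketch below reproduces that arithmetic on a plain list. The helper split_off is hypothetical and is not part of FastEstimator; how the library rounds and which samples it selects are not shown here, so truncating with int() and taking samples from the end are assumptions.

```python
def split_off(items, fraction):
    """Remove `fraction` of `items` in place and return the removed part.

    Hypothetical stand-in for a dataset split: truncating with int() and
    taking samples from the end are assumptions made for illustration.
    """
    n_moved = int(len(items) * fraction)
    cut = len(items) - n_moved
    moved = items[cut:]
    del items[cut:]
    return moved


train = list(range(6033))      # one placeholder entry per image
eval_ = split_off(train, 0.3)  # 30% of the data leaves the training set
test = split_off(eval_, 0.5)   # half of the evaluation set becomes the test set
print(len(train), len(eval_), len(test))
```

Under these assumptions the sketch yields 4224 training, 905 evaluation, and 904 test samples.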
Step 1: Create Pipeline
We read the images with ReadImage, and the masks stored in a MAT file with ReadMat. There is other information stored in the MAT file, so we specify the key seg to retrieve the mask only.
Here the main task is to resize the images and masks to 512 by 512 pixels. We use LongestMaxSize to scale the longest side to 512 while preserving the aspect ratio, and PadIfNeeded to pad the shorter side up to 512. We augment the image and the mask in the same way, and rescale the image pixel values to the range [-1, 1] since we are using pre-trained ImageNet weights.
pipeline = fe.Pipeline(
    batch_size=batch_size,
    train_data=train_data,
    eval_data=eval_data,
    test_data=test_data,
    ops=[
        ReadImage(inputs="image", outputs="image", parent_path=train_data.parent_path),
        Normalize(inputs="image", outputs="image", mean=1.0, std=1.0, max_pixel_value=127.5),
        ReadMat(inputs='annotation', outputs="seg", parent_path=train_data.parent_path),
        Delete(keys="annotation"),
        LongestMaxSize(max_size=512, image_in="image", image_out="image", mask_in="seg", mask_out="seg"),
        PadIfNeeded(min_height=512,
                    min_width=512,
                    image_in="image",
                    image_out="image",
                    mask_in="seg",
                    mask_out="seg",
                    border_mode=cv2.BORDER_CONSTANT,
                    value=0,
                    mask_value=0),
        ShiftScaleRotate(image_in="image",
                         mask_in="seg",
                         image_out="image",
                         mask_out="seg",
                         mode="train",
                         shift_limit=0.2,
                         rotate_limit=15.0,
                         scale_limit=0.2,
                         border_mode=cv2.BORDER_CONSTANT,
                         value=0,
                         mask_value=0),
        Sometimes(HorizontalFlip(image_in="image", mask_in="seg", image_out="image", mask_out="seg", mode="train")),
        ChannelTranspose(inputs="image", outputs="image"),
        Reshape(shape=(1, 512, 512), inputs="seg", outputs="seg")
    ])
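The normalization and resizing arithmetic in the pipeline above can be checked by hand. The sketch below assumes that Normalize computes (x / max_pixel_value - mean) / std and that LongestMaxSize rounds the scaled dimensions to the nearest integer. Both helper functions are hypothetical illustrations, not FastEstimator APIs, and the 375 by 500 input size is an invented example.

```python
def normalize_pixel(x, mean=1.0, std=1.0, max_pixel_value=127.5):
    """Map a raw pixel value with (x / max_pixel_value - mean) / std."""
    return (x / max_pixel_value - mean) / std


def resize_then_pad(height, width, target=512):
    """Return the resized (h, w) and the total padding needed on each axis.

    The longest side is scaled to `target`, keeping the aspect ratio, and
    the shorter side is then padded up to `target`. Rounding to the nearest
    integer is an assumption about how the resize behaves.
    """
    scale = target / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    return (new_h, new_w), (target - new_h, target - new_w)


# Raw pixel values 0, 127.5 and 255 map to -1.0, 0.0 and 1.0.
print(normalize_pixel(0), normalize_pixel(127.5), normalize_pixel(255))

# A hypothetical 375 x 500 image becomes 384 x 512, then needs 128 rows of padding.
print(resize_then_pad(375, 500))
```

Because normalization runs before padding in this pipeline, the constant padding value 0 corresponds to mid-gray in the original pixel range rather than black.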
Let's visualize our Pipeline results
from fastestimator.util import ImageDisplay, GridDisplay
result = pipeline.get_results()
GridDisplay([ImageDisplay(image=result["image"][1],
                          title="Original Image"),
             ImageDisplay(image=result["image"][1],
                          masks=np.squeeze(result["seg"][1].numpy()),
                          title="Mask Overlay"),
             ]).show()