Data Augmentation in PyTorch

I am a little bit confused about the data augmentation performed in PyTorch. As far as I know, when we perform data augmentation, we KEEP our original dataset and then add transformed versions of it (flipping, cropping, etc.). But that doesn't seem to be what happens in PyTorch. As far as I understood from the references, when we use data.transforms in PyTorch, it applies them one by one. So for example:

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

Here, for training, we first randomly crop the image and resize it to shape (224,224). Then we take these (224,224) images and horizontally flip them. Therefore, our dataset now contains ONLY the horizontally flipped images, so our original images are lost in this case.

Am I right? Is this understanding correct? If not, then where in the code above (taken from the official documentation) do we tell PyTorch to keep the original images and resize them to the expected shape (224,224)?

Thanks

Carlo answered 3/8, 2018 at 17:51 Comment(0)

The transform operations are applied to your original images at every batch generation. So your dataset is left unchanged; only the batch images are copied and transformed at every iteration.

The confusion may come from the fact that often, like in your example, transforms are used both for data preparation (resizing/cropping to expected dimensions, normalizing values, etc.) and for data augmentation (randomizing the resizing/cropping, randomly flipping the images, etc.).


What your data_transforms['train'] does is:

  • Randomly resize the provided image and randomly crop it to obtain a (224, 224) patch
  • Apply a random horizontal flip to this patch, or not, with a 50/50 chance
  • Convert it to a Tensor
  • Normalize the resulting Tensor, given the mean and deviation values you provided

What your data_transforms['val'] does is:

  • Resize your image to (256, 256)
  • Center crop the resized image to obtain a (224, 224) patch
  • Convert it to a Tensor
  • Normalize the resulting Tensor, given the mean and deviation values you provided

(i.e. the random resizing/cropping for the training data is replaced by a fixed operation for the validation one, to have reliable validation results)
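
A minimal sketch to check this behaviour (the filename dog.jpg is just a placeholder; any local RGB image works): applying the train pipeline twice to the same image usually yields two different tensors, while the val pipeline always yields the same one.

import torch
from PIL import Image
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("dog.jpg").convert("RGB")  # placeholder path, any RGB image works

# Two passes through the random train pipeline give different tensors...
print(torch.equal(train_tf(img), train_tf(img)))  # almost always False
# ...while the deterministic val pipeline is reproducible.
print(torch.equal(val_tf(img), val_tf(img)))      # True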


If you don't want your training images to be horizontally flipped with a 50/50 chance, just remove the transforms.RandomHorizontalFlip() line.

Similarly, if you want your images to always be center-cropped, replace transforms.RandomResizedCrop by transforms.Resize and transforms.CenterCrop, as done for data_transforms['val'].
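
For instance, here is a sketch of what such a fully deterministic training pipeline could look like (deterministic_train is just an illustrative name):

from torchvision import transforms

# No RandomHorizontalFlip, and RandomResizedCrop replaced by a fixed
# Resize + CenterCrop, exactly as in data_transforms['val'].
deterministic_train = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])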

Sealey answered 3/8, 2018 at 18:19 Comment(4)
Thanks for your answer. So that means the CNN will not be trained on the original images I have, only on the horizontally flipped images. Right?Carlo
Not exactly right. Your network will be trained on patches of images which are randomly resized and cropped from the original dataset, and which are sometimes horizontally flipped (probability = 0.5).Sealey
It's still unclear to me which transformations increase the size of the dataset and which transformations will change the original image?Chagres
@insanely_sin: All transformations somehow change the image (they leave the original untouched, just returning a changed copy). Given the same input image, some methods will always apply the same changes (e.g., converting it to Tensor, resizing to a fixed shape, etc.). Other methods will apply transformations with random parameters, returning different results each time (e.g., randomly cropping the images, randomly changing their brightness or saturation, etc.). Because the latter transformations return different images each time (from the same original samples), they augment the dataset.Sealey

I assume you are asking whether these data augmentation transforms (e.g. RandomHorizontalFlip) actually increase the size of the dataset as well, or whether they are applied to each item in the dataset one by one, without adding to the size of the dataset.

Running the following simple code snippet, we can observe that the latter is true: if you have a dataset of 8 images and create a PyTorch dataset object for it, then when you iterate through the dataset, the transformations are called on each data point and the transformed data point is returned. So, for example, if you have random flipping, some of the data points are returned as the original and some are returned flipped (e.g. 4 flipped and 4 original). In other words, one iteration through the dataset items yields 8 data points (some flipped and some not), which is at odds with the conventional understanding of augmenting the dataset (e.g. in this case having 16 data points in the augmented dataset).

import torch
from torch.utils.data import Dataset
from torchvision import transforms

class experimental_dataset(Dataset):

    def __init__(self, data, transform):
        self.data = data
        self.transform = transform

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        item = self.data[idx]
        item = self.transform(item)
        return item

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()
])

x = torch.rand(8, 1, 2, 2)
print(x)

dataset = experimental_dataset(x, transform)

for item in dataset:
    print(item)

Results (the small differences in the floating-point values are caused by transforming to a PIL image and back):

Original dummy dataset:

tensor([[[[0.1872, 0.5518],
          [0.5733, 0.6593]]],


        [[[0.6570, 0.6487],
          [0.4415, 0.5883]]],


        [[[0.5682, 0.3294],
          [0.9346, 0.1243]]],


        [[[0.1829, 0.5607],
          [0.3661, 0.6277]]],


        [[[0.1201, 0.1574],
          [0.4224, 0.6146]]],


        [[[0.9301, 0.3369],
          [0.9210, 0.9616]]],


        [[[0.8567, 0.2297],
          [0.1789, 0.8954]]],


        [[[0.0068, 0.8932],
          [0.9971, 0.3548]]]])

transformed dataset:

tensor([[[0.1843, 0.5490],
         [0.5725, 0.6588]]])
tensor([[[0.6549, 0.6471],
         [0.4392, 0.5882]]])
tensor([[[0.5647, 0.3255],
         [0.9333, 0.1216]]])
tensor([[[0.5569, 0.1804],
         [0.6275, 0.3647]]])
tensor([[[0.1569, 0.1176],
         [0.6118, 0.4196]]])
tensor([[[0.9294, 0.3333],
         [0.9176, 0.9608]]])
tensor([[[0.8549, 0.2275],
         [0.1765, 0.8941]]])
tensor([[[0.8902, 0.0039],
         [0.3529, 0.9961]]])
Bedim answered 31/1, 2019 at 12:7 Comment(10)
I think this is the answer to the question the OP really asked.Lorrettalorri
So that means that upon every epoch you get a different version of the dataset, right?Charlinecharlock
@Charlinecharlock YesBedim
@NicoleFinnie But how can I use both of them, the original and the transformed dataset? Because one of the purposes of augmentation is to increase the dataset size, right?Marquittamarr
@Marquittamarr Not necessarily. The purpose of data augmentation is to try to get an upper bound on the data distribution of unseen (test) data, in the hope that the neural net will be approximated to that data distribution, with the trade-off that it approximates the original distribution of the train data (the test data is unlikely to be similar in reality). There's no one-size-fits-all data augmentation approach or definition.Lorrettalorri
@pooria, you don't need to do that (as explained by @NicoleFinnie)... however, if you have to do it like that for some reason, you can generate a new dataset using the transformations available in pytorch, save it, and train on the new one (though I would not recommend it; do it only if you have a specific reason for it)Bedim
@Bedim Yeah, thanks, I figured! I decided not to do that since it doesn't make any sense :DMarquittamarr
@Nicole Finnie So if we apply multiple random transformations like random zoom, random rotation, and random contrast, should we run the neural network for more epochs than without data augmentation (say 50 instead of 40) so that the network can see all transformations? Or is that not necessary?Kutchins
@YacineRouizi In principle, use other criteria such as validation loss for stopping the training... and yes, if your goal is to go over all possible data transforms, your training takes longer. However, the objective is your validation loss; we use augmentation to improve generalization, and whether we should visit all possible transformations or only a few of them is not the question...Bedim
@Bedim, so when we traverse the dataloader like next(iter(dataloader)), we will be able to see (though not necessarily) the data transforms we applied to the dataset, right?Koblenz

Yes, the dataset size does not change after the transformations. Every image is passed to the transformation and returned; thus the size remains the same.

If you wish to use the original dataset together with the transformed one, concatenate them, e.g.:

increased_dataset = torch.utils.data.ConcatDataset([transformed_dataset, original])
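
A more complete sketch, reusing the experimental_dataset class and the tensor x from the answer above (both come from that answer, not from a library API): wrap the same data with two different transforms and concatenate, so both versions become distinct items of a single dataset.

import torch
from torch.utils.data import ConcatDataset
from torchvision import transforms

identity_tf = transforms.Compose([transforms.ToPILImage(), transforms.ToTensor()])
flip_tf = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(p=1.0),  # p=1.0: always flip, so the copies differ
    transforms.ToTensor(),
])

original = experimental_dataset(x, identity_tf)
flipped = experimental_dataset(x, flip_tf)
increased_dataset = ConcatDataset([original, flipped])
print(len(increased_dataset))  # 2 * x.shape[0]: the dataset size has doubled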

Schindler answered 11/9, 2020 at 18:20 Comment(1)
With this technique I succeeded in augmenting my data ×2, but I am still wondering if there is a way to automate this task, where you just give the original data and receive the augmented data ×2 or ×4 times depending on the transformations you made?Bugloss

TL;DR:

  • The transform operation applies a set of transforms, each with a certain probability, to the input batch that comes in the loop. So the model is exposed to more examples over the course of multiple epochs.

  • Personally, when I was training an audio classification model on my own dataset, my model always seemed to converge at 72% accuracy before augmentation. Using augmentation along with an increased number of training epochs boosted the accuracy on the test set to 89%.

Whorish answered 7/3, 2021 at 7:6 Comment(1)
I think this answer, while less detailed, actually addresses what the question is confused about. The point is that the data comes back different during each batch (due to the "random" options), so in terms of how many unique images are seen during the training loop, the data is augmented. You can run more epochs without overfitting. The non-random transforms are a form of preprocessing; they don't increase the variety seen during the training loop but are there out of necessity (e.g. resize).Adelleadelpho

The purpose of data augmentation is to increase the diversity of the training dataset.

Even though data.transforms doesn't change the size of the dataset, every epoch, when we revisit the dataset, the transform operations are executed again and we get different data.

I changed @Ashkan372 code slightly to output data for multiple epochs:

import torch
from torchvision import transforms
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class experimental_dataset(Dataset):
  def __init__(self, data, transform):
    self.data = data
    self.transform = transform

  def __len__(self):
    return self.data.shape[0]

  def __getitem__(self, idx):
    item = self.data[idx]
    item = self.transform(item)
    return item

transform = transforms.Compose([
  transforms.ToPILImage(),
  transforms.RandomHorizontalFlip(),
  transforms.ToTensor()
])

x = torch.rand(8, 1, 2, 2)
print('the original data: \n', x)

epoch_size = 3
batch_size = 4

dataset = experimental_dataset(x, transform)
for i in range(epoch_size):
  print('----------------------------------------------')
  print('the epoch', i, 'data: \n')
  for item in DataLoader(dataset, batch_size, shuffle=False):
    print(item)

The output is:

the original data: 
 tensor([[[[0.5993, 0.5898],
          [0.7365, 0.5472]]],


        [[[0.1878, 0.3546],
          [0.2124, 0.8324]]],


        [[[0.9321, 0.0795],
          [0.4090, 0.9513]]],


        [[[0.2825, 0.6954],
          [0.3737, 0.0869]]],


        [[[0.2123, 0.7024],
          [0.6270, 0.5923]]],


        [[[0.9997, 0.9825],
          [0.0267, 0.2910]]],


        [[[0.2323, 0.1768],
          [0.4646, 0.4487]]],


        [[[0.2368, 0.0262],
          [0.2423, 0.9593]]]])
----------------------------------------------
the epoch 0 data: 

tensor([[[[0.5882, 0.5961],
          [0.5451, 0.7333]]],


        [[[0.3529, 0.1843],
          [0.8314, 0.2118]]],


        [[[0.9294, 0.0784],
          [0.4078, 0.9490]]],


        [[[0.6941, 0.2824],
          [0.0863, 0.3725]]]])
tensor([[[[0.7020, 0.2118],
          [0.5922, 0.6235]]],


        [[[0.9804, 0.9961],
          [0.2902, 0.0235]]],


        [[[0.2314, 0.1765],
          [0.4627, 0.4471]]],


        [[[0.0235, 0.2353],
          [0.9569, 0.2392]]]])
----------------------------------------------
the epoch 1 data: 

tensor([[[[0.5882, 0.5961],
          [0.5451, 0.7333]]],


        [[[0.1843, 0.3529],
          [0.2118, 0.8314]]],


        [[[0.0784, 0.9294],
          [0.9490, 0.4078]]],


        [[[0.2824, 0.6941],
          [0.3725, 0.0863]]]])
tensor([[[[0.2118, 0.7020],
          [0.6235, 0.5922]]],


        [[[0.9804, 0.9961],
          [0.2902, 0.0235]]],


        [[[0.2314, 0.1765],
          [0.4627, 0.4471]]],


        [[[0.0235, 0.2353],
          [0.9569, 0.2392]]]])
----------------------------------------------
the epoch 2 data: 

tensor([[[[0.5882, 0.5961],
          [0.5451, 0.7333]]],


        [[[0.3529, 0.1843],
          [0.8314, 0.2118]]],


        [[[0.0784, 0.9294],
          [0.9490, 0.4078]]],


        [[[0.6941, 0.2824],
          [0.0863, 0.3725]]]])
tensor([[[[0.2118, 0.7020],
          [0.6235, 0.5922]]],


        [[[0.9961, 0.9804],
          [0.0235, 0.2902]]],


        [[[0.2314, 0.1765],
          [0.4627, 0.4471]]],


        [[[0.0235, 0.2353],
          [0.9569, 0.2392]]]])

With different epochs we get different outputs!

Confront answered 2/3, 2022 at 8:22 Comment(0)

In PyTorch, there are types of cropping that DO change the size of the dataset. These are FiveCrop and TenCrop:

class torchvision.transforms.FiveCrop(size)

Crop the given image into four corners and the central crop.

This transform returns a tuple of images and there may be a mismatch in the number of inputs and targets your Dataset returns. See below for an example of how to deal with this.

Example:

>>> transform = Compose([
>>>    TenCrop(size), # this returns a tuple of PIL Images
>>>    Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops])) # returns a 4D tensor
>>> ])
>>> #In your test loop you can do the following:
>>> input, target = batch # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w)) # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1) # avg over crops

TenCrop is the same plus the flipped version of the five patches (horizontal flipping is used by default).
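
A quick sketch confirming the counts (the blank dummy image is just for illustration):

from PIL import Image
from torchvision import transforms

img = Image.new("RGB", (256, 256))          # blank dummy image
print(len(transforms.FiveCrop(224)(img)))   # 5 crops per input image
print(len(transforms.TenCrop(224)(img)))    # 10: the same 5 plus their horizontal flips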

Responsible answered 25/6, 2021 at 13:5 Comment(0)
