I wrote a custom dataset by subclassing PyTorch's Dataset class. The code looks like this:
import os

import numpy as np
import torch
from PIL import Image


class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, root_path, mean=None, std=None, transform=None):
        self.path = root_path
        # mean/std were used but never passed in, so accept them as arguments
        self.mean = mean
        self.std = std
        # transform is called as transform(image=..., mask=...) and returns a dict
        self.transform = transform
        self.images = []
        self.masks = []
        for add in os.listdir(self.path):
            # Some script to load file from directory and appending address to relative array
            ...
        # Sort both lists so that image i lines up with mask i
        self.masks.sort()
        self.images.sort()

    def __len__(self):
        return len(self.images)

    def __getitem__(self, item):
        image_address = self.images[item]
        mask_address = self.masks[item]
        # Load both files as NumPy arrays so every sample has the same type
        image = np.asarray(Image.open(image_address))
        mask = np.asarray(Image.open(mask_address))
        if self.transform is not None:
            # Apply the same augmentation to the image and its mask
            augment = self.transform(image=image, mask=mask)
            image = augment['image']
            mask = augment['mask']
        # Without a transform, the raw arrays are returned unchanged
        return image, mask
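The class above is a map-style dataset: a loader only needs `__len__` and an integer-indexed `__getitem__`. Here is a simplified, torch-free sketch of how a classic map-style loader draws batches from such an object. `ToyPairs` and `iterate_batches` are hypothetical names, and the sketch ignores worker processes, memory pinning, and tensor collation.

```python
import random
from typing import Any, Iterator, List, Sequence, Tuple


class ToyPairs:
    """Hypothetical stand-in for CustomDataset: (image, mask) pairs held in memory."""

    def __init__(self, images: Sequence[Any], masks: Sequence[Any]) -> None:
        assert len(images) == len(masks), "every image needs a matching mask"
        self.images = list(images)
        self.masks = list(masks)

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, item: int) -> Tuple[Any, Any]:
        return self.images[item], self.masks[item]


def iterate_batches(dataset, batch_size: int, shuffle: bool = False,
                    seed: int = 0) -> Iterator[List[Tuple[Any, Any]]]:
    """Yield lists of samples, fetched one index at a time (map-style access)."""
    indices = list(range(len(dataset)))        # plays the role of the sampler
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield [dataset[i] for i in batch_idx]  # one __getitem__ call per index


ds = ToyPairs(["img0", "img1", "img2"], ["mask0", "mask1", "mask2"])
batches = list(iterate_batches(ds, batch_size=2))
print(batches)
```

The real DataLoader also collates each list into tensors. Like this sketch, it keeps the final short batch by default (`drop_last=False`).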
Then I created an instance of this class and passed it to torch.utils.data.DataLoader, which works fine. However, when I switch to torch.utils.data.DataLoader2:
dataloader = torch.utils.data.DataLoader2(dataset=dataset, batch_size=2, pin_memory=True, num_workers=4)
I get this error:
Exception: thread parallelism mode is not supported for old DataSets
My question is: why was DataLoader2 added to PyTorch, how does it differ from DataLoader, and what are its benefits?
PyTorch Version: 1.10.1
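The error message itself suggests that DataLoader2 distinguishes "old" map-style datasets like the one above from a newer kind of input. In PyTorch 1.10 that newer input appears to be the iterable DataPipe (`IterDataPipe`), although this reading of the source is an assumption. DataPipes are iterable stages that wrap one another instead of being indexed by position. The following torch-free sketch illustrates that style. `MapPipe` and `BatchPipe` are hypothetical stand-ins, not the actual `torch.utils.data.datapipes` API, and the upper-casing "load" step is a placeholder for real file reading.

```python
from typing import Any, Callable, Iterable, Iterator, List


class MapPipe:
    """Hypothetical DataPipe-like stage: lazily applies fn to each upstream item."""

    def __init__(self, source: Iterable[Any], fn: Callable[[Any], Any]) -> None:
        self.source = source
        self.fn = fn

    def __iter__(self) -> Iterator[Any]:
        for item in self.source:
            yield self.fn(item)


class BatchPipe:
    """Hypothetical stage: groups upstream items into lists of batch_size."""

    def __init__(self, source: Iterable[Any], batch_size: int) -> None:
        self.source = source
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[List[Any]]:
        batch: List[Any] = []
        for item in self.source:
            batch.append(item)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:  # keep the final, possibly smaller batch
            yield batch


# File-name pairs -> placeholder "load" step -> batches; nothing runs until iterated
pairs = [("a.png", "a_mask.png"), ("b.png", "b_mask.png"), ("c.png", "c_mask.png")]
loaded = MapPipe(pairs, lambda p: (p[0].upper(), p[1].upper()))
pipeline = BatchPipe(loaded, batch_size=2)
print(list(pipeline))
```

Each stage only iterates over its upstream, so no sample is ever fetched by index. That is the property the map-style class above relies on and this style does without.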