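[Code Template] Computing the Mean and Standard Deviation of a Dataset
Background
During data preprocessing, samples are usually standardized so that they have a mean of 0 and a standard deviation of 1, which helps stabilize training.
Standardization requires the dataset's mean and standard deviation to be computed in advance. The demo below shows how to compute them per channel for CIFAR-10 with PyTorch.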
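For reference, the standardization described above can be written per channel as follows. The notation is an assumption introduced here for illustration ($x_c$ for a pixel value in channel $c$, $\mu_c$ and $\sigma_c$ for that channel's mean and standard deviation); it does not come from the original post. The second line uses the identity that the demo's code comment relies on, var[X] = E[X**2] - E[X]**2.

```latex
x'_c = \frac{x_c - \mu_c}{\sigma_c}

\mu_c = \mathbb{E}[x_c], \qquad
\sigma_c = \sqrt{\mathbb{E}[x_c^2] - \mu_c^2}
```

Both expectations can be accumulated in a single pass over the data, so the dataset never has to be held in memory all at once.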
Demo
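import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from tqdm import tqdm

# Not needed for the statistics below, which run on the CPU tensors from the loader.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ToTensor() scales pixel values to [0, 1] and yields (C, H, W) tensors.
train_set = datasets.CIFAR10(
    root="dataset/", transform=transforms.ToTensor(), download=True
)
train_loader = DataLoader(dataset=train_set, batch_size=64, shuffle=True)


def get_mean_std(loader):
    # var[X] = E[X**2] - E[X]**2
    channels_sum, channels_sqrd_sum, num_batches = 0, 0, 0

    for data, _ in tqdm(loader):
        # data has shape (batch, channels, height, width);
        # reduce over every dimension except the channel dimension.
        channels_sum += torch.mean(data, dim=[0, 2, 3])
        channels_sqrd_sum += torch.mean(data**2, dim=[0, 2, 3])
        num_batches += 1

    # Note: averaging per-batch statistics is exact only when every batch has
    # the same size; a smaller final batch makes this a close approximation.
    mean = channels_sum / num_batches
    std = (channels_sqrd_sum / num_batches - mean**2) ** 0.5

    return mean, std


mean, std = get_mean_std(train_loader)
print(mean)
print(std)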
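The demo averages per-batch means, which weights a smaller final batch the same as a full one. As a minimal sketch of one possible variant (not the original author's method), the statistics can instead be accumulated as raw sums and divided by the total pixel count, so every pixel counts equally. The function name `get_mean_std_weighted` and the synthetic random data standing in for a real image dataset are assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def get_mean_std_weighted(loader):
    """Per-channel mean/std, weighting every pixel equally across batches."""
    channels_sum = 0.0
    channels_sqrd_sum = 0.0
    num_pixels = 0
    for data, _ in loader:
        # data has shape (batch, channels, height, width)
        channels_sum = channels_sum + data.sum(dim=[0, 2, 3])
        channels_sqrd_sum = channels_sqrd_sum + (data ** 2).sum(dim=[0, 2, 3])
        num_pixels += data.size(0) * data.size(2) * data.size(3)
    mean = channels_sum / num_pixels
    # var[X] = E[X**2] - E[X]**2 (population variance)
    std = (channels_sqrd_sum / num_pixels - mean ** 2) ** 0.5
    return mean, std


# Synthetic stand-in for an image dataset: 100 RGB images of 8x8 pixels.
torch.manual_seed(0)
images = torch.rand(100, 3, 8, 8, dtype=torch.float64)
labels = torch.zeros(100, dtype=torch.long)
# batch_size=64 leaves a smaller final batch of 36 images.
loader = DataLoader(TensorDataset(images, labels), batch_size=64)

mean, std = get_mean_std_weighted(loader)

# Standardize with the computed statistics; each channel should end up
# with mean close to 0 and standard deviation close to 1.
normalized = (images - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)
print(normalized.mean(dim=[0, 2, 3]))
print(normalized.pow(2).mean(dim=[0, 2, 3]).sqrt())
```

In a torchvision pipeline, the computed values would typically be passed to `transforms.Normalize(mean, std)` after `transforms.ToTensor()` rather than applied by hand as above.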
References
Pytorch Quick Tip: Calculate Mean and Standard Deviation of Data
Original article: https://blog.csdn.net/qq_42693593/article/details/142724086