
[Computer Vision - Face Generation] 1. Building a Face Dataset

I. Objective

        This experiment teaches how to build a custom face dataset and apply it to an image-generation task. Specifically, the task is conditional generation: given a face-category label (such as supermodel face, anime face, cute-kid face, or celebrity face), generate a face image matching that label.

II. Hardware and Software Environment

Here is my machine configuration:

Processor: 12th Gen Intel(R) Core(TM) i5-12450H, 2.00 GHz

Memory: 16.0 GB (15.8 GB usable)

System type: 64-bit operating system, x64-based processor

GPU: NVIDIA GeForce RTX 3060 Laptop GPU

GPU core clock: 210 MHz

VRAM clock: 405 MHz

GPU performance state: 8

Operating system: Windows 11 Home (Chinese edition)

Development tools: PyCharm 2023.3.1 (Professional Edition), Python 3.11

III. Experiment

1. Building the Face Dataset

1.1 Create the Flask Project

This is the overall directory structure to build:
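
A rough reconstruction from the files created later in this walkthrough (the root folder name is arbitrary):

FaceGen/                     # project root (name is arbitrary)
├── mtcnn_pytorch/           # unzipped in step 1.3.1
├── data/
│   ├── raw/                 # downloaded images, one subfolder per class
│   └── crop128/             # 128x128 aligned faces, one subfolder per class
├── download_data.py         # step 1.2
├── mtcnn.py                 # step 1.3.2
├── process_mtcnn_128.py     # step 1.3.3
└── loaddata.py              # step 1.3.4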

Building a custom dataset usually takes two steps: data collection and data cleaning.

1.2 Collecting Face Images

        Unlike with public datasets, we have to collect the data ourselves from the web. Here we use Python to scrape images from Baidu Image search result pages. Create a download_data.py file in the project root with the following code:

download_data.py

import os
import requests
import re
from tqdm import tqdm

def get_image_type(url):
    # Infer the file extension from the URL; unsupported types return None
    return "png" if ".png" in url else "jpg" if ".jpg" in url or ".jpeg" in url else None

def get_urls_one_page(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        response.encoding = 'utf-8'
        html = response.text
        # "objURL" entries carry the full-resolution image URLs on Baidu's flip-style result page
        url_pictures_this_page = re.findall(r'"objURL":"(.*?)",', html)
        # the link text 下一页 ("next page") must stay in Chinese to match Baidu's HTML
        url_next_page_prefix = re.findall(r'<a href="(.*?)" class="next">下一页</a>', html)
        url_next_page = f"http://image.baidu.com/{url_next_page_prefix[0]}" if url_next_page_prefix else None
        return url_pictures_this_page, url_next_page
    except requests.RequestException as e:
        print(f"请求页面出错: {e}")
        return [], None

def download_image(url, save_dir, index, image_type):
    try:
        picture = requests.get(url, timeout=10)
        picture.raise_for_status()
        name = f"{save_dir}/{index}.{image_type}"
        with open(name, 'wb') as f:
            f.write(picture.content)
        print(f"第{index + 1}张图片下载成功")
    except Exception as e:
        print(f"第{index + 1}张图片下载失败! 错误: {e}")

def download_images(keyWord, save_dir, number):
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)
    base_url = "http://image.baidu.com/search/flip?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1497491098685_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&ctd=1497491098685%5E00_1519X735&word="
    url = base_url + keyWord
    a = 0  # number of images downloaded so far
    while a < number and url is not None:
        pictures_url, url = get_urls_one_page(url)
        if not pictures_url:
            break
        for i in pictures_url:
            image_type = get_image_type(i)
            if image_type:
                download_image(i, save_dir, a, image_type)
                a += 1
                if a >= number:
                    break

# the four search keywords are kept in Chinese because they are the literal Baidu queries
classlist = ['黄种人', '白种人', '黑种人', '动漫头像']
for key in tqdm(classlist, desc="Download progress"):
    download_images(key, f'data/raw/{key}/', 1000)
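
One caveat: Baidu often rejects requests that carry the default python-requests User-Agent, so if the script comes back with empty pages, passing a browser-like header usually helps. A minimal sketch (the header string is only an example):

# hypothetical tweak: send a browser-like User-Agent with every request
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# then, in get_urls_one_page and download_image:
# response = requests.get(url, timeout=10, headers=HEADERS)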

Because the quality of web-scraped images varies widely, I went straight to a pre-processed dataset instead.

Link: https://seeprettyface.com/mydataset.html

You can pick whichever dataset you like for the synthesis task.

1.3 Organizing the Face Data

        After downloading, inspecting the images reveals some that are not faces, or faces of the wrong category. These need a rough manual pass to delete the images that do not meet the requirements.
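
Wrong-category faces still need eyeballing, but files that did not download cleanly can be weeded out automatically first. A minimal sketch assuming the data/raw layout above (this script is my own addition, not part of the original steps):

import os
from PIL import Image

root = 'data/raw/'
for dirpath, _, files in os.walk(root):
    for name in files:
        path = os.path.join(dirpath, name)
        try:
            with Image.open(path) as im:
                im.verify()  # cheap integrity check; raises on truncated or corrupt files
        except Exception:
            print(f'removing unreadable file: {path}')
            os.remove(path)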

        After cleaning, the raw data needs some preliminary processing before it can be used for training. For generative training, we crop every image to 128×128 and center-align the face as much as possible. We use MTCNN to do this face processing. The steps are:

1.3.1 mtcnn_pytorch.zip 

        Copy mtcnn_pytorch.zip from the course resources into the project root and unzip it to get an mtcnn_pytorch folder. The archive can be downloaded from the bound resources.
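
If the archive is unavailable, the pip-installable facenet-pytorch package ships a comparable MTCNN implementation; note that it crops around the detected box rather than warping to the 5-point template used below, so results will differ slightly. A sketch of that alternative (not what this walkthrough uses; the paths are illustrative):

# pip install facenet-pytorch
from facenet_pytorch import MTCNN
from PIL import Image

mtcnn = MTCNN(image_size=128, margin=20, post_process=False)
img = Image.open('data/raw/sample.jpg')      # illustrative path
face = mtcnn(img, save_path='face.png')      # cropped face tensor, also written to disk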

1.3.2 Face-Alignment Code

        Create an mtcnn.py file in the project root and write the face-alignment code.

mtcnn.py

import numpy as np
import torch
from PIL import Image
from mtcnn_pytorch.src.get_nets import PNet, RNet, ONet
from mtcnn_pytorch.src.box_utils import nms, calibrate_box, get_image_boxes, convert_to_square
from mtcnn_pytorch.src.first_stage import run_first_stage
from mtcnn_pytorch.src.align_trans import get_reference_facial_points, warp_and_crop_face

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

class MTCNN():
    def __init__(self):
        self.pnet = PNet().to(device)
        self.rnet = RNet().to(device)
        self.onet = ONet().to(device)
        self.pnet.eval()
        self.rnet.eval()
        self.onet.eval()
        self.reference = get_reference_facial_points(default_square=True)

    def get_resized_reference_facial_points(self, crop_size, reference_pts):
        ref_pts = np.float32(reference_pts)
        # (96, 112) is the canonical template size for the 5 reference landmarks
        tmp_crop_size = np.array((96, 112))
        resize_ratio = max(crop_size) / max(tmp_crop_size)
        ref_pts = ref_pts * resize_ratio
        return ref_pts

    def align(self, img, crop_size=(112, 112)):
        _, landmarks = self.detect_faces(img)
        if len(landmarks) == 0:
            print("No face detected.")
            return None  # no face detected; callers must handle this case

        facial5points = [[landmarks[0][j], landmarks[0][j + 5]] for j in range(5)]
        reference_pts = self.get_resized_reference_facial_points(crop_size, self.reference)
        warped_face = warp_and_crop_face(np.array(img), facial5points, reference_pts, crop_size)
        return Image.fromarray(warped_face)

    def align_multi(self, img, limit=None, min_face_size=30.0):
        boxes, landmarks = self.detect_faces(img, min_face_size)
        if limit:
            boxes = boxes[:limit]
            landmarks = landmarks[:limit]
        faces = []
        for landmark in landmarks:
            facial5points = [[landmark[j], landmark[j + 5]] for j in range(5)]
            warped_face = warp_and_crop_face(np.array(img), facial5points, self.reference,
                                             crop_size=(112, 112))
            faces.append(Image.fromarray(warped_face))
        return boxes, faces

    def detect_faces(self, image, min_face_size=20.0, thresholds=[0.6, 0.7, 0.8],
                     nms_thresholds=[0.7, 0.7, 0.7]):
        width, height = image.size
        min_length = min(height, width)

        min_detection_size = 12
        factor = 0.707  # sqrt(0.5): each pyramid level halves the image area
        scales = []
        m = min_detection_size / min_face_size
        min_length *= m

        factor_count = 0
        while min_length > min_detection_size:
            scales.append(m * factor ** factor_count)
            min_length *= factor
            factor_count += 1

        bounding_boxes = []
        with torch.no_grad():
            # Stage 1: run P-Net over the image pyramid to propose candidate boxes
            for s in scales:
                boxes = run_first_stage(image, self.pnet, scale=s, threshold=thresholds[0])
                if boxes is not None:
                    bounding_boxes.append(boxes)

            if not bounding_boxes:
                return [], []

            bounding_boxes = np.vstack(bounding_boxes)
            keep = nms(bounding_boxes[:, 0:5], nms_thresholds[0])
            bounding_boxes = bounding_boxes[keep]
            bounding_boxes = calibrate_box(bounding_boxes[:, 0:5], bounding_boxes[:, 5:])
            bounding_boxes = convert_to_square(bounding_boxes)
            bounding_boxes[:, 0:4] = np.round(bounding_boxes[:, 0:4])

            # Stage 2: crop 24x24 patches around each candidate and refine with R-Net
            img_boxes = get_image_boxes(bounding_boxes, image, size=24)
            img_boxes = torch.FloatTensor(img_boxes).to(device)

            output = self.rnet(img_boxes)
            offsets = output[0].cpu().data.numpy()  # shape [n_boxes,4]
            probs = output[1].cpu().data.numpy()  # shape [n_boxes,2]

            keep = np.where(probs[:, 1] > thresholds[1])[0]
            bounding_boxes = bounding_boxes[keep]
            bounding_boxes[:, 4] = probs[keep, 1].reshape((-1,))
            offsets = offsets[keep]

            keep = nms(bounding_boxes, nms_thresholds[1])
            bounding_boxes = bounding_boxes[keep]
            bounding_boxes = calibrate_box(bounding_boxes, offsets[keep])
            bounding_boxes = convert_to_square(bounding_boxes)
            bounding_boxes[:, 0:4] = np.round(bounding_boxes[:, 0:4])

            # Stage 3: crop 48x48 patches; O-Net outputs final boxes and 5 landmarks
            img_boxes = get_image_boxes(bounding_boxes, image, size=48)
            if len(img_boxes) == 0 or len(bounding_boxes) == 0:
                return [], []
            img_boxes = torch.FloatTensor(img_boxes).to(device)
            output = self.onet(img_boxes)
            landmarks = output[0].cpu().data.numpy()  # shape [n_boxes,10]
            offsets = output[1].cpu().data.numpy()  # shape [n_boxes,4]
            probs = output[2].cpu().data.numpy()  # shape [n_boxes,2]

            keep = np.where(probs[:, 1] > thresholds[2])[0]
            bounding_boxes = bounding_boxes[keep]
            bounding_boxes[:, 4] = probs[keep, 1].reshape((-1,))
            offsets = offsets[keep]
            landmarks = landmarks[keep]

            width = bounding_boxes[:, 2] - bounding_boxes[:, 0] + 1.0
            height = bounding_boxes[:, 3] - bounding_boxes[:, 1] + 1.0
            xmin, ymin = bounding_boxes[:, 0], bounding_boxes[:, 1]
            landmarks[:, 0:5] = np.expand_dims(xmin, 1) + np.expand_dims(width, 1) * landmarks[:, 0:5]
            landmarks[:, 5:10] = np.expand_dims(ymin, 1) + np.expand_dims(height, 1) * landmarks[:, 5:10]

            bounding_boxes = calibrate_box(bounding_boxes, offsets)
            keep = nms(bounding_boxes, nms_thresholds[2], mode='min')
            bounding_boxes = bounding_boxes[keep]
            landmarks = landmarks[keep]

        return bounding_boxes, landmarks
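
Before batch processing, the class can be sanity-checked on a single image (paths are illustrative):

from PIL import Image
from mtcnn import MTCNN

mtcnn = MTCNN()
img = Image.open('data/raw/sample.jpg').convert('RGB')
aligned = mtcnn.align(img, crop_size=(128, 128))  # returns None if no face is found
if aligned is not None:
    aligned.save('aligned_sample.png')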
1.3.3 Face Alignment and Cropping

        Create a process_mtcnn_128.py file in the project root to perform face alignment and cropping. The code is as follows:

from PIL import Image
import os
from tqdm import tqdm
import argparse
from mtcnn import MTCNN


def crop(imgpath, savepath, mtcnn, crop_size=(112, 112)):
    try:
        image = Image.open(imgpath).convert('RGB')
        aligned_image = mtcnn.align(img=image, crop_size=crop_size)
        if aligned_image is None:
            print(f'No face detected or alignment failed for {imgpath}')
            return False
        aligned_image.save(savepath)
        print(f'Successfully saved: {savepath}')
        return True
    except Exception as e:
        print(f'Error processing {imgpath}: {str(e)}')
        return False


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--root", type=str, default='data/raw/', required=False,
                        help="the dir of raw datas ")
    parser.add_argument("--output_dir", type=str, default='data/crop128/', required=False,
                        help="the dir for processed datas")
    args = parser.parse_args()

    mtcnn = MTCNN()
    dir_origin_path = args.root
    dir_save_path = args.output_dir

    if not os.path.exists(dir_save_path):
        os.makedirs(dir_save_path)

    success_count = 0
    fail_count = 0

    # os.walk would recurse into every nested directory; only the class folders
    # directly under the raw-data root are wanted, so take just the first level
    subdirs = next(os.walk(dir_origin_path))[1]
    for sub in tqdm(subdirs, desc="Processing subdirectories"):
        base = os.path.join(dir_origin_path, sub)
        path = os.path.join(dir_save_path, sub)
        if not os.path.exists(path):
            os.makedirs(path)
            print(f'Created directory: {path}')

        cnt = 0
        for img in tqdm(os.listdir(base), desc=f"Processing images in {sub}"):
            output_img = os.path.join(path, f"{cnt:08}.png")
            img_file = os.path.join(base, img)
            cnt += 1  # advance even when skipping, so output names stay unique
            if os.path.isfile(output_img):
                continue
            result = crop(img_file, output_img, mtcnn, crop_size=(128, 128))
            if result:
                success_count += 1
            else:
                fail_count += 1

    print(f'Total processed: {success_count + fail_count}')
    print(f'Success: {success_count}')
    print(f'Failed: {fail_count}')

        Run the program. There are two ways to do this. The first is the visual way: right-click the file and choose "Run", as shown in the figure:

        The second is from the command line. Open PyCharm's Terminal window (or press Alt+F12) and enter:

python process_mtcnn_128.py --root data/raw --output_dir data/crop128

As shown in the figure:
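
Because both --root and --output_dir default to the paths shown in the code, running process_mtcnn_128.py with no arguments does the same thing.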

1.3.4 Data Visualization

        Create loaddata.py in the project root with the following code:

import torchvision
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from torchvision.transforms import Compose
import matplotlib.pyplot as plt
import numpy as np

def main():
    data_root = 'data/crop128/'
    image_size = 128
    batch_size_train = 4


    # Build the dataset (one subfolder per class under data/crop128/)
    train_data = ImageFolder(
        root=data_root,
        transform=Compose([
            torchvision.transforms.Resize(image_size),
            torchvision.transforms.CenterCrop(image_size),
            torchvision.transforms.ToTensor(),
            # map pixel values from [0, 1] to [-1, 1], the usual range for generative training
            torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ])
    )

    # Build the data loader; the __main__ guard below matters on Windows
    # because num_workers > 0 spawns worker processes
    train_loader = DataLoader(
        train_data,
        batch_size=batch_size_train,
        num_workers=4,
        shuffle=True
    )

    def imshow(img):
        img = img / 2 + 0.5  # undo the Normalize above (back to [0, 1])
        npimg = img.numpy()
        plt.imshow(np.transpose(npimg, (1, 2, 0)))
        plt.axis('off')  # hide the axes
        plt.show()

    # Show one batch of images
    for idx, (images, labels) in enumerate(train_loader):
        print(labels)
        imshow(torchvision.utils.make_grid(images))
        break  # only the first batch

if __name__ == '__main__':
    main()

The output looks like this:
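
The integer labels printed by the loader are ImageFolder's class indices, assigned by sorting the folder names under data/crop128/. A quick check of the mapping (run from the project root):

from torchvision.datasets import ImageFolder

data = ImageFolder(root='data/crop128/')
print(data.class_to_idx)  # folder name -> integer label, in sorted order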


Original article: https://blog.csdn.net/m0_64545019/article/details/144091029
