代码解读：Diffusion Models中的长宽桶技术(Aspect Ratio Bucketing)

🕗 发布于 2024-07-21 05:05 计算机视觉 人工智能 深度学习 扩散模型 AIGC

Diffusion Models专栏文章汇总：入门与实战

前言：自从SDXL提出了长宽桶技术之后，彻底解决了不同长宽比的图像输入问题，现在已经成为训练扩散模型必选的方案。这篇博客从代码详细解读如何在模型训练的时候运用长宽桶技术(Aspect Ratio Bucketing)。

原理解读-原有训练的问题

纵横比分桶训练可以极大地提高输出质量，现有图像生成模型的一个常见问题是，它们非常容易生成带有非自然作物的图像。这是因为这些模型被训练成生成方形图像。然而，大多数照片和艺术品都不是方形的。然而，该模型只能同时在相同大小的图像上工作，并且在训练过程中，通常的做法是同时在多个训练样本上操作，以优化所使用gpu的效率。作为妥协，选择正方形图像，在训练过程中，只裁剪出每个图像的中心，然后作为训练样例显示给图像生成模型。

例如，人类通常是没有脚或头的，剑只有一个刀刃，剑柄和剑尖在框架外。因为我们正在创建一个图像生成模型来配合我们的故事叙述体验，所以我们的模型能够产生适当的，未裁剪的角色是很重要的，并且生成的骑士不应该持有延伸到无限的金属状直线。

对裁剪图像进行训练的另一个问题是，它可能导致文本和图像之间的不匹配。例如，带有王冠标签的图像通常在中央裁剪后不再包含王冠，因此君主已经被斩首。我们发现使用随机作物代替中心作物只能略微改善这些问题。使用具有可变图像大小的稳定扩散是可能的，尽管可以注意到，远远超过512x512的原生分辨率往往会引入重复的图像元素，并且非常低的分辨率会产生无法识别的图像。

尽管如此，这向我们表明，在可变大小的图像上训练模型应该是可能的。在单个、可变大小的样本上进行训练是微不足道的，但也非常缓慢，而且由于使用小批量提供的缺乏正则化，更容易产生训练不稳定性。

长宽桶技术(Aspect Ratio Bucketing)

由于这个问题似乎没有现有的解决方案，我们已经为我们的数据集实现了自定义批生成代码，允许创建批处理，其中批处理中的每个项目具有相同的大小，但批处理的图像大小可能不同。

我们通过一种叫做宽高比桶的方法来做到这一点。另一种方法是使用固定的图像大小，缩放每个图像以适应这个固定的大小，并应用在训练期间被掩盖的填充。由于这会导致训练期间不必要的计算，我们没有选择遵循这种替代方法。

在下面，我们描述了我们自定义的宽高比桶的批量生成方案背后的原始想法。

首先，我们必须定义要将数据集的图像排序到哪个存储桶中。为此，我们定义的最大图像尺寸为512x768，最大尺寸为1024。由于最大图像大小为512x768，比512x512大，需要更多的VRAM，因此每个gpu的批处理大小必须降低，这可以通过梯度积累来补偿。

我们通过应用以下算法生成桶:

Set the width to 256.
While the width is less than or equal to 1024:
    Find the largest height such that height is less than or equal to 1024 and that width multiplied by height is less than or equal to 512 * 768.
    Add the resolution given by height and width as a bucket.
    Increase the width by 64.

同样的重复，宽度和高度互换。重复的桶将从列表中删除，并添加一个大小为512x512的桶。

接下来，我们将图像分配到相应的桶中。为此，我们首先将桶分辨率存储在NumPy数组中，并计算每个分辨率的长宽比。对于数据集中的每张图像，我们检索其分辨率并计算长宽比。图像宽高比从桶宽高比数组中减去，使我们能够根据宽高比差的绝对值有效地选择最接近的桶:

image_bucket = argmin(abs(bucket_aspects — image_aspect))

图像的桶号与其数据集中的项目ID相关联。如果图像的宽高比非常极端，甚至与最适合的桶相差太大，则从数据集中修剪图像。

由于我们在多个GPU上进行训练，在每个epoch之前，我们对数据集进行了分片，以确保每个GPU在大小相等的不同子集上工作。为此，我们首先复制数据集中的项目id列表并对它们进行洗牌。如果这个复制的列表不能被gpu数量乘以批大小整除，则会对列表进行修剪，并删除最后的项以使其可整除。

然后，我们根据当前进程的全局排名选择1/world_size*bsz项id的不同子集。自定义批处理生成的其余部分将从这些过程中的任何一个过程中进行描述，并对数据集项id的子集进行操作。

对于当前的分片，每个bucket的列表是通过迭代打乱的数据集项目ID列表并将ID分配给分配给图像的bucket对应的列表来创建的。

处理完所有图像后，我们遍历每个bucket的列表。如果它的长度不能被批大小整除，则根据需要删除列表上的最后一个元素以使其可整除，并将它们添加到单独的捕获所有桶中。由于保证整个分片大小包含许多可被批大小整除的元素，因此保证生成一个长度可被批大小整除的所有bucket。

当请求批处理时，我们从加权分布中随机抽取一个桶。桶的权重设置为桶的大小除以所有剩余桶的大小。这确保了即使有大小差异很大的桶，自定义批生成在训练期间不会引入强烈的偏差，根据图像大小显示图像。如果在没有加权的情况下选择桶，那么小的桶将在训练过程中早期清空，只有最大的桶将在训练结束时保留。按大小对桶进行加权可以避免这种情况。

最后从所选的桶中取出一批项。取走的项目从桶中移除。如果桶现在为空，则在epoch的剩余时间内删除它。所选的项id和所选桶的分辨率现在被传递给图像加载函数。

完整代码

import numpy as np
import pickle
import time

def get_prng(seed):
    return np.random.RandomState(seed)

class BucketManager:
    def __init__(self, bucket_file, valid_ids=None, max_size=(768,512), divisible=64, step_size=8, min_dim=256, base_res=(512,512), bsz=1, world_size=1, global_rank=0, max_ar_error=4, seed=42, dim_limit=1024, debug=False):
        with open(bucket_file, "rb") as fh:
            self.res_map = pickle.load(fh)
        if valid_ids is not None:
            new_res_map = {}
            valid_ids = set(valid_ids)
            for k, v in self.res_map.items():
                if k in valid_ids:
                    new_res_map[k] = v
            self.res_map = new_res_map
        self.max_size = max_size
        self.f = 8
        self.max_tokens = (max_size[0]/self.f) * (max_size[1]/self.f)
        self.div = divisible
        self.min_dim = min_dim
        self.dim_limit = dim_limit
        self.base_res = base_res
        self.bsz = bsz
        self.world_size = world_size
        self.global_rank = global_rank
        self.max_ar_error = max_ar_error
        self.prng = get_prng(seed)
        epoch_seed = self.prng.tomaxint() % (2**32-1)
        self.epoch_prng = get_prng(epoch_seed) # separate prng for sharding use for increased thread resilience
        self.epoch = None
        self.left_over = None
        self.batch_total = None
        self.batch_delivered = None

        self.debug = debug

        self.gen_buckets()
        self.assign_buckets()
        self.start_epoch()

    def gen_buckets(self):
        if self.debug:
            timer = time.perf_counter()
        resolutions = []
        aspects = []
        w = self.min_dim
        while (w/self.f) * (self.min_dim/self.f) <= self.max_tokens and w <= self.dim_limit:
            h = self.min_dim
            got_base = False
            while (w/self.f) * ((h+self.div)/self.f) <= self.max_tokens and (h+self.div) <= self.dim_limit:
                if w == self.base_res[0] and h == self.base_res[1]:
                    got_base = True
                h += self.div
            if (w != self.base_res[0] or h != self.base_res[1]) and got_base:
                resolutions.append(self.base_res)
                aspects.append(1)
            resolutions.append((w, h))
            aspects.append(float(w)/float(h))
            w += self.div
        h = self.min_dim
        while (h/self.f) * (self.min_dim/self.f) <= self.max_tokens and h <= self.dim_limit:
            w = self.min_dim
            got_base = False
            while (h/self.f) * ((w+self.div)/self.f) <= self.max_tokens and (w+self.div) <= self.dim_limit:
                if w == self.base_res[0] and h == self.base_res[1]:
                    got_base = True
                w += self.div
            resolutions.append((w, h))
            aspects.append(float(w)/float(h))
            h += self.div
        res_map = {}
        for i, res in enumerate(resolutions):
            res_map[res] = aspects[i]
        self.resolutions = sorted(res_map.keys(), key=lambda x: x[0] * 4096 - x[1])
        self.aspects = np.array(list(map(lambda x: res_map[x], self.resolutions)))
        self.resolutions = np.array(self.resolutions)
        if self.debug:
            timer = time.perf_counter() - timer
            print(f"resolutions:\n{self.resolutions}")
            print(f"aspects:\n{self.aspects}")
            print(f"gen_buckets: {timer:.5f}s")

    def assign_buckets(self):
        if self.debug:
            timer = time.perf_counter()
        self.buckets = {}
        self.aspect_errors = []
        skipped = 0
        skip_list = []
        for post_id in self.res_map.keys():
            w, h = self.res_map[post_id]
            aspect = float(w)/float(h)
            bucket_id = np.abs(self.aspects - aspect).argmin()
            if bucket_id not in self.buckets:
                self.buckets[bucket_id] = []
            error = abs(self.aspects[bucket_id] - aspect)
            if error < self.max_ar_error:
                self.buckets[bucket_id].append(post_id)
                if self.debug:
                    self.aspect_errors.append(error)
            else:
                skipped += 1
                skip_list.append(post_id)
        for post_id in skip_list:
            del self.res_map[post_id]
        if self.debug:
            timer = time.perf_counter() - timer
            self.aspect_errors = np.array(self.aspect_errors)
            print(f"skipped images: {skipped}")
            print(f"aspect error: mean {self.aspect_errors.mean()}, median {np.median(self.aspect_errors)}, max {self.aspect_errors.max()}")
            for bucket_id in reversed(sorted(self.buckets.keys(), key=lambda b: len(self.buckets[b]))):
                print(f"bucket {bucket_id}: {self.resolutions[bucket_id]}, aspect {self.aspects[bucket_id]:.5f}, entries {len(self.buckets[bucket_id])}")
            print(f"assign_buckets: {timer:.5f}s")

    def start_epoch(self, world_size=None, global_rank=None):
        if self.debug:
            timer = time.perf_counter()
        if world_size is not None:
            self.world_size = world_size
        if global_rank is not None:
            self.global_rank = global_rank

        # select ids for this epoch/rank
        index = np.array(sorted(list(self.res_map.keys())))
        index_len = index.shape[0]
        index = self.epoch_prng.permutation(index)
        index = index[:index_len - (index_len % (self.bsz * self.world_size))]
        #print("perm", self.global_rank, index[0:16])
        index = index[self.global_rank::self.world_size]
        self.batch_total = index.shape[0] // self.bsz
        assert(index.shape[0] % self.bsz == 0)
        index = set(index)

        self.epoch = {}
        self.left_over = []
        self.batch_delivered = 0
        for bucket_id in sorted(self.buckets.keys()):
            if len(self.buckets[bucket_id]) > 0:
                self.epoch[bucket_id] = np.array([post_id for post_id in self.buckets[bucket_id] if post_id in index], dtype=np.int64)
                self.prng.shuffle(self.epoch[bucket_id])
                self.epoch[bucket_id] = list(self.epoch[bucket_id])
                overhang = len(self.epoch[bucket_id]) % self.bsz
                if overhang != 0:
                    self.left_over.extend(self.epoch[bucket_id][:overhang])
                    self.epoch[bucket_id] = self.epoch[bucket_id][overhang:]
                if len(self.epoch[bucket_id]) == 0:
                    del self.epoch[bucket_id]

        if self.debug:
            timer = time.perf_counter() - timer
            count = 0
            for bucket_id in self.epoch.keys():
                count += len(self.epoch[bucket_id])
            print(f"correct item count: {count == len(index)} ({count} of {len(index)})")
            print(f"start_epoch: {timer:.5f}s")

    def get_batch(self):
        if self.debug:
            timer = time.perf_counter()
        # check if no data left or no epoch initialized
        if self.epoch is None or self.left_over is None or (len(self.left_over) == 0 and not bool(self.epoch)) or self.batch_total == self.batch_delivered:
            self.start_epoch()

        found_batch = False
        batch_data = None
        resolution = self.base_res
        while not found_batch:
            bucket_ids = list(self.epoch.keys())
            if len(self.left_over) >= self.bsz:
                bucket_probs = [len(self.left_over)] + [len(self.epoch[bucket_id]) for bucket_id in bucket_ids]
                bucket_ids = [-1] + bucket_ids
            else:
                bucket_probs = [len(self.epoch[bucket_id]) for bucket_id in bucket_ids]
            bucket_probs = np.array(bucket_probs, dtype=np.float32)
            bucket_lens = bucket_probs
            bucket_probs = bucket_probs / bucket_probs.sum()
            bucket_ids = np.array(bucket_ids, dtype=np.int64)
            if bool(self.epoch):
                chosen_id = int(self.prng.choice(bucket_ids, 1, p=bucket_probs)[0])
            else:
                chosen_id = -1

            if chosen_id == -1:
                # using leftover images that couldn't make it into a bucketed batch and returning them for use with basic square image
                self.prng.shuffle(self.left_over)
                batch_data = self.left_over[:self.bsz]
                self.left_over = self.left_over[self.bsz:]
                found_batch = True
            else:
                if len(self.epoch[chosen_id]) >= self.bsz:
                    # return bucket batch and resolution
                    batch_data = self.epoch[chosen_id][:self.bsz]
                    self.epoch[chosen_id] = self.epoch[chosen_id][self.bsz:]
                    resolution = tuple(self.resolutions[chosen_id])
                    found_batch = True
                    if len(self.epoch[chosen_id]) == 0:
                        del self.epoch[chosen_id]
                else:
                    # can't make a batch from this, not enough images. move them to leftovers and try again
                    self.left_over.extend(self.epoch[chosen_id])
                    del self.epoch[chosen_id]

            assert(found_batch or len(self.left_over) >= self.bsz or bool(self.epoch))

        if self.debug:
            timer = time.perf_counter() - timer
            print(f"bucket probs: " + ", ".join(map(lambda x: f"{x:.2f}", list(bucket_probs*100))))
            print(f"chosen id: {chosen_id}")
            print(f"batch data: {batch_data}")
            print(f"resolution: {resolution}")
            print(f"get_batch: {timer:.5f}s")

        self.batch_delivered += 1
        return (batch_data, resolution)

    def generator(self):
        if self.batch_delivered >= self.batch_total:
            self.start_epoch()
        while self.batch_delivered < self.batch_total:
            yield self.get_batch()

if __name__ == "__main__":
    # prepare a pickle with mapping of dataset IDs to resolutions called resolutions.pkl to use this
    with open("resolutions.pkl", "rb") as fh:
        ids = list(pickle.load(fh).keys())

    counts = np.zeros((len(ids),)).astype(np.int64)
    id_map = {}
    for i, post_id in enumerate(ids):
        id_map[post_id] = i

    bm = BucketManager("resolutions.pkl", debug=True, bsz=8, world_size=8, global_rank=3)
    print("got: " + str(bm.get_batch()))
    print("got: " + str(bm.get_batch()))
    print("got: " + str(bm.get_batch()))
    print("got: " + str(bm.get_batch()))
    print("got: " + str(bm.get_batch()))
    print("got: " + str(bm.get_batch()))
    print("got: " + str(bm.get_batch()))

    bm = BucketManager("resolutions.pkl", bsz=8, world_size=1, global_rank=0, valid_ids=ids[0:16])
    for _ in range(16):
        bm.get_batch()
    print("got from future epoch: " + str(bm.get_batch()))

    bms = []
    for rank in range(16):
        bm = BucketManager("resolutions.pkl", bsz=8, world_size=16, global_rank=rank)
        bms.append(bm)
    for epoch in range(5):
        print(f"epoch {epoch}")
        for i, bm in enumerate(bms):
            print(f"bm {i}")
            first = True
            for ids, res in bm.generator():
                if first and i == 0:
                    #print(ids)
                    first = False
                for post_id in ids:
                    counts[id_map[post_id]] += 1
        print(np.bincount(counts))

原文地址：https://blog.csdn.net/qq_41895747/article/details/140319150

免责声明：本站文章内容转载自网络资源，如本站内容侵犯了原著者的合法权益，可联系本站删除。更多内容请关注自学内容网（zxcms.com）！

上一篇：Linux C服务需要在A服务和B服务都启动成功后才能启动
下一篇：各种复现，保证质量

【大数据学习 | flume】flume Sink Processors与拦截器Interceptor
比如：一个日志文件(多个系统的日志都在该文件中)，根据日志中某个字段值，比如type=1，是系统A日志，sink to hdfs；type=2，是系统B日志，sink to kafka，此时就可以使用
阅读更多2024-11-17
5. langgraph中的react agent使用 (从零构建一个react agent)
首先，我们需要定义 Agent 的状态，这包括 Agent 所持有的消息。Annotated,Sequence,TypedDict,我们需要定义工具节点和模型调用节点，以便在 Agent 工作流中使用
阅读更多2024-11-17
37.超级简易的计算器 C语言
超级简单，简单到甚至这个计算器输入都比较反人类。
阅读更多2024-11-17
TCP Analysis Flags 之 TCP Dup ACK
TCP 段大小为 0窗口大小非零且没有改变，或者有有效的 SACK 数据下一个期望的 Seq Num 和 LastACK Num 是非 0 的（即连接已经建立）没有设置 SYN、FIN、RST具体的代
阅读更多2024-11-17
【AI图像生成网站&Golang】雪花算法
使用更精准的时间单位。提供了自定义机器 ID 的能力。支持长时间运行，且避免了时钟回拨问题。改进点Snowflake 的问题Sonyflake 的优化符号位固定占用 1 位，没有实际用途去掉符号位，增
阅读更多2024-11-17
【go从零单排】Directories、Temporary Files and Directories目录和临时目录、临时文件
在 Go 语言中，path/filepath 包提供了一组用于处理文件路径的函数，特别是与文件系统路径相关的操作。这个包是 Go 标准库的一部分，主要用于跨平台的路径操作，确保在不同操作系统上（如 W
阅读更多2024-11-17
菜叶子芯酸笔记4：大模型训练、分布式训练、显存估算
大模型训练任务主要分为以下三种模型训练过程。预训练pretrain监督微调 supervised finetune training奖励模型 reward model。
阅读更多2024-11-17
前端面试笔试（四）
RADIUS是一种分布式的、客户端/服务器结构的信息交互协议，"100"是一个有效的数字，它等于十进制的4。哈希表有10个元素，采用的hash函数为H(key)=key%10，用线
阅读更多2024-11-17
力扣-Hot100-链表其一【算法学习day.34】
##我做这类文档一个重要的目的还是给正在学习的大家提供方向（例如想要掌握基础用法，该刷哪些题？）我的解析也不会做的非常详细，只会提供思路和一些关键点，力扣上的大佬们的题解质量是非常非常高滴！！！
阅读更多2024-11-17
机器学习实战笔记30-31：逻辑回归及对应调参实验代码
Class_weight:输入{0:1,1:3}则代表1类样本的每条数据在计算损失函数时都会*3，当输入balanced，则调整为真实样本比例的反比，以达到平衡，但实际情况中不常用。#UI多迭代10的
阅读更多2024-11-17

代码解读：Diffusion Models中的长宽桶技术(Aspect Ratio Bucketing)

原理解读-原有训练的问题

长宽桶技术(Aspect Ratio Bucketing)

完整代码

相关文章