Commonly Used PyTorch Functions (9): How to Use torch.gather()

🕗 Published 2024-05-15 15:49 | python, artificial intelligence

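torch.gather() collects values along a specified dimension.

torch.gather() has three required parameters, which are also the ones used most often. The official descriptions are quoted below: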

input (Tensor) – the source tensor
dim (int) – the axis along which to index
index (LongTensor) – the indices of elements to gather

In one sentence, gather collects values from input along dimension dim, at the positions given by index.

1. An intuitive example

# 1. Start with the following input_tensor
>>> input_tensor = torch.arange(24).reshape(2, 3, 4)
tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])

# 2. And the following index_tensor
>>> index_tensor = torch.tensor(
       [[[0, 0, 0, 0],
         [2, 2, 2, 2]],
         
        [[0, 0, 0, 0],
         [2, 2, 2, 2]]]
)

# 3. Obtain out_tensor with torch.gather()
>>> out_tensor = torch.gather(input_tensor, dim=1, index=index_tensor)
tensor([[[ 0,  1,  2,  3],
         [ 8,  9, 10, 11]],
         
        [[12, 13, 14, 15],
         [20, 21, 22, 23]]])

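Take out_tensor[0, 1, 0] = 8 as an example of how dim and index are used to fetch the value 8 from input_tensor.

The output position is [0, 1, 0] and index_tensor[0, 1, 0] = 2. Since dim=1, the coordinate along dimension 1 is replaced by 2, so the value is read from input_tensor[0, 2, 0], which is 8.

This lookup shows directly how values are collected from input along dimension dim according to index.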

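Suppose input and index are both three-dimensional. Every position of the output tensor is addressed by an index list [i, j, k], and the naive approach would be to read input[i, j, k] for that position.
However, gather selects along dim, and input.shape need not equal index.shape, so reading input[i, j, k] directly can return the wrong element or go out of range.
Instead, the entry at position dim of the index list is replaced with the value index[i, j, k], and the modified list is used to read from input. In the example above dim=1, so the entry at position 1 (counting from 0) is replaced, giving [i, index[i, j, k], k]. For the output position [0, 1, 0], where index[0, 1, 0] = 2, the list [0, 1, 0] becomes [0, 2, 0], which is then looked up in input_tensor.
The official PyTorch documentation states the same rule as follows.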

out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2
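
The three rules above can be spelled out as explicit loops. The following is a minimal pure-Python sketch on nested lists; the helper name `gather_3d` is hypothetical, and this is not how PyTorch implements the operation. It only illustrates which element each output position reads.

```python
def gather_3d(inp, dim, index):
    """Reference gather for 3-D nested lists, following the documented rule."""
    out = []
    for i, plane in enumerate(index):
        out_plane = []
        for j, row in enumerate(plane):
            out_row = []
            for k, idx in enumerate(row):
                pos = [i, j, k]
                pos[dim] = idx  # overwrite the coordinate at position `dim`
                a, b, c = pos
                out_row.append(inp[a][b][c])
            out_plane.append(out_row)
        out.append(out_plane)
    return out


# Same data as the earlier example: values 0..23 arranged as shape (2, 3, 4).
input_list = [[[i * 12 + j * 4 + k for k in range(4)] for j in range(3)]
              for i in range(2)]
index_list = [[[0] * 4, [2] * 4], [[0] * 4, [2] * 4]]

out_list = gather_3d(input_list, dim=1, index=index_list)
print(out_list[0][1][0])  # reads input_list[0][2][0], prints 8
```

Note that the loops run over the positions of `index`, which is why the output always takes the shape of the index rather than of the input.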

2. Understanding it by working backwards

# 1. Start with the following input_tensor
>>> input_tensor = torch.arange(24).reshape(2, 3, 4)
tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])

# 2. Suppose we want to obtain the following out_tensor
>>> out_tensor
tensor([[[ 0,  1,  2,  3],
         [ 8,  9, 10, 11]],
         
        [[12, 13, 14, 15],
         [20, 21, 22, 23]]])
         
# 3. How do we work out dim and index_tensor?
# First, remember that out_tensor.shape == index_tensor.shape

# Start from the first position of out_tensor:
# here [i, j, k] is the same on both sides, so dim cannot be determined yet
out_tensor[0, 0, 0] == input_tensor[0, 0, 0] == 0
# By the same reasoning, the index is 0 across this whole row
out_tensor[0, 0, 1] == input_tensor[0, 0, 1] == 1
out_tensor[0, 0, 2] == input_tensor[0, 0, 2] == 2
out_tensor[0, 0, 3] == input_tensor[0, 0, 3] == 3

# Now start from the first position of the next row:
# the coordinate in dimension 1 changes from 1 to 2, so dim must be 1 and the index must be 2
out_tensor[0, 1, 0] == input_tensor[0, 2, 0] == 8
# By the same reasoning, the index is 2 across this whole row
out_tensor[0, 1, 1] == input_tensor[0, 2, 1] == 9
out_tensor[0, 1, 2] == input_tensor[0, 2, 2] == 10
out_tensor[0, 1, 3] == input_tensor[0, 2, 3] == 11

# From the derivation above, dim=1 and index_tensor is:
>>> index_tensor = torch.tensor(
       [[[0, 0, 0, 0],
         [2, 2, 2, 2]],
         
        [[0, 0, 0, 0],
         [2, 2, 2, 2]]]
)
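
Because every row of the derived index holds a single repeated value, the index does not have to be typed out by hand. As a sketch (the names `rows` and `index_built` are introduced here only for illustration), it can be built by broadcasting the row numbers [0, 2]. For this particular pattern, `torch.index_select` should give the same result.

```python
import torch

input_tensor = torch.arange(24).reshape(2, 3, 4)

# Rows to pick along dim=1, shaped (1, 2, 1) so they broadcast to (2, 2, 4).
rows = torch.tensor([0, 2])
index_built = rows.view(1, 2, 1).expand(2, 2, 4)

out_tensor = torch.gather(input_tensor, dim=1, index=index_built)

# The output always has the shape of the index, not of the input.
print(out_tensor.shape)  # torch.Size([2, 2, 4])

# When every position along the other dims uses the same row numbers,
# index_select along dim=1 is equivalent to this gather.
same = torch.index_select(input_tensor, dim=1, index=rows)
print(torch.equal(out_tensor, same))  # True
```

gather is the more general tool: it is needed when the selected positions differ per sample, as in the MAE example in the next section.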

3. A real-world example

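The source code of MAE (Masked Autoencoders Are Scalable Vision Learners), the model by Kaiming He and colleagues, calls torch.gather() several times.

Paper: https://arxiv.org/pdf/2111.06377
Official source code: https://github.com/facebookresearch/mae

Given a preset mask ratio (the paper recommends 75%), MAE samples a subset of tokens uniformly at random and feeds them to the encoder, while the remaining tokens are masked out. torch.gather() is used in the step that picks the 25% of tokens left unmasked.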

# models_mae.py

import torch

def random_masking(x, mask_ratio=0.75):
    """
    Perform per-sample random masking by per-sample shuffling.
    Per-sample shuffling is done by argsort random noise.
    x: [N, L, D], sequence
    """
    N, L, D = x.shape  # batch, length, dim
    len_keep = int(L * (1 - mask_ratio))  # number of patches left unmasked
    # sample from a uniform distribution on [0, 1) to avoid a potential center bias
    noise = torch.rand(N, L, device=x.device)  # noise in [0, 1]

    # sort noise for each sample (the core of the method)
    ids_shuffle = torch.argsort(noise, dim=1)  # ascend: small is keep, large is remove
    ids_restore = torch.argsort(ids_shuffle, dim=1)

    # keep the first subset
    ids_keep = ids_shuffle[:, :len_keep]
    # use torch.gather() to pick the 25% unmasked tokens from the source tensor
    x_masked = torch.gather(x, dim=1, index=ids_keep.unsqueeze(-1).repeat(1, 1, D))

    # generate the binary mask: 0 is keep, 1 is remove
    mask = torch.ones([N, L], device=x.device)
    mask[:, :len_keep] = 0
    # unshuffle to get the binary mask
    mask = torch.gather(mask, dim=1, index=ids_restore)

    return x_masked, mask, ids_restore

if __name__ == '__main__':
    x = torch.arange(64).reshape(1, 16, 4)
    random_masking(x)

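# x simulates the token sequence of one image after patch_embedding
# x plays the role of input_tensor
# 16 is the number of patches; in practice it is usually (img_size / patch_size)^2 = (224 / 16)^2 = 14 * 14 = 196
# 4 is the feature dimension D of each token, kept tiny for this simulation; for comparison, a flattened 16x16 RGB patch has in_chans * patch_size * patch_size = 3 * 16 * 16 = 768 values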
>>> x = torch.arange(64).reshape(1, 16, 4) 
tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15],
         [16, 17, 18, 19], # 4
         [20, 21, 22, 23],
         [24, 25, 26, 27],
         [28, 29, 30, 31],
         [32, 33, 34, 35],
         [36, 37, 38, 39],
         [40, 41, 42, 43], # 10
         [44, 45, 46, 47],
         [48, 49, 50, 51], # 12
         [52, 53, 54, 55], # 13
         [56, 57, 58, 59],
         [60, 61, 62, 63]]])
# dim=1; index (ids_keep expanded along the last dimension) plays the role of index_tensor
>>> index
tensor([[[10, 10, 10, 10],
         [12, 12, 12, 12],
         [ 4,  4,  4,  4],
         [13, 13, 13, 13]]])


# x_masked: the 25% unmasked tokens (4 patches) picked at random from the source tensor x
>>> x_masked  # plays the role of out_tensor
tensor([[[40, 41, 42, 43],
         [48, 49, 50, 51],
         [16, 17, 18, 19],
         [52, 53, 54, 55]]])
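
Two details in `random_masking` are easy to miss: `torch.argsort(ids_shuffle)` yields the inverse permutation of the shuffle, and gathering with it puts shuffled data back into the original patch order. The sketch below checks both on a small fixed permutation. The values in `ids_shuffle` are made up for illustration and are not produced by the MAE code.

```python
import torch

# A hypothetical shuffle of 8 patches for a batch of one sample.
ids_shuffle = torch.tensor([[5, 2, 7, 0, 3, 6, 1, 4]])
ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

x = torch.arange(8).reshape(1, 8)  # patch ids 0..7 in original order
x_shuffled = torch.gather(x, dim=1, index=ids_shuffle)
x_back = torch.gather(x_shuffled, dim=1, index=ids_restore)
print(torch.equal(x_back, x))  # True

# Mask in shuffled order: the first len_keep positions are kept (0),
# the rest are removed (1).
len_keep = 2
mask = torch.ones(1, 8)
mask[:, :len_keep] = 0
# Unshuffle so that mask[n, p] refers to original patch p.
mask = torch.gather(mask, dim=1, index=ids_restore)
print(mask)  # zeros at original positions 5 and 2
```

The kept patches are the first `len_keep` entries of `ids_shuffle` (here 5 and 2), so after unshuffling the mask is 0 exactly at those original positions.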

Original article: https://blog.csdn.net/qq_44665283/article/details/138576187
