AF3 AttentionPairBias Class: Source Code Walkthrough
AttentionPairBias is an attention module in AlphaFold3 that implements full self-attention with an additive bias derived from the pair representation (pair bias). It plays an important role in the AlphaFold3 architecture, particularly in tasks involving protein sequences and spatial relationships between tokens.
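At its core, the module computes standard scaled dot-product attention, but adds a per-head bias (projected from the pair representation, plus a mask bias) to the attention logits before the softmax. The snippet below is a minimal, self-contained sketch of that idea, not the actual AlphaFold3 implementation; the function name `pair_biased_attention` and the tensor layout are illustrative assumptions.

```python
import torch

def pair_biased_attention(q, k, v, pair_bias, mask_bias):
    """Toy pair-biased attention.

    q, k, v:    (batch, heads, n_tokens, head_dim)
    pair_bias:  (batch, heads, n_tokens, n_tokens), projected from the pair representation
    mask_bias:  (batch, 1, 1, n_tokens), 0 for valid tokens, a large negative value for padding
    """
    d = q.shape[-1]
    # Attention logits = scaled dot product + per-head pair bias + mask bias
    logits = torch.einsum('bhid,bhjd->bhij', q, k) / d ** 0.5 + pair_bias + mask_bias
    weights = torch.softmax(logits, dim=-1)
    return torch.einsum('bhij,bhjd->bhid', weights, v)
```

The actual class below wraps this idea with adaptive layer normalization (AdaLN) and gating on the input, a gated multi-head Attention module, and a learned projection of the pair representation into the per-head bias.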
Source code:
```python
import torch
import torch.nn as nn
from typing import Optional
from einops import rearrange

# NOTE: AdaLN, Attention, LayerNorm, Linear and LinearNoBias are custom
# primitives from the surrounding codebase and are not shown in this post.


class AttentionPairBias(nn.Module):
    """Full self-attention with pair bias."""

    def __init__(
        self,
        dim: int,
        c_pair: int = 16,
        no_heads: int = 8,
        dropout: float = 0.0,
        input_gating: bool = True,
        residual: bool = True,
        inf: float = 1e8,
    ):
        """Initialize the AttentionPairBias module.
        Args:
            dim:
                Total dimension of the model.
            c_pair:
                The number of channels for the pair representation. Defaults to 16.
            no_heads:
                Number of parallel attention heads. Note that dim will be split across no_heads
                (i.e. each head will have dimension dim // no_heads).
            dropout:
                Dropout probability on attn_output_weights. Default: 0.0 (no dropout).
            residual:
                Whether the module is used as a residual block. Default: True. This affects the initialization
                of the final projection layer of the MHA attention.
            input_gating:
                Whether the single representation should be gated with another single-like representation using
                adaptive layer normalization. Default: True.
            inf:
                Large value used to build the additive mask bias for padded positions. Defaults to 1e8.
        """
        super().__init__()
        self.dim = dim
        self.c_pair = c_pair
        self.num_heads = no_heads
        self.dropout = dropout
        self.input_gating = input_gating
        self.inf = inf

        # Check that the model dimension splits evenly across the heads
        assert dim % no_heads == 0, f"the model dimensionality ({dim}) should be divisible by the " \
                                    f"number of heads ({no_heads}) "

        # Projections
        self.input_proj = None
        self.output_proj_linear = None
        if input_gating:
            self.input_proj = AdaLN(dim)

            # Output projection from AdaLN
            self.output_proj_linear = Linear(dim, dim, init='gating')
            self.output_proj_linear.bias = nn.Parameter(torch.ones(dim) * -2.0)  # gate values will be ~0.11
        else:
            self.input_proj = LayerNorm(dim)

        # Attention
        self.attention = Attention(
            c_q=dim,
            c_k=dim,
            c_v=dim,
            c_hidden=dim // no_heads,
            no_heads=no_heads,
            gating=True,
            residual=residual,
            proj_q_w_bias=True,
        )

        # Pair bias
        self.proj_pair_bias = nn.Sequential(
            LayerNorm(self.c_pair),
            LinearNoBias(self.c_pair, self.num_heads, init='normal')
        )
```
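One detail worth noting in the constructor: when `input_gating` is enabled, the bias of the output gate is reset to -2.0. Assuming an OpenFold-style `'gating'` initialization (zero-initialized weights) and a sigmoid applied to the gate, which is consistent with the "~0.11" comment in the code but not shown in this listing, the gate therefore starts almost closed and the block initially contributes little on top of the residual stream. A quick standalone check of that initial gate value:

```python
import torch

# Initial gate activation when the gate's pre-activation is just the -2.0 bias.
print(torch.sigmoid(torch.tensor(-2.0)))  # tensor(0.1192): the gate starts mostly closed
```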
```python
    # --- AttentionPairBias, continued ---

    def _prep_biases(
        self,
        single_repr: torch.Tensor,  # (*, S, N, c_s)
        pair_repr: torch.Tensor,  # (*, N, N, c_z)
        mask: Optional[torch.Tensor] = None,  # (*, N)
    ):
        """Prepares the mask and pair biases in the shapes expected by the DS4Science attention.

        Expected shapes for the DS4Science kernel:
            Q, K, V:   [Batch, N_seq, N_res, Head, Dim]
            res_mask:  [Batch, N_seq, 1, 1, N_res]
            pair_bias: [Batch, 1, Head, N_res, N_res]
        """
        # Compute the single mask
        n_seq, n_res, _ = single_repr.shape[-3:]
        if mask is None:
            # [*, N_seq, N_res]
            mask = single_repr.new_ones(
                single_repr.shape[:-3] + (n_seq, n_res),
            )
        else:
            # Expand mask by N_seq (or samples per trunk)
            new_shape = (mask.shape[:-1] + (n_seq, n_res))  # (*, N_seq, N_res)
            mask = mask.unsqueeze(-2).expand(new_shape)
            mask = mask.to(single_repr.dtype)

        # [*, N_seq, 1, 1, N_res]
        mask_bias = (self.inf * (mask - 1))[..., :, None, None, :]

        # Project pair biases per head from pair representation
        pair_bias = self.proj_pair_bias(pair_repr)  # (bs, n_tokens, n_tokens, n_heads)
        pair_bias = rearrange(pair_bias, 'b i j h -> b h i j')  # (bs, h, n, n)
        pair_bias = pair_bias.unsqueeze(-4)
        return mask_bias, pair_bias
```
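To make the bias shapes concrete, here is a standalone sketch that mimics `_prep_biases` with plain PyTorch layers. `nn.LayerNorm` and a bias-free `nn.Linear` stand in for the project's `LayerNorm` and `LinearNoBias`, and the sizes are made up for illustration:

```python
import torch
from torch import nn
from einops import rearrange

inf, c_pair, no_heads = 1e8, 16, 8
bs, n_seq, n_res = 2, 1, 5

# Stand-in for self.proj_pair_bias: normalize the pair channels, then project
# them down to one scalar bias per attention head.
proj_pair_bias = nn.Sequential(nn.LayerNorm(c_pair), nn.Linear(c_pair, no_heads, bias=False))

pair_repr = torch.randn(bs, n_res, n_res, c_pair)  # (*, N, N, c_pair)
mask = torch.tensor([[1., 1., 1., 1., 0.],         # (*, N): 1 = valid token, 0 = padding
                     [1., 1., 1., 0., 0.]])

# Mask -> additive bias: 0 at valid positions, -1e8 at padded positions, so the
# softmax assigns padded tokens essentially zero attention weight.
mask = mask.unsqueeze(-2).expand(bs, n_seq, n_res)
mask_bias = (inf * (mask - 1))[..., :, None, None, :]

# Pair representation -> per-head additive bias on the attention logits.
pair_bias = proj_pair_bias(pair_repr)
pair_bias = rearrange(pair_bias, 'b i j h -> b h i j').unsqueeze(-4)

print(mask_bias.shape)  # torch.Size([2, 1, 1, 1, 5])
print(pair_bias.shape)  # torch.Size([2, 1, 8, 5, 5])
```

The extra singleton dimensions match the 5-D layout listed in the docstring, where the kernel expects a leading N_seq axis for the mask and a broadcastable head axis for the pair bias.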
The class also defines a `forward` method, but the listing in the original post is cut off right after `def forward(self, singl…`, so the forward pass itself is not shown here.
Original article: https://blog.csdn.net/qq_27390023/article/details/145290295