A Deep Dive into the Attention Mechanism

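The attention mechanism is the core of the Transformer architecture: it lets every position in a sequence weigh all other positions when computing its own representation.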

 Self-Attention

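```python
import math
import torch

def attention(Q, K, V):
    d_k = Q.size(-1)  # dimension of each query/key vector
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)  # normalize over the keys
    return torch.matmul(weights, V)
```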


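To make the tensor shapes concrete, here is a small usage sketch. It repeats the computation of `attention()` in a hypothetical helper, `attention_with_weights`, that also returns the weights so they can be inspected. The sizes (batch 2, sequence length 4, `d_k` 8) are arbitrary assumptions, and the learned query/key/value projections a real model would apply are left out.

```python
import math
import torch

def attention_with_weights(Q, K, V):
    """Same computation as attention(), but also returns the weights."""
    d_k = Q.size(-1)
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, V), weights

torch.manual_seed(0)
x = torch.randn(2, 4, 8)  # (batch, seq_len, d_k): illustrative sizes only

# In self-attention, queries, keys, and values all come from the same sequence.
out, weights = attention_with_weights(x, x, x)

print(out.shape)      # torch.Size([2, 4, 8])
print(weights.shape)  # torch.Size([2, 4, 4])
print(weights.sum(dim=-1))  # every row sums to 1, up to floating-point error
```

Each row of `weights` is a probability distribution over the positions of the sequence, which is why the output keeps the same shape as the input.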
 Multi-Head Attention

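Multiple attention heads run in parallel, each with its own learned projections of the queries, keys, and values, so that different heads can capture different features. Their outputs are concatenated and projected back to the model dimension.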
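The idea of several heads working in parallel can be sketched as a small PyTorch module. This is a minimal illustration rather than a reference implementation: the class name `MultiHeadSelfAttention`, the layer names `w_q`, `w_k`, `w_v`, `w_o`, and the sizes used below are all assumptions, and masking and dropout are omitted.

```python
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Illustrative multi-head self-attention (no masking, no dropout)."""

    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_k = d_model // num_heads
        # One projection each for queries, keys, values, plus the output projection.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def _split_heads(self, t):
        # (batch, seq, d_model) -> (batch, heads, seq, d_k)
        b, s, _ = t.shape
        return t.view(b, s, self.num_heads, self.d_k).transpose(1, 2)

    def forward(self, x):
        q = self._split_heads(self.w_q(x))
        k = self._split_heads(self.w_k(x))
        v = self._split_heads(self.w_v(x))
        # Scaled dot-product attention, computed for all heads at once.
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)
        weights = torch.softmax(scores, dim=-1)
        heads = torch.matmul(weights, v)  # (batch, heads, seq, d_k)
        # Concatenate the heads back into (batch, seq, d_model).
        b, _, s, _ = heads.shape
        merged = heads.transpose(1, 2).reshape(b, s, self.num_heads * self.d_k)
        return self.w_o(merged)

torch.manual_seed(0)
mha = MultiHeadSelfAttention(d_model=16, num_heads=4)
x = torch.randn(2, 5, 16)  # (batch, seq_len, d_model): illustrative sizes only
y = mha(x)
print(y.shape)  # torch.Size([2, 5, 16])
```

Splitting `d_model` across heads keeps the total cost close to that of a single full-width head, while letting each head learn its own attention pattern.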

 Applications

- Machine translation
- Text summarization
- Image recognition

What questions do you have about the attention mechanism?