vllm.ir.ops

Modules:

Name Description
activation
layernorm

gelu_fast

gelu_fast(x: Tensor) -> Tensor

Fast GELU activation function.

Formula: 0.5 * x * (1.0 + tanh(x * 0.7978845608 * (1.0 + 0.044715 * x^2)))

A computationally efficient approximation of the GELU function.

Source code in vllm/ir/ops/activation.py
@register_op
def gelu_fast(x: Tensor) -> Tensor:
    """
    Fast GELU activation function.

    Formula: 0.5 * x * (1.0 + tanh(x * 0.7978845608 * (1.0 + 0.044715 * x^2)))

    A computationally efficient approximation of the GELU function.
    """
    return 0.5 * x * (
        1.0 + torch.tanh(x * 0.7978845608 * (1.0 + 0.044715 * x * x))
    )
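The constant 0.7978845608 is sqrt(2/pi), so this is the standard tanh-based GELU approximation. A minimal dependency-free sketch (pure Python on scalars rather than the op's `Tensor` input) shows how close it stays to the exact GELU, which can be written via the error function:

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_fast_scalar(x: float) -> float:
    # Same expression as the op above, applied to a single float
    return 0.5 * x * (
        1.0 + math.tanh(x * 0.7978845608 * (1.0 + 0.044715 * x * x))
    )

# Maximum deviation from the exact GELU over a typical activation range
max_err = max(
    abs(gelu_fast_scalar(v) - gelu_exact(v))
    for v in (i / 100.0 for i in range(-500, 501))
)
```

The approximation agrees with the exact GELU to within a few thousandths across [-5, 5], which is why it is widely used as a cheap drop-in.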

gelu_new

gelu_new(x: Tensor) -> Tensor

New GELU activation function.

Formula: 0.5 * x * (1.0 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))

This is the GELU approximation used in GPT-2 and other transformer models.

Source code in vllm/ir/ops/activation.py
@register_op
def gelu_new(x: Tensor) -> Tensor:
    """
    New GELU activation function.

    Formula: 0.5 * x * (1.0 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))

    This is the GELU approximation used in GPT-2 and other transformer models.
    """
    # c_gelu_new is a module-level constant equal to sqrt(2 / pi)
    return 0.5 * x * (
        1.0 + torch.tanh(c_gelu_new * (x + 0.044715 * torch.pow(x, 3.0)))
    )
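Assuming `c_gelu_new` equals sqrt(2/pi), as the docstring's formula indicates, `gelu_new` computes the same expression as `gelu_fast`: factoring `x * 0.7978845608 * (1 + 0.044715 * x^2)` gives `sqrt(2/pi) * (x + 0.044715 * x^3)`. A scalar sketch checks the two forms against each other:

```python
import math

C = math.sqrt(2.0 / math.pi)  # the constant the op refers to as c_gelu_new

def gelu_new_scalar(x: float) -> float:
    # gelu_new formula: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(C * (x + 0.044715 * x ** 3)))

def gelu_fast_scalar(x: float) -> float:
    # gelu_fast formula, with sqrt(2/pi) written as a literal
    return 0.5 * x * (
        1.0 + math.tanh(x * 0.7978845608 * (1.0 + 0.044715 * x * x))
    )

# The two forms are algebraically identical up to the precision
# of the 0.7978845608 literal
diff = max(
    abs(gelu_new_scalar(v) - gelu_fast_scalar(v))
    for v in (i / 10.0 for i in range(-50, 51))
)
```

Having both ops lets a model graph match whichever variant a checkpoint's reference implementation used (GPT-2 uses this form), even though the numerics coincide.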

quick_gelu

quick_gelu(x: Tensor) -> Tensor

Quick GELU activation function.

Formula: x * sigmoid(1.702 * x)

A fast approximation of GELU used in various transformer models. Reference: https://github.com/huggingface/transformers/blob/main/src/transformers/activations.py#L90

Source code in vllm/ir/ops/activation.py
@register_op
def quick_gelu(x: Tensor) -> Tensor:
    """
    Quick GELU activation function.

    Formula: x * sigmoid(1.702 * x)

    A fast approximation of GELU used in various transformer models.
    Reference: https://github.com/huggingface/transformers/blob/main/src/transformers/activations.py#L90
    """
    return x * torch.sigmoid(1.702 * x)
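A scalar sketch (pure Python, no torch) illustrates the trade-off: the sigmoid form is even cheaper than the tanh approximations but deviates slightly more from the exact GELU:

```python
import math

def quick_gelu_scalar(x: float) -> float:
    # x * sigmoid(1.702 * x), written out with the logistic function
    return x / (1.0 + math.exp(-1.702 * x))

def gelu_exact(x: float) -> float:
    # Exact GELU via the error function
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# The sigmoid approximation is looser than the tanh one,
# but still within a few hundredths of the exact GELU
max_err = max(
    abs(quick_gelu_scalar(v) - gelu_exact(v))
    for v in (i / 100.0 for i in range(-500, 501))
)
```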

rms_norm

rms_norm(
    x: Tensor,
    weight: Tensor | None,
    epsilon: float,
    variance_size: int | None = None,
) -> Tensor

Weighted root-mean-square layer normalization

Source code in vllm/ir/ops/layernorm.py
@register_op
def rms_norm(
    x: Tensor, weight: Tensor | None, epsilon: float, variance_size: int | None = None
) -> Tensor:
    """Weighted root-mean-square layer normalization"""
    orig_dtype = x.dtype
    x = x.to(torch.float32)
    x_var = x if variance_size is None else x[..., :variance_size]
    variance = x_var.pow(2).mean(dim=-1, keepdim=True)
    x = x * torch.rsqrt(variance + epsilon)
    if weight is not None:
        x = x.to(weight.dtype) * weight
    return x.to(orig_dtype)
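The op upcasts to float32 for the variance computation, optionally restricts the variance to the first `variance_size` elements of the last dimension (while still scaling the full vector), and applies the weight in the weight's dtype. A minimal pure-Python sketch of the same arithmetic, on a single list instead of a `Tensor`, makes the behavior concrete:

```python
import math

def rms_norm_ref(x, weight=None, epsilon=1e-6, variance_size=None):
    # Mean of squares over the variance slice (or the whole vector)
    var_slice = x if variance_size is None else x[:variance_size]
    variance = sum(v * v for v in var_slice) / len(var_slice)
    inv = 1.0 / math.sqrt(variance + epsilon)
    # The full vector is scaled, even when variance_size is set
    out = [v * inv for v in x]
    if weight is not None:
        out = [o * w for o, w in zip(out, weight)]
    return out

x = [1.0, 2.0, 3.0, 4.0]
y = rms_norm_ref(x)
# With no weight, the normalized output has (approximately) unit RMS
rms = math.sqrt(sum(v * v for v in y) / len(y))
```

The `variance_size` escape hatch matches models that normalize by statistics of a prefix of the hidden dimension; with `variance_size=None` this reduces to standard RMSNorm.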