vllm.kernels.vllm_c

CUDA_ALIKE module-attribute

CUDA_ALIKE = is_cuda_alike()

Most kernels in this file are supported on all CUDA-alike platforms.

rms_no_var_size module-attribute

rms_no_var_size = (
    lambda x, weight, epsilon, variance_size=None: (
        variance_size is None
        and (weight is None or x.dtype == weight.dtype)
    )
)

The vLLM kernel is used only when no variance_size override is given and the input and weight dtypes match.
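The predicate can be exercised with lightweight stand-ins for tensors. This is an illustrative sketch, not vLLM code: `FakeTensor` is a hypothetical helper carrying only a `dtype` attribute, and the lambda is restated as a named function for readability.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a tensor: only the dtype attribute matters here.
@dataclass
class FakeTensor:
    dtype: str

# Same check as rms_no_var_size: accept only when no variance_size override
# is given and the weight (if any) has the same dtype as the input.
def rms_no_var_size(x, weight, epsilon, variance_size=None):
    return variance_size is None and (weight is None or x.dtype == weight.dtype)

x = FakeTensor("float16")
print(rms_no_var_size(x, FakeTensor("float16"), 1e-6))  # True: dtypes match
print(rms_no_var_size(x, FakeTensor("float32"), 1e-6))  # False: dtype mismatch
print(rms_no_var_size(x, None, 1e-6, variance_size=8))  # False: override given
```

A predicate like this lets the dispatcher fall back to another implementation at trace time instead of failing inside the kernel.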

gelu_fast

gelu_fast(x: Tensor) -> Tensor

Fast GELU activation function using vLLM C++ kernel.

Formula: 0.5 * x * (1.0 + tanh(x * 0.7978845608 * (1.0 + 0.044715 * x^2)))

Source code in vllm/kernels/vllm_c.py
@ir.ops.gelu_fast.register_impl("vllm_c", supported=CUDA_ALIKE)
def gelu_fast(x: Tensor) -> Tensor:
    """
    Fast GELU activation function using vLLM C++ kernel.

    Formula: 0.5 * x * (1.0 + tanh(x * 0.7978845608 * (1.0 + 0.044715 * x^2)))
    """
    out = torch.empty_like(x)
    torch.ops._C.gelu_fast(out, x)
    return out
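The formula above can be checked against a pure-Python scalar reference. This is a sketch for verifying the math, not the CUDA kernel itself:

```python
import math

def gelu_fast_ref(x: float) -> float:
    # 0.5 * x * (1 + tanh(x * 0.7978845608 * (1 + 0.044715 * x^2)))
    # where 0.7978845608 ~= sqrt(2 / pi).
    return 0.5 * x * (1.0 + math.tanh(x * 0.7978845608 * (1.0 + 0.044715 * x * x)))

print(gelu_fast_ref(0.0))  # 0.0: GELU is zero at the origin
print(gelu_fast_ref(1.0))  # ~0.8412, matching the standard tanh approximation
```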

gelu_new

gelu_new(x: Tensor) -> Tensor

New GELU activation function using vLLM C++ kernel.

Formula: 0.5 * x * (1.0 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))

Source code in vllm/kernels/vllm_c.py
@ir.ops.gelu_new.register_impl("vllm_c", supported=CUDA_ALIKE)
def gelu_new(x: Tensor) -> Tensor:
    """
    New GELU activation function using vLLM C++ kernel.

    Formula: 0.5 * x * (1.0 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    """
    out = torch.empty_like(x)
    torch.ops._C.gelu_new(out, x)
    return out
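This formula is algebraically the same as gelu_fast's: factoring sqrt(2/pi) ≈ 0.7978845608 out of `x * 0.7978845608 * (1 + 0.044715 * x^2)` gives `sqrt(2/pi) * (x + 0.044715 * x^3)`. A pure-Python scalar sketch of this variant:

```python
import math

def gelu_new_ref(x: float) -> float:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

print(gelu_new_ref(0.0))  # 0.0
print(gelu_new_ref(1.0))  # ~0.8412, identical to the gelu_fast formula
```

The two entries exist as separate ops because models register them under different activation names, not because the math differs.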

quick_gelu

quick_gelu(x: Tensor) -> Tensor

Quick GELU activation function using vLLM C++ kernel.

Formula: x * sigmoid(1.702 * x)

Source code in vllm/kernels/vllm_c.py
@ir.ops.quick_gelu.register_impl("vllm_c", supported=CUDA_ALIKE)
def quick_gelu(x: Tensor) -> Tensor:
    """
    Quick GELU activation function using vLLM C++ kernel.

    Formula: x * sigmoid(1.702 * x)
    """
    out = torch.empty_like(x)
    torch.ops._C.gelu_quick(out, x)
    return out
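A pure-Python scalar reference for the quick-GELU formula, again a sketch for checking the math rather than the kernel:

```python
import math

def quick_gelu_ref(x: float) -> float:
    # x * sigmoid(1.702 * x), with sigmoid(t) = 1 / (1 + exp(-t))
    return x / (1.0 + math.exp(-1.702 * x))

print(quick_gelu_ref(0.0))  # 0.0
print(quick_gelu_ref(1.0))  # ~0.8458, slightly above the tanh approximations at x=1
```

Quick GELU trades a tanh for a single sigmoid; it is the variant used by models such as the original CLIP text encoder.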