Critical RCE Vulnerability in SGLang: CVE-2026-5760 Threatens AI Serving Infrastructure

TL;DR

A critical remote code execution (RCE) vulnerability, tracked as CVE-2026-5760 (CVSS 9.8), has been discovered in the SGLang LLM serving framework. The flaw resides in the reranking endpoint and allows attackers to execute arbitrary Python code via malicious GGUF model files containing crafted Jinja2 templates. No official patch was reported during the coordination process.

A critical security vulnerability has been disclosed in SGLang, a popular open-source framework for high-performance serving of Large Language Models (LLMs) and multimodal models.

Tracked as CVE-2026-5760, the vulnerability carries a CVSS score of 9.8 out of 10, reflecting both its high impact and its ease of exploitation. If exploited successfully, the flaw allows a remote attacker to achieve full code execution on the server hosting the SGLang service.

The Vulnerability: Command Injection via GGUF

According to the CERT Coordination Center (CERT/CC), the issue is a case of command injection facilitated by a Server-Side Template Injection (SSTI). The vulnerability specifically impacts the /v1/rerank endpoint of the SGLang service.

The attack vector involves GPT-Generated Unified Format (GGUF) model files. An attacker can craft a malicious GGUF file by embedding a payload in the tokenizer.chat_template metadata field. The value of this field is a Jinja2 template, which SGLang renders to format chat input.
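To audit a downloaded model before loading it, the embedded template can be read straight from the file's metadata. The following is a minimal sketch using only the Python standard library. It assumes the little-endian GGUF v2/v3 header layout from the public GGUF specification (magic, version, tensor count, key-value count, then typed key-value pairs). The helper name read_chat_template and the 64 MiB string cap are illustrative choices, not part of SGLang or any GGUF tooling.

```python
import struct
from typing import BinaryIO, Optional

# GGUF scalar value types (type id -> struct format), assuming little-endian files.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9
_MAX_STRING = 64 * 1024 * 1024  # illustrative cap so a hostile length cannot exhaust memory


def _read(f: BinaryIO, fmt: str):
    """Read one fixed-size value described by a struct format string."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """Read a GGUF string: a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    if length > _MAX_STRING:
        raise ValueError("string length exceeds sanity cap")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int):
    """Read one metadata value of the given GGUF type id."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_chat_template(f: BinaryIO) -> Optional[str]:
    """Return the tokenizer.chat_template string, or None if the key is absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses a different length encoding")
    _read(f, "<Q")             # tensor count (not needed here)
    kv_count = _read(f, "<Q")  # number of metadata key-value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "tokenizer.chat_template":
            return value if isinstance(value, str) else None
    return None
```

A caller would open the model file in binary mode, for example with open(path, "rb"), and review the returned template text by hand or with automated checks before pointing a serving framework at the file. Only the metadata header is read; the tensor data is never touched.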

Attack Vector and Execution Flow

Security researcher Stuart Beck, who discovered the flaw, noted that the root cause is the use of jinja2.Environment() without proper sandboxing. By failing to use ImmutableSandboxedEnvironment, the framework allows the template to escape its intended bounds and execute system-level commands.
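The difference between the two Jinja2 environments can be illustrated with a harmless probe. The sketch below is not SGLang's code; it only contrasts jinja2.Environment with jinja2.sandbox.ImmutableSandboxedEnvironment on a template that reaches for a dunder attribute, which is typically the first step of an SSTI payload. The function names and the PROBE string are hypothetical.

```python
from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Benign probe: it only asks for the class name of an empty string, but it does so
# through a dunder attribute, the same door real SSTI payloads walk through.
PROBE = "{{ ''.__class__.__name__ }}"


def render_unsandboxed(template: str) -> str:
    """Render with a plain Environment, which allows access to Python internals."""
    return Environment().from_string(template).render()


def render_sandboxed(template: str) -> str:
    """Render inside the sandbox and report unsafe attribute access instead of allowing it."""
    try:
        return ImmutableSandboxedEnvironment().from_string(template).render()
    except SecurityError as exc:
        return f"blocked: {exc}"


if __name__ == "__main__":
    print(render_unsandboxed(PROBE))        # the probe reaches Python internals
    print(render_sandboxed(PROBE))          # the sandbox refuses the attribute chain
    print(render_sandboxed("{{ 1 + 1 }}"))  # ordinary expressions still render
```

With a plain Environment the probe resolves and renders the class name, while the sandboxed environment raises SecurityError on the attribute chain and still renders ordinary expressions. A real payload would continue from that attribute chain toward importable modules, which is why the sandbox stops it at the first step.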

The attack typically follows this sequence:

Payload Creation: An attacker creates a GGUF model file containing a Jinja2 SSTI payload within the tokenizer.chat_template.
Trigger Mechanism: The template includes a specific "Qwen3 reranker trigger phrase" designed to activate the vulnerable code path located in entrypoints/openai/serving_rerank.py.
Distribution: The malicious model is uploaded to public repositories like Hugging Face.
Infection: A victim downloads and loads the malicious model into their SGLang instance.
Execution: When a request is made to the /v1/rerank endpoint, SGLang renders the malicious template, executing the attacker's Python code on the server.
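Because the payload travels as template text, a pre-load triage step can flag templates that warrant manual review. The following is a rough sketch using a hypothetical, non-exhaustive pattern list. A deny-list like this is easy to evade, so it should be read as a review aid under stated assumptions rather than as a defense or a substitute for sandboxed rendering.

```python
import re
from typing import List

# Hypothetical indicators chosen for illustration only; real payloads can be obfuscated.
SUSPICIOUS_PATTERNS = {
    "dunder attribute access": re.compile(r"__\w+__"),
    "attribute lookup by string": re.compile(r"\|\s*attr\s*\("),
    "process or import helper": re.compile(
        r"\b(os|subprocess|popen|eval|exec|import)\b", re.IGNORECASE
    ),
}


def flag_chat_template(template: str) -> List[str]:
    """Return the labels of every suspicious pattern found in the template text."""
    return [
        label
        for label, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(template)
    ]


if __name__ == "__main__":
    ordinary = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
    probing = "{{ ''.__class__ }}"
    print(flag_chat_template(ordinary))  # no labels for a typical chat loop
    print(flag_chat_template(probing))   # flags the dunder access
```

An empty result does not prove a template is safe; it only means none of the listed indicators appeared in plain form.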

A Growing Trend in AI Vulnerabilities

CVE-2026-5760 belongs to an emerging class of "Model-as-Code" vulnerabilities. It closely resembles CVE-2024-34359 (known as "Llama Drama"), a critical flaw in llama_cpp_python. Similar issues were also recently addressed in the vLLM framework (CVE-2025-61620).

These vulnerabilities highlight a significant shift in the threat model for AI: the model weights and configuration files themselves can act as delivery vehicles for malware, turning a simple "model load" into a full system compromise.

Status and Mitigation

At the time of disclosure, CERT/CC reported that it had received no response or patch from the SGLang maintainers during the coordination process. Given that SGLang is a widely used project, with over 26,100 stars and 5,500 forks on GitHub, this represents a significant risk to the AI research and deployment community.

Recommended Actions

To mitigate this risk, security experts and CERT/CC recommend:

Code Modification: Manually update the SGLang source code to use ImmutableSandboxedEnvironment instead of jinja2.Environment() when rendering chat templates.
Model Provenance: Only download and deploy models from trusted, verified sources. Avoid loading GGUF files from unknown or unverified contributors on public hubs.
Network Segmentation: Restrict the SGLang server's ability to make outbound connections to prevent "reverse shell" scenarios often associated with RCE.
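One way to make the model provenance recommendation enforceable is to pin the exact files a deployment may load. The sketch below assumes an operator-maintained set of approved SHA-256 digests, recorded when a model was reviewed. The function names and the allowlist itself are hypothetical and are not an SGLang feature.

```python
import hashlib
from typing import Iterable


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_approved(path: str, approved_digests: Iterable[str]) -> bool:
    """Return True only if the file's digest is in the operator's allowlist."""
    approved = {d.strip().lower() for d in approved_digests}
    return sha256_of(path) in approved
```

A deployment script could call is_approved on each GGUF file and refuse to start the server when it returns False. This only shows that the file is the one that was reviewed; it says nothing about whether that review caught a malicious template.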

Users of SGLang are advised to monitor the official GitHub repository closely for any upcoming security patches or community-contributed fixes.

Source: The Hacker News - SGLang CVE-2026-5760 Enables RCE via Malicious GGUF Model Files