Cloudflare Network Down for Hours Due to Oversized Feature File, Not Cyberattack

On 18 November 2025, Cloudflare experienced a major network outage beginning at 11:20 UTC, affecting the delivery of core network traffic. Internet users attempting to access sites protected by Cloudflare were met with error pages indicating failures within the company’s network. The incident was not caused by a cyberattack but stemmed from a change to the permissions of one of Cloudflare’s database systems. This change caused a critical “feature file” used by the Bot Management system to double in size, exceeding the software’s preconfigured limits and triggering widespread errors across the network.
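Cloudflare attributed the doubling to a metadata query that did not account for newly visible tables after the permissions change. The company's exact query is not reproduced here, but the mechanism can be sketched in a few lines of Rust; all names below are hypothetical.

```rust
// Hypothetical illustration (not Cloudflare's code): rows returned by a
// schema-metadata query. After a permissions change exposes a second
// database containing the same underlying tables, an unfiltered query
// sees every column twice.
#[derive(Debug, Clone)]
struct ColumnMeta {
    database: String, // e.g. "default" vs. an underlying storage database
    name: String,
}

/// Builds the Bot Management feature list from metadata rows. Because it
/// never checks `database`, duplicate rows silently double the output.
fn build_feature_list(rows: &[ColumnMeta]) -> Vec<String> {
    rows.iter().map(|r| r.name.clone()).collect()
}

fn main() {
    let before = vec![
        ColumnMeta { database: "default".into(), name: "feat_a".into() },
        ColumnMeta { database: "default".into(), name: "feat_b".into() },
    ];
    // After the permissions change, the same columns are also visible
    // through the underlying database.
    let mut after = before.clone();
    after.extend(before.iter().map(|c| ColumnMeta {
        database: "storage".into(),
        name: c.name.clone(),
    }));

    assert_eq!(build_feature_list(&before).len(), 2);
    assert_eq!(build_feature_list(&after).len(), 4); // feature list doubles
}
```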

Cloudflare initially suspected a hyper-scale DDoS attack, but the root cause was traced to the oversized feature file. The company halted propagation of the faulty file and rolled back to a known-good version. Core traffic was largely flowing normally again by 14:30 UTC, and all systems were fully restored by 17:06 UTC. During the outage, increased load on various parts of the network caused temporary latency spikes and service disruptions across several products.

The outage affected multiple Cloudflare services, including the core CDN and security services, Turnstile, Workers KV, Access, and the Dashboard. HTTP 5xx errors were widespread, Access users saw authentication failures, and Turnstile failed to load, temporarily blocking logins to the Dashboard. Email Security experienced minor disruptions, mainly to spam detection, with no critical impact. Compounding the confusion, Cloudflare's status page, which is hosted entirely off Cloudflare's own infrastructure, coincidentally went offline during the outage, initially pointing engineers toward a coordinated attack and complicating diagnosis.

The failure originated from a permissions change in Cloudflare's ClickHouse database cluster. The change inadvertently caused a metadata query to return duplicate rows, doubling the number of entries in the feature file consumed by the Bot Management module and pushing it past the module's preallocated memory limit. The overflow surfaced as an unhandled error in the core proxy, producing 5xx responses for customer traffic and breaking downstream services that depend on the proxy. Customers on Cloudflare's new FL2 proxy engine were hit hardest, while those on the older proxy saw bot scores misapplied rather than full outages.
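The distinction between a handled and an unhandled limit violation is worth making concrete. The sketch below is not Cloudflare's code: it assumes a hypothetical module that preallocates space for a fixed number of features, and shows how propagating the overflow as an unrecoverable error (via `unwrap()`) turns a bad configuration file into a crashed proxy, where returning a typed error would allow graceful degradation.

```rust
const MAX_FEATURES: usize = 200; // hypothetical preallocation limit

#[derive(Debug)]
struct FeatureOverflow { got: usize, max: usize }

/// Loads the feature file into a fixed-size buffer, refusing input
/// that exceeds the preallocated capacity.
fn load_features(names: Vec<String>) -> Result<Vec<String>, FeatureOverflow> {
    if names.len() > MAX_FEATURES {
        return Err(FeatureOverflow { got: names.len(), max: MAX_FEATURES });
    }
    Ok(names)
}

fn main() {
    // A doubled feature file: twice as many entries as the buffer allows.
    let doubled: Vec<String> = (0..2 * MAX_FEATURES).map(|i| format!("feat_{i}")).collect();

    // Unhandled: the failure mode described in the incident write-up.
    // The overflow becomes a panic, and a panicking proxy returns 5xx
    // for every request it should have served.
    // let features = load_features(doubled.clone()).unwrap(); // would panic

    // Handled: degrade instead of crash, e.g. keep serving with the
    // last-known-good feature set.
    let features = match load_features(doubled) {
        Ok(f) => f,
        Err(e) => {
            eprintln!("feature file rejected ({} > {}), keeping previous set", e.got, e.max);
            Vec::new() // stand-in for the last-known-good features
        }
    };
    println!("loaded {} features", features.len());
}
```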

Cloudflare responded by implementing temporary workarounds, including bypassing the core proxy for Workers KV and Access, manually restoring a known good configuration file, and restarting impacted systems. These measures gradually reduced error rates, restored authentication services, and brought the network fully back online.

In response to the outage, Cloudflare is undertaking measures to prevent similar failures, including hardening the ingestion of internally generated configuration files with the same scrutiny applied to user input, adding global kill switches for features, preventing error reports and core dumps from overwhelming system resources, and reviewing failure modes across all core proxy modules. Cloudflare described this outage as its most significant since 2019 and apologized for the disruption, reiterating its commitment to maintaining a highly resilient and reliable Internet infrastructure.
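As a rough illustration of the first two measures, the hedged sketch below validates a candidate feature file before promoting it, falling back to the last-known-good file on any violation, and refuses to proceed when an operator has flipped a global kill switch. The thresholds and names are illustrative, not Cloudflare's.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Global kill switch: operators can disable the feature pipeline entirely
// without shipping new code. (Hypothetical; a real system would read this
// from a distributed configuration store.)
static BOT_FEATURES_ENABLED: AtomicBool = AtomicBool::new(true);

const MAX_FILE_BYTES: usize = 1 << 20; // illustrative size ceiling
const MAX_FEATURES: usize = 200;       // illustrative feature-count ceiling

/// Validates a newly generated feature file before it replaces the live
/// one. Any violation keeps the previous known-good file in place.
fn validate_feature_file(bytes: &[u8], feature_count: usize) -> Result<(), String> {
    if !BOT_FEATURES_ENABLED.load(Ordering::Relaxed) {
        return Err("bot feature pipeline disabled via kill switch".into());
    }
    if bytes.len() > MAX_FILE_BYTES {
        return Err(format!("file too large: {} bytes", bytes.len()));
    }
    if feature_count > MAX_FEATURES {
        return Err(format!("too many features: {feature_count}"));
    }
    Ok(())
}

fn main() {
    // A doubled file, as in the incident: oversized and over-counted.
    let doubled_file = vec![0u8; 2 * MAX_FILE_BYTES];
    match validate_feature_file(&doubled_file, 400) {
        Ok(()) => println!("promoting new feature file"),
        Err(reason) => println!("rejected, keeping last-known-good file: {reason}"),
    }
}
```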
