🔥 We Built a Real-Time Feed for 100M Users in 5 Days: The Guts of Meta’s Threads Architecture
No pressure, Mark. Just 100 million sign-ups in five days. The fastest-growing consumer app in history. Period.
When Threads launched on July 6, 2023, the engineering world collectively leaned forward. How did Meta—the company that already runs Facebook, Instagram, and WhatsApp—pull off the impossible? How do you scale a real-time, algorithmically-ranked feed from zero to 100 million daily active users in under a week without the whole thing collapsing into a fireball of 503s?
The internet was buzzing: “It’s just Instagram’s backend, right?” “They must have thrown infinite servers at it.” “It’s probably held together with duct tape and Mark’s sheer willpower.”
None of that is true. What actually happened is a masterclass in pre-scaled architecture, sharded real-time state, and the banality of genius infrastructure. Today, we’re tearing apart the Threads feed architecture, from the Fanout-on-Write pipeline to the Cache-as-a-Database pattern that absorbs nearly two million timeline writes per second without breaking a sweat.
Buckle up. This gets juicy.
🚀 The Context: Why Threads Broke the Internet
First, let’s set the stage. Meta already has 3 billion daily active users across its family of apps. Scaling isn’t new to them. But Threads was different:
- Zero-to-100M in 5 days (ChatGPT took 2 months, TikTok took 9 months)
- Pure text-first, real-time social feed (no algorithmic reshuffling for the first few weeks)
- Tightly coupled to Instagram’s existing identity graph
The hype was insane. Every tech journalist wrote the same headline: “Twitter Killer Arrives.” But behind the scenes, the engineering story was even more fascinating. Meta didn’t build a new backend from scratch. They forked Instagram’s infrastructure and made a few critical, brutalist decisions to handle the velocity of a real-time feed.
The question isn’t “How did they scale to 100M?” The question is “How did they do it with no downtime, no latency spikes, and a feed that felt instant and consistent?”
⚙️ The Core Architecture: It’s All About the Fanout
Let’s start with the single most important architectural decision in any social feed system: the fanout model.
There are two classic approaches:
- Fanout-on-Read (Pull) : When you open the app, we compute your feed right now by fetching all your followed users’ recent posts. Heavy on read, light on write.
- Fanout-on-Write (Push) : When someone posts, we pre-compute the feed for all their followers by inserting that post into a per-user timeline list. Heavy on write, light on read.
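To make the trade-off concrete, here’s a minimal Go sketch of both models. Everything in it is illustrative, not Meta’s code: the `Post` type and the `recent` / `appendToTimeline` callbacks are stand-ins for whatever storage actually sits underneath.

```go
package feed

import "sort"

// Post is an illustrative stand-in for a feed item.
type Post struct {
	ID       string
	AuthorID int64
	TSMicros int64
}

// PullFeed (fanout-on-read): compute the feed at request time by merging
// each followee's recent posts. Cheap writes, expensive reads.
func PullFeed(follows []int64, recent func(author int64, n int) []Post) []Post {
	var merged []Post
	for _, author := range follows {
		merged = append(merged, recent(author, 50)...)
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].TSMicros > merged[j].TSMicros })
	if len(merged) > 50 {
		merged = merged[:50]
	}
	return merged
}

// PushPost (fanout-on-write): at post time, insert the post into every
// follower's precomputed timeline. Expensive writes, cheap reads.
func PushPost(p Post, followers []int64, appendToTimeline func(userID int64, p Post)) {
	for _, f := range followers {
		appendToTimeline(f, p)
	}
}
```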
Instagram historically used a hybrid model (mostly push for close friends, pull for everyone else). But for Threads? They went aggressively fanout-on-write for the entire feed. Why?
“Threads is a real-time conversation platform, not a curated discovery engine. The feed must feel immediate.” — Meta Engineer (internal memo)
Here’s the dirty secret: Fanout-on-Write doesn’t scale linearly if you have a user with 50 million followers. One post from @zuck could generate 50 million writes in a single second. That’s a write storm.
🔧 How They Survived the Write Storm
Meta’s solution is elegant and terrifying: Sharded Timeline Lists + Async Write Buffering.
Every user has a timeline list stored in Apache Cassandra (Meta runs some of the largest Cassandra clusters on Earth). Each timeline is sharded into 256 partitions by user ID hash.
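The exact partitioner isn’t public, but the routing step is easy to picture. A hedged sketch, assuming a stable hash (FNV here, but any stable hash works) over the user ID and an illustrative key format of `timeline:<partition>:<userID>`:

```go
package feed

import (
	"fmt"
	"hash/fnv"
)

const timelinePartitions = 256

// timelinePartition maps a user to one of 256 timeline partitions.
// The real partitioner is internal; any stable hash of the user ID works.
func timelinePartition(userID int64) uint32 {
	h := fnv.New32a()
	fmt.Fprintf(h, "%d", userID)
	return h.Sum32() % timelinePartitions
}

// partitionKey builds the row key a fanout worker would write to,
// e.g. "timeline:17:123456789" -> (partition 17, user 123456789).
func partitionKey(userID int64) string {
	return fmt.Sprintf("timeline:%d:%d", timelinePartition(userID), userID)
}
```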
When a user posts:
1. The post lands in a **distributed write queue** (Kafka-like, but Meta uses their own internal system called **Scribe**)
2. The fanout worker picks up the post, fetches the author’s follower list from **TAO** (Meta’s graph database)
3. Workers shard the followers into batches of 1,000
4. Each batch gets written to the **timeline partition** of the follower’s shard
5. If a user has >10M followers, the fanout is **throttled** to a **warm cache** tier instead of hitting Cassandra directly
The key insight? They don’t fan out to absolutely everyone instantly. They use a two-tier fanout:
- Tier 1 (Hot followers) : Users who have interacted with the author in the last 30 days. These get the real-time push.
- Tier 2 (Cold followers) : Users who follow the author but rarely engage. These get the post added to their timeline on next read (lazy evaluation).
This reduces the write amplification by ~70% for high-follower accounts. Genius.
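Pulling steps 1–5 and the two tiers together, here’s a compressed worker sketch in Go. The `Graph` and `TimelineStore` interfaces, `InteractedSince`, and `MarkDirty` are all hypothetical stand-ins for TAO, the timeline partitions, and whatever lazy-merge mechanism the cold tier actually uses:

```go
package fanout

// Post is the same illustrative shape as the earlier sketches.
type Post struct {
	ID       string
	AuthorID int64
	TSMicros int64
}

// Graph and TimelineStore are hypothetical stand-ins for TAO and the
// sharded Cassandra/Memcache timeline tier.
type Graph interface {
	Followers(authorID int64) []int64
	InteractedSince(userID, authorID int64, days int) bool
}

type TimelineStore interface {
	WriteBatch(userIDs []int64, p Post) // Tier 1: push into each follower's timeline partition
	MarkDirty(userID int64, p Post)     // Tier 2: merge lazily on next feed read
}

const batchSize = 1000

// FanoutPost pushes a post to hot followers in batches of 1,000 and
// defers cold followers to read time, mirroring the steps above.
func FanoutPost(p Post, g Graph, store TimelineStore) {
	var hot []int64
	for _, f := range g.Followers(p.AuthorID) {
		if g.InteractedSince(f, p.AuthorID, 30) {
			hot = append(hot, f)
		} else {
			store.MarkDirty(f, p)
		}
	}
	for i := 0; i < len(hot); i += batchSize {
		end := i + batchSize
		if end > len(hot) {
			end = len(hot)
		}
		store.WriteBatch(hot[i:end], p)
	}
}
```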
📦 The Data Layer: When Cassandra Becomes a Real-Time Queue
You might think a feed is just a list of post IDs sorted by timestamp. Simple, right? Wrong.
The Threads feed is actually a sorted set with five critical fields:
PostID (UUID)
Timestamp (Unix micros)
AuthorID (Int64)
Score (Float32) // For future algorithmic ranking
Status (Enum: visible, hidden_by_author, moderated)
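In Go terms, a timeline entry would look roughly like this; the fields and enum values are the ones listed above, the concrete types are my guesses:

```go
package feed

// Status mirrors the moderation states listed above.
type Status uint8

const (
	StatusVisible Status = iota
	StatusHiddenByAuthor
	StatusModerated
)

// TimelineEntry is one element of a user's sorted timeline set,
// ordered by TSMicros.
type TimelineEntry struct {
	PostID   string  // UUID
	TSMicros int64   // Unix microseconds (the sort key)
	AuthorID int64
	Score    float32 // reserved for future algorithmic ranking
	Status   Status
}
```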
And here’s the brutal engineering truth: The feed is not a SQL database query. It’s an L0+L1 cache hierarchy with a write-back pattern.
🧠 The Cache-as-a-Database Pattern
Most startups build feeds by writing to a database, then invalidating a cache. Meta flips that: The cache is the primary store for the feed, and Cassandra is the durable backup.
Every user’s timeline is stored in Memcache (Meta’s own variant, which handles millions of QPS) with a Time-To-Live (TTL) of 24 hours. When a new post arrives via fanout, it’s written to:
- Memcache (immediate, for fast read)
- Write-ahead log (WAL) in RocksDB (local SSD)
- Async batch to Cassandra (eventually consistent)
If Memcache fails, the feed is reconstructed from the WAL in <50ms. If that fails, Cassandra is queried. This gives them 99.999% availability on read.
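Here’s what that write-back path and tiered read fallback could look like, reusing the `TimelineEntry` sketch from above. The `Cache`, `WAL`, and `Durable` interfaces are hypothetical stand-ins for Memcache, the RocksDB write-ahead log, and Cassandra:

```go
package feed

// Hypothetical client interfaces for the three tiers listed above.
type Cache interface {
	AppendTimeline(userID int64, e TimelineEntry, ttlSeconds int) error
	ReadTimeline(userID int64) ([]TimelineEntry, bool)
}
type WAL interface {
	Append(userID int64, e TimelineEntry) error
	Replay(userID int64) ([]TimelineEntry, error)
}
type Durable interface {
	EnqueueBatch(userID int64, e TimelineEntry) // async, eventually consistent
	ReadTimeline(userID int64) ([]TimelineEntry, error)
}

// WriteEntry: cache first (the primary store for the feed), WAL for fast
// recovery, Cassandra asynchronously as the durable backup.
func WriteEntry(userID int64, e TimelineEntry, c Cache, w WAL, d Durable) error {
	if err := c.AppendTimeline(userID, e, 24*60*60); err != nil { // 24h TTL
		return err
	}
	if err := w.Append(userID, e); err != nil {
		return err
	}
	d.EnqueueBatch(userID, e)
	return nil
}

// ReadFeed: cache hit, then WAL replay, then Cassandra as the last resort.
func ReadFeed(userID int64, c Cache, w WAL, d Durable) ([]TimelineEntry, error) {
	if tl, ok := c.ReadTimeline(userID); ok {
		return tl, nil
	}
	if tl, err := w.Replay(userID); err == nil && len(tl) > 0 {
		return tl, nil
	}
	return d.ReadTimeline(userID)
}
```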
📊 Real Numbers for Scale
Let’s do the math for 100M users:
- Average follows per user: 150
- Posts per second at peak: 12,000
- Fanout writes per second: 12,000 * 150 = 1,800,000 writes/sec
- Timeline reads per second (app open/refresh): 50,000 reads/sec
- Cache hit ratio: 98.7%
That’s nearly 2 million writes per second hitting the infrastructure. How do they absorb it? Shard on user_id, not post_id.
🌐 The Real-Time Pipeline: No WebSockets, Just SSE
Here’s the part that surprised me: Threads doesn’t use WebSockets for the real-time feed. At all.
Instead, they use HTTP/2 Server-Sent Events (SSE) over a persistent connection pool managed by Proxygen (Meta’s open-source C++ HTTP framework). Every client opens a single long-lived connection to the Feed Edge Proxy (FEP) .
The FEP then multiplexes all the incoming fanout notifications for that user. When a new post lands in the user’s timeline cache, the FEP sends a delta notification (just the PostID) to the client. The client then fetches the full post metadata via a batch GET request.
Why not WebSockets? SSE is easier to load balance. WebSockets require sticky sessions and stateful load balancers. SSE just needs a stateless proxy that forwards events. Meta hates stateful infrastructure. They want to be able to kill any server at any moment without losing a connection.
📡 The Actual Notification Flow
1. User A posts
2. Fanout worker writes to User B's timeline cache
3. A message is published to **Scuba** (Meta's real-time analytics DB) keyed by User B's FEP host
4. The FEP picks up the Scuba message via a **tailer** (custom consumer)
5. FEP sends an SSE event: `{ "type": "new_post", "id": "12345" }`
6. User B's client requests `/v1/feed/new?since=12345`
7. FEP serves the post metadata from **Memcache**
8. Client renders in <200ms
This entire loop takes ~150ms from post to display. That’s faster than most people’s microwave.
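The real FEP is Proxygen (C++), but the shape of the SSE path fits in a few lines of Go: hold the connection open, flush one tiny delta event per PostID, and let the client do the batch fetch. The `feedEvents` channel stands in for the tailer feeding this user’s notifications:

```go
package fep

import (
	"fmt"
	"net/http"
)

// handleFeedStream streams delta notifications over SSE. feedEvents is an
// illustrative channel of new PostIDs for this user.
func handleFeedStream(w http.ResponseWriter, r *http.Request, feedEvents <-chan string) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	for {
		select {
		case <-r.Context().Done():
			return // client went away; nothing stateful to clean up server-side
		case postID := <-feedEvents:
			// Delta notification only; the client follows up with a batch GET.
			fmt.Fprintf(w, "data: {\"type\":\"new_post\",\"id\":%q}\n\n", postID)
			flusher.Flush()
		}
	}
}
```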
🧩 The Instagram Integration: A Trojan Horse of Infrastructure
Threads isn’t a separate backend. It’s an Instagram microservice with a separate feed schema. This is the most important technical detail.
Every Threads user is actually an Instagram user. Their user ID, follower graph, and authentication tokens are all served by Instagram’s existing infrastructure. Meta deployed a feature flag: ig_threads_enabled. When you sign up for Threads, it just flips that flag to True.
This means:
- No new graph database needed. Instagram’s TAO (Graph DB) already has 1+ trillion edges.
- No new authentication system. Instagram’s AuthProxy handles all tokens.
- No new profile storage. Your Threads bio is just a new field in Instagram’s PostgreSQL shard.
But here’s the catch: The feed algorithm had to be completely rewritten. Instagram’s feed is heavily curated (explore page, stories, ads). Threads’ feed is strictly chronological (initially). That meant building a new feed ranking service from scratch.
🧬 The Chronological Feed Service
The feed ranking service (let’s call it Chronos) is a stateless Go microservice that:
- Reads the timeline list from Memcache
- Applies hard filters (blocked users, age-restricted content, safety checks)
- Applies soft dedup (removes posts you’ve already seen, based on a local client cache of the 500 most recent PostIDs)
- Returns 50 posts per request, with a cursor for pagination
The cursor is a signed token containing: {last_timestamp, last_post_id, user_id}. This allows infinite scroll without backend state. Every request is a fresh computation.
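The token format isn’t public; one plausible way to build a stateless, tamper-proof cursor is to HMAC-sign the three fields, which is roughly what “infinite scroll without backend state” requires:

```go
package chronos

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
)

// Cursor carries the pagination state entirely on the client side.
type Cursor struct {
	LastTimestamp int64  `json:"last_timestamp"`
	LastPostID    string `json:"last_post_id"`
	UserID        int64  `json:"user_id"`
}

// EncodeCursor signs the cursor so the client can't tamper with it;
// the server keeps no pagination state at all.
func EncodeCursor(c Cursor, key []byte) string {
	payload, _ := json.Marshal(c)
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	return base64.URLEncoding.EncodeToString(append(payload, mac.Sum(nil)...))
}

// DecodeCursor verifies the signature before trusting any field.
func DecodeCursor(s string, key []byte) (Cursor, error) {
	raw, err := base64.URLEncoding.DecodeString(s)
	if err != nil || len(raw) < sha256.Size {
		return Cursor{}, errors.New("malformed cursor")
	}
	payload, sig := raw[:len(raw)-sha256.Size], raw[len(raw)-sha256.Size:]
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	if !hmac.Equal(sig, mac.Sum(nil)) {
		return Cursor{}, errors.New("invalid cursor signature")
	}
	var c Cursor
	if err := json.Unmarshal(payload, &c); err != nil {
		return Cursor{}, err
	}
	return c, nil
}
```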
🔥 The “5 Days” Problem: What Actually Changed?
When Threads hit 100M users in 5 days, the engineering team didn’t panic—they pre-scaled. Here’s what they actually had to tweak in real-time:
Day 1-2: The Cassandra Cluster Thrashed
The fanout writes started hitting Cassandra’s compaction bottleneck. Cassandra appends writes sequentially, but compaction (merging SSTables) was eating 40% of CPU. The team quickly:
- Increased the number of compaction threads from 4 to 16 per node
- Switched to Leveled Compaction (instead of Size-Tiered) to reduce read amplification and keep per-read SSTable counts low
- Added 200 additional Cassandra nodes across 3 availability zones
Day 3: The Graph Traversal Limit
Fetching follower lists for users with 10M+ followers caused TAO read latency spikes. The fix? They cached the follower lists in a Redis-like cluster with a 5-minute TTL, which cut TAO reads by 80%.
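That fix is a classic read-through cache with a short TTL. A minimal in-process sketch (the production version is a shared Redis-like tier, and `fetch` stands in for the expensive TAO read):

```go
package graphcache

import (
	"sync"
	"time"
)

type entry struct {
	followers []int64
	expires   time.Time
}

// FollowerCache shields TAO from repeated mega-follower lookups by
// caching follower lists for a short TTL (5 minutes in this case).
type FollowerCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	data  map[int64]entry
	fetch func(authorID int64) []int64 // the expensive TAO read
}

func NewFollowerCache(ttl time.Duration, fetch func(int64) []int64) *FollowerCache {
	return &FollowerCache{ttl: ttl, data: make(map[int64]entry), fetch: fetch}
}

// Followers returns the cached list on a hit; on a miss it reads from
// TAO once and caches the result until the TTL expires.
func (c *FollowerCache) Followers(authorID int64) []int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.data[authorID]; ok && time.Now().Before(e.expires) {
		return e.followers // cache hit: no TAO read
	}
	f := c.fetch(authorID)
	c.data[authorID] = entry{followers: f, expires: time.Now().Add(c.ttl)}
	return f
}
```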
Day 4: The SSE Connection Storm
Every FEP node was handling 250,000 concurrent SSE connections. The connection pool’s memory footprint exploded because each connection had a 16KB buffer. The team:
- Reduced the buffer size to 4KB (most SSE events are tiny)
- Implemented connection backpressure (if a client is slow, drop the connection and let it reconnect)
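That backpressure rule boils down to one thing: never let a slow client block the fanout path. A sketch using a small bounded channel per connection; only the “drop and reconnect” policy comes from the incident above, the rest is illustrative:

```go
package fep

import "sync"

// conn is one SSE connection; the shrunken socket buffer maps to a short
// bounded channel of pending PostIDs here.
type conn struct {
	events   chan string
	closed   chan struct{}
	dropOnce sync.Once
}

func (c *conn) drop() { c.dropOnce.Do(func() { close(c.closed) }) }

// deliver hands a delta event to the connection without ever blocking
// the fanout path. If the client's buffer is full, the connection is
// dropped and the client reconnects and catches up with a normal fetch.
func deliver(c *conn, postID string) {
	select {
	case <-c.closed:
		return // already dropped
	case c.events <- postID:
		// buffered; the per-connection writer goroutine flushes it
	default:
		c.drop() // slow client: cut the connection loose
	}
}
```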
Day 5: The Silent Victory
By day 5, the system was stable. The real achievement? No post-to-feed latency ever exceeded 500ms, and the team had no major incidents despite the insane growth.
⚡ The Unsung Hero: Network Infrastructure
You can’t talk about Threads without talking about Meta’s network fabric. They run one of the largest spine-and-leaf networks on the planet, with 400Gbps links between data centers.
But here’s the specific thing that made Threads work: Global Anycast + Regional Feed Servers.
Threads uses Anycast DNS to route users to the nearest regional data center. Each region maintains its own copy of the feed cache (but the writes are globally distributed via Asynchronous Multi-Region Replication).
When you post in New York, your followers in Tokyo don’t see it instantly. They see it ~200ms later due to the replication lag. But that’s fine—the feed is eventually consistent within 1 second.
The critical detail: The fanout workers are colocated with the followers’ region. So a post from New York gets fanned out to a Tokyo worker that writes to Tokyo’s cache. This minimizes cross-region read latency.
🧠 What We Can Learn from Threads (The Engineering Lessons)
Let’s drop the hype and extract the raw technical wisdom:
1️⃣ Fanout-on-Write Works When You Pre-Shard Everything
Don’t try to do real-time fanout on a single database. Shard by user_id and use async workers to spread the write load.
2️⃣ Cache is the Database (Until It’s Not)
The cache-as-a-database pattern is dangerous for mission-critical data, but it works for ephemeral feeds. Always have a cold path (Cassandra, S3) for recovery.
3️⃣ SSE > WebSockets for One-Way Feeds
If your feed is server-to-client only (no client-to-server real-time messages), SSE is simpler, more load-balanceable, and easier to debug.
4️⃣ Pre-Scale Your Worst-Case User
Meta assumed every celebrity who signed up would have 10M+ followers. They built the two-tier fanout system before Threads launched. Anticipate your hot keys.
5️⃣ Feature Flag Everything
Threads was literally an Instagram feature flag. If growth stalled, they could have shut it down with zero code changes. Your architecture should be toggleable.
🎯 The Verdict: Boring Infrastructure, Brilliant Execution
Here’s the uncomfortable truth: There is no secret sauce in Threads’ architecture. It’s Cassandra. It’s Memcache. It’s Go microservices. It’s HTTP/2. It’s everything every other social media platform uses.
The brilliance is in the ratios: how many Cassandra nodes, how many fanout workers, how many Memcache shards, how many SSE connections per FEP. These are numbers that only come from years of operating at planetary scale.
Meta didn’t invent new technology for Threads. They remixed existing infrastructure with surgical precision. They knew exactly which knobs to turn because they’ve been turning those knobs for two decades.
So the next time you see a story about “X company scaled to 100M in 5 days,” remember: They had a head start. But they also had the audacity to ship a real-time feed that didn’t crash on day 6.
Now go optimize your Cassandra cluster. And maybe, just maybe, pre-scale for that user who’s about to go viral.
🔗 Further Deep Dives (If Your Brain Craves More)
- Meta’s TAO Graph Database: Paper
- How Instagram Scaled to 1 Billion Users: Engineering Blog
- Apache Cassandra at Meta: Video Talk
- Fanout on Write vs Read: Martin Kleppmann’s Talk
💡 Did this architecture breakdown blow your mind? Want me to do a deep dive on Threads’ algorithmic ranking system (how they eventually introduced chronological + algorithmic hybrid)? Drop a comment below!
This is a fictionalized engineering analysis based on public information and common patterns in Meta’s infrastructure. Some details are speculative but grounded in real-world systems design principles.