Building a CDN on Cloudflare Workers
I’ve been building SmartCDN, a CDN built entirely on Cloudflare Workers. It started as a project to learn edge computing and turned into something actually useful. Here’s what it does, what I learned, and the issues I ran into.
What is SmartCDN?
SmartCDN sits between your users and your origin server, handling requests at Cloudflare’s edge locations worldwide. It does the usual CDN stuff like caching, but also includes:
- A/B testing - Route users to different variants with cookie persistence
- Image optimization - Automatic WebP/AVIF conversion and mobile resizing
- Geographic routing - Route to the nearest origin server
- Device detection - Optimize for mobile/tablet/desktop
- Rate limiting - Protect against abuse
- Real-time analytics - Built-in API endpoints for monitoring
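For the image optimization feature, one way to convert formats and resize inside a Worker is Cloudflare Image Resizing, which reads `cf.image` options passed to `fetch()`. The post doesn't say which mechanism SmartCDN uses, so treat this as a sketch under that assumption: the hypothetical `imageOptionsFor` helper only picks the options from the `Accept` header and the device type. The 800px mobile width comes from the usage examples; quality 75 is an assumed value.

```javascript
// Hypothetical helper: choose Image Resizing options from the Accept header
// and a device type ("mobile" | "tablet" | "desktop").
function imageOptionsFor(acceptHeader, deviceType) {
  const accept = (acceptHeader || "").toLowerCase();
  const options = {};

  // Prefer AVIF, then WebP, but only when the browser advertises support.
  if (accept.includes("image/avif")) {
    options.format = "avif";
  } else if (accept.includes("image/webp")) {
    options.format = "webp";
  }

  // Mobile devices get a narrower, lower-quality image.
  // 800px is from the post; quality 75 is an assumed value.
  if (deviceType === "mobile") {
    options.width = 800;
    options.quality = 75;
  }
  return options;
}

// Usage inside a Worker, assuming Image Resizing is enabled for the zone:
// const image = await fetch(imageUrl, {
//   cf: { image: imageOptionsFor(request.headers.get("Accept"), device) },
// });
```

Because the chosen format depends on `Accept`, the cached copy has to be keyed on (or vary by) the negotiated format; otherwise a WebP variant could be served to a browser that never asked for it.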
The basic flow is: user request → edge worker → check cache → fetch from origin if needed → optimize → cache → return response. Simple in theory, but the details matter.
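That flow fits in a single handler. This is a minimal sketch, not SmartCDN’s actual code: `handleRequest` and the injected `cache`, `fetchOrigin`, and `optimize` helpers are hypothetical, and passing them in as parameters keeps the flow testable outside the Workers runtime.

```javascript
// Sketch of: check cache -> fetch from origin if needed -> optimize -> cache -> return.
// cache, fetchOrigin and optimize are hypothetical injected dependencies.
async function handleRequest(request, { cache, fetchOrigin, optimize }) {
  // 1. Check the cache first.
  const cached = await cache.match(request);
  if (cached) {
    // Re-wrap so the headers are mutable, then tag the response.
    const hit = new Response(cached.body, cached);
    hit.headers.set("X-Cache-Status", "HIT");
    return hit;
  }

  // 2. Cache miss: fetch from the origin and optimize the result.
  const originResponse = await fetchOrigin(request);
  const optimized = await optimize(request, originResponse);

  // 3. Store only successful GET responses (a clone, so the body can still be returned).
  if (request.method === "GET" && optimized.ok) {
    await cache.put(request, optimized.clone());
  }

  // 4. Return the response with a cache status header.
  const miss = new Response(optimized.body, optimized);
  miss.headers.set("X-Cache-Status", "MISS");
  return miss;
}
```

In a real Worker, `cache` would be `caches.default`, and the `cache.put` call is often handed to `ctx.waitUntil()` instead of awaited, so the client doesn’t wait on the cache write.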
What I Learned
Service Bindings Are Essential
I spent way too long debugging 404 errors when SmartCDN tried to fetch from the origin Worker. Turns out HTTP fetch between Workers using .workers.dev URLs is unreliable—Cloudflare’s routing can just… fail sometimes.
The solution? Service bindings. They let Workers talk directly to each other without going through HTTP:
# wrangler.toml
[[env.staging.services]]
binding = "ORIGIN"
service = "smartcdn-test-origin-staging"
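// Direct Worker-to-Worker communication through the ORIGIN service binding
// (env.ORIGIN is undefined if the binding isn't configured for this environment)
let response;
if (env.ORIGIN) {
  response = await env.ORIGIN.fetch(request);
}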
This is way more reliable than HTTP fetch and faster too. Both workers need to be in the same Cloudflare account.
Cache API Quirks
The Cache API works great, but there are gotchas:
- HEAD requests need special handling: the Cache API only stores GET requests
- Cache keys need careful normalization (query params, headers, etc.)
- TTLs are upper bounds, not guarantees: entries can be evicted before they expire
- Don’t cache error responses (4xx/5xx)
I ended up implementing separate logic for HEAD vs GET, and normalizing cache keys by filtering tracking parameters like utm_source and _cache.
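A minimal sketch of that normalization, together with the GET-only and no-errors rules from the list above. `utm_source` and `_cache` come from the post; the other ignored parameters, sorting the remaining query parameters, and the function names are assumptions for illustration.

```javascript
// Hypothetical list: utm_source and _cache are from the post; the rest are
// assumed examples of tracking parameters.
const IGNORED_PARAMS = new Set([
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
  "_cache", "fbclid", "gclid",
]);

// Drop tracking params and sort the rest so ?a=1&b=2 and ?b=2&a=1
// map to the same cache entry.
function normalizeCacheKey(rawUrl) {
  const url = new URL(rawUrl);
  const kept = [...url.searchParams].filter(([name]) => !IGNORED_PARAMS.has(name));
  kept.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  url.search = new URLSearchParams(kept).toString();
  url.hash = "";
  return url.toString();
}

// The Cache API only stores GET requests, so a HEAD lookup reuses the GET key.
function cacheKeyRequest(request) {
  return new Request(normalizeCacheKey(request.url), { method: "GET" });
}

// Only cache successful GET responses. 206 Partial Content is excluded because
// cache.put rejects it; 4xx/5xx are excluded because response.ok is false.
function isCacheable(request, response) {
  return request.method === "GET" && response.ok && response.status !== 206;
}
```

HEAD responses are deliberately not cacheable here: storing a bodiless HEAD response under the GET key would later serve an empty body to GET requests.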
Header Management Matters
When proxying between Workers, some headers cause problems, especially the Host header, which needs to match the origin’s host rather than the CDN’s. I wrote a header-cleaning function that removes the problematic ones:
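// Copy the incoming headers and drop the ones that shouldn't reach the origin
function cleanHeadersForOrigin(originalHeaders) {
  const newHeaders = new Headers(originalHeaders);
  newHeaders.delete('Host'); // Critical! Host must match the origin, not the CDN
  newHeaders.delete('CF-Worker'); // header Cloudflare adds to Worker subrequests
  newHeaders.delete('X-Forwarded-For'); // don't forward the client IP chain
  return newHeaders;
}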
Small detail, but it fixed a bunch of routing issues.
Testing at the Edge is Hard
Testing Cloudflare Workers means mocking the Cache API, KV store, and other runtime APIs. Local development doesn’t fully replicate production behavior, especially for service bindings. I ended up writing a lot of integration tests that run against actual deployments.
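For unit tests, a small in-memory stand-in for `caches.default` goes a long way. This is a hypothetical sketch, not a full emulation: it mirrors a few documented Cache API rules (only GET requests are stored, 206 responses are rejected, non-GET lookups miss unless `ignoreMethod` is set) and does not model TTLs, eviction, or per-location caches.

```javascript
// Hypothetical in-memory Cache API stand-in for tests only.
class MemoryCache {
  constructor() {
    this.entries = new Map(); // URL -> stored Response
  }

  keyFor(request) {
    return typeof request === "string" ? request : request.url;
  }

  async match(request, options = {}) {
    // Non-GET requests don't match unless ignoreMethod is set.
    if (typeof request !== "string" && request.method !== "GET" && !options.ignoreMethod) {
      return undefined;
    }
    const stored = this.entries.get(this.keyFor(request));
    // Hand out a clone so the stored copy's body is never consumed.
    return stored ? stored.clone() : undefined;
  }

  async put(request, response) {
    if (typeof request !== "string" && request.method !== "GET") {
      throw new TypeError("Cache API only supports GET requests");
    }
    if (response.status === 206) {
      throw new TypeError("Cannot cache a 206 Partial Content response");
    }
    this.entries.set(this.keyFor(request), response.clone());
  }

  async delete(request) {
    return this.entries.delete(this.keyFor(request));
  }
}
```

This covers fast unit tests of caching logic; behavior that depends on the real network, like service bindings and Cloudflare’s own edge cache, still needs the integration tests against actual deployments.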
Deploying to Cloudflare
Deployment is straightforward once you have Wrangler set up:
# Install and login
npm install -g wrangler
wrangler login
# Deploy origin first
cd test-origin
npm install
wrangler deploy --env staging
# Then deploy SmartCDN
cd ../smartcdn
npm install
npm run deploy:staging
The deployment script runs tests, builds, deploys, runs smoke tests, and rolls back automatically if anything fails. I learned to always deploy the origin Worker first, since SmartCDN depends on it through a service binding.
You can verify it’s working:
curl -I https://smartcdn-staging.max-977.workers.dev/
It should return 200 with SmartCDN headers such as X-Cache-Status and X-Device-Type.
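The same check can run as part of an automated smoke test. A hypothetical sketch: the header names come from the post, but which headers the real smoke tests require, and the `checkSmokeResponse` name, are assumptions.

```javascript
// Headers named in the post; treating both as required is an assumption.
const REQUIRED_HEADERS = ["X-Cache-Status", "X-Device-Type"];

// Returns a list of problems; an empty array means the check passed.
function checkSmokeResponse(response) {
  const problems = [];
  if (response.status !== 200) {
    problems.push(`expected status 200, got ${response.status}`);
  }
  for (const name of REQUIRED_HEADERS) {
    if (!response.headers.has(name)) problems.push(`missing header ${name}`);
  }
  return problems;
}

// Usage against the staging URL (Node 18+ provides fetch):
// const res = await fetch("https://smartcdn-staging.max-977.workers.dev/", { method: "HEAD" });
// const problems = checkSmokeResponse(res);
// if (problems.length) { console.error(problems); process.exit(1); }
```

Returning a list instead of failing on the first issue makes a failed deploy easier to diagnose before the rollback kicks in.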
Issues I Hit
Issue 1: 404s from Origin
Problem: SmartCDN returned 404 when fetching from origin, but origin worked fine directly.
Root Cause: HTTP fetch between Workers is unreliable. The routing layer sometimes just fails.
Solution: Service bindings. I switched from fetch(originUrl) to env.ORIGIN.fetch(request), and the 404s went away.
Issue 2: HEAD Requests Broken
Problem: HEAD requests returned 404 while GET worked fine.
Root Cause: I had added service binding support only to the GET handlers and forgot about HEAD.
Solution: Added the same service binding logic to HEAD request handlers.
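One way to avoid this class of bug is to send both methods through a single origin helper, so the service binding logic exists in exactly one place. A minimal sketch: `fetchFromOrigin` and `handle` are hypothetical names, and returning 502 when the binding is missing is an assumed policy.

```javascript
// Single origin path shared by GET and HEAD (hypothetical structure).
async function fetchFromOrigin(request, env) {
  if (!env.ORIGIN) {
    return new Response("Origin binding not configured", { status: 502 });
  }
  return env.ORIGIN.fetch(request); // service binding from wrangler.toml
}

async function handle(request, env) {
  if (request.method === "GET") {
    return fetchFromOrigin(request, env);
  }
  if (request.method === "HEAD") {
    // Same origin path as GET; return status and headers without a body.
    const response = await fetchFromOrigin(request, env);
    return new Response(null, { status: response.status, headers: response.headers });
  }
  return new Response("Method not allowed", { status: 405, headers: { Allow: "GET, HEAD" } });
}
```

Any later change to how the origin is reached (headers, retries, bindings) now applies to both methods automatically.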
Issue 3: Cache HIT on First Request
Problem: The test script showed “Cache HIT” on the very first request, which should have been a MISS.
Root Cause: Cloudflare’s edge cache was caching responses before they reached my Worker. This is actually correct behavior for Cloudflare’s network, but confusing for testing.
Solution: Added cache bypass headers for testing scenarios and documented the difference between Worker Cache API and Cloudflare’s global edge cache.
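A bypass switch along those lines might look like this. The post doesn’t name its bypass header, so `X-Bypass-Cache` is hypothetical, as are `shouldBypassCache` and `lookupCache`. Note that this only skips the Worker’s own Cache API lookup; it has no effect on Cloudflare’s global edge cache.

```javascript
// Hypothetical header name; the post doesn't say which header it uses.
const BYPASS_HEADER = "X-Bypass-Cache";

function shouldBypassCache(request) {
  if (request.headers.get(BYPASS_HEADER) === "1") return true;
  const cacheControl = (request.headers.get("Cache-Control") || "").toLowerCase();
  return cacheControl.includes("no-cache") || cacheControl.includes("no-store");
}

// Skips only the Worker's Cache API lookup, not Cloudflare's edge cache.
async function lookupCache(request, cache) {
  if (shouldBypassCache(request)) return { response: undefined, status: "BYPASS" };
  const response = await cache.match(request);
  return { response, status: response ? "HIT" : "MISS" };
}
```

Honoring a client’s `Cache-Control: no-cache` lets any client force origin fetches, so in practice you might accept the bypass only in staging.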
How It Works
Here’s what happens when a request comes in:
- Device detection - Parse User-Agent to determine mobile/tablet/desktop
- A/B test assignment - Check cookie or hash IP to assign variant
- Geographic routing - Determine region and select origin server
- Cache lookup - Check Cache API for existing response
- Fetch if needed - If cache miss, fetch from origin via service binding
- Optimize - Apply image format conversion, resizing, etc.
- Cache response - Store for future requests
- Return - Send response with SmartCDN headers
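The first two steps can be sketched as plain functions. These are hypothetical, not SmartCDN’s actual code: the User-Agent patterns are a deliberately crude heuristic, FNV-1a is one arbitrary choice of deterministic hash, and the cookie name `ab_variant` and its 30-day lifetime are assumptions. In a Worker, the client IP would typically come from the `CF-Connecting-IP` request header.

```javascript
// Crude User-Agent heuristic (hypothetical). Tablets are checked first because
// Android tablets omit the "mobile" token that Android phones include.
function detectDevice(userAgent) {
  const ua = (userAgent || "").toLowerCase();
  if (/ipad|tablet/.test(ua) || (ua.includes("android") && !ua.includes("mobile"))) {
    return "tablet";
  }
  if (/mobi|iphone|ipod|android/.test(ua)) return "mobile";
  return "desktop";
}

// FNV-1a 32-bit hash: deterministic, so the same IP always lands in the same bucket.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

const VARIANTS = ["A", "B"];
const COOKIE_NAME = "ab_variant"; // assumed cookie name

// Prefer a valid variant from the cookie; otherwise hash the client IP
// and return a Set-Cookie value so the assignment persists.
function assignVariant(cookieHeader, clientIp) {
  const match = (cookieHeader || "").match(new RegExp(`(?:^|;\\s*)${COOKIE_NAME}=([^;]+)`));
  if (match && VARIANTS.includes(match[1])) {
    return { variant: match[1], setCookie: null };
  }
  const variant = VARIANTS[fnv1a(clientIp || "") % VARIANTS.length];
  // Max-Age of 30 days is an assumed value.
  return { variant, setCookie: `${COOKIE_NAME}=${variant}; Path=/; Max-Age=2592000; SameSite=Lax` };
}
```

The cookie takes priority over the hash so a user keeps their variant even if their IP changes; the hash only decides the first assignment.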
Cache hit? Response in 2-6ms. Cache miss? 45-120ms depending on origin response time.
Usage Examples
Here are some real examples of SmartCDN in action (screenshots omitted):
- A basic homepage request showing a cache HIT, device detection, and A/B test variant assignment
- Desktop browsers getting WebP format automatically
- Mobile devices getting resized images (800px width, lower quality)
- A/B test headers showing which variant a user gets
- A first request missing the cache (87ms) and a second request hitting it (4ms)
- Real-time analytics showing cache performance
- Rate limiting kicking in after limits are exceeded
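The post doesn’t describe how the rate limiter works, so here is one common approach as a hypothetical sketch: a fixed-window counter per client key. The limit, window, and names are assumptions. State lives in a single `Map`, so inside a Worker it is per isolate; a global limit would need shared state such as a Durable Object.

```javascript
// Hypothetical fixed-window limiter; old keys are never evicted in this sketch.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // key -> { windowStart, count }
  }

  // Returns true if the request is allowed, false if it should get a 429.
  allow(key, now = Date.now()) {
    const windowStart = now - (now % this.windowMs);
    const entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      this.counters.set(key, { windowStart, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}

function tooManyRequests(retryAfterSeconds) {
  return new Response("Too Many Requests", {
    status: 429,
    headers: { "Retry-After": String(retryAfterSeconds) },
  });
}
```

A fixed window is the simplest option but allows bursts of up to twice the limit across a window boundary; a sliding window or token bucket smooths that out at the cost of a little more state.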
Performance
The numbers are pretty good:
- Cache HIT: 2-6ms response time (85-95% of requests)
- Cache MISS: 45-120ms (first request or expired)
- Bandwidth savings: 85% reduction via caching
- Image optimization: 60-80% smaller file sizes with WebP/AVIF
Most requests hit cache, so users get responses in single-digit milliseconds. The origin server only handles about 10-15% of total requests.
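The hit rate drives the average more than the hit latency does. Taking the 4ms hit and 87ms miss from the usage examples and assuming a 90% hit rate (inside the 85-95% range above), the expected response time works out as follows, where h is the hit rate:

```latex
\begin{aligned}
E[t] &= h\,t_{\text{hit}} + (1 - h)\,t_{\text{miss}} \\
     &= 0.9 \times 4\,\text{ms} + 0.1 \times 87\,\text{ms} \\
     &= 3.6\,\text{ms} + 8.7\,\text{ms} = 12.3\,\text{ms}
\end{aligned}
```

Even at 90%, misses contribute about 70% of the average, so raising the hit rate pays off more than shaving time off the hit path.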
What I’d Do Differently
- Start with service bindings - Would have saved hours of debugging
- Better testing strategy - More integration tests earlier
- Documentation earlier - Wrote docs after the fact, should have done it during
- Error handling - More robust error responses and logging
Future Improvements
Some things I’d like to add:
- WebSocket support for real-time apps
- GraphQL query caching
- JWT-based auth/authorization
- Better DDoS protection
- Multi-cloud origin support
Takeaways
Building SmartCDN taught me a lot about edge computing, caching strategies, and Cloudflare Workers. The platform is powerful but has quirks. Service bindings are the key to reliable Worker-to-Worker communication. And edge caching is really effective—getting 85%+ cache hit rates makes a huge difference.
If you’re building something on Cloudflare Workers, definitely check out service bindings for multi-worker setups. And don’t assume HTTP fetch between Workers will work reliably—it won’t.
You can check out the code on GitHub if you want to see how it’s implemented. Feel free to use it, contribute, or learn from it.