Cloudflare Pages is an excellent static hosting service. It's free at the scale VORA operates, deploys automatically on every GitHub push, and has global CDN distribution built in. It is also, like every static hosting platform, particular about its conventions, and learning those conventions cost us more than a few failed deployments and a Google Search Console indexing incident. This post documents the deployment lessons.
Why Cloudflare Pages for a Static Site
VORA's architecture is pure static files: HTML, CSS, JavaScript. No build step, no server, no database. This makes it an ideal candidate for static hosting. The alternatives we considered were GitHub Pages, Netlify, and Vercel. We chose Cloudflare Pages for one specific reason: the global CDN is operated by Cloudflare's edge network, which has exceptionally low latency in Asia-Pacific. For a product whose primary users are in Korea and Japan, Cloudflare's edge performance in Seoul and Tokyo is measurably better than competitors.
The setup is straightforward: connect the GitHub repository, set the root directory, and set the build output directory (or leave it empty for static sites with no build step). Push to main, and the site deploys automatically. No configuration is required for the happy path.
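For a no-build site like ours, the build settings stay almost empty. The dashboard fields look roughly like this; the values shown are illustrative for a plain static repo, not a prescribed configuration:

```
Framework preset:        None
Build command:           (leave empty)
Build output directory:  (leave empty; the repo root is served as-is)
Root directory:          /
Production branch:       main
```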
The Sitemap/Robots Incident
The commit message "chore: move sitemap and robots back to root for Cloudflare Pages deployment" understates what happened. Here's the full story.
We added sitemap.xml and robots.txt to the repository. Initially, following standard practice for file organization, we placed them in a subdirectory. Google Search Console was configured to validate the sitemap.
The problem: robots.txt must be served from the exact path https://yourdomain.com/robots.txt — not from any subdirectory. Googlebot explicitly looks only at the root. Similarly, sitemap.xml is expected at https://yourdomain.com/sitemap.xml by convention (though the actual path can be specified in robots.txt). When these files were in a subdirectory, Google Search Console reported that the sitemap was inaccessible and robots.txt was not found. Several weeks of indexing were potentially affected.
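For reference, this is the shape of a minimal robots.txt that allows crawling and declares the sitemap location explicitly; the domain here is a placeholder:

```
# Must be served from https://yourdomain.com/robots.txt (root only)
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```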
Moving the files to the repository root (which becomes the Cloudflare Pages deployment root) immediately fixed the issue. The Search Console validation went green. But the delay in getting the sitemap indexed meant some blog posts weren't indexed for 2-3 weeks after publishing.
The rule: sitemap.xml in the repo root → served at yourdomain.com/sitemap.xml. With no build step, the repository root is the web root; there is no "public" or "dist" directory by default.
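The sitemap itself follows the standard sitemaps.org schema. A minimal illustrative example, again with placeholder URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/</loc>
  </url>
  <url>
    <loc>https://yourdomain.com/blog/example-post.html</loc>
  </url>
</urlset>
```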
The Cross-Origin Isolation Headers Challenge for WASM Pages
The WASM experiments in Labs required specific HTTP response headers: Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp. These are necessary for SharedArrayBuffer, which ONNX Runtime needs for multi-threaded WASM execution.
On a normal web server, you add these headers in your nginx or Apache configuration. On Cloudflare Pages, you configure custom headers through a special file called _headers placed in the deployment root. The format:
```
# _headers file (Cloudflare Pages)
/vad-test.html
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp

/hybrid-asr-test.html
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp

/sherpa-onnx-test.html
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp
```
This worked — but only for the specific pages listed. Any page that loads cross-origin resources (Google Fonts, Font Awesome CDN, Google AdSense) and also needs COEP headers faces a fundamental conflict. COEP requires all subresources to also set Cross-Origin-Resource-Policy: cross-origin. Google's CDN resources do not set this header. Result: either disable COEP (loses SharedArrayBuffer) or lose external CDN resources (loses fonts and icons).
Our resolution: for the WASM-heavy lab pages, we vendored the ONNX Runtime library locally (adding it to the repository's lib/ directory) and accepted that those pages wouldn't load Google Fonts or Font Awesome. This is why the lab experiment pages have slightly different typography from the main site. We chose functionality over visual consistency for the experiment pages.
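Concretely, vendoring means the lab pages load the runtime from a same-origin path instead of a CDN, and same-origin subresources satisfy COEP: require-corp without any extra headers. The paths below are illustrative, not the exact ones in the repo:

```html
<!-- Before: CDN build; may be blocked under COEP: require-corp
     if the CDN response lacks a Cross-Origin-Resource-Policy header -->
<!-- <script src="https://cdn.example.com/onnxruntime-web/ort.min.js"></script> -->

<!-- After: vendored copy served from our own origin, so COEP is satisfied -->
<script src="lib/ort.min.js"></script>
```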
The server.py That's Not Actually Used in Production
The repository contains a server.py file, which confuses people. It's a Python HTTP server that injects CORS and cross-origin isolation (COOP/COEP) headers, used for local development of the WASM lab pages. When you want to test the Sherpa-ONNX or ONNX Runtime pages locally, you run python3 server.py instead of opening the HTML file directly or using a basic HTTP server, because WASM threading requires the cross-origin isolation headers that a plain python3 -m http.server doesn't inject.
In production (Cloudflare Pages), the _headers file handles this. The server.py is purely a developer experience tool. We've considered removing it to reduce confusion but the SETUP.md file references it explicitly for the "how to develop locally" workflow. It stays, with a comment at the top explaining it's for local development only.
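For illustration, here is a minimal sketch of what a server like this can look like. This is not the repository's actual server.py, just the header-injection idea rebuilt on Python's standard http.server:

```python
#!/usr/bin/env python3
# Local development only -- in production the Cloudflare Pages
# _headers file is what sets these response headers.
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

class CrossOriginIsolationHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Cross-origin isolation headers required for SharedArrayBuffer,
        # which multi-threaded ONNX Runtime WASM execution depends on.
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()

if __name__ == "__main__":
    port = 8000
    print(f"Serving on http://localhost:{port} with COOP/COEP headers")
    ThreadingHTTPServer(("", port), CrossOriginIsolationHandler).serve_forever()
```

With the server running, a quick sanity check is to evaluate self.crossOriginIsolated in the browser console on one of the lab pages; it reads true only when both headers are present.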
The Auto-Merge Workflow (And Why It's Necessary)
The repository has a GitHub Actions workflow called auto-merge.yml. The commit: "Create auto-merge.yml workflow to automatically merge PRs." This is infrastructure for the AI-assisted development workflow we use.
Pull requests from the bot branches get automatically merged when they pass basic validation. Without auto-merge, the workflow becomes: bot pushes PR → developer must manually review and merge → repeat for every change. With auto-merge: bot pushes PR → GitHub Actions validates → PR merges automatically if checks pass. This reduces the friction of keeping the main branch current.
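We won't reproduce the real auto-merge.yml here, but the pattern looks roughly like the sketch below, using GitHub's gh CLI. The bot/ branch prefix and the trigger types are illustrative assumptions, not the repository's actual filter:

```yaml
# Illustrative sketch of an auto-merge workflow, not the actual auto-merge.yml
name: auto-merge
on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: write
  pull-requests: write

jobs:
  auto-merge:
    # Restrict auto-merge to PRs from designated bot branches.
    if: startsWith(github.head_ref, 'bot/')
    runs-on: ubuntu-latest
    steps:
      # Flag the PR for auto-merge; GitHub completes the merge
      # only after all required status checks pass.
      - run: gh pr merge --auto --squash "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```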
There are obvious concerns about automatically merging PRs without human review. Our mitigations: all auto-merged PRs are reviewed after the fact; the auto-merge only applies to PRs from specific branches (the bot branches); and the pages are static HTML without server-side logic, reducing the blast radius of a mistake. If a bad change gets auto-merged and deployed, the fix is another commit and another auto-deploy cycle — typically under 2 minutes.
What We'd Configure Differently If Starting Over
Looking back, there are several Cloudflare Pages configurations we'd set up from day one rather than discover later:
- Custom domain from day one: The vora.vibed-lab.com URL is functional but presents the old product name permanently. A custom domain would have decoupled the URL from the project name.
- _headers file from day one: Rather than adding cross-origin headers reactively when the WASM experiments needed them, having the file in place from the start would have avoided a confusing period where some pages worked and others didn't.
- sitemap.xml and robots.txt in root from day one: Never put these anywhere except the deployment root. There is no valid reason to put them anywhere else.
- Preview deployments for testing: Cloudflare Pages automatically creates preview deployments for non-main branches. We initially didn't use these, deploying directly to main for every change. Using preview deployments would catch deployment-specific issues (like the robots.txt placement) without affecting production.
Static hosting is not complicated. The complications are in the details: where specific files must live, how security headers interact with CDN resources, how to configure builds that don't actually need building. These are solvable problems with a day of reading documentation that we stretched into weeks of reactive fixes.