Infrastructure

CostBot: Crawling Construction Prices on Cloudflare Without Letting Costs Run Away

How Omnicost uses Cloudflare Workers, queues, schedules, and guardrails to collect price data while keeping infrastructure spend tight.

5/20/2026 · 2 min read

Price intelligence only works if it stays current. For construction, that means crawling supplier pages, public catalogs, procurement portals, and cost files on a schedule. It also means doing it cheaply enough that the data business can scale before revenue catches up.

Omnicost runs CostBot on Cloudflare services because the workload is naturally distributed. Workers handle orchestration, queues absorb crawl and normalization jobs, scheduled triggers dispatch freshness checks, KV stores guardrail state, D1 stores structured data, R2 holds downloaded artifacts, and Workers AI handles selective matching or enrichment.
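As a rough illustration of the scheduler/queue split, here is a minimal sketch of the two sides: a cron-triggered dispatcher that only enqueues jobs, and a queue consumer that acknowledges or retries each message. The `CrawlJob` shape and the function names are hypothetical. The small interfaces stand in locally for the Workers runtime types so the example runs without `@cloudflare/workers-types`.

```typescript
// Simplified local stand-ins for the Workers runtime types. The real ones
// come from @cloudflare/workers-types and carry more options.
interface QueueProducer<T> {
  send(body: T): Promise<void>;
}
interface QueueMessage<T> {
  readonly body: T;
  ack(): void;
  retry(): void;
}
interface MessageBatch<T> {
  readonly messages: readonly QueueMessage<T>[];
}

// Hypothetical job payload: one freshness check for one source.
interface CrawlJob {
  sourceId: string;
  url: string;
}

// Cron side: enqueue one job per due source instead of crawling inline,
// so the scheduled invocation stays short and the queue absorbs bursts.
async function dispatchDueSources(
  due: CrawlJob[],
  queue: QueueProducer<CrawlJob>,
): Promise<number> {
  for (const job of due) {
    await queue.send(job);
  }
  return due.length;
}

// Consumer side: ack each success and retry each failure individually,
// so one broken page does not cause the whole batch to be redelivered.
async function consumeBatch(
  batch: MessageBatch<CrawlJob>,
  crawl: (job: CrawlJob) => Promise<void>,
): Promise<void> {
  for (const msg of batch.messages) {
    try {
      await crawl(msg.body);
      msg.ack();
    } catch {
      msg.retry();
    }
  }
}
```

In a deployed Worker, `dispatchDueSources` would be called from the module's `scheduled(controller, env, ctx)` handler with a queue binding taken from `env`. `consumeBatch` would be called from its `queue(batch, env, ctx)` handler.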

The important constraint is cost control. Not every page deserves a browser render. Not every observation needs an AI call. Not every source needs the same freshness interval. CostBot ranks sources by value, crawl budget, trust, and staleness before dispatching work.
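One way to express that ranking is sketched below. Each due source is scored by value times trust times staleness, the score is divided by the expected crawl cost, and a fixed crawl budget is then filled greedily. The `Source` fields, the multiplicative score, and the assumed 20x cost of a browser render versus a plain fetch are all illustrative assumptions, not CostBot's actual formula.

```typescript
// Hypothetical per-source metadata; field names and scales are illustrative.
interface Source {
  id: string;
  value: number; // expected coverage gain, 0..1
  trust: number; // historical parse reliability, 0..1
  lastCrawledMs: number; // epoch ms of last successful crawl
  freshnessMs: number; // per-source target refresh interval
  needsBrowser: boolean; // true only for pages that need JS rendering
}

// Assumed relative costs: a browser render is much pricier than a fetch.
const FETCH_COST = 1;
const RENDER_COST = 20;

function crawlCost(s: Source): number {
  return s.needsBrowser ? RENDER_COST : FETCH_COST;
}

// How overdue a source is relative to its own interval (< 1 means not due).
function staleness(s: Source, nowMs: number): number {
  return (nowMs - s.lastCrawledMs) / s.freshnessMs;
}

// Priority per unit of cost, so cheap, valuable, overdue sources go first.
function priority(s: Source, nowMs: number): number {
  return (s.value * s.trust * staleness(s, nowMs)) / crawlCost(s);
}

// Greedily pick due sources by priority until the crawl budget is spent.
function selectForDispatch(sources: Source[], budget: number, nowMs: number): Source[] {
  const due = sources
    .filter((s) => staleness(s, nowMs) >= 1)
    .sort((a, b) => priority(b, nowMs) - priority(a, nowMs));
  const picked: Source[] = [];
  let spent = 0;
  for (const s of due) {
    const cost = crawlCost(s);
    if (spent + cost > budget) continue; // skip; a cheaper source may still fit
    picked.push(s);
    spent += cost;
  }
  return picked;
}
```

Skipping a source that does not fit, rather than stopping at it, lets cheap fetches use whatever budget an expensive render would have blown through.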

The pipeline is staged. Source discovery finds promising suppliers and BC3 (FIEBDC-3) cost-database files. Crawlers collect raw observations. Parsers extract prices. Normalizers convert messy rows into structured observations. Matchers map observations to canonical items, using deterministic rules first and AI only when cheaper methods fail.
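The deterministic-first matching step might look like the sketch below. It tries an exact code match first (for example an item code carried over from a BC3 file), then token overlap on the normalized description, and only then an AI fallback. The types, the Jaccard similarity threshold of 0.6, and the `aiFallback` callback are hypothetical. In production the callback could wrap a Workers AI call.

```typescript
// Hypothetical shapes for a normalized observation and a canonical item.
interface Observation {
  supplierCode?: string; // e.g. an item code carried over from a BC3 file
  description: string;
  unit: string;
}
interface CanonicalItem {
  id: string;
  codes: string[];
  name: string;
  unit: string;
}
type MatchResult = { itemId: string; method: "code" | "tokens" | "ai" } | null;

// Strip accents and punctuation, lowercase, and split into a token set,
// so "Hormigón HA-25" and "hormigon ha 25" yield the same tokens.
function tokens(text: string): Set<string> {
  const clean = text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
  return new Set(clean.split(" ").filter((t) => t.length > 0));
}

// Jaccard similarity: shared tokens divided by all distinct tokens.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

async function matchObservation(
  obs: Observation,
  catalog: CanonicalItem[],
  aiFallback: (obs: Observation) => Promise<string | null>,
  minSimilarity = 0.6,
): Promise<MatchResult> {
  // 1. Exact code match within the same unit: cheapest and most reliable.
  const code = obs.supplierCode;
  if (code) {
    const hit = catalog.find((c) => c.unit === obs.unit && c.codes.some((k) => k === code));
    if (hit) return { itemId: hit.id, method: "code" };
  }
  // 2. Best token-overlap match within the same unit, if similar enough.
  const obsTokens = tokens(obs.description);
  let bestId: string | null = null;
  let bestScore = 0;
  for (const item of catalog) {
    if (item.unit !== obs.unit) continue;
    const score = jaccard(obsTokens, tokens(item.name));
    if (score > bestScore) {
      bestScore = score;
      bestId = item.id;
    }
  }
  if (bestId !== null && bestScore >= minSimilarity) return { itemId: bestId, method: "tokens" };
  // 3. Only now pay for a model call.
  const aiId = await aiFallback(obs);
  return aiId ? { itemId: aiId, method: "ai" } : null;
}
```

Restricting every stage to the same unit keeps a price per m3 from ever being matched to an item priced per kg, whichever stage produces the match.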

AI calls are guarded by kill switches, per-job budgets, and usage ledgers. If a model is rate-limited or a daily budget is exhausted, the system degrades instead of spiraling. The goal is not maximum crawling. The goal is the cheapest crawl that improves price coverage.
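A guard around each AI call could combine the three controls named above. This sketch makes several assumptions: an in-memory `GuardState` stands in for KV-backed state, each call has an estimated cost in arbitrary units, and results use a hypothetical `AiOutcome` type. Tripping the kill switch on a rate limit is one possible degradation policy, not necessarily CostBot's.

```typescript
// Hypothetical guardrail state. CostBot keeps this kind of state in KV;
// an in-memory object stands in here so the sketch runs anywhere.
interface GuardState {
  killSwitch: boolean;
  dailyBudget: number; // arbitrary cost units (assumption)
  spentToday: number;
}

interface LedgerEntry {
  jobId: string;
  cost: number;
  outcome: "ok" | "skipped" | "rate_limited";
}

// Result of one model call, as reported by a hypothetical wrapper.
type AiOutcome<T> = { kind: "ok"; value: T } | { kind: "rate_limited" };

class AiGuard {
  readonly ledger: LedgerEntry[] = [];
  private readonly state: GuardState;

  constructor(state: GuardState) {
    this.state = state;
  }

  // Runs `call` only if the kill switch is off and both the per-job and
  // daily budgets can absorb `estCost`; otherwise returns null so the
  // caller can degrade (e.g. leave the observation unmatched for now).
  async run<T>(
    jobId: string,
    estCost: number,
    jobBudget: number,
    call: () => Promise<AiOutcome<T>>,
  ): Promise<T | null> {
    const overJob = estCost > jobBudget;
    const overDaily = this.state.spentToday + estCost > this.state.dailyBudget;
    if (this.state.killSwitch || overJob || overDaily) {
      this.ledger.push({ jobId, cost: 0, outcome: "skipped" });
      return null;
    }
    // Reserve budget before awaiting so concurrent calls cannot overspend.
    // If `call` rejects, the reservation stays charged (conservative).
    this.state.spentToday += estCost;
    const result = await call();
    if (result.kind === "rate_limited") {
      this.state.spentToday -= estCost; // refund: nothing was consumed
      this.state.killSwitch = true; // assumed policy: stop, don't retry-loop
      this.ledger.push({ jobId, cost: 0, outcome: "rate_limited" });
      return null;
    }
    this.ledger.push({ jobId, cost: estCost, outcome: "ok" });
    return result.value;
  }
}
```

Callers treat `null` as "no AI answer this run" and leave the observation for a later pass. In a KV-backed version, the kill switch could be a key written with an `expirationTtl` so it clears itself after a cool-down.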

This infrastructure lets Omnicost build a live catalog without behaving like an enterprise data platform from day one. Tight cost discipline is a product feature when the product itself depends on continuous data collection.