Cloud data-warehouse spend is usage-based and hard to forecast because it combines credits, per-TB-scanned billing, and separately metered compute, storage, and egress. Open engines and lakehouse query layers (ClickHouse, Trino/Starburst, DuckDB, Druid) can cut cost substantially for the right workloads, especially high-volume analytical ones.
Choosing a target
- ClickHouse: columnar OLAP database built for fast scans and aggregations; a good fit for real-time analytics and high ingest rates.
- Trino: distributed query engine over a lakehouse (Iceberg/Delta/Parquet on object storage); decouples compute from storage.
- DuckDB: in-process analytics engine; well suited to embedded use and mid-size workloads.
Use open formats as the bridge
The clean migration path is to export to Parquet/Iceberg on object storage, then load into the target. This avoids vendor-specific dump formats and gives you a reusable lakehouse layer. Before exporting anything, inventory schemas, datasets, ETL/ELT jobs, BI connections, and SQL-dialect specifics.
SQL & ETL conversion
Each engine has dialect differences (functions, types, window syntax, semi-structured handling). Convert and test key queries on a sample dataset before full load, and re-point ETL/ELT pipelines and BI tools (Tableau/Power BI/Looker) to the new engine.
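One way to test converted queries on a sample dataset is to keep each key query as a pair (source dialect, target dialect) and compare the results. The sketch below is only an illustration: two in-memory SQLite databases stand in for the source warehouse and the target engine, and the names make_sample_db, compare_query_pairs, and the orders table are hypothetical. In a real migration the two connections would come from each engine's own driver.

```python
import sqlite3

def make_sample_db(rows):
    """Build an in-memory stand-in database holding the sample dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn

def compare_query_pairs(source_conn, target_conn, pairs):
    """Return the names of query pairs whose result sets differ."""
    mismatches = []
    for name, source_sql, target_sql in pairs:
        # Sort rows so that ordering differences between engines are ignored.
        src = sorted(source_conn.execute(source_sql).fetchall(), key=repr)
        tgt = sorted(target_conn.execute(target_sql).fetchall(), key=repr)
        if src != tgt:
            mismatches.append(name)
    return mismatches

# Hypothetical sample data, loaded identically into both stand-ins.
SAMPLE = [(1, "eu", 10.0), (2, "eu", None), (3, "us", 7.5)]
source = make_sample_db(SAMPLE)
target = make_sample_db(SAMPLE)

# Each pair holds the original query and its converted form.
PAIRS = [
    ("revenue_by_region",
     "SELECT region, SUM(IFNULL(amount, 0)) FROM orders GROUP BY region",
     "SELECT region, SUM(COALESCE(amount, 0)) FROM orders GROUP BY region"),
    ("order_count",
     "SELECT COUNT(*) FROM orders",
     "SELECT COUNT(*) FROM orders"),
]

mismatched = compare_query_pairs(source, target, PAIRS)
print(mismatched)
```

An empty list means every converted query returned the same rows as its original on the sample. Any name that appears in the list points to a conversion to review before the full load.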
Cutover via reconciliation
Export full datasets, load them into the target, and run both systems in parallel. Reconcile row counts and aggregate results, and compare query correctness and performance benchmarks. Switch BI tools and other consumers to the target only after sign-off; keep the source authoritative until reconciled.
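The row-count and aggregate reconciliation can be sketched as a per-table fingerprint comparison. This is a minimal illustration under stated assumptions: in-memory SQLite databases stand in for both systems, the events table and the helper names (table_fingerprint, reconcile, make_db) are hypothetical, and the chosen aggregates (count, sum, min, max) are one possible set rather than a required one.

```python
import sqlite3

def table_fingerprint(conn, table, numeric_col):
    """Row count plus simple aggregates over one numeric column."""
    # Identifiers are interpolated into SQL, so pass only trusted names.
    sql = (
        f"SELECT COUNT(*), COALESCE(SUM({numeric_col}), 0), "
        f"MIN({numeric_col}), MAX({numeric_col}) FROM {table}"
    )
    rows, total, lo, hi = conn.execute(sql).fetchone()
    return {"rows": rows, "sum": total, "min": lo, "max": hi}

def values_match(a, b, tolerance):
    """Compare numbers within a tolerance; compare anything else exactly."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b) <= tolerance
    return a == b

def reconcile(source_conn, target_conn, checks, tolerance=1e-9):
    """Map each drifting table to the list of metrics that differ."""
    report = {}
    for table, numeric_col in checks:
        src = table_fingerprint(source_conn, table, numeric_col)
        tgt = table_fingerprint(target_conn, table, numeric_col)
        diffs = [k for k in src if not values_match(src[k], tgt[k], tolerance)]
        if diffs:
            report[table] = diffs
    return report

def make_db(values):
    """Hypothetical stand-in database with a single events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (value REAL)")
    conn.executemany("INSERT INTO events VALUES (?)", [(v,) for v in values])
    return conn

source = make_db([1.0, 2.0, 3.0])
complete_target = make_db([1.0, 2.0, 3.0])
partial_target = make_db([1.0, 2.0])  # simulates a load that dropped a row

clean_report = reconcile(source, complete_target, [("events", "value")])
drift_report = reconcile(source, partial_target, [("events", "value")])
print(clean_report)
print(drift_report)
```

An empty report is the condition for sign-off in this sketch. A non-empty one names the tables and metrics to investigate while the source remains authoritative.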
Sizing & cost
Cloud warehouse bills scale with TB managed and query concurrency. Self-hosted and open engines shift cost to compute, storage, and engineering time, which is usually lower in total for steady analytical load. Model both options against your real data volume and concurrency.
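A rough monthly comparison can be sketched as two small functions. Everything here is an assumption for illustration: the Workload fields, the function names, and every price and node count are placeholders, not vendor rates. Substitute your own billing data, and size the node count for your own query concurrency.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    stored_tb: float             # TB managed
    scanned_tb_per_month: float  # TB scanned by queries each month

def usage_based_monthly(w, price_per_tb_scanned, price_per_tb_stored):
    """Usage-billed warehouse: pay per TB scanned plus per TB stored."""
    return (w.scanned_tb_per_month * price_per_tb_scanned
            + w.stored_tb * price_per_tb_stored)

def self_hosted_monthly(w, nodes, node_cost, price_per_tb_stored,
                        engineering_cost):
    """Self-hosted engine: compute nodes, storage, and engineering time."""
    return nodes * node_cost + w.stored_tb * price_per_tb_stored + engineering_cost

def per_tb(monthly_cost, w):
    """Monthly cost normalised by TB managed."""
    return monthly_cost / w.stored_tb

# Placeholder inputs only; none of these figures are real prices.
workload = Workload(stored_tb=50.0, scanned_tb_per_month=400.0)
warehouse = usage_based_monthly(workload, price_per_tb_scanned=5.0,
                                price_per_tb_stored=20.0)
open_engine = self_hosted_monthly(workload, nodes=3, node_cost=300.0,
                                  price_per_tb_stored=20.0,
                                  engineering_cost=1500.0)
print(warehouse, per_tb(warehouse, workload))
print(open_engine, per_tb(open_engine, workload))
```

Which option comes out cheaper depends entirely on the inputs, and the engineering line in particular can outweigh the compute savings at small scale. That is why the model should be run against measured volume and concurrency rather than estimates.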
Open a source→target page for engine-specific steps and a per-TB TCO model.