Every platform needs optimization, and BigQuery (BQ) even more so, because at scale it gets extremely costly. Optimization usually means a tough tradeoff between performance and cost. But are there ways to optimize BQ performance at scale without actually trading performance for cost? Yes!
BigQuery is unpredictable when it comes to pricing. BigQuery mainly charges you for storage and queries, while operations such as loading, copying, and exporting data are free. This essentially means that unoptimized queries can rack up thousands of dollars in cost.
At the very same time, you need the best performance. Having to trade performance for cost is unacceptable in this world of ever-growing data.
- Ways of saving cost: flex slots vs. flat-rate vs. on-demand pricing (the dry-run sketch after this list shows how to estimate on-demand cost up front)
- Proper scheduling + capacity planning (e.g., blog): know your pods and workflows, and keep separate projects for separate workloads
- Engineer flex slots in a mechanical, autoscaling way, like we did (Reservation API sketch below)
- Storage: archival & deletion (expiration sketch below)
- Streaming inserts are costly, so stream only into tables that genuinely need real-time insertion/updates; batch-load everything else (comparison below)
- Partitioning & clustering (table-creation sketch below)
- How to (& how not to) write queries (blog: watch out for query anti-patterns!); the anti-pattern comparison below shows the difference in bytes scanned
- Optimizing query performance
- Take advantage of query caching (cache-hit check below)
- Monitor the performance & health of BQ through metadata (INFORMATION_SCHEMA sketch below)
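
To take the guesswork out of on-demand pricing, dry-run a query before running it: BigQuery validates the SQL and reports the bytes it would scan without executing anything or charging you. A minimal sketch with the Python client; the project and table names are hypothetical, and the per-TiB rate below is the long-standing US on-demand price, so check current pricing for your region.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Dry run: validate the query and get bytes scanned, free of charge.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
query = """
    SELECT user_id, event_name
    FROM `my-project.analytics.events`  -- hypothetical table
    WHERE event_date = '2021-01-01'
"""
job = client.query(query, job_config=job_config)

# Convert bytes to TiB and multiply by the on-demand rate.
tib = job.total_bytes_processed / 1024**4
print(f"Would scan {job.total_bytes_processed:,} bytes (~${tib * 5:.2f} on-demand)")
```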
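For the mechanical flex-slot approach, the BigQuery Reservation API lets you buy and release FLEX commitments programmatically, say from a scheduler right before a heavy batch window. A rough sketch assuming the google-cloud-bigquery-reservation client and a hypothetical project; note that a real setup also needs a reservation and an assignment so the slots actually get used.

```python
from google.cloud import bigquery_reservation_v1 as reservation

client = reservation.ReservationServiceClient()
parent = "projects/my-project/locations/US"  # hypothetical project

# Buy a 100-slot FLEX commitment just before the heavy workload.
commitment = reservation.CapacityCommitment(
    slot_count=100,
    plan=reservation.CapacityCommitment.CommitmentPlan.FLEX,
)
commitment = client.create_capacity_commitment(
    parent=parent, capacity_commitment=commitment
)
print("Created:", commitment.name)

# ... run the workload, then release the slots (flex commitments can
# be deleted after a short minimum period, 60 seconds at the time of
# writing).
client.delete_capacity_commitment(name=commitment.name)
```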
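On the storage side, the long-term storage discount (roughly half price after 90 days without edits) is automatic, but archival and deletion you have to engineer. Expirations do most of the work; a sketch with hypothetical table names:

```python
import datetime
from google.cloud import bigquery

client = bigquery.Client()

# Auto-delete partitions older than 90 days.
table = client.get_table("my-project.analytics.events")
table.time_partitioning.expiration_ms = 90 * 24 * 60 * 60 * 1000
client.update_table(table, ["time_partitioning"])

# Put a hard expiry on a scratch table so it cleans itself up.
scratch = client.get_table("my-project.scratch.tmp_results")
scratch.expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=7)
client.update_table(scratch, ["expires"])
```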
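The streaming-vs-batch tradeoff in code, roughly: streaming inserts are billed by volume streamed, while load jobs are free. Table names are hypothetical, and the load path assumes the destination table exists or that schema autodetection is acceptable.

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.analytics.events"  # hypothetical
rows = [{"user_id": 1, "event_name": "click"}]

# Streaming insert: rows are queryable within seconds, but you pay
# for every byte streamed. Reserve this for genuinely real-time tables.
errors = client.insert_rows_json(table_id, rows)
assert not errors, errors

# Batch load: free. Fine for anything that tolerates a schedule.
job = client.load_table_from_json(rows, table_id)
job.result()  # wait for the load job to finish
```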
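Partitioning and clustering are declared when the table is created (or via DDL). A sketch of a day-partitioned table clustered on a user column, with hypothetical names:

```python
from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("event_date", "DATE"),
    bigquery.SchemaField("user_id", "INT64"),
    bigquery.SchemaField("event_name", "STRING"),
]
table = bigquery.Table("my-project.analytics.events_partitioned", schema=schema)

# Partition by day: queries filtered on event_date scan only the
# partitions they touch. Cluster by user_id to prune blocks further.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",
)
table.clustering_fields = ["user_id"]
client.create_table(table)
```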
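A cheap way to catch query anti-patterns before they bill you: dry-run the naive and the careful version of a query and compare bytes scanned. SELECT * with no partition filter is the classic offender. Names are hypothetical again.

```python
from google.cloud import bigquery

client = bigquery.Client()
dry = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

# Anti-pattern: every column, every partition.
bad = "SELECT * FROM `my-project.analytics.events_partitioned`"

# Better: name only the columns you need, filter on the partition column.
good = """
    SELECT user_id, event_name
    FROM `my-project.analytics.events_partitioned`
    WHERE event_date BETWEEN '2021-01-01' AND '2021-01-07'
"""

for label, sql in [("bad", bad), ("good", good)]:
    job = client.query(sql, job_config=dry)
    print(label, job.total_bytes_processed, "bytes")
```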
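Query caching is on by default for deterministic queries: rerun the identical query text within roughly 24 hours and BigQuery serves it from cache at zero cost. The job metadata tells you whether you got a hit:

```python
from google.cloud import bigquery

client = bigquery.Client()
sql = "SELECT COUNT(*) FROM `my-project.analytics.events`"  # hypothetical

job = client.query(sql, job_config=bigquery.QueryJobConfig(use_query_cache=True))
job.result()
print("served from cache:", job.cache_hit)  # True on the second identical run
```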
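Finally, BigQuery's own metadata is the cheapest monitoring tool you have. A sketch that pulls the most expensive queries of the past week from the region-qualified INFORMATION_SCHEMA (adjust `region-us` to wherever your datasets live):

```python
from google.cloud import bigquery

client = bigquery.Client()

sql = """
    SELECT user_email, query, total_bytes_billed, total_slot_ms
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND job_type = 'QUERY'
    ORDER BY total_bytes_billed DESC
    LIMIT 10
"""
for row in client.query(sql).result():
    print(row.user_email, row.total_bytes_billed, row.total_slot_ms)
```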