What is the GCP data analytics reference architecture (Modern Data Stack)?

Answer

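GCP's modern data analytics stack, layer by layer:

Ingest: Pub/Sub for streaming events, Cloud Storage for batch file landing, and Datastream for change data capture (CDC) from operational databases.
Process: Dataflow (Apache Beam) for streaming and batch ETL; Dataproc (managed Spark/Hadoop) for batch processing, particularly existing Spark or Hadoop jobs; Cloud Data Fusion (a managed service built on the open-source CDAP platform) for visual, low-code ETL.
Store: BigQuery as the central data warehouse, which can also query lake data; Cloud Storage as the raw data lake; Bigtable for high-throughput, low-latency key-value access.
Analyze: BigQuery SQL and BigQuery ML (BQML) for in-warehouse analytics and ML, Looker for BI, and Vertex AI for advanced ML.
Orchestrate: Cloud Composer (managed Apache Airflow) or Workflows.
Govern: Dataplex (unified data governance), Data Catalog (metadata), and Cloud DLP, now part of Sensitive Data Protection (sensitive data discovery and classification).

The whole path from raw event to business insight can be built from managed GCP services. Much of it (Pub/Sub, Dataflow, BigQuery) is serverless, while Dataproc clusters, Bigtable instances, and Cloud Composer environments still involve provisioned capacity.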
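To make the layering concrete, here is a minimal local sketch of the ingest, process, store, and analyze flow, with orchestration reduced to calling the steps in order. It uses only the Python standard library and no GCP client libraries: a `queue.Queue` stands in for a Pub/Sub topic, a fixed-window aggregation function stands in for a Dataflow (Beam) transform, and an in-memory SQLite table stands in for a BigQuery table. The function names (`ingest`, `process`, `store`, `analyze`, `run_pipeline`), the `WINDOW_SECONDS` value, the table schema, and the sample events are all hypothetical, chosen only for illustration.

```python
"""Stdlib-only sketch of the ingest -> process -> store -> analyze flow.

All names are hypothetical stand-ins, not GCP APIs:
queue.Queue ~ Pub/Sub topic, process() ~ Dataflow transform,
in-memory SQLite table ~ BigQuery table, run_pipeline() ~ one orchestrated run.
"""
import json
import queue
import sqlite3
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical fixed-window size for the aggregation step


def ingest(events):
    """Ingest: publish raw events as JSON bytes onto a queue (Pub/Sub stand-in)."""
    topic = queue.Queue()
    for event in events:
        topic.put(json.dumps(event).encode("utf-8"))
    return topic


def process(topic):
    """Process: drain the queue and sum `amount` per (window_start, product),
    a fixed-window aggregation similar in spirit to a Beam pipeline."""
    totals = defaultdict(float)
    while not topic.empty():
        event = json.loads(topic.get().decode("utf-8"))
        window_start = event["ts"] - event["ts"] % WINDOW_SECONDS
        totals[(window_start, event["product"])] += event["amount"]
    return totals


def store(totals):
    """Store: load the aggregates into a table (BigQuery stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revenue_by_window "
        "(window_start INTEGER, product TEXT, revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO revenue_by_window VALUES (?, ?, ?)",
        [(w, p, r) for (w, p), r in totals.items()],
    )
    return conn


def analyze(conn):
    """Analyze: a SQL query over the stored table, as BigQuery SQL or a BI tool would run."""
    return conn.execute(
        "SELECT product, SUM(revenue) AS total FROM revenue_by_window "
        "GROUP BY product ORDER BY total DESC"
    ).fetchall()


def run_pipeline(events):
    """Orchestrate: run the steps in dependency order, the role Cloud Composer
    or Workflows would play as scheduled, retried tasks."""
    return analyze(store(process(ingest(events))))


if __name__ == "__main__":
    raw_events = [
        {"ts": 0, "product": "widget", "amount": 10.0},
        {"ts": 30, "product": "gadget", "amount": 5.0},
        {"ts": 75, "product": "widget", "amount": 2.5},
    ]
    print(run_pipeline(raw_events))  # [('widget', 12.5), ('gadget', 5.0)]
```

Keeping each layer behind its own function mirrors the reference architecture's separation of concerns: in a real deployment, each stand-in would be swapped for the corresponding managed service without changing the shape of the flow.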