Real-time Data Integration and Analytics
2021 - 2022
This project included training on Google Cloud Platform, with an introduction to services such as Google Cloud Storage, Google BigQuery, Dataflow, and Google Data Studio. The main objective was to implement Change Data Capture (CDC) from MongoDB to BigQuery using Kafka, Spark, and Debezium.
Researched and used Docker to deploy Kafka, Spark, and Debezium.
Researched and deployed MongoDB locally to store sample data.
Researched and deployed Kafka and Spark to transform and send data.
Researched the permissions required for BigQuery and Google Cloud Storage and how the two services integrate.
Technologies: Google Cloud Storage, Google BigQuery, Dataflow, Google Data Studio, Change Data Capture (CDC), MongoDB, Kafka, Spark, Debezium, Docker