Project: System Infrastructure Maintenance
Ensure the maintenance and high availability of the system.
Responsible for installation, configuration, upgrade, and patching of various technologies:
Elasticsearch Stack for search.
Kong API Gateway management.
Implementation of monitoring solutions using Prometheus, Grafana, and Splunk, with New Relic integration.
Application of Ververica Platform and Apache Flink to detect fraudulent transactions in real time on Google Cloud Platform (GCP) for VinID, a trusted consumer application in Vietnam.
Management and maintenance of Ververica Platform deployments on Kubernetes.
Collaboration with the development team to identify root causes, troubleshoot issues, and support deployment of Apache Flink jobs to Google Kubernetes Engine (GKE).
Set up agents and Application Performance Monitoring (APM) tools for event and log collection from applications and VMs.
Monitor SSL certificate expiration for the ELK Stack.
Update documentation and create guidelines for user access to the above services.
Monitor and optimize systems to balance performance and cost-effectiveness.
Support the migration from Kafka to self-hosted Confluent.
Provide product performance support, including system troubleshooting, performance data collection, and analysis.
Management of a Vault cluster for storing secrets, key-value pairs, and SSL certificates, and its integration with project services.
Collaborate with stakeholders to analyze requirements, clarify design dependencies, create test plans, and support functional and non-functional activities.
Develop Ansible playbooks for rapid installation and configuration, version-controlled in GitLab.
Build reusable Terraform modules to efficiently create infrastructure following design specifications.
Technologies: Google Cloud Platform (GCP), Linux, Windows, Ansible, Terraform, GitLab CI/CD, Prometheus + Grafana, New Relic, Ververica Platform + Apache Flink, Splunk, Elasticsearch Stack, Kubernetes, Logstash, Vault, Kong API Gateway