Jan 22 2026

Platform System Engineer

The incoming Platform System Engineer is responsible for ensuring the availability, performance, security, and scalability of the organisation’s core platform infrastructure across hybrid-cloud and on-premises environments. This role blends systems engineering, automation, observability, and operational excellence to support mission-critical services used by internal teams, global customers, and partners. The engineer will manage Linux-based platforms, virtualisation and container environments, CI/CD tooling, monitoring systems, and cloud resources, while ensuring compliance with system hardening, patching, and audit requirements. A successful candidate brings strong technical depth, a reliability-focused mindset, and the ability to operate independently in a fast-moving production environment.

Key Responsibilities

  • Design and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets, and lead post‑mortem and incident review processes.
  • Ensure the reliability, availability, and performance of critical internal and external platform systems.
  • Provide operational engineering support to ensure performant and stable services across production environments.
  • Collect, analyse, and interpret system metrics and logs for capacity planning, performance tuning, and fault isolation.
  • Develop automation workflows to manage infrastructure, services, and applications efficiently.
  • Increase service reliability through proactive monitoring, alerting, and observability improvements.
  • Continuously measure, benchmark, and optimise system performance, while promoting engineering best practices.
  • Coordinate with hardware and software vendors, managing support contracts, renewals, and escalations.
  • Evaluate and recommend emerging technologies that support platform innovation and operational excellence.
  • Understand requirements and drive implementation, compliance, and remediation for system hardening, patching, compliance and vulnerability audits, and penetration tests within the relevant scopes.

Requirements

  • 3+ years of experience in technology operations as a Systems Engineer, Infrastructure Engineer, or Site Reliability Engineer.
  • Proven experience operating and supporting mission‑critical production systems (e.g. SaaS, telco, banking).
  • Strong background in building automated monitoring, incident detection systems, runbooks, and supporting incident‑management processes.
  • Hands‑on experience designing automation solutions using provisioning tools, CI/CD pipelines and scripting languages.
  • Proficient in building and maintaining highly available, scalable hybrid‑cloud infrastructure, with expertise in:
      • Linux administration (Fedora, Debian, Ubuntu)
      • Cloud architecture (AWS)
      • Containers & Virtualisation (KVM, LXC, Proxmox, OpenStack)
      • Scripting (Bash and Python)
      • Infrastructure‑as‑Code tools such as SaltStack, Puppet, Terraform, or Ansible
      • Service and equipment monitoring (PRTG, Grafana, Prometheus, Graylog)
  • Strong understanding of system hardening concepts, including secure OS baselines and safe configuration practices.
  • Ability to manage and track patching cycles, package updates, kernel updates, and dependency upgrades across production systems.
  • Familiarity with vulnerability scanning tools and the ability to review or remediate findings to a standard suitable for an ISO 27000-compliant implementation.
  • Understanding of backup integrity, disaster‑recovery testing, and secure data handling in backups and snapshots.
  • Ability to comply with change‑management and deployment controls to prevent unauthorised or risky changes.
  • Able to work independently, prioritise effectively, solve complex system problems, and deliver on deadlines.
  • Strong communication skills in English for interacting with users, vendors, and management globally.
  • Capable of explaining complex system interactions clearly to both technical and non‑technical audiences.

Apply today: jobs@kacific.id