Capacity Planning, Monitoring and Service Continuity Engineer

  • Contract

Description

We are seeking a detail-oriented and proactive Capacity Planning, Monitoring, and Service Continuity Engineer to ensure optimal performance, scalability, and reliability of our IT infrastructure and services. The ideal candidate will be responsible for forecasting capacity needs, implementing monitoring solutions, and developing strategies to keep services running without interruption.

Key Responsibilities:

  • Develop and maintain capacity planning models that assess current resource requirements and forecast future ones based on business growth and technology trends.
  • Monitor system performance and resource utilization across infrastructure components including servers, storage, network, and applications.
  • Analyze monitoring data to identify bottlenecks, performance issues, and capacity constraints; recommend and implement improvements.
  • Design and implement monitoring tools, dashboards, and alerts for proactive identification of potential issues.
  • Collaborate with IT operations, engineering, and business teams to ensure capacity aligns with service-level agreements (SLAs) and business objectives.
  • Plan and coordinate capacity upgrades, expansions, and migrations to prevent service disruptions.
  • Develop and maintain disaster recovery and business continuity plans, ensuring service availability in the event of failures or disasters.
  • Conduct risk assessments and implement mitigation strategies to minimize downtime.
  • Maintain documentation of capacity planning processes, monitoring configurations, and continuity plans.
  • Stay current with industry trends and best practices in capacity management, monitoring, and service continuity.

Requirements

Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.
  • Proven experience in capacity planning, performance monitoring, and service continuity in complex IT environments.
  • Strong knowledge of infrastructure components (servers, storage, network, virtualization, cloud services).
  • Hands-on experience with monitoring tools (e.g., Nagios, Zabbix, Splunk, Datadog, Prometheus).
  • Familiarity with capacity planning methodologies and tools.
  • Experience with disaster recovery planning and business continuity management.
  • Strong analytical skills to interpret performance data and capacity trends.
  • Excellent communication and collaboration skills to work across multiple teams.
  • Relevant certifications (e.g., ITIL, PMP, cloud certifications) are a plus.