Resiliency and Continuity Specialist
IntraEdge
Job Description
Job Title: Resiliency and Continuity Specialist
Experience: 5+ Years
Location: Hyderabad
Employment Type: Full-Time

About the Role

We are seeking a highly motivated Resiliency and Continuity Specialist to support enterprise technology resilience initiatives and ensure cloud-hosted applications and platforms maintain high levels of availability, recoverability, and operational readiness.

This role serves as a subject matter expert in technology resilience, disaster recovery, and operational continuity. The ideal candidate will work closely with engineering, SRE, cloud infrastructure, application, and governance teams to coordinate resilience testing, validate recovery capabilities, review recovery plans, and ensure compliance with organizational resiliency standards.

The successful candidate will play a critical role in strengthening enterprise recovery capabilities through resilience exercises, chaos testing, audit-ready documentation, and continuous improvement initiatives.

Key Responsibilities

Cloud Resilience Testing & Recovery Validation
- Coordinate, plan, and support execution of cloud resilience and disaster recovery exercises across enterprise applications and platforms.
- Conduct both in-region and cross-region resilience testing to validate system recovery capabilities.
- Ensure recovery testing aligns with defined Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs).
- Collaborate with engineering and operations teams to develop meaningful resilience and chaos engineering scenarios.
- Validate application recoverability, service continuity, and infrastructure resilience through structured testing exercises.

System Recovery Plan (SRP) Governance
- Review and validate System Recovery Plans (SRPs) to ensure completeness, accuracy, and operational readiness.
- Verify recovery procedures, dependencies, sequencing, ownership, and execution timelines.
- Ensure adherence to organizational resiliency frameworks, templates, and standards.
- Identify gaps, risks, and improvement opportunities within recovery documentation.
- Drive remediation efforts to improve recovery preparedness.

Resilience Exercise Management
- Coordinate all phases of resilience exercises, including:
  - Planning
  - Scheduling
  - Stakeholder communication
  - Execution oversight
  - Post-exercise reviews
- Ensure pre-test documentation includes:
  - Scope
  - Success criteria
  - Recovery steps
  - Roles and responsibilities
  - Dependency mapping
- Track and document exercise outcomes, deviations, and lessons learned.
- Facilitate post-mortem reviews and continuous improvement activities.

Evidence Validation & Audit Readiness
- Review resilience testing evidence packages for completeness and compliance.
- Ensure evidence:
  - Is properly timestamped and serialized
  - Maps to documented recovery steps
  - Demonstrates successful recovery outcomes
  - Supports audit and regulatory requirements
- Validate recovery metrics, execution results, and test outcomes.
- Maintain audit-ready documentation and support compliance reviews.

Operational Resilience & Governance
- Support enterprise resilience programs and governance initiatives.
- Assist teams in applying resilience frameworks, assessment methodologies, and operational standards.
- Ensure resilience activities align with internal policies and regulatory expectations.
- Participate in risk assessments, control reviews, and resilience audits.

Cross-Functional Collaboration
- Partner with:
  - Engineering Teams
  - SRE Teams
  - Cloud Operations
  - Infrastructure Teams
  - Architecture Teams
  - Risk & Compliance Teams
- Coordinate remediation activities and track closure of identified gaps.
- Provide guidance on resilience best practices for cloud deployments and system changes.

Continuous Improvement
- Contribute to enhancement of:
  - Recovery standards
  - Resilience frameworks
  - Testing methodologies
  - Reporting processes
  - Governance controls
- Support resilience maturity initiatives across the organization.

Required Qualifications

Experience
- 5+ years of experience in:
  - Technology Resilience
  - Disaster Recovery
  - Operational Resilience
  - Site Reliability Engineering (SRE)
  - Technology Risk Management
  - IT Governance
  - Infrastructure Operations

Cloud & Infrastructure Knowledge
- Strong understanding of cloud architecture and resilience principles.
- Experience working with cloud platforms such as:
  - AWS
  - Azure
  - GCP
- Understanding of:
  - Regions and Availability Zones
  - Load Balancing
  - Auto Scaling
  - Infrastructure as Code (IaC)
  - Backup & Restore Strategies
  - Replication Mechanisms
  - Service Dependencies
  - High Availability Architectures

Resilience & Recovery Expertise
- Experience coordinating and executing:
  - Disaster Recovery (DR) Tests
  - Business Continuity Exercises
  - Resilience Testing
  - Chaos Engineering Simulations
- Experience creating and maintaining recovery plans and operational documentation.
- Strong understanding of RTO, RPO, and service recovery frameworks.

Monitoring & Observability
- Familiarity with:
  - Monitoring Platforms
  - Observability Solutions
  - Alerting Systems
  - Incident Management Processes
- Understanding of Chaos Engineering and resilience validation practices.

Tools & Technologies
- Experience with:
  - ServiceNow
  - GRC Platforms
  - Harness
  - Microsoft Office Suite:
    - Excel
    - Word
    - PowerPoint
    - Visio
    - MS Project

Preferred Qualifications

Reporting & Analytics
- Experience with resilience metrics, reporting, and dashboarding.
- Knowledge of:
  - Power BI
  - Tableau
  - Advanced Excel
  - Crystal Reports

Cloud Certifications
- Preferred certifications include:
  - AWS Certified Cloud Practitioner
  - Microsoft Azure Fundamentals
  - Google Cloud Digital Leader
  - Cloud Architecture Certifications (Preferred)

Business Continuity & Disaster Recovery Certifications
- CBCP (Certified Business Continuity Professional)
- Disaster Recovery Institute (DRI) Certifications
- Business Continuity Certifications

Project Management
- PMP
- PRINCE2
- Agile Certifications

Soft Skills
- Excellent stakeholder management and communication skills.
- Strong analytical and problem-solving abilities.
- Ability to coordinate multiple teams across complex environments.
- Strong documentation and governance mindset.
- Detail-oriented with a focus on audit readiness and operational excellence.