Military Spouse Connection Jobs

Job Information

Onit Site Reliability Engineer in Pune, India

Site Reliability Engineer

Onit, Inc. is looking for a Site Reliability Engineer L2 to join our Core Infrastructure team. This role will help ensure the reliability of a diverse set of applications across our AWS infrastructure. To be successful in this role, you will need to collaborate and pair with team members, have strong technical skills, and bring a passion for technology. The individual we seek is skilled in observability and excellent at troubleshooting and problem solving. You must be able to multitask in a fast-paced environment and be a self-starter who can work independently.

Responsibilities:

  • Troubleshoot deployment failures and infrastructure issues across our full AWS infrastructure stack (EKS, RDS, ..). This includes dev, test, and production environments.

  • Create and maintain monitors for uptime and performance using Datadog, CloudWatch and other monitoring tools.

  • Find ways to help reduce errors in systems and reduce noise in monitors and alerts

  • Work with others on user stories to improve system health

  • Help create and prioritize work / stories

  • Participate in standups with the US and India teams

  • Help define runbooks and automation to solve production problems

  • Troubleshoot applications from a configuration and logging perspective

  • Assist with responding to and analyzing security events from security tooling

  • Help train others to take on SRE responsibilities

  • Assist with performance optimization by identifying performance bottlenecks and recommending improvements

  • Verify systems are monitored, backed up, and following best practices ... via audits and automation

  • Investigate how to take better advantage of the tools we use for monitoring, security, …

Requirements:

  • Bachelor's degree in computer science or equivalent experience is required.

  • 3+ years of experience with the following:

      • AWS (EC2, EKS, ECS, S3, RDS, CloudWatch, CloudTrail, IAM, AWS CLI, etc.). Experience with containers and EKS is a must.

      • Linux (CentOS, Amazon Linux, Ubuntu, ..)

      • Git source code management (GitLab, GitHub)

      • Bash shell scripting or other scripting / programming experience

      • SaaS-based web application experience

      • Relational database performance and monitoring (Postgres RDS preferred)

      • Experience with Jenkins or similar CI/CD tooling

  • A solid understanding of the components that make up production systems (memory, CPU, disk space, disk I/O, network I/O, etc.) is required.

  • Strong experience with monitoring, alerting, and log aggregation tools: Datadog, AWS CloudWatch, PagerDuty, Statuspage.

  • Ability to read and interpret application server logs, outputs, CloudTrail and other critical logging output

  • Excellent troubleshooting skills required.

Nice to Have Skills:

  • Prior application coding and debugging experience (Ruby, Python, etc.)

  • Terraform and/or CloudFormation

  • Experience troubleshooting application integrations

  • Other technologies: Cloudflare, AWS GuardDuty, CrowdStrike
