Responsibilities
- Maximize server fleet uptime and utilization by using data to understand hardware failure conditions and their root causes. Identify trends and systemic issues in the fleet and drive their resolution
- Collaborate with stakeholders and subject matter experts to interpret business and operational needs, and articulate success criteria in partnership with engineering and field-based operations teams
- Build cross-functional relationships and influence policies and procedures to improve global data center operations
- Mentor team members to evaluate and identify better ways to resolve issues and define updates to tools and processes
- Write and review code, develop documentation, and debug the hardest problems, live, on some of the largest and most complex systems in the world
- Participate in defining diagnostic tooling requirements with multiple cross-functional support teams
- Execute validation and verification activities for new product integration, including system-level testing
- Collaborate consistently with cross-functional tooling teams to help determine root cause and provide input into their development process, with an operations-centric view of how open issues are affecting the fleet
- Ability to travel up to 25% of the time is required
Minimum Qualifications
- Engineering degree or commensurate experience
- 7+ years of experience in systems infrastructure operations or related field
- Experience in configuration and maintenance of applications such as web servers, load balancers, relational databases, storage systems and messaging systems
- Experience coding in higher-level languages (e.g., Python, PHP, C++, or Java)
- Experience learning software, frameworks and APIs
- Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
- Experience with Linux systems
$144,000/year to $204,000/year + bonus + equity + benefits
