Responsibilities
- Design and develop systems software for managing, provisioning, and monitoring large-scale production hardware infrastructure, including compute, storage, and networking components
- Build and maintain tooling for hardware lifecycle management, fleet health monitoring, and automated remediation of production system failures
- Collaborate with hardware engineering teams to define software interfaces and firmware integration requirements for new server and accelerator platforms
- Develop and optimize low-level systems software including kernel modules, device drivers, and platform management agents to improve hardware utilization and reliability
- Architect scalable infrastructure automation frameworks that reduce manual operational toil and accelerate hardware deployment across Meta's data center fleet
- Identify and resolve systemic reliability and performance issues across production hardware by analyzing telemetry, failure patterns, and system-level diagnostics
- Define technical direction for production systems software components, driving alignment across infrastructure engineering and data center operations stakeholders
- Mentor other engineers on systems software design patterns, debugging methodologies, and production infrastructure best practices
- Lead cross-functional efforts to evaluate and integrate new hardware technologies into the production environment, including bring-up, validation, and qualification workflows
Minimum Qualifications
- Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience
- 6+ years of experience in systems software engineering, including development in C, C++, or Python for Linux-based production environments
- 6+ years of experience with large-scale infrastructure systems, including hardware lifecycle management, fleet automation, or data center operations software
- Experience developing or integrating with low-level systems components such as kernel interfaces, BMC/IPMI/Redfish management stacks, or hardware telemetry frameworks
- Experience designing and operating distributed systems software at scale, including monitoring, alerting, and automated remediation pipelines
- Experience communicating technical decisions and system designs through written documentation and cross-functional stakeholder alignment
Preferred Qualifications
- Experience working on hardware/software projects in the manufacturing and hardware validation space
- Experience with large-scale distributed systems
- Familiarity with test automation frameworks and CI/CD pipelines
- Strong debugging and troubleshooting skills across hardware and software boundaries
$144,000/year to $204,000/year + bonus + equity + benefits
