The Core OS team is seeking an exceptional engineering manager to lead the team responsible for enabling Apple's operating systems to achieve world-class reliability. This team develops and owns mission-critical tools and services that detect, analyze, and classify kernel panics and low-level crashes across all Apple platforms. You will partner with engineering teams across the Software, Hardware, and Silicon groups to deliver rock-solid OS reliability for more than 2 billion active Apple devices and to shape the future of system reliability across Apple's entire product ecosystem.

Description

Lead a team of engineers triaging kernel panics and critical system-level issues across all Apple platforms (macOS, iOS, watchOS, tvOS).
Build intelligent automation pipelines that analyze, group, and prioritize failure signatures based on their reliability impact.
Mentor engineers to design and develop advanced system diagnostics and at-scale debug services to realize the vision of zero-iteration debugging and fully automated triage and root cause analysis.
Develop telemetry-based dashboards that monitor the at-scale panic/crash triage and analysis services to ensure they run correctly and efficiently.
Collaborate with Core OS, Hardware, Silicon, and other engineering teams to champion and advance improvements in debuggability, panic data quality, symbolication, and automation of triage and debug workflows.

Minimum Qualifications

Demonstrated track record of building and scaling high-performing engineering teams
Passion for solving challenging technical problems that directly impact millions of users
Strong communication skills with the ability to influence technical direction across organizational boundaries
Experience managing complex, multi-platform technical initiatives with measurable reliability improvements
Strong technical depth in operating system internals will be helpful
BS/MS in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience

Preferred Qualifications

Experience applying AI/ML for automated triage and reliability services is preferred
Experience with large-scale telemetry systems processing millions of events daily is preferred


Software Engineering Manager, Triage Services and Infrastructure
