Software Engineering Manager, Triage Services and Infrastructure

Apple
Posted 2 months ago, valid for 23 days
Location: Cupertino, CA 95015, US

Salary: Competitive

Contract type: Full Time

By applying, a Sonicjobs account will be created for you. Sonicjobs's Privacy Policy and Terms & Conditions will apply.


Sonic Summary

  • The Core OS team is looking for an engineering manager to lead efforts in enhancing the reliability of Apple's operating systems.
  • The role involves managing a team that addresses kernel panics and system-level issues across macOS, iOS, watchOS, and tvOS.
  • Candidates should have a strong background in operating system internals, experience managing complex technical initiatives, and a minimum of 5 years of experience.
  • The position offers a salary range of $150,000 to $200,000, depending on experience and qualifications.
  • Preferred qualifications include experience with AI/ML for automated triage and large-scale telemetry systems.

The Core OS team is seeking an exceptional engineering manager to lead the team responsible for enabling Apple's operating systems to achieve world-class reliability. This team develops and owns mission-critical tools and services that detect, analyze, and classify kernel panics and low-level crashes across all Apple platforms. You will partner with engineering teams across the Software, Hardware, and Silicon groups to deliver rock-solid OS reliability for over 2 billion currently active Apple devices and to shape the future of system reliability across Apple's entire product ecosystem.

Description


  • Lead a team of engineers triaging kernel panics and critical system-level issues across all Apple platforms (macOS, iOS, watchOS, tvOS).
  • Build intelligent automation pipelines that analyze, group, and prioritize failure signatures based on their reliability impact.
  • Mentor engineers to design and develop advanced systems diagnostic and at-scale debug services to realize the vision of zero-iteration debugging and fully automated triage and root cause analysis.
  • Develop telemetry-based dashboards to monitor at-scale panic/crash triage and analysis services to ensure they are working as expected and efficiently.
  • Collaborate with Core OS, Hardware, Silicon, and other engineering teams to champion and advance improvements in debuggability, panic data quality, symbolication, and automation of triage and debug workflows.

Minimum Qualifications


  • Demonstrated track record of building and scaling high-performing engineering teams
  • Passion for solving challenging technical problems that directly impact millions of users
  • Strong communication skills with the ability to influence technical direction across organizational boundaries
  • Experience managing complex, multi-platform technical initiatives with measurable reliability improvements
  • Strong technical depth in operating system internals will be helpful
  • BS/MS in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience

Preferred Qualifications


  • Experience applying AI/ML for automated triage and reliability services is preferred
  • Experience with large-scale telemetry systems processing millions of events daily is preferred


