Description
- Own and support ML compute management for Apple’s inference workloads (GPU, TPU, and custom silicon) to enable large-scale model serving.
- Collaborate closely with Apple Intelligence and ML engineering teams to understand roadmaps and resource pain points, and to develop and implement resource strategies.
- Optimize Apple’s ML workloads by driving performance improvements, maximizing resource utilization, and reducing service costs through deep root-cause analysis that shapes both engineering decisions and the end customer experience.
- Architect solutions for large-scale optimization problems, including capacity allocation, workload scheduling, and cost reduction, enabling Apple’s AI-driven experiences.
- Advocate on behalf of Apple’s ML engineers, bringing a consolidated view of ML platform and model inference requirements to Apple’s internal infrastructure platform providers and third-party public cloud providers.
Minimum Qualifications
- MS or PhD in a relevant field
- Direct experience with foundation model serving, inference, and training at scale
- Familiarity with PyTorch, JAX, cluster management (Slurm, Kubernetes), or GPU/TPU hardware
- Prior experience in efficiency, FinOps, or capacity planning
- Experience negotiating technical roadmaps with platform or infrastructure teams
- Background in technical and financial decision-making (TCO modeling, cost optimization)
