Description
The Special Projects team at Apple is developing novel experiences powered by state-of-the-art agentic vision-language models that incorporate visual context into conversational interaction. We are looking for a Machine Learning Engineer to help us build, fine-tune, and rigorously evaluate these systems. A successful candidate has hands-on experience with vision-language models, knows how to translate ambiguous product requirements into measurable evaluation criteria, and is excited to work at the intersection of multimodal modeling and agentic AI.
Minimum Qualifications
BA or Master’s degree in Computer Science or Machine Learning
2+ years of hands-on experience building and evaluating generative AI or multimodal models
Experience working with vision-language models or multimodal systems
Proficiency in Python and ML frameworks (PyTorch or TensorFlow)
Preferred Qualifications
PhD in Computer Science, Machine Learning, Statistics, or another STEM field
Prior industry internship or research experience applying ML to product use cases
Experience with video understanding, temporal reasoning, or activity recognition
Familiarity with agentic system design, including tool use, grounding, or perceive-act loops
Experience building or working with large-scale multimodal data and annotation pipelines
Proficiency in training, fine-tuning, and evaluation of foundation models and frameworks
Publications or technical presentations in Machine Learning journals or conferences
Excellent communication skills and cross-functional collaboration
