
Machine Learning Architect - Conversational Speech

Apple
Posted 2 months ago, valid for 19 days
Location

Cupertino, CA 95015, US

Salary

Competitive

Contract type

Full Time

By applying, a SonicJobs account will be created for you. SonicJobs' Privacy Policy and Terms & Conditions will apply.

Sonic Summary

  • The Speech organization within Siri is seeking a Machine Learning Architect with over 10 years of experience in machine learning applied to speech or multimodal systems.
  • This senior technical leader will define the modeling strategy and technical direction across the Speech organization, influencing the evolution of Apple's speech technologies.
  • The role requires hands-on expertise in modern deep learning and significant experience with multimodal LLMs, as well as the ability to tackle complex technical challenges.
  • Preferred qualifications include a Ph.D. in a related field and experience with full-duplex natural conversational systems and large-scale distributed training.
  • The position offers a competitive salary and the opportunity to shape the trajectory of conversational speech technology within Apple's ecosystem.
The Speech organization within Siri is at the forefront of building the technologies that power conversational AI, speech recognition, speech synthesis, and speech-to-speech experiences across Apple's entire ecosystem. Our mission is to develop cutting-edge models, infrastructure, and datasets that enable Siri, dictation, and Apple Intelligence features to deliver natural, intelligent, and deeply personalized speech interactions for billions of users worldwide.

We are seeking a Machine Learning Architect to serve as a senior technical leader spanning the full Speech organization. In this role, you will set the future modeling direction for all of conversational speech, charting the architectural and algorithmic course for how Apple's speech technologies evolve over the coming years. You will operate as a hands-on expert who not only defines strategy but also digs into the hardest technical problems, working shoulder-to-shoulder with teams to overcome critical obstacles and unlock breakthroughs. Reporting directly to the Speech organization leadership, you will have broad visibility and influence across speech recognition, synthesis, dialog, multimodal foundation models, and speech-to-speech systems, ensuring coherent technical vision and cross-team alignment.

We believe the most impactful advances in deep learning emerge when world-class research is anchored in real-world production needs at scale. This role offers a rare opportunity to shape the trajectory of conversational speech technology across Apple's software, hardware, and services, improving speech interaction experiences for Apple's customers around the world.

Description


As the Machine Learning Architect for Conversational Speech, you will:
  • Define modeling strategy and technical direction across the Speech organization, establishing a unified architectural vision for speech recognition, speech synthesis, dialog systems, multimodal foundation models, and speech-to-speech technologies.
  • Serve as the organization's foremost modeling expert, providing deep technical guidance and mentorship to multiple teams of researchers and engineers working on distinct but interconnected speech capabilities.
  • Identify and drive solutions to the most challenging technical problems, rolling up your sleeves to prototype, debug, and iterate on novel approaches when teams encounter critical obstacles.
  • Evaluate emerging research and industry trends (e.g., advances in large language models, multimodal architectures, full-duplex natural conversational systems) and translate them into actionable roadmaps aligned with Apple's product and platform priorities.
  • Drive cross-team technical alignment, ensuring that modeling choices, training methodologies, data strategies, and infrastructure investments are coherent and mutually reinforcing across the organization.
  • Champion production-readiness, bridging the gap between research innovation and deployed systems by ensuring that architectural decisions account for on-device constraints, latency requirements, scalability, robustness, and real-world data conditions.
  • Collaborate broadly with partner teams across Siri, Apple Intelligence, hardware, and platform engineering to ensure speech modeling investments are well integrated into Apple's broader AI and product strategy.
  • Contribute to the broader ML and speech research community through publications, patents, and engagement with the state of the art.

Minimum Qualifications


  • 10+ years of experience in machine learning applied to speech or multimodal systems, with progressively increasing technical scope and leadership.
  • Demonstrated expertise as a technical leader or architect who has defined modeling direction across multiple teams or product areas, not solely an individual contributor on a single workstream.
  • Deep, hands-on proficiency in modern deep learning, including large language models and end-to-end speech systems.
  • Significant experience with multimodal LLMs, including architecture design, training, adaptation, and deployment of models that integrate speech, audio, and text modalities.
  • Direct experience building speech-to-speech conversational systems, with a strong understanding of full-duplex natural conversational interaction and end-to-end speech pipelines.
  • A track record of translating research into production-quality systems at scale.
  • Expert programming skills in Python and deep learning frameworks such as PyTorch, JAX, or TensorFlow.
  • Proven ability to diagnose and resolve complex, cross-cutting technical challenges spanning model architecture, training methodology, data quality, and systems integration.

Preferred Qualifications


  • Ph.D. in Computer Science, Electrical Engineering, Machine Learning, or a closely related field.
  • Experience architecting or leading development of full-duplex natural conversational systems, speech-to-speech models, or multimodal foundation models that have shipped to large-scale user populations.
  • Deep familiarity with the full stack of speech technologies, including ASR, TTS, spoken dialog, speaker modeling, and audio understanding, and an ability to reason about their interactions and dependencies.
  • Experience with large-scale distributed training and the infrastructure considerations that shape model design at scale.
  • A data-centric perspective on foundation model development, including experience guiding data collection, curation, annotation, and quality strategies.
  • Track record of influencing technical direction across organizational boundaries, including the ability to build consensus, communicate complex trade-offs clearly, and drive alignment among diverse stakeholders.
  • Experience with on-device ML deployment, including model compression, quantization, and latency-aware architecture design.
  • Demonstrated ability to mentor and elevate senior technical talent, raising the bar for modeling excellence across an organization.


