Performance Engineer

Netflix
Posted 3 months ago, valid for 7 days
Location

Milpitas, Santa Clara County, CA 95036

Salary

$230,000 per year

Contract type

Full Time

Retirement Plan
Paid Time Off
Flexible Spending Account

By applying, a Sonicjobs account will be created for you. Sonicjobs's Privacy Policy and Terms & Conditions will apply.

SonicJobs' Terms & Conditions and Privacy Policy also apply.

Sonic Summary

  • Netflix is seeking a highly experienced Performance Engineer to optimize GPU infrastructure efficiency and large-scale AI/ML workloads.
  • The role requires 10+ years of experience in systems performance analysis and optimization, particularly with large-scale distributed systems.
  • The salary range for this position is $499,000.00 to $900,000.00, with the option to choose between salary and stock options each year.
  • Responsibilities include driving performance optimization, collaborating with ML Platform teams, and resolving complex performance bottlenecks.
  • Netflix offers comprehensive benefits, including health plans, retirement plans, and flexible time off, while fostering a diverse and inclusive work environment.

About the team

Netflix has built a reputation as a world leader in Performance Engineering, and the Netflix Performance Tech blog is one of the de facto resources in the industry. We are a small, seasoned team with an outsized impact across Netflix engineering: we act as trusted experts to countless teams and get involved in the hardest problems and most critical projects to ensure Netflix always delivers the very best performance to its customers.

About the role

We are looking for a highly experienced Performance Engineer to join our team, focusing on the critical area of GPU infrastructure efficiency and the optimization of large-scale AI/ML workloads. This role is essential to managing our rapidly growing computational footprint, ensuring we deliver maximum performance while optimizing cost and resource utilization. You will be a trusted expert, working at the intersection of infrastructure, ML platforms, and core engineering to drive meaningful impact across the organization.

What you will do:

  • Drive efficiency and performance optimization across our large-scale infrastructure.

  • Collaborate with ML Platform and Data Science teams to build and enhance comprehensive profiling, tracing, and observability capabilities for GPU workloads.

  • Analyze and resolve complex performance bottlenecks across the entire stack, including hardware, drivers, OS, Kubernetes/scheduling, networking, storage, and application code.

  • Evaluate and guide the adoption of new GPU architectures, interconnects, and cloud vendor services to maximize performance and cost efficiency within Netflix's AI/ML ecosystem.

  • Share knowledge by documenting best practices, contributing to the Netflix Tech Blog, and presenting at industry and vendor forums.

Must-Have Skills:

  • 10+ years of experience in systems performance analysis and optimization with a focus on large-scale distributed systems.

  • Deep understanding of GPU architecture, kernels, and ML frameworks.

  • Experience in building and using CPU and GPU profiling and other performance analysis tools.

  • Expertise in identifying and resolving performance bottlenecks within the AI/ML infrastructure and software stack.

  • Experience with container orchestration platforms such as Kubernetes.

  • Experience with performance analysis and optimization in a multi-tenant, cloud-native environment.

  • Strong programming skills in languages such as Python and Java.

Nice-to-Have Skills:

  • Experience with large language model (LLM) serving and training optimization techniques.

  • Understanding of Linux internals such as resource scheduling, memory management, and I/O for GPU-intensive workloads.

  • Experience with the performance analysis of high-speed networking protocols and interconnect technologies, such as InfiniBand and NVLink.

  • Experience with capacity engineering and cost optimization in a major public cloud environment.

  • Proven track record of contributing to open-source performance tools or research in the field.



