ML Infrastructure Engineer

Mach9
Posted 2 months ago, valid for 18 days
Location

San Francisco, CA 94102, US

Salary

$160,000 - $200,000 per year

Contract type

Full Time

By applying, a Sonicjobs account will be created for you. Sonicjobs's Privacy Policy and Terms & Conditions will apply.

Sonic Summary

  • Mach9 is seeking a mid-career ML infrastructure engineer to build and maintain systems for AI models in civil engineering and surveying.
  • The role requires at least 3 years of relevant experience and lists a salary of $160,000 - $200,000 per year.
  • Key responsibilities include designing a centralized system for versioning training data, developing reliable ML training pipelines, and optimizing real-time model inference services.
  • Candidates should have a Bachelor's or Master's degree in Computer Science or Engineering (or equivalent experience), along with strong communication skills and hands-on experience with ML pipeline orchestration tools.
  • Bonus qualifications include familiarity with AWS, experience with containerized ML workflows, and knowledge of infrastructure-as-code tools.

The role

At Mach9, ML infrastructure engineers build and maintain the systems that power production AI models for civil engineering and surveying. Our ML pipeline spans 10,000+ miles of labeled survey data, image segmentation networks, and 3D prediction models serving real-time inference to surveyors and engineers in the field.

This role is ideal for mid-career ML infrastructure engineers with experience building for both training and inference.

You'll build pipelines that train deep transformer models on hundreds of terabytes of 3D point cloud and image data. You'll also architect our inference infrastructure, delivering both heavy offline detection algorithms and responsive real-time inference that integrates directly with our CAD software.

Responsibilities

  • Design and build a centralized system for versioning training data, generated datasets, and model artifacts, with full lineage tracking from raw source data through to trained model outputs.

  • Develop and maintain reliable, reproducible ML training and data generation pipelines.

  • Refactor and harden existing training and data generation scripts into composable, testable, and maintainable components.

  • Create CI/CD workflows for validating data pipelines and model training runs, including automated correctness checks and regression detection.

  • Build tooling that enables ML engineers to launch, monitor, and debug training jobs with minimal friction.

  • Optimize and scale real-time model inference services to meet latency and throughput requirements in production, including profiling, batching strategies, and resource-efficient serving.

  • Own the deployment path from trained model artifact to production endpoint, ensuring reliable rollouts, rollback, and monitoring.

Requirements

  • 3+ years of work experience in relevant fields.

  • Bachelor's or Master's degree in Computer Science, Engineering, or equivalent experience.

  • Strong communication skills and the ability to work closely with ML researchers and engineers to understand their workflows and translate them into robust systems.

  • Experience designing and building data versioning, artifact management, or dataset lineage systems (e.g., DVC, LakeFS, Weights & Biases, or custom solutions).

  • Hands-on experience with ML pipeline orchestration tools (e.g., Airflow, Prefect, Metaflow, or similar).

  • Experience with model serving and inference optimization — profiling latency, reducing memory footprint, or scaling serving infrastructure to meet real-time constraints.

  • Ability to read and refactor ML training code — you don't need to design model architectures, but you need to understand what training pipelines are doing well enough to make them reliable.

  • Proficiency with Python and PyTorch.

Bonus qualifications

  • Familiarity with AWS infrastructure services.

  • Experience with containerized ML workflows and GPU-accelerated training environments.

  • Experience with model optimization techniques (e.g., quantization, TensorRT, ONNX Runtime, distillation).

  • Knowledge of infrastructure-as-code tools (e.g., AWS CDK, Terraform).

  • Experience building or operating ML systems that handle large unstructured datasets (imagery, 3D data, sensor data).



