Staff LLM Systems Engineer

New York

USA

$300,000.00/year

Permanent

Artificial Intelligence

Location: United States (West Coast preferred, remote considered)

About the Company

We are a rapidly growing AI company delivering large language models at scale. Our mission is to ensure models not only perform well in research but also serve real-world applications reliably and efficiently. We are looking for engineers who enjoy solving high-scale inference and systems challenges.


Role Overview

We are seeking a Senior / Staff LLM Systems Engineer to lead the development, optimization, and deployment of large language model inference pipelines. This role focuses on high-throughput, low-latency serving and production reliability, bridging ML research and platform engineering.

This is not a training-focused role – the emphasis is on serving models at scale, optimizing systems, and enabling production ML reliability.


Responsibilities

  • Design, implement, and optimize inference pipelines for large language models
  • Improve throughput and latency of model serving in production environments
  • Collaborate closely with infrastructure, platform, and ML research teams to ensure smooth deployment
  • Build monitoring, observability, and alerting systems for inference performance and reliability
  • Identify and solve scaling challenges across GPUs, TPUs, or distributed environments
  • Evaluate and adopt new technologies, frameworks, and architectures to improve inference efficiency
  • Mentor other engineers and contribute to technical strategy for production ML systems

Qualifications

  • 5+ years of software engineering experience, including hands-on ML systems experience
  • Strong background in distributed systems, performance tuning, and low-latency architectures
  • Experience with model serving frameworks (e.g., Triton Inference Server, vLLM, Ray Serve, TorchServe)
  • Familiarity with GPU/TPU infrastructure, multi-node deployment, and system-level optimization
  • Understanding of ML workloads and trade-offs between accuracy, latency, and cost
  • Proven ability to deliver production-grade ML systems at scale
  • Excellent collaboration and problem-solving skills

Why You’ll Enjoy This Role

  • Work on cutting-edge LLM inference systems at scale
  • Solve technically challenging, high-impact engineering problems
  • Collaborate with top ML researchers and platform engineers
  • Competitive compensation and flexible work arrangements

Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.

Reece Waldon

