Staff Production Engineer
Groq
Remote
USD 236,360-278,070 / year + Equity
Posted on Sep 26, 2025
Cloud Inf, Sys SW & Model Mgmt
Full-time
About Groq
Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. From our Bay Area roots to our growing global presence, we are on a mission to make high performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.
Mission
Join the team that builds and operates Groq’s real-time, distributed inference system delivering large-scale inference for LLMs and next-gen AI applications at ultra-low latency. As a Low-Level Production Engineer, your mission is to ensure reliability, fault tolerance, and operational excellence in Groq’s LPU-powered infrastructure. You’ll work deep in the stack—bridging distributed runtime systems with the hardware—to keep Groq systems fast, stable, and production-ready at scale.
Responsibilities & Opportunities in this role
- Production Reliability: Operate and harden Groq’s distributed runtime across thousands of LPUs, ensuring uptime and resilience under dynamic global workloads.
- Low-Level Debugging: Diagnose and resolve hardware-software integration issues in live environments, from events at the data center level to single-component failures.
- Observability & Diagnostics: Build tools and infrastructure to improve real-time system monitoring, fault detection, and SLO tracking.
- Automation & Scale: Automate deployment workflows, failover systems, and operational playbooks to reduce overhead and accelerate reliability improvements.
- Performance & Optimization: Profile and tune production systems for throughput, latency, and determinism—every cycle counts.
- Cross-Functional Collaboration: Partner with compiler, hardware, infra, and data center teams to deliver robust, fault-tolerant production systems.
Ideal candidates have/are:
- Proven experience in production engineering across the stack and operating large-scale distributed systems.
- Deep knowledge of computer architecture, operating systems, and hardware-software interfaces.
- Skilled in low-level systems programming (C/C++ or Rust), with scripting fluency (Python, Bash, or Go).
- Comfortable debugging complex issues close to the metal—kernels, firmware, or hardware-aware code paths.
- Strong background in automation, CI/CD, and building reliable systems that scale.
- Thrive across environments—from kernel internals to distributed runtimes to data center operations.
- Communicate clearly, make pragmatic decisions, and take ownership of long-term outcomes.
Nice to have:
- Experience operating high-performance, real-time systems at scale (ML inference, HPC, or similar).
- Familiarity with GPUs, FPGAs, or ASICs in production environments.
- Prior exposure to ML frameworks (e.g., PyTorch) or compiler tooling (e.g., MLIR).
- Track record of delivering complex production systems in high-impact environments.
Compensation: At Groq, a competitive base salary is part of our comprehensive compensation package, which includes equity and benefits. For this role, the base salary range is $236,360 to $278,070, determined by your location, skills, qualifications, experience, and internal benchmarks. This range is specific to roles in the United States; compensation for candidates outside the United States will depend on the local market.
Groq is an Equal Opportunity Employer. We are committed to creating an inclusive environment for all employees and applicants. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex (including gender identity, sexual orientation, and pregnancy), age, disability, genetic information, protected veteran status, or any other characteristic protected by applicable law.
Groq complies with all applicable federal, state, and local laws governing nondiscrimination in employment. We do not tolerate discrimination or harassment based on any protected characteristic.
Groq is committed to working with and providing reasonable accommodations to qualified individuals with physical or mental disabilities. If you require a reasonable accommodation to complete an application or to participate in the hiring process, please contact us at talent@groq.com. This contact is for accommodation requests only, which will be considered on a case-by-case basis.
All offers of employment are contingent upon verification of the applicant’s identity and employment authorization in accordance with federal law.
Groq encourages people with criminal record histories to apply for employment, and values diverse experiences, including prior contact with the criminal legal system. To that end, Groq welcomes such applicants in accordance with the California Fair Chance Act, Los Angeles City Fair Chance Act Ordinance, Los Angeles County Fair Chance Act Ordinance, and San Francisco Fair Chance Act Ordinance. Philadelphia applicants can review information pertaining to Philadelphia’s Fair Criminal Record Screening Standards Ordinance here: https://www.phila.gov/documents/fair-chance-hiring-law-poster.
Req ID: R633