Senior Observability Engineer
Groq
Remote
USD 214,952-278,070 / year + Equity
Posted on Oct 9, 2025
Compute, Storage & Eng Infra
Full-time
About Groq
Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. From our Bay Area roots to our growing global presence, we are on a mission to make high-performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.
Mission:
Ensure the reliability, scalability, and performance of Groq’s observability tools and services for provisioning and managing the full lifecycle of Groq hardware, software, and networking systems at massive scale.
Responsibilities & opportunities in this role:
- Build and maintain comprehensive observability systems at massive scale. Obsess over running high-quality production systems with excellent uptime that engineers can trust.
- Constantly iterate on, maintain, update, automate, and dogfood your own systems. Put in place strong monitoring of your own systems that can serve as a best-practice model for the rest of the organization.
- Instrument Kubernetes clusters, applications, and datacenter infrastructure components such as switches, PDUs, environmental sensors, cameras, and chillers.
- Familiarity with, and strong opinions on, the core telemetry signals:
  - Effective canonical logging and log cost control
  - Tracing expertise, including context propagation, tail-sampling strategies, attribute enrichment, and querying
  - Metrics derived from a variety of sources such as hosts, kube-state-metrics, kubelet, IPMI, and SNMP
- Be a teacher: advise teams on instrumenting their applications in a variety of languages (Rust, C++, TypeScript, Go), implementing sensible SLO and alerting strategies, and following on-call best practices.
- Be a student: Groq is uniquely vertically integrated. You will be challenged with tasks in unfamiliar domains, and constantly expand your knowledge of technologies ranging from networking to FPGA design.
Ideal candidates have:
- 4+ years of experience with observability as a core responsibility in previous roles
- Deep understanding of cloud-native technologies and infrastructure as code (IaC) and GitOps tooling such as Terraform and Flux
- Experience instrumenting large Kubernetes clusters and building operators
- Expertise in standing up and running monitoring, observability, and alerting systems such as OpenTelemetry tracing and the OpenTelemetry Collector, Grafana/Prometheus, PagerDuty, and Alertmanager, along with IPMI and SNMP
- Strong analytical and problem-solving skills with a focus on root cause analysis and mitigation
- Excellent communication and teamwork skills with the ability to collaborate effectively across engineering teams
Attributes of a Groqster:
- Humility – Egos are checked at the door
- Collaborative & Team Savvy – We make up the smartest person in the room, together
- Growth & Giver Mindset – Learn-it-all versus know-it-all; we share knowledge generously
- Curious & Innovative – Take a creative approach to projects, problems, and design
- Passion, Grit, & Boldness – No-limit thinking, fueling informed risk taking
Compensation: At Groq, a competitive base salary is part of our comprehensive compensation package, which includes equity and benefits. For this role, the base salary range is $214,952 to $278,070, determined by your location, skills, qualifications, experience, and internal benchmarks. This range is specific to roles in the United States; compensation for candidates outside the United States will depend on the local market.
Groq is an Equal Opportunity Employer. We are committed to creating an inclusive environment for all employees and applicants. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex (including gender identity, sexual orientation, and pregnancy), age, disability, genetic information, protected veteran status, or any other characteristic protected by applicable law.
Groq complies with all applicable federal, state, and local laws governing nondiscrimination in employment. We do not tolerate discrimination or harassment based on any protected characteristic.
Groq is committed to working with and providing reasonable accommodations to qualified individuals with physical or mental disabilities. If you require a reasonable accommodation to complete an application or to participate in the hiring process, please contact us at talent@groq.com. This contact is for accommodation requests only, which will be considered on a case-by-case basis.
All offers of employment are contingent upon verification of the applicant’s identity and employment authorization in accordance with federal law.
Groq encourages people with criminal record histories to apply for employment, and values diverse experiences, including prior contact with the criminal legal system. To that end, Groq welcomes such applicants in accordance with the California Fair Chance Act, Los Angeles City Fair Chance Act Ordinance, Los Angeles County Fair Chance Act Ordinance, and San Francisco Fair Chance Act Ordinance. Philadelphia applicants can review information pertaining to Philadelphia’s Fair Criminal Record Screening Standards Ordinance here: https://www.phila.gov/documents/fair-chance-hiring-law-poster.
Req ID: R652