Inference engineer
Mirai On Device AI
We're looking for engineers who can bridge the gap between ML research and high-performance inference.
You'll work across our inference engine and model conversion toolkit, implementing new model architectures, supporting new modalities, writing optimized kernels, and building a wide range of features such as function calling and batch decoding.
This role is ideal for someone who reads papers for fun, enjoys writing high-performance code, and gets excited about constant learning.
Nobody knows everything. We'd rather you know one area deeply than everything superficially. If you're strong in at least a couple of these areas, you're a great fit:
JAX / Equinox / Pallas stack
Rust systems programming with a focus on developer experience
Writing Metal / Vulkan kernels
Neural codecs and voice model architectures
Trellis-based quantization approaches
Advanced speculative decoding methods, such as EAGLE
Deep understanding of Transformer / SSM / diffusion / vision-language model architectures
Benchmarking inference performance and model quality
A strong grounding in linear algebra, optimization methods, and probability theory
And of course, solid engineering fundamentals: we will ship a lot of code 🙃
We welcome applications from students and early-career engineers. If you've worked on projects that demonstrate systems thinking and ML understanding, we want to hear from you!