Own compiler, kernel, and memory-layout capabilities that turn hardware features into measurable inference performance.

Serve as the core owner for compiler, kernel, and memory-layout capabilities, from planning and design to implementation.
Build low-level software capabilities for the hardware execution stack, including compilation abstractions, code generation, critical-path optimization, and low-level interfaces.
Implement and optimize critical LLM inference operators and memory-access paths to convert hardware capability into real performance gains.
Design and implement data layout, memory placement, execution paths, and optimization mechanisms around hardware memory characteristics.
Work with runtime, silicon architecture, and systems-software teams to define boundaries and interfaces between compiler, runtime, kernels, and hardware.
Drive performance iteration, debugging, and engineering reuse in prototype hardware environments.

PhD in computer science, electronic engineering, automation, mathematics, compilers, high-performance computing, or a related field.
Strong foundation in compilers, program optimization, or high-performance systems, with understanding of IR, lowering, bufferization, codegen, memory hierarchy, tiling, and fusion.
Strong low-level implementation skills in C/C++ and familiarity with one or more of CUDA, Triton, LLVM/MLIR, TVM, IREE, XLA, or CUTLASS.
Clear understanding of LLM inference hot paths and performance bottlenecks in attention, KV cache, MoE, and related operators.
Ability to independently design, implement, debug, and optimize complex kernels or compiler modules.
Engineering ownership and willingness to iterate against real workloads and real hardware constraints.

Postdoctoral Researcher - LLM Compiler