Own compiler, kernel, and memory-layout capabilities that turn hardware features into measurable inference performance.
Responsibilities
- Serve as the core owner of compiler, kernel, and memory-layout capabilities, from planning and design through implementation.
- Build low-level software capabilities for the hardware execution stack, including compilation abstractions, code generation, critical-path optimization, and low-level interfaces.
- Implement and optimize critical LLM inference operators and memory-access paths to convert hardware capability into real performance gains.
- Design and implement data layout, memory placement, execution paths, and optimization mechanisms around hardware memory characteristics.
- Work with runtime, silicon architecture, and systems-software teams to define boundaries and interfaces between compiler, runtime, kernels, and hardware.
- Drive performance iteration and debugging on prototype hardware, and turn the results into reusable engineering assets.
Requirements
- PhD in computer science, electronic engineering, automation, mathematics, compilers, high-performance computing, or a related field.
- Strong foundation in compilers, program optimization, or high-performance systems, with a working understanding of IR, lowering, bufferization, codegen, memory hierarchy, tiling, and fusion.
- Strong low-level implementation skills in C/C++ and familiarity with one or more of CUDA, Triton, LLVM/MLIR, TVM, IREE, XLA, or CUTLASS.
- Clear understanding of LLM inference hot paths and of the performance bottlenecks in attention, KV cache, MoE, and related operators.
- Ability to independently design, implement, debug, and optimize complex kernels or compiler modules.
- Engineering ownership and willingness to iterate against real workloads and real hardware constraints.