... compute, memory, and scheduling. 4. Use and extend Triton / CUDA / CUTLASS, and integrate optimized kernels with PyTorch / XLA ... and execution models. 3. Proficiency in CUDA C++ or Triton, with the ability to independently write and optimize kernels. 4. ...
... Implement and tune CUDA kernels and GPU-accelerated components to maximize throughput and minimize latency for inference ... cache-aware design, avoiding fragmentation, RAII, move semantics). Practical experience with CUDA (or similar GPU ...
Research Scientist, Electronic Design Automation - New College Grad 2026
... Strong Programming & Systems Skills: Proficiency in at least two of Python, PyTorch, C++, or CUDA. Publications in top EDA and AI/ML ...
Nvidia - Sat, 07 Feb 2026 00:07:35 GMT