Undergraduate · Computer Science
I am an undergraduate Computer Science student with a deep interest in ML Systems and Performance Engineering. I spend most of my time trying to understand how large language models actually run on hardware: what makes inference fast, what makes it slow, and what can be done about it.
My research interests are centered on LLM inference optimization, CUDA kernel engineering, GPU memory systems, and efficient AI deployment. I am particularly drawn to the intersection of systems and machine learning — the layer where software meets hardware and where performance decisions have real consequences.
I am currently building my foundations through a structured self-study path covering GPU architecture, CUDA programming, distributed training, and inference systems. Alongside that, I work on hands-on projects benchmarking and optimizing inference pipelines, profiling GPU kernels, and experimenting with quantization techniques like INT8 and AWQ.
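To make the quantization work above concrete, here is a minimal sketch of symmetric absmax INT8 quantization, the basic scheme that methods like AWQ refine. This is a toy illustration with made-up function names, not the actual project code:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric absmax quantization: map floats into the int8 range [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per element is at most half a quantization step (scale / 2).
print(np.abs(w - w_hat).max())
```

The appeal in an inference setting is that the weights take 4x less memory than FP32 (2x less than FP16), which directly reduces the memory bandwidth that dominates LLM decoding.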
Outside of technical work, I believe in learning in public — sharing what I build, what I measure, and what I learn along the way.
Check out my Readings & Blogs sections. Let's connect: saranshappy@gmail.com · Twitter/X · GitHub
A Retrieval-Augmented Generation chatbot for querying arXiv research papers. Built with LangChain, NVIDIA AI Endpoints, FAISS vector store, and Gradio.
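The retrieval step at the heart of such a pipeline can be sketched in a few lines. This toy version uses plain NumPy cosine similarity in place of FAISS, and the embeddings are invented for illustration; in the real project they come from an embedding model and live in a FAISS index:

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 2):
    """Return indices of the k document embeddings most similar to the query (cosine)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity of each doc to the query
    return np.argsort(-sims)[:k]      # indices sorted by descending similarity

# Toy "embeddings" for three paper abstracts.
docs = np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.8, 0.1],
                 [0.0, 0.2, 0.9]], dtype=np.float32)
query = np.array([0.85, 0.15, 0.05], dtype=np.float32)
print(top_k(query, docs))  # nearest abstracts first
```

The retrieved chunks are then stuffed into the prompt so the LLM answers grounded in the papers rather than from memory alone.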
Intelligent graceful degradation middleware for Node.js/Express apps. Routes requests to reduced-functionality fallbacks under load, using a physics-inspired priority model. 583 tests across 12 suites.
B.Tech in Computer Science
ABES Engineering College · 2023 – 2027