Undergraduate · Computer Science
I am an undergraduate Computer Science student with a deep interest in ML Systems and Performance Engineering. I spend most of my time trying to understand how large language models actually run on hardware: what makes inference fast, what makes it slow, and what can be done about it.
My research interests are centered on LLM inference optimization, CUDA kernel engineering, GPU memory systems, and efficient AI deployment. I am particularly drawn to the intersection of systems and machine learning — the layer where software meets hardware and where performance decisions have real consequences.
I am currently building my foundations through a structured self-study path covering GPU architecture, CUDA programming, distributed training, and inference systems. Alongside that, I work on hands-on projects benchmarking and optimizing inference pipelines, profiling GPU kernels, and experimenting with quantization techniques like INT8 and AWQ.
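To make the quantization work above concrete, here is a minimal sketch of symmetric absmax INT8 quantization, the basic scheme that methods like AWQ refine. This is a toy illustration with made-up function names, not the actual project code:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric absmax quantization: map floats into the int8 range [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per element is at most half a quantization step (scale / 2).
print(np.abs(w - w_hat).max())
```

The appeal in an inference setting is that the weights take 4x less memory than FP32 (2x less than FP16), which directly reduces the memory bandwidth that dominates LLM decoding.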
Outside of technical work, I believe in learning in public — sharing what I build, what I measure, and what I learn along the way.
Check out my Readings & Blogs sections. Let's connect: saranshappy@gmail.com · Twitter/X · GitHub
A Retrieval-Augmented Generation chatbot for querying arXiv research papers. Built with LangChain, NVIDIA AI Endpoints, FAISS vector store, and Gradio.
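The retrieval step at the heart of such a pipeline can be sketched in a few lines. This toy version uses plain NumPy cosine similarity in place of FAISS, and the embeddings are invented for illustration; in the real project they come from an embedding model and live in a FAISS index:

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 2):
    """Return indices of the k document embeddings most similar to the query (cosine)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity of each doc to the query
    return np.argsort(-sims)[:k]      # indices sorted by descending similarity

# Toy "embeddings" for three paper abstracts.
docs = np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.8, 0.1],
                 [0.0, 0.2, 0.9]], dtype=np.float32)
query = np.array([0.85, 0.15, 0.05], dtype=np.float32)
print(top_k(query, docs))  # nearest abstracts first
```

The retrieved chunks are then stuffed into the prompt so the LLM answers grounded in the papers rather than from memory alone.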
Intelligent graceful degradation middleware for Node.js/Express apps. Routes requests to reduced-functionality fallbacks under load, using a physics-inspired priority model. 583 tests across 12 suites.
B.Tech in Computer Science
ABES Engineering College · 2023 – 2027