Research
Undergraduate thesis and research output.
This section covers my undergraduate thesis on runtime-adaptive pruning for large language models (LLMs) and the methodology behind the final system.
Thesis
Undergraduate Thesis: SPRINT - Sensitivity-guided Pruning for Inference-Time Adaptation of LLMs
June 2025 - April 2026 | Completed
A runtime-adaptive framework that selects structural pruning intensity per prompt using oracle sensitivity labels, a learned router, and a DDQN controller. It decides how aggressively to prune an LLM from three inputs: prompt sensitivity, early backbone signals, and live hardware telemetry.
Overall gist
SPRINT combines oracle sensitivity labeling, a learned complexity router, and a DDQN controller so the system chooses a structural pruning action per prompt instead of applying one static pruning profile to every input.
- Built around a balanced 10,000-prompt dataset across GSM8K, MBPP, WikiText-2, MMLU, and BoolQ.
- Uses a BERT-mini-based router to predict sensitivity before inference and guide pruning decisions.
- Targets real end-to-end speedup by applying structural actions such as layer skipping and head pruning.
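
To make the controller side more concrete, here is a minimal sketch of how a per-prompt pruning action could be chosen and how its learning target could be computed. It assumes "DDQN" means Double DQN, where the online network picks the next action and the target network scores it. The action names, Q-values, reward, and discount factor are hypothetical placeholders for illustration. They are not the thesis's actual action space or results.

```python
import random

# Hypothetical action set: the thesis names layer skipping and head pruning
# as example structural actions; "no_pruning" is added here for illustration.
ACTIONS = ["no_pruning", "skip_layers", "prune_heads"]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy choice over pruning actions for one prompt."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return argmax(q_values)                  # exploit


def ddqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target for one transition.

    The online network selects the next action; the target network
    evaluates it. This decoupling reduces the overestimation that comes
    from taking a max over the target network's own estimates.
    """
    if done:
        return reward
    best_next = argmax(q_online_next)
    return reward + gamma * q_target_next[best_next]


# Illustrative numbers only.
rng = random.Random(0)
q_current = [0.2, 0.9, 0.5]
action = select_action(q_current, epsilon=0.0, rng=rng)
print(ACTIONS[action])  # greedy pick with epsilon=0.0

target = ddqn_target(
    reward=1.0,
    gamma=0.5,
    q_online_next=[0.2, 0.9, 0.5],
    q_target_next=[0.3, 0.4, 0.8],
    done=False,
)
print(target)
```

In this toy example the online network prefers action 1, so the target uses the target network's estimate for action 1 (0.4) rather than its own maximum (0.8). That is the difference between Double DQN and a plain DQN target.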