Research

Undergraduate thesis and research output.

This section covers my undergraduate thesis on runtime-adaptive pruning for large language models (LLMs) and the methodology behind the final system.

Thesis

Undergraduate Thesis: SPRINT - Sensitivity-guided Pruning for Inference-Time Adaptation of LLMs

June 2025 - April 2026 | Completed

LLMs | Structural Pruning | Reinforcement Learning | Adaptive Inference

SPRINT is a runtime-adaptive framework that selects structural pruning intensity per prompt using oracle sensitivity labels, a learned router, and a DDQN controller. It decides how aggressively to prune an LLM based on prompt sensitivity, early backbone signals, and live hardware telemetry.
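
For readers unfamiliar with the controller, here is a minimal sketch of the bootstrap target such a controller would train against, assuming "DDQN" refers to Double DQN. The function name, the discount factor, and the list-based Q-values are illustrative assumptions, not the thesis code.

```python
from typing import Sequence


def ddqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the thesis
    done: bool = False,
) -> float:
    """Double DQN target for a single transition (illustrative sketch)."""
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # The online network chooses the next action ...
    best_action = max(range(len(next_q_online)), key=lambda i: next_q_online[i])
    # ... and the target network evaluates that choice.
    return reward + gamma * next_q_target[best_action]
```

Splitting action selection (online network) from action evaluation (target network) is what distinguishes Double DQN from plain DQN, and it reduces overestimation of Q-values.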

Overall gist

SPRINT combines oracle sensitivity labeling, a learned complexity router, and a DDQN controller so the system can choose a different structural pruning action for each prompt instead of applying one static pruning profile to every input.

  • Built around a balanced 10,000-prompt dataset across GSM8K, MBPP, WikiText-2, MMLU, and BoolQ.
  • Uses a BERT-mini-based router to predict sensitivity before inference and guide pruning decisions.
  • Targets real end-to-end speedup by applying structural actions such as layer skipping and head pruning.
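
The per-prompt decision described above can be sketched roughly as follows. This is a toy illustration under stated assumptions: the action names, the two-feature state (router sensitivity plus one telemetry value), and `toy_q_values` are all hypothetical stand-ins for the trained router and controller, not the thesis implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical action set; the real system's actions may differ.
ACTIONS = ["no_pruning", "head_pruning", "layer_skipping"]


@dataclass
class PromptState:
    sensitivity: float      # router's predicted sensitivity, assumed in [0, 1]
    gpu_utilization: float  # live telemetry reading, assumed in [0, 1]

    def as_vector(self) -> List[float]:
        return [self.sensitivity, self.gpu_utilization]


def select_action(
    state: PromptState,
    q_values_fn: Callable[[Sequence[float]], Sequence[float]],
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Epsilon-greedy choice of a pruning action from controller Q-values."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Exploration branch, typically used only during training.
        return rng.choice(ACTIONS)
    q = q_values_fn(state.as_vector())
    best = max(range(len(ACTIONS)), key=lambda i: q[i])
    return ACTIONS[best]


def toy_q_values(x: Sequence[float]) -> List[float]:
    """Hand-written stand-in for a trained Q-network (assumption)."""
    sensitivity, load = x
    # Sensitive prompts favor no pruning; insensitive prompts under load
    # favor the most aggressive action.
    return [sensitivity, 0.6, (1.0 - sensitivity) + 0.2 * load]


print(select_action(PromptState(0.9, 0.5), toy_q_values))  # no_pruning
print(select_action(PromptState(0.1, 0.8), toy_q_values))  # layer_skipping
```

With `epsilon=0.0` the choice is purely greedy, which is the natural setting at inference time; a nonzero `epsilon` would only matter while training the controller.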