Bibliographic record
How Do LLMs Use Their Depth?
- Authors
- Akshat Gupta, Jay Yeung, Gopala Anumanchipalli, Anna Ivanova
- Publication year
- 2025
- OA status
- oa_green
Print
Need access?
Ask circulation staff for physical copies or request digital delivery via Ask a Librarian.
Digital copy
Unavailable in your region (PD status unclear).
Abstract
Growing evidence suggests that large language models do not use their depth
uniformly, yet we still lack a fine-grained understanding of their layer-wise
prediction dynamics. In this paper, we trace the intermediate representations
of several open-weight models during inference and reveal a structured and
nuanced use of depth. Specifically, we propose a "Guess-then-Refine" framework
that explains how LLMs internally structure their computations to make
predictions. We first show that the top-ranked predictions in early LLM layers
are composed primarily of high-frequency tokens, which act as statistical
guesses proposed by the model early on due to the lack of appropriate
contextual information. As contextual information develops deeper into the
model, these initial guesses get refined into contextually appropriate tokens.
Even high-frequency token predictions from early layers get refined >70% of the
time, indicating that correct token prediction is not "one-and-done". We then
go beyond frequency-based prediction to examine the dynamic usage of layer
depth across three case studies. (i) Part-of-speech analysis shows that
function words are, on average, the earliest to be predicted correctly. (ii)
Fact recall task analysis shows that, in a multi-token answer, the first token
requires more computational depth than the rest. (iii) Multiple-choice task
analysis shows that the model identifies the format of the response within the
first half of the layers, but finalizes its response only toward the end.
Together, our results provide a detailed view of depth usage in LLMs, shedding
light on the layer-by-layer computations that underlie successful predictions
and providing insights for future works to improve computational efficiency in
transformer-based models.
uniformly, yet we still lack a fine-grained understanding of their layer-wise
prediction dynamics. In this paper, we trace the intermediate representations
of several open-weight models during inference and reveal a structured and
nuanced use of depth. Specifically, we propose a "Guess-then-Refine" framework
that explains how LLMs internally structure their computations to make
predictions. We first show that the top-ranked predictions in early LLM layers
are composed primarily of high-frequency tokens, which act as statistical
guesses proposed by the model early on due to the lack of appropriate
contextual information. As contextual information develops deeper into the
model, these initial guesses get refined into contextually appropriate tokens.
Even high-frequency token predictions from early layers get refined >70% of the
time, indicating that correct token prediction is not "one-and-done". We then
go beyond frequency-based prediction to examine the dynamic usage of layer
depth across three case studies. (i) Part-of-speech analysis shows that
function words are, on average, the earliest to be predicted correctly. (ii)
Fact recall task analysis shows that, in a multi-token answer, the first token
requires more computational depth than the rest. (iii) Multiple-choice task
analysis shows that the model identifies the format of the response within the
first half of the layers, but finalizes its response only toward the end.
Together, our results provide a detailed view of depth usage in LLMs, shedding
light on the layer-by-layer computations that underlie successful predictions
and providing insights for future works to improve computational efficiency in
transformer-based models.
Copies & availability
Realtime status across circulation, reserve, and Filipiniana sections.
Self-checkout (no login required)
- Enter your student ID, system ID, or full name directly in the table.
- Provide your identifier so we can match your patron record.
- Choose Self-checkout to send the request; circulation staff are notified instantly.
| Barcode | Location | Material type | Status | Action |
|---|---|---|---|---|
| No holdings recorded. | ||||
Digital files
Preview digitized copies when embargo permits.
-
View digital file
original
APPLICATION/PDF · 12.86 MB
Links & eResources
Access licensed or open resources connected to this record.
- oa Direct