My Workflow for Understanding LLM Architectures
My Workflow for Understanding LLM Architectures A learning-oriented workflow for understanding new open-weight model releases Many people asked me over the past months to share my workflow for how I come up with the LLM architecture sketches and drawings in my articles, talks, and the LLM-Gallery. So I thought it would be useful to document the process I usually follow. The short version is that I usually start with the official technical reports, but these days, papers are often less detailed than they used to be, especially for most open-weight models from industry labs. The good part is that if the weights are shared on the Hugging Face Model Hub and the model is supported in the Python transformers library, we can usually inspect the config file and the reference implementation directly to get more information about the architecture details.…