Context Engineering: From Prompts to Infrastructure
What is Context Engineering?
Context engineering is the practice of designing and structuring all the information an AI uses to generate a response, beyond just the user prompt. It’s about filling the LLM’s limited context window with precisely the right information for each task.
Why is it needed?
LLMs have finite context windows and suffer from “context rot”: performance degrades as the number of tokens increases. They can’t read minds; without proper context, even powerful models hallucinate or fail. As Gartner states:[3] “Most agent failures are context failures, not model failures.” Context engineering addresses this by treating context as a managed resource.
How it works
Context engineering is fundamentally about understanding how Large Language Models process information and strategically structuring that information to achieve optimal results. Unlike traditional programming, where you write explicit instructions, LLMs operate on probabilistic pattern matching across the context you provide. This makes the art of context preparation crucial to success.
1. The context window
At the heart of context engineering lies the concept of a context window, a finite amount of text that an LLM can “see” and process at any given time. Think of it as the model’s working memory or short-term attention span.
Modern LLMs have varying context window sizes. GPT-5 ranges from 128,000 to 272,000 tokens, while Claude models can handle up to 200,000 tokens. To put this in perspective, one token is approximately 0.75 words in English, so a 200K token window can process roughly 150,000 words, equivalent to a full-length novel.
Note:
The position of information within the context window dramatically affects how the model processes it. Research[7] shows that LLMs exhibit both primacy effects (paying more attention to information at the beginning) and recency effects (emphasising recent information), while potentially “losing track” of details in the middle.
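One practical response to these position effects is to place critical instructions at the edges of the prompt, where primacy and recency work in our favour, and bulk reference material in the middle. A minimal sketch; the function and its layout are illustrative, not taken from any library:

```python
def assemble_context(instructions: str, documents: list[str], query: str) -> str:
    """Place critical instructions first (primacy), supporting documents
    in the middle, and a restatement of the task last (recency)."""
    middle = "\n\n".join(documents)
    return (
        f"{instructions}\n\n"        # start: high-attention zone
        f"{middle}\n\n"              # middle: bulk reference material
        f"Task (restated): {query}"  # end: high-attention zone
    )

context = assemble_context(
    "You are a contract reviewer. Flag any clause limiting liability.",
    ["Clause 1: ...", "Clause 2: ..."],
    "Review the clauses above for liability limits.",
)
```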
The Context Engineering Pipeline
Context engineering follows a systematic pipeline that transforms raw user queries into optimally structured inputs for the LLM.
In practice, context engineering is an iterative process. Here’s a realistic workflow:
- Step 1: Start with a baseline prompt that includes basic instructions and the user query.
- Step 2: Evaluate the output. Is it accurate? Does it follow instructions? Is the format correct?
- Step 3: Identify gaps. Did the model miss key requirements? Did it misunderstand the task?
- Step 4: Refine the context. Add examples for missed patterns, clarify ambiguous instructions, or restructure for better flow.
- Step 5: Test again and repeat until you achieve consistent, high-quality outputs.
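The five steps above can be sketched as an evaluate-and-refine cycle. Here `call_model` is a stub standing in for a real LLM API call, and the checks in `evaluate` stand in for a real evaluation suite:

```python
def call_model(prompt: str) -> str:
    # Stand-in for a real LLM API call.
    return "SUMMARY: quarterly revenue rose 12%"

def evaluate(output: str) -> list[str]:
    """Steps 2-3: return a list of gaps found in the output."""
    gaps = []
    if not output.startswith("SUMMARY:"):
        gaps.append("missing required 'SUMMARY:' prefix")
    if len(output.split()) > 50:
        gaps.append("output exceeds the 50-word limit")
    return gaps

def refine(prompt: str, gaps: list[str]) -> str:
    """Step 4: fold the observed gaps back into the instructions."""
    fixes = "\n".join(f"- Fix: {g}" for g in gaps)
    return f"{prompt}\n\nAdditional requirements:\n{fixes}"

prompt = "Summarise the report. Begin with 'SUMMARY:'. Max 50 words."
for _ in range(3):  # Step 5: test again and repeat
    output = call_model(prompt)
    gaps = evaluate(output)
    if not gaps:
        break
    prompt = refine(prompt, gaps)
```

In production, `evaluate` would run a proper test suite or human review rather than string checks, but the loop structure is the same.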
Current Context Engineering Methods
Leading LLM providers employ distinct, architecture-specific approaches to context optimisation. OpenAI’s GPT models emphasise system messages and function calling, providing structured API interfaces for tool integration. Few-shot learning demonstrates patterns through examples, while chain-of-thought prompting encourages step-by-step reasoning.
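Few-shot learning can be as simple as prepending worked examples before the new input, letting the model infer the pattern. A minimal sketch; the sentiment examples are invented for illustration:

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Demonstrate the input->output pattern via examples, then pose
    the new input and leave the output for the model to complete."""
    shots = "\n\n".join(
        f"Input: {inp}\nOutput: {out}" for inp, out in examples
    )
    return f"{shots}\n\nInput: {query}\nOutput:"

prompt = few_shot_prompt(
    [("The food was great", "positive"),
     ("Terrible service", "negative")],
    "Average experience overall",
)
```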
Claude (Anthropic) leverages XML tag structuring for hierarchical organisation and offers extended context windows up to 200K tokens. Constitutional AI principles guide ethical responses through context-embedded guidelines.
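The XML-tag convention wraps each part of the context in named tags so the model can cleanly distinguish instructions from data. A minimal sketch of the structuring pattern; the tag names are typical but freely chosen:

```python
def xml_prompt(instructions: str, document: str, question: str) -> str:
    """Wrap each context section in descriptive XML tags, the
    hierarchical structuring convention used with Claude."""
    return (
        f"<instructions>\n{instructions}\n</instructions>\n"
        f"<document>\n{document}\n</document>\n"
        f"<question>\n{question}\n</question>"
    )

prompt = xml_prompt(
    "Answer using only the document.",
    "The warranty lasts 24 months.",
    "How long is the warranty?",
)
```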
Both platforms support retrieval-augmented generation (RAG), dynamically injecting relevant documents.
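The retrieval step can be sketched with a toy keyword scorer standing in for a real vector search; only the top-scoring documents are injected into the context. The corpus and scoring are illustrative assumptions:

```python
def words(text: str) -> set[str]:
    """Lower-cased words with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    Real RAG systems use embeddings and a vector index instead."""
    qw = words(query)
    return sorted(corpus, key=lambda d: len(qw & words(d)), reverse=True)[:k]

def rag_prompt(query: str, corpus: list[str]) -> str:
    """Inject only the top-ranked documents into the context."""
    docs = "\n".join(f"- {d}" for d in retrieve(query, corpus))
    return f"Context:\n{docs}\n\nQuestion: {query}"

corpus = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Refund requests require an order number.",
]
prompt = rag_prompt("How long do refund requests take to process?", corpus)
```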
Common context engineering methods include prompt chaining for complex workflows, role prompting for specialised personas, and contextual grounding with explicit constraints.
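Prompt chaining splits a complex task into sequential steps, threading each step’s output into the next prompt. A minimal sketch with a stubbed model call; the three-step summarisation chain is an invented example:

```python
def call_model(prompt: str) -> str:
    # Stand-in for a real LLM call; echoes the prompt so the chain is visible.
    return f"[model output for: {prompt[:30]}...]"

def chain(steps: list[str], user_input: str) -> str:
    """Run prompt templates in sequence, feeding each output
    into the next step's {previous} slot."""
    result = user_input
    for template in steps:
        result = call_model(template.format(previous=result))
    return result

final = chain(
    ["Extract the key facts from: {previous}",
     "Draft a summary using these facts: {previous}",
     "Polish the summary for a general audience: {previous}"],
    "Q3 report text ...",
)
```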
Context Adaptation: When Context Evolves
So far, we have treated context as something carefully designed upfront. In real systems, however, context continues to change after deployment.
As users interact with a model, recurring patterns emerge, such as repeated failures, common clarifications, and strategies that consistently produce better outcomes. Updating the information provided to the model based on these observations is known as context adaptation.
Today, most context adaptation happens implicitly. Engineers observe outputs, refine instructions, add or remove examples, reorder constraints, or adjust retrieved documents. This process naturally extends the iterative refinement loop described earlier and represents how context engineering is applied in production systems.
However, while effective at a small scale, this form of adaptation relies heavily on manual effort and human judgment.
Why Manual Context Adaptation Breaks Down
As AI systems grow in complexity, manual context adaptation begins to show clear weaknesses:
- Limited scalability
Human-driven updates cannot keep pace with increasing users, tasks, and domains.
- Inconsistency
Context changes depend on individual judgment, leading to fragile and hard-to-reproduce behaviour.
- Brevity bias
To optimise prompts, useful details are gradually removed, slowly degrading performance.
- Context collapse
Rewriting large portions of context can cause a sudden loss of critical information.
These limitations suggest that context adaptation itself needs to be structured, incremental, and automated.
Agentic Context Engineering (ACE): The New Frontier
Agentic Context Engineering (ACE) is a research-backed approach where context is no longer manually refined, but continuously improved by agents that observe outcomes, reflect on performance, and update context over time.
Rather than treating context as a static prompt, ACE treats it as a living playbook.
ACE is inspired by dynamic cheat-sheet methods and adopts an agentic architecture comprising three specialised components.
1. Generator — Produces responses and executes tasks using the current context.
2. Reflector — Analyses outcomes, identifying what worked and what failed based on execution feedback.
3. Curator — Updates the context incrementally, preserving useful information while avoiding full rewrites.
Crucially, context updates are applied as small, localized changes rather than complete rewrites, preventing information loss and preserving long-term knowledge.
Research shows that ACE improves task performance by 10–17%, reduces adaptation cost and latency by over 80%, and enables smaller models to match or exceed much larger systems without retraining or fine-tuning.
For production systems, this means faster iteration, lower operational costs, and more interpretable, auditable behaviour than fine-tuning or model replacement.
Conclusion
Context engineering has evolved from prompt design to structured pipelines and retrieval systems. As LLMs grow more capable, context has become the primary lever for task adaptation, often outweighing the need for fine-tuning. ACE represents the next frontier of this evolution.
In modern LLM systems, context is no longer just input — it is infrastructure.