From Context to Harness: Info Is Ready, But AI Is Still Unreliable

September 25, 2025 AI Tools AI Engineering, Paradigm Evolution, Harness Engineering AI Engineering Series


Scenario: Information Is Correct, But Execution Goes Wrong

A company deployed a RAG-based Q&A system for its technical documentation. The system worked well: when users asked “How do I configure a Redis cluster?”, it retrieved the relevant passages from the documentation and returned detailed configuration steps.

Problem: When a user asked “Delete temporary files in the test directory,” the system correctly retrieved the right technical documentation, but during execution it mistakenly deleted the entire project’s core code.

Result: Technical knowledge transfer was perfect, but the execution result was catastrophic.

This scenario reveals a critical issue: Context Engineering solved the knowledge problem but not the execution problem.
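To make the failure concrete, here is a minimal sketch of the kind of guard that was missing in the scenario. It assumes the agent deletes files through a single helper and that the test directory is the only place it may touch; `resolve_inside` and `delete_temp_files` are hypothetical names, not part of any real framework.

```python
from pathlib import Path


def resolve_inside(root: Path, requested: str) -> Path:
    """Return the absolute path for `requested`, or raise if it escapes `root`."""
    root = root.resolve()
    target = (root / requested).resolve()
    if not target.is_relative_to(root):  # requires Python 3.9+
        raise PermissionError(f"{requested!r} resolves outside {root}")
    return target


def delete_temp_files(root: Path, requested_dir: str, suffix: str = ".tmp") -> list[Path]:
    """Delete only files ending in `suffix` directly inside an allowed directory."""
    directory = resolve_inside(root, requested_dir)
    deleted = []
    for path in directory.iterdir():
        # Never recurse and never touch anything that is not a matching file.
        if path.is_file() and path.suffix == suffix:
            path.unlink()
            deleted.append(path)
    return deleted
```

With the test directory as `root`, a request such as `"../src"` is rejected before anything is deleted, no matter how good the retrieved documentation was.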

The Core Problem

Two Dimensions of Challenges

Context Engineering gives the model the right information, but it doesn’t control what the model does with that information. That leaves two dimensions of challenge unaddressed:

1. Safety Dimension (What NOT to do)

Permission Boundaries: What the model should and shouldn’t do
Safety Red Lines: Absolutely prohibited operations
Compliance Requirements: Legal regulations and company policies

2. Reliability Dimension (How to verify)

Result Verification: How to judge if execution results are correct
Error Detection: Mechanisms to detect execution anomalies
Correction Capability: Remediation when problems occur
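The three reliability concerns above can be sketched as one small wrapper. This is an illustration under assumptions, not a prescribed design: `VerifiedAction` and `execute_verified` are hypothetical names, and real remediation is rarely as simple as a single `undo` callback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VerifiedAction:
    run: Callable[[], object]          # performs the side effect
    verify: Callable[[object], bool]   # result verification
    undo: Callable[[], None]           # correction when something goes wrong


def execute_verified(action: VerifiedAction) -> tuple[bool, object]:
    """Run an action, check its result, and undo it if the run or the check fails."""
    try:
        result = action.run()
    except Exception as exc:
        # Error detection: the action itself raised.
        action.undo()
        return False, exc
    if not action.verify(result):
        # Result verification: the action finished but the outcome is wrong.
        action.undo()
        return False, result
    return True, result
```

The design choice worth noting is that the caller always gets back an explicit success flag, so a failed verification can never be mistaken for a finished task.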

Knowledge vs. Execution Differences

Dimension	Context Engineering	Execution Challenges
Goal	Provide correct information	Control correct execution
Difficulty	Information retrieval and integration	Behavioral constraints and verification
Focus point	Information quality	Behavioral safety
Method	Optimize context	Design constraint systems

The Maturity of Tool Calling Capabilities

Between late 2022 and 2023, AI systems’ tool calling capabilities matured rapidly, and that maturation directly drove the shift in engineering paradigm.

OpenAI Function Calling (June 2023)

OpenAI officially launched Function Calling in June 2023:

python
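# OpenAI Function Calling example, using the legacy SDK interface (openai<1.0).
# In openai>=1.0 the equivalent call is client.chat.completions.create(...),
# and `functions` / `function_call` have been superseded by `tools` / `tool_choice`.
import openai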

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "What's the weather like in Boston?"}
    ],
    functions=[
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    ],
    function_call="auto"
)

This breakthrough allowed models to:

Understand tool purposes: Infer what a tool does from its function description
Parse parameters: Extract arguments from user intent as JSON that follows the declared schema
Coordinate execution: Request an external tool call on demand, while the application, not the model, actually runs the function
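Because the model only proposes a call, the application decides whether to run it. The sketch below assumes the legacy response shape, where the message carries a `function_call` with a `name` and a JSON-encoded `arguments` string; `REGISTRY`, `dispatch`, and the stubbed weather function are hypothetical names for illustration.

```python
import json


def get_current_weather(location: str, unit: str = "celsius") -> dict:
    # Stub standing in for a real weather lookup.
    return {"location": location, "unit": unit, "temperature": None}


REGISTRY = {"get_current_weather": get_current_weather}
ALLOWED_UNITS = {"celsius", "fahrenheit"}


def dispatch(function_call: dict) -> dict:
    """Validate a model-proposed call before running it."""
    name = function_call["name"]
    if name not in REGISTRY:
        raise ValueError(f"unknown function: {name}")
    # Arguments arrive as a JSON string and may be malformed.
    args = json.loads(function_call["arguments"])
    if not isinstance(args, dict) or not isinstance(args.get("location"), str):
        raise ValueError("missing or invalid 'location'")
    unexpected = set(args) - {"location", "unit"}
    if unexpected:
        raise ValueError(f"unexpected arguments: {sorted(unexpected)}")
    if args.get("unit", "celsius") not in ALLOWED_UNITS:
        raise ValueError("invalid 'unit'")
    return REGISTRY[name](**args)
```

Everything between `json.loads` and the final call is the application's responsibility: the schema guides the model, but it does not guarantee that the arguments are valid.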

ReAct Pattern (2022-2023)

The ReAct (Reasoning + Acting) pattern, proposed by Yao et al. and published at ICLR 2023, interleaves reasoning steps with tool calls:

mermaid
flowchart TD
    A["User Question"]
    B["Think"]
    C{"Need Tool?"}
    D["Call Tool<br/>Observe Result"]
    E["Direct Answer"]
    G["Final Answer"]
    A --> B
    B --> C
    C -->|"Yes"| D
    C -->|"No"| E
    D --> B
    E --> G

    style A fill:#FF9800,color:#fff
    style B fill:#2196F3,color:#fff
    style C fill:#FFC107,color:#000
    style D fill:#9C27B0,color:#fff
    style E fill:#2196F3,color:#fff
    style G fill:#4CAF50,color:#fff

Core innovations of ReAct:

Reasoning loop: Complete closed loop of think-act-observe
Tool orchestration: Ordered calls of multiple tools
Result integration: Integrate tool results into final answers
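The think-act-observe loop can be sketched in a few lines. This is a toy, not the paper's implementation: the `policy` callable stands in for the model, `scripted_policy` is a hard-coded substitute used only to make the example runnable, and the step limit is an added assumption that keeps the loop from running forever.

```python
def react_loop(question, policy, tools, max_steps=5):
    """Run a think-act-observe loop.

    `policy(question, history)` returns either ("act", tool_name, tool_input)
    or ("answer", text).
    """
    history = []
    for _ in range(max_steps):
        decision = policy(question, history)                 # Think
        if decision[0] == "answer":
            return decision[1]
        _, tool_name, tool_input = decision
        if tool_name not in tools:
            observation = f"error: unknown tool {tool_name!r}"
        else:
            observation = tools[tool_name](tool_input)       # Act
        history.append((tool_name, tool_input, observation)) # Observe
    return "stopped: step limit reached"


def scripted_policy(question, history):
    # Hard-coded stand-in for a model: call one tool, then answer.
    if not history:
        return ("act", "multiply", (6, 7))
    return ("answer", f"The result is {history[-1][2]}")


tools = {"multiply": lambda pair: pair[0] * pair[1]}
```

Calling `react_loop("What is 6 x 7?", scripted_policy, tools)` goes around the loop once, feeds the observation back into the history, and then returns the final answer.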

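Toolformer (February 2023)

Toolformer, proposed by Schick et al. (arXiv preprint in February 2023, published at NeurIPS 2023), teaches a language model to decide for itself when to call a tool: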

python
# Toolformer-style self-supervised tool learning
# (conceptual sketch; identify_tool_needs and integrate_results are left undefined)
class Toolformer:
    def __init__(self, model, tools):
        self.model = model
        self.tools = tools

    def learn_tool_usage(self, training_data):
        # 1. Identify scenarios needing tools
        tool_needs = self.identify_tool_needs(training_data)

        # 2. Learn calling patterns autonomously
        enhanced_data = training_data
        for need in tool_needs:
            tool_call = self.model.generate_tool_call(need)
            result = self.tools.execute(tool_call)

            # 3. Integrate each call and its result into the accumulated training data
            enhanced_data = self.integrate_results(tool_call, result, enhanced_data)

        return enhanced_data

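The significance of Toolformer:

Autonomous learning: The model learns when and how to call tools from training data it generates and filters itself
Tool library expansion: A new tool needs only a handful of demonstrations rather than a large human-annotated dataset
Context awareness: The model decides from the surrounding text which tool to call and with what arguments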
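The self-filtering step is what makes the learning autonomous. As the Toolformer paper describes it, a sampled API call is kept only if seeing the call together with its result lowers the model's loss on the following tokens by at least a threshold, compared with the better of two baselines: no call at all, or the call without its result. A minimal sketch of that rule, with a hypothetical function name and made-up loss values in the usage note:

```python
def keep_api_call(
    loss_with_result: float,
    loss_without_call: float,
    loss_call_without_result: float,
    threshold: float,
) -> bool:
    """Keep a sampled API call only if its result helps predict what comes next."""
    # The baseline is the better (lower) loss achievable without the result.
    baseline = min(loss_without_call, loss_call_without_result)
    return baseline - loss_with_result >= threshold
```

For example, with a threshold of 1.0, a call that brings the loss from 2.2 down to 1.0 is kept, while one that only brings it down to 2.0 is discarded.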

AutoGPT (March 2023)

AutoGPT was one of the first widely adopted autonomous Agent frameworks:

python
# AutoGPT-style autonomous execution mode
# (conceptual sketch; decompose_objective, create_execution_plan, execute_task
# and evaluate_completion are left undefined)
class AutoAgent:
    def __init__(self, name, objective):
        self.name = name
        self.objective = objective
        self.tasks = []
        self.completed_tasks = {}  # task -> result (tasks must be hashable)

    def generate_plan(self):
        # 1. Decompose objective into tasks
        self.tasks = self.decompose_objective(self.objective)

        # 2. Generate execution plan
        plan = self.create_execution_plan(self.tasks)
        return plan

    def execute_plan(self):
        # 3. Execute task sequence autonomously, skipping tasks already completed
        for task in self.tasks:
            if task not in self.completed_tasks:
                self.completed_tasks[task] = self.execute_task(task)

        return self.evaluate_completion()

The Emergence of New Requirements

Shift from “Can Answer” to “Can Execute”

With the maturation of tool calling capabilities, the focus of AI systems has shifted:

Phase	Focus	Key Question
Early	Can it answer	“Does it work?”
Prompt Engineering	How to answer better	“How to make it better?”
Context Engineering	What information to know	“What should it know?”
Agent Era	Can it execute safely	“Can it act safely?”

Complexity of Execution Scenarios

Modern AI Agents face increasingly complex execution scenarios:

1. File System Operations

python
# Risks in file operations
file_operations = {
    "safe": ["read_file", "list_directory", "create_file"],
    "dangerous": ["delete_directory", "modify_system_file", "execute_script"]
}

2. Network Access

python
# Risks in network operations
network_operations = {
    "safe": ["fetch_public_data", "send_api_request"],
    "dangerous": ["access_internal_system", "modify_database", "exfiltrate_data"]
}

3. Code Execution

python
# Risks in code execution
# (the "safe" entries are only safe under assumptions: sandboxed code, read-only queries)
code_execution = {
    "safe": ["run_python_code", "execute_query"],
    "dangerous": ["system_command", "eval_user_input", "import_untrusted_module"]
}
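A classification like this only helps if something enforces it. Here is a minimal sketch of a policy gate built on such a table, assuming a deny-by-default rule for operations the table does not list and a human approval step for dangerous ones; `Decision`, `RISK_TIERS`, and `decide` are hypothetical names, and the table reuses a few operation names from the lists above.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Illustrative subset of the operation names used in this section.
RISK_TIERS = {
    "safe": {"read_file", "list_directory", "fetch_public_data"},
    "dangerous": {"delete_directory", "modify_database", "system_command"},
}


def decide(operation: str, approved_by_human: bool = False) -> Decision:
    """Map a requested operation to a decision based on its risk tier."""
    if operation in RISK_TIERS["safe"]:
        return Decision.ALLOW
    if operation in RISK_TIERS["dangerous"]:
        return Decision.ALLOW if approved_by_human else Decision.REQUIRE_APPROVAL
    # Anything not explicitly classified is denied by default.
    return Decision.DENY
```

Denying unknown operations is the important choice here: a new tool has to be classified before the agent can use it.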

The Tension Between Capability and Safety

While pursuing AI execution capabilities, we face a dilemma:

Need: AI needs sufficient capability to complete complex tasks
Risk: The stronger the capability, the greater the potential damage
Challenge: How to find balance between capability and safety

The Core Cognitive Shift

From Information Optimization to Behavior Control

Context Engineering focuses on optimizing information flow, while Harness Engineering focuses on controlling behavior flow:

Optimization Direction	Context Engineering	Harness Engineering
Focus point	Information quality	Behavioral constraints
Method	Provide correct information	Design safety mechanisms
Goal	Knowledge accuracy	Execution safety
Evaluation	Information relevance	Behavioral reliability

Human Steer vs. Agent Execute

Core Philosophy: Human Steer, Agent Execute

Human Steer: Humans set objectives, define constraints, monitor processes
Agent Execute: AI executes autonomously within constraint frameworks

python
# Human Steer, Agent Execute example
class ControlledAgent:
    def __init__(self, human_constraints):
        self.constraints = human_constraints
        self.execution_context = None
    
    def execute_task(self, task):
        # 1. Human-defined task and constraints
        if not self.validate_task_constraints(task):
            return "Task violates constraints"
        
        # 2. AI execution within constraint framework
        self.execution_context = self.setup_execution_context(task)
        
        # 3. Continuous monitoring during execution
        result = self.monitor_execution(task)
        
        # 4. Result verification and reporting
        return self.validate_and_report(result)
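The outline above leaves its helper methods undefined. A runnable variant, under simplifying assumptions, might look like the following: a task is a list of named actions, the human's constraints are an allow-list plus an action budget, and monitoring is reduced to an audit log. `Constraints` and `SteeredAgent` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constraints:
    allowed_actions: frozenset  # set by the human, immutable at run time
    max_actions: int


@dataclass
class SteeredAgent:
    constraints: Constraints
    audit_log: list = field(default_factory=list)

    def execute_task(self, actions, handlers):
        # 1. Validate the whole plan against human-defined constraints first.
        if len(actions) > self.constraints.max_actions:
            return {"status": "rejected", "reason": "too many actions"}
        blocked = [a for a in actions if a not in self.constraints.allowed_actions]
        if blocked:
            return {"status": "rejected", "reason": f"not allowed: {blocked}"}

        # 2-3. Execute within the constraints, recording every step.
        results = []
        for action in actions:
            result = handlers[action]()
            self.audit_log.append((action, result))
            results.append(result)

        # 4. Report what was done.
        return {"status": "completed", "results": results}
```

Validating the full plan before running any step means a plan containing one forbidden action is rejected as a whole, rather than failing halfway through with side effects already applied.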

Evolution of Engineering Paradigms

Prompt Engineering    →  Context Engineering  →  Harness Engineering
Optimize language        Optimize info           Control behavior
"How to say right"       "What to know"          "How to act"

This evolution reflects the transformation of AI systems from language models to action systems.

Harness Engineering: The Solution to Execution Problems

Context Engineering exposed the execution problem but didn’t solve it. Harness Engineering is designed specifically to address the safety and reliability of AI execution.

The core of Harness Engineering includes:

Tool Injection System: Safe tool calling mechanisms
State Management System: Task execution state tracking
Verification Loop System: Execution result verification and correction
Constraint Layering System: Multi-level execution constraints

These subsystems together form the “safety reins” for AI Agents, layering safe execution on top of answer capability.

The goal of Harness Engineering is to let AI capabilities be exercised responsibly and controllably, advancing AI from an unreliable assistant to a deployable execution system. The next article will expand on the four core subsystems of Harness Engineering.

Part of series: AI Engineering Series
