Prompt Engineering: Learning to Talk to AI Is Lesson One

November 20, 2023 AI Tools AI Engineering, Paradigm Evolution, Prompt Engineering, LLM AI Engineering Series 2895 words 6 min read

🔊

What is Prompt Engineering?

The core definition of Prompt Engineering is: Designing natural language inputs to guide Large Language Model outputs toward specific results.

This concept seems simple, but it hides a profound assumption: The same model, different prompts → completely different outputs.

Imagine you have an incredibly smart assistant with zero background knowledge. This assistant can perfectly understand and execute any instruction, but it lacks prior knowledge and has no memory. Prompt engineering is the art of learning how to converse with such an assistant.

Core Assumptions

Prompt Engineering is built on several key assumptions:

Fixed Model Capability: The core capability of LLMs is relatively fixed; prompts don’t change their fundamental abilities
Prompts Determine Output: Output quality largely depends on prompt design quality
Iterative Optimization: There’s no perfect prompt, only prompts more suitable for the current task
Context Sensitivity: The same prompt can produce different results in different conversation contexts

Why Prompt Engineering Matters

In the early stages of AI engineering, prompt engineering was almost the only controllable factor. Model selection, training data, and parameter settings were fixed—only prompts could be freely adjusted.

Just as an excellent chef must understand how to use seasonings, AI engineers must master prompt engineering. This is the foundation of conversing with AI systems and the prerequisite for all subsequent advanced technologies.

Origin and Development

The evolution of prompt engineering reflects the progress of the entire AI field:

Pre-Prompt Engineering Era (Before 2022)

2017: Transformer architecture proposed, laying the foundation for modern LLMs
2020: GPT-3 released, demonstrating the potential of large-scale language models
OpenAI Playground (circa 2020): Provided early model interaction and prompt experimentation platforms

Birth of Prompt Engineering (2022)

January 2022: ChatGPT released, bringing prompt engineering to the public eye
March 2022: InstructGPT paper, first systematic study of instruction following
October 2022: LangChain framework released, providing structured prompt templates

Technology Explosion (2023)

January 2023: DSPy v1 released, proposing programmatic prompt design
March 2023: Chain-of-Thought technology became mainstream
June 2023: Self-Consistency method proposed
August 2023: Tree-of-Thought expanded reasoning capabilities

This timeline shows the evolution of prompt engineering from nothing to something, from simple to complex.

Core Technologies

Basic Techniques

Role Prompting

Role prompting is the most fundamental and important technique. By setting a role for AI, you can significantly influence its response style and content depth.

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
# Basic role prompting
prompt = "You are an experienced Python developer, please explain what decorators are..."

# Specific role prompting
prompt = """
You are a Python backend engineer with 10 years of experience, who has worked at Google and Facebook.
Your expertise includes:
- High-performance web architecture design
- Distributed system optimization
- Code refactoring and performance tuning

Now please explain the concept of Python decorators to beginners, requiring:
1. Use life-like metaphors
2. Give 3 practical application scenarios
3. Include code examples and performance comparisons
"""

Why role prompting works:

Provides framework and boundaries for responses
Activates relevant knowledge networks in the model
Ensures professionalism and consistency in responses

Clear Instructions

Clear, specific instructions are key to getting high-quality responses.

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# Poor prompt
"Write an article about dogs"

# Good prompt
"""
Please write an 800-word popular science article about Border Collies. Include:
1. Origin and historical background
2. Physical characteristics and personality traits
3. Training difficulty and care recommendations
4. Suitable family types

Writing style: Scientific but accessible
Target audience: Potential dog owners considering getting a dog
"""

# Ultra-clear prompt
"""
Role: Canine behavior training expert
Reader level: People with basic dog knowledge
Task: Compare Border Collies and Poodles regarding care requirements
Structure:
- Introduction: Why choose these two breeds
- Middle section:
  * Living space requirements comparison
  * Exercise requirements comparison
  * Intelligence training methods comparison
  * Common behavior problems and solutions
- Conclusion: Summary, recommendations for different family types
Word count: Around 1000 words
Tone: Professional but friendly, avoid overly academic
"""

Formatted Output

Specifying output format can significantly improve practicality:

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
prompt = """
Please analyze the performance issues in the following code:

```python
def process_data(data):
    result = []
    for item in data:
        if item > 10:
            result.append(item * 2)
    return result

Please output the analysis in the following format:

Performance Analysis

Time Complexity

Analysis: …
Big O notation: O(??)

Space Complexity

Analysis: …
Big O notation: O(??)

Optimization Suggestions

Suggestion 1: … Reason: …
Suggestion 2: … Reason: …

Optimized Code

python
1
# Optimized code

"""

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28

### Advanced Reasoning Techniques

#### Chain-of-Thought (CoT)

CoT technology was proposed by Google at NeurIPS 2022, revolutionizing LLM reasoning capabilities.

```python
# Traditional prompt
"Calculate: 234 + 567 + 891 = ?"

# Chain-of-thought prompt
"""
Let's calculate this addition step by step:

Step 1: 234 + 567
- 200 + 500 = 700
- 30 + 60 = 90  
- 4 + 7 = 11
- Total: 700 + 90 + 11 = 801

Step 2: 801 + 891
- 800 + 800 = 1600
- 1 + 900 = 901
- Total: 1600 + 901 = 2501

So: 234 + 567 + 891 = 2501
"""

Core idea of CoT: Make the reason step-by-step like humans, rather than giving direct answers.

Few-shot Learning

Few-shot learning guides the model to understand task patterns by providing several examples:

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
# Few-shot prompt
"""
Please complete the following sentiment analysis task, determining whether the sentence sentiment is positive, negative, or neutral.

Examples:
- This movie was so amazing, I watched it three times. Positive
- The phone battery life is terrible, it dies in just half a day. Negative  
- Today's weather is nice, sunny and bright. Neutral

Now analyze:
- This software interface is clean and the features are very practical.
"""

Best practices:

Examples should be diverse, covering all possible scenarios
Example format should be consistent
Provide sufficient contextual information

Tree-of-Thought (ToT)

Tree-of-Thought extends Chain-of-Thought, allowing the model to explore multiple reasoning paths:

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# Tree-of-thought prompt example
"""
You need to solve a math problem, but there might be multiple solution approaches.
For each approach, please evaluate its feasibility and choose the optimal path.

Problem: How to find all prime numbers between 1 and 100?

Solution Path 1: Trial Division
- Steps: Check if each number can be divided by numbers between 2 and sqrt(n)
- Pros: Simple and easy to understand
- Cons: Inefficient for large numbers

Solution Path 2: Sieve of Eratosthenes
- Steps: Create a boolean array and progressively mark non-primes
- Pros: Time complexity O(n log log n), very efficient
- Cons: Needs extra space to store the sieve

Solution Path 3: Probabilistic algorithms
- Steps: Use probabilistic testing methods
- Pros: More effective for extremely large numbers
- Cons: Might have false positives

Please choose the optimal method and provide specific implementation.
"""

Self-Consistency

Self-Consistency method generates multiple reasoning paths and chooses the most consistent result:

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
# Self-consistency prompt
"""
Please solve the following math problem: A cage contains chickens and rabbits totaling 35 animals, with 94 legs total. How many chickens and rabbits are there?

Please provide at least 3 different solution methods and verify the consistency of results.

Method 1:
Equation method...
Result: 23 chickens, 12 rabbits

Method 2:
Assumption method...
Result: 23 chickens, 12 rabbits

Method 3:
Enumeration method...  
Result: 23 chickens, 12 rabbits

Comprehensive analysis: All three methods yield the same result, confirming the answer is correct.
"""

Optimization Methods

Temperature Tuning

Temperature controls the randomness of LLM outputs:

Low temperature (0.1-0.3): More deterministic, conservative outputs
Medium temperature (0.5-0.7): Balance between determinism and creativity
High temperature (0.8-1.0): More diverse, creative outputs

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
# Temperature tuning examples
creative_prompt = """
Imagine a future city scene, including the following elements:
- Air transportation system
- Ecological buildings
- AI service assistants

Describe the daily life in this city using imaginative language.
temperature=0.8
"""

technical_prompt = """
Analyze 10 main challenges future cities might face, each including:
1. Problem description
2. Impact assessment
3. Possible solutions

Requirement: Make reasonable predictions based on existing technology development trends.
temperature=0.3
"""

Iterative Optimization

Prompt engineering is an iterative process:

Initial prompt: Create basic prompt based on requirements
Test feedback: Run the model, observe output results
Problem identification: Find shortcomings in the output
Prompt adjustment: Modify the prompt针对性地
Effect verification: Test again to verify improvements

A/B Testing

Optimize by comparing effects of different prompts:

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
# Comparative experiment
prompt_A = """
As a data scientist, please explain the concept of overfitting in machine learning.
Include definition, causes, detection methods, and solutions.
"""

prompt_B = """
You need to explain overfitting to non-technical people. Please use:
1. Life-like metaphors (like wearing shoes)
2. Specific code examples
3. Intuitive chart descriptions
4. Avoid technical jargon, use simple language
"""

# Compare both prompts for:
# - Answer accuracy
# - Readability
# - User satisfaction

Prompt Evaluation Framework (APE)

Zhou et al. proposed APE (Automatic Prompt Evaluation) framework in 2022 for systematic prompt quality assessment:

Relevance: Is the answer related to the question?
Accuracy: Is the information correct?
Completeness: Does it cover all important aspects?
Consistency: Is the answer internally logically consistent?
Safety: Does it contain harmful or biased content?

Hands-on Practice

Let’s master prompt engineering through a practical exercise. The goal is to create an assistant that can automatically analyze code quality.

Step 1: Define Requirements

We need something that can:

Analyze code readability
Check performance issues
Provide improvement suggestions
Generate optimized code

Step 2: Create Basic Prompt

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
basic_prompt = """
Please analyze the following Python code:

```python
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(5))

Please analyze code quality and provide improvement suggestions. """

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38

### Step 3: Iterative Optimization

Let's optimize this prompt step by step:

```python
# First iteration: Add specific requirements
improved_prompt = """
Please analyze the following Python code, focusing on these aspects:

1. **Code Readability**
   - Variable naming clarity
   - Code structure clarity
   - Sufficient comments

2. **Performance Analysis**
   - Time complexity analysis
   - Space complexity analysis
   - Potential performance bottlenecks

3. **Code Quality**
   - Compliance with Python best practices
   - Edge case handling
   - Error handling completeness

4. **Improvement Suggestions**
   - Specific improvement measures
   - Optimized code examples

Code:
```python
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(5))

Please provide detailed analysis according to the above structure. """

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68

```python
# Second iteration: Add format requirements
final_prompt = """
Please analyze the following Python code in the following format:

## Code Analysis Report

### 1. Readability Analysis
- Variable naming assessment:
  - Current status...
  - Improvement suggestions...
- Code structure assessment:
  - Current status...
  - Improvement suggestions...
- Comment quality assessment:
  - Current status...
  - Improvement suggestions...

### 2. Performance Analysis
- Time complexity:
  - Analysis: Recursive calls lead to O(n) time complexity
  - Assessment: Acceptable for small inputs, possible stack overflow for large inputs
- Space complexity:
  - Analysis: O(n) due to recursive call stack
  - Assessment: Needs additional stack space for storing call information
- Performance optimization suggestions:
  1. Use iteration instead of recursion to avoid stack overflow
  2. Add input validation to handle negative inputs
  3. Consider memoization to avoid redundant calculations

### 3. Code Quality Assessment
- PEP 8 compliance: Basically compliant, improvements suggested
- Edge case handling: Missing input validation
- Error handling: Lack of exception handling mechanism

### 4. Improved Code
```python
def factorial(n):
    """
    Calculate the factorial of n
    
    Args:
        n: Non-negative integer
        
    Returns:
        Factorial result of n
        
    Raises:
        ValueError: If n is negative
    """
    if not isinstance(n, int) or n < 0:
        raise ValueError("Input must be a non-negative integer")
    
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

# Test code
if __name__ == "__main__":
    try:
        print(factorial(5))
        print(factorial(0))  # Should output 1
        print(factorial(3))  # Should output 6
        # print(factorial(-1))  # Should raise exception
    except ValueError as e:
        print(f"Error: {e}")

5. Summary Recommendations

Main improvements: Input validation, exception handling, iterative implementation
Expected effects: Improve code robustness and performance
Applicable scenarios: Need stable and reliable factorial calculation functionality

Code:

python
1
2
3
4
5
6
7
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(5))

"""

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

### Step 4: Testing and Validation

Run different versions of prompts and compare output quality:

1. **Basic version**: Simple analysis
2. **Improved version**: Structured analysis, but not detailed enough
3. **Final version**: Complete detailed analysis report

Through this process, you can see the significant effect of prompt optimization.

## Why Prompt Engineering Is Not Enough?

Although prompt engineering is an important foundation, it has several fundamental limitations:

### 1. No Memory Capability

Each conversation is independent; AI cannot remember previous interactions:

```python
# First round
user: "I am a Python developer"
assistant: "Great, Python is a powerful programming language..."

# Second round (AI doesn't remember "I am a Python developer")
user: "What Python frameworks should I learn?"
assistant: "As a Python developer, you could learn..."

2. Knowledge Timeliness Issues

AI has knowledge cutoff dates and cannot get the latest information:

python
1
2
3
# AI doesn't know about latest version features
user: "What are the new features in Python 3.13?"
assistant: "Python 3.13 was released in 2024, including these new features..."  # May give wrong info

3. No External Knowledge Access

AI cannot access private documents, databases, or latest web information:

python
1
2
user: "Based on our company's API documentation, how to implement user authentication?"
assistant: "Sorry, I cannot access your company's private documents..."  # Cannot provide specific help

4. Tool Usage Limitations

AI cannot directly call external tools or execute code:

python
1
2
user: "Please help me run data analysis script and generate charts"
assistant: "I cannot directly run your code, but I can provide code suggestions..."  # Cannot actually execute

5. Single Interaction Limitations

Each conversation is independent; cannot build complex workflows:

python
1
2
3
4
5
user: "Help me analyze sales data"
assistant: "I can help you analyze, but you need to provide the data..."  # Needs user to provide info

user: "Here is the sales data file..."  # User has to repeat information
assistant: "Now I received the data, let me analyze..."  # AI doesn't remember previous request

Summary: The Value of Prompt Engineering

Foundational Position

Prompt engineering is the cornerstone of AI engineering, just like basic syntax in programming. No matter how technology evolves, effective communication with AI remains necessary.

Enduring Value

In the following scenarios, prompt engineering is still the optimal choice:

Quick prototype validation: No complex system needed, just quick answers
Creative inspiration: Brainstorming and creative generation
Learning assistance: Concept explanation and knowledge organization
Content creation: Various content generation like articles, code, poetry

Bridge to the Next Level

The value of prompt engineering lies not in what problems it can solve, but in helping us understand AI capabilities and limitations. Through prompt engineering, we learned:

How to clearly express requirements
How to guide AI to produce high-quality outputs
Recognize AI’s knowledge boundaries
Understand prompt limitations

Just like the evolution from assembly language to high-level languages, prompt engineering represents the first step in AI interaction. The real breakthrough comes from recognizing: The problem is not about communicating more clearly, but about giving AI more information and capabilities.

This naturally leads us to the next stage: Context Engineering.

In the next article, we’ll explore how to break through prompt limitations by providing external information to significantly enhance AI capabilities.

Part of series: AI Engineering Series

← Previous The Evolution of AI Engineering Paradigms: Four Shifts from Prompt Engineering to Loop Engineering Next → From Prompts to Context: Why Clear Instructions Alone Are Not Enough