Learning Objectives:
- Distinguish a simple LLM call from an autonomous agent
- Understand the perceive → reason → act loop and the ReAct pattern
- Implement a minimal tool-calling agent loop
- Recognize common agent failure modes, termination strategies, and evaluation challenges
Simple LLM: User → LLM → Response (one-shot)
Agent: Autonomous system that perceives its state, reasons about the next step, acts via tools, and iterates until the goal is achieved
Key difference: Autonomy and iteration
```
┌─────────────────────────────────────┐
│ Goal: Answer user question          │
└───────────┬─────────────────────────┘
            │
            ▼
   ┌────────────────────┐
   │ 1. PERCEIVE        │◄──────┐
   │ What's the state?  │       │
   └────────┬───────────┘       │
            │                   │
            ▼                   │
   ┌────────────────────┐       │
   │ 2. REASON          │       │
   │ What should I do?  │       │
   └────────┬───────────┘       │
            │                   │
            ▼                   │
   ┌────────────────────┐       │
   │ 3. ACT             │       │
   │ Execute action     │       │
   └────────┬───────────┘       │
            │                   │
            ▼                   │
      Goal achieved? ──No───────┘
            │
           Yes
            │
            ▼
      Return answer
```
User: “What’s the most recent BRCA1 paper?”
Agent loop:
Iteration 1: Search PubMed for "BRCA1", sorted by publication date
Iteration 2: Fetch metadata for the newest result
Iteration 3: Enough information gathered → return title, authors, and date
Problems agents solve:
- Multi-step tasks where each step depends on earlier results
- Choosing tools at runtime instead of hard-coding a pipeline
- Recovering from errors and adapting mid-task
Example: “Find genes associated with condition X, get their pathways, and identify common drugs”
Traditional script:

```python
def analyze():
    data = fetch_data()
    result = process(data)
    return result
```

Fixed path, no adaptation
Agent:

```python
def agent_loop():
    while not goal_achieved():
        observation = perceive()
        action = reason(observation)
        result = act(action)
        if is_error(result):
            adapt_strategy()
    return final_answer()
```

Dynamic path, adaptive
Common patterns:
- ReAct: interleave reasoning and tool use
- Plan-and-Execute: plan first, then run the steps
- Reflection: critique your own intermediate results
We’ll focus on ReAct (most popular)
ReAct = Reasoning + Acting
Pattern:
Thought: What do I need to do?
Action: [tool_name, args]
Observation: [tool result]
Thought: What does this mean?
Action: [next tool_name, args]
Observation: [tool result]
...
Thought: I can answer now
Answer: [final response]
Explicit reasoning makes debugging easier
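To make the pattern concrete, a minimal ReAct-style system prompt might look like this (the wording is illustrative; there is no single canonical template):

```python
# Illustrative ReAct-style system prompt (wording is an assumption;
# there is no single canonical template).
REACT_SYSTEM_PROMPT = """Answer the question by interleaving steps:

Thought: reason about what to do next
Action: tool_name(arguments)
Observation: (the tool result will be inserted here)

Repeat Thought/Action/Observation until you can answer, then write:
Answer: <final response>"""
```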
Question: “How many exons does BRCA1 have?”
Thought: I need gene structure info for BRCA1
Action: search_gene_database("BRCA1")
Observation: {gene_id: 672, ...}
Thought: Got gene ID, now fetch detailed structure
Action: get_gene_structure(672)
Observation: {exons: 24, introns: 23, ...}
Thought: I have the exon count now
Answer: BRCA1 has 24 exons.
```python
from litellm import completion  # assumption: the completion() style used here matches litellm

def simple_agent(question, tools, max_iterations=5):
    messages = [{"role": "user", "content": question}]
    for i in range(max_iterations):
        # LLM reasons about what to do
        response = completion(
            model="anthropic/claude-sonnet-4-20250514",
            messages=messages,
            tools=tools,
        )
        message = response.choices[0].message
        # No tool calls means the model has produced its final answer
        if not message.tool_calls:
            return message.content
        # Keep the assistant's tool-call message in the history so the
        # tool results below can reference it
        messages.append(message)
        # Execute tools and feed results back as observations
        for tool_call in message.tool_calls:
            result = execute_tool(tool_call)
            messages.append({
                "role": "tool",
                "content": str(result),
                "tool_call_id": tool_call.id,
            })
    return "Max iterations reached"
```
Agents need to track:
- What has been asked and answered so far
- Which tools were called, and what they returned
- Key intermediate findings
Memory types:
Short-term: Current context window
Long-term: Database/vector store for past interactions
Working memory: Scratchpad for intermediate results
```python
import time

class AgentMemory:
    def __init__(self):
        self.short_term = []      # Recent messages
        self.working_memory = {}  # Key findings
        self.tool_history = []    # Tool calls made

    def add_observation(self, tool_name, result):
        self.short_term.append({
            "tool": tool_name,
            "result": result,
            "timestamp": time.time(),
        })

    def get_context(self, max_tokens=4000):
        # Simplification: return the last 10 items; a fuller version
        # would trim to the max_tokens budget
        return self.short_term[-10:]

    def store_finding(self, key, value):
        self.working_memory[key] = value
```
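Example usage, tied to the BRCA1 example above:

```python
memory = AgentMemory()
memory.add_observation("search_gene_database", {"gene_id": 672})
memory.store_finding("brca1_gene_id", 672)
print(memory.get_context())   # recent observations for the next prompt
print(memory.working_memory)  # {'brca1_gene_id': 672}
```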
Alternative to ReAct:
Phase 1: Planning
Goal: Find common pathways for gene set
Plan:
1. For each gene, query pathway database
2. Collect all pathways
3. Find intersection
4. Rank by gene count
Phase 2: Execution
Execute step 1... ✓
Execute step 2... ✓
Execute step 3... ✓
Execute step 4... ✓
Advantage: Clear structure
Disadvantage: Less adaptive to unexpected results
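A minimal sketch of the two phases, reusing the completion() call and the simple_agent loop defined earlier (the planning prompt wording is an assumption):

```python
from litellm import completion  # assumption: same API as the agent loop

def plan_and_execute(goal, tools):
    # Phase 1: a single planning call produces a numbered step list
    plan = completion(
        model="anthropic/claude-sonnet-4-20250514",
        messages=[{"role": "user",
                   "content": f"Write a short numbered plan for: {goal}"}],
    ).choices[0].message.content
    # Phase 2: run each step with the agent loop defined earlier
    results = []
    for step in plan.splitlines():
        if step.strip():
            results.append(simple_agent(step, tools, max_iterations=3))
    return results
```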
Rigid plan:
1. Query database
2. Process results ← ERROR: No results found
3. Generate report ← Can't proceed
Adaptive agent (ReAct):
Try query database → No results
Thought: Database may be down, try alternative
Action: Use web search instead
Observation: Found relevant papers
Thought: Can proceed with alternative data
Agents can adapt, scripts cannot
Add self-critique:
Action: search_pubmed("BRCA mutations")
Observation: 50,000 results (too many)
Reflection: Query too broad, need refinement
Action: search_pubmed("BRCA1 pathogenic mutations clinical")
Observation: 2,000 results (manageable)
Reflection: Much better, proceed with these
Agent evaluates own performance
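A minimal sketch of such a critique step, assuming the same litellm-style completion() call used earlier; the prompt wording is illustrative:

```python
from litellm import completion  # assumption: same API as the agent loop

def reflect(goal, action, observation):
    # Ask the model to critique the last step before the agent continues
    prompt = (
        f"Goal: {goal}\n"
        f"Last action: {action}\n"
        f"Observation: {observation}\n"
        "Reflection: Was this step useful? If not, suggest a refined action."
    )
    response = completion(
        model="anthropic/claude-sonnet-4-20250514",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```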
Essential tools:
- Literature search (e.g., PubMed)
- Structured database queries (genes, variants, pathways)
- Web search as a fallback data source
Goal: “Summarize CRISPR base editing advances in 2024”
Agent workflow:
1. Search the literature for "CRISPR base editing", restricted to 2024
2. Extract the key advances from the retrieved abstracts
3. Synthesize the findings into a summary
Autonomous multi-step research
Goal: “Is variant chr17:g.43094692G>A pathogenic?”
Agent workflow:
1. Look up the variant in clinical variant databases
2. Check population allele frequency
3. Search the literature for functional evidence
4. Weigh the evidence and report a classification with caveats
Integrates multiple evidence sources
Challenges:
- Hallucinated tool calls and parameters
- Loops that never converge
- Cost and latency grow with every iteration
- Hard to guarantee correctness of multi-step chains
Agents ≠ production-ready without validation
When should an agent stop?
Strategies:
```python
# 1. Hard cap on iterations
for i in range(max_iterations):
    ...

# 2. Explicit done signal in the model's output
if response.content.startswith("FINAL_ANSWER:"):
    return response

# 3. Token budget
if total_tokens > budget:
    return "Budget exceeded"

# 4. Wall-clock timeout (timeout() is a placeholder, not stdlib)
with timeout(60):
    agent.run()
```
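The timeout(60) context manager above is a placeholder. One standard-library way to get the same effect:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def run_with_timeout(fn, seconds=60):
    # Run fn in a worker thread and stop waiting after `seconds`.
    # Note: the thread itself is not killed; this only bounds our wait.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=seconds)
    except TimeoutError:
        return "Timed out"
    finally:
        pool.shutdown(wait=False)

# answer = run_with_timeout(lambda: agent.run(), seconds=60)
```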
Common issues:
1. Infinite loops — repeating the same action without progress
2. Wrong tool selection — a plausible but unsuitable tool
3. Parameter hallucination — invented IDs or arguments
4. Premature termination — answering before gathering enough evidence
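A cheap guard against issue 1 is to detect repeated identical tool calls; a sketch (the data shapes are illustrative):

```python
def is_looping(tool_history, window=3):
    # tool_history: list of (tool_name, args_json) tuples, oldest first.
    # Flags when the last `window` calls are all identical.
    recent = tool_history[-window:]
    return len(recent) == window and len(set(recent)) == 1
```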
How to measure agent success?
Metrics:
- Task completion rate
- Answer accuracy against known ground truth
- Iterations and tokens (cost) per task
Challenges:
- Many tasks lack a single ground-truth answer
- Runs are stochastic: the same prompt can take different paths
- Intermediate steps are hard to score, not just final answers
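Even so, a toy harness for the completion-rate metric is easy to sketch; substring scoring here is a deliberate simplification, and `agent` is any callable such as a wrapped simple_agent:

```python
def evaluate_agent(agent, test_cases):
    # test_cases: list of (question, expected_substring) pairs
    passed = 0
    for question, expected in test_cases:
        answer = agent(question)
        passed += int(expected.lower() in str(answer).lower())
    return passed / len(test_cases)

# Example:
# rate = evaluate_agent(lambda q: simple_agent(q, tools),
#                       [("How many exons does BRCA1 have?", "24")])
```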
Demo 1: Simple ReAct Agent
Demo 2: Literature Research Agent
Demo 3: Debugging Agent Failures
Components:
- Tool definitions for gene database lookups
- The ReAct loop built earlier in this session
- Trace printing so every Thought/Action/Observation is visible
Example query: “Find the chromosomal location of TP53 and identify nearby genes within 1Mb”
Watch: How agent plans and executes steps
Task: Autonomous literature review
Capabilities:
- Search the literature with date filters
- Extract key findings from abstracts
- Synthesize results across papers
Example: “What are therapeutic strategies for Huntington’s disease mentioned in 2023-2024 papers?”
Demonstrate:
- The four failure modes above: loops, wrong tool, hallucinated parameters, early exit
- How to read agent traces to diagnose each one
Learn from failures
Theory: LLMs trained on reasoning patterns in text
Practice: Explicit “Thought:” prompts activate reasoning
Theory: Attention processes all context
Practice: Agent state/history must fit in context window
Theory: Autoregressive generation (one token at a time)
Practice: Agent decisions are sequential, not globally optimal
✅ Good for:
- Multi-step tasks where the next step depends on intermediate results
- Combining several tools and data sources
- Tasks that need error recovery and adaptation

❌ Bad for:
- Single-step questions a plain LLM call can answer
- Fixed, well-understood pipelines (write a script instead)
- Settings with hard latency, cost, or correctness requirements
Next topic: AI-Assisted Development Workflow
Bringing it all together:
The most practical session yet!
Papers:
- ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)
- Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., 2023)
Frameworks:
- LangChain, LlamaIndex (agent orchestration)
- LiteLLM (the unified completion() API used in this session's code)
Demo code: lectures/demos/session_4/
Next session: AI-Assisted Development Workflow
The grand finale where we build a complete tool using everything we’ve learned