Workshop 3 Introduction Slides

BU Bioinformatics Computational Skills Workshop

Workflow Automation with Snakemake

Make your genome analysis reproducible and shareable
Refactor your previous scripts into a Snakemake workflow
Use LLMs to help design, implement, and debug your workflow

Problem Statement

Refactor your scripts from previous workshops into a Snakemake workflow
Automate steps for downloading data, running sequence analysis, and summarizing results
Run your workflow on the compute cluster
Summarize your workflow and findings for your PI

Why Use Workflow Management?

Ensures reproducibility and transparency
Automates complex, multi-step analyses
Facilitates collaboration and sharing

What is Snakemake?

A workflow management system for reproducible data analysis
Uses a readable, Python-based syntax to define rules; dependencies are inferred by matching each rule's input files to other rules' output files
Integrates easily with Python scripts and cluster computing
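
The idea of rules linked by the files they read and write can be sketched in plain Python. This is a toy illustration, not Snakemake's own implementation: the `Rule` class, the `plan` function, and the rule and file names (`download`, `stats`, `summarize`, `genome.fasta`, and so on) are all hypothetical. It only shows how a target file can be traced backward to the ordered list of steps needed to produce it.

```python
# Toy illustration only: this is NOT how Snakemake is implemented.
# All rule names and file names below are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def plan(rules, target):
    """Return rule names in an order that produces `target`.

    Works backward from the target file: find the rule whose outputs
    include it, plan every input of that rule first, then add the rule.
    """
    producer = {out: rule for rule in rules for out in rule.outputs}
    order = []

    def visit(path, in_progress):
        rule = producer.get(path)
        if rule is None:
            return  # no rule makes this file; assume it already exists
        if rule.name in order:
            return  # already scheduled by an earlier branch
        if rule.name in in_progress:
            raise ValueError(f"cyclic dependency at rule {rule.name!r}")
        for dep in rule.inputs:
            visit(dep, in_progress | {rule.name})
        order.append(rule.name)

    visit(target, frozenset())
    return order


# Rules are listed out of order on purpose: the plan comes from the
# input/output file names, not from the order in which rules are written.
RULES = [
    Rule("summarize", inputs=["stats.tsv"], outputs=["summary.txt"]),
    Rule("download", outputs=["genome.fasta"]),
    Rule("stats", inputs=["genome.fasta"], outputs=["stats.tsv"]),
]

if __name__ == "__main__":
    print(plan(RULES, "summary.txt"))  # ['download', 'stats', 'summarize']
```

Asking for `stats.tsv` instead of `summary.txt` would plan only `download` and `stats`, which mirrors how requesting an intermediate file runs just the steps that file needs.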

Workshop Workflow: Problem → Prompt → Code → Debug → Result

Problem: Define the workflow challenge
Prompt: Craft an effective LLM prompt
Code: Generate and run Snakemake rules
Debug: Identify and fix errors (locally and on the cluster)
Result: Summarize and interpret findings

Getting Started: Example LLM Prompt

I need to refactor my genome analysis scripts into a Snakemake
workflow that downloads an ancient genome FASTA file, computes
sequence statistics, and summarizes the results. Please generate
a Snakefile and example rule for running the analysis on a compute
cluster.
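
For the "computes sequence statistics" step named in the prompt, a minimal sketch of a standalone script that a workflow rule could call is shown here. The script name `fasta_stats.py`, the choice of statistics (length and GC fraction), and the TSV column names are assumptions made for illustration; your scripts from the previous workshops may compute something different.

```python
# Hypothetical helper script, assumed to be saved as fasta_stats.py.
# The statistics and output columns are illustrative choices only.
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_stats(seq):
    """Return the length and GC fraction of one sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return {"length": len(seq), "gc_fraction": gc / len(seq) if seq else 0.0}


def main(fasta_path, out_path):
    """Write one tab-separated row of statistics per FASTA record."""
    with open(fasta_path) as fin, open(out_path, "w") as fout:
        fout.write("record\tlength\tgc_fraction\n")
        for header, seq in read_fasta(fin):
            stats = sequence_stats(seq)
            fout.write(
                f"{header}\t{stats['length']}\t{stats['gc_fraction']:.4f}\n"
            )


# Runs only when invoked as: python fasta_stats.py <genome.fasta> <stats.tsv>
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

In a Snakefile, a rule could declare the FASTA file under `input`, the TSV file under `output`, and run the script through a `shell` directive such as `python fasta_stats.py {input} {output}`. Keeping the script runnable on its own makes it easier to debug locally before running the workflow on the cluster.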