This tutorial walks a bioinformatics engineer through real-time observability of the nf-core/fastquorum pipeline using Tracer’s eBPF-powered monitoring. We simulate a small but realistic UMI-based duplex sequencing workflow on a single chromosome (chr17.fa), run it in a GitHub Codespace, and use Tracer to detect resource bottlenecks, identify redundant I/O, and understand how all 12 processes completed in just 1m 36s.

What You’ll Learn

  • Connect a live Codespace to the Tracer sandbox
  • Auto-instrument a Nextflow pipeline with zero code changes
  • Visualize per-process CPU, memory, and I/O in real time
  • Extract actionable optimization insights
Why this matters: fastquorum is complex (UMI grouping, consensus calling, dual alignment). Without OS-level visibility, engineers guess where time is spent. Tracer shows exactly which process is the bottleneck — no logs, no profiling flags.

Tools Used

  • Pipeline: nf-core/fastquorum v1.0.0+
  • Environment: GitHub Codespaces (Ubuntu 22.04, 4-core, 16GB RAM)
  • Observability: Tracer.bio (eBPF)
  • Container: Docker
  • Genome: chr17.fa (subset of GRCh38)

1. Login & Setup: Tracer Sandbox + GitHub Codespaces

We begin in a GitHub Codespace — a reproducible, cloud-based dev environment that mimics a local VM. Tracer’s eBPF agent runs natively here and streams metrics to the Tracer Sandbox Dashboard (https://dev.sandbox.tracer.cloud) in real time.

Step 1: Open GitHub Codespaces

  1. Go to GitHub Codespaces
  2. Click “New codespace”
  3. Select “Create your own” → Paste this repo: https://github.com/yourusername/nfcore-fastquorum-tracer-demo
  4. Choose machine: 4-core, 16GB RAM (required for Docker + Nextflow)
  5. Click Create codespace
Fig 1: Codespace display with the cloned nf-core pipeline repository

Step 2: Install Tracer (One-Liner with Dev Branch & User Token)

In the Codespaces terminal, run:
curl -sSL https://install.tracer.cloud | CLI_BRANCH=dev sh -s user_35Fukh3QxSAxJLgfyE9SwPoPy9K

Step 3: Start Tracer Agent

To start tracking a pipeline, run the following command:
tracer init --token eyJh---- (your token)
Fig 2: Output of a successful connection (snapshots of the tracer init command connecting to Tracer)
With the Tracer agent connected, input validated, and genome indexed, we can execute the full nf-core/fastquorum pipeline. No code changes are required: Tracer’s eBPF hooks automatically detect nextflow launches, label processes, and stream OS-level metrics (CPU, RAM, I/O, syscalls) to your sandbox dashboard in real time.

2. Dataset Preparation

This section is critical — nf-core/fastquorum enforces strict requirements on input format, UMI placement, and file integrity.

Key Preparation Steps

  • Download real test data directly from the nf-core test-datasets repository, ensuring authenticity and compatibility.
  • Confirm the UMI structure: here, a 6-base inline UMI (NNNNNN) embedded at the start of Read 1, which matches the expected pattern for duplex consensus sequencing.
  • Ensure all FASTQs are properly gzipped and accessible via relative paths to avoid runtime errors.
  • Construct a correctly formatted samplesheet.csv with the mandatory columns sample, fastq_1, fastq_2, umi_read, and umi_pattern, adhering to the pipeline’s JSON schema (sketched below).
  • Pre-build the genome index (BWA-MEM1, SAMtools FAIDX, and sequence dictionary) locally and store it for reuse, eliminating I/O noise during the observed run and ensuring clean, reproducible eBPF telemetry from Tracer (commands sketched below).
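
A minimal sketch of the last two steps, assuming the reference sits at data/chr17.fa; the FASTQ paths and samplesheet values are illustrative placeholders, not the exact test-dataset names:

# Pre-build the reference index once, keeping indexing I/O out of the observed run
bwa index data/chr17.fa                         # BWA-MEM1 index (.amb .ann .bwt .pac .sa)
samtools faidx data/chr17.fa                    # FASTA index (.fai)
samtools dict data/chr17.fa -o data/chr17.dict  # sequence dictionary

# Samplesheet with the mandatory columns listed above (paths and values illustrative)
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,umi_read,umi_pattern
test_sample,data/test_R1.fastq.gz,data/test_R2.fastq.gz,1,NNNNNN
EOF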

3. Launch the Pipeline

From the pipeline root:
nextflow run . \
  --input samplesheet.csv \
  --fasta data/chr17.fa \
  --outdir results \
  --duplex_seq true \
  -profile test,docker \
  -with-trace \
  -with-report results/report.html

Parameters

Flag                               Purpose
--input samplesheet.csv            Validated manifest
--fasta data/chr17.fa              Local reference
--duplex_seq true                  Enable duplex consensus
-profile test,docker               Use test config + containers
-with-trace                        Nextflow-native trace (optional)
-with-report results/report.html   HTML execution report
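
Since -with-trace is enabled in the launch command, Nextflow also writes its own tab-separated trace file alongside Tracer’s telemetry. A quick way to eyeball it in the terminal, assuming the default trace.txt name (some Nextflow versions timestamp the file instead):

# Render the tab-separated trace with aligned columns
# (fields include name, status, duration, realtime, %cpu, peak_rss)
column -t -s$'\t' trace.txt | less -S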

4. Live Visualization: Tracer Dashboard During Execution

With the nf-core/fastquorum pipeline launched and Tracer’s eBPF agent actively streaming OS-level events, the Tracer Sandbox Dashboard becomes a real-time observability cockpit. No polling, no logs — just continuous, kernel-level telemetry delivered via WebSocket every 2 seconds.

Dashboard Entry Point: Run Overview

Upon launching nextflow run ., a new run card appears instantly. Run Overview Card:
  • Run Name: run_1
  • Status: Running (blue dot)
  • Elapsed: 45s and counting
  • Max RAM: 12 / 100% → 12 GB peak (of 16 GB available)
  • Avg. CPU: 36 / 100% → 36% average across 4 cores
  • Disk I/O: 17 / 100% → 17% of max bandwidth
Fig 3: Run Overview snapshot (compact summary)
This compact summary is the first signal that Tracer has auto-detected the Nextflow executor and attached to all child processes — no -with-trace or config changes needed. The progress bar fills as tasks complete, and resource meters update in real time.

System Specs & Cost Panel

Metric       Value                     Status
RAM          2.97 GB used / 15.62 GB   HEALTHY
CPU          1.81 cores / 4 cores      HEALTHY
DISK         42.90 GB / 207.35 GB      HEALTHY
GPU          Not detected
TOTAL COST   $0.00                     Free tier (Codespaces)
This panel confirms the GitHub Codespaces environment: a 4-core, 16 GB VM with ample headroom. The cost meter at $0.00 reflects that this is a non-billable sandbox run, but in production (e.g., AWS EC2), Tracer would estimate hourly cost based on instance type and utilization.

Tool Table: Real-Time Process Monitoring

Table Observations:
  • bwa index is still running — expected: indexing chr17.fa (~80MB) is CPU-heavy
  • FastQC hit 118% CPU → Java thread burst (common in multi-threaded mode)
  • samtools faidx is I/O-light — just reads the FASTA once
  • Status badges update live: Running → Success as tasks finish
Visual Insights:
  • Critical path: bwa index → FastqToBam → GroupReadsByUmi
  • Parallelism: samtools faidx and dict run concurrently with FastQC
  • Tail latency: Final MultiQC runs alone
This Gantt view is interactive — hover to see exact command, stdout, and resource curve.
Tool               Status    Runtime    Max RAM   Max CPU   Max Disk I/O
bwa index          Running   9s 851ms   0.12 GB   115.49%   0.04 GB
samtools faidx     Success   482ms      0.00 GB   38.10%    0.00 GB
samtools dict      Success   1s 111ms   0.08 GB   54.63%    0.08 GB
FastQC             Success   5s 775ms   0.30 GB   118.23%   0.01 GB
fgbio FastqToBam   Success   4s 813ms   0.14 GB   120.60%   0.00 GB
Fig 4: Timeline view with the detailed table and visual insights for the tools running in the pipeline in real time

CPU Usage:
  • Avg: 91.4%
  • Max: 115.5% (burst during bwa index)
  • Pattern: High at start (indexing), drops to ~70% during alignment
Memory Usage:
  • Avg: 99.8 MB
  • Max: 121.5 MB
  • Spike at 6s: fgbio FastqToBam loads both FASTQs into memory
Disk I/O:
  • Avg: 0.08 GB
  • Max: 0.18 GB
  • Burst at 40s: Writing intermediate BAM files
Network I/O:
  • Avg: 81.42 MB
  • Max: 180.80 MB
  • Cause: Docker pulling nf-core/fastquorum:1.2.0 layers (first run)
Fig 5, 6: CPU, memory, disk, and network usage over time (system-level trends)
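
The network burst is a first-run artifact. A hedged workaround, assuming the image tag reported in the network burst above: pre-pull the container before the measured run so image downloads stay out of the network telemetry.

# Pre-pull before the observed run (tag taken from the network burst above;
# nf-core pipelines may pull several per-module images, so expect more than one)
docker pull nf-core/fastquorum:1.2.0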

5. Post-Run Analysis: Resource Heatmap & Bottleneck Detection

The pipeline completes in 1m 36s with 12 successful tasks. Now we analyze the full trace.

Resource Analysis

Process               CPU (avg)   RAM (peak)   I/O (total)   Duration
BWAMEM1_INDEX         95%         1.4 GB       180 MB        53s
GROUPREADSBYUMI       99%         3.1 GB       42 MB         24s
CALLDUPLEXCONSENSUS   60%         1.8 GB       28 MB         16s
FASTQTOBAM            75%         1.2 GB       35 MB         18s

Key Insights

Critical Path Identified

BWAMEM1_INDEX (53s) is the bottleneck — accounts for 55% of total runtime
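
Since the index depends only on the reference, one low-effort mitigation is to re-run with Nextflow’s -resume flag, which reuses cached task results so BWAMEM1_INDEX (and any other unchanged task) is skipped on subsequent runs. A sketch using the launch command from Section 3:

nextflow run . \
  --input samplesheet.csv \
  --fasta data/chr17.fa \
  --outdir results \
  --duplex_seq true \
  -profile test,docker \
  -resume    # reuses cached results, skipping the 53s indexing step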

Memory Spike

GROUPREADSBYUMI peaks at 3.1 GB — consider increasing memory allocation for larger datasets
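
If larger datasets push GROUPREADSBYUMI past the headroom observed here, the usual fix is a process-level override in a custom Nextflow config. A minimal sketch, assuming the selector matches the simple process name from the table above; 6 GB is an illustrative figure, not a tuned one:

cat > more_memory.config <<'EOF'
process {
    // Raise the cap for the UMI-grouping step only; other processes keep defaults
    withName: 'GROUPREADSBYUMI' {
        memory = '6 GB'
    }
}
EOF

# Pass the override on the next run
nextflow run . --input samplesheet.csv --fasta data/chr17.fa --outdir results \
  --duplex_seq true -profile test,docker -c more_memory.config -resume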

CPU Efficiency

Most processes utilize >75% CPU — good parallelization

I/O Optimization

Total I/O: 285 MB — minimal disk bottleneck detected

6. Conclusion

In the fast-evolving landscape of bioinformatics, where pipelines demand precision amid mounting computational complexity, Tracer emerges as an indispensable ally for bioinformaticians seeking deeper, actionable insights without the burden of invasive instrumentation.

Key Benefits

By harnessing eBPF technology at the operating system level, Tracer delivers:
  • Real-time observability into every facet of your workflows (Nextflow, WDL, Bash, or CWL)
  • Automatic detection of hangs, crashes, and silent failures that traditional logs often overlook
  • One-minute setup with zero code modifications

Real-World Impact

Imagine pinpointing the exact genome file or tool process causing a crash in a duplex sequencing run, or catching memory oversizing introduced by a dependency update, shaving weeks off troubleshooting.
Tracer excels in resource orchestration, spotlighting inefficiencies like redundant I/O in alignment steps or overprovisioned instances.
AI-driven recommendations enable right-sizing of compute environments in a few clicks, potentially cutting cloud costs by 30% or more; pricing is 5% of your pipeline’s compute spend, with no upfront fees.

Your Next Steps

For bioinformaticians juggling high-throughput NGS data, evolving dependencies, and the pressure to derive reproducible insights from vast datasets, Tracer isn’t just a monitoring tool. It is a superpower that shifts focus from infrastructure headaches to scientific discovery, fostering scalable, cost-effective workflows that accelerate breakthroughs in genomics, proteomics, and beyond.

Try Tracer Sandbox

Dive into the Tracer sandbox today and experience how effortless observability can redefine your pipeline mastery.