Tracer organizes execution data into a small set of core entities that reflect how real workloads run: runs, tasks, tools, containers, and hosts. This data model allows Tracer to map low-level execution signals to the way teams reason about pipelines and infrastructure, without relying on workflow metadata, logs, or application instrumentation. This page describes those entities and how they relate to each other.

Overview

At a high level:
  • Tracer/collect observes execution events at the operating system level
  • These events are correlated into structured entities
  • Higher-level products (Tracer/tune and Tracer/sweep) operate on this shared model

The resulting model is:
  • Workflow-agnostic: works with any orchestrator or scheduler
  • Stable: consistent across environments
  • Expressive: represents complex, multi-process execution

Core entities

Runs

A run represents a single execution of a pipeline or workload. A run typically corresponds to:
  • A workflow execution (for example, a Nextflow or Snakemake run)
  • A batch job or experiment
  • One invocation of a pipeline configuration that is executed repeatedly
Runs provide the top-level boundary for grouping execution data and comparing behavior across executions.

Tasks

A task represents a logical unit of work within a run. Tasks often correspond to:
  • Workflow steps or processes
  • Batch jobs or array jobs
  • Scheduled units of execution
A task may:
  • Run on one or more hosts
  • Execute sequentially or in parallel
  • Spawn multiple tools and subprocesses
Tasks are the primary unit used for performance comparison and tuning.

Tools

A tool represents an executable program invoked during a task. Examples include:
  • Native binaries (for example, bwa, samtools)
  • Interpreters and scripts (python, bash)
  • JVM-based tools
  • Short-lived helper binaries and child processes
Tracer identifies tools based on observed process execution, not logs or configuration. Even tools that produce no logs are captured as first-class entities.
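
To make the idea of OS-level tool identification concrete, the sketch below reads a process's executable name and arguments from Linux /proc. It is illustrative only: the function name and output shape are invented for this example, and it does not describe how Tracer/collect is actually implemented.

```python
from pathlib import Path

def describe_process(pid: int) -> dict:
    """Return the executable name and arguments of a running process (Linux only)."""
    proc = Path(f"/proc/{pid}")
    # /proc/<pid>/comm holds the executable name, e.g. "bwa" or "samtools"
    name = (proc / "comm").read_text().strip()
    # /proc/<pid>/cmdline holds the NUL-separated argument vector
    argv = [a.decode(errors="replace")
            for a in (proc / "cmdline").read_bytes().split(b"\0") if a]
    return {"pid": pid, "tool": name, "argv": argv}

if __name__ == "__main__":
    import os
    print(describe_process(os.getpid()))
```

The point of the example is that the exec itself is the signal: a tool that writes no logs is still visible to an OS-level observer.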

Containers

A container represents an execution context defined by container runtimes or Linux namespaces. Containers:
  • Group related processes
  • Provide isolation boundaries
  • May contain multiple tools and subprocesses
Tracer does not require containers to be present, but when they are used, container context is preserved and reflected in the data model.
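
As a rough illustration of where container context can come from, the heuristic below scans /proc/<pid>/cgroup for a 64-character hex ID, which runtimes such as Docker and containerd commonly embed in cgroup paths. The helper name and the regex are assumptions made for this sketch; the exact path layout varies by runtime and cgroup version, and this is not Tracer's implementation.

```python
import re
from pathlib import Path

def container_hint(pid: int) -> str | None:
    """Best-effort guess at a process's container ID, or None if none is visible."""
    cgroup_text = Path(f"/proc/{pid}/cgroup").read_text()
    # Many runtimes name cgroups after the 64-char hex container ID (heuristic only).
    match = re.search(r"[0-9a-f]{64}", cgroup_text)
    return match.group(0) if match else None
```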

Hosts

A host represents a physical or virtual machine where execution occurs. Hosts include:
  • Cloud instances (for example, EC2)
  • On-premises nodes
  • Batch or HPC worker nodes
Host-level data provides the infrastructure context needed to understand scheduling behavior, resource contention, and idle time.

Relationships between entities

The entities form a hierarchy:
  • A run contains one or more tasks
  • A task invokes one or more tools
  • Tools execute within a container or directly on a host
  • All execution ultimately occurs on a host
This structure allows Tracer to:
  • Attribute resource usage accurately
  • Compare behavior across runs and tasks
  • Correlate infrastructure behavior with pipeline execution
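
For readers who want to picture the hierarchy as a data structure, here is a minimal sketch in Python dataclasses. The class and field names are illustrative assumptions, not Tracer's actual schema or API.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    hostname: str
    instance_type: str | None = None        # e.g. an EC2 instance type

@dataclass
class Container:
    container_id: str
    host: Host

@dataclass
class Tool:
    name: str                                # e.g. "bwa", "samtools", "python"
    pid: int
    container: Container | None = None       # a tool may run directly on a host
    host: Host | None = None

@dataclass
class Task:
    name: str
    tools: list[Tool] = field(default_factory=list)
    hosts: list[Host] = field(default_factory=list)

@dataclass
class Run:
    run_id: str
    tasks: list[Task] = field(default_factory=list)
```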

How correlation works

Tracer correlates execution events using identifiers exposed by the operating system, including:
  • Process IDs and parent–child relationships
  • Cgroups and namespaces
  • Container runtime metadata (when available)
This correlation happens automatically and does not require:
  • Workflow engine integration
  • Application instrumentation
  • Explicit tagging
The result is a consistent execution model across heterogeneous environments.
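
The toy example below shows the core of this idea on made-up data: given only PIDs and parent PIDs, every observed process can be attributed to the root process it descends from, and cgroup or namespace identifiers refine the grouping further. The event tuples and grouping logic are invented for illustration and do not reflect Tracer's internals.

```python
from collections import defaultdict

# Hypothetical (pid, ppid, name) observations from a single run
events = [
    (100, 1, "nextflow"),
    (200, 100, "bash"), (201, 200, "bwa"), (202, 200, "samtools"),
    (300, 100, "bash"), (301, 300, "python"),
]

parent = {pid: ppid for pid, ppid, _ in events}

def root_of(pid: int) -> int:
    # Walk up the parent chain until we leave the observed set of processes.
    while parent.get(pid) in parent:
        pid = parent[pid]
    return pid

groups: dict[int, list[str]] = defaultdict(list)
for pid, _, name in events:
    groups[root_of(pid)].append(name)

print(dict(groups))
# {100: ['nextflow', 'bash', 'bwa', 'samtools', 'bash', 'python']}
```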

What the data model enables

This data model is the foundation for Tracer’s higher-level capabilities. It enables:
  • Execution timelines organized by run, task, and tool
  • Resource usage attribution at meaningful boundaries
  • Detection of idle execution and contention
  • Cost attribution aligned with real execution behavior
  • Cross-run comparison and regression detection
Tracer/tune and Tracer/sweep operate on this shared structure rather than raw telemetry.
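
As a small illustration of cross-run comparison, the snippet below joins two runs on task name and flags tasks that slowed down beyond a threshold. The task names, durations, and threshold are all made up for this example; it only shows the kind of check the shared structure makes straightforward.

```python
baseline  = {"align": 620.0, "sort": 95.0, "call_variants": 1800.0}   # seconds
candidate = {"align": 640.0, "sort": 210.0, "call_variants": 1815.0}

THRESHOLD = 1.25  # flag tasks that got more than 25% slower

for task, before in baseline.items():
    after = candidate.get(task)
    if after is not None and after > before * THRESHOLD:
        print(f"possible regression in '{task}': {before:.0f}s -> {after:.0f}s")
# possible regression in 'sort': 95s -> 210s
```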

What the data model does not represent

The data model intentionally excludes:
  • Application payloads or scientific input/output data
  • Source code, function calls, or language-level execution traces
  • Domain-specific semantics or correctness
Tracer models how workloads execute, not what they compute.

Orchestrator terminology mapping (reference)

Tracer’s data model is framework- and language-agnostic. The table below shows how Tracer entities typically align with common orchestrator concepts. Exact mappings may vary by workflow engine and configuration.

| Tracer concept | Common equivalents |
| --- | --- |
| Run | Workflow run, DAG run, execution |
| Task | Process, step, task, op, node |
| Tool | Binary, script, container entrypoint |
| Container | Pod, container, namespace |
| Host | Worker node, instance, executor host |
This mapping is provided for orientation only. Tracer does not depend on orchestrator metadata to build its execution model.

When to read this page

This page is most useful if you:
  • Want to understand how Tracer structures execution data
  • Are integrating Tracer data into external systems
  • Need clarity on attribution boundaries and terminology
  • Are evaluating Tracer for complex or regulated environments