Tracer organizes execution data into a small set of core entities that reflect how real workloads run: runs, tasks, tools, containers, and hosts. This data model allows Tracer to map low-level execution signals to the way teams reason about pipelines and infrastructure, without relying on workflow metadata, logs, or application instrumentation. This page describes those entities and how they relate to each other.

Overview

At a high level:
  • Tracer/collect observes execution events at the operating system level
  • These events are correlated into structured entities
  • Higher-level products (Tracer/tune and Tracer/sweep) operate on this shared model

The resulting model is:
  • Workflow-agnostic: works with any orchestrator or scheduler
  • Stable: consistent across environments
  • Expressive: represents complex, multi-process execution

Core entities

Runs

A run represents a single execution of a pipeline or workload. A run typically corresponds to:
  • A workflow execution (for example, a Nextflow or Snakemake run)
  • A batch job or experiment
  • One invocation of a pipeline configuration that is executed repeatedly
Runs provide the top-level boundary for grouping execution data and comparing behavior across executions.

Tasks

A task represents a logical unit of work within a run. Tasks often correspond to:
  • Workflow steps or processes
  • Batch jobs or array jobs
  • Scheduled units of execution
A task may:
  • Run on one or more hosts
  • Execute sequentially or in parallel
  • Spawn multiple tools and subprocesses
Tasks are the primary unit used for performance comparison and tuning.

Tools

A tool represents an executable program invoked during a task. Examples include:
  • Native binaries (for example, bwa, samtools)
  • Interpreters and scripts (python, bash)
  • JVM-based tools
  • Short-lived helper binaries and child processes
Tracer identifies tools based on observed process execution, not logs or configuration. Even tools that produce no logs are captured as first-class entities.
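
To make the idea of OS-level tool identification concrete, the sketch below reads a process's executable name and arguments from Linux /proc. It is illustrative only: the function name and output shape are invented for this example, and it does not describe how Tracer/collect is actually implemented.

```python
from pathlib import Path

def describe_process(pid: int) -> dict:
    """Return the executable name and arguments of a running process (Linux only)."""
    proc = Path(f"/proc/{pid}")
    # /proc/<pid>/comm holds the executable name, e.g. "bwa" or "samtools"
    name = (proc / "comm").read_text().strip()
    # /proc/<pid>/cmdline holds the NUL-separated argument vector
    argv = [a.decode(errors="replace")
            for a in (proc / "cmdline").read_bytes().split(b"\0") if a]
    return {"pid": pid, "tool": name, "argv": argv}

if __name__ == "__main__":
    import os
    print(describe_process(os.getpid()))
```

The point of the example is that the exec itself is the signal: a tool that writes no logs is still visible to an OS-level observer.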

Containers

A container represents an execution context defined by container runtimes or Linux namespaces. Containers:
  • Group related processes
  • Provide isolation boundaries
  • May contain multiple tools and subprocesses
Tracer does not require containers to be present, but when they are used, container context is preserved and reflected in the data model.
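
As a rough illustration of where container context can come from, the heuristic below scans /proc/<pid>/cgroup for a 64-character hex ID, which runtimes such as Docker and containerd commonly embed in cgroup paths. The helper name and the regex are assumptions made for this sketch; the exact path layout varies by runtime and cgroup version, and this is not Tracer's implementation.

```python
import re
from pathlib import Path

def container_hint(pid: int) -> str | None:
    """Best-effort guess at a process's container ID, or None if none is visible."""
    cgroup_text = Path(f"/proc/{pid}/cgroup").read_text()
    # Many runtimes name cgroups after the 64-char hex container ID (heuristic only).
    match = re.search(r"[0-9a-f]{64}", cgroup_text)
    return match.group(0) if match else None
```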

Hosts

A host represents a physical or virtual machine where execution occurs. Hosts include:
  • Cloud instances (for example, EC2)
  • On-premises nodes
  • Batch or HPC worker nodes
Host-level data provides the infrastructure context needed to understand scheduling behavior, resource contention, and idle time.

Relationships between entities

The entities form a hierarchy:
  • A run contains one or more tasks
  • A task invokes one or more tools
  • Tools execute within a container or directly on a host
  • All execution ultimately occurs on a host
This structure allows Tracer to:
  • Attribute resource usage accurately
  • Compare behavior across runs and tasks
  • Correlate infrastructure behavior with pipeline execution
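
For readers who want to picture the hierarchy as a data structure, here is a minimal sketch in Python dataclasses. The class and field names are illustrative assumptions, not Tracer's actual schema or API.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    hostname: str
    instance_type: str | None = None        # e.g. an EC2 instance type

@dataclass
class Container:
    container_id: str
    host: Host

@dataclass
class Tool:
    name: str                                # e.g. "bwa", "samtools", "python"
    pid: int
    container: Container | None = None       # a tool may run directly on a host
    host: Host | None = None

@dataclass
class Task:
    name: str
    tools: list[Tool] = field(default_factory=list)
    hosts: list[Host] = field(default_factory=list)

@dataclass
class Run:
    run_id: str
    tasks: list[Task] = field(default_factory=list)
```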

How correlation works

Tracer correlates execution events using identifiers exposed by the operating system, including:
  • Process IDs and parent–child relationships
  • Cgroups and namespaces
  • Container runtime metadata (when available)
This correlation happens automatically and does not require:
  • Workflow engine integration
  • Application instrumentation
  • Explicit tagging
The result is a consistent execution model across heterogeneous environments.
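
The toy example below shows the core of this idea on made-up data: given only PIDs and parent PIDs, every observed process can be attributed to the root process it descends from, and cgroup or namespace identifiers refine the grouping further. The event tuples and grouping logic are invented for illustration and do not reflect Tracer's internals.

```python
from collections import defaultdict

# Hypothetical (pid, ppid, name) observations from a single run
events = [
    (100, 1, "nextflow"),
    (200, 100, "bash"), (201, 200, "bwa"), (202, 200, "samtools"),
    (300, 100, "bash"), (301, 300, "python"),
]

parent = {pid: ppid for pid, ppid, _ in events}

def root_of(pid: int) -> int:
    # Walk up the parent chain until we leave the observed set of processes.
    while parent.get(pid) in parent:
        pid = parent[pid]
    return pid

groups: dict[int, list[str]] = defaultdict(list)
for pid, _, name in events:
    groups[root_of(pid)].append(name)

print(dict(groups))
# {100: ['nextflow', 'bash', 'bwa', 'samtools', 'bash', 'python']}
```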

What the data model enables

This data model is the foundation for Tracer’s higher-level capabilities. It enables:
  • Execution timelines organized by run, task, and tool
  • Resource usage attribution at meaningful boundaries
  • Detection of idle execution and contention
  • Cost attribution aligned with real execution behavior
  • Cross-run comparison and regression detection
Tracer/tune and Tracer/sweep operate on this shared structure rather than raw telemetry.
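
As a small illustration of cross-run comparison, the snippet below joins two runs on task name and flags tasks that slowed down beyond a threshold. The task names, durations, and threshold are all made up for this example; it only shows the kind of check the shared structure makes straightforward.

```python
baseline  = {"align": 620.0, "sort": 95.0, "call_variants": 1800.0}   # seconds
candidate = {"align": 640.0, "sort": 210.0, "call_variants": 1815.0}

THRESHOLD = 1.25  # flag tasks that got more than 25% slower

for task, before in baseline.items():
    after = candidate.get(task)
    if after is not None and after > before * THRESHOLD:
        print(f"possible regression in '{task}': {before:.0f}s -> {after:.0f}s")
# possible regression in 'sort': 95s -> 210s
```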

What the data model does not represent

The data model intentionally excludes:
  • Application payloads or scientific input/output data
  • Source code, function calls, or language-level execution traces
  • Domain-specific semantics or correctness
Tracer models how workloads execute, not what they compute.

Orchestrator terminology mapping (reference)

Tracer’s data model is framework- and language-agnostic. The table below shows how Tracer entities typically align with common orchestrator concepts. Exact mappings may vary by workflow engine and configuration.

| Tracer concept | Common equivalents |
| --- | --- |
| Run | Workflow run, DAG run, execution |
| Task | Process, step, task, op, node |
| Tool | Binary, script, container entrypoint |
| Container | Pod, container, namespace |
| Host | Worker node, instance, executor host |
This mapping is provided for orientation only. Tracer does not depend on orchestrator metadata to build its execution model.

When to read this page

This page is most useful if you:
  • Want to understand how Tracer structures execution data
  • Are integrating Tracer data into external systems
  • Need clarity on attribution boundaries and terminology
  • Are evaluating Tracer for complex or regulated environments