
# Running NVIDIA Parabricks FQ2BAM on AWS with Nextflow (GPU Instances)

A comprehensive, hands-on tutorial covering setup, execution, validation, benchmarking, and best practices for GPU-accelerated FASTQ-to-BAM processing in the cloud.

## Introduction

High-throughput DNA sequencing generates large volumes of raw data that must be processed before analysis. This preprocessing step, which turns raw, messy reads into organized, usable data, is computationally demanding and can be slow on traditional hardware. NVIDIA Parabricks uses GPU acceleration to speed this process up by roughly 5 to 30 times, while AWS provides on-demand access to the required compute resources. This guide shows how to run Parabricks FQ2BAM on AWS using Nextflow to reliably convert raw FASTQ files into analysis-ready BAM files.

How to use this guide:

- This guide is in cheat-sheet format with self-contained command-line snippets.
- Jump to any section that is relevant to the task you are trying to complete.

By the end of this tutorial, you will have:

- Launched an AWS EC2 instance with a suitable GPU and configured it for Parabricks.
- Pulled the NVIDIA Parabricks container for FQ2BAM and tested it on example data.
- Installed Nextflow and set up a Nextflow pipeline for FQ2BAM.
- Executed the FASTQ-to-BAM pipeline on the GPU instance, producing a BAM file and index.
- Learned best practices for verifying results and troubleshooting common issues.

Let's get started!

## Prerequisites

To follow along with this guide, you need:

- AWS account: An AWS account with permissions to launch EC2 instances.
- Local system: A Unix-like environment (Linux or macOS) for any local commands. If you're on Windows, WSL2 or the EC2 Instance Connect web terminal is recommended for a Linux-like shell. (All commands in this tutorial are for a Linux/macOS Bash shell.)
- Software prerequisites on the AWS instance: We will use an official AWS Deep Learning AMI that comes pre-installed with NVIDIA drivers and Docker, including the NVIDIA Container Toolkit, so you don't need to install GPU drivers manually. We will, however, install Java (for Nextflow) and Nextflow itself on the instance.
- Data prerequisites: If you plan to use your own FASTQ files and reference genome, have them ready or know their S3 paths for transfer. Otherwise, you can download a [small public sample dataset provided by NVIDIA](https://docs.nvidia.com/clara/parabricks/latest/tutorials/cloudguides/aws.html#:~:text=Copied%21) to demonstrate the process.

This guide focuses on AWS EC2 usage. If you instead want to use AWS Batch or AWS HealthOmics, the overall steps are similar (spin up GPU compute and run the container), but the setup differs. Here we'll run the process on a single GPU VM for simplicity, which can be adapted to other AWS services later.

## Glossary

**GPU:** A specialized processor designed for parallel computation, capable of handling thousands of tasks simultaneously. GPUs excel at high-throughput workloads such as AI model training, image processing, and complex simulations. In cloud pipelines, they accelerate data-intensive stages like computation-heavy analyses or machine learning model execution, significantly reducing processing time.

**NVIDIA Parabricks:** A GPU-accelerated bioinformatics software suite designed to dramatically speed up common NGS workflows. It follows the same underlying algorithms and best practices as standard CPU-based tools, but executes them using GPU parallelism to reduce runtime.
**FQ2BAM:** A GPU-accelerated pipeline from NVIDIA's Parabricks suite that takes raw sequencing reads (FASTQ files) and produces an aligned, analysis-ready BAM file. Under the hood, FQ2BAM runs an optimized version of the BWA-MEM aligner and performs post-alignment processing such as coordinate sorting, duplicate marking, and optional Base Quality Score Recalibration (BQSR). The result is a sorted BAM file (including an index) and a BQSR report ready for downstream variant calling. The software is distributed as containers and integrates with workflow engines such as Nextflow, allowing it to be incorporated into existing pipelines on local systems or cloud platforms.

**AWS GPU instances:** AWS offers on-demand access to powerful GPU instances, which you can scale out to multiple machines if needed. Its GPU instance types provide NVIDIA T4, V100, A10, or A100 GPUs, all of which are compatible with [Parabricks](https://docs.nvidia.com/clara/parabricks/4.2.1/gettingstarted.html#:~:text=,on%20the%20following%20NVIDIA%20GPUs).

**Nextflow:** Using Nextflow as the workflow orchestrator takes care of pipeline execution management. It handles task scheduling and container execution, and it can easily integrate FQ2BAM into larger workflows (for example, chaining alignment into subsequent variant-calling steps). In short, Nextflow combined with AWS GPUs gives you speed, reproducibility, and scalability in one solution.

Now that we are all up to speed, let's dive into the tutorial.

## Launching a GPU EC2 Instance on AWS

First, we need to launch an EC2 virtual machine with an NVIDIA GPU. We recommend using a GPU-optimized Amazon Machine Image (AMI) that already includes NVIDIA drivers and Docker support; AWS provides [Deep Learning AMIs](https://aws.amazon.com/ai/machine-learning/amis/) that meet these requirements. Configure your instance with the following specifications (a CLI alternative is sketched after this list):

- Choose an AMI with GPU support: In the AWS EC2 console, click "Launch Instance" and, under _Application and OS Images (Amazon Machine Image)_, search for "Deep Learning AMI". Select a recent Deep Learning AMI (Ubuntu is a good choice) that comes with CUDA drivers pre-installed, for example _Deep Learning AMI GPU-Optimized Ubuntu 20.04_. This saves us from manually installing NVIDIA drivers.
- Select an instance type with a GPU: For this tutorial, a g4dn.4xlarge instance is a suitable choice. It provides 1 NVIDIA T4 GPU (16 GB GPU memory), 16 vCPUs, and 64 GB of RAM, a good balance of cost and performance for testing FQ2BAM. _(Alternatively, you could use a p3 instance (V100 GPU) or a g5 instance (A10G GPU); any GPU with at least 16 GB of memory is fine.)_
- Configure storage: Genomic data can be large, so allocate sufficient disk space. We recommend at least 200 GB for the root volume; if you plan to process whole-genome data or keep intermediate files, consider 500 GB, per NVIDIA's recommendation. In the launch wizard, under _Configure Storage_, increase the root volume size (e.g., 200 GiB or more).
- Security group: Ensure the instance has a security group that allows you to connect (at least SSH on port 22). If using EC2 Instance Connect, the default security group with SSH open is sufficient.
- Key pair / access: Choose an SSH key pair to access the instance, or opt to use EC2 Instance Connect (no key required). For simplicity, you can proceed without a key pair and use EC2 Instance Connect through the Console to open a web-based terminal. If you do use a key pair, download it and note the path for SSH.
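If you prefer scripting the launch over clicking through the console, the AWS CLI can apply the same settings. This is a minimal sketch under stated assumptions: the AMI ID, key-pair name, and security-group ID below are placeholders you must replace with your own values (Deep Learning AMI IDs vary by region).

```bash
# Launch a g4dn.4xlarge with a 200 GiB gp3 root volume.
# The ami-/sg- IDs and the key-pair name are placeholders; substitute your own.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type g4dn.4xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-0123456789abcdef0 \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":200,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=parabricks-tutorial}]'
```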
Launch the instance: Review and launch. It may take a few minutes for the instance to reach the running state. Once it's up, connect to it either via the AWS Console's "Connect > EC2 Instance Connect" option ([docs.nvidia.com](https://docs.nvidia.com/clara/parabricks/latest/tutorials/cloudguides/aws.html#:~:text=Click%20on%20the%20checkbox%20next,pair.%20Click%20connect)) or using your SSH client:

```bash
# Example SSH if using a key pair (replace values with your details)
ssh -i /path/to/your-key.pem ubuntu@ec2-<public-ip>.compute.amazonaws.com
```

The default username for the Ubuntu AMI is ubuntu. If you used a different AMI (Amazon Linux, etc.), the username may differ (e.g., ec2-user for Amazon Linux). Once connected, you should have a shell on the EC2 instance; we will perform all subsequent steps there.

GPU instances are expensive. For learning purposes, consider a smaller GPU like the T4 (g4dn), as we chose here. Remember to shut down or terminate your EC2 instance when you finish the tutorial to avoid ongoing charges (see the sketch below). If you have difficulty remembering which instances are open and running, check out the [Tracer monitoring platform](https://www.tracer.cloud/technology).
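Stopping or terminating from the CLI is quick. A minimal sketch, assuming the AWS CLI is configured on your local machine; the instance ID is a placeholder:

```bash
# Find your instance IDs and their states
aws ec2 describe-instances \
  --query 'Reservations[].Instances[].[InstanceId,State.Name,InstanceType]' \
  --output table

# Stop the instance (can be restarted later; EBS storage is still billed)
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# Terminate the instance (permanent; by default the root volume is deleted too)
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
```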
## Setting Up the Parabricks Environment on AWS

Now that your EC2 instance is running, we need to prepare it for running Parabricks FQ2BAM. The major tasks are:

1. Verifying the GPU and Docker setup
2. Fetching the Parabricks container
3. Getting the input data (reference genome and FASTQ files)

### 1) Verifying GPU and Docker Setup

The Deep Learning AMI should already have NVIDIA drivers and Docker installed. Let's double-check that everything is working.

Check GPU availability: Run the NVIDIA System Management Interface (nvidia-smi) to ensure the GPU is visible:

```bash
nvidia-smi
```

You should see a table of GPU information; a T4 GPU, for example, will show 16,384 MiB of memory. If nvidia-smi fails, the NVIDIA drivers might not be installed; in that case, install them or confirm you used the correct AMI.

Verify Docker can access the GPU: Run a test container that prints GPU info:

```bash
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu20.04 nvidia-smi
```

This downloads a small CUDA base image and executes nvidia-smi inside it. It should output the same GPU info as above, confirming that Docker is configured with the NVIDIA Container Toolkit. If this fails, ensure the NVIDIA Docker runtime is installed. On the Deep Learning AMI it should be pre-configured; otherwise, you may need to install nvidia-docker2 and restart Docker.

(Optional) Update packages: It's generally a good idea to update the instance's packages. For Ubuntu, run:

```bash
sudo apt-get update && sudo apt-get upgrade -y
```

This isn't strictly required for running Parabricks, but it helps ensure you have the latest security updates and tools. At this point, we have a working GPU-enabled Docker environment on the EC2 instance.

### 2) Pulling the Parabricks FQ2BAM Container

NVIDIA distributes Parabricks as a Docker container hosted on NVIDIA's container registry (NGC). We'll download the latest Parabricks image, which contains all tools including FQ2BAM.

2.1 Get the container image name: According to NVIDIA's documentation, the Parabricks Docker image is nvcr.io/nvidia/clara/clara-parabricks:&lt;version&gt;. As of this writing, the latest version is 4.6.0-1. (You can check for updates in the NGC catalog or on [GitHub](https://github.com/clara-parabricks-workflows/parabricks-nextflow#:~:text=Input%20File%20Purpose%20,file%20with%20the%20workflow%20definition), but we'll use 4.6.0-1 here.)

2.2 Pull the Docker image: Run docker pull on the instance:

```bash
docker pull nvcr.io/nvidia/clara/clara-parabricks:4.6.0-1
```

This downloads roughly 8-10 GB of data (the container includes the GPU-accelerated binaries and their dependencies) and may take a while depending on your network speed. After completion, verify the image is available:

```bash
docker images | grep parabricks
```

You should see a listing for clara-parabricks 4.6.0-1 (or similar), confirming the image is ready. Parabricks is now effectively "installed" on this machine, since all tools live inside the container; we can invoke any Parabricks pipeline through the pbrun command via Docker.

The Parabricks container does not require a license file or NGC login for basic usage; it's free to pull and run for the versions we're using. Ensure you have a compatible NVIDIA GPU, otherwise the tools will not function.

2.3 Check Parabricks version (optional): You can quickly confirm the container works by running a simple help command:

```bash
docker run --rm --gpus all nvcr.io/nvidia/clara/clara-parabricks:4.6.0-1 pbrun --version
```

If the Parabricks version is printed, the pbrun entrypoint is working.

### 3) Downloading Sample Data (FASTQ & Reference)

To demonstrate FQ2BAM, we need input data: typically a reference genome FASTA and a pair of FASTQ files (for paired-end reads). NVIDIA hosts a small sample dataset tarball on S3, including a tiny reference and example reads, which we can use to test our setup. Download and extract it:

```bash
wget -O parabricks_sample.tar.gz "https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz"
tar -xzvf parabricks_sample.tar.gz
```

This creates a directory called parabricks_sample/ with subfolders for the reference and the reads. Let's look at its structure:

```bash
tree -L 2 parabricks_sample
```

You should see something like:

```
parabricks_sample/
├── Data
│   ├── sample_1.fq.gz
│   └── sample_2.fq.gz
└── Ref
    ├── Homo_sapiens_assembly38.fasta
    ├── Homo_sapiens_assembly38.fasta.fai
    ├── Homo_sapiens_assembly38.dict
    ├── Homo_sapiens_assembly38.known_indels.vcf.gz
    └── Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
```

In this sample:

- sample_1.fq.gz and sample_2.fq.gz are paired-end FASTQ files (a small excerpt of a human genome sequencing run).
- The Ref/ directory contains a small reference genome (chr22 of GRCh38) and associated index files (.fai and .dict). The reference is named _Homo_sapiens_assembly38.fasta_ but is a subset of the full reference.
- A known-indels VCF (Mills & 1000G indels for chr22) is provided along with its index (.tbi); this can be used for BQSR in the pipeline.

We will use these files as inputs to FQ2BAM. If you have your own data, replace these with your FASTQ files and a reference genome appropriate for your data. Note the absolute paths of these files on the instance (e.g., /home/ubuntu/parabricks_sample/Data/sample_1.fq.gz), as we'll need them when configuring the Nextflow pipeline later.

At this stage, our environment is ready: we have the Parabricks container and some test data on a running GPU instance.
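Before running anything heavy, it's worth eyeballing the FASTQ inputs. This is a small sanity-check sketch using only standard tools, relying on the fact that each FASTQ record is exactly four lines:

```bash
# Peek at the first two FASTQ records of R1 (4 lines per record)
zcat parabricks_sample/Data/sample_1.fq.gz | head -8

# Count reads in each file; paired-end R1/R2 counts should match
r1=$(( $(zcat parabricks_sample/Data/sample_1.fq.gz | wc -l) / 4 ))
r2=$(( $(zcat parabricks_sample/Data/sample_2.fq.gz | wc -l) / 4 ))
echo "R1 reads: $r1  R2 reads: $r2"
```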
Next, we'll do a quick manual test of FQ2BAM via Docker (to ensure everything runs), and then set up Nextflow for a more automated execution.

## Quick Test: Running FQ2BAM via Docker (Baseline)

Before jumping into Nextflow, it's wise to do a quick sanity check by running FQ2BAM directly with Docker. This verifies that our data and container work as expected. Consider this a "baseline" manual run, which we'll later automate with Nextflow.

Use the pbrun fq2bam command within the container to run the pipeline. We need to mount the working directory so the container can see our data. Run the following command from the directory containing parabricks_sample/:

```bash
docker run --rm --gpus all \
  -v $PWD:$PWD -w $PWD/parabricks_sample \
  nvcr.io/nvidia/clara/clara-parabricks:4.6.0-1 \
  pbrun fq2bam \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-fq Data/sample_1.fq.gz Data/sample_2.fq.gz \
    --knownSites Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --out-bam fq2bam_output.bam \
    --out-recal-file fq2bam_output.recal.txt \
    --no-sec
```

Let's break down what this does:

- -v $PWD:$PWD: Mount the current directory into the container at the same path. This lets the container access the parabricks_sample subfolder (since our working directory is mounted, all subpaths are accessible).
- -w $PWD/parabricks_sample: Set the container's working directory to the parabricks_sample folder, so paths like Ref/... resolve relative to it and we don't have to spell out full absolute paths in the pbrun command.
- The container image and --gpus all flag ensure we use the GPU.
- pbrun fq2bam --ref ... --in-fq ... --out-bam ...: This invokes the FQ2BAM pipeline with the reference FASTA, the two input FASTQ files, and an output BAM name. We also request an optional, but best-practice, recalibration report (--out-recal-file) to generate a BQSR report; generating it requires the known-sites VCF, supplied via --knownSites.
- --no-sec: This disables "soft error checking", an optional flag that slightly speeds up the run by skipping certain validations. We include it to follow Parabricks' recommended usage for speed; it's not strictly required.

Expected outcome: The first time you run this, the container does some initialization, then prints a Parabricks banner and progress information. You should see log lines indicating BWA-MEM starting and a progress meter counting aligned reads. For the small sample, this should finish within a minute or two. If everything goes well, you'll get an output file fq2bam_output.bam in the parabricks_sample directory, along with its index (fq2bam_output.bam.bai) and the recalibration report fq2bam_output.recal.txt.

This manual run should confirm that FQ2BAM works on our setup. Parabricks may warn about GPU memory: by default, FQ2BAM expects roughly 38 GB of GPU memory, while --low-memory reduces the requirement to about 16 GB. If you used a T4 (16 GB) and see a memory warning, or the run fails due to memory, re-run the command with --low-memory added to the pbrun fq2bam arguments; it will then run on a 16 GB GPU at a small performance cost.

Our Docker one-liner worked, but imagine scaling this up or adding steps: you would have to manage paths, parallelize across samples, and handle outputs manually. Nextflow will automate these aspects.
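If you expect to rerun this baseline (for example, toggling --low-memory on a T4), wrapping it in a small script helps. This is a hypothetical helper of our own, not part of the Parabricks distribution; the script name and the LOW_MEM toggle are conventions we made up for this sketch.

```bash
#!/usr/bin/env bash
# run_fq2bam_baseline.sh -- hypothetical wrapper around the manual Docker run.
# Usage: ./run_fq2bam_baseline.sh          (standard mode, needs ~38 GB VRAM)
#        LOW_MEM=1 ./run_fq2bam_baseline.sh (for 16 GB GPUs such as the T4)
set -euo pipefail

IMAGE="nvcr.io/nvidia/clara/clara-parabricks:4.6.0-1"
EXTRA_ARGS=()
[ "${LOW_MEM:-0}" = "1" ] && EXTRA_ARGS+=(--low-memory)

docker run --rm --gpus all \
  -v "$PWD:$PWD" -w "$PWD/parabricks_sample" \
  "$IMAGE" \
  pbrun fq2bam \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-fq Data/sample_1.fq.gz Data/sample_2.fq.gz \
    --knownSites Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --out-bam fq2bam_output.bam \
    --out-recal-file fq2bam_output.recal.txt \
    "${EXTRA_ARGS[@]}"
```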
The quick test assures us the container and data are correct; now we can confidently move to Nextflow orchestration.

## Installing Nextflow and Preparing the Pipeline

Nextflow will coordinate the execution of the FQ2BAM container, handle input/output management, and make it easy to extend or rerun the pipeline. In this section, we'll install Nextflow on the EC2 instance, obtain the Nextflow workflow script for FQ2BAM, and configure it with our file paths.

### 1) Installing Java

Nextflow runs on the Java Virtual Machine and, as of Nextflow 23.x, requires Java 17 or later. The Deep Learning AMI might not have Java by default; check with:

```bash
java -version
```

If this returns "command not found" or a version older than 17, install Java. On Ubuntu, one easy option is:

```bash
sudo apt-get install -y openjdk-17-jre
```

This installs the Java 17 JRE. Run java -version again to verify it's properly installed. If you have an older Java (8 or 11), upgrade it to avoid runtime errors.

### 2) Installing Nextflow

Installing Nextflow itself is straightforward using its self-installing script:

```bash
# Download the Nextflow installer
curl -s https://get.nextflow.io | bash
```

This creates a file named nextflow in your current directory. Make it executable and move it into your PATH:

```bash
chmod +x nextflow
sudo mv nextflow /usr/local/bin/
```

Make sure /usr/local/bin is in your PATH (it is by default on Ubuntu). Alternatively, keep the nextflow binary in your home directory and call it with ./nextflow. Confirm the installation:

```bash
nextflow -version
```

You should see Nextflow's version and build info. Now we're ready to use Nextflow on our EC2 instance.
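For reproducibility, you can pin the Nextflow version through the NXF_VER environment variable, which the Nextflow launcher honors by fetching and running exactly that release. The version number below is just an example:

```bash
# Pin a specific Nextflow release for this shell session (example version)
export NXF_VER=24.04.4
nextflow -version   # the launcher downloads and runs the pinned release
```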
### 3) Obtaining the FQ2BAM Nextflow Workflow

NVIDIA provides example Nextflow pipelines for Parabricks tools. We will use their FQ2BAM workflow as a starting point instead of writing one from scratch. The code lives in the public GitHub repository [clara-parabricks-workflows/parabricks-nextflow](https://github.com/clara-parabricks-workflows/parabricks-nextflow#:~:text=The%20Parabricks%20fq2bam%20tool%20is,BQSR), which contains the pipeline script fq2bam.nf and related config files. There are a couple of ways to get these files onto the instance:

Option 1: Clone the repository. If git is available on your instance, run:

```bash
git clone https://github.com/clara-parabricks-workflows/parabricks-nextflow.git
```

This creates a parabricks-nextflow directory with the subfolders nextflow/, config/, and example_inputs/. The FQ2BAM workflow script should be at nextflow/fq2bam.nf, alongside config/local.nf.conf and example_inputs/test.fq2bam.json; we'll confirm this in a moment.

Option 2: Download specific files. If you prefer not to clone the entire repo, use wget to fetch the raw files from GitHub:

```bash
wget https://raw.githubusercontent.com/clara-parabricks-workflows/parabricks-nextflow/main/nextflow/fq2bam.nf
wget https://raw.githubusercontent.com/clara-parabricks-workflows/parabricks-nextflow/main/config/local.nf.conf
wget https://raw.githubusercontent.com/clara-parabricks-workflows/parabricks-nextflow/main/example_inputs/test.fq2bam.json
```

The URLs may need updating if the repo moves, but these are the general paths.

However we obtain the workflow, once inside this directory, familiarize yourself with the key files:

- nextflow/fq2bam.nf: The Nextflow script that defines the FQ2BAM workflow (it calls pbrun fq2bam inside a Docker process with the given inputs). You don't need to modify this for basic usage, but you can open it in a text editor if you're curious.
- config/local.nf.conf: A Nextflow config file that specifies the Docker container to use and any runtime settings. It defines the Parabricks Docker image (pointing to NGC's latest image) and adds the --gpus all flag for Docker. We will use this config as-is.
- example_inputs/test.fq2bam.json: A JSON file containing parameter values for the workflow, such as the paths to input FASTQs, reference, known-sites VCF, and desired outputs. We will edit this file to point to our data's paths on the EC2 instance.

### 4) Configuring Input Paths

Open the example_inputs/test.fq2bam.json file in a text editor (e.g., nano example_inputs/test.fq2bam.json). This file maps the expected pipeline parameters to actual file paths. It should look something like the following (illustrative; the actual keys in the repo might differ slightly, but the idea is the same):

```json
{
  "reads_left": "/path/to/sample_1.fq.gz",
  "reads_right": "/path/to/sample_2.fq.gz",
  "reference_fasta": "/path/to/reference.fasta",
  "known_sites": "/path/to/known_indels.vcf.gz",
  "output_bam": "/path/to/output.bam"
}
```

Now, replace these placeholder paths with the real absolute paths of our sample data on the instance:

FASTQ files: Our sample_1.fq.gz and sample_2.fq.gz are located in ~/parabricks_sample/Data/. If you followed the earlier steps, and assuming your home directory is /home/ubuntu, the full paths are:

```
/home/ubuntu/parabricks_sample/Data/sample_1.fq.gz
/home/ubuntu/parabricks_sample/Data/sample_2.fq.gz
```

Put these in the reads_left and reads_right fields (left is R1, right is R2). Use absolute paths, as the JSON expects them.

Reference FASTA: The reference is Homo_sapiens_assembly38.fasta under parabricks_sample/Ref/. The absolute path:

```
/home/ubuntu/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta
```

Set that for the reference FASTA field.

Known-sites VCF (optional): FQ2BAM can produce a BQSR report if a known indels/sites VCF is provided. In our sample, that's Homo_sapiens_assembly38.known_indels.vcf.gz. If the Nextflow pipeline is configured to do BQSR, there will be a parameter for the knownSites VCF; set it to:

```
/home/ubuntu/parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz
```

The pipeline likely also needs the VCF's index (.tbi); some pipelines ask for it explicitly, others infer it. If there's a separate parameter for the index, provide the .tbi file path as well.

Output paths: Decide where the output BAM, its index, and the BQSR report should be saved. You can output them to the parabricks_sample directory or anywhere writable on the instance. For simplicity, let's use /home/ubuntu/parabricks_sample/ as well:

- "output_bam": "/home/ubuntu/parabricks_sample/fq2bam_nextflow.bam"
- "output_bai": "/home/ubuntu/parabricks_sample/fq2bam_nextflow.bam.bai"
- "output_bqsr": "/home/ubuntu/parabricks_sample/fq2bam_nextflow.recal.txt"

After editing, save the JSON file. Double-check that all input paths are correct and exist (a quick scripted check follows below). Use absolute paths only, as instructed in the repo, to avoid any container path issues.

Now our pipeline configuration is set. We have:

- local.nf.conf specifying the Parabricks Docker image and enabling GPU usage.
- test.fq2bam.json with our input and output paths.
- The Nextflow script fq2bam.nf ready to go.
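Typos in absolute paths are the most common cause of first-run failures, so it's worth checking them mechanically. A small sketch using jq (install with sudo apt-get install -y jq); it assumes the params file is flat JSON whose string values include absolute paths, and note that the output files will rightly show as missing since they don't exist yet:

```bash
# Flag any absolute-path value in the params file that doesn't exist on disk.
# Output paths (not yet created) are expected to show as MISSING.
jq -r 'to_entries[] | .value | strings | select(startswith("/"))' \
   example_inputs/test.fq2bam.json |
while read -r p; do
  if [ -e "$p" ]; then echo "OK      $p"; else echo "MISSING $p"; fi
done
```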
## Running the FQ2BAM Pipeline with Nextflow

Everything is configured, so let's execute the pipeline using Nextflow. We will run Nextflow on the EC2 instance using its local executor with Docker. Since our local.nf.conf is set up for local execution with Docker, Nextflow will effectively spin up the Parabricks container as a process.

The example command given in NVIDIA's repo for running FQ2BAM is:

```bash
nextflow run -c config/local.nf.conf -params-file example_inputs/test.fq2bam.json nextflow/fq2bam.nf
```

Let's run it, making sure we are in the parabricks-nextflow base directory:

```bash
cd ~/parabricks-nextflow   # if not already there
nextflow run \
  -c config/local.nf.conf \
  -params-file example_inputs/test.fq2bam.json \
  nextflow/fq2bam.nf
```

Here's what each part means:

- -c config/local.nf.conf: Tells Nextflow to use our custom config, which points to the correct Docker image and adds the necessary Docker options (like --gpus all). This ensures the workflow uses the Parabricks container.
- -params-file example_inputs/test.fq2bam.json: Provides all our input parameter values from the JSON file, so we don't have to specify each --param individually. It keeps the command cleaner and less error-prone.
- nextflow/fq2bam.nf: The path to the Nextflow script; in our case it lives in the nextflow/ subfolder.

### What to expect while running this pipeline

When you run this command, Nextflow will initialize the workflow. The expected behavior is the following:

1. Nextflow creates a work directory (./work) where it keeps temporary files and execution scripts for each process.
2. It reads the fq2bam.nf script, which defines a process for running pbrun fq2bam and some staging of outputs.
3. Nextflow pulls the Docker image specified in local.nf.conf if it isn't already present. Since we already pulled clara-parabricks:4.6.0-1, it should use the local image.
4. The pipeline starts running. You should see Nextflow log messages indicating the process being run, e.g., something like Launching process > runFq2Bam (1).
5. Since this pipeline has a single step, it executes on the available GPU. You should see the familiar Parabricks output in the Nextflow logs; Nextflow typically captures and displays some of the tool's output.
6. If all goes well, the process finishes successfully. Nextflow reports the execution time and indicates where outputs are saved.

### Monitoring the run

While it runs, you can open another terminal on the instance and use nvidia-smi to watch GPU utilization (a polling sketch follows below). Parabricks loads the GPU heavily during alignment, and you'll see near-100% GPU usage for the duration of the BWA-MEM step. Once Nextflow completes, it should place the output files at the locations you specified in the JSON; we'll verify those next. Another option is to use [Tracer's dashboard](https://www.tracer.cloud/docs) to gain complete end-to-end insights in real time.

If Nextflow immediately fails with an error like "Missing container or Singularity image", double-check that you ran the command with -c config/local.nf.conf. Without this, Nextflow might not know to use the Docker image. Another sign of a config issue is the process trying to run the bwa or pbrun command on the host (instead of in Docker) and failing. Always use the config to ensure Docker is enabled and the container is specified.
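Here's a minimal polling loop for that second terminal. The query flags are standard nvidia-smi options; the 5-second interval is arbitrary:

```bash
# Print GPU utilization and memory use every 5 seconds until interrupted (Ctrl-C)
nvidia-smi \
  --query-gpu=timestamp,name,utilization.gpu,memory.used,memory.total \
  --format=csv -l 5
```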
## Validating the Output BAM and Index

After the Nextflow pipeline run finishes, we should confirm that the results are as expected.

1. Check for successful completion. Nextflow's last lines should say something like "Pipeline completed at ..." with a success status. If it reports an error, something went wrong (we address common errors in the troubleshooting section below). Then list the output files:

```bash
ls -lh /home/ubuntu/parabricks_sample/
```

Look for the files you named in the JSON, for example fq2bam_nextflow.bam, fq2bam_nextflow.bam.bai, and fq2bam_nextflow.recal.txt. They should be present. Check their sizes:

- The BAM file should be non-zero (for a small sample, it might be a few MB).
- The BAI index is usually just a few KB.
- The recalibration report (.recal.txt) is a small text file with statistics about base-quality adjustments (if BQSR was done).

2. Inspect the BAM file. To ensure the BAM is valid, do a quick sanity check with samtools (install it with sudo apt-get install -y samtools if needed):

```bash
samtools quickcheck fq2bam_nextflow.bam && echo "BAM looks okay"
```

This exits quietly if the BAM is uncorrupted. You can also view the header:

```bash
samtools view -H fq2bam_nextflow.bam | head -20
```

This should show header lines (@HD, @SQ for the sequence dictionary, etc.), including SN:chr22, since our reference subset is chr22. It will also show an @PG line indicating the BAM was produced by Parabricks (pbrun fq2bam).

3. Check read counts (optional). As a further validation, you might want to ensure the number of reads in the output BAM matches the input. Since this was a toy example, we won't worry too much, but you could run:

```bash
samtools flagstat fq2bam_nextflow.bam
```

to get counts of reads mapped and so on (a scripted input-vs-output comparison follows below).
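To make the read-count check mechanical rather than eyeballed, the sketch below compares the FASTQ read count against the BAM's primary records. Run it from the parabricks_sample directory; it assumes no reads were filtered out by the pipeline, which holds for a straight FQ2BAM run:

```bash
# Expected records: reads in R1 + reads in R2 (4 FASTQ lines per read)
exp=$(( $(zcat Data/sample_1.fq.gz Data/sample_2.fq.gz | wc -l) / 4 ))

# Observed: BAM records excluding secondary (0x100) and supplementary (0x800) alignments
obs=$(samtools view -c -F 0x900 fq2bam_nextflow.bam)

echo "expected=$exp observed=$obs"
[ "$exp" -eq "$obs" ] && echo "read counts match" || echo "MISMATCH: investigate"
```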
Congratulations: if all looks good, you have successfully run the NVIDIA Parabricks FQ2BAM pipeline on AWS using Nextflow, and you have a GPU-accelerated aligned BAM file to show for it.

## Benchmark Summary

Use this table format to publish reproducible, comparable benchmarks. Every row should be runnable as a single "command line + inputs + instance type" combination, and every row should have an explicit validation step.

| Dataset tier | Approx. input size | GPU instance / GPU | Mode / key flags | GPU wall time | CPU baseline wall time |
|---|---|---|---|---|---|
| Small | ~15 MB per FASTQ (R1) + ~15 MB per FASTQ (R2) | "small GPU" w/ 24 GB VRAM (example class) | --low-memory | 1m36s | 3m16s |
| Small | same as above | same GPU class, 2 GPUs / 4 GPUs | --low-memory | worse than 1 GPU | n/a |
| Medium | ~110M reads (per transcript) | GPU (same class) | RNA two-pass ON | ~20m | ~1h |
| Medium | ~110M reads | GPU (same class) | RNA two-pass OFF | ~11m | ~1h |
| Large | ~230M reads | GPU (same class) | (RNA pipeline) | ~1h15m | ~2h |

### Verification step (required for every benchmark row)

Run these after each benchmark to prove correctness (example shown for BAM output):

```bash
# Validate BAM integrity (fast fail if corrupted)
samtools quickcheck -v output.bam

# Sanity stats (mapping rate, duplicates, etc.)
samtools flagstat output.bam | tee output.flagstat.txt

# Confirm index exists and matches BAM timestamp
ls -lh output.bam output.bam.bai
```

If you're benchmarking RNA STAR outputs, add an RNA-appropriate check (e.g., a splice-junction summary if available), but keep samtools quickcheck/flagstat as the universal baseline.

## Top Tips

### Top Tip #1: GPU VRAM is a hard gate

Parabricks FASTQ-to-BAM jobs will fail with GPU out-of-memory errors when the available VRAM is insufficient. For context, RNA FASTQ-to-BAM requires approximately 38 GB of GPU memory in standard mode, so GPUs with 16 to 24 GB of memory must use low-memory mode. Low-memory mode reduces performance somewhat, but it often improves cost efficiency. On T4 GPUs and smaller A10 configurations, low-memory mode is mandatory to avoid out-of-memory failures: explicitly enable --low-memory where supported. You can verify GPU memory capacity using:

```bash
nvidia-smi --query-gpu=name,memory.total --format=csv
```

For more detail on resolving this, see Question #4 in the FAQ below.

### Top Tip #2: More GPUs can be slower on small inputs

In our tests, scaling from 1 GPU to 2 GPUs and then to 4 GPUs made small datasets slower. This happens because fixed overhead from splitting, staging, synchronization, and merging dominates total runtime. For small to medium inputs, use 1 GPU per sample and scale by running more samples in parallel. Use multiple GPUs for a single sample only when inputs are large and scaling has been validated. Validate scaling by comparing wall time and output equivalence between one-GPU and two-GPU runs, as sketched below.
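A minimal way to run that comparison on the sample data, assuming pbrun's --num-gpus option selects the GPU count (check the flag name against your Parabricks version's docs). It requires a multi-GPU instance (e.g., g4dn.12xlarge) for the 2-GPU case, samtools on the host, and should be run from the directory containing parabricks_sample/:

```bash
# Time a 1-GPU and a 2-GPU run of the same input and compare outputs.
for n in 1 2; do
  echo "=== fq2bam with $n GPU(s) ==="
  time docker run --rm --gpus all \
    -v "$PWD:$PWD" -w "$PWD/parabricks_sample" \
    nvcr.io/nvidia/clara/clara-parabricks:4.6.0-1 \
    pbrun fq2bam --num-gpus "$n" --low-memory \
      --ref Ref/Homo_sapiens_assembly38.fasta \
      --in-fq Data/sample_1.fq.gz Data/sample_2.fq.gz \
      --out-bam "out_${n}gpu.bam"
done

# Output equivalence: identical flagstat summaries are a good first-order check.
samtools flagstat parabricks_sample/out_1gpu.bam > flagstat_1gpu.txt
samtools flagstat parabricks_sample/out_2gpu.bam > flagstat_2gpu.txt
diff flagstat_1gpu.txt flagstat_2gpu.txt && echo "flagstat outputs match"
```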
### Top Tip #3: Batch/ECS misconfiguration can cause "hung" jobs

Jobs may hang if the AMI, ECS agent, Docker daemon, or NVIDIA runtime is misconfigured. These failures are expensive to diagnose because Batch feedback loops are slow. Always validate the environment on a standalone EC2 instance before using Batch. Verify GPU visibility on the host and inside containers using:

```bash
nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu20.04 nvidia-smi
```

You can also watch your runs in real time and catch these "hung" jobs by running [Tracer](https://www.tracer.cloud/technology) alongside Nextflow on the EC2 instances. Thanks to its eBPF technology, it runs at the operating-system level and surfaces all available data in its monitoring dashboard.

## Troubleshooting and FAQs

Even with careful instructions, things can go wrong. Below are some common issues and frequently asked questions, along with tips to resolve them.

**Q1: "nextflow: command not found", or Nextflow won't start.** Ensure the Nextflow binary is in your $PATH. If you didn't move it to /usr/local/bin/, you might need to call it with ./nextflow. Also verify Java is installed; Nextflow will print a clear error if Java is missing or the wrong version. Use java -version to confirm you have Java 17+. If not, install it as described earlier.

**Q2: Docker reports "permission denied".** This means your current user (ubuntu on AWS) isn't in the docker group. On Ubuntu, add it with sudo usermod -aG docker $USER, then log out and back in to apply the group change. As a quick workaround, you can prepend sudo to your docker commands, but adding yourself to the docker group is cleaner. The Deep Learning AMI may already have the ubuntu user in the docker group; if not, do the above.

**Q3: The Parabricks image won't pull.** The Parabricks container is supposed to be publicly pullable without login. If you hit issues, ensure you have internet connectivity from the instance; if you're using a proxy or a custom VPC with no internet access, you'll need to fix that. In rare cases, NGC might require acceptance of terms; check the NGC page for Parabricks and ensure it's publicly available.

**Q4: The run fails with a GPU out-of-memory error.** This likely means you are using a 16 GB GPU (like a T4 or V100) and Parabricks' default mode wanted more memory. The solution is the --low-memory flag for pbrun fq2bam. In the Nextflow pipeline, you would need to enable that option: edit the fq2bam.nf script to add --low-memory to the command, or use a pipeline parameter for it if one exists. If it's a one-off, launching a larger instance type is easier. For our sample, using low-memory mode on a T4 works fine.

**Q5: Nextflow reports input files as missing.** Double-check the paths in your JSON file; they must match exactly where the files are on the instance. Also confirm you used absolute paths and that they are accessible inside the container. Nextflow mounts the working directory by default: if your files are in your home directory and you launched Nextflow from /home/ubuntu/parabricks-nextflow, then /home/ubuntu is likely mounted. If you placed files elsewhere, adjust the Nextflow config's docker.runOptions to mount that path, or run Nextflow from a parent directory. Using absolute paths within the mounted directory (as we did) is the safest approach.

**Q6: How do I scale to many samples or multiple machines?** Nextflow can integrate with AWS Batch or Kubernetes to distribute work across multiple machines. For example, you could configure Nextflow's executor to AWS Batch, which would spin up multiple GPU jobs in parallel, which is useful if you're processing many samples. That's beyond this single-instance tutorial, but the Nextflow config and pipeline we used could be extended for it: specify resource requirements in Nextflow and let AWS Batch handle the rest. NVIDIA's documentation and Nextflow's docs have guidance on this. For a single sample or a few, one EC2 instance is simplest.

**Q7: How do I get the output BAM off the instance?** If you need to keep the BAM output, you have several options:

- Download it to your local machine via SCP: e.g., scp -i your-key.pem ubuntu@&lt;instance&gt;:/home/ubuntu/parabricks_sample/fq2bam_nextflow.bam .
- Copy it to S3: install the AWS CLI on the instance and run aws s3 cp fq2bam_nextflow.bam s3://your-bucket/path/.
- Use AWS Session Manager or similar tools to retrieve files.

Make sure to also fetch the .bai if you plan to use the BAM in downstream analysis.

**Q8: Can I run other Parabricks pipelines the same way?** The process is similar. NVIDIA's Nextflow examples also include a [germline pipeline](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_germline.html) with HaplotypeCaller and DeepVariant; you would use germline.nf and its corresponding JSON. The environment setup is the same, just a different command. In fact, since you've done FQ2BAM, you could proceed to run variant calling on the BAM output using [Parabricks HaplotypeCaller](https://docs.nvidia.com/clara/parabricks/latest/Documentation/ToolDocs/man_haplotypecaller.html) within the same or a new Nextflow script. Check NVIDIA's docs for required parameters and consider using the provided workflows as a starting point.

**Q9: How do I keep costs under control?** If you're using an on-demand instance, be mindful of cost. To save money, you could request a Spot Instance, which is cheaper but can be reclaimed by AWS; Parabricks runs are fast, so the chance a run finishes before interruption is high. Just ensure your AWS region has the instance type available and that you have enough quota. As mentioned before, running Tracer on your EC2 instance also helps here: it shows your costs alongside your metrics in real time, so you won't run into unexpected cloud bills, and forgotten running EC2 instances can be spotted and terminated with one click.

### General comment

If you encounter a problem not covered above, consult the Parabricks forum or the Nextflow community; error messages from Nextflow and Parabricks are usually descriptive. For Nextflow-specific errors, adding -resume to retry from where the run left off (after fixing an issue) can save time, and -with-report generates an HTML report of the run for debugging.
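Both options are standard Nextflow flags; a quick sketch of such a rerun for our pipeline:

```bash
# Retry the run after fixing the problem; completed tasks are restored from
# the cache, and an HTML execution report is written for debugging.
nextflow run \
  -c config/local.nf.conf \
  -params-file example_inputs/test.fq2bam.json \
  nextflow/fq2bam.nf \
  -resume -with-report report.html
```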
## Conclusion and Next Steps

In this tutorial, we completed a full run of NVIDIA Parabricks FQ2BAM on an AWS GPU instance, using Nextflow to orchestrate the process. We covered launching the cloud instance, setting up the GPU environment, pulling the necessary container and data, and running both a manual Docker command and an automated Nextflow pipeline. The result was a sorted, duplicate-marked BAM file aligned according to GATK best practices, produced in a fraction of the time a CPU-based pipeline would take.

### When to scale up

The pattern learned here can be applied to more substantial tasks. To process multiple samples, you could modify the Nextflow pipeline to accept a list of FASTQs and parallelize the fq2bam process for each sample; Nextflow excels at parallel workloads, and you might use Channel.fromFilePairs to feed multiple FASTQ pairs into the pipeline. On AWS, you could either run them sequentially on one bigger instance or configure Nextflow to use AWS Batch for true scaling, as mentioned above. NVIDIA's example workflows and Nextflow's documentation on AWS Batch can guide this.

### Going further

After FQ2BAM, the typical next step is variant calling. You can use Parabricks' GPU-accelerated HaplotypeCaller or DeepVariant to turn your BAM into a VCF. NVIDIA provides a germline workflow example (which chains BQSR -> HaplotypeCaller -> optional DeepVariant) in both Nextflow and WDL. With minor tweaks, you can add that to your Nextflow run or run it separately, using the BAM you just generated as input. This means end-to-end FASTQ-to-VCF can run entirely on GPUs, potentially in under an hour for a whole genome, an impressive feat.

Cleanup: Don't forget to terminate your EC2 instance when you're done experimenting, to avoid incurring costs. You can also delete the large Parabricks Docker image if you no longer need it; on the instance, docker rmi nvcr.io/nvidia/clara/clara-parabricks:4.6.0-1 will remove the image.

### Keep up to date

Finally, keep an eye on updates. Both Parabricks and Nextflow are actively developed: Parabricks may release new versions with even faster algorithms or new tools, and Nextflow updates regularly (ensure compatibility with your Java version as they evolve). Always refer to the official NVIDIA documentation and Nextflow docs for the latest best practices. We included an "Updated on" date at the top of this tutorial; if you're reading this much later, some specifics (like exact version numbers or instance types) may have changed, but the core concepts should remain valid.

Sources:

- NVIDIA Parabricks documentation: [Running Parabricks on AWS](https://docs.nvidia.com/clara/parabricks/latest/tutorials/cloudguides/aws.html) and the [FQ2BAM tool reference](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam.html), for confirmation of commands and requirements.
- [NVIDIA Parabricks Nextflow GitHub](https://github.com/clara-parabricks-workflows/parabricks-nextflow): example Nextflow configuration and usage for FQ2BAM.
- [Nextflow documentation](https://www.nextflow.io/docs/latest/index.html): installation and config requirements.
- [AWS documentation](https://aws.amazon.com/ec2/instance-types/g4/): instance types and GPU details for g4dn (T4) instances.