
Interview with Mathieu Latreille: Bridging Biology and Computation

Mathieu Latreille, biomedical researcher and TechBio entrepreneur, discusses the bottlenecks in modern research: fractured datasets, inconsistent metadata, and the cultural challenges that make collaboration between biologists and computational scientists unnecessarily difficult.

This interview is part of our "Perspectives in Practice" series, spotlighting real-world expertise in pipeline engineering and scientific computing. It features Mathieu Latreille, a biomedical researcher and TechBio entrepreneur whose work has focused on gene networks and protein quality control in metabolic and neurodegenerative diseases. His research has been published in journals such as the Journal of Clinical Investigation and Nature Medicine, and his work as a Group Leader at the MRC London Institute of Medical Sciences / Imperial College London, and earlier at ETH Zurich, has been cited more than 1,500 times. These contributions helped establish foundational intellectual property and highlighted new therapeutic avenues in complex diseases.

After moving into biotech, Mathieu led the RNA and Computational Biology team at Harness Therapeutics and later served as Chief Scientific Officer in multiple ventures. Today he is building Vileo Bio, a platform designed to help biologists analyse their own data independently while ensuring workflows remain structured, transparent, and reproducible for computational teams across academia, biotech, and pharma. In this interview, Mathieu reflects on the bottlenecks he encounters repeatedly in modern research environments: fractured datasets, inconsistent metadata, and the cultural and organisational challenges that make collaboration between biologists and computational scientists unnecessarily difficult.

Introduction

Over the past decade, advances in sequencing and high-throughput profiling technologies have transformed biological research. Generating data has become faster, cheaper, and far more accessible, yet many labs continue to struggle with its downstream interpretation. Datasets arrive in heterogeneous formats, metadata is often incomplete or inconsistent, and analytical workflows depend heavily on personal know-how that is rarely documented. These gaps create avoidable delays that limit insight and slow the pace of discovery.

This disconnect is not rooted in a lack of talent. Rather, it reflects an imbalance: producing data has become easy, while intuitive, structured tools for analysing it remain scarce. Biologists often lack the computational training to explore datasets directly, while computational scientists are overwhelmed by requests, unclear handovers, and the challenge of maintaining reproducible workflows across teams. Combined with rapid staff turnover, the result is an analytical pipeline that is fragile, difficult to scale, and vulnerable to knowledge loss.

Mathieu has spent his career at the intersection of these worlds: first as a biologist generating omics datasets, then as a computational scientist, and later as a biotech leader responsible for aligning teams. His experience highlights a recurring need: tools that allow biologists to work more independently without sacrificing the structure and reproducibility required by computational colleagues. In this interview, he discusses the bottlenecks he sees most often and the shifts needed to bridge this gap.

Q&A with Mathieu

Q: What originally pushed you to move from a purely biological role into bioinformatics?

A: I was trained as a biologist, and most of my work involved in vivo and in vitro models of disease. These projects were driven by large omics datasets, but I couldn't analyse the data myself at the time. I relied heavily on computational teams. Sometimes those collaborations flowed smoothly, but often miscommunication slowed everything down. That dependency made me want to learn to code so I could understand and explore my own datasets directly.
Q: What did that transition look like once you started to work with the data yourself?

A: Initially, I just wanted to interpret my own experiments. But once you gain that ability, you realise how many people face the same barriers. Biologists understand disease deeply but cannot always articulate what they need computationally. Computational scientists, in turn, often lack the domain-specific context to intuit what the biologists mean. Many delays originate from this gap in translation.

Q: What is the central focus of the platform you are building now?

A: The goal is to reduce friction between biologists and computational teams. Both sides experience frustration: skillsets differ, expectations differ, and communication through emails or meetings is slow. When people are waiting for therapies, these delays matter. Vileo Bio allows biologists to upload and analyse their data without writing code, while ensuring that the workflows they use are structured, transparent, and interpretable for computational teams. It also provides a curated library of omics datasets and centralises scripts, so knowledge isn't lost when people leave.

Q: What was the biggest challenge in building something that has to work for two very different groups?

A: The technical lift is enormous for someone coming purely from biology. Biologists are not trained to build software platforms. Partnering with an experienced technologist was essential. Without that, it would not have been possible to turn these ideas into something intuitive and robust enough for real-world use.

Q: In your experience, how are data and analysis typically handled?

A: Mostly through Excel files, CSVs, and HTML reports. Some labs use data management systems that track who did what, but scripts are often stored in personal folders or on individual laptops. When people leave, knowledge disappears. This fragility is what we're trying to solve: a central hub for data, analyses, and visualisation that preserves context and ensures reproducibility across teams.

Q: What are the biggest bottlenecks around data?

A: Data fragmentation and metadata. Public repositories contain incredible information, but they are inconsistent. One dataset might have rich metadata; another might have almost none. Even bioinformaticians sometimes avoid them because of the inconsistency. Missing metadata is a major reason why science is difficult to reproduce. I've worked on projects based on published studies that we simply couldn't reproduce because critical metadata was absent. Until the field aligns on how datasets and metadata should be annotated, reproducibility will remain a challenge.
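To make that metadata point concrete, here is a minimal sketch of what structured, machine-readable sample annotation could look like at the point of data generation. The schema and field names are illustrative assumptions for this article, not a community standard and not part of Vileo Bio; the idea is simply that a small, enforced schema catches the gaps that later make reanalysis impossible.

```python
# A minimal sketch (illustrative, not a community standard): capture the
# metadata that most often goes missing, and refuse to accept a sample
# record that leaves required fields blank.
from dataclasses import dataclass, asdict
import json


@dataclass
class SampleMetadata:
    sample_id: str
    organism: str       # e.g. "Mus musculus"
    tissue: str         # e.g. "pancreatic islet"
    assay: str          # e.g. "RNA-seq", "ATAC-seq"
    condition: str      # e.g. "high-fat diet, 12 weeks"
    replicate: int
    batch: str          # processing batch; critical for batch correction later
    protocol_ref: str   # link or DOI for the wet-lab protocol
    notes: str = ""     # free text; everything else stays structured

    def validate(self) -> None:
        required = ["sample_id", "organism", "tissue", "assay",
                    "condition", "batch", "protocol_ref"]
        missing = [name for name in required if not getattr(self, name).strip()]
        if missing:
            raise ValueError(f"Sample {self.sample_id!r} is missing metadata: {missing}")


# Example record (all values are placeholders).
sample = SampleMetadata(
    sample_id="islet_hfd_rep1",
    organism="Mus musculus",
    tissue="pancreatic islet",
    assay="RNA-seq",
    condition="high-fat diet, 12 weeks",
    replicate=1,
    batch="2024-03-seq-run-2",
    protocol_ref="doi:10.xxxx/placeholder-protocol",
)
sample.validate()
print(json.dumps(asdict(sample), indent=2))  # ships alongside the raw data files
```

Written once and validated at upload time, even a schema this small is often the difference between a dataset a stranger can reanalyse and one only its author understands.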
Q: What developments in computational biology are you most excited about?

A: Integration across omics layers. We already have extensive transcriptomic data, and spatial transcriptomics, long-read sequencing, and similar technologies are expanding what we can measure. The next crucial step is integrating these with epigenomic and proteomic layers to produce more accurate models of disease. I'm also following single-cell proteomics closely. It's early, but once it matures, it will give us an entirely new view of how biological systems are organised and how they malfunction in disease.

Q: What practical advice would you give biologists preparing for this next phase of multiomics integration?

A: Each omics technology has its own biases and limitations. Before combining transcriptomics with proteomics or ATAC-seq, make sure you understand the coverage, dropout rates, batch effects, and noise in each dataset independently. Poor-quality data doesn't improve through integration; it simply distorts your conclusions. Capturing metadata at a granular level also makes downstream integration far easier. Many groups still underestimate how much this matters.
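As an illustration of the per-dataset checks Mathieu describes, the sketch below computes two of the quantities he names, dropout rate and library-size spread across batches, for a single count matrix before any integration is attempted. The synthetic data and the shape assumptions (a genes x samples count matrix plus one batch label per sample) are ours for the example; a real pipeline would add gene-level coverage checks and a PCA coloured by batch.

```python
# A minimal per-dataset QC sketch (assumed inputs: a genes x samples count
# matrix and one batch label per sample). Run checks like these on each
# omics layer separately, before any integration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-in data: 2,000 genes x 12 samples of RNA-seq-like counts, two batches.
counts = pd.DataFrame(
    rng.negative_binomial(n=2, p=0.1, size=(2000, 12)),
    columns=[f"s{i}" for i in range(12)],
)
batches = pd.Series(["A"] * 6 + ["B"] * 6, index=counts.columns)

# Dropout rate: fraction of zero entries per sample (expected to be high for
# single-cell data, suspicious for bulk).
dropout = (counts == 0).mean(axis=0)

# Library size: total counts per sample; a large spread signals sequencing-depth
# differences that normalisation has to handle.
lib_size = counts.sum(axis=0)

# Crude batch-effect check: compare median library size per batch. A big gap
# here warrants explicit batch correction before integration.
per_batch_median = lib_size.groupby(batches).median()

print("dropout rate per sample:\n", dropout.round(3))
print("median library size per batch:\n", per_batch_median)
```

The point is not these particular numbers but the habit: each layer earns its place in an integration only after its own quality metrics look sane.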
Q: What impact do you hope your work will have?

A: Drug discovery remains slow and expensive, and people are waiting for therapies. Sequencing costs keep falling, so the volume of data will only increase. If we give researchers intuitive tools, we can shorten time to insight and lower the cost of discovery. My goal is to reduce the gap between biology and computational science. If biologists cannot explore their data, and computational teams are overwhelmed, a lot of potential insight remains unused. That slows down progress for everyone.

Final Thoughts

Mathieu's perspective highlights a growing vulnerability in modern biological research: data is abundant, but the tools to interpret it are often too complex or inaccessible for most scientists. Better tools are not only about new algorithms; they reduce friction, preserve knowledge, and make analysis accessible to the people who understand the biology. As technology evolves and datasets grow, strengthening the connection between biological and computational perspectives will be essential for turning data into insight, and insight into therapies.

Connect with Mathieu

You can connect with Mathieu Latreille on [LinkedIn](https://www.linkedin.com/in/mathieu-latreille-16a36a1a/) and explore his academic work on [Google Scholar](https://scholar.google.com/citations?user=ieyvfSoAAAAJ&hl=en).
