Jarrett D. Phillips (He/Him/His)
BSc. (Hons.), MBinf., PhD.
Adjunct Professor
School of Computer Science
Department of Integrative Biology
University of Guelph
Professional Summary of Research Interests and Expertise
I am a highly motivated and passionate bioinformatician, data scientist and statistician naturally driven by curiosity
to use mathematical, statistical and computational methods to answer fundamental and applied research questions
in biodiversity science, evolutionary biology, ecology, genomics and bioinformatics, particularly related to
molecular species identification and discovery through DNA barcoding, environmental DNA (eDNA) and other DNA-based approaches.
My academic work and research interests can best be described as computational molecular biodiversity science. Biodiversity
is under threat in a rapidly changing world, where mitigation requires innovative and collaborative solutions from multiple
disciplines. DNA-based specimen identification and species discovery through techniques like DNA barcoding and environmental
DNA (eDNA) offer promising ways forward, yet produce overwhelming amounts of data. I leverage AI/ML/Data Science/Big Data
methods to help researchers find meaningful signal in a vast sea of noise.
Recent News
- I received a $40,000 CAD Food from Thought Advancing Research Impact Fund (ARIF) Livestock Innovation grant to develop the Dynamic Population Model (DPM) as part of my work with GBADs (13/06/2024)
- Two book chapters on DNA barcoding for specimen identification and species delimitation have now been published online by Springer Nature (30/04/2024)
- A preprint on GBADs informatics strategy, data quality, and model interoperability is now available
- My preprint on statistical modelling of seafood mislabelling in Canada is now on bioRxiv (08/02/2024)
- I attended the GBADs Technical Workshop in Liverpool, England from 11/12/2023 to 15/12/2023
- I was appointed Adjunct Professor in the School of Computer Science at the University of Guelph on 23/11/2023
- I joined the Global Burden of Animal Diseases (GBADs) Informatics team at the University of Guelph on 22/06/2023
- The VLF paper was published on 26/01/2023
- My paper introducing and outlining the VLF R package was accepted for publication in the Biodiversity Data Journal on 29/11/2022
- I received $30,000 CAD in funding from the Food from Thought Advancing Research Impact Fund (ARIF) to develop a Bayesian hierarchical binary logistic time-series regression model of seafood fraud in the Canadian supply chain on 20/10/2022
- My recent paper arguing a lack of statistical rigor in DNA barcoding was featured as a University of Guelph College of Physical and Engineering Sciences (CEPS) Research Highlights article on 15/08/2022
- My opinion paper on statistical aspects of DNA barcoding and the DNA barcode gap was published in Frontiers in Ecology and Evolution on 14/04/2022
Research Interests
- Agent-based and individual-based modelling
- Bayesian computing and statistics in R and Stan
- Biodiversity informatics (e.g., DNA barcoding, environmental DNA (eDNA), DNA sequence analysis)
- Data science
- Machine learning (e.g., association rule generation, clustering, classification, and dimensionality reduction of (e)DNA sequence data and related metadata)
- Nonparametric statistics (e.g., bootstrap resampling, local regression, kernel methods)
- Spatiotemporal modelling (e.g., conditionally autoregressive models (CARs), Gaussian processes (GPs)/Kriging, generalized additive models (GAMs), time series)
- Stochastic optimization (e.g., random search algorithms, genetic algorithms (GAs), simulated annealing)
Current Projects
Agent-based, equation-based, and individual-based models for animal disease burden
- We are currently developing an age- and sex-structured compartmentalized equation-based model in R to assess disease burden in livestock species such as cattle, small ruminants (e.g., sheep and goats), and poultry in developing countries like Ethiopia
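To give a flavour of what an age- and sex-structured, equation-based compartmental model looks like, here is a minimal Python sketch. It is not the model under development: the two age classes, the monthly time step, and every rate (`birth_rate`, `mature_rate`, `mortality`, `female_frac`) are hypothetical values chosen only for illustration.

```python
def step(pop, birth_rate=0.05, mature_rate=0.10, mortality=None, female_frac=0.5):
    """Advance a herd by one time step (say, one month).

    pop maps (age_class, sex) -> head count. All rates are illustrative
    assumptions, not estimates for any real livestock population.
    """
    if mortality is None:
        mortality = {"juvenile": 0.02, "adult": 0.01}
    # Births are driven by adult females only.
    births = birth_rate * pop[("adult", "F")]
    new_pop = {}
    for sex, frac in (("F", female_frac), ("M", 1.0 - female_frac)):
        juveniles = pop[("juvenile", sex)]
        adults = pop[("adult", sex)]
        matured = mature_rate * juveniles  # juveniles moving into the adult class
        new_pop[("juvenile", sex)] = (
            juveniles + frac * births - matured - mortality["juvenile"] * juveniles
        )
        new_pop[("adult", sex)] = adults + matured - mortality["adult"] * adults
    return new_pop


def simulate(pop, n_steps):
    """Return the list of herd states from time 0 to n_steps."""
    states = [pop]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


# Hypothetical starting herd.
herd = {
    ("juvenile", "F"): 100.0,
    ("juvenile", "M"): 100.0,
    ("adult", "F"): 300.0,
    ("adult", "M"): 50.0,
}
trajectory = simulate(herd, n_steps=12)
print(round(sum(trajectory[-1].values()), 1))
```

A disease burden analysis would compare trajectories like this one under healthy and diseased parameter sets; that comparison is left out here.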
Theoretical Aspects of DNA-based Taxon Identification
- Developing methods to estimate likely required specimen sample sizes for genetic diversity assessment within species using DNA barcoding
- Practical sample sizes for DNA barcoding typically range from 5 to 10 individuals per species, but required levels of sampling depth are highly
dependent on the evolutionary history and ecology of the species under study, as well as species rarity and overall project costs.
- I created HACSim (Haplotype Accumulation Curve Simulator), a novel nonparametric stochastic (Monte Carlo) local search optimization algorithm that
better estimates likely required specimen sample sizes based on the asymptotic behaviour seen in species' haplotype accumulation curves.
- HACSim has been shown to work well for a variety of species of socioeconomic relevance such as fishes, insects and arachnids based on extensive
simulation studies.
- The HACSim publication in PeerJ Computer Science was one of the top five most viewed articles in the category "Optimization Theory and Computation"
- Developing methods for better visualization and inference of the DNA barcode gap
- The DNA barcode gap, the difference between genetic variation observed within and among species, is most often visualized using histograms; however,
these plots can be misleading because they depend on user-defined parameters (e.g., bin width), which greatly affect the apparent shape of the underlying distributions.
- I propose that kernel density estimation provides a better path forward when it comes to establishing the efficacy of DNA barcoding as a molecular identification
tool.
- I also propose the use of nonparametric bootstrapping, specifically the m-out-of-n bootstrap, to estimate the sampling distributions of quantities
of interest in DNA barcoding, particularly extreme order statistics like the minimum interspecific and maximum intraspecific genetic distances, in addition
to the DNA barcode gap.
- We are currently developing a population genetic model of the DNA barcode gap based on the Multispecies Coalescent (MSC)
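The m-out-of-n bootstrap mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the distance values are made up, the gap is taken as the minimum interspecific distance minus the maximum intraspecific distance, and the choice m = ceil(n / 2) (via the hypothetical `m_frac` parameter) is arbitrary rather than a recommendation.

```python
import math
import random


def bootstrap_barcode_gap(intra, inter, m_frac=0.5, n_boot=2000, seed=1):
    """m-out-of-n bootstrap of the barcode gap.

    Each replicate draws m = ceil(m_frac * n) distances with replacement
    from each sample (n being that sample's size) and records
    min(interspecific) - max(intraspecific).
    """
    rng = random.Random(seed)
    m_intra = max(1, math.ceil(m_frac * len(intra)))
    m_inter = max(1, math.ceil(m_frac * len(inter)))
    gaps = []
    for _ in range(n_boot):
        max_intra = max(rng.choices(intra, k=m_intra))
        min_inter = min(rng.choices(inter, k=m_inter))
        gaps.append(min_inter - max_intra)
    return gaps


# Made-up pairwise genetic distances for a single hypothetical species.
intra = [0.000, 0.002, 0.004, 0.005, 0.008, 0.010, 0.012, 0.015]
inter = [0.030, 0.045, 0.050, 0.062, 0.080, 0.095, 0.110, 0.120]

gaps = sorted(bootstrap_barcode_gap(intra, inter))
observed_gap = min(inter) - max(intra)
lower = gaps[int(0.025 * len(gaps))]
upper = gaps[int(0.975 * len(gaps)) - 1]
print(f"observed gap: {observed_gap:.3f}")
print(f"95% percentile interval: ({lower:.3f}, {upper:.3f})")
```

Note that a resampled maximum can never exceed the sample maximum, and a resampled minimum can never fall below the sample minimum, so every replicate gap is at least as large as the observed one. This is part of why extreme order statistics need more care than means when bootstrapping.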
Building the Reference Library of Life
- Constructing the largest DNA barcode reference sequence library for North American butterflies
- Butterflies are some of the most genetically and morphologically diverse insects on the planet. One aspect of my work involves generating a DNA barcode
library for over 97% (>800 species) of known butterfly species found in North America.
- Current work using this dataset involves correlating species' genetic diversity with geographical covariates (such as latitude and longitude)
using semiparametric and nonparametric regression approaches including Generalized Additive Models (GAMs) and Local Regression (LOESS)
Development of R Software Packages and R Shiny Web Apps for Molecular Biodiversity Assessment
- HACSim is available for download as an R package through the Comprehensive R Archive Network (CRAN) and as an R Shiny web application at shinyapps.io
- VLF is an R package to detect very low frequency variants (VLFs), such as sequencing and PCR errors, and compute error rates at second codon positions in DNA sequences. The package is on CRAN.
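For readers unfamiliar with haplotype accumulation curves, the idea that HACSim builds on, the following Python sketch may help. It is not the HACSim algorithm or its R API: it simply draws individuals from a hypothetical haplotype frequency distribution (`probs`) and averages, over many random draws, the number of distinct haplotypes seen as more individuals are sampled.

```python
import random


def haplotype_accumulation_curve(probs, n_individuals, n_perms=500, seed=1):
    """Mean number of distinct haplotypes seen after sampling 1..n individuals.

    probs holds the assumed haplotype frequencies for one species.
    """
    rng = random.Random(seed)
    haplotypes = list(range(len(probs)))
    totals = [0] * n_individuals
    for _ in range(n_perms):
        # One simulated sampling order of n_individuals specimens.
        sample = rng.choices(haplotypes, weights=probs, k=n_individuals)
        seen = set()
        for i, hap in enumerate(sample):
            seen.add(hap)
            totals[i] += len(seen)
    return [total / n_perms for total in totals]


# Hypothetical species with one dominant and several rare haplotypes.
probs = [0.60, 0.20, 0.10, 0.05, 0.05]
curve = haplotype_accumulation_curve(probs, n_individuals=50)

for n in (1, 5, 10, 25, 50):
    print(n, round(curve[n - 1], 2))
```

The curve rises quickly at first and flattens as it approaches the total number of haplotypes; the sample size at which it levels off is the kind of quantity a sample size estimate is based on.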
Assessing R Metadata Reporting and Standards in the Ecological Literature
- Within many scientific publications that employ R, lack of proper citation of R versions and packages employed within analyses is rampant
- We mine article metadata spanning five different ecological journals for publications appearing in 2019 and find significant variation in how R is currently being reported
- We propose a simple way of standardizing reporting that will help mitigate biases in future studies
Student Supervision and Mentorship
- Nikolett Toth (with Dan Gillis, 2024) -- University of Guelph -- Summer Undergraduate Research Assistant (URA) -- Mining association rules for eDNA spatiotemporal sampling
- Nathan Zeinstra (with Dirk Steinke, 2024) -- University of Guelph -- IBIO*6070 -- Habitat occupancy modelling of sea lamprey environmental DNA
- Fynn De Vuono-Fraser (with Dan Gillis, 2023) -- University of Guelph -- CIS*4900 -- Bayesian modelling of seafood fraud in the Canadian supply chain
- Zaid Al-Gayyali (with Dan Gillis, 2023) -- University of Guelph -- Summer Undergraduate Research Assistant (URA) -- Seafood Fraud Visualization Tool R Shiny web app
- Fynn De Vuono-Fraser (with Dan Gillis, 2023) -- University of Guelph -- STAT*4600 -- Bayesian modelling of seafood fraud in the Canadian supply chain
- Amina Asif (with Bob Hanner, 2022) -- University of Guelph -- BINF*6999 -- DNA barcode gap analysis of Canadian agricultural pests and disease vectors
- Navdeep Singh (with Dan Gillis, 2021) -- University of Guelph -- CIS*4900 -- HACSim R Shiny web application
- Maya Persram (with Bob Hanner, 2020-present) -- University of Guelph -- Hanner Lab volunteer
- Ashley Chen (with Bob Hanner, 2020-present) -- University of Guelph -- Hanner Lab volunteer
- Olivia Friesen Kroeker (with Bob Hanner, 2020-present) -- University of Guelph -- Hanner Lab volunteer
- Scarlett Bootsma (with Dan Gillis, 2020-2021) -- University of Guelph -- CIS*4900/4910 -- HACSim simulation study
- Danielle St. Jean (with Dan Gillis, 2018-2019) -- University of Guelph -- MSc. thesis (Math) -- DNA barcode sequence classification with machine learning
- Christina Fragel (with Bob Hanner, 2018-2019) -- University of Guelph -- BINF*6999 -- DNA barcode sequence classification with machine learning
- Jiaojia (Paula) Yu (with Bob Hanner, 2018-2019) -- University of Guelph -- BINF*6999 -- MDMAPR qPCR R Shiny app
- Steven French (with Dan Gillis, 2018) -- University of Guelph -- CIS*4900/4910 -- HACSim R package
- Julia Harvie (with Bob Hanner, 2018-2019) -- University of Guelph -- MCB*4500/4510 -- Data mining GenBank and BOLD
- Ankita Bhanderi (with Bob Hanner, 2018) -- University of Guelph -- BINF*6999 -- Data mining GenBank and BOLD
Research Collaborators
- Dr. Daniel Gillis (School of Computer Science, University of Guelph, Canada)
- Dr. Robert Hanner (Biodiversity Institute of Ontario, Department of Integrative Biology, University of Guelph, Canada)
- Dr. Robert Young (Department of Integrative Biology, University of Guelph, Canada)
- Dr. Jacopo D'Ercole (Centre for Biodiversity Genomics, University of Guelph, Canada)
- Dr. Cortland Griswold (Department of Integrative Biology, University of Guelph, Canada)
- Dr. Nicolas Hubert (Institut de Recherche pour le Développement, Université de Montpellier, France)
- Dr. Dirk Steinke (Centre for Biodiversity Genomics and Department of Integrative Biology, University of Guelph, Canada)
- Dr. Luiza Antonie (School of Computer Science, University of Guelph, Canada)
- Dr. Deborah Stacey (School of Computer Science, University of Guelph, Canada)
- Dr. Theresa Bernardo (Department of Population Medicine, University of Guelph, Canada)
- Dr. Kurtis Sobkowich (Department of Population Medicine, University of Guelph, Canada)
- Dr. Le Nguyen (School of Computer Science, University of Guelph, Canada)
Funding
Curriculum Vitae
CV — last updated October 2024
Contact
GryphMail
GMail
Relevant Links
GitHub
Google Scholar
LinkedIn
ORCiD
ResearchGate
Twitter/X