2.5 Bioinformatics analysis
Samples were sequenced on the Oxford Nanopore GridION and PromethION
instruments until a sufficient average depth of coverage (minimum 8x,
with >20x preferred) was reached for variant calling. For
the few samples that did not reach this coverage threshold, individual
review was performed, and samples were included in the analysis if
informative variant calls were present that could be manually confirmed
by inspection of the alignment files. Nanopore data were aligned to the
ASFV Georgia 2007/1 reference genome (GenBank accession NC_044959.2)
using Minimap2 (v2.18-r1015) with the options “-N 1000 -a --eqx -x
map-ont” (Li, 2018). Illumina data were aligned to the same reference
genome using the Burrows-Wheeler Aligner (v0.7.17) with options “-a -h
2 -Y -M” (Li & Durbin, 2012). Insertions and deletions were called for
the subset of samples characterized with Illumina data using Freebayes
parallel (v1.3.4) with the option “--standard-filters” (Garrison &
Marth, 2012). SNPs for the epidemiological analysis were called using a
custom, open-source SNP caller (https://github.com/lakinsm/simple-snp).
Variants were required to meet the following thresholds to be considered
a true variant: a minimum depth of 10 observed alleles at a given
genomic location across the population of samples (DP >
10), a minimum observed alternate allele count of 7 at a given genomic
location across the population of samples (AO > 7), and an
alternate allele frequency greater than or equal to 70% at a given
site within a given sample. Additionally, all single nucleotide
polymorphisms described in the data were visually verified to be present
in the alignment files by a subject matter expert, and final variant
calls were manually corrected to match visual inspection if necessary.
Low-quality SNPs located in the 5,000 base pairs flanking the 5’ and 3’
terminal regions of the genome were not included in the analysis.
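
The variant thresholds described above can be illustrated with a short Python sketch. This is not the code of the simple-snp caller; the record fields (`pos`, `dp`, `ao`, `sample_af`), the function name, and the genome length in the example are hypothetical. The comparisons follow the parenthetical expressions as written (DP > 10, AO > 7), and for simplicity the sketch drops every call in the terminal 5,000 bp, whereas the text excludes only low-quality SNPs in those regions.

```python
from dataclasses import dataclass

# Thresholds taken from the text; comparisons follow the parenthetical
# expressions as written (DP > 10, AO > 7, frequency >= 70%).
MIN_DP = 10
MIN_AO = 7
MIN_SAMPLE_AF = 0.70
TERMINAL_FLANK = 5_000  # bp excluded at each end of the genome


@dataclass
class Candidate:
    """Hypothetical record for one candidate SNP in one sample."""
    pos: int          # 1-based reference position
    dp: int           # depth at this site summed across all samples
    ao: int           # alternate-allele observations summed across all samples
    sample_af: float  # alternate-allele frequency within this sample


def passes_filters(c: Candidate, genome_length: int) -> bool:
    """Return True if the candidate meets every threshold listed above."""
    in_flank = (c.pos <= TERMINAL_FLANK
                or c.pos > genome_length - TERMINAL_FLANK)
    return (not in_flank
            and c.dp > MIN_DP
            and c.ao > MIN_AO
            and c.sample_af >= MIN_SAMPLE_AF)


GENOME_LENGTH = 190_000  # placeholder value, not the actual reference length
calls = [
    Candidate(pos=50_000, dp=40, ao=35, sample_af=0.95),  # meets all thresholds
    Candidate(pos=50_100, dp=40, ao=35, sample_af=0.55),  # frequency below 70%
    Candidate(pos=2_500, dp=40, ao=35, sample_af=0.95),   # inside 5' terminal region
    Candidate(pos=60_000, dp=10, ao=9, sample_af=0.90),   # DP is not above 10
]
kept = [c.pos for c in calls if passes_filters(c, GENOME_LENGTH)]
```

In this toy input only the first candidate survives. Calls passing such a filter would still be subject to the visual verification of the alignment files described above.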
All publicly available raw data labelled as African Swine Fever Virus
whole genome sequence were downloaded from the National Center for
Biotechnology Information Sequence Read Archive (NCBI SRA). Genome
assemblies labelled as African Swine Fever Virus were downloaded from
the NCBI GenBank repository. Genotype II ASFV sequences were selected
from the NCBI SRA and GenBank data for comparison against samples from
the DR. The selected SRA and GenBank data were evaluated for quality.
Sequences of questionable quality, as judged by the locations of
mutations and the degree of relatedness in a multiple sequence
alignment, were removed. A total of 54 ASFV genomes from public
databases were included in the final analysis (Supplementary File 1)
(Farlow et al., 2018; Gallardo et al., 2015; Olesen et al., 2009;
Kovalenko et al., 2019; Mazur-Panasiuk et al., 2020; Gilliaux et al.,
2018; Xuexia et al., 2019; Olasz et al., 2019; Hakizimana et al., 2021;
Mazloum et al., 2021; Jia et al., 2020; Xiong et al., 2019).
Raw data retrieved from the NCBI SRA were aligned to the ASFV Georgia
2007/1 reference genome (GenBank accession NC_044959.2) using either
the Burrows-Wheeler Aligner (v0.7.17, short-read data) or Minimap2
(v2.18-r1015, long-read data) and variant-called as described above.
Consensus sequences including the SNP variants were produced for all DR
samples and external NCBI SRA data. The resulting consensus sequences
were aligned with the whole genome sequences from NCBI GenBank in a
multiple sequence alignment using MAFFT (v7.487) (Katoh et al., 2002). Phylogenetic
tree construction was performed using RAxML (v8.2.12) with the GTRGAMMA
model argument (as determined by model selection using likelihood
maximization) and visualized using FigTree (v1.4.4) (Kozlov et al.,
2019). A subset of 45 nodes was selected using the Treemmer software
(v0.3) to display on the phylogenetic tree in Figure 2 (Menardo et al.,
2018). SNP tables were visualized using the vSNP pipeline developed by
the USDA.
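
The consensus step described above, in which SNP variants are incorporated into a per-sample sequence, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name `apply_snps`, the input layout (a mapping from 1-based position to a reference/alternate base pair), and the toy sequence are all hypothetical.

```python
def apply_snps(reference: str, snps: dict[int, tuple[str, str]]) -> str:
    """Return a consensus built by substituting SNP alleles into the reference.

    `snps` maps a 1-based position to a (ref_base, alt_base) pair.
    """
    consensus = list(reference)
    for pos, (ref_base, alt_base) in snps.items():
        if not 1 <= pos <= len(reference):
            raise ValueError(f"position {pos} is outside the reference")
        # Guard against coordinate errors before overwriting a base.
        if reference[pos - 1] != ref_base:
            raise ValueError(f"reference base mismatch at position {pos}")
        consensus[pos - 1] = alt_base
    return "".join(consensus)


toy_reference = "ACGTACGTAC"  # toy sequence, not ASFV
toy_snps = {3: ("G", "A"), 8: ("T", "C")}
toy_consensus = apply_snps(toy_reference, toy_snps)
```

Because only substitutions are applied, a consensus built this way has the same length as the reference, so its coordinates remain directly comparable with the reference before multiple sequence alignment.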