Raw read assembly and analysis of the E3 genome sequence
For the genome sequence, SOAPdenovo2.04 and Platanus1.2.4 were used to
assemble the reads. After the reads were mapped to the contigs, the
assembly was optimized according to the paired-end and overlap
relationships among the reads. FGAP was then used to close gaps in the
genome.
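
To illustrate what a gap is in this context: in a scaffolded assembly, gaps are conventionally represented as runs of N characters. The following minimal Python sketch locates such runs in a sequence string. It is not part of FGAP, and the function name `find_gaps` and the toy scaffold are hypothetical, introduced here only for illustration.

```python
import re


def find_gaps(sequence: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of runs of N.

    Assumption: gaps are encoded as consecutive 'N' or 'n' characters,
    which is the common convention for scaffolds in FASTA files.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


if __name__ == "__main__":
    scaffold = "ACGTNNNNACGNNT"  # hypothetical toy scaffold, not real data
    for start, end in find_gaps(scaffold):
        print(f"gap at {start}-{end} (length {end - start})")
```

A listing of this kind could be produced before and after gap closing to check how many gaps remain.
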
After genome sequencing and assembly, CDSs were predicted using
Glimmer3.02. The CDSs were annotated based on BLAST searches of their
protein sequences against databases, i.e. KEGG, COG, SwissProt, TrEMBL, PHI and
iprscan (InterProScan). Repetitive sequences were identified by
searching the contigs against a transposon database with RepeatMasker
and by detecting tandem repeats with TRF (Tandem Repeats Finder). MUMmer
was used for SNP calling by comparing the E3 genome with the reference
genome of NRRL23338. After SNP calling, SNP filtration was performed by
removing low-quality SNPs. LASTZ1.01.50 was used to identify InDel
mutations in the E3 genome.
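
As an illustration of how InDels can be read off a pairwise alignment, the sketch below scans a gapped alignment of a reference and a query sequence and reports each run of gap characters as an insertion or a deletion relative to the reference. This is a simplified sketch and not the procedure implemented in LASTZ; the names `Indel` and `call_indels` and the toy alignment are hypothetical, and the input is assumed to be two aligned strings of equal length with "-" marking gaps.

```python
from typing import NamedTuple


class Indel(NamedTuple):
    kind: str      # "insertion" or "deletion", relative to the reference
    ref_pos: int   # 0-based reference coordinate (see call_indels)
    sequence: str  # inserted or deleted bases


def call_indels(ref_aln: str, qry_aln: str) -> list[Indel]:
    """Extract InDels from one gapped pairwise alignment.

    For a deletion, ref_pos is the first deleted reference base.
    For an insertion, ref_pos is the reference base that follows
    the inserted bases.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    indels: list[Indel] = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        if ref_aln[i] == "-" and qry_aln[i] != "-":
            # Gap in the reference: bases present only in the query.
            j = i
            while j < n and ref_aln[j] == "-" and qry_aln[j] != "-":
                j += 1
            indels.append(Indel("insertion", ref_pos, qry_aln[i:j]))
            i = j
        elif qry_aln[i] == "-" and ref_aln[i] != "-":
            # Gap in the query: bases present only in the reference.
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            indels.append(Indel("deletion", ref_pos, ref_aln[i:j]))
            ref_pos += j - i
            i = j
        else:
            # Match or mismatch column (or a column with two gaps).
            if ref_aln[i] != "-":
                ref_pos += 1
            i += 1
    return indels


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the E3 genome.
    for indel in call_indels("ACG-TAAC", "ACGGT--C"):
        print(indel)
```
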