Raw read assembly and analysis of the E3 genome sequence
For the genome sequence, SOAPdenovo v2.04 and Platanus v1.2.4 were used for read assembly. After the reads were mapped back to the contigs, the assembly was optimized according to the paired-end and overlap relationships between reads. FGAP was then used to close gaps in the genome.
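As an illustration of what gap closing operates on, the following minimal sketch locates gaps in assembled scaffolds, assuming that gaps are represented as runs of `N` characters (the usual convention in scaffold FASTA files). The function name `find_gaps` and the `min_len` parameter are hypothetical and are not part of FGAP or either assembler.

```python
import re
from typing import Dict, List, Tuple


def find_gaps(scaffolds: Dict[str, str], min_len: int = 1) -> List[Tuple[str, int, int]]:
    """Return (scaffold name, start, end) for every run of N/n.

    Coordinates are 0-based and end-exclusive. Assumption: gaps are
    encoded as runs of 'N', which is a convention, not a guarantee.
    """
    gaps = []
    for name, seq in scaffolds.items():
        for match in re.finditer(r"[Nn]+", seq):
            # Keep only runs at least min_len bases long.
            if match.end() - match.start() >= min_len:
                gaps.append((name, match.start(), match.end()))
    return gaps


if __name__ == "__main__":
    # Toy scaffold for illustration only, not real E3 data.
    print(find_gaps({"scaffold1": "ACGTNNNNACGTnnAC"}))
```

Comparing the gap list before and after gap closing gives a simple check of how many gaps were filled.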
After genome sequencing and assembly, coding sequences (CDSs) were predicted using Glimmer v3.02. The CDSs were annotated by BLAST searches of their protein sequences against the KEGG, COG, Swiss-Prot, TrEMBL and PHI databases and by InterProScan (iprscan). Repetitive sequences were identified with RepeatMasker, which compares the contigs against a transposon database, and with TRF (Tandem Repeats Finder). MUMmer was used for SNP calling by comparing the E3 genome with the reference genome of NRRL23338. After SNP calling, low-quality SNPs were removed by filtration. LASTZ v1.01.50 was used to identify InDel mutations in the E3 genome.
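The text does not state the criteria used to define a low-quality SNP. The following is a minimal sketch of one possible filter, assuming each SNP record carries the distance to the nearest neighboring mismatch in the same alignment (reported as BUFF by MUMmer's show-snps). The `Snp` class, the `filter_snps` function and the threshold of 20 bp are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Snp:
    ref_pos: int       # position in the reference genome
    ref_base: str      # base in the reference
    query_base: str    # base in the query (here: E3)
    buff: int          # distance to the nearest mismatch in the same alignment


def filter_snps(snps: Iterable[Snp], min_buff: int = 20) -> List[Snp]:
    """Keep single-base substitutions that are not crowded by other mismatches.

    Assumption: min_buff=20 is an illustrative threshold only.
    """
    valid = {"A", "C", "G", "T"}
    kept = []
    for snp in snps:
        # Drop indel placeholders ('.') and ambiguous bases.
        if snp.ref_base.upper() not in valid or snp.query_base.upper() not in valid:
            continue
        # Drop SNPs too close to another mismatch.
        if snp.buff < min_buff:
            continue
        kept.append(snp)
    return kept


if __name__ == "__main__":
    # Toy records for illustration only, not real E3 data.
    calls = [Snp(100, "A", "G", 35), Snp(250, "C", "T", 3), Snp(400, ".", "T", 50)]
    print(filter_snps(calls))
```

Excluding the `.` placeholder here leaves insertions and deletions to the separate InDel analysis.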