Podosphaera xanthii is a widespread obligate biotrophic fungal pathogen responsible for powdery mildew disease in cucurbit crops. As an obligate biotroph,
P. xanthii can only grow and reproduce on living host tissues and cannot be cultured on artificial media, which has long hampered traditional microbiological and genetic research (
Fernández-Ortuño et al., 2007). This host-dependent lifestyle is sustained by specialized haustoria, which are intracellular feeding structures that extract nutrients from host cells and secrete effector proteins to modulate the host’s physiology (
Martínez-Cruz et al., 2014). Because
in vitro culturing is impossible, researchers must propagate
P. xanthii on host plants and then harvest fungal tissue from infected plant material for DNA extraction. These requirements make genomic studies technically challenging, as fungal material is limited and often intermixed with host tissue. Recent literature emphasizes that applying modern genetic and omics approaches to obligate powdery mildew fungi remains a major challenge due to their strict host dependency (
Kusch et al., 2022).
To date, including the present study, four
P. xanthii genomes have been reported (
Table 1) (
Kim et al., 2021;
Polonio et al., 2021;
Wang et al., 2023).
Kim et al. (2021) presented a 209.1 Mb assembly (N50: 581.6 kb) from a cucumber isolate, although this genome was flagged by the National Center for Biotechnology Information (NCBI) for potential contamination.
Polonio et al. (2021) generated a 142.1 Mb hybrid assembly (N50: 162.8 kb) with an exceptionally high repeat content of 76.16%, predominantly composed of long terminal repeats (LTRs) and other transposable elements, resulting in 14,911 predicted genes.
Wang et al. (2023) recently published a highly contiguous assembly (152.7 Mb, 58 contigs, N50: 691.6 kb) from race 2F (isolate YZU573), annotating 6,491 protein-coding genes, with repetitive sequences constituting 72.39% of the genome. The substantial discrepancy in annotated gene counts between
Polonio et al. (2021) (~15,000 genes) and
Wang et al. (2023) (~6,500 genes) highlights potential methodological differences in gene annotation, repeat masking, or RNA-seq data usage, emphasizing the need for standardized annotation frameworks in obligate biotrophic fungi.
In the present study, we provide a high-quality genome of
P. xanthii Race 1 assembled from PacBio HiFi long reads, with Illumina short reads and RNA-seq used to polish the consensus and support precise gene annotation. Because
P. xanthii is an obligate biotroph and material must be collected from infected host tissue, we anticipated host and microbial carryover and therefore performed a two-step contamination screening. The contamination-free status was achieved not by sequencing alone but through post-assembly curation: initial NCBI contamination screening removed obvious non-target sequences, and subsequent ContScout-guided taxonomic filtering (
Bálint et al., 2024) identified and eliminated residual contaminants not flagged by NCBI. Together with rigorous sampling to limit host DNA carryover, these steps provide a reliable genomic resource for comparative genomics and functional studies of this powdery mildew pathogen.
For whole genome sequencing, previously obtained
P. xanthii Race 1 was artificially inoculated onto a susceptible melon cultivar and incubated for 10 days at 25°C under a 16 h/8 h day/night cycle and 65% humidity (
Hong et al., 2018). After 10 days, spores and conidial growth were collected from the leaves using a low-pressure handheld vacuum pump, transferred to a sterile 1.5 mL tube, and stored at −80°C until use. This method allowed the collection of
P. xanthii spores and conidia with minimal host contamination. Samples were then ground to a powder in liquid nitrogen using a pre-chilled sterile mortar and pestle. Genomic DNA (gDNA) was extracted from the collected spores and conidia using the cetyltrimethylammonium bromide (CTAB) method with slight modifications and the DNA Mini Kit (Qiagen, Melbourne, Australia). Similarly, RNA was extracted using the TaKaRa Universal RNA extraction kit (TaKaRa, Osaka, Japan) according to the manufacturer’s instructions. The quality and quantity of gDNA and RNA were assessed using a NanoDrop 1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). Additional quality control of gDNA was performed using the Femto Pulse system before SMRTbell library preparation for PacBio sequencing.
Extracted DNA and RNA samples were sent to Seeders (Daejeon, Korea) for library preparation and sequencing. A HiFi SMRTbell library was prepared from the gDNA, and HiFi reads were generated from this library on the PacBio Sequel system. For Illumina sequencing, a TruSeq Nano DNA library with a 350 bp insert size was prepared to generate short reads. Similarly, a TruSeq mRNA library was prepared and sequenced on the Illumina platform. The genome size of P. xanthii was estimated from the Illumina gDNA reads. Low-quality reads and adapter sequences were removed to obtain high-quality reads.
The genome was assembled
de novo using the PacBio HiFi Genome Assembly application with default parameters (
https://github.com/PacificBiosciences/pbipa). The HiFi reads were filtered to a minimum predicted accuracy of Phred score Q20, overlapped with Pancake, and phased with Nighthawk. Chimeric and duplicate reads were removed before string-graph construction. The resulting primary contigs were polished with Racon, and redundant haplotype sequences were removed with purge_dups. Although
P. xanthii is haploid during infection, we retained the default phasing because, in a highly repeat-rich genome, phasing helps separate recent duplications, reduce repeat collapse, and improve contiguity and structural accuracy for downstream annotation.
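For orientation, the Q20 read filter can be translated into a per-base error probability through the standard Phred relation Q = -10 log10(p), so Q20 corresponds to a predicted accuracy of 99%. The following is a minimal sketch of that conversion; the function names are illustrative and are not part of the pbipa workflow.

```python
import math


def phred_to_error(q: float) -> float:
    """Return the error probability p for a Phred score Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Return the Phred score for an error probability 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def passes_min_quality(read_accuracy: float, min_q: float = 20) -> bool:
    """True if a read's predicted accuracy meets the minimum Phred score.

    Illustrative helper only; it mirrors the idea of a Q20 filter,
    not the actual implementation used by the assembler.
    """
    return read_accuracy >= 1 - phred_to_error(min_q)
```

Under this relation, a read with a predicted accuracy of 99.9% (Q30) passes the filter, whereas a read at 98% does not.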
The initial genome assembly consisted of 414 scaffolds with a total length of 162.5 Mbp (
Table 2). Upon submission to NCBI, contamination from plant and basidiomycete fungal sequences was detected, and the assembly was registered after NCBI-curated removal of contaminants (GenBank accession no. GCA_037575625.1). To further identify and eliminate potential contaminants, we applied ContScout, a tool designed to sensitively detect genome contamination (
Bálint et al., 2024). This analysis revealed that approximately 7% of the initial
P. xanthii genome assembly was contaminated (
Fig. 1A). Among the contaminants, 0.5% were unclassified sequences, 0.06% originated from plants (Viridiplantae; Streptophyta), and 6.5% were from non-
P. xanthii fungal sequences. Within the fungal-derived contaminants, 6.3% belonged to Ascomycota and 0.2% to Basidiomycota.
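The per-taxon breakdown reported above amounts to summing scaffold lengths by assigned taxon and dividing by the total assembly length. The sketch below illustrates this bookkeeping on toy data. The record layout (name, length, label) and the label "target" are assumptions made for illustration and do not reproduce the ContScout output format.

```python
from collections import defaultdict


def contamination_summary(scaffolds, target="target"):
    """Percentage of assembly length per non-target taxon label.

    `scaffolds` is an iterable of (name, length_bp, label) tuples.
    This layout is hypothetical, chosen only for the example.
    """
    total = 0
    by_label = defaultdict(int)
    for _name, length, label in scaffolds:
        total += length
        if label != target:
            by_label[label] += length
    if total == 0:
        raise ValueError("assembly has zero total length")
    return {label: 100.0 * bp / total for label, bp in by_label.items()}


def drop_contaminants(scaffolds, target="target"):
    """Keep only scaffolds assigned to the target organism."""
    return [s for s in scaffolds if s[2] == target]


# Toy assembly: 1,000 bp in total, 10% of it flagged as non-target.
toy = [
    ("s1", 900, "target"),
    ("s2", 60, "Ascomycota"),
    ("s3", 40, "Viridiplantae"),
]
summary = contamination_summary(toy)
clean = drop_contaminants(toy)
```

Percentages are computed against the length of the unfiltered assembly, which matches how the contaminated fraction of the initial assembly is expressed in the text.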
After removing the contaminated scaffolds, the final assembly comprised 67 scaffolds with a total size of 151.7 Mbp (
Table 2) (
doi.org/10.6084/m9.figshare.29614655). Notably, 25 scaffolds exceeded 2 Mbp in length (
Fig. 1B). Both the N50 value and the average scaffold length increased, and the GC content was 43.32%, consistent with other reported
P. xanthii genomes. The quality of the genome was further assessed using BUSCO v5.8.2 analysis based on conserved orthologs (
Manni et al., 2021). The results indicated high genome completeness, with 99.4% and 98.7% complete BUSCOs detected at the Fungi level before and after contamination removal, respectively (
Table 2,
Fig. 1C). However, while the initial assembly contained 33.9% duplicated orthologs, this was reduced to 0.6% after contamination filtering. Similar reductions in duplication were observed at the Ascomycota and Leotiomycetes levels, while overall ortholog completeness was retained. These results confirm that essential phylogenetic markers of
P. xanthii were preserved in the final assembly despite the removal of contaminants.
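Because N50 is used throughout to compare assemblies, a short sketch of its computation may be useful: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The toy lengths below are invented and only illustrate why removing many short scaffolds can raise N50.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy example: dropping the short scaffolds raises N50 from 20 to 30.
before = [30, 20, 10, 10, 10, 10, 10]
after = [30, 20]
n50_before = n50(before)
n50_after = n50(after)
```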
The genome size of
P. xanthii is approximately four times larger than the average genome size (36.91 Mb) of Ascomycota, primarily due to a high proportion of repetitive sequences (
Mohanta and Bae, 2015). In this study, repeat elements of
P. xanthii Race 1 were annotated
de novo using RepeatModeler v2.0.5 and RepeatMasker v4.1.6, revealing that repeats constitute 80.04% (121,447,646 bp) of the genome (
Fig. 2A) (
Chen, 2004;
Flynn et al., 2020). Among the repeat elements, retroelements represent the largest fraction, comprising 51.03% of the genome, including 20.59% LINEs and 30.45% LTR elements. The LTR elements were the most abundant class, predominantly consisting of Ty1/Copia and Gypsy/DIRS1 types. DNA transposons accounted for 9.73%, the majority belonging to the Tc1-IS630-Pogo superfamily, while unclassified repetitive sequences represented 17.03%. The exceptionally high repeat content observed in
P. xanthii may reflect adaptive genomic expansion commonly observed in obligate biotrophic pathogens (
Kemen et al., 2015). Indeed, previous studies in powdery mildews and other obligate pathogens have suggested that elevated transposable element activity promotes genetic plasticity, facilitating adaptation to host immune responses and environmental challenges (
Kusch et al., 2024).
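The overall repeat percentage is the number of masked bases divided by the assembly length. As a minimal sketch, and assuming a soft-masked assembly in which repeats are written in lowercase (as produced, for example, by RepeatMasker with the -xsmall option), the fraction can be computed as follows. The sequences are toy strings, not data from this study.

```python
def masked_fraction(sequences):
    """Fraction of bases that are lowercase (soft-masked) across sequences."""
    masked = 0
    total = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    if total == 0:
        raise ValueError("no sequence data given")
    return masked / total


# Toy example: 8 of 12 bases are soft-masked.
fraction = masked_fraction(["ACGTacgt", "aaaa"])
```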
Gene annotation was performed on the repeat-masked genome using BRAKER3, which integrates
ab initio prediction with Augustus, RNA-seq evidence, and protein homology-based evidence derived from Erysiphaceae protein sequences downloaded from NCBI (
Gabriel et al., 2024;
Hoff and Stanke, 2019). As a result, a total of 7,452 genes and 8,078 protein products were predicted. Among them, 4,404 proteins (54.5%) were assigned functional categories through EggNOG annotation (
Cantalapiedra et al., 2021). The most abundant class was replication, recombination, and repair, followed by posttranslational modification, protein turnover, and chaperones; translation, ribosomal structure, and biogenesis; and intracellular trafficking, secretion, and vesicular transport (
Fig. 2B). Because these housekeeping categories prevail across eukaryotic genomes, they are not by themselves diagnostic of biotrophy. We therefore focus on secretion and host-interaction modules, including the predicted secretome of small secreted proteins and putative effectors, carbohydrate-active enzymes (CAZymes), and membrane transport and trafficking components that support haustorial function. These categories more directly explain immune evasion, plant cell wall remodeling, and nutrient uptake in obligate biotrophs, consistent with prior work on biotrophic compatibility and effector-driven host manipulation (
Gebrie, 2016;
Leiva-Mora et al., 2024). In
P. xanthii, a repeat-rich genome likely inflates genome-maintenance categories, whereas secretion-oriented repertoires, detailed below, provide a clearer link to biotrophic adaptation and align with the characteristic reduction in CAZymes and emphasis on secreted effectors reported for biotrophic fungi.
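The functional-category ranking above is a tally of COG category letters across annotated proteins. The sketch below shows one way to produce such a tally. It assumes that each protein carries a string of one or more COG letters, that multi-letter entries count once towards each letter, and that "-" marks an unassigned protein; these conventions are assumptions about the annotation table, not a description of the exact procedure used here.

```python
from collections import Counter


def count_cog_categories(categories):
    """Tally COG category letters; multi-letter entries count for each letter.

    Entries that are empty or "-" are treated as unassigned (assumption).
    """
    counts = Counter()
    for entry in categories:
        if not entry or entry == "-":
            continue
        for letter in entry:
            counts[letter] += 1
    return counts


def annotated_percent(n_annotated, n_total):
    """Percentage of proteins with a functional category."""
    return 100.0 * n_annotated / n_total


# Toy input: L = replication, recombination and repair; K = transcription;
# O = posttranslational modification, protein turnover, chaperones.
toy_counts = count_cog_categories(["L", "KL", "-", "O"])

# Share of annotated proteins reported in the text: 4,404 of 8,078.
share = round(annotated_percent(4404, 8078), 1)
```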
Biotrophic fungal pathogens are known to possess a reduced proportion of CAZymes and an increased proportion of putative effectors or small secreted proteins compared to pathogens with different lifestyles (
Kim et al., 2016). In
P. xanthii Race 1, analyses using dbCAN v4.1.4, SignalP v6.0, and EffectorP 3.0 predicted 130 CAZymes under strict criteria and up to 427 candidate CAZymes (all possible hits), along with 378 secreted proteins and 96 putative effectors (
Sperschneider and Dodds, 2022;
Teufel et al., 2022;
Zheng et al., 2023). Consistent with its obligate biotrophic lifestyle,
P. xanthii exhibits a low CAZyme and secretome profile comparable to that of other biotrophic fungi (
Jia et al., 2023; Zhao et al., 2014). However, its effector count falls below the average (150) observed among obligate biotrophic fungi, suggesting a more specialized or streamlined effector arsenal (
Jia et al., 2023;
Liang et al., 2018).
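The relationship between the secreted-protein and effector counts can be pictured as nested filtering: effector candidates are drawn from the proteins predicted to be secreted. The sketch below illustrates that logic under stated assumptions. The per-protein flags stand in for parsed predictor outputs, the field names are hypothetical, and the exact filtering criteria applied in this study may differ.

```python
def classify_secretome(proteins):
    """Split proteins into secreted proteins and effector candidates.

    `proteins` maps a protein ID to a dict with two boolean flags,
    "signal_peptide" and "effector". Both field names are hypothetical
    placeholders for parsed signal-peptide and effector predictions.
    """
    secreted = {pid for pid, flags in proteins.items() if flags["signal_peptide"]}
    # Only secreted proteins are considered as effector candidates (assumption).
    effectors = {pid for pid in secreted if proteins[pid]["effector"]}
    return secreted, effectors


toy = {
    "p1": {"signal_peptide": True, "effector": True},
    "p2": {"signal_peptide": True, "effector": False},
    "p3": {"signal_peptide": False, "effector": True},
}
secreted, effectors = classify_secretome(toy)
```

In the toy data, p3 is excluded from the effector set because it lacks a predicted signal peptide, even though it carries an effector flag.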
To determine the precise phylogenetic position of
P. xanthii Race 1, we retrieved genome data of closely related species within the Erysiphaceae family from NCBI and performed ortholog clustering and phylogenomic analysis using OrthoFinder v2.5.4 (
Emms and Kelly, 2019). A phylogenomic tree constructed based on 1,897 conserved orthologs shared among the analyzed strains showed that
P. xanthii strains formed a distinct clade, with
Podosphaera fusca ZM-2022-MT899186 serving as the outgroup (
Fig. 2C). Within this clade, Race 1 clustered closely with the Wanju2017 strain, which was also isolated in Korea. The outgroup genera
Blumeria,
Golovinomyces, and
Erysiphe each formed their own distinct clades, supporting the reliability of the tree in representing evolutionary relationships within the Erysiphaceae family. The close relationship between Race 1 and Wanju2017 suggests a high degree of genetic similarity, likely due to their geographic proximity and potential intraspecific genomic stability.
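Phylogenomic trees of this kind are typically built from orthogroups that are present in every genome. As a hedged illustration, the sketch below selects orthogroups with exactly one gene in each species from a gene-count table. Treating the conserved orthologs as strictly single-copy is an assumption made for this example, and the in-memory table is a toy stand-in for a parsed orthogroup gene-count file.

```python
def single_copy_orthogroups(gene_counts, species):
    """Return orthogroup IDs with exactly one gene in every listed species.

    `gene_counts` maps an orthogroup ID to a {species: gene_count} dict.
    A species missing from the inner dict is treated as having zero genes.
    """
    return sorted(
        og
        for og, counts in gene_counts.items()
        if all(counts.get(sp, 0) == 1 for sp in species)
    )


# Toy table with three hypothetical genomes.
species = ["A", "B", "C"]
toy_counts = {
    "OG0001": {"A": 1, "B": 1, "C": 1},  # single copy in all three: kept
    "OG0002": {"A": 2, "B": 1, "C": 1},  # duplicated in A: dropped
    "OG0003": {"A": 1, "B": 1},          # absent from C: dropped
}
selected = single_copy_orthogroups(toy_counts, species)
```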
Additionally, contamination analysis performed on other
P. xanthii strains, including 2086, YZU573, and Wanju2017, revealed that 0.08% (111 kb), 0.28% (434 kb), and 23.66% (49,465 kb) of their original genome sequences, respectively, originated from other organisms (
Table 3). In strain 2086, contaminated sequences primarily derived from plants and some Basidiomycota and Ascomycota fungi, whereas YZU573 additionally contained sequences originating from Arthropoda. The genome of Wanju2017, exhibiting the highest contamination level, was mixed with sequences from diverse organisms including bacteria, viruses, Stramenopiles, protists, plants, and fungi. After the removal of contaminated sequences, genome sizes, GC content, and gene numbers became more consistent across strains, which is particularly important for obligate biotrophs like
P. xanthii, where accurate genome representation is essential for studying their highly specialized and host-dependent lifestyles.
In conclusion, we present a high-quality, contamination-free genome assembly of P. xanthii Race 1, representing a valuable reference resource for studying the genomic basis of obligate biotrophy in powdery mildew fungi. By employing a combination of PacBio HiFi long reads, Illumina short reads, and RNA-seq-based annotation, alongside rigorous contamination filtering using ContScout, we overcame major challenges associated with host DNA interference and assembly quality. The final genome exhibits high completeness, low redundancy, and a repeat-rich architecture consistent with other obligate biotrophic pathogens. Comparative analyses of gene content, repeat elements, and effector repertoires support the specialized nature of P. xanthii and highlight its evolutionary adaptation to a host-dependent lifestyle. Moreover, contamination analyses of previously reported genomes revealed variable levels of foreign DNA, emphasizing the importance of stringent quality control in the genomic study of obligate biotrophs. Together, our findings contribute to a more accurate and comparative understanding of P. xanthii biology and provide a robust foundation for future functional and evolutionary investigations in Erysiphaceae.