Introduction
Grapevine rupestris stem pitting-associated virus (GRSPaV) of the genus Foveavirus (family Betaflexiviridae; Adams et al., 2012) is a common grapevine pathogen occurring worldwide. It is one of the most prevalent grapevine viruses in the world that is spread through vegetative propagation, grafting (Meng and Gonsalves, 2007) and possibly via pollen and seed (Lima et al., 2006; Rowhani et al., 2000; Stewart and Nassuth, 2001). However, biological vector (if any) involved in a short distance transmission is yet to be identified (Martelli, 2014). GRSPaV host range is restricted to wild and cultivated grapevines (Vitis spp). It may be associated with rupestris stem pitting, a disorder of the rugose wood complex (Meng et al., 1998, 1999; Zhang et al., 1998), as well as with vein necrosis (Bouyahia et al., 2005; Morelli et al., 2011) and Syrah decline (Lima et al., 2006). However, these reports are challenged by results of some other studies (Alliaume et al., 2012; Borgo et al., 2006; Della Bartola et al., 2012; Goszczynski, 2010) and definitive experimental proofs for the causal role of GRSPaV in any of the diseases is still lacking. In addition, the latent infections by GRSPaV are frequent (Meng et al., 2005).
GRSPaV genome consists of positive-sense, single-stranded RNA of about 8,700 nucleotides, excluding the poly(A) tail, which is encapsidated in filamentous virions of about 720 nm in length (Meng and Gonsalves, 2007; Petrovic et al., 2003). The genome contains at least 5 open reading frames (ORFs) encoding the replicase complex (ORF1), the triple gene block movement proteins (ORFs2-4) and the coat protein (ORF5) (Martelli et al., 2007; Meng et al., 1998; Morelli et al., 2011). A hypothetical ORF6, overlapping ORF5, and potentially encoding a product of 119 aminoacids in size and with no known function, has been reported (Morelli et al., 2011; Zhang et al., 1998). However, it remains to be clarified whether ORF6 is expressed in vivo and what is the role of the putative protein in the life cycle of the virus.
Analysis of the molecular variability and genetic structure of viruses allow understanding the evolutionary history of viruses in relation to their virulence, dispersion, and/or emergence of new epidemics. Besides a high mutation rates resulting from the lack of proofreading activity of their RNA-dependent RNA polymerases, genetic recombination has been shown to represent another evolutionary mechanism shaping the populations of RNA viruses and contributing to their diversity (García-Arenal et al., 2001).
Several papers described a substantial molecular variability of GRSPaV, resulting in the identification of distinct phylogenetic groups (Buzkan et al., 2015; Hu et al., 2015; Meng et al., 2006; Nakaune et al., 2006; Nolasco et al., 2006; Terlizzi et al., 2011). However, despite relatively high genetic diversity of the virus and its widespread distribution through the viticulture industry, there is an apparent absence of geographical clustering of variants (Alabi et al., 2010; Nolasco et al., 2006).
Additionally, several studies have shown that GRSPaV-infected samples harboured mixed infections of distinct genetic variants (Meng et al., 1999, 2006; Nolasco et al., 2006). Mixing of different variants in a single plant did not lead to a recombination according to results of the study conducted in Portugal (Nolasco et al., 2006). Nevertheless the importance of recombination events in the GRSPaV evolutionary history were recently suggested by partial sequence data (Alabi et al., 2010).
In this work, we have determined complete genome sequences of four distinct GRSPaV variants from the central European area, analysed their distribution in a single grapevine and identified their evolutionary origin and relationships with other fully sequenced GRSPaV isolates.
Materials and Methods
Source plants
Different type of tissue (i.e., leaves, petioles, mature canes) collected from a 10-year-old grapevine plant of cv. ‘Veltliner’, with a lab code SK704, grown in a vineyard in western Slovakia (48° 18′ 14.6″ N, 17°15′ 48.9″ E) has been used as one of GRSPaV sources in this study. This plant was used as a source for sequencing three different molecular variants of GRSPaV and for study on spatio-temporal distribution of these variants in the host.
Grapevine labelled as SK30, used for sequencing of an isolate of GRSPaV, was collected from an abandoned vineyard in western Slovakia (48°18′ 21.8″ N, 17°15′ 27.1″ E) located in the vicinity of one hosting SK704. None of two GRSPaV sources used in this study (SK704 and SK30) displayed visible viral symptoms or any obvious disorder.
Sequence analysis of the SK704 grapevine plant
Five dormant canes of the SK704 plant were taken in February 2015 and were forced to bud at room temperature. After the bud burst, a pool of randomly selected young leaves was used for total RNA extraction using the Nucleo Spin RNA Plant kit (Macherey-Nagel, Duren, Germany). The Ribo-Zero rRNA Removal Kit (Illumina, San Diego, CA, USA) was applied to remove undesirable ribosomal RNAs.
Depleted total RNA sample was used for ds cDNA reverse-transcription with SuperScript II kit (Thermo Fisher Scientific, Waltham, MA, USA). The ds cDNA was column-purified with DNA Clean & Concentrator™-5-DNA kit (Zymo Research, Irvine, CA, USA) and quantified fluorometrically with use of Qubit 2.0 Fluorometer (Thermo Fisher Scientific). The sample was then processed with transposon-based chemistry, Nextera XT (Illumina) library preparation kit. Briefly, virus ds cDNA was fragmented with transposon into fragments with average length of 435 bp. With a limited cycle PCR (12 cycles), the obtained fragments were amplified and simultaneously indexed with unique adapters at both, 5′ and 3′ ends. Consequently, sample purification and size selection was carried out with 1.8× sample volume with Agencourt AMPure XP magnetic beads (Beckman Coulter, Brea, CA, USA). The fragment size structure of the DNA library was assessed using Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Indexed denatured DNA library was sequenced with 200-bp paired-end sequencing on Illumina MiSeq platform (Illumina).
The missing 5′ end of GRSPaV-SK704-A, -B, and -C variants was determined by standard Sanger sequencing of PCR products overlapping the contigs obtained by next generation sequencing (NGS). Oligo-dT-primed RT-PCR was similarly used to determinate the 3′ terminal sequences (Supplementary Table 1). The most divergent parts of the genome (approximate nt positions 1650-2200) were amplified in RT-PCR and re-sequenced to verify the NGS-generated data. When necessary, the viral sequences were verified by independent PCR amplifications and sequencing using custom-made primers. To assess the polymorphism of GRSPaV SK30 population, aliquots of three PCR products (GRSPaV-1303-s/GRSPaV-2744- as, GRSPaV-6664-s/GRSPaV-8436-as, and Rup_6288F/Rup_6948R; Supplementary Table 1) were cloned into the pGEM-T Easy vector (Promega Corp., Madison, WI, USA) according to the manufacturer’s instructions and eight to ten randomly chosen cDNA clones were sequenced on both strands for each PCR product.
Full-length genome determination of the GRSPaV SK30 isolate
NGS analysis of the SK30 plant has already been described (Glasa et al., 2014, 2015) and allowed to detect the presence of several viral agents, including GRSPaV (Glasa et al., 2014). As the complete GRSPaV nucleotide sequence were not covered by siRNAs reads (data not shown), the genome was completed with 7 overlapping PCR fragments amplified using the TaKaRa LA Taq polymerase (TaKaRa Bio Inc., Shiga, Japan; Supplementary Table 1). All PCR products were cloned and at least five clones were sequenced for each genome portion in order to investigate possible presence of multiple distinct genomic variants. When necessary, the viral sequences were verified by independent PCR amplifications and sequencing using custom-made primers.
Sequence analyses
Complete sequences of Slovak GRSPaV isolates were compared with those of other GRSPaV isolates available in the GenBank database (www.ncbi.nlm.nih.gov, accessed on January 2016). All complete sequences of GRSPaV isolates were used for comparison. Sequence analyses were performed using either MEGA 6.0 (Tamura et al., 2013) or DnaSP (Librado and Rozas, 2009) and on-line tool (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml). The phylogenetic trees were inferred using the neighbor-joining algorithm implemented in MEGA 6.0 and strict identity distance metrics. Phylogenetic network analysis was performed using SplitsTree v. 4.14.2 program (Huson and Bryant, 2006).
The complete nucleotide GRSPaV sequences reported in this paper have been deposited in the GenBank database under accession numbers KX274274 (SK704-A), KX274275 (SK704-B), KX274276 (SK704-C), and KX274277 (SK30).
Study of in planta distribution of three GRSPaV-SK704 variants
Total RNA were extracted from various grapevine samples (phloem scrapings of dormant canes, leaf petioles, green grape rachis) using the Nucleo Spin RNA Plant kit (Macherey-Nagel). The first-strand cDNA was synthesized using random hexamers and the Prime-Script (MMLV) reverse transcriptase (TaKaRa Bio Inc.). The same cDNA was used in three independent PCRs using pairs of primers designed to enable the specific detection of GRSPaV variants present in the SK704 plant, i.e., SK704-A (generating a product of 401 bp): A-Rup1655F (5′-AGTGTAAGAATGGCTCTTGGG-3′, sense)/A-Rup2055R (5′-GAGTATGTGTCGGAGGGCTG-3′, antisense), SK704-B (300 bp): B-Rup1742F (5′-TCATACAGGTCAAGGCTGGC-3′, sense)/B-Rup2041R (5′-GCTAGAGCAGAATTTGGCAAC-3′, antisense) and SK704-C (413 bp): C-Rup1651F (5′-TCAAAGGTTAAGGCTGCGC-3′, sense)/C-Rup2063R (5′-CCTCTCAGAGGTTGGACTAGC-3′, antisense).
cDNA synthetized from total RNA was used as a positive control for PCR. Asymptomatic LN33 grapevine plant grown under controlled conditions was used as a negative control. For all combinations of primer sets, PCR amplification was performed using the Emerald Amp GT PCR Master Mix (Takara Bio Inc.) under the following cycling conditions: initial denaturation at 95°C for 1 min; 35 cycles of 95°C for 15 s, 56°C for 20 s, and 72°C for 30 s; final extension step at 72°C for 5 min. The specificity of PCR products was ascertained by sequencing. In case of the failed detection of a variant(s) in a given sample, the PCR test was repeated with the increased amount of cDNA.
Results
NGS analysis revealed the presence of three distinct GRSPaV variants in a single grapevine
Out of a total of 2,866,388 reads initially obtained from the SK704 grapevine plant, 90% reads were of quality at least 30 (≥ Q30). The mean length of analysed reads was calculated to 129.1 bp (minimum length 32 bp, maximum length 200 bp, average length 116 bp) allowing de novo assembly of a total of 219 contigs.
Mapping of reads enabled the reconstitution of nearly complete genomes of three different GRSPaV variants (referred to as GRSPaV-SK704-A, SK704-B, and SK704-C). While 2,042 mapped reads covered 87% of the genome of GRSPaV-SK704-A, approximately 3-fold higher number of reads (5,787 reads and 6,155 reads), covered ca. 97% of the SK704-B and SK704-C genomes, respectively. Additionally, 39,495 reads allowed the generation of nearly complete genome coverage (99.2%) for Grapevine Pinot gris virus (deposited to the GenBank under accession number KU949328), the only virus besides GRSPaV identified in vine SK704 (data not shown).
The accuracy of the NGS-based sequences and the presence of three GRSPaV variants was unambiguously confirmed by re-sequencing considerable portion of genomes by variant-specific RT-PCRs including both 5′ and 3′ termini (Supplementary Table 2).
The determined full-length sequences of SK704 variants consists of 8724 nts (SK704-B) or 8725 nts (SK704-A, SK704-C), respectively. This length difference is due to one nt deletion in the SK704-B 3′ untranslated region (UTR, at nt position 8709) as compared to two other variants. The complete genome comparisons showed that the three GRSPaV variants from SK704 grapevine were almost equidistant as their mutual sequence identities were in the range of 76.5-77.6%. The most conserved parts among 3 variants include the portion spanning 5′UTR-start of ORF1 (nt 1-123) and the 3′UTR (nt 8210-8376 and nt 8594-8725), all showing the nt divergence of 6% to 7%. On the contrary, the most variable genome part among three variants was a central region within the ORF1, approximately between nt positions 1400-2380, with the divergence more than 30%, with a hypervariable stretch of 100 nt (nt position 2077-2176) where differences among variants reached differences up to 67.6% (Fig. 1). Similar pattern of variability along the genome was noted when all 18 complete sequences (14 from GenBank and 4 from this work) were analysed (data not shown). This hypervariable region (HVR) is located within ORF1 and putatively codes for a protein sequences located between methyltransferase and AlkB-like domains in the viral replicase polyprotein (Meng and Gonsalves, 2007).
The organisation and length of the three variants from vine SK704 was similar with previously characterised GRSPaV isolates. Curiously, an additional putative ORF of unknown function, referred to as ORF6 by Zhang et al. (1998), was identified by computer analyses in variants SK704-B (nt 8227-8583) and SK704-C (nt 8227-8586), but not in SK704-A.
Single variant GRSPaV infection of SK30 plant
The sequencing of multiple clones of several RT-PCR fragments encompassing the genome of GRSPaV from plant SK30 along with partial NGS-based data revealed the presence of a unique genome variant.
The determined full-length GRSPaV SK30 sequence consists of 8724 nts. Pairwise comparisons with previously completely sequenced GRSPaV genomes revealed nt identities ranging from 77.3% (isolate GRSPaV-LSL) up to 97.2% (isolate GRSPaV-BS). Similarly, the deduced GRSPaV-SK30 aminoacid sequences showed the lowest divergence with the corresponding products deduced from the GRSPaV-BS genome, being 1.5%, 2.7%, 0%, 2.5%, and 1.9% for ORFs1-5, respectively. All domains characteristic of Foveavirus polypeptide encoded by ORF1 (metyltransferase, cysteine protease, peptidase, helicase, and RdRp) were conserved in the GRSPaV-SK30 at the positions reported for the isolate GRSPaV-MG as described previously (Morelli et al., 2011).
Phylogenetic and recombination analyses of fulllength GRSPaV sequences
Comparative analysis of 18 full-length genome sequences (4 generated sequences in this study and 14 sequences retrieved from GenBank) showed that GRSPaV isolates cluster into 4 major phylogenetic lineages supported by a high bootstrap value. Group 2 further was divided into three subgroups (Fig. 2). The GRSPaV-SK30 isolate groups with the Canadian BS isolate in the subgroup 2c. The three GRSPaV-SK704 variants were placed into three distinct lineages 1, 2a, and 3. The full-length genome phylogenetic analysis further confirmed the absence of any geographical origins of GRSPaV (Alabi et al., 2010; Hu et al., 2015; Meng et al., 2006).
A split decomposition analysis (Huson and Bryant, 2006) was performed to obtain a network structure of phylogenetic relationship. The analysis revealed the formation of a reticulate network structure and conflicting phylogenetic signals, suggestive of the putative recombination within GRSPaV population (Supplementary Fig. 1).
Therefore, four Slovak GRSPaV sequences, together with 14 previously reported genomes, were aligned and analyzed using the recombination detection program (RDP4; Martin et al., 2015). To exclude unreliable signals, we selected recombination events supported by at least 6 different methods with an associated P-value of < 1.0 × 10−6.
The analysis identified strongly supported recombination signals in 3 GRSPaV genomes (Fig. 3). In case of the Tannat isolate (Jo et al., 2015), the recombination involved the ORF1 (a part of the genome encoding the viral helicase domain). Putative recombination events in two other isolates, involved the 3′ part of the genome. The recombination event discovered in the isolate GRSPaV-BS (Meng et al., 2005) is characterized by the recombination breakpoints located at the 3′ terminus of ORF1 (nt position 6292), while breakpoint in a similar recombination event revealed in GRSPaV-SK30 was located in the intergenic region between ORF1 and ORF2 (nt position 6567). It is interesting that two latter isolates belong to clade 2c and involve the same putative parent isolates: GRSPaV-GG (Meng et al., 2013) and GRSPaV-704-A (this study).
Distribution of genetically distinct variants in the SK704 grapevine plant
The presence of 3 genetically distant GRSPaV variants in SK704 grapevine was revealed by NGS analysis of total RNAs isolated from a pooled leaf sample. To examine more in detail distribution of each of the three GRSPaV variants (SK704-A, -B, and -C) in the source grapevine plant, 3 sets of variant-specific primers were designed from the most variable part of the genome (Fig. 1) and repeatedly checked for the absence of any cross-reactivity. The variant-specific detection was performed by testing ten tissue samples collected from different coordinates in the canopy at three different timepoints representing distinct phenological stages/phases, i.e., leaf development, development of fruits and dormancy for a total of 30 samples.
The most frequently detected GRSPaV variant in all three timepoints was GRSPaV-SK704-B. It was detected in 28 out of 30 samples tested. Curiously, the two samples (704/6 and 704/27) that did not contain GRSPaV-704-B variant were also deprived from other two variants. The variant GRSPaV-SK704-B was the only variant detected in 10 tested samples. Interestingly, these results do not correlate with the NGS data that have shown the highest proportion of GRSPaV-SK704-C mapped reads.
The GRSPaV-SK704-A and -C variants were detected in 2-7 samples per timepoint, respectively, with GRSPaV-Sk704-A being the least frequent among the three studied variants. Curiously, all three studied variants could be detected in only 20% of sampled tissue (Fig. 4).
Discussion
In this work we demonstrated different scenarios of natural GRSPaV infections in grapevines using two plants collected in western Slovakia as a study model. While accession SK30 was infected by a single variant (GRSPaV-SK30), vine SK704 contained mix of three distinct virus variants denominated GRSPaV-SK704-A, GRSPaV-SK704-B, and GRSPaV-SK704-C. Curiously, the three isolates co-infecting the same plant (SK704) were mutually distinct as much as they differed from GRSPaV-SK30 and they belonged to different evolutionary lineages among population of fully sequenced isolates of the virus.
The presence of highly diverse molecular variants in a single plant SK704 might be a result of multiple and independent introductions of distinct GRSPaV isolates into the commercial vines due to grafting practices commonly applied over the past century in modern viticulture, as proposed by Meng and Gonsalves (2007). Therefore, the use of different rootstock (different species, hybrids) and scion (different variety) sources, infected by distinct GRSPaV variants, along with its possible pollen transmission which needs confirmation (Rowhani et al., 2000) are more plausible reasons for the presence of mixture of several distinct viral variants in vine SK704 than a long-term accumulation of point mutations due to intrinsically errorprone viral RNA-dependent RNA polymerase.
The four genetic variants described in this work were molecularly distinct as they shared in average only 75% nt among each other and, as ascertained in phylogenetic analyses, belonged to different evolutionary lineages among population of fully-sequenced GRSPaV isolates. Although several previous studies have documented an intra-host genetic diversity and occurrence of a mix infection of two or more GRSPaV molecular variants in a single plant (Meng et al., 1999, 2006), our results provide more in depth data that are based upon the full-length genome sequences of Slovak isolates.
In silico analyses revealed that 3 out of 18 fully sequenced GRSPaV variants are product of recombination events suggesting that the exchange of genetic material might be a major driving force behind its evolution. Interestingly, two out of four Slovak variants described in this work resulted prone to recombination and were involved in the events, either as a donor (GRSPaV-SK704-A) or as a product (GRSPaV-SK30).
Study of spatio-temporal distribution in planta of the three different genetic variants of GRSPaV showed remarkable differences among virus genotypes. While the variant GRSPaV-SK704-B prevailed among the three and was almost uniformly distributed in the host (detected in 28/30 tested tissues), the presence of GRSPaV-SK704-A and GRSPaV-SK704-C was rather erratic and uneven. Variant SK704-C was present in 53% samples, while GRSPaV-Sk704-A was detected in only 30% of the tissue collected/tested. Triple infections were detected in 20% of tested material, the same rate as single infections (all by GRSPaV-SK704-B). Therefore, our study showed significant differences in distribution of some of variants underlying the importance of proper sampling technique and use of composed tissue in order to avoid biased results as pointed out in previous studies concerning survey of viruses in perennial crops (Gambino et al., 2010; Komínek et al., 2009).