Introduction
Control of plant diseases is one of the challenges that humanity as a whole faces in ensuring that current and future populations are adequately fed (Strange and Scott, 2005). To address such a challenge, a number of different approaches have been used separately or in combination to prevent or control the plant diseases. Growers, however, often rely heavily on pesticides and other agrochemicals, although great efforts have been put into development and deployment of crop plants that are resistant to plant pathogen(s) (Chandler et al., 2011). Over the last few decades, molecular plant pathologists have strived to pinpoint molecular attributes of pathogens that confer ability to cause disease on their host plant, hoping to provide targets for molecular breeding and discovery of agrochemicals (Boyd et al., 2013; Hentschel et al., 2000).
During the last two decades, the ccomputer aided drug designing (CADD) has become a critical part of the development of novel drugs in pharmaceutical industries (Adam, 2005; Taylor, 2015). However, CADD has not been extensively used toward design/development of chemicals that can control diseases of crop plants. Only a handful of studies have been conducted using CADD in plant pathology (Kandakatla and Ramakrishnan, 2014; Pathak et al., 2016; Soundararajan et al., 2011; Zhou et al., 2015). For example, Yang et al. (2002) successfully screened the novel 2-heteroaryl-4-chromanones with antifungal activity against the rice blast fungus using CADD approach. Xue et al. (2014) modeled the structure of Tps1 (trehalose-6-phosphate synthase 1) as potential drug target in Magnaporthe oryzae and reported the potential chemical inhibitor by screening 400,000 chemical compounds from Molecular Libraries Small Molecule Repository. Liu et al. (2012) reported the modelled structures of Gnt-R like regulators from Xanthomonas axonopodis pv. Citri (Xac) and its binding sites. The homology modeling of 14α-demethylase from Ustilago maydis and the screening of synthetic XF-113 and ZST-4 fungicide lead compounds as novel 14α-demethylase inhibitors were reported by Han et al (2010). Dehury et al. (2013) reported the modelled structure of race-specific bacterial blight disease resistance protein (xa5) against Xanthomonas oryzae. Doucet-Personeni et al. (2001) reported the rational design of acetylcholinesterase inhibitors (insecticides) by combing the common features of organophosphate and carbamate compounds (tacrine and trifluoromethyl ketones). In another work, structures of hydacidin and hydantocidin (pro-herbicides) were combined to design the hybrid adenylosuccinate synthetase inhibitors as a novel herbicides by Hanessian et al (1999). The identification of indenone and salicylamide analogue as trihydroxy naphthalene reductase and scytalone dehydratase respectively is the most prominent example of CADD approach in the design of new agrochemicals (Walter, 2002).
Despite such examples, use of CADD in plant pathology has been significantly lagging behind. This was mainly due to 1) lack of information and training on CADD for researchers in plant pathology-related disciplines and 2) insufficiency of data for accurate modelling and simulation. Recent advances in genomics and bioinformatics, together with explosive increase in genome sequence information, enabled researchers to rapidly identify and test large number of candidate genes and their translational products that are responsible for pathogenesis of plant pathogens (Cairns et al., 2016; Imam et al., 2016;). Some of these candidates include proteins that are secreted into apoplasm or cytoplasm of plants to subvert defense mechanisms and to manipulate metabolism of host plants in favor of pathogens (Deslandes and Rivas, 2012; Franceschetti et al., 2017). In addition, data regarding three-dimensional (3D) structure of such virulence/pathogenicity proteins based on NMR or X-ray crystallography are rapidly increasing (currently PDB holds 128,962 biological macromolecular structures) (Rost and Sander, 1996; Schwede, 2013).
With burgeoning information and technologies, here we propose that the time is ripe for CADD in molecular plant pathology. Even with a personal computer connected to the internet, CADD can be carried out for efficient development of new agrochemicals having novel targets. From this perspective, here we outline basic concepts and overview of CADD (Fig. 1), without going into too much detail, to encourage and guide molecular plant pathologists who want to see translation of their works into chemical control of the diseases. To that end, we provide list of resources that are useful and currently available in drug designing pipeline (Table 1). Last but not least, we provide a case study of CADD targeting pathogenicity factors from selected plant pathogens for demonstration purpose. For those who are complete novice to CADD, terminologies that are going to be used throughout this review are listed and explained in Table 2.
What is CADD and Why Do We Want to Use It?
Drugs are the chemical compounds/molecules that can either activate or inhibit biomolecules, which in turn, promote health and survivability of human. In ancient times, the usage of plant extracts has been the source of treatment for various disease and many people believed that plants might possess protective means against infections disease as it continues to survive with high bacterial density in an environment (Al-Hussaini and Mahasneh, 2009; Cos et al., 2006). In late 1800s, with the key developments in the basic sciences such as identification of bacteria and virus, in-depth knowledge on the chemical substance in the plants has become essential in the treatment of disease (Katiyar et al., 2012). Further advancements in the technologies such as X-ray crystallography, NMR spectroscopy and structural genomics have improved the determination of the exact chemical composition of the compounds and selection of potential drug targets (Hughes et al., 2011; Weigelt, 2010). This gradual advancement in understanding the molecular basis of compound structure and drug targets has channelized the drug discovery pipeline with optimization and testing steps (Duffy et al., 2012). Such traditional drug discovery process includes screening, separation, characterization, and synthesis of the molecules with desired therapeutic activity on cultured cells or animals.
Among the steps in the traditional drug discovery pipeline, screening of large number molecules for the desired activity was one of the major bottlenecks. Although this brute force approach has the advantage in that it does not require prior knowledge of molecules, chances of finding novel drugs are generally very low (low hit rate), making it time-consuming and cost-ineffective (Kapetanovic, 2008; Sliwoski et al., 2013). This is the point at which CADD comes into play. CADD can be defined as the design/discovery of molecules that has strong binding affinity to biomolecular target in a computer-modeling-dependent manner. The fundamental goal of CADD is to predict which molecules among many will bind to a target and if so how strongly they will, using knowledge and information on molecular mechanics and dynamics. The CADD serves as an alternative strategy to experimental screening techniques, greatly reducing the cost, time, and workload for drug development. For example, researchers can significantly decrease the number of compounds that need to be screened without compromising lead discovery, because CADD allows them to skip many compounds that are predicted to be inactive (virtual screening). In addition, CADD can be used for other stages of drug discovery including hit-to-lead optimization of affinity and selectivity. There are two major types of CADD: structure-based drug design (SBDD) and ligand-based drug design (LBDD). We will briefly introduce key concepts of each type of CADD in the following section. In general, the availability of 3D structure information of targets (often proteins) is the key to determining which approaches should be taken.
Target Identification
Target identification is the foremost step in drug discovery pipeline. The properties such as essentiality (role in pathophysiology), specificity (most specific to particular disease/host), druggability (function modification through the binding of small molecules) and selectivity (well defined active site allowing differential binding of small molecules) of a macromolecule are considered as important features for the selection of ideal target. The selection and choice of ideal macromolecule as drug targets by considering the essential criterion will influence the outcome of the drug discovery process (Chandra, 2011). Based on the above mentioned criteria, certain macromolecules that could serve as potential drug targets in plant pathogens are summarized in Table 3.
Structure-Based Drug Design/Discovery (SBDD)
Availability of 3D structure and prior knowledge on biological function(s) of target protein are pre-requisites for SBDD approach. Based on the structure of the target protein, SBDD allows design of candidate drugs that are predicted to bind to the target with high affinity and selectivity (Kalyaanamoorthy and Chen, 2011). Assumption that underlies and justifies SBDD approach is that a molecule’s potential to have desired biological effects for a specific protein relies on its degree of ability to interact with binding sites on that protein.
Homology modelling
If there is no 3D structure information of the target, it may be possible to create homology model (this is called homology modelling) based on primary sequence similarity of the target to homologous proteins, of which 3D structure is empirically known (Fig. 2A). The 3D structure, whether it is experimental or predicted structure, of target protein provides information about chemical environment of the active site(s), enabling researchers to identify ligand(s) (drug or agrochemicals) that can bind to the active site with high affinity and selectivity. One thing that researchers should bear in mind is that, since homology modelling builds the 3D structures of proteins based on template sequences, the accuracy of the built model depends on the choice of template, alignment accuracy and refinement of the model (Rost, 1999). Generally, the models built with the templates exhibiting over 70% identities are considered to be accurate enough for drug discovery applications (Cavasotto and Phatak, 2009).
Molecular docking
Once the model providing chemical environment of active sites is built, protein-ligand interactions can be explored through molecular docking, a method that predicts energetically stable orientation of ligand when it is bound to target protein (Fig. 2B). Degree of stability of interaction between molecules is the key factor to determining biological consequences of the interaction (Durrant and McCammon, 2010). Molecular docking reports two important information: 1) correct conformation of a ligand-target (or ligand-receptor) complex and 2) its binding affinity which represents an approximation of the binding free energy (mathematical methods called scoring functions are used to estimate binding interaction of the protein-ligand complex). More than 30 molecular docking programs are currently available (Huang and Zou, 2010; Ferreira et al., 2015).
Structure-based virtual screening (SBVS)
The search for new chemical compounds as lead molecules is a critical step during the process of drug discovery. Once the target is selected, the small molecules database are selected for virtual screening and their binding interactions with the selected drug target are explored. The top ranked compounds with desired binding interactions are selected as lead compounds for the further steps. Virtual screening is a computational method that evaluates large libraries of compounds and subsequently identifies putative hits (leads) through comparison of 3D structures of ligands with the putative active site of the target (Fig. 2C). In structure-based virtual screening, affinity of the ligand to the target protein is estimated using molecular docking followed by application of scoring function. Virtual screening is extensively automated and fast method in identifying the candidate compounds from a large dataset based on their rank in docking interactions (McInnes, 2007). This allows researchers to focus their resources and efforts on testing compounds likely to have desired activity.
De novo ligand design (DnLD)
Using 3D structure information of the target, ligand can be designed de novo. DnLD can be carried out in both structure-based and ligand-based drug design. These de novo methods are usually carried with the placement of pseudo-molecular probe molecule and then addition of functional groups to satisfy the spatial constraints of target binding site. Also, the molecule will be grown fragment by fragment to occupy the active site of target molecules. Please see de novo ligand design in the next section.
Ligand-Based Drug Designing (LBDD)
In many cases, 3D structure of target protein or its homolog is not available for SBDD approach. This is true in particular for proteins that are present in cell surface or membrane due to their inherent difficulties in protein crystallization. In some cases, the use of unreliable homologous proteins (for example, low sequence identity) for homology modelling can result in high rate of false positive hits. In such situations, researcher can take LBDD. LBDD relies on knowledge of structural and chemical characteristics that molecules must have for binding to the target of interest (Geppert et al., 2010). What LBDD actually does is to build a model (so called, pharmacophore model) based on the knowledge of such molecules binding to the target and, in turn use this model for design of new drug candidates. Alternatively, LBDD can construct predictive, quantitative structure-activity relationship (QSAR).
Pharmacophore modelling
Pharmacophore is an abstract description of minimum, steric and electronic features that are required for interaction of target protein with ligand(s). Inference of pharmacophore using knowledge on a set of ligands (training set) that can bind to the target is called pharmacophore modelling (Fig. 3A). The process in the development of pharmacophore model involves the alignment of multiple ligands (training set), which can determine the essential chemical features that are responsible for their bioactivity. The alignment of these multiple ligands can be achieved by superimposing a set of active molecules. Such superimposed molecules are then transformed into abstract representation of different features. Pharmacophore model explains why molecules of structural diversity can bind to the common sites and have the same biological effects (Yang, 2010).
Ligand-based virtual screening (LBVS)
Once pharmacophore model is built, then researchers can make prediction about whether candidate ligands are likely to bind to the target through comparison to the pharmacophore model. Such process is called ligand-based virtual screening (Fig. 2C). This approach is known to work best in scanning through candidate compounds with desired chemical features from a large, diverse set of chemical libraries (Oprea and Matter, 2004). In a way, LBVS works as a chemical database filters, and therefore can drastically reduce the number of chemical compounds for in vivo and in vitro studies.
Quantitative structure-activity relationship (QSAR)
Hansch and Fujita introduced QSAR method based on the ground works of Hammett and Taft (Hansch and Fujita, 1964; Hansch, 1969). Quantitative structure-activity relationship (QSAR) models are regression or classification models used to predict activities of new chemical compounds based on their physico-chemical properties. In general, QSAR is a regression model where it relates a set of ‘predictor’ variables (X) such as physico-chemical properties and molecular descriptors to the potency of the ‘response’ variable (Y) such as biological activity of the compound. The QSAR summarizes the relationship of molecular descriptors (chemical structures) that describe the unique physico-chemical properties of compound sets of interest with their respective biological activity (Bordás et al., 2003; Goodford, 1985). Using this relationship, QSAR model is used to predict the activity of new compounds. The predictive ability of the QSAR model is dependent on the descriptors that were employed in the model generation (Fig. 3B).
de novo ligand design (DnLD)
This method is generally known as fragment-based drug designing approach in which the novel ligand is built from the scratch using the creativity of the researchers to a very large extent. Owing to the advantage of tailoring the new compounds, DnLD has been effectively employed in both SBDD and LBDD approaches. In case of SBDD, the chemical features of active site in the target are considered to design the novel compounds, whereas in LBDD, the chemical features derived from pharmacophore or QSAR are used for novel compound design (Dey and Caflisch, 2008; Yang, 2010). However, the true behaviour of the chemical compound is in many cases uncertain and thus such lack of empirical evidence is a pitfall of this approach. The combinations of various modifications are made possible with the existence of chemical drawing tools, and the optimizations of the structure are performed with the energy minimization tools. Further, the knowledge on accurate modification that is essential for the ligand in the binding site can be revealed through docking programs (Böhm, 1996). The refinement of ligand molecules is generally an iterative process in which the emergence of optimal ligand with good binding stability and interactions results from convergence criteria.
Drug Discovery - A Case Study
In this section, we provide examples of CADD for plant pathogens. Below we demonstrate how candidate agrochemicals targeting a bacterial pathogen, Pseudomonas syringae and a fungal pathogen, Colletotrichum gloeosporioides can be discovered using concepts and tools of CADD (Fig. 4).
Homology modelling and validation
For Pseudomonas syringae, through literature survey, we selected two enzymes, MurD and MurE ligases, which are involved in biosynthesis of bacterial peptidoglycan, as potential targets for rational drug designing approach (Bratkovič et al., 2008; Feil et al., 2005). For Colletotrichum gloeosporioides, the pelB gene encoding pectate lyase was selected as a target, since it is an important cell-wall-degrading enzyme for pathogenesis (Yakoby et al., 2000). Since the experimental 3D structures of these selected targets are unavailable in the PDB database, we employed theoretical protein modeling strategy. Using the homology modelling tool, Modeler9v9 (Eswar et al., 2006), the homology models of pectate lyase B protein sequence from C. gloeosporioides, and of MurD and MurE protein sequences from P. syringae were built by employing the target-template sequence alignment files (Fig. 4A). The X-ray crystal structures of cedar pollen allergen from Juniperus ashei (PDBID: 1pxz_A Chain) was used as potential template for modelling pectate lyase B protein from C. gloeosporioides. While the crystal structures of MurD (PDBID: 5a5f_A chain) and MurE (PDBID: 1E8C_A chain) from E. coli are used as potential template from PDB database (Berman et al., 2000) for modelling the structures of MurD and MurE proteins from P. syringae. Among the generated models, one with the least RMSD (root-mean-square deviation of atomic positions: measure of the average distance between the atoms of superimposed proteins) value and final energy-minimized model was used for further analysis. The phi and psi angles representing the stereo-chemical parameters of the model (Fig. 4B), the compatibility of a generated 3D structure with its own amino acid sequence, and the regions of the modelled structure that can be rejected at the 95% and 99% confidence intervals were determined through PROCHECK (Laskowski et al., 1993), Verify3D (Eisenberg et al., 1997), and ERRAT (Colovos and Yeates, 1993), respectively, at SAVES structural analysis server (https://services.mbi.ucla.edu/SAVES/).
Candidate ligand selection (lead compounds) and virtual screening
Instead of scanning all available chemical databases, here we employed heuristic approach starting from compounds with known activity. Penicillin (anti-bacterial) and curcumin (anti-fungal) (Fig. 4C) are known inhibitors of MurD and MurE ligases and pectate lyase, respectively (Cho et al., 2006; Tomašić et al., 2012). Their structures were used to generate the various compounds modified with the combinations of halogen elements (Br, Cl, F and I) using ACD Chemsketch (Version 11, Advanced Chemistry Development, Inc., Toronto, ON, Canada, http://www.acdlabs.com/), resulting in 291 and 52 compounds that are similar in their structure to penicillin and curcumin, respectively. These new molecules were subjected to energy minimization by using CHARMm force field (Vanommeslaeghe et al., 2010) and collectively retrieved as 3D structures in SDF (structure data file) format for virtual screening against pectate lyase, MurD and MurE. Virtual screening of these 291 and 52 compounds by using FlexX module of LeadIT suite (Rarey et al., 1996) revealed their binding efficiencies through docking in the predicted binding pockets of modelled proteins. The compounds (2S,5R,6R)-3,3-dimethyl-7-oxo-6-[(2-pyridin-4-ylacetyl)amino]-4-thia-1-azabicyclo [3.2.0]heptane-2-carboxylic acid (hereafter compound-1) similar to penicillin and (1E,6E)-1,7-bis(3,4-dihydroxy-5-methoxyphenyl)hepta-1,6-diene-3,5-dione (hereafter compound-2) similar to curcumin with the best docking score (binding energy) were used for the pharmacophore modelling. Note that here we have employed both SBDD and LBDD approaches.
Pharmacophore modelling and database screening
The pharmachophore model was generated against the compound-1 and compound-2 by using auto pharmachophore generation option in Discovery Studio 2.5 (Accelrys Software Inc., San Diego, USA), which considers the chemical feature types and resulted ten pharmachophore models (hypotheses). The best pharmachophore model was selected based on the high correlation coefficient and lower RMSD (Fig. 4D). The Search 3D Database protocol with default search option implemented in Discovery studio was used for screening against Maybridge database (http://www.maybridge.com/) with various filters such as estimated activity and Lipinski’s rule of five (≤ 5 hydrogen bond donors; ≤ 10 hydrogen bond acceptors; molecular mass < 500 dal and log P (octanol-water partition coefficient) < 5) (Lipinski et al., 1997). The final hit compounds from Maybridge database having good fit scores (> 3) for the compound-1 pharmacophore are PD00533, CD01374, CD04888 and CD01278. While the final hit compounds for compound-2 pharmacophore are CD01278, S10124, HTS05738 and BTB01629 (Fig. 4E) (The IUPAC name of these compounds are given in Supplementary Table 1). These compounds are further used for docking studies against the modelled proteins and found that the compound CD01278 (Fig. 4G) exhibited better binding energies with the three protein models, which significantly implies that this compound might serve as potential lead molecule to control the disease caused by both the P. syringae and C. gloeosporioides.
Success and Limitations of CADD
The advent of robust technologies in the computational methods such as combinatorial chemistry and highthroughput screening methods has accelerated the modern drug discovery process to screen large number of chemical compounds, compared to the conventional drug designing process that is time-taking, laborious and costly (trial and error), and has high risk of failure (Talele et al., 2010). These in silico methods opened the possibility of synthesizing and/or testing novel compounds as drugs. Moreover, the optimization process of lead chemical compounds has allowed medicinal chemists to synthesize the novel compounds with reduced risk of toxicity. Although the conventional methods can successfully yield candidate compounds, the high molecular weight, lipophilicity, toxicity, low absorption, distribution, metabolism excretion and non-specific binding may result in negative consequences in clinical trials. In this aspect, CADD is highly effective, since predictive power of CADD facilitates selection of more promising lead candidates by minimizing time being wasted on dead end compounds.
Despite all these advantages of CADD, its limitations should be noted. Most prominently, accurate simulation of complexity that biological systems have is not possible even with state of art techniques and high computational power. Such limitation is the biggest challenge in CADD and pertains to uncertainties associated with high flexibility of target, conformational changes of proteins, and scarce of experimental data for absorption, distribution, metabolism, and toxicity of compounds (Baig et al., 2016; Singh, 2014).
Conclusion
CADD combines knowledges from diverse disciplines such as chemo-informatics, and genomics with computational algorithms and technologies to enable discovery of new knowledge - discovery/development of new drugs. In this review, we outlined key concepts of CADD and provided application of CADD to discovery/development of potential lead agrochemicals targeting proteins from plant pathogens. Considering current pace at which virulence/pathogenicity factors are uncovered with aid of genomics, CADD will be playing important roles in narrowing widening gaps between our understanding of pathogenesis mechanisms at the molecular level and translation of such knowledge into development of disease control measures and management strategies.
One of the barriers for CADD in gaining considerable momentum in areas other than pharmaceutical industry is probably accessibility of CADD to researchers from other disciplines. Despite increasing efforts, many of CADD tools still lack user-friendly interface, and configuration of overwhelming number of parameters is a daunting task. Therefore, successful use of those tools naturally requires a great deal of expertise. Fortunately, recent design of CADD tools is putting more emphasis on creating tools and algorithms that are accessible to wider range of researchers. Furthermore, it should be noted that there are no fundamentally better techniques, even though wide variety of computational tools are used. From these perspectives, we would like to strongly encourage researchers in molecular plant pathology to adopt these robust technologies in order to combat important crop diseases, using our review as a guide. Tools and strategies that we have used for P. syringae and C. gloeosporioides will be a good starting point toward that end.