ResearchPad - Computer Graphics and Computer-Aided Design
https://www.researchpad.co

A Case of Isolated Congenital Left Ventricular Diverticulum with Acute Myocarditis
https://www.researchpad.co/product?articleinfo=5adcaada463d7e041fb478f7

vSDC: a method to improve early recognition in virtual screening when limited experimental resources are available
https://www.researchpad.co/product?articleinfo=5989da27ab0ee8fa60b80f24

Background

In drug design, one may be confronted with the problem of finding hits for targets for which no small inhibiting molecules are known and only low-throughput experiments (such as ITC or NMR studies) are available, two common difficulties in a typical academic setting. A virtual screening strategy such as docking can alleviate some of these problems and save considerable time by selecting only top-ranking molecules for testing, but only if the method is very efficient, i.e. when a good proportion of actives is found among the best-ranked 1–10 % of molecules.

Results

Using several programs (in our study, Gold, Surflex, FlexX and Glide were considered) shows that their results diverge, which makes it difficult to guide the experiments. To overcome this divergence and increase the yield of virtual screening, we created the standard deviation consensus (SDC) and variable SDC (vSDC) methods, which intersect the molecule sets selected by several virtual screening programs on the basis of the standard deviations of their ranking distributions.

Conclusions

SDC allowed us to find hits for two new protein targets by testing only 9 and 11 small molecules from a chemical library of circa 15,000 compounds. Furthermore, vSDC, when applied to the 102 proteins of the DUD-E benchmarking database, found more hits than any of the four individual programs for 13–60 % of the targets. In addition, when only 10 molecules of each of the 102 chemical libraries were considered, vSDC retrieved more hits, with an improvement of 6–24 % over the 10 best-ranked molecules given by the individual docking programs.

Graphical abstract

In drug design, for a given target and a given chemical library, the results obtained with different virtual screening programs diverge. How, then, can experimental tests be guided rationally, especially when only a small number of experiments can be made? The variable standard deviation consensus (vSDC) method was developed to address this issue. Left panel: the vSDC principle consists of intersecting molecule sets, chosen on the basis of the standard deviations of their ranking distributions, obtained from various virtual screening programs. In this study Glide, Gold, FlexX and Surflex were used and tested on the 102 targets of the DUD-E database. Right panel: comparison of the average percentage of hits found with vSDC and with each of the four programs, when only 10 molecules from each of the 102 chemical libraries of the DUD-E database were considered. On average, vSDC found 38 % of the findable hits, against 34 % for Glide, 32 % for Gold, 16 % for FlexX and 14 % for Surflex, showing that vSDC makes it possible to overcome the unpredictability of virtual screening results and to improve them.
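The abstract does not spell out the exact selection rule, but the consensus principle (per-program rank cutoffs scaled by the standard deviation of each program's ranking distribution, followed by intersection of the selected sets) can be sketched as below. The function names, the data layout and the step used to grow the threshold factor are illustrative assumptions, not the published implementation.

import numpy as np

def sdc_selection(rankings, k=1.0):
    """SDC sketch: keep molecules ranked within k standard deviations of
    the top in each program, then intersect the per-program selections.
    rankings maps program name -> {molecule id: rank}, rank 1 = best."""
    selected = []
    for program, ranks in rankings.items():
        values = np.array(list(ranks.values()), dtype=float)
        cutoff = k * values.std()  # program-specific rank threshold
        selected.append({mol for mol, r in ranks.items() if r <= cutoff})
    return set.intersection(*selected)

def vsdc_selection(rankings, n_wanted, step=0.1, k_max=10.0):
    """vSDC sketch: grow the threshold factor until the consensus set
    reaches the number of molecules one can afford to test."""
    k, consensus = step, set()
    while len(consensus) < n_wanted and k <= k_max:
        consensus = sdc_selection(rankings, k)
        k += step
    return consensus

For example, with rankings from Gold, Surflex, FlexX and Glide and n_wanted=10, vsdc_selection would return roughly the ten molecules most consistently well ranked across the four programs.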

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-016-0112-z) contains supplementary material, which is available to authorized users.

Asymmetric transfer hydrogenation of imines and ketones using chiral Ru(II)Cl(η6-p-cymene)[(S,S)-N-TsDPEN] catalyst: a computational study
https://www.researchpad.co/product?articleinfo=5989d9fbab0ee8fa60b71f62

Combination of fingerprints and MCS-based (inSARa) networks for Structure-Activity-Relationship analysis
https://www.researchpad.co/product?articleinfo=5989d9ecab0ee8fa60b6cab0

Inferring multi-target QSAR models with taxonomy-based multi-task learning
https://www.researchpad.co/product?articleinfo=5989daa6ab0ee8fa60ba798e

Background

A plethora of studies indicate that the development of multi-target drugs is beneficial for complex diseases like cancer. Accurate QSAR models for each of the desired targets assist the optimization of a lead candidate by predicting its affinity profile. Often, the targets of a multi-target drug are sufficiently similar that, in principle, knowledge can be transferred between the QSAR models to improve their accuracy. In this study, we present two multi-task algorithms from the field of transfer learning that exploit the similarity between several targets to transfer knowledge between the target-specific QSAR models.
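The abstract does not name the two algorithms; one common multi-task formulation consistent with this description is feature augmentation, in which a descriptor of the target (task) is concatenated to each compound descriptor so that a single model can share signal across related targets. The sketch below assumes that formulation; the variable names and the choice of a random forest regressor are illustrative, not the paper's methods.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_multitask(X_compound, task_ids, task_desc, y):
    """Feature-augmentation baseline for multi-task QSAR.
    X_compound: (n_samples, d_c) compound descriptors
    task_ids:   (n_samples,) integer target index per measurement
    task_desc:  (n_tasks, d_t) descriptor per target, e.g. derived
                from the kinome taxonomy (an assumption here)
    y:          (n_samples,) measured bioactivities"""
    X = np.hstack([X_compound, task_desc[task_ids]])  # share across tasks
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X, y)
    return model

With an informative task descriptor, samples from related kinases end up close in the augmented feature space, which is one way knowledge transfer between target-specific models can arise.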

Results

We evaluated the two methods on simulated data and on a data set of 112 human kinases assembled from the public database ChEMBL. The relatedness between the kinase targets was derived from the taxonomy of the human kinome. The experiments show that, given sufficient similarity between the tasks, multi-task learning increases performance on both types of data compared to training separate models. On the kinase data, the best multi-task approach reduced the mean squared error of the QSAR models for 58 of the kinase targets.

Conclusions

Multi-task learning is a valuable approach for inferring multi-target QSAR models for lead optimization. It is most beneficial when knowledge can be transferred from a similar task with much in-domain knowledge to a task with little in-domain knowledge. Furthermore, the benefit increases as the overlap between the chemical spaces spanned by the tasks decreases.

Reliable estimation of externally validated prediction errors for QSAR models
https://www.researchpad.co/product?articleinfo=5989da8eab0ee8fa60b9ee48

The influence of negative training set size on machine learning-based virtual screening
https://www.researchpad.co/product?articleinfo=5989daddab0ee8fa60bbab50

Background

The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods.

Results

The impact of this rather neglected aspect of applying machine learning methods was examined for sets containing a fixed number of positive examples and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluation metrics of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed, in conjunction with some decrease in hit recall. Analyzing the dynamics of these variations allowed us to recommend an optimal composition of the training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, IBk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of the CDK FP with the SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set.
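A minimal sketch of the core experiment, assuming descriptor matrices are already computed: fix the positive set, sample increasingly many negatives (e.g. from ZINC), and record precision, recall and MCC at each positive:negative ratio. The helper name, the ratios and the random forest classifier (one of the five algorithms used) are illustrative choices.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef, precision_score, recall_score
from sklearn.model_selection import train_test_split

def scan_negative_ratio(X_pos, X_neg_pool, ratios=(1, 2, 5, 10), seed=0):
    """Train with a fixed positive set and a growing random negative set,
    reporting (precision, recall, MCC) for each 1:r composition."""
    rng = np.random.default_rng(seed)
    results = {}
    for r in ratios:
        n_neg = len(X_pos) * r
        idx = rng.choice(len(X_neg_pool), size=n_neg, replace=False)
        X = np.vstack([X_pos, X_neg_pool[idx]])
        y = np.array([1] * len(X_pos) + [0] * n_neg)
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, stratify=y, random_state=seed)
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        results[r] = (precision_score(y_te, pred),
                      recall_score(y_te, pred),
                      matthews_corrcoef(y_te, pred))
    return results

Plotting the three metrics against r makes it easy to check the trend reported above, namely whether precision and MCC rise with more negatives while recall drops.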

Conclusions

In conclusion, the ratio of positive to negative training instances should be taken into account when preparing machine learning experiments, as it can significantly influence the performance of a particular classifier. Moreover, optimizing the negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.

Structured chemical class definitions and automated matching for chemical ontology evolution
https://www.researchpad.co/product?articleinfo=5989db16ab0ee8fa60bcd370

Development of target focused library against drug target of P. falciparum using SVM and Molecular docking
https://www.researchpad.co/product?articleinfo=5989d9f6ab0ee8fa60b7053a

Probing the impact of protein and ligand preparation procedures on chemotype enrichment in structure-based virtual screening using DEKOIS 2.0 benchmark sets
https://www.researchpad.co/product?articleinfo=5989db43ab0ee8fa60bd7642

DecoyFinder, a tool for finding decoy molecules
https://www.researchpad.co/product?articleinfo=5989d9f6ab0ee8fa60b701ba

Exploring and cataloguing the substrate space of prenyltransferases: automatic generation of SMARTS
https://www.researchpad.co/product?articleinfo=5989da75ab0ee8fa60b963c7

PubChem: atom environments for molecule standardization
https://www.researchpad.co/product?articleinfo=5989dab3ab0ee8fa60bac0ab

Extracting and connecting chemical structures from text sources using chemicalize.org
https://www.researchpad.co/product?articleinfo=5989daa8ab0ee8fa60ba81e3

Background

Exploring bioactive chemistry requires navigating between structures and data from a variety of text-based sources. While PubChem currently includes approximately 16 million document-extracted structures (15 million from patents), the extent of public inter-document and document-to-database links is still well below any estimated total, especially for journal articles. A major expansion in access to text-entombed chemistry is enabled by chemicalize.org. This online resource can process IUPAC names, SMILES, InChI strings, CAS numbers and drug names from pasted text, PDFs or URLs to generate structures, calculate properties and launch searches. Here, we explore its utility for answering questions about chemical structures in documents and where these overlap with database records. These aspects are illustrated using the common theme of Dipeptidyl Peptidase 4 (DPPIV) inhibitors.

Results

Full-text open-URL sources facilitated the download of over 1400 structures from a DPPIV patent and the alignment of specific examples with IC50 data. Uploading the SMILES to PubChem revealed extensive linking to patents and papers, including prior submissions with chemicalize.org as the submitting source. A DPPIV medicinal chemistry paper was completely extracted, and its structures were aligned with the activity results table as well as linked to other documents via PubChem. In both cases, key structures with data were partitioned from common chemistry by dividing them into individual new PDFs for conversion. Over 500 structures were also extracted from a batch of PubMed abstracts related to DPPIV inhibition. The drug structures could be stepped through at each text occurrence and included some converted MeSH-only IUPAC names not linked in PubChem. Performing set intersections proved effective for detecting compounds-in-common between documents and merged extractions.
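The set-intersection step reduces to ordinary set algebra once each document has been converted to a collection of canonical identifiers. A minimal sketch, assuming each extraction has been exported as one InChIKey per line (the file names and the loader are hypothetical):

def load_inchikeys(path):
    """Read one InChIKey per line into a set (hypothetical file layout)."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}

patent_keys = load_inchikeys("dppiv_patent_keys.txt")
paper_keys = load_inchikeys("dppiv_paper_keys.txt")
abstract_keys = load_inchikeys("pubmed_abstract_keys.txt")

shared = patent_keys & paper_keys                   # compounds-in-common
patent_only = patent_keys - paper_keys              # patent-exclusive chemistry
merged = patent_keys | paper_keys | abstract_keys   # merged extraction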

Conclusion

This work demonstrates the utility of chemicalize.org for exploring chemical structure connectivity between documents and databases, including structure searches in PubChem, InChIKey searches in Google and the chemicalize.org archive. It has the flexibility to extract text from any internal, external or Web source. It synergizes with other open tools, and the application is under continued development. It should thus facilitate progress in medicinal chemistry, chemical biology and other bioactive chemistry domains.

Using structure- and ligand-based pharmacophores as filters to discriminate Human Aryl Sulfotransferase 1A1 (SULT1A1) binders into substrates and inhibitors
https://www.researchpad.co/product?articleinfo=5989da1eab0ee8fa60b7dd35

Protoss: a holistic approach to predict tautomers and protonation states in protein-ligand complexes
https://www.researchpad.co/product?articleinfo=5989daecab0ee8fa60bbf83c

The calculation of hydrogen positions is a common preprocessing step when working with crystal structures of protein-ligand complexes. An explicit description of hydrogen atoms is generally needed to analyze the binding mode of a particular ligand or to calculate the associated binding energies. Due to the large number of degrees of freedom arising from different chemical moieties, and the high degree of mutual dependence between them, this problem is anything but trivial. In addition to an efficient algorithm to handle the complexity of complicated hydrogen bonding networks, a robust chemical model is needed to describe effects such as tautomerism and ionization consistently. We present a novel method for placing hydrogen coordinates in protein-ligand complexes which takes tautomers and protonation states of both protein and ligand into account. Our method generates the most probable hydrogen positions on the basis of an optimal hydrogen bonding network, using an empirical scoring function. The high quality of our results was verified by comparison with the manually adjusted Astex diverse set and by a remarkably low rate of undesirable hydrogen contacts compared to other tools.
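The underlying optimization problem can be made concrete with a deliberately naive sketch: every tautomeric, ionizable or rotatable group contributes a few discrete states, and the assignment maximizing an empirical hydrogen-bond score wins. Protoss itself solves this with an efficient algorithm and a calibrated scoring function; the brute-force enumeration and the callable score below are assumptions that only illustrate the objective.

import itertools

def best_hydrogen_assignment(groups, score_network):
    """Exhaustively search discrete hydrogen states (illustration only).
    groups: list of lists, each inner list holding the candidate states
            (tautomer, protonation, H orientation) of one group
    score_network: callable rating a complete assignment, higher = better"""
    best, best_score = None, float("-inf")
    for assignment in itertools.product(*groups):
        s = score_network(assignment)
        if s > best_score:
            best, best_score = assignment, s
    return best, best_score

The search space grows exponentially with the number of interdependent groups, which is exactly why naive enumeration does not scale and a dedicated method is needed.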

The MoSGrid e-science gateway: molecular simulations in a distributed computing environment
https://www.researchpad.co/product?articleinfo=5989da8aab0ee8fa60b9d940

Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets
https://www.researchpad.co/product?articleinfo=5989da9aab0ee8fa60ba35a1

Background

While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM and a novel protein descriptor set (termed ProtFP, 4 variants); in addition, we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated on seven structure-activity benchmarks comprising angiotensin converting enzyme (ACE) dipeptidic inhibitor data and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants.

Results

The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than using individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-scales (3) combined with an average Z-scale value for each target, while ProtFP (PCA8), ST-scales and ProtFP (Feature) rank last.
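How such a comparison is run can be sketched in a few lines: build one feature matrix per descriptor set over the same sequences, score each by cross-validated RMSE, and score simple column-wise concatenations the same way. The ridge regressor and the evaluation protocol here are placeholders, not the paper's exact setup.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def compare_descriptor_sets(desc_sets, y, combos=(), cv=5):
    """desc_sets: dict name -> (n_samples, n_features) array, same rows.
    combos: pairs of names to evaluate as concatenated descriptor sets."""
    def rmse(X):
        scores = cross_val_score(Ridge(), X, y, cv=cv,
                                 scoring="neg_root_mean_squared_error")
        return -scores.mean()

    results = {name: rmse(X) for name, X in desc_sets.items()}
    for a, b in combos:  # combined sets: simple feature concatenation
        results[a + "+" + b] = rmse(np.hstack([desc_sets[a], desc_sets[b]]))
    return results

Called with, say, Z-scales and a ProtFP variant plus their combination, the returned RMSE table directly supports the kind of ranking reported above.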

Conclusions

While amino acid descriptor sets capture different aspects of amino acids, their usefulness for bioactivity modeling is still, on average, surprisingly similar. Nevertheless, combining sets that describe complementary information leads to small but consistent improvements in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, underlining that choosing an appropriate descriptor set is fundamental for bioactivity modeling, from both the ligand and the protein side.

In silico target fishing: addressing a "Big Data" problem by ligand-based similarity rankings with data fusion
https://www.researchpad.co/product?articleinfo=5989dab9ab0ee8fa60badd5a

Background

Ligand-based in silico target fishing can be used to identify the potential interacting targets of bioactive ligands, which is useful for understanding the polypharmacology and safety profiles of existing drugs. The underlying principle of the approach is that known bioactive ligands can be used as references to predict the targets of a new compound.

Results

We tested a pipeline enabling large-scale target fishing and drug repositioning, based on simple fingerprint similarity rankings with data fusion. A large library containing 533 drug-relevant targets with 179,807 active ligands was compiled, where each target is defined by its ligand set. For a given query molecule, a target profile is generated by similarity searching against the ligand set assigned to each target; the individual searches, which use multiple reference structures, are then fused into a single ranking list representing the potential target interaction profile of the query compound. The proposed approach was validated by 10-fold cross validation and two external tests using data from DrugBank and the Therapeutic Target Database (TTD). Its use was further demonstrated with examples concerning drug repositioning and drug side-effect prediction. The promising results suggest that the proposed method is useful not only for finding new uses for promiscuous drugs, but also for predicting some important toxic liabilities.
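A minimal sketch of the similarity ranking with group fusion, assuming RDKit Morgan fingerprints and the MAX fusion rule (the paper evaluates specific fingerprint and fusion choices; these particular ones are assumptions):

from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

def target_profile(query_smiles, target_ligands):
    """Rank targets for a query molecule by the maximum Tanimoto
    similarity to each target's known ligands.
    target_ligands: dict target name -> list of ligand SMILES"""
    qfp = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(query_smiles), 2, nBits=2048)
    scores = {}
    for target, smiles_list in target_ligands.items():
        sims = []
        for smi in smiles_list:
            mol = Chem.MolFromSmiles(smi)
            if mol is None:
                continue  # skip unparsable reference structures
            fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
            sims.append(DataStructs.TanimotoSimilarity(qfp, fp))
        if sims:
            scores[target] = max(sims)  # MAX group fusion
    return sorted(scores.items(), key=lambda kv: -kv[1])

The top of the returned list is the predicted target interaction profile; running known drugs against a ligand-defined target library in this way is the repositioning use case described above.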

Conclusions

With the rapidly increasing volume and diversity of data concerning drug-related targets and their ligands, this simple ligand-based target fishing approach will play an important role in assisting future drug design and discovery.

Entropy gain due to water release upon ligand binding
https://www.researchpad.co/product?articleinfo=5989dad2ab0ee8fa60bb6816