UCR :: IIGB :: CEPCEB

ChemMine Tutorial

  1. General Functionality

    ChemMine is an integrated database that consists of a compound mining environment, a cheminformatic workbench and a screening database. The main functionalities of the three database components are summarized here:

    Compound Database

    ChemMine's compound database provides access to over 6,200,000 compounds drawn from a wide variety of bioactive, natural-product and screening compound collections of public and commercial providers. A detailed list of all compound sets is provided on the Compound Source Page. Their structures and functional annotations can be searched by chemical properties, substructure matches, structural similarities and biological activities.

    Cheminformatics Workbench

    In addition to a comprehensive information retrieval system, ChemMine is also a cheminformatics service for analyzing the structural and chemical properties of lead compounds. This online service is available both for compounds in the database and for compounds provided by the user. The current set of online analysis tools includes structure-based clustering of compounds, generation of chemical descriptors, and various viewing and reformatting functions. To share the developed informatics resources efficiently with the community, the ChemMine project uses only open-access and open-source technology. Users who are mainly interested in compound analysis tools may want to try the ChemmineR package.

    Screening Database

    The recently added public screening database is a versatile publication and management system for diverse compound bioactivity and screening data. It supports any type of annotation information, such as tables and images. A detailed tutorial for uploading and managing screening data in ChemMine is available on the Screen Upload page.

    Figure: ChemMine flow. Data resources and web services offered by ChemMine.

  2. Searching ChemMine

    1. Annotation Searches

      The Annotation Search page provides fast full-text and field-specific searches of all annotation data associated with compounds. The Library menu at the top of the page restricts searches to specific compound libraries. The Field menu selects which of the following search types is executed for the query string entered in the Search field:

      All
      This option allows powerful full-text searches against all annotation fields in the database. To test this search function, the term 'herbicide' is a good sample query.

      Compound ID
      Compound IDs can be queried in single or batch mode by providing as many IDs as required, separated by spaces or line breaks. If the compound set is known, select it in the Library menu.
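      To illustrate the batch format, the following minimal Python sketch splits a query string on any run of spaces or line breaks. The function name parse_compound_ids, the removal of duplicate IDs and the sample IDs are hypothetical assumptions, not part of ChemMine.

```python
def parse_compound_ids(query: str) -> list:
    """Split a batch Compound ID query on spaces, tabs or newlines.

    Hypothetical helper: empty entries are dropped and duplicates are
    removed while keeping the order in which IDs first appear.
    """
    seen = set()
    ids = []
    for token in query.split():  # split() with no argument splits on any whitespace run
        if token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


if __name__ == "__main__":
    batch = "2243 5090\n 3345\n\n5090"  # made-up IDs, mixed separators
    print(parse_compound_ids(batch))  # ['2243', '5090', '3345']
```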

      Physicochemical Properties
      To search compounds based on their physicochemical properties, select Property (JOELib) in the Field menu. The comparison operators >, < and = are supported for selecting exact values or ranges of property values. A short description of the available property descriptors and their acronyms is given in the JOELib section of this tutorial. For example, the query "LGP > 0.4 AND LGP < 0.5" returns all compounds in a library with a logP value (octanol/water partition coefficient) between 0.4 and 0.5. Different descriptors can also be combined in the same query, although such queries are much slower. For instance, the query "LGP > 0.4 AND N > 2" returns compounds that have a logP greater than 0.4 and more than two N atoms.
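      The AND-combined property queries described above can be sketched as a small filter. This hypothetical Python illustration assumes the descriptor values are already available as a dictionary per compound; the function names and sample values are made up, and it does not reproduce how ChemMine evaluates queries against its JOELib data.

```python
import operator
import re
from typing import Callable, Dict, List, Tuple

# Comparison operators accepted in a property clause.
OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}

# One clause: descriptor acronym, operator, numeric value, e.g. "LGP > 0.4".
CLAUSE = re.compile(r"^\s*(\w+)\s*([<>=])\s*(-?\d+(?:\.\d+)?)\s*$")

Clause = Tuple[str, Callable[[float, float], bool], float]


def parse_query(query: str) -> List[Clause]:
    """Split an AND-joined property query into (descriptor, operator, value) clauses."""
    clauses: List[Clause] = []
    for part in re.split(r"\s+AND\s+", query.strip()):
        match = CLAUSE.match(part)
        if match is None:
            raise ValueError(f"cannot parse clause: {part!r}")
        name, op, value = match.groups()
        clauses.append((name, OPS[op], float(value)))
    return clauses


def property_search(compounds: Dict[str, Dict[str, float]], query: str) -> List[str]:
    """Return IDs of compounds whose descriptor values satisfy every clause."""
    clauses = parse_query(query)
    return [
        cid
        for cid, props in compounds.items()
        if all(name in props and op(props[name], value) for name, op, value in clauses)
    ]


if __name__ == "__main__":
    # Made-up descriptor values for three hypothetical compounds.
    library = {
        "cmp1": {"LGP": 0.45, "N": 3},
        "cmp2": {"LGP": 0.45, "N": 1},
        "cmp3": {"LGP": 1.20, "N": 4},
    }
    print(property_search(library, "LGP > 0.4 AND LGP < 0.5"))  # ['cmp1', 'cmp2']
    print(property_search(library, "LGP > 0.4 AND N > 2"))      # ['cmp1', 'cmp3']
```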

      Plate and Well Locations
      Screening compounds often have plate and well locations. To search them, open a second Search field by clicking the Add Field button. For example, select 'Sigma-TimTec Myria Screen' in the Library menu, type '1' in the plate field and 'A03' in the well field. This query returns the compound at the requested plate and well location.
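      Because well labels may be written with or without a leading zero (A03 vs. A3), normalizing them before lookup is one way to match plate and well locations reliably. The sketch below assumes a hypothetical in-memory index keyed by (library, plate, well) and row letters A to P (enough for 384-well plates); it illustrates the idea only and is not ChemMine's schema.

```python
import re
from typing import Dict, Optional, Tuple

# Well labels: a row letter (A-P covers up to 384-well plates) plus a column number.
WELL = re.compile(r"^([A-P])(\d{1,2})$")


def parse_well(label: str) -> Tuple[int, int]:
    """Normalize a well label such as 'A03' or 'a3' to (row_index, column_number)."""
    match = WELL.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a well label: {label!r}")
    return ord(match.group(1)) - ord("A"), int(match.group(2))


# Hypothetical index: (library, plate, (row, column)) -> compound ID.
Index = Dict[Tuple[str, int, Tuple[int, int]], str]


def find_by_location(index: Index, library: str, plate: int, well: str) -> Optional[str]:
    """Return the compound ID stored at a plate/well location, or None if absent."""
    return index.get((library, plate, parse_well(well)))


if __name__ == "__main__":
    index: Index = {("Sigma-TimTec Myria Screen", 1, parse_well("A03")): "example-id"}
    print(find_by_location(index, "Sigma-TimTec Myria Screen", 1, "A3"))  # example-id
```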

      In addition to selecting the search method, users can choose how the query results are presented by selecting one of the following three options:
      Standard View
      Provides a table of the query results with links to detailed compound annotation pages.

      Extended View
      Provides the information of the standard view along with the corresponding compound structure images.

      JOELib View
      Provides a sortable table of the JOELib property values for all compounds in a query result.
    2. Similarity Searches

      Structure similarity searches are the most important functionality for compound queries in databases. The following search functions are available for exploring the chemical space in ChemMine efficiently. Query structures can be provided in SMILES or SDF format. Alternatively, they can be drawn with the JME Molecular Editor from Peter Ertl.

      1. Substructure and Superstructure Searches

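        Substructure searches retrieve all molecules in a database that contain a user-defined query substructure, irrespective of its structural environment. Superstructure searches work in the opposite direction and retrieve database molecules that are themselves contained in the query structure. On the result pages, the queried substructure is highlighted in color in the matching molecules.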
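        The following minimal Python sketch illustrates the idea behind substructure matching as a backtracking subgraph search on molecular graphs. It assumes a simplified, hypothetical representation (element symbols plus a dictionary of bond orders, implicit hydrogens, no aromaticity or charges); production systems rely on dedicated cheminformatics toolkits and prefiltering, and this is not ChemMine's code. A superstructure search is the same test with the two arguments swapped.

```python
from typing import Dict, List, Optional, Tuple

# Simplified molecular graph: element symbols and {(i, j): bond_order}.
Molecule = Tuple[List[str], Dict[Tuple[int, int], int]]


def _bond(bonds: Dict[Tuple[int, int], int], i: int, j: int) -> Optional[int]:
    """Bond order between atoms i and j, in either key order, or None."""
    return bonds.get((i, j), bonds.get((j, i)))


def is_substructure(query: Molecule, target: Molecule) -> bool:
    """True if every query atom and bond maps onto distinct target atoms and bonds."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    mapping: Dict[int, int] = {}  # query atom index -> target atom index

    def compatible(qi: int, ti: int) -> bool:
        # Each bond from qi to an already-mapped query atom must exist in the
        # target between ti and that atom's image, with the same bond order.
        for qj, tj in mapping.items():
            order = _bond(q_bonds, qi, qj)
            if order is not None and _bond(t_bonds, ti, tj) != order:
                return False
        return True

    def extend(qi: int) -> bool:
        if qi == len(q_atoms):
            return True  # all query atoms mapped
        used = set(mapping.values())
        for ti, element in enumerate(t_atoms):
            if ti not in used and element == q_atoms[qi] and compatible(qi, ti):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]  # backtrack
        return False

    return extend(0)


if __name__ == "__main__":
    ethanol = (["C", "C", "O"], {(0, 1): 1, (1, 2): 1})
    acetic_acid = (["C", "C", "O", "O"], {(0, 1): 1, (1, 2): 2, (1, 3): 1})
    carbonyl = (["C", "O"], {(0, 1): 2})
    print([is_substructure(carbonyl, m) for m in (ethanol, acetic_acid)])  # [False, True]
    # Superstructure direction: is the database molecule contained in the query?
    print(is_substructure(ethanol, acetic_acid))  # True
```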

      2. Similarity Searches

        Similarity searches are an alternative approach for finding related molecules in databases. In contrast to substructure searches, they can retrieve molecules that resemble a query structure without requiring a perfect match. The generated similarity scores rank the retrieved molecules by their degree of similarity to the query structure (nearest neighbor output). An improved 2D fragment-based algorithm from Chen & Reynolds (2002) is implemented in ChemMine. It can use either atom pairs or atom sequences as structural descriptors, and it uses the Tanimoto coefficient as the similarity measure (Willett et al., 1998). The C++ and R implementations of this search program are available from the Software Download page.
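        To make the atom pair idea concrete, the sketch below derives simplified atom pair descriptors (two element symbols plus the number of bonds on the shortest path between them) and compares them with a count-based Tanimoto coefficient: the summed shared counts divided by the summed combined counts. This is an assumption-laden simplification; full atom pair descriptors typically also encode each atom's number of non-hydrogen neighbors and pi electrons, and the code is not ChemMine's C++ or R implementation. All function names are hypothetical.

```python
from collections import Counter, deque
from typing import Dict, List, Tuple

# Simplified molecule: element symbols plus undirected bonds as index pairs.
Molecule = Tuple[List[str], List[Tuple[int, int]]]


def _bfs_distances(n_atoms: int, bonds: List[Tuple[int, int]], start: int) -> Dict[int, int]:
    """Shortest-path length (in bonds) from `start` to every reachable atom."""
    neighbors: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def atom_pairs(mol: Molecule) -> Counter:
    """Count (element, distance, element) descriptors over all connected atom pairs."""
    atoms, bonds = mol
    pairs: Counter = Counter()
    for i in range(len(atoms)):
        dist = _bfs_distances(len(atoms), bonds, i)
        for j in range(i + 1, len(atoms)):
            if j in dist:
                first, second = sorted((atoms[i], atoms[j]))
                pairs[(first, dist[j], second)] += 1
    return pairs


def tanimoto(a: Counter, b: Counter) -> float:
    """Count-based Tanimoto: shared descriptor counts over combined counts."""
    shared = sum((a & b).values())  # element-wise minimum of counts
    combined = sum((a | b).values())  # element-wise maximum of counts
    return shared / combined if combined else 0.0


def nearest_neighbors(query: Molecule, library: Dict[str, Molecule], k: int = 3):
    """Rank library molecules by similarity to the query, highest first."""
    query_pairs = atom_pairs(query)
    scored = [(tanimoto(query_pairs, atom_pairs(mol)), cid) for cid, mol in library.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
    propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
    dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])
    library = {"dimethyl ether": dimethyl_ether, "propanol": propanol}
    print(nearest_neighbors(ethanol, library))  # [(0.5, 'propanol'), (0.2, 'dimethyl ether')]
```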

  3. Compound Annotation Pages

    The query results in ChemMine are structured into two different levels: an initial 'Result List' page and a more detailed 'Compound Annotation' page. The initial 'Result List' pages are for batch viewing and selection purposes, while the Compound Annotation pages provide much more detailed information for individual compounds. These include the following annotation fields and download options:

    • Color images of the compound structures.
    • Interactive 3D viewing of compounds with Jmol (click '3D View').
    • Download of the compound structures in SDF, SMILES, MOL, InChI and other formats.
    • Query for compounds with similar structures by clicking on the link 'find similar'.
    • Identification of duplicates across the entire database by clicking 'View xx duplicates'.
    • The physicochemical property descriptors from JOELib.
    • The available screening data are provided at the bottom of the page. Full details can be viewed by clicking the table rows and the expansion buttons. The confidence scores of the screening data are integers ranging from 0 to 5. The value 0 represents inactive compounds, and the values 1, 2 and 3 are assigned to compounds that show activity in the primary screen, the secondary screen and all follow-up experiments, respectively. The value 4 is assigned to compounds for which a target site has been identified, whereas the value 5 is assigned to compounds that are selective for the identified target site.
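    The confidence scale above can be encoded as an ordered enumeration, which makes filtering by a minimum score straightforward. The class, member and function names in this Python sketch are hypothetical labels for the six documented values, and the sample scores are made up.

```python
from enum import IntEnum
from typing import Dict, List


class Confidence(IntEnum):
    """Screening confidence scores 0-5 (hypothetical member names)."""
    INACTIVE = 0           # inactive compound
    PRIMARY_ACTIVE = 1     # active in the primary screen
    SECONDARY_ACTIVE = 2   # active in the secondary screen
    FOLLOW_UP_ACTIVE = 3   # active in all follow-up experiments
    TARGET_IDENTIFIED = 4  # target site identified
    TARGET_SELECTIVE = 5   # selective for the identified target site


def hits_at_least(scores: Dict[str, int], minimum: Confidence) -> List[str]:
    """Return compound IDs whose score meets the minimum; invalid scores raise ValueError."""
    return [cid for cid, score in scores.items() if Confidence(score) >= minimum]


if __name__ == "__main__":
    screen = {"cmpA": 0, "cmpB": 3, "cmpC": 5}  # made-up scores
    print(hits_at_least(screen, Confidence.FOLLOW_UP_ACTIVE))  # ['cmpB', 'cmpC']
```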
  4. Online Analysis Tools

    Several online analysis tools are available on ChemMine's Workbench for analyzing the structural and chemical properties of compounds. To utilize them, users can retrieve compounds from the database and send them interactively to the analysis page of the interface. Alternatively, users can provide their own compound structures to these analysis tools.

    1. Similarity Workbench

      Similarity scores between compound pairs can be computed on the Similarity Workbench. The interface calculates atom pair and maximum common substructure (MCS) similarities, using the Tanimoto coefficient as the similarity measure (Chen & Reynolds, 2002; Cao et al., 2008). The MCS tool identifies the largest substructure that two compounds have in common.
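      A Tanimoto-type score can be derived from an MCS by comparing its size with the sizes of the two compounds, |MCS| / (|A| + |B| - |MCS|). The sketch below assumes that sizes are counted in atoms and that the MCS has already been computed; the function name is hypothetical and the example sizes are made up.

```python
def mcs_tanimoto(size_a: int, size_b: int, size_mcs: int) -> float:
    """Tanimoto-type similarity from MCS size: |MCS| / (|A| + |B| - |MCS|).

    Sizes are assumed to be atom counts of compound A, compound B and their
    maximum common substructure, which cannot be larger than either compound.
    """
    if not 0 <= size_mcs <= min(size_a, size_b):
        raise ValueError("MCS size must be between 0 and the smaller compound size")
    union = size_a + size_b - size_mcs
    return size_mcs / union if union else 0.0


if __name__ == "__main__":
    # Two hypothetical compounds with 10 and 8 atoms sharing a 6-atom MCS.
    print(mcs_tanimoto(10, 8, 6))  # 0.5
```

Identical compounds score 1.0 because the MCS then covers every atom of both.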

    2. Compound Viewing and Format Interconversions

      This utility allows users to provide their own compounds in SDF or SMILES format, view the compound structure images in batches and pass them on to the other online services (see below). For reformatting purposes, the compounds can be saved in SMILES or SDF formats.

    3. Descriptor Generation

      Molecular descriptors provide quantitative information about chemical properties of compounds. They can be very useful for prioritizing lead compounds, property clustering and basic QSAR analyses. Over 40 different molecular descriptors are currently provided by the ChemMine interface either for custom compounds or those contained in the database. The JOELib package is used for their calculation.
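      As a simple example of a descriptor calculation, the following Python sketch computes molecular weight, heavy-atom count and nitrogen count from a molecular formula. It is not JOELib: the descriptor keys are hypothetical, formulas with parentheses or charges are not supported, and only a handful of rounded standard atomic weights are included (an unknown element raises KeyError).

```python
import re
from collections import Counter
from typing import Dict

# Rounded standard atomic weights for a few common elements (extend as needed).
ATOMIC_WEIGHTS: Dict[str, float] = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06, "Cl": 35.45,
}


def element_counts(formula: str) -> Counter:
    """Parse a flat formula such as 'C2H6O' into element counts (no parentheses)."""
    counts: Counter = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def simple_descriptors(formula: str) -> Dict[str, float]:
    """A few formula-level descriptors with hypothetical key names."""
    counts = element_counts(formula)
    return {
        "mol_weight": sum(ATOMIC_WEIGHTS[el] * n for el, n in counts.items()),
        "heavy_atoms": sum(n for el, n in counts.items() if el != "H"),
        "n_nitrogen": counts["N"],
    }


if __name__ == "__main__":
    caffeine = simple_descriptors("C8H10N4O2")
    print(round(caffeine["mol_weight"], 3), caffeine["heavy_atoms"], caffeine["n_nitrogen"])
```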

    4. Structure-Based Clustering

      Clustering of compounds by structural similarities is another powerful approach for correlating structural features of compounds with their activities. ChemMine provides facilities for hierarchical clustering and for binning clustering by similarity cutoffs. The distance matrices required for hierarchical clustering are calculated by all-against-all comparisons of compounds using atom pair similarity measures (see above) and by transforming the resulting similarity scores into distance values. The resulting trees are presented on the web interface in interactive mode using an internally developed tree viewing program. The compound identifiers in the trees are hyperlinked to the corresponding structure images of the compounds. To simplify the analysis of this output, the compound structures are sorted in the same order as they appear in the tree.
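      The distance transformation and the binning approach can be sketched as follows. The sketch assumes that distances are derived as 1 minus the similarity score and that binning links two compounds whenever their similarity reaches the cutoff (single linkage, so each cluster is a connected group); both are illustrative assumptions rather than ChemMine's documented procedure. The similarity matrix is assumed to be symmetric, and the function names and values are hypothetical.

```python
from typing import Dict, List, Sequence


def to_distances(similarity: Sequence[Sequence[float]]) -> List[List[float]]:
    """Assumed transformation: distance = 1 - similarity (Tanimoto scores lie in [0, 1])."""
    return [[1.0 - s for s in row] for row in similarity]


def binning_clusters(
    ids: Sequence[str], similarity: Sequence[Sequence[float]], cutoff: float
) -> List[List[str]]:
    """Group compounds linked, directly or through other members, at >= cutoff."""
    parent = list(range(len(ids)))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if similarity[i][j] >= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters: Dict[int, List[str]] = {}
    for i, cid in enumerate(ids):
        clusters.setdefault(find(i), []).append(cid)
    return list(clusters.values())


if __name__ == "__main__":
    ids = ["a", "b", "c", "d"]
    sim = [
        [1.0, 0.8, 0.4, 0.1],
        [0.8, 1.0, 0.6, 0.1],
        [0.4, 0.6, 1.0, 0.1],
        [0.1, 0.1, 0.1, 1.0],
    ]
    print(binning_clusters(ids, sim, 0.5))  # [['a', 'b', 'c'], ['d']]
    print(binning_clusters(ids, sim, 0.7))  # [['a', 'b'], ['c'], ['d']]
```

Raising the cutoff splits clusters apart, which is why binning results are usually inspected at several cutoffs.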

  5. Viewing and Upload of Screening Data

    ChemMine's screening database is a versatile publication and management system for diverse compound bioactivity and screening data. It supports any type of annotation information, such as tables and images. Users can access the screening data in the following ways:

    1. Browse Screening Data

      To obtain an overview of the available screening data sets, users can browse the screens by clicking their titles in the overview table. The search field on the same page allows full-text searching in all screening data sets.

    2. Search Screening Data

      All screening data can be searched in ChemMine using full-text searches on the Annotation Search page or structure similarity searches on the Similarity Search page.

    3. Upload Screening Data

      A detailed tutorial for uploading and managing screening data in ChemMine is available on the Screen Upload page. To upload screens, users are required to create a ChemMine account. This registration is necessary to communicate with users during the upload, approval and curation steps of new screening data sets.