Similarity Search Programs

Introduction

This package contains programs similar to those used by ChemMine for similarity searches. Statically linked binaries are provided for x86 and x86_64 computers.
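Which of the two builds to use depends on the machine's architecture. Below is a minimal Python sketch that assumes a Linux-style value from platform.machine() (for example "x86_64" or "i686"). The pick_build helper and its mapping table are hypothetical and are not part of this package.

```python
import platform
from typing import Optional

# Hypothetical mapping from platform.machine() values to the two builds.
_X86_64_NAMES = {"x86_64", "amd64"}
_X86_NAMES = {"i386", "i486", "i586", "i686", "x86"}


def pick_build(machine: Optional[str] = None) -> Optional[str]:
    """Return "x86_64", "x86", or None if neither build matches the machine."""
    name = (machine or platform.machine()).lower()
    if name in _X86_64_NAMES:
        return "x86_64"
    if name in _X86_NAMES:
        return "x86"
    return None  # e.g. "aarch64": neither statically linked build applies


if __name__ == "__main__":
    print(pick_build() or "no matching build for this machine")
```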

System Requirements

Notes

Running the programs

There are 5 programs in total, each requiring its own set of parameters. Every parameter a program takes must be supplied, in order, on the command line (the example below shows the order for each program). The parameters are:
(dbdir): the database directory created in step 1 of the example
(id list file): file containing a list of compound identification numbers
(sdf dir): directory containing all of the SDF files
(setname): name to give this group of compounds (the compound set) in the database
(dbtype): type of descriptor database to create (or, for descriptor_compare, to search); must be 1 or 2
1 for atom pair, 2 for atom sequence
(create): whether to run the initial database setup; must be 0 or 1
0 to skip the initial setup
1 to run the initial setup; set this to 1 the first time you create a database
(sdf file): an SDF file containing the query compound
(cutoff): only scores higher than this value are returned
(sort): 1 to sort the results, 0 to leave them unsorted
(setx): names of the compound sets to search
(smiles file): a file containing the query SMILES string
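To illustrate the parameter order, the following Python sketch builds the argument list for four of the programs in the order used by the example commands below, and rejects values outside the allowed ranges for dbtype, create, and sort. The helper names (descriptor_gen_args and so on) are hypothetical, the argument orders are inferred from the example commands rather than from a formal specification, and the fifth program is not covered because its name and parameters are not given here.

```python
from typing import Iterable, List


def _check(name: str, value: int, allowed: Iterable[int]) -> str:
    """Validate a 0/1 or 1/2 switch and return it as a command-line string."""
    allowed = set(allowed)
    if value not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    return str(value)


def descriptor_gen_args(dbdir: str, id_list_file: str, sdf_dir: str,
                        setname: str, dbtype: int, create: int) -> List[str]:
    # Order taken from: ./descriptor_gen db/ example/myset example/sdf/ myset 1 1
    return ["./descriptor_gen", dbdir, id_list_file, sdf_dir, setname,
            _check("dbtype", dbtype, (1, 2)), _check("create", create, (0, 1))]


def load_smi_args(dbdir: str, id_list_file: str, sdf_dir: str, create: int) -> List[str]:
    # Order taken from: ./load_smi db example/myset example/sdf/ 0
    return ["./load_smi", dbdir, id_list_file, sdf_dir,
            _check("create", create, (0, 1))]


def descriptor_compare_args(dbdir: str, sdf_file: str, dbtype: int,
                            cutoff: float, sort: int, *set_names: str) -> List[str]:
    # Order taken from: ./descriptor_compare db/ example/sdf/1.sdf 1 0.3 1 myset myset1
    if not set_names:
        raise ValueError("at least one set name is required")
    return ["./descriptor_compare", dbdir, sdf_file, _check("dbtype", dbtype, (1, 2)),
            str(cutoff), _check("sort", sort, (0, 1)), *set_names]


def substructure_args(dbdir: str, smiles_file: str, *set_names: str) -> List[str]:
    # Order taken from: ./substructure db/ example/test.smi myset
    # Accepting several set names here is an assumption; the example shows one.
    if not set_names:
        raise ValueError("at least one set name is required")
    return ["./substructure", dbdir, smiles_file, *set_names]


if __name__ == "__main__":
    print(" ".join(descriptor_compare_args("db/", "example/sdf/1.sdf", 1, 0.3, 1, "myset")))
```

The resulting list can be passed to Python's subprocess.run(args, check=True) to run the binary, assuming it sits in the current directory as in the example.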

Example

The example subdirectory contains the sample data used in this walkthrough.
The files myset and myset1 each contain a list of the identification numbers assigned to the compounds in that set.
The sdf directory contains the SDF files used in the example.
  1. Create a directory for the database

    • This can be done with the mkdir command in Linux (for example, "mkdir db"). This example assumes that an empty directory named db exists in the current directory.
  2. Creating a database

    Initialize the database and create the first compound set with the descriptor_gen and load_smi programs.

    • To generate atom pair data for a compound set and initialize the database, execute: "./descriptor_gen db/ example/myset example/sdf/ myset 1 1"
      This generates atom pair data for the compound set defined by the file myset, names the set myset, and runs the initial database setup (create = 1).
    • To generate atom sequence data for the same compound set, execute: "./descriptor_gen db/ example/myset example/sdf/ myset 2 0"
      Note that the create parameter is now 0, because the previous command already initialized the database.
    • To load the SMILES data for the compound set, execute: "./load_smi db example/myset example/sdf/ 0"
  3. Adding another compound set to the database

    This step is similar to creating the database. Execute the following commands:
    • ./descriptor_gen db/ example/myset1 example/sdf/ myset1 1 0
    • ./descriptor_gen db/ example/myset1 example/sdf/ myset1 2 0
    • ./load_smi db example/myset1 example/sdf/ 0
    Notice that in this step the create parameter is 0 for every command, since the database has already been created.
  4. Searching the database

    Similarity searches are done by executing the descriptor_compare program.
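    To search set myset for compounds similar to the query compound in example/sdf/1.sdf, using atom pairs (dbtype 1), a cutoff of 0.3, and sorted results (sort 1), execute: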
    • ./descriptor_compare db/ example/sdf/1.sdf 1 0.3 1 myset
    To run the same search against both myset and myset1:
    • ./descriptor_compare db/ example/sdf/1.sdf 1 0.3 1 myset myset1
    To use atom sequence (dbtype 2) instead of atom pairs:
    • ./descriptor_compare db/ example/sdf/1.sdf 2 0.3 1 myset
    Substructure searches are done with the substructure program, which reads the query from a SMILES file:
    • ./substructure db/ example/test.smi myset
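The effect of the cutoff, sort, and set name parameters can be illustrated with a small Python sketch. Purely for illustration, it assumes that each compound's descriptors form a multiset of strings and that the score is a Tanimoto coefficient; the programs' actual descriptor format and scoring formula are not documented here. The descriptor strings and the tanimoto and search functions are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tanimoto(a: Counter, b: Counter) -> float:
    """Multiset Tanimoto coefficient: shared descriptor count / combined count."""
    union = sum((a | b).values())         # per-descriptor maximum of the two counts
    if union == 0:
        return 0.0
    return sum((a & b).values()) / union  # per-descriptor minimum of the two counts


def search(query: Counter, sets: Dict[str, Dict[str, Counter]],
           set_names: List[str], cutoff: float, sort: int) -> List[Tuple[str, str, float]]:
    """Score every compound in the named sets against the query.

    Mirrors the documented parameters: only scores strictly higher than
    `cutoff` are kept, and results are sorted best-first when sort == 1.
    """
    hits = []
    for name in set_names:                       # (setx): one or more set names
        for compound_id, descriptors in sets[name].items():
            score = tanimoto(query, descriptors)
            if score > cutoff:                   # (cutoff)
                hits.append((name, compound_id, score))
    if sort == 1:                                # (sort)
        hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


if __name__ == "__main__":
    # Made-up descriptor strings, for illustration only.
    query = Counter({"C-C-1": 2, "C-O-2": 1})
    sets = {
        "myset": {"2": Counter({"C-C-1": 1}), "1": Counter({"C-C-1": 2, "C-O-2": 1})},
        "myset1": {"3": Counter({"N-O-1": 1})},
    }
    for hit in search(query, sets, ["myset", "myset1"], cutoff=0.3, sort=1):
        print(hit)
```

With sort=1, compound 1 (identical to the query, score 1.0) is listed before compound 2 (score 1/3), and compound 3, which shares no descriptors with the query (score 0.0), is dropped by the 0.3 cutoff.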