Next: Knowledge representation Up: Ligand Recognition with GAs Previous: GA-based concept learning and

The ligand recognition problem

Ligand recognition, or docking, is the problem of understanding and modeling the interaction between an enzyme or receptor protein and a smaller molecule called the ligand. The interaction is perhaps best described as a lock-and-key interaction between the protein and the bound ligand.

Since the first 3D structure of a protein was determined by X-ray crystallography in 1964, various techniques for predicting ligand binding to proteins have been proposed. In the 1980s especially, with the development of more efficient computer systems, new software for ligand docking was developed. One of the better-known programs in this field is DOCK [409]. Most of these programs fit ligands to a protein using the 3D structures of the molecules. The reliability of the software is, however, not very good.

In our approach, only sequence alignments for protein superfamilies are used to generate rules that classify the sequences according to the endogenous ligands they bind. The rules identify the amino acid residues in the sequences that are important for ligand binding in the proteins, or the residues that give the receptor its characteristic properties.
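To make the idea of a classification rule over aligned sequences concrete, here is a minimal sketch. It assumes a rule is a conjunction of conditions of the form "alignment column p holds one of these residues"; this encoding, the names `Rule`, `matches`, `classify` and `toy_rules`, and the class labels and residues are all hypothetical illustrations, not the representation or data used in this work.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule: every listed column must hold an allowed residue."""
    label: str
    conditions: Dict[int, FrozenSet[str]]  # alignment column -> allowed residues


def matches(rule: Rule, aligned_seq: str) -> bool:
    # A rule fires only if all of its column conditions are satisfied.
    return all(
        pos < len(aligned_seq) and aligned_seq[pos] in allowed
        for pos, allowed in rule.conditions.items()
    )


def classify(rules: List[Rule], aligned_seq: str,
             default: str = "unclassified") -> str:
    # Rules are tried in order; the first rule that matches gives the label.
    for rule in rules:
        if matches(rule, aligned_seq):
            return rule.label
    return default


# Made-up toy rules over a four-column alignment (0-based columns).
toy_rules = [
    Rule("class_A", {1: frozenset("D"), 3: frozenset("FY")}),
    Rule("class_B", {1: frozenset("NS")}),
]

print(classify(toy_rules, "ADKF"))
print(classify(toy_rules, "ANKF"))
print(classify(toy_rules, "A-KF"))
```

Because the conditions refer to alignment columns rather than raw sequence positions, the same rule can be applied to every sequence in the alignment, gaps included.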

G-protein coupled receptors are present in all mammalian cells and consist of seven hydrophobic segments spanning the cell membrane. In fact, the only theme common to all proteins in the G-protein coupled receptor superfamily is the presence of seven hydrophobic transmembrane segments separated by polar loops. In addition, they function in a similar manner, namely by transferring a signal from the outside of the cell to the inside.

Figure: of the G-protein coupled receptor superfamily. Numbers in brackets indicate the number of gene sequences for the different classes in the human genome.


The G-protein coupled receptor superfamily (see the figure above) is hierarchical in the sense that the receptors have evolved over time, giving rise to new proteins with different binding characteristics as the ligands themselves have evolved. The terms monoamine and non-monoamine receptors come from the fact that part of the receptor population binds ligands containing one positively charged amino group, while the other receptors in the superfamily bind ligands without amino groups. The further sub-classification of the receptor families is also based on their ligand binding properties. These binding characteristics of the receptors are encoded in the amino acid sequences. In this approach we determine which combination of amino acids in the cavity-forming region of the protein is responsible for ligand binding in a certain type of receptor.
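As a simplified illustration of how binding-relevant positions might be read off an alignment, the sketch below flags columns whose residue sets do not overlap between classes. This is a plain column scan, deliberately swapped in for illustration: it is not the GA-based rule learning used in this work, and it looks at single columns only, whereas the approach here concerns combinations of residues. The function name and the toy sequences and labels are invented.

```python
from typing import Dict, List, Set


def discriminating_columns(aligned: List[str], labels: List[str]) -> List[int]:
    """Return columns where no residue is shared between any two classes."""
    result = []
    for col in range(len(aligned[0])):
        # Collect the residues observed in this column, per class.
        by_class: Dict[str, Set[str]] = {}
        for seq, label in zip(aligned, labels):
            by_class.setdefault(label, set()).add(seq[col])
        sets = list(by_class.values())
        disjoint = all(
            sets[i].isdisjoint(sets[j])
            for i in range(len(sets))
            for j in range(i + 1, len(sets))
        )
        if len(sets) > 1 and disjoint:
            result.append(col)
    return result


# Invented toy alignment: four sequences, two classes.
toy_alignment = ["ADKL", "ADRL", "ANKF", "ASRF"]
toy_labels = ["monoamine", "monoamine", "non-monoamine", "non-monoamine"]

print(discriminating_columns(toy_alignment, toy_labels))
```

In the toy data, columns 1 and 3 separate the two classes, while columns 0 and 2 contain residues shared by both and are not reported.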

Figure: of 18 G-protein coupled receptors. The sequences in this subset can be classified into monoamine receptors and non-monoamine receptors based on their ligand binding. The examples of monoamine sequences can be further subclassified into adrenergic receptors, dopaminergic receptors, serotonin receptors and acetylcholine receptors. The non-monoamine receptors are exemplified by opsins.


The amino acid sequences for the superfamily can be aligned to each other according to the physicochemical properties of the amino acids (see the figure above). In the majority of cases this results in an alignment of protein segments that correspond both topologically and evolutionarily. The alignment we have used [446] contains 477 receptor sequences divided into 42 families, each of which has its own endogenous ligand. The receptor families can be further divided into subtypes that bind the same endogenous ligand but with different characteristics. Receptor sequences are also available from many species. Both subtype and species variants, though binding the same endogenous ligand, can differ in pharmacological profile, as typically determined by testing man-made ligand analogues. For quite a number of receptors discovered by gene cloning, the endogenous ligand remains to be identified.

From the alignment we have used only those sequences with a known ligand. Furthermore, one opioid receptor was removed from consideration because of its inconsistent amino acid sequence. With the completion of the Human Genome Project, a large set of sequences will be available for analysis. The sequences can easily be classified into superfamilies, and with our approach their function can also be addressed without time-consuming experimental work.
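The selection step described above can be sketched as a simple filter. The record layout (`Entry`), the function `select_training_set`, and all entry names and ligands below are hypothetical placeholders; the actual alignment file format and receptor identifiers are not given here.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical record for one aligned receptor sequence."""
    name: str
    ligand: Optional[str]  # None when the endogenous ligand is unknown
    aligned_seq: str


def select_training_set(entries: List[Entry],
                        excluded: FrozenSet[str] = frozenset()) -> List[Entry]:
    # Keep entries with a known ligand that are not explicitly excluded.
    return [e for e in entries if e.ligand is not None and e.name not in excluded]


# Placeholder data; names, ligands and sequences are made up.
toy_entries = [
    Entry("rec_1", "ligand_x", "ADKL"),
    Entry("rec_2", None, "ANKF"),        # orphan receptor, ligand unknown
    Entry("rec_3", "ligand_y", "A-RF"),  # flagged as inconsistent below
    Entry("rec_4", "ligand_y", "ASRF"),
]

selected = select_training_set(toy_entries, excluded=frozenset({"rec_3"}))
print([e.name for e in selected])
```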



Tommi Rintala
Thu Jul 4 10:59:43 EET DST 1996