Ligand recognition, or docking, is the problem of understanding and modeling the interaction between an enzyme or receptor protein and a smaller molecule called the ligand. The interaction is perhaps best described as a lock-and-key interaction between the protein and the bound ligand.
Since the first 3D structure of a protein was determined by X-ray crystallography in 1964, various techniques for predicting ligand binding to proteins have been proposed. Especially in the 1980s, with the development of more efficient computer systems, new software for ligand docking was developed. One of the better-known programs in this field is DOCK [409]. Most of these programs fit ligands to a protein using the 3D structures of the molecules. The reliability of the software is, however, not very good.
In our approach, only sequence alignments for protein superfamilies are used to generate rules for classifying the sequences according to their binding of the endogenous ligands. The rules identify the amino acid residues in the sequences that are important for ligand binding in the proteins, or the residues giving the receptor its characteristic properties.
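The idea of deriving classification rules from an alignment can be illustrated with a minimal sketch. The toy alignment, the class labels, and the function names `discriminating_columns` and `classify` are illustrative assumptions, not the rule-generation procedure or the data used in this work. The sketch simply picks out alignment columns whose residues differ completely between the classes and uses them as voting rules.

```python
from collections import defaultdict

# Hypothetical toy alignment: equal-length strings, one column per position.
ALIGNMENT = ["ADLV", "GDLV", "ADIV", "AKLF", "GKIF"]
LABELS = ["monoamine", "monoamine", "monoamine",
          "non-monoamine", "non-monoamine"]


def discriminating_columns(alignment, labels):
    """Map each column whose residue sets are disjoint between classes
    to a dict {label: residues observed for that label}."""
    rules = {}
    for col in range(len(alignment[0])):
        seen = defaultdict(set)
        for seq, label in zip(alignment, labels):
            seen[label].add(seq[col])
        groups = list(seen.values())
        disjoint = all(
            groups[i].isdisjoint(groups[j])
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        if len(groups) > 1 and disjoint:
            rules[col] = dict(seen)
    return rules


def classify(seq, rules):
    """Let every rule column vote for the class whose residues match."""
    votes = defaultdict(int)
    for col, by_label in rules.items():
        for label, residues in by_label.items():
            if seq[col] in residues:
                votes[label] += 1
    return max(votes, key=votes.get) if votes else None


RULES = discriminating_columns(ALIGNMENT, LABELS)
print(sorted(RULES))            # columns selected as rules
print(classify("GDIV", RULES))  # predicted class for an unseen sequence
```

A sequence that matches none of the rule residues yields `None`, which makes unclassifiable inputs explicit rather than forcing them into a class.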
G-protein coupled receptors are present in all mammalian cells and consist of seven hydrophobic segments spanning the cell membrane. In fact, the only theme common to all proteins in the G-protein coupled receptor superfamily is the presence of seven hydrophobic transmembrane segments separated by polar loops. In addition, they function in a similar manner, namely by transferring a signal from the outside of the cell to the inside.
Figure: The G-protein coupled receptor superfamily. Numbers in
brackets indicate the number of gene sequences for the different
classes in the human genome.
The G-protein coupled receptor superfamily (see the figure above) is
hierarchical in the sense that the receptors have evolved over time, giving
rise to new proteins with different binding characteristics as the ligands
themselves have evolved. The terms monoamine and non-monoamine receptors come
from the fact that part of the receptor population binds ligands containing one
positively charged amino group, while the other receptors in the superfamily
bind ligands without amino groups. The further sub-classification of the
receptor families is also based on their ligand binding properties. These binding
characteristics of the receptors are encoded in the amino acid sequences. In this
approach we determine which combination of amino acids in the cavity-forming
region of the protein is responsible for ligand binding in a certain type of
receptor.
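Searching for a combination of positions, rather than a single position, can be sketched as follows. This is a brute-force illustration under stated assumptions: the function name `separating_combinations`, the toy sequences, the labels, and the column indices standing in for cavity-forming positions are all hypothetical, and the search strategy is not claimed to be the one used in this work.

```python
from itertools import combinations


def separating_combinations(alignment, labels, columns, max_size=3):
    """Return the smallest column combinations whose joint residue
    patterns never occur in more than one class."""
    for size in range(1, max_size + 1):
        found = []
        for combo in combinations(columns, size):
            owner = {}          # residue pattern -> class it was seen in
            separates = True
            for seq, label in zip(alignment, labels):
                pattern = tuple(seq[c] for c in combo)
                if owner.setdefault(pattern, label) != label:
                    separates = False
                    break
            if separates:
                found.append(combo)
        if found:
            return found
    return []


# Hypothetical toy data: no single column separates the two classes,
# but columns 0 and 1 taken together do.
ALIGNMENT = ["AKL", "GDL", "ADL", "GKL"]
LABELS = ["type-1", "type-1", "type-2", "type-2"]
CAVITY_COLUMNS = [0, 1, 2]   # assumed indices of cavity-forming positions

COMBOS = separating_combinations(ALIGNMENT, LABELS, CAVITY_COLUMNS)
print(COMBOS)
```

Trying the smallest combinations first keeps the resulting rules short. The number of combinations grows quickly with `max_size`, so restricting the search to a limited set of candidate columns matters in practice.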
Figure: A subset of 18 G-protein coupled receptors. The sequences in this
subset can be classified into monoamine receptors and non-monoamine
receptors based on their ligand binding.
The examples of monoamine sequences can be further subclassified into
adrenergic receptors, dopaminergic receptors, serotonin receptors and
acetylcholine receptors. The non-monoamine receptors are exemplified
by opsins.
The amino acid sequences for the superfamily can be aligned to each other
according to the physicochemical properties of the amino acids (see the
figure above). In the majority of cases this results in alignment of
protein segments that correspond both topologically and evolutionarily. The
alignment we have used [446] contains 477 receptor sequences
divided into 42 families, each of which has its own endogenous ligand. The
receptor families can be further divided into subtypes that bind the same
endogenous ligand but with different characteristics. Receptor sequences
are also available from many species. Both subtype and species variants, though
binding the same endogenous ligand, can differ in pharmacological profile, as
typically determined by testing man-made ligand analogues. For quite
a number of receptors discovered by gene cloning, the endogenous ligand remains to
be identified.
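To make the notion of comparing residues by physicochemical properties concrete, the following minimal sketch scores two already-aligned sequences by how often their residues fall into the same property class. The four-class grouping, the name `property_similarity`, and the example strings are assumptions chosen for illustration; the sketch does not compute an alignment and is not the method behind the alignment cited above.

```python
# Coarse residue classes; this particular grouping is an assumption
# made for illustration, and other groupings are equally defensible.
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
PROPERTY_CLASS = {aa: name for name, aas in GROUPS.items() for aa in aas}


def property_similarity(seq_a, seq_b, gap="-"):
    """Fraction of aligned, gap-free positions whose residues fall in
    the same property class. Expects the 20 standard one-letter codes."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue                      # skip positions with a gap
        compared += 1
        if PROPERTY_CLASS[a] == PROPERTY_CLASS[b]:
            matches += 1
    return matches / compared if compared else 0.0


print(property_similarity("AKD-", "VRES"))  # no identical residues, same classes
print(property_similarity("AK", "DK"))      # one of two positions agrees
```

Under such a grouping, two segments can score as highly similar without sharing a single identical residue, which is why property-based comparison can bring corresponding segments together even when sequence identity is low.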
From the alignment we have used only those sequences with a known ligand. Furthermore, one opioid receptor was removed from consideration due to its inconsistent amino acid sequence. With the completion of the Human Genome Project, a large set of sequences will become available for analysis. The sequences can easily be classified into superfamilies, and with our approach their function can also be addressed without time-consuming experimental work.