Searching Protein Structure Space
(SPSS)
Start date: May 1, 2008
End date: Nov 13, 2012
Project status: Finished
"Finding the structural neighbors of a protein is an essential task in computational structural biology. In cases where there is no detectable sequence similarity, the identification of structural neighbors offers a powerful approach to predicting structure and function. State-of-the-art protein structural searches find structural neighbors by comparing a query protein to every protein in their database. Because this computation is expensive and its cost scales directly with the size of the database, such searches consider only a representative subset of the Protein Data Bank (PDB). The PDB is growing dramatically to include many structures of uncharacterized function solved by high-throughput methods developed by the Structural Genomics (SG) initiative. Characterizing these structures, and addressing questions raised by the improved coverage of structure space, mandates better structure search tools. I propose to develop a search tool for protein structure space that is analogous to web search tools such as Google. The system is designed to be fast and interactive, to cover the whole data set, and to have a clear and simple interface so that it can serve as a navigation interface to structure space. For this, I propose adapting a data structure called an inverted index, which is used for fast retrieval in web search. Protein structures are described as strings of letters drawn from a structural alphabet and placed in an inverted index. Then, given a query structure and its string description, one can quickly retrieve a short list of candidate structural neighbors. As in web search, I suggest using query expansion to improve retrieval performance. The proposal also includes an application of structural search for comparing the structural novelty of contributions from different SG centers. This project will enable me to establish a new Computational Biology research team at my host institution; it preeminently fulfills the goals of the Work Program."
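The retrieval idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the project: the letters H, E and L stand in for a structural alphabet, structure strings are split into overlapping words of length k = 3, candidates are ranked simply by how many query words they share, and query expansion is modeled as substituting letters that a hypothetical `neighbors` table declares similar. All function names are invented for this example.

```python
from collections import Counter, defaultdict


def words(letters, k=3):
    """Split a structural-alphabet string into overlapping words of length k."""
    return [letters[i:i + k] for i in range(len(letters) - k + 1)]


def build_index(structures, k=3):
    """Inverted index: map each word to the set of structure IDs containing it."""
    index = defaultdict(set)
    for structure_id, letters in structures.items():
        for word in words(letters, k):
            index[word].add(structure_id)
    return index


def expand(word, neighbors):
    """Query expansion (assumed scheme): the word itself plus every variant
    obtained by replacing one letter with a letter listed as similar."""
    variants = {word}
    for i, letter in enumerate(word):
        for alt in neighbors.get(letter, ()):
            variants.add(word[:i] + alt + word[i + 1:])
    return variants


def search(index, query, k=3, neighbors=None, top=5):
    """Return up to `top` (structure ID, score) pairs, where the score is the
    number of distinct query words matched, directly or through expansion."""
    hits = Counter()
    for word in set(words(query, k)):
        variants = expand(word, neighbors) if neighbors else {word}
        matched = set()
        for variant in variants:
            matched |= index.get(variant, set())
        for structure_id in matched:
            hits[structure_id] += 1
    return hits.most_common(top)


if __name__ == "__main__":
    # Toy database; the IDs and strings are made up for the example.
    database = {"A": "HHHEEL", "B": "LLHHHL", "C": "EEEEEE"}
    index = build_index(database)
    print(search(index, "HHHEE"))
    print(search(index, "HHHEE", neighbors={"E": ["L"]}))
```

The point of the design is that a query only touches the index entries for its own words, so the cost of producing the candidate list depends on the query length and the sizes of the matching entries rather than on a comparison against every structure in the database. A full structural alignment would then be needed only for the short candidate list.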