Modeling of tertiary and quaternary protein structures
by homology
Inaugural dissertation
submitted for the degree of Doctor of Philosophy
to the Faculty of Science of the
University of Basel
by Florian Kiefer
from Freiburg im Breisgau, Germany
Basel, 2012
Approved by the Faculty of Science at the request of
Prof. Dr. Torsten Schwede
Prof. Dr. Manuel Peitsch
Basel, 13 December 2011
Prof. Martin Spiess
Dean
Attribution-Noncommercial-No Derivative Works 2.5 Switzerland
You are free:
to Share — to copy, distribute and transmit the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Noncommercial. You may not use this work for commercial purposes.
No Derivative Works. You may not alter, transform, or build upon this work.
• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Nothing in this license impairs or restricts the author's moral rights.
Source: http://creativecommons.org/licenses/by-nc-nd/2.5/ch/deed.en Date: 3 April 2009
Your fair dealing and other rights are in no way affected by the above.
This is a human-readable summary of the Legal Code (the full license) available in German: http://creativecommons.org/licenses/by-nc-nd/2.5/ch/legalcode.de
Disclaimer: The Commons Deed is not a license. It is simply a handy reference for understanding the Legal Code (the full license) — it is a human-readable expression of some of its key terms. Think of it as the user-friendly interface to the Legal Code beneath. This Deed itself has no legal value, and its contents do not appear in the actual license. Creative Commons is not a law firm and does not provide legal services. Distributing of, displaying of, or linking to this Commons Deed does not create an attorney-client relationship.
List of Publications
1. Bordoli L, Kiefer F, Schwede T. Assessment of disorder predictions in CASP7. Proteins 2007;69 Suppl 8:129-136.
2. Kopp J, Bordoli L, Battey JN, Kiefer F, Schwede T. Assessment of CASP7 predictions for template-based modeling targets. Proteins 2007;69 Suppl 8:38-56.
3. Bordoli L, Kiefer F, Arnold K, Benkert P, Battey J, Schwede T. Protein structure homology modeling using SWISS-MODEL workspace. Nat Protoc 2009;4(1):1-13.
4. Kiefer F, Arnold K, Kunzli M, Bordoli L, Schwede T. The SWISS-MODEL Repository and associated resources. Nucleic Acids Res 2009;37(Database issue):D387-392.
5. Arnold K, Kiefer F, Kopp J, Battey JN, Podvinec M, Westbrook JD, Berman HM, Bordoli L, Schwede T. The Protein Model Portal. J Struct Funct Genomics 2009;10(1):1-8.
6. Berman HM, Westbrook JD, Gabanyi MJ, Tao W, Shah R, Kouranov A, Schwede T, Arnold K, Kiefer F, Bordoli L, Kopp J, Podvinec M, Adams PD, Carter LG, Minor W, Nair R, La Baer J. The protein structure initiative structural genomics knowledgebase. Nucleic Acids Res 2009;37(Database issue):D365-368.
7. Mariani V, Kiefer F, Schmidt T, Haas J, Schwede T. Assessment of template based protein structure predictions in CASP9. Proteins: Structure, Function, and Bioinformatics 2011;79(S10):37-58.
Abstract
Knowing the structure of a protein is crucial to understanding its function. Despite this importance,
experimentally solved structures are available for only a small portion of the currently known protein
sequences. Comparative or homology modeling is currently the most powerful method for predicting
the structure from sequence by using homologous template structures. To be useful for scientists,
models need to be accurate in their three-dimensional coordinates and must represent the
biologically active state of the target protein.
This work pursues four goals in this area of research. Firstly, we increase the coverage of
homology modeling by introducing a method that can identify and align evolutionarily distant
template structures. The resulting template search and selection procedure is hierarchical. Closely
related template structures are identified accurately and efficiently by standard tools.
A computationally more complex method is invoked to identify evolutionarily more distant
template structures with high precision and accuracy. Integrated into an automated modeling
pipeline, the developed method is competitive with other protein structure prediction
methods.
Secondly, the automated modeling pipeline is applied to a large set of protein sequences to increase
the structural coverage of sequence space. The resulting models and associated annotation data are
stored in a relational database and can be accessed online in order to allow scientists to query for
their protein of interest. Efforts are made to update a selected set of sequences regularly by
shortening the update process without losing accuracy. It is found that the structural coverage of
seven proteomes is increased considerably by this large-scale modeling approach.
Thirdly, the modeling of quaternary structure is addressed. Significant room for improvement in the
field of quaternary structure prediction is found when assessing the current state-of-the-art methods
in a double-blind prediction experiment. Novel similarity measures are therefore developed to
distinguish proteins with different quaternary structures. We further create a template library built of
structures in their previously defined most likely oligomeric state, to extend the concept of homology
modeling towards the prediction of oligomeric protein structures. In order to select template
structures that share the same quaternary structure with the target structure, a variety of
evolutionary and structural features are investigated. It is shown that a combination of these
features predicts, for the first time, the quaternary structure with high accuracy.
Finally, the performance of methods that predict non-folded (intrinsically disordered) protein
segments is assessed. Current issues are addressed in a field of very active research, as more and
more proteins are found to be hubs in interaction networks with considerable disordered portions in
their tertiary structure. In general, it is found that such methods perform well, even within the limits
of the test set.
Contents
1 Introduction
1.1 The importance of protein structures
1.2 Experimental methods to determine protein structures
1.3 Resources for protein structures
1.4 The sequence – structure gap
1.5 Modeling of protein structures
1.6 Assessing the accuracy of protein modeling procedures
2 Modeling of tertiary protein structures
2.1 The homology modeling approach
2.1.1 Template identification and alignment with the target sequence
2.1.2 Model building
2.1.3 Structural evaluation and assessment
2.1.4 Automated modeling procedures
2.1.5 The SWISS-MODEL system
2.2 Definition of the Problem
2.3 Improvement of the SWISS-MODEL homology modeling pipeline
2.3.1 Template selection procedure
2.3.2 Accuracy of the SWISS-MODEL Pipeline
2.3.3 Discussion