
  • Modeling of tertiary and quaternary protein structures

    by homology

    Inaugural dissertation

    submitted for the degree of Doctor of Philosophy

    to the Faculty of Science of the

    University of Basel

    by Florian Kiefer

    from Freiburg im Breisgau, Germany

    Basel, 2012

  • Approved by the Faculty of Science at the request of

    Prof. Dr. Torsten Schwede

    Prof. Dr. Manuel Peitsch

    Basel, 13 December 2011

    Prof. Martin Spiess

    Dean

  • Attribution-Noncommercial-No Derivative Works 2.5 Switzerland

    You are free:

    to Share — to copy, distribute and transmit the work

    Under the following conditions:

    Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

    Noncommercial. You may not use this work for commercial purposes.

    No Derivative Works. You may not alter, transform, or build upon this work.

    • For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.

    • Any of the above conditions can be waived if you get permission from the copyright holder.

    • Nothing in this license impairs or restricts the author's moral rights.

    Source: http://creativecommons.org/licenses/by-nc-nd/2.5/ch/deed.en Date: 3 April 2009

    Your fair dealing and other rights are in no way affected by the above.

    This is a human-readable summary of the Legal Code (the full license) available in German: http://creativecommons.org/licenses/by-nc-nd/2.5/ch/legalcode.de

    Disclaimer: The Commons Deed is not a license. It is simply a handy reference for understanding the Legal Code (the full license) — it is a human-readable expression of some of its key terms. Think of it as the user-friendly interface to the Legal Code beneath. This Deed itself has no legal value, and its contents do not appear in the actual license. Creative Commons is not a law firm and does not provide legal services. Distributing of, displaying of, or linking to this Commons Deed does not create an attorney-client relationship.


  • List of Publications


    1. Bordoli L, Kiefer F, Schwede T. Assessment of disorder predictions in CASP7. Proteins 2007;69 Suppl 8:129-136.

    2. Kopp J, Bordoli L, Battey JN, Kiefer F, Schwede T. Assessment of CASP7 predictions for template-based modeling targets. Proteins 2007;69 Suppl 8:38-56.

    3. Bordoli L, Kiefer F, Arnold K, Benkert P, Battey J, Schwede T. Protein structure homology modeling using SWISS-MODEL workspace. Nat Protoc 2009;4(1):1-13.

    4. Kiefer F, Arnold K, Kunzli M, Bordoli L, Schwede T. The SWISS-MODEL Repository and associated resources. Nucleic Acids Res 2009;37(Database issue):D387-392.

    5. Arnold K, Kiefer F, Kopp J, Battey JN, Podvinec M, Westbrook JD, Berman HM, Bordoli L, Schwede T. The Protein Model Portal. J Struct Funct Genomics 2009;10(1):1-8.

    6. Berman HM, Westbrook JD, Gabanyi MJ, Tao W, Shah R, Kouranov A, Schwede T, Arnold K, Kiefer F, Bordoli L, Kopp J, Podvinec M, Adams PD, Carter LG, Minor W, Nair R, La Baer J. The protein structure initiative structural genomics knowledgebase. Nucleic Acids Res 2009;37(Database issue):D365-368.

    7. Mariani V, Kiefer F, Schmidt T, Haas J, Schwede T. Assessment of template based protein structure predictions in CASP9. Proteins: Structure, Function, and Bioinformatics 2011;79(S10):37-58.

  • Abstract

    The structure of a protein is crucial to understanding its function. Despite this importance, experimentally solved structures are available for only a small portion of the currently known protein sequences. Comparative or homology modeling is currently the most powerful method for predicting structure from sequence using homologous template structures. To be useful to scientists, models need to be accurate in their three-dimensional coordinates and must represent the biologically active state of the target protein.

    This work pursues four goals in this area of research. Firstly, we increase the coverage of homology modeling by introducing a method that is able to identify and align evolutionarily distant template structures. The resulting template search and selection procedure is hierarchical. Closely related template structures are identified accurately and efficiently by standard tools. A computationally more complex method is invoked to identify evolutionarily more distant template structures with high precision and accuracy. Integrated into an automated modeling pipeline, the developed method is competitive with other protein structure prediction methods.

    Secondly, the automated modeling pipeline is applied to a large set of protein sequences to increase the structural coverage of sequence space. The resulting models and associated annotation data are stored in a relational database and can be accessed online, allowing scientists to query for their protein of interest. Efforts are made to update a selected set of sequences regularly by shortening the update process without losing accuracy. This large-scale modeling approach is found to increase the structural coverage of seven proteomes considerably.

    Thirdly, the modeling of quaternary structure is addressed. An assessment of the current state-of-the-art methods in a double-blind prediction experiment reveals significant room for improvement in the field of quaternary structure prediction. Novel similarity measures are therefore developed to distinguish proteins with different quaternary structures. We further create a template library built of structures in their previously defined most likely oligomeric state, to extend the concept of homology modeling towards the prediction of oligomeric protein structures. To select template structures that share the same quaternary structure as the target structure, a variety of evolutionary and structural features are investigated. It is shown that using a combination of these features for the first time predicts the quaternary structure with high accuracy.

    Finally, the performance of methods that predict non-folded (intrinsically disordered) protein segments is assessed. Current issues are addressed in a field of very active research, as more and more proteins are found to be hubs in interaction networks with considerable disordered portions in their tertiary structure. In general, such methods are found to perform well, even within the limits of the test set.

  • Contents

    1 Introduction
    1.1 The importance of protein structures
    1.2 Experimental methods to determine protein structures
    1.3 Resources for protein structures
    1.4 The sequence – structure gap
    1.5 Modeling of protein structures
    1.6 Assessing the accuracy of protein modeling procedures
    2 Modeling of tertiary protein structures
    2.1 The homology modeling approach
    2.1.1 Template identification and alignment with the target sequence
    2.1.2 Model building
    2.1.3 Structural evaluation and assessment
    2.1.4 Automated modeling procedures
    2.1.5 The SWISS-MODEL system
    2.2 Definition of the Problem
    2.3 Improvement of the SWISS-MODEL homology modeling pipeline
    2.3.1 Template selection procedure
    2.3.2 Accuracy of the SWISS-MODEL Pipeline
    2.3.3 Discussion
