The Biology-Combinatorics Interface: Addressing New Challenges in Computational Biology (08w5069)
Organizers
Arvind Gupta (MITACS)
Ken Dill (University of California, San Francisco)
Anne Condon (University of British Columbia)
Ron Elber (University of Texas at Austin)
Ladislav Stacho (Simon Fraser University)
David Bremner (University of New Brunswick)
Objectives
Interdisciplinary work between math and biology poses challenges not unlike those faced in industrial mathematics. For the math to be relevant, mathematicians need the right vocabulary, a deep understanding of the application area, and the shared wisdom of how to construct models that are both appropriate and tractable. In this workshop we intend to adapt to the life sciences the model for industrial mathematics workshops developed by the Oxford Study Group and the PIMS Industrial Problem Solving Workshops. We aim to identify key, timely challenges and to form multidisciplinary teams that will formulate plans for attacking them. The objectives of the workshop are both to set broad research agendas and to focus on specific, well-defined problems of interest to both biologists and mathematicians. We have identified the following areas of interest:
Protein Folding Lattice Models
In recent years, protein folding lattice models, such as the HP model, have provided an arena for very fruitful dialog between biologists and mathematicians working in discrete combinatorics and complexity. Yet important problems remain unsolved. How are the symmetries and "near-symmetries" of native proteins encoded in their amino acid sequences? How can we develop automated classification schemes for protein fold classes? These questions involve enumerating and organizing classes of topologies. Another problem is protein design, notably what might be called "hand-in-glove" design: given a known target protein structure (the "hand"), design a protein (the "glove") that complements its shape and binds tightly to it. This problem has considerable biological relevance; if it could be solved, it could lead to new biotech proteins for targeting disease or pathogens. It can be explored in lattice folding models, where it should be tractable. Many other protein folding problems remain to be solved and should be similarly tractable, including questions about energy gaps in single-chain folding, protein-protein interactions, proteomics, and protein aggregation, crystallization, and fibrillization, such as occurs in Alzheimer’s disease.
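The HP model mentioned above can be made concrete with a small brute-force sketch. It assumes the common 2D square-lattice variant, in which each residue is hydrophobic (H) or polar (P) and every pair of H residues that are lattice neighbors but not chain neighbors contributes -1 to the energy. The move-string encoding and the function names are illustrative choices, not taken from this proposal, and the enumeration is exhaustive, so it is only practical for short chains.

```python
from itertools import product

# Unit steps on the 2D square lattice (assumed model geometry).
MOVES = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}

def coords(moves):
    """Turn a move string such as 'RUL' into lattice coordinates, or None if the walk self-intersects."""
    x, y = 0, 0
    path, occupied = [(0, 0)], {(0, 0)}
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        if (x, y) in occupied:
            return None
        occupied.add((x, y))
        path.append((x, y))
    return path

def hp_energy(seq, path):
    """HP energy: -1 for each H-H pair that touches on the lattice but is not bonded along the chain."""
    index = {p: i for i, p in enumerate(path)}
    energy = 0
    for i, (x, y) in enumerate(path):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":  # j > i + 1 counts each contact once
                energy -= 1
    return energy

def canonical_conformations(n):
    """Self-avoiding conformations of n residues up to rotation and reflection:
    the first move is fixed to 'R' and the first vertical move (if any) must be 'U'."""
    result = []
    for rest in product("RLUD", repeat=max(n - 2, 0)):
        moves = ("R" + "".join(rest)) if n > 1 else ""
        first_vertical = next((m for m in moves if m in "UD"), None)
        if first_vertical == "D":
            continue  # mirror image of a walk that is already counted
        path = coords(moves)
        if path is not None:
            result.append((moves, path))
    return result

def ground_states(seq):
    """Return the minimum energy and every canonical conformation that attains it."""
    scored = [(hp_energy(seq, path), moves) for moves, path in canonical_conformations(len(seq))]
    best = min(e for e, _ in scored)
    return best, [m for e, m in scored if e == best]
```

For example, `ground_states("HPPH")` returns `(-1, ['RUL'])`: the chain folds into a unit square so that its two H ends touch. Questions about symmetry, fold classes, or energy gaps can be posed on exactly this kind of exhaustive enumeration, which grows exponentially with chain length.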
RNA Folding
RNA molecules are essential to the cell: in the translation of the genetic code, as catalysts in cellular processes, and as mediators in determining the expression level of genes. In addition, in vitro selection methods have produced nucleic acids not found in the cell that can function as enzymes or as aptamers, molecules with high binding specificity for target proteins, with applications in medical diagnosis and as biosensors.
A key mathematical problem is to predict RNA secondary and tertiary structures from nucleotide sequences. Dynamic programming algorithms are widely used, particularly for secondary structure prediction. However, several problems remain unsolved. First, secondary structures can often be predicted, whereas tertiary structures rarely can; one reason is that tertiary structure poses a much greater combinatorial challenge. Second, these models often have many parameters, sometimes poorly controlled, and often lack a proper physical treatment of excluded volume, i.e., the self-avoidance of polymer chains. Some of these problems fall within the realm of discrete mathematics.
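The dynamic programming approach mentioned above can be illustrated with the Nussinov base-pair maximization recursion, the simplest such algorithm. This sketch is a deliberate simplification: it maximizes the number of Watson-Crick and G-U pairs in a pseudoknot-free secondary structure with an assumed minimum hairpin loop of three, whereas practical predictors minimize a nearest-neighbor free energy. It says nothing about tertiary structure.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (assumed pairing rules).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Fill M[i][j] = maximum number of nested base pairs in seq[i..j]."""
    n = len(seq)
    M = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = M[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k, enclosing a loop longer than min_loop
                if (seq[k], seq[j]) in PAIRS:
                    left = M[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + M[k + 1][j - 1])
            M[i][j] = best
    return M

def traceback(seq, M, min_loop=3):
    """Recover one optimal structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain any pair
        if M[i][j] == M[i][j - 1]:
            stack.append((i, j - 1))  # an optimum exists with j unpaired
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = M[i][k - 1] if k > i else 0
                if left + 1 + M[k + 1][j - 1] == M[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)
```

For example, `traceback("GGGAAAUCC", nussinov("GGGAAAUCC"))` returns `"(((...)))"`. The O(n³) recursion is tractable precisely because nested pairs split the sequence into independent intervals; tertiary contacts break that decomposition.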
There is also the inverse RNA folding problem: given a secondary structure, find an RNA sequence that folds into that structure. Although there is strong empirical evidence that this design problem can be solved efficiently in practice, it remains open whether it can be solved in polynomial time in the worst case or is NP-complete.
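The inverse problem can be made concrete with a small sketch, assuming dot-bracket notation for secondary structures and Watson-Crick plus G-U wobble pairs. It parses a target structure and builds a trivially compatible sequence (G-C for each pair, A elsewhere). Compatibility is only a necessary condition: a genuine design method must also verify, with a folding algorithm, that the sequence's predicted structure is the target, and that search is where the open complexity question lies. The names `parse_dot_bracket`, `is_compatible` and `naive_design` are hypothetical.

```python
# Allowed base pairs (assumed): Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def parse_dot_bracket(structure):
    """Return the sorted base pairs (i, j) of a dot-bracket secondary structure."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

def is_compatible(seq, structure):
    """Necessary, not sufficient: every pair in the target can form in seq."""
    return len(seq) == len(structure) and all(
        (seq[i], seq[j]) in PAIRS for i, j in parse_dot_bracket(structure))

def naive_design(structure):
    """A trivially compatible sequence: G-C for every pair, A for every unpaired base."""
    seq = ["A"] * len(structure)
    for i, j in parse_dot_bracket(structure):
        seq[i], seq[j] = "G", "C"
    return "".join(seq)
```

For example, `naive_design("((...))")` returns `"GGAAACC"`. Many other compatible sequences exist, and nothing here guarantees that any of them actually folds into the target.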
Evolutionary capacity and network of protein stability
What is the "evolutionary capacity" of a protein? That is, which mutations can be made to the amino acid sequence of a protein while it still folds into the same native structure? This problem interests biologists because it relates to the drug resistance of pathogens. If you design a drug that kills a pathogen by binding to one of its proteins, can the pathogen evolutionarily wiggle that protein's sequence into another sequence that still folds and serves the pathogen, yet no longer binds the drug? This involves two combinatorial problems: (1) exploring the many different amino acid sequences of a protein, and (2) for each such sequence, exploring the many different conformations into which the chain can fold, because this determines which fold is stable.
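In a lattice model, both combinatorial problems can be explored exhaustively for very short chains. The following sketch assumes the 2D square-lattice HP model and calls a point mutation (an H/P swap at one position) neutral when the mutant's unique lowest-energy conformation is the wild type's native one; mutants with degenerate ground states are treated as non-folding. These definitions and all function names are illustrative assumptions. Conformations are enumerated once and rescored for each mutant, mirroring the two-level structure described above.

```python
from itertools import product

MOVES = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}

def conformations(n):
    """All self-avoiding square-lattice conformations of n residues, up to rotation/reflection."""
    confs = []
    for rest in product("RLUD", repeat=max(n - 2, 0)):
        moves = ("R" + "".join(rest)) if n > 1 else ""
        if next((m for m in moves if m in "UD"), "U") == "D":
            continue  # mirror image of a walk whose first vertical move is 'U'
        path, seen = [(0, 0)], {(0, 0)}
        for m in moves:
            dx, dy = MOVES[m]
            p = (path[-1][0] + dx, path[-1][1] + dy)
            if p in seen:
                break  # self-intersection: discard this walk
            seen.add(p)
            path.append(p)
        else:
            confs.append((moves, path))
    return confs

def energy(seq, path):
    """HP energy: -1 per non-bonded H-H lattice contact."""
    index = {p: i for i, p in enumerate(path)}
    e = 0
    for i, (x, y) in enumerate(path):
        if seq[i] == "H":
            for dx, dy in MOVES.values():
                j = index.get((x + dx, y + dy))
                if j is not None and j > i + 1 and seq[j] == "H":
                    e -= 1
    return e

def unique_native(seq, confs):
    """Move string of the unique lowest-energy conformation, or None if the minimum is degenerate."""
    scored = sorted((energy(seq, path), moves) for moves, path in confs)
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]

def neutral_point_mutations(seq):
    """Single H<->P substitutions that keep the wild type's unique native conformation."""
    confs = conformations(len(seq))  # conformational search done once, reused for every sequence
    native = unique_native(seq, confs)
    if native is None:
        raise ValueError("wild-type sequence has a degenerate ground state")
    neutral = []
    for i, residue in enumerate(seq):
        mutant = seq[:i] + ("P" if residue == "H" else "H") + seq[i + 1:]
        if unique_native(mutant, confs) == native:
            neutral.append(mutant)
    return neutral
```

For example, `neutral_point_mutations("HPPH")` returns `['HHPH', 'HPHH']`: changing either end to P removes the only H-H contact, leaving every conformation at energy 0, so those mutants no longer have a unique fold.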
Modeling the Cell
A big challenge is to model a cell. A cell is a complex collection of interconnected, dynamical biochemical reactions. A very interesting new modeling approach has emerged that places this problem in the realm of discrete combinatorial mathematics. In the new model of Chao Tang at UCSF, every possible genetic network topology is enumerated (in a very small system), each with its own possible dynamics. This model is drawing considerable attention from biologists. It would be interesting to know whether the model admits analytical treatment, whether it can be scaled to larger sizes, and whether more realistic topologies and more heterogeneous systems can be handled. Such models are crucial to understanding how pathogens might be disrupted without disrupting the host organism. Here, it may be useful to draw on knowledge of other types of networks, such as the internet: issues of data security in information networks are related to issues of disease targeting in biological systems.
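The idea of enumerating every topology of a very small network can be illustrated with a toy sketch. Nothing here is taken from Chao Tang's model: the network size, the signed-edge encoding and the Boolean threshold update rule (a node switches on when its summed input is positive, off when it is negative, and otherwise keeps its state) are all illustrative assumptions. Each topology is summarized by the attractors its dynamics reach from every initial state.

```python
from itertools import product

N = 3  # number of genes/proteins in the toy system (illustrative assumption)

def all_topologies(n=N):
    """Yield every signed directed network on n nodes (self-loops allowed).
    W[j][i] is +1 (activation), -1 (repression) or 0 (no edge) for the edge j -> i."""
    for edges in product((-1, 0, 1), repeat=n * n):
        yield [edges[r * n:(r + 1) * n] for r in range(n)]

def step(state, W):
    """One synchronous update under the assumed Boolean threshold rule."""
    n = len(state)
    new = []
    for i in range(n):
        total = sum(W[j][i] * state[j] for j in range(n))
        new.append(1 if total > 0 else 0 if total < 0 else state[i])
    return tuple(new)

def attractors(W):
    """Collect the attractors (fixed points or cycles) reached from every initial state."""
    n = len(W)
    found = set()
    for start in product((0, 1), repeat=n):
        trajectory, s = [], start
        while s not in trajectory:  # deterministic dynamics must eventually revisit a state
            trajectory.append(s)
            s = step(s, W)
        found.add(frozenset(trajectory[trajectory.index(s):]))  # the cycle part only
    return found

def attractor_count_histogram(n=N):
    """Map k -> number of topologies whose dynamics have exactly k distinct attractors."""
    hist = {}
    for W in all_topologies(n):
        k = len(attractors(W))
        hist[k] = hist.get(k, 0) + 1
    return hist
```

With three nodes there are 3^9 = 19,683 topologies; the count grows as 3^(n²), which is one concrete reason why scaling such enumerations to larger, more realistic systems is hard.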
Evolution of Biological Networks
How do large metabolic and genetic networks emerge from smaller ones? How does a small network grow in complexity? Can we understand the origin of the power-law distribution of connectivities (node degrees) observed in metabolic networks?
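One candidate mechanism, named here only as an illustration, is preferential attachment (the Barabási-Albert model): a growing network in which new nodes link preferentially to nodes that already have many links is known to develop a power-law degree distribution. The sketch below assumes an undirected graph, a complete seed graph on m + 1 nodes, and m links per new node; the function names are hypothetical, and whether this mechanism accounts for real metabolic networks is part of what the questions above ask.

```python
import random
from collections import Counter

def grow_preferential(n_nodes, m, seed=0):
    """Grow an undirected network by preferential attachment.
    Starts from a complete graph on m + 1 nodes; each new node links to m distinct
    existing nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `ends` once per incident edge, so a uniform draw
    # from `ends` is a degree-proportional draw over nodes.
    ends = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges

def degree_histogram(edges):
    """Map degree k -> number of nodes with degree k."""
    degree = Counter(v for e in edges for v in e)
    return Counter(degree.values())
```

Plotting `degree_histogram(grow_preferential(10000, 2))` on log-log axes should show a roughly straight tail, whereas uniform random attachment gives a rapidly decaying one; a finite simulation only approximates a power law.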
Using Self-Assembly to evolve Nano-networks
Nanotech mimics biology in using self-assembling building blocks at small scales to build interesting structures at larger scales. Nanotechnologists are intensively studying rule-based assembly processes. So far, however, current methods have achieved only highly regular structures (for example, regular grids, squares, and Sierpinski triangles). More interesting would be the ability to design heterogeneous objects, as biology does. The complexity of such objects will directly affect the number and complexity of the basic building blocks required. We propose to study the nature of the shapes that can be constructed from a small initial set of shapes. What shapes are needed? What mixtures of shapes? What assembly rules? What time sequences can be used to achieve arbitrary molecular assemblies?
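Rule-based assembly of a Sierpinski pattern can be illustrated with an idealized, error-free tile model. In this sketch (an assumption, not a description of any particular experiment), growth starts from an L-shaped seed whose tiles are labeled 1, a tile may attach only where both of its input neighbors are already present, and its label is the XOR of theirs. The function name `assemble_xor` is hypothetical.

```python
def assemble_xor(size):
    """Grow a size x size assembly from an L-shaped seed (row 0 and column 0 labeled 1).
    A tile attaches at (i, j) only once (i-1, j) and (i, j-1) are present;
    its label is the XOR of those two labels (assumed rule set)."""
    grid = {}
    for k in range(size):
        grid[(0, k)] = 1
        grid[(k, 0)] = 1
    frontier = [(1, 1)] if size > 1 else []
    while frontier:
        i, j = frontier.pop()
        if (i, j) in grid:
            continue  # already attached via another growth path
        grid[(i, j)] = grid[(i - 1, j)] ^ grid[(i, j - 1)]
        # The new tile may complete the input pair of a neighbouring site.
        for a, b in ((i + 1, j), (i, j + 1)):
            if a < size and b < size and (a - 1, b) in grid and (a, b - 1) in grid:
                frontier.append((a, b))
    return [[grid[(i, j)] for j in range(size)] for i in range(size)]
```

Printing `assemble_xor(8)` with `#` for 1 and `.` for 0 shows nested triangles; in this model, position (i, j) is labeled 1 exactly when i and j share no set bits. A single local rule yields this highly regular pattern, which is the contrast with designed heterogeneous objects drawn above.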