Overview of Evolutionary Biology.ppt
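Overview of Evolutionary Biology

Taxonomy vs. Systematics
Blackwelder (1967), Mason (1950), Simpson (1961), Heywood (1973), Stuessy (1990), Judd et al. (1999)
Heslop-Harrison (1953), Lawrence (1951), Ross (1974), Stace (1980, 1989), especially in North America

Systematics
The study of the diversity of organisms. - Mayr, E. 1969. Principles of Systematic Zoology. McGraw-Hill, New York.
The scientific study of the kinds and diversity of organisms and of any and all relationships among them. - Simpson, G. G. 1961. Principles of Animal Taxonomy. Columbia Univ. Press, New York.
The science dedicated to discovering, organizing, and interpreting biological diversity. - Systematics Agenda 2000 (1994)

Evolutionary Biology
Basic content of systematic and evolutionary biology
Classification
Phylogeny
Evolution

Basic research content
Classification: provide a convenient method of identification and communication
Phylogenetic reconstruction (Phylogeny): detect evolution at work, discovering its processes and interpreting its results
Processes and mechanisms of evolution (Evolution): provide a classification which as far as possible expresses the natural relationships of organisms
Questions: what? how? why?

Evolutionary biology is the science that studies biological evolution, including its processes, causes, mechanisms, rates, and direction.

Importance and significance of systematic and evolutionary biology
Humanity's instinctive drive to understand nature
Conservation and use of biodiversity
Foundation of the natural (life) sciences
Foundation of sustainable agriculture and forestry
Everyday human needs (clothing, food, housing, transport)
Society (politics, diplomacy, law)

Basic concepts of evolution
Evolution is change in the genetic composition of populations. - Grant 1991: 30
Evolution is the change of life on Earth. - NAC
Biological evolution is the series of irreversible changes that an organism's genetic system undergoes over time through interaction between organisms and their environment, leading to corresponding changes in phenotype.

Biological evolution and its theories
Biological evolution is a fact supported by a large body of evidence
There are many different explanations of biological evolution
Evolutionary theory and its development

Major schools of evolutionary theory
Darwin's theory of natural selection
The neutral mutation-random drift theory
The synthetic theory of evolution

What is Natural Selection?
Charles Darwin (1809-1882)
Alfred Russel Wallace

The Struggle for Existence
Observation 1: Population sizes would increase exponentially if all individuals born survived.
Observation 2: Most populations are stable in size.
Observation 3: No two individuals in a population are exactly the same.
Observation 4: Much of this variation is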
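heritable.

Darwin's theory of natural selection
Life arose through evolution.
Biological evolution is gradual and continuous; there is no discontinuous variation or saltation.
All organisms are related to some degree and share a common ancestor (single origin).
Natural selection is the fundamental driving force (mechanism) of biological evolution.

The neutral mutation-random drift theory (Kimura, 1968)
Most mutations are "neutral", neither harmful nor beneficial to the survival of the individual.
Neutral mutations become fixed in populations through random genetic drift; evolution at the molecular level does not depend on natural selection.
The rate of evolution is determined by the rate of neutral mutation and is nearly constant across all organisms.
The main factors determining the evolution of biological macromolecules are mutation pressure and chance.

The synthetic theory of evolution (Dobzhansky, Mayr, Simpson, Stebbins)
Mendel's laws are used to explain the nature and mechanism of heritable variation.
Population genetics (theoretical and experimental) is used to study the mechanisms of evolution.
Macroevolution is understood through the study of microevolutionary processes and mechanisms.
Natural selection, the core of Darwin's theory, is accepted and developed further.

Basic research content of evolutionary biology
Evolution = genetic variation + unequal transmission of variation + speciation
Factors: mutation, recombination, gene flow, selection, genetic drift, isolation
Levels: population level, individual level, species level

Basic research content of evolutionary biology
Source of evolution: genetic variation
Driving force of evolution I: selection
Driving force of evolution II: genetic drift
Safeguard of evolution: isolation and speciation

Source of evolution: genetic variation
Mutation: the ultimate source of genetic variation (chromosomal aberrations, gene mutations)
Recombination: the main component of genetic variation
Gene flow

Recombination
The role of recombination in determining the rate and direction of evolution is equal to or greater than that of mutation. - Stebbins 1950: 122

The power of recombination
By 1990, 2000 polymorphic RFLP loci had been detected in humans. Assuming only 2 alleles per locus, the number of possible human genotype combinations = 3^200, which is on the order of 10^95.
Population of the Earth in 2000: 6,000,000,000
Number of people who had ever lived on Earth by 2000: 13,000,000,000

Driving force of evolution I: selection
Selection and its significance: experimental evidence, historical evidence, evidence from artificial selection
Types of selection: stabilizing, directional, disruptive, balancing, and sexual selection
Measuring selection

Genetic basis of cultivated crops
Domestication + breeding
Tanksley & McCouch. 1997. Science 277: 1063-1066

Driving force of evolution II: genetic drift
Island effect
Bottleneck effect
Founder effect

Bottleneck effect
Genetic drift: by chance, the frequency of an allele fluctuates from one generation to the next, especially in small populations. This is called genetic drift, or random genetic drift. The fluctuation leads to the loss of some alleles and the fixation of others, and so changes the genetic structure of the population.
In a large population, variation in the number of children born to individuals of different genotypes has no appreciable effect on allele frequencies. A small population has few members and is isolated from the general population by social or geographic factors. If allele A is close to fixation and few people carry allele a, then a will disappear quickly from the population when its carriers leave no children, producing random fluctuation of allele frequencies in the small population.
Drift depends on population size: the smaller the population, the faster the drift, so that one allele may be fixed and another lost within as few as 1 to 2 generations, changing the genetic structure. In a large population drift is slow and genetic equilibrium can be reached.
Some abnormal alleles are unusually frequent in small isolated populations, possibly because alleles carried by a few founders of the population gradually rose to high frequency through genetic drift. This is called the founder effect. For example, on the eastern Caroline Islands in the Pacific, 5% of the people have congenital colour blindness. According to surveys, a typhoon at the end of the 18th century left only 30 survivors on the island, from whom today's small population of more than 1600 descends. The 5% colour blindness may trace back to a single carrier among the 30 founders, giving an allele frequency q = 1/60 ≈ 0.017. After several generations of isolated reproduction, q rose quickly to 0.22. This is the founder effect.

The founder effect is a form of genetic drift in which a new population is established by a few individuals carrying only part of the alleles of the parent population. The new population may later grow in number, but because it has not interbred with other populations, the genetic differences among its members remain very small. This usually happens on isolated islands or in relatively closed, newly founded villages.

Isolation mechanism (isolating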
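mechanism): the various factors that reduce or completely block gene exchange between population systems. - Grant 1991: 226
Reproductive isolation and speciation
Safeguard of evolution: isolation

Reproductive isolation and speciation
Microevolution, speciation, macroevolution
Speciation: "mystery of mysteries" - Charles Darwin

Degree of isolation (figure labels): local race, geographical race, semispecies, biological species, genus; ancestral population

Beattie (ed.) 1995. Australia's Biodiversity. Reed.
The known and unknown living world
Phylogeny and classification of life
The tree of life: phylogenetic reconstruction
Phylogeny: the evolutionary relationships among a group of species - Systematics Agenda 2000

Comparison of phylogenetic reconstruction methods
Current methods for reconstructing phylogeny from molecular sequences include distance methods, maximum parsimony, maximum likelihood, and Bayesian inference. Each has its own strengths and weaknesses (see the table below). In general, molecular biologists and geneticists tend to use distance methods (mainly neighbor-joining); systematists tend to use maximum parsimony; molecular evolutionary biologists and statisticians tend to use maximum likelihood and Bayesian inference. The current trend is to trust phylogenetic relationships that are supported by several methods.

Some concepts
Adaptive radiation: as a result of divergent evolution, one species adapts to many different environments and differentiates into several species that differ in morphology, physiology, and behaviour, forming a radiating evolutionary system of common origin.
Convergent evolution: different organisms, even ones that are evolutionarily distant, may live in similar environments and, under the same selection pressure, evolve morphological structures with the same or very similar functions to suit the same conditions. Whales and dolphins are only distantly related to fishes (the former are mammals), yet their body shapes are similar.
Divergent evolution: some organisms share a common origin but become very different during evolution under different environmental conditions. The polar bear (Ursus maritimus) evolved from the brown bear (Ursus arctos). During the Pleistocene epoch of the Quaternary, a major glaciation separated a group of brown bears from the main population; under selection in the severe Arctic cold they developed into polar bears. Polar bears are white, matching their surroundings, which helps them hunt. Their head and shoulders are streamlined, and the soles of their feet bear stiff hairs that let them walk on ice without slipping and that insulate against the cold. Polar bears eat meat, whereas brown bears, although also members of the order Carnivora, feed mainly on plants.

Molecular Evolution
Transition
Transversion
Synonymous substitution
Nonsynonymous substitution
Nonsense mutation
Codon bias
Multiple hits

Degeneracy within the Universal Genetic Code

Silent and Replacement Substitutions
Silent substitution:
Sequence 1: UUU CAU CGU
Sequence 2: UUU CAC CGU
Coded amino acids: Phe His Arg (both sequences)
Replacement substitution:
Sequence 1: UUU CAU CGU
Sequence 2: UUU CAG CGU
Coded amino acids: Phe His Arg (sequence 1), Phe Gln Arg (sequence 2)

How do you detect positive selection?
Dn = number of replacement substitutions / number of replacement sites
Ds = number of silent substitutions / number of silent sites
Dn/Ds > 1: positive selection

Orthologous genes
Paralogous genes

Writing conventions
Names at genus rank and below are set in italics. The first letter of the genus name is capitalized and all other letters are lowercase, in italics; the name of the naming author is set in roman type.
Genus name, species name, specific epithet.
Genus name, species name, ssp. (subspecies) or var. (variety), subspecies name, epithet.
Names above genus rank are set in roman type, never in italics.
Gene names: uppercase italic
Protein names: uppercase roman
Mutant names: lowercase italic

DNA Substitution Models
The use of maximum likelihood (ML) algorithms in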
developing phylogenetic hypotheses requires a model of evolution. The frequently used General Time Reversible (GTR) family of nested models encompasses 64 models with different combinations of parameters for DNA site substitution. The models are listed here from the least complex to the most parameter rich.

Jukes-Cantor (JC, nst=1): equal base frequencies, all substitutions equally likely (PAUP* rate classification: aaaaaa, PAML: aaaaaa) (Jukes and Cantor 1969)
Felsenstein 1981 (F81, nst=1): variable base frequencies, all substitutions equally likely (PAUP*: aaaaaa, PAML: aaaaaa) (Felsenstein 1981)
Kimura 2-parameter (K80, nst=2): equal base frequencies, variable transition and transversion frequencies (PAUP*: abaaba, PAML: abbbba) (Kimura 1980)
Hasegawa-Kishino-Yano (HKY, nst=2): variable base frequencies, variable transition and transversion frequencies (PAUP*: abaaba, PAML: abbbba) (Hasegawa et al. 1985)
Tamura-Nei (TrN): variable base frequencies, variable transition frequencies, equal transversion frequencies (PAUP*: abaaea, PAML: abbbbf) (Tamura and Nei 1993)
Kimura 3-parameter (K3P): variable base frequencies, variable transversion frequencies, equal transition frequencies (PAUP*: abccba, PAML: abccba) (Kimura 1981)
Transition Model (TIM): variable base frequencies, variable transitions, transversions equal (PAUP*: abccea, PAML: abccbe)
Transversion Model (TVM): variable base frequencies, variable transversions, transitions equal (PAUP*: abcdbe, PAML: abcdea)
Symmetrical Model (SYM): equal base frequencies, symmetrical substitution matrix (A to T = T to A) (PAUP*: abcdef, PAML: abcdef) (Zharkikh 1994)
General Time Reversible (GTR, nst=6): variable base frequencies, symmetrical substitution matrix (PAUP*: abcdef, PAML: abcdef) (e.g., Lanave et al. 1984, Tavare 1986, Rodriguez et al. 1990)

Likelihood Ratio Test
The likelihood ratio test (LRT) is a statistical test of the goodness-of-fit between two models. A relatively more complex model is compared to a simpler model to see if it fits a particular dataset significantly better. The LRT begins with a comparison of the
likelihood scores of the two models: LR = 2 * (lnL1 - lnL2), where lnL1 is the log-likelihood of the more complex model and lnL2 that of the simpler model.

Examples
Example 1: Comparing likelihood models
Consider the HKY85 and GTR models. The GTR model differs from HKY85 by the addition of four rate parameters (see DNA Substitution Models). These models are therefore hierarchically nested, which is an imperative requirement of the LRT. Imagine calculating the likelihood scores of the two models after acquiring a simple neighbor-joining tree:
HKY85 -lnL = 1787.08
GTR -lnL = 1784.82
Then LR = 2 (1787.08 - 1784.82) = 4.52
degrees of freedom = 4 (GTR adds 4 additional parameters to HKY85)
critical value (P = 0.05) = 9.49
Because 4.52 is below the critical value of 9.49, GTR does not fit these data significantly better than HKY85.

Floral Genome Project
Soltis et al. 2002. Trends in Pl Sci 7: 22-34
Beattie (ed.) 1995. Australia's Biodiversity. Reed. p. 14
Raven et al. 1986. Biology of Plants. Worth Pub. Inc. p. 155, 604-5

Avoid the "Black Box"
Researchers invest considerable resources in producing molecular sequence data.
They should also be ready to invest the time and effort needed to get the most out of their data.
Modern phylogenetic software makes it easy to produce trees from aligned sequence data, but phylogenetic inference should not be treated as a "black box".

Choices are Unavoidable
There are many different phylogenetic methods.
Thus the investigator is confronted with unavoidable choices.
Not all methods are equally good for all data.
Although we need not understand all the details of the various phylogenetic methods, an understanding of the basic properties is essential for informed choice of method and interpretation of results.

Data are not Perfect
Most data include misleading evidence of relationships, and we need to have a cautious attitude to the quality of data and trees.
Data can be subject to both systematic biases and noise that affect our chances of getting the correct tree. For example:
Saturation (noise)
Alignment artefacts
Base compositional biases
Branch length or rate asymmetries leading to long branch attraction
Different methods may be more or less sensitive to some of these problems.

Alignment - Homology
The data
determines the results.
The alignment determines the data (hypotheses of homology).
Be aware of potential alignment artefacts.
If using multiple alignment software, explore the sensitivity of the alignment to variations in the parameters used.
Eliminate regions that cannot be aligned with confidence.

Models
Simple models (in ML and distance analyses) often perform poorly because the data does not fit the model.
Explore the data for potential biases and deviations from the assumptions of the model.
Be prepared to use more complex models that better approximate the evolution of the sequences and therefore might be expected to give more accurate results.

Choice of Models
More complex models require the estimation of more parameters, each of which is subject to some error.
Thus there is a trade-off between more realistic and complex models and their power to discriminate between alternative hypotheses.
By comparing likelihoods of trees under different models we can determine if a more complex model gives a significantly better fit to the data.

Choice of Method
Not all methods deal with all known problems.
LogDet is useful when there are strong base compositional biases but does not deal with rate heterogeneity (need to remove invariant sites).
ML with gamma distribution is useful when there are strong rate heterogeneities across sites.
Gamma shape and proportions of invariant sites can be estimated from the data.

An Experimental Science
Phylogenetics differs from many sciences in its historical focus.
The classical experimental method is not applicable.
However, we can perform experiments in the analysis of data.
Experiments (multiple analyses) help us to understand the behaviour of the data.
The only cost is the time invested!

Some Experiments
Vary the included taxa
You may be able to minimise the effects of biases by appropriate taxon sampling to break long branches or reduce base compositional biases by introducing intermediate taxa.
Vary the characters included
You may be able to improve the fit of data to a
model by removing the fastest evolving sites or the slowest evolving sites.

Is the data any good?
Explore the data for phylogenetic signal: randomization tests will identify data that cannot be used to generate reliable phylogenetic inferences.
Be ready to explore data partitions or ways of treating the data. For example, in protein coding genes, systematic biases or noise may differentially affect 3rd positions in codons and might be avoided by excluding this data or by translating DNA sequences and analysing amino acid sequences.

Measure support for groups
Evaluate relationships shown in trees with bootstrap or other resampling techniques.
Appreciate that such measures may be misleading if the data is misleading (particularly if subject to systematic biases).
Explore the sensitivity of these results to methods of analysis. Disagreements should limit confidence in results unless they can be explained as a result of undesirable properties of methods or characteristics of the data.

Hypothesis testing
Alternative evolutionary hypotheses may be supported by alternative phylogenetic trees.
We can test alternative hypotheses by determining if any of the alternative trees are significantly better explanations of the data.
Use constrained analyses to find alternative trees.
Use SH or other tests to evaluate alternative trees.

Gene trees and species trees
Remember that molecular systematics yields gene trees.
Accurate gene trees may not be accurate organismal trees.
Gene duplications and paralogy, lateral transfer, and lineage sorting of plastid genomes can produce mismatches between gene and organismal phylogenies.
Use congruence between separate gene trees to identify robust organismal phylogenies or mismatches that require further information.
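
The worked example on the Likelihood Ratio Test slide (HKY85 versus GTR) can be checked numerically. The following is a minimal Python sketch and is not part of the original slides. The function names lrt_statistic and chi2_sf_even_df are illustrative assumptions, not calls from any phylogenetics package. The p-value uses a closed form for the chi-square upper tail that is valid only for an even number of degrees of freedom (4 in this example). The inputs are the -lnL scores quoted on the slide.

```python
import math


def lrt_statistic(neg_lnl_simple: float, neg_lnl_complex: float) -> float:
    """Likelihood ratio statistic from two -lnL scores.

    A smaller -lnL means a better fit, so the simpler model normally
    has the larger -lnL and the statistic is non-negative.
    """
    return 2.0 * (neg_lnl_simple - neg_lnl_complex)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, even degrees of freedom only.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    (An assumption of this sketch: odd df would need a statistics library.)
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only covers positive, even df")
    half = x / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


# Scores from the slide: HKY85 (simpler) and GTR (4 extra rate parameters).
lr = lrt_statistic(neg_lnl_simple=1787.08, neg_lnl_complex=1784.82)
p_value = chi2_sf_even_df(lr, df=4)
prefer_gtr = p_value < 0.05

print(f"LR = {lr:.2f}, p = {p_value:.2f}, prefer GTR: {prefer_gtr}")
```

With the slide's scores the statistic is 2 x 2.26 = 4.52 and the p-value is roughly 0.34, well above 0.05, so under these assumptions the four extra GTR parameters are not justified for this dataset.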