    Analysis of Transcription Factor Binding Sites (PPT courseware)

    • Resource ID: 1316343       File size: 2.36 MB        Length: 44 pages
    • Format: PPT

    Chapter 6: Gene Prediction and Gene Structure Analysis (I)
    Bioinformatics

    Genome sequencing strategies
    • http://www.bio.davidson.edu/courses/genomics/method/shotgun.html

    Genome sequencing: quicker, smaller, cheaper
    • 13 years, $3 billion
    • Nature Biotechnology 26, 1135-1145 (2008)
    • http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html

    Applications of sequencing
    • identifying new genes
    • looking at chromosome organization and structure
    • finding gene regulatory sequences
    • comparative genomics

    Where are the genes in the genome?
    (The slide shows a long block of raw, unannotated genomic DNA sequence here.)

    • Genes (i.e., protein coding)
    • But... only 2% of the human genome encodes proteins
    • Other than protein-coding genes, what is there?
    • genes for noncoding RNAs (rRNA, tRNA, miRNAs, etc.)
    • structural sequences (scaffold attachment regions)
    • regulatory sequences
    • non-functional "junk"?
    It's still uncertain and controversial how much of the genome is composed of any of these classes. The answers will come from experimentation and bioinformatics.

    Complexity of genome
    • The ENCODE Project: ENCyclopedia Of DNA Elements
    • Science 306, 636-640 (2004), published by AAAS
    • http://genome.ucsc.edu/ENCODE/

    • Protein-coding genes
        • In long open reading frames (ORFs)
        • ORFs interrupted by introns in eukaryotes
        • Take up most of the genome in prokaryotes, but only a small portion of the eukaryotic genome
    • RNA-only genes
        • Transfer RNA, ribosomal RNA, snoRNAs (guide ribosomal and transfer RNA maturation), intron splicing, guiding mRNAs to the membrane for translation, gene regulation
        • This is a growing list
    • Gene control sequences
        • Promoters
        • Regulatory elements
    • Transposable elements, both active and defective
        • DNA transposons and retrotransposons
        • Many types and sizes
    • Repeated sequences
        • Centromeres and telomeres
        • Many with unknown (or no) function
    • Unique sequences that have no obvious function
    As a general rule, each part of a genomic sequence has only one function: protein-coding gene, RNA gene, control signal, transposable element, repeat sequence, or possibly no function at all.
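    But most sequence elements overlap only slightly, if at all.

    What's in a genome?
    • protein-coding genes
    • nonprotein-coding genes

    Finding protein-coding genes
    • Easier to find than other functional elements. Why?
        • genes are transcribed, which means that we can identify them by looking at RNA
        • traditionally this has been done by cDNA or EST sequencing, more recently by microarray, SAGE, MPSS, etc.
    • Protein-coding genes have recognizable features
        • open reading frames (ORFs)
        • codon bias
        • known transcription and translational start and stop motifs (promoters, 3' poly-A sites)
        • splice consensus sequences at intron-exon boundaries

    Finding nonprotein-coding genes
    • e.g., tRNA, rRNA, snoRNA, miRNA, various other ncRNAs
    • Harder to find than protein-coding genes. Why?
        • often not poly-A tailed, so they don't end up in cDNA libraries
        • no ORF
        • the constraint on sequence divergence is at the nucleotide level, not the protein level, so homology is harder to detect
    • So, how do we find these?
        • secondary structure
        • homology, especially alignment of related species
        • experimentally
            • isolation through non-polyA-dependent cloning methods
            • microarrays

    Noncoding RNA gene prediction
    • Covers a variety of RNA structure prediction and gene identification programs
    • False positives are the biggest problem
    • http://en.wikipedia.org/wiki/List_of_RNA_structure_prediction_software
    • A practical guide to the art of RNA gene prediction

    Chapter 6: Gene Prediction and Gene Structure Analysis (II)
    Bioinformatics

    Gene prediction and gene structure analysis
    • One of the important topics in bioinformatics

    (1) Basic analysis tasks in gene prediction
    • Predict protein-coding genes
    • Exclude repeated sequences
    • Determine the open reading frame (ORF)
    • Determine the gene's regulatory region
        • Promoter

    (2) Basic methods of gene prediction

    1. Sequence similarity search (Extrinsic Approaches)
    • Starting from the genomic DNA sequence:
        • translate it in all 6 reading frames and compare the translations with sequences in protein databases (e.g., Blastx)
        • compare it with cDNA sequences from the same organism in EST databases (e.g., Blastn)
    • Determine the number of genes and the corresponding ORFs
    • Similarity-based gene prediction: for sequences that encode a known protein or a protein with a known homolog

    Example: rice Xa21 gene sequence (U37133)
    • CDS: at 1-2677 bp and 3521-3921 bp
    • Blastx results (searching the protein database): comparison with rice protein sequences
    • Blastn results (searching the "est other" database): comparison with rice cDNA sequences
        • Depends on the number and length of the EST entries in the database
        • Use "Distance tree of results" to view other EST sequences homologous to U37133
    • Some of the protein sequences are themselves inferred

    2. Gene prediction from sequence patterns (Ab Initio Approaches)
    • Various gene prediction programs
    • Depends on what is known about the structural features of known genes
    • Uses statistical methods
        • classify unknown sequences based on one or more known sequence patterns: codon bias, promoter structure, exons, introns
        • statistically test the patterns found

    Prokaryotes (E. coli)
    • Sites that interact with RNA polymerase (-10 and -35 regions)
    • LexA repressor binding site (in the promoter region): CTGNNNNNNNNNNCAG
    • Ribosome binding site (after the transcription start site): GGAGG

    Eukaryotes
    • Gene structure is complex
    • Known sequence features of exons, introns, exon boundaries, and promoters

    • No gene prediction tool can yet predict all the genes in a genome completely correctly (Mathe et al. 2002)
    • Different gene prediction programs give different results
    • Combine the results of several gene prediction programs
    • A single tool may let you analyze different gene structures: exon, poly-A, promoter, repeated sequences
    • Some tools let you choose a species model (matrix) as the reference for comparison
    • Some tools can present results in different ways (text or graphics)

    Example (1): Gene Finding
    • Softberry (http:/…) Gene Finding tools, grouped into three categories:
        • Gene Finding in Eukaryota
        • Operon and Gene Finding in Bacteria
        • Gene Finding in Viruses
    • Each category includes several programs
    • On the Softberry home page, choose "FGENESH" under "Gene Finding in Eukaryota"
    • On the FGENESH page, enter the D63710 sequence (FASTA format) and choose the species (human) as the reference
    • Results (text and graphics)

    Example (2): GenScan
    • GenScan (http://genes.mit.edu/GENSCAN.html) uses three species models as references:
        • Vertebrate
        • Arabidopsis
        • Maize
    • On the GenScan home page, enter the D63710 sequence and choose the species model (Vertebrate) as the reference
    • Results (text and graphics)

    Example (3): GeneMark
    • GeneMark (http://exon.biology.gatech.edu/)
        • Predicts genes in eukaryotes, prokaryotes, and viruses
        • Many species models available as references
    • On the GeneMark analysis home page, choose "GeneMark-E"
    • On the "GeneMark-E" page, enter the D63710 sequence, choose the species "H. sapiens", and choose the output format
    • Results

    Combine extrinsic and ab initio approaches
    • http://bioinf.uni-greifswald.de/augustus/
    • http://www.yandell-lab.org/software/maker.html
    • Combine extrinsic and ab initio approaches by mapping protein and EST data to the genome to validate ab initio predictions.

    3. Gene prediction using comparative genomics (Comparative Genomics Approaches)
    • Relies on whole-genome sequencing results
    • Gene sequences of closely related organisms are conserved

    Example: N-SCAN/Twinscan (http://mblab.wustl.edu/nscan/)
    • Choose the N-SCAN online analysis (free registration required)
    • Enter the sequence to be analyzed and choose masking, clade, species, and informant
    • Results

    Main problems in gene prediction
    • False positive: spurious coding regions are predicted, i.e., genes are called in noncoding regions
    • False negative: real coding regions are missed, i.e., genes are predicted as noncoding
    • Over-prediction: gene boundaries are hard to locate precisely, so predictions often extend past the true boundaries
    • Fragmentation: a gene with very large introns tends to be split into two or more predicted genes
    • Fusion: two or more genes that lie too close together tend to be merged into one very large predicted gene

    (3) Fine-structure analysis of genes

    Analyzing promoter sites: NNPP
    • BCM (http://searchlauncher.bcm.tmc.edu/) includes a variety of gene prediction programs; NNPP analyzes promoter sites
    • On the BCM analysis home page, choose "Gene Feature Searches"
    • On the "Gene Feature Searches" page, paste the D63710 sequence and choose "NNPP/Eukaryotic-eukaryotic promoter prediction"
    • Results

    Analyzing promoter sites: Promoter 2.0
    • Promoter 2.0 Prediction Server: http://www.cbs.dtu.dk/services/Promoter/
    • Promoter2.0 predicts transcription start sites of vertebrate PolII promoters in DNA sequences.
    • On the "Promoter 2.0" page, paste the D63710 sequence
    • Results

    Analyzing transcription factor binding sites
    • Interaction between cis-acting elements and trans-acting elements

    Example: PROSCAN
    • http://www-bimas.cit.nih.gov/molbio/proscan/
    • On the Proscan page, paste the sequence (FASTA format)
    • Results

    Example (plants): PLACE
    • PLACE (A Database of Plant Cis-acting Regulatory DNA Element): http://www.dna.affrc.go.jp/PLACE/index.html
    • On the PLACE home page, click "Signal Scan Search"
    • On the "PLACE Web Signal Scan" page, paste the sequence (FASTA)
    • Three ways of presenting the results:
        • grouped by signal
        • mapped to sequence
        • scan by sequence order
    • Click the related links to see which types of transcription factors bind to each cis-element

    Chapter 6: Gene Prediction and Gene Structure Analysis (Hands-on Session)
    Bioinformatics

    • Gene-finding software and resources
    • A beginner's guide to eukaryotic genome annotation

    Exercise
    • Select a DNA sequence (AF319968) from the nucleotide database, analyze its gene structure with different tools, and compare the results with the annotation in the nucleotide database.
    • Predict whether the sequence contains a promoter region, and analyze its transcription factor binding sites.

    Result slides
    • FGENESH prediction results (two slides)
    • GENSCAN prediction results
    • GeneMark prediction results
    • Transcription start site prediction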
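
The slides list "translate in all 6 reading frames" and "determine the ORF" as steps of similarity-based gene prediction. The following is a minimal Python sketch of that scan, not what Blastx or any tool named above actually runs. The names `find_orfs`, `Orf`, and `min_codons` are hypothetical, and the sketch assumes the standard stop codons, ATG as the only start codon, and an uppercase or lowercase A/C/G/T input.

```python
from typing import Iterator, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}          # standard genetic code (assumption)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str   # "+" for the input strand, "-" for its reverse complement
    frame: int    # 0, 1, or 2
    start: int    # 0-based offset of the ATG on the scanned strand
    end: int      # exclusive end offset, including the stop codon


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> Iterator[Orf]:
    """Yield ATG...stop ORFs in all six reading frames.

    min_codons counts codons from the ATG up to, but not including, the stop.
    Offsets for "-" strand ORFs refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first in-frame start codon
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield Orf(strand, frame, start, i + 3)
                    start = None                   # look for the next ORF
```

Calling `list(find_orfs(sequence, min_codons=100))` on a genomic sequence gives candidate ORFs. In eukaryotes, introns interrupt ORFs, so a scan like this is only a first pass that still needs splice-site modelling or similarity evidence.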

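
The slides give two prokaryotic sequence patterns, the LexA repressor binding site CTGNNNNNNNNNNCAG and the ribosome binding site GGAGG. A consensus scan for patterns like these can be sketched as below. The function name `scan_motif` is hypothetical, and an exact consensus match is an assumption made for illustration. The tools named above (PROSCAN, PLACE) may score sites differently.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def scan_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based position, matched text) for every occurrence of an
    IUPAC consensus motif on the given strand, including overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead makes each match zero-width, so overlapping sites are found.
    pattern = re.compile("(?=(" + body + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]
```

For example, `scan_motif(promoter_region, "CTGNNNNNNNNNNCAG")` lists candidate LexA sites. Only the strand passed in is scanned, so a non-palindromic motif would also need a second call on the reverse complement. Short patterns such as GGAGG match often by chance, so hits are candidates rather than confirmed sites.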