Amyloidosis: PPT Courseware (.ppt)
A Field Guide, Part 2: Genome Resources and Sequence Similarity
Shandong University, April 2007

Genome Resources
- LocusLink
- Gene database
- UniGene
- Trace Archive
- Map Viewer
- HomoloGene

Genomic Biology
Genome Projects: microb

Genome Resources: LocusLink and the Gene Database

LocusLink
A single query interface to:
- Sequences: RefSeqs, GenBank, HomoloGene
- Maps: MapViewer
- Entrez links
LocusLink will be replaced by Entrez Gene on March 1, 2005. Check the Gene FAQ for current information.

Entrez Gene
The same single query interface to sequences, maps and Entrez links, plus:
- More organisms: all RefSeq genomes
- Entrez integration

Global Entrez: NADH2 (query: nadh2; result count: 47)
Entrez Gene: NADH2 (26 records)
Gene record for Pongo NADH2
Display exons/introns: Gene Table

A Record With More Data: Human HFE (hemochromatosis)
- Gene graphic links
- Introns/exons: Gene Table, with links to sequence
- Entrez SNP: the query hfe[gene name] AND human[orgn] (52 records)
- Linking to SNP: chromosome location, gene location, sequence location
- SNP in structure
- Link to OMIM; variants in OMIM

Genome Resources: UniGene

UniGene: gene-oriented clusters of expressed sequences
- Automatic clustering using MegaBLAST
- Each cluster represents a unique gene
- Informed by genome hits
- Information on tissue types and map locations
- Useful for gene discovery and selection of mapping reagents

[Figure: a cluster of ESTs; a query with 5 EST hits and 3 EST hits]

UniGene Collections
Example UniGene Cluster
[Figure: histogram of cluster sizes for UniGene Hs build 177]
UniGene Cluster Hs.95351 (overview, expression, sequences)
Download sequences: web page or FTP site

Genome Resources: HomoloGene

The New HomoloGene
Automated detection of homologs among the annotated genes of completely sequenced eukaryotic genomes.
- No longer UniGene based
- Protein similarities first
- Guided by the taxonomic tree
- Includes orthologs and paralogs
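The Entrez queries in these slides, such as the Entrez SNP search on the Human HFE slide, combine search terms with field tags in square brackets and Boolean operators. As a minimal sketch, assuming NCBI's E-utilities ESearch endpoint (which the slides do not mention), the same query string can be turned into a request URL; `esearch_url` is a hypothetical helper and nothing is fetched here.

```python
from urllib.parse import urlencode

# Assumption: NCBI's E-utilities ESearch endpoint; the slides only use the web interface.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not fetch) an ESearch URL for an Entrez query string."""
    # urlencode percent-encodes the field-tag brackets and turns spaces into '+'.
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


if __name__ == "__main__":
    # The Human HFE query from the Entrez SNP slide.
    print(esearch_url("snp", "hfe[gene name] AND human[orgn]"))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) would return an XML list of matching record IDs.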
Orthologs and paralogs are two types of homologous sequences. Orthologs are proteins from different species that evolved from a common ancestor by vertical descent (speciation); they typically retain the same function as the original protein. Paralogs are proteins within a species that arose by gene duplication; they may evolve new functions related to the original one. See the literature for more information.

Paralogs vs Orthologs
[Figure: gene duplication of an early globin gene into an A-chain gene and a B-chain gene, with labels marking paralogs and orthologs]

HomoloGene Example: RAG1
Query "rag1" (result count: 12): RAG1, recombination activating gene
[Screenshots: the RAG1 HomoloGene group, Amniota]

Genome Resources: Map Viewer

MapViewer
- List view
- Human MapViewer
Example: MapViewer search for human ADAR (adenosine deaminase), result count: 4
[Screenshot: MapViewer, Homo sapiens ADAR]

Maps & Options
Sequence maps: Ab initio, Assembly, Repeats, BES_Clone, Clone, NCI_Clone, Contig, Component, CpG island, dbSNP haplotype, Fosmid, GenBank_DNA, Gene, Phenotype, SAGE_Tag, STS, TCAG_RNA, Transcript (RNA), Hs_UniGene, Hs_EST
Cytogenetic maps: Ideogram, FISH Clone, Gene_Cytogenetic, Mitelman Breakpoint, Morbid/Disease
Genetic maps: deCODE, Genethon, Marshfield
RH maps: GeneMap99-G3, GeneMap99-GB4, NCBI RH, Stanford-G3, TNG, Whitehead-RH, Whitehead-YAC
Other: Mm_UniGene, Mm_EST, Rn_UniGene, Rn_EST, Ssc_UniGene, Ssc_EST, Bt_UniGene, Bt_EST, Gga_UniGene, Gga_EST, Variation

Maps & Options examples: UniGene, Component, Repeats and Gene maps; master map: Repeats; Gene, Phenotype and Variation maps

Genome Resources: Trace Archive
Example: Strongylocentrotus purpuratus traces

BLAST: Basic Local Alignment Search Tool
Web access: BLAST (sequence), VAST (structure), Entrez (text)

Basic Local Alignment Search Tool
- Why use sequence similarity?
- BLAST algorithm
- BLAST statistics
- BLAST output
- Examples

Why Do We Need Sequence Similarity Searching?
- To identify and annotate sequences
- To evaluate evolutionary relationships
- Other: model genomic structure (e.g., Spidey); check primer specificity in silico

BLAST: NCBI's tool
[Figure: BLAST website stats]

Global vs Local Alignment
A global alignment aligns two sequences end to end; a local alignment reports only the best-matching regions, such as the word HERE shared by this pair:
Seq1: WHEREISWALTERNOW (16 aa)
Seq2: HEWASHEREBUTNOWISHERE (21 aa)
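The global/local distinction can be made concrete with the two textbook dynamic-programming recurrences: Needleman-Wunsch (global) and Smith-Waterman (local, the algorithm BLAST approximates heuristically). The scoring scheme here (match +1, mismatch -1, linear gap -2) is an assumption for illustration; the slides do not give one for this example, and `alignment_score` is a hypothetical helper.

```python
def alignment_score(a: str, b: str, local: bool,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score by dynamic programming with a linear gap penalty.

    local=False: Needleman-Wunsch style global alignment (end to end).
    local=True:  Smith-Waterman style local alignment (best-scoring segment pair).
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalized, so the first row and column accumulate gap costs.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                # Local: scores never drop below zero, so an alignment can start anywhere.
                cell = max(cell, 0)
                best = max(best, cell)
            dp[i][j] = cell
    # A local alignment may end anywhere; a global one must end in the last cell.
    return best if local else dp[-1][-1]


SEQ1 = "WHEREISWALTERNOW"
SEQ2 = "HEWASHEREBUTNOWISHERE"

if __name__ == "__main__":
    print("global:", alignment_score(SEQ1, SEQ2, local=False))
    print("local: ", alignment_score(SEQ1, SEQ2, local=True))
```

Because the full-length global alignment is also one of the candidates a local search considers, the local score can never be lower than the global score under the same scoring scheme.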
The Flavors of BLAST
- Standard BLAST: traditional "contiguous" word hit; position-independent scoring; nucleotide, protein and translations (blastn, blastp, blastx, tblastn, tblastx)
- MegaBLAST: optimized for large batch searches; can use discontiguous words
- PSI-BLAST: constructs PSSMs automatically and uses them as the query; very sensitive protein search
- RPS-BLAST: searches a database of PSSMs; tool for conserved domain searches

Basic Local Alignment Search Tool
- Widely used similarity search tool
- Heuristic approach based on the Smith-Waterman algorithm
- Finds best local alignments
- Provides statistical significance
- All combinations (DNA/protein) of query and database:
  - DNA vs DNA: blastn
  - DNA translation vs protein: blastx
  - Protein vs protein: blastp
  - Protein vs DNA translation: tblastn
  - DNA translation vs DNA translation: tblastx
- Web, standalone, and network clients

Translated BLAST
Query        Database     Program
Nucleotide   Protein      blastx
Protein      Nucleotide   tblastn
Nucleotide   Nucleotide   tblastx
Particularly useful for nucleotide sequences without protein annotations, such as ESTs or genomic DNA.

How BLAST Works
1. Make a lookup table of "words" for the query
2. Scan the database for hits
3. Ungapped extensions of hits (initial HSPs)
4. Gapped extensions (no traceback)
5. Gapped extensions (traceback; alignment details)

Nucleotide Words
Make a lookup table of words (11 letters each):
GTACTGGACAT TACTGGACATG ACTGGACATGG CTGGACATGGA TGGACATGGAC GGACATGGACC GACATGGACCC ACATGGACCCT ...

Protein Words
Make a lookup table of words (3 letters each):
GTQ TQI QIT ITV TVE VED EDL DLF ...
-f 11 = blastp default

Minimum Requirements for a Hit
- Nucleotide BLAST requires one exact match
- Protein BLAST requires two neighboring matches within 40 aa
Figure labels: protein query GTQITVEDLFYNI with neighborhood words SEI and YYN (two matches); nucleotide sequence ATCGCCATGCTTAATTGGGCTT with word CATGCTTAATT (one exact match)
-A 40 = blastp default

BLASTP Summary
[Figure: BLASTP summary; label: high-scoring pair (HSP)]

Scoring Systems - Nucleotides
Identity matrix:
    A   G   C   T
A  +1  -3  -3  -3
G  -3  +1  -3  -3
C  -3  -3  +1  -3
T  -3  -3  -3  +1

CAGGTAGCAAGCTTGCATGTCA
|| ||||||||||||  |||||
CACGTAGCAAGCTTG-GTGTCA
raw score = 19 - 9 = 10 (19 identities score +19; the two mismatches and the one gap together subtract 9)
-r 1 -q -3
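The first two steps of "How BLAST Works" and the nucleotide scoring example can be checked with a few lines of code. This is a simplified sketch, not NCBI's implementation: it finds only exact word matches (blastp additionally uses neighborhood words that score at least the -f threshold against the query word), and it charges the single gap column -3, an assumption chosen because it reproduces the slide's 19 - 9 = 10. All function names are hypothetical.

```python
def words(seq: str, w: int) -> list[str]:
    """All overlapping words of length w, in query order."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def lookup_table(query: str, w: int) -> dict[str, list[int]]:
    """Step 1: map each query word to the query offsets where it occurs."""
    table: dict[str, list[int]] = {}
    for offset, word in enumerate(words(query, w)):
        table.setdefault(word, []).append(offset)
    return table


def scan(table: dict[str, list[int]], subject: str, w: int) -> list[tuple[int, int]]:
    """Step 2: exact word hits as (query_offset, subject_offset) pairs."""
    return [(q, s) for s, word in enumerate(words(subject, w)) for q in table.get(word, [])]


def raw_score(top: str, bottom: str, reward: int = 1, penalty: int = -3,
              gap: int = -3) -> int:
    """Sum the column scores of a gapped alignment written as two equal-length rows."""
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap  # assumption: the gap column costs 3, matching the slide's total
        elif x == y:
            score += reward  # identity (-r 1)
        else:
            score += penalty  # mismatch (-q -3)
    return score


if __name__ == "__main__":
    print(words("GTACTGGACATGGACCCT", 11))  # the eight 11-letter words on the slide
    print(words("GTQITVEDLF", 3))           # the eight 3-letter protein words
    table = lookup_table("ATCGCCATGCTTAATTGGGCTT", 11)
    print(scan(table, "CATGCTTAATT", 11))
    print(raw_score("CAGGTAGCAAGCTTGCATGTCA", "CACGTAGCAAGCTTG-GTGTCA"))
```

A dictionary keyed by word keeps the scan linear in the subject length, which is the point of building the lookup table once per query.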
Scoring Systems - Proteins
Position-independent matrices:
- PAM matrices (Percent Accepted Mutation)
  - Derived from observation; small dataset of alignments
  - Implicit model of evolution
  - All calculated from PAM1
  - PAM250 widely used
- BLOSUM matrices (BLOck SUbstitution Matrices)
  - Derived from observation; large dataset of highly conserved blocks
  - Each matrix derived separately from blocks with a defined percent identity cutoff
  - BLOSUM62: default matrix for BLAST
Position-specific score matrices (PSSMs):
- PSI-BLAST and RPS-BLAST

BLOSUM62
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V   X
A   4
R  -1   5
N  -2   0   6
D  -2  -2   1   6
C   0  -3  -3  -3   9
Q  -1   1   0   0  -3   5
E  -1   0   0   2  -4   2   5
G   0  -2   0  -1  -3  -2  -2   6
H  -2   0   1  -1  -3   0   0  -2   8
I  -1  -3  -3  -3  -1  -3  -3  -4  -3   4
L  -1  -2  -3  -4  -1  -2  -3  -4  -3   2   4
K  -1   2   0  -1  -3   1   1  -2  -1  -3  -2   5
M  -1  -1  -2  -3  -1   0  -2  -3  -2   1   2  -1   5
F  -2  -3  -3  -3  -2  -3  -3  -3  -1   0   0  -3   0   6
P  -1  -2  -2  -1  -3  -1  -1  -2  -2  -3  -3  -1  -2  -4   7
S   1  -1   1   0  -1   0   0   0  -1  -2  -2   0  -1  -2  -1   4
T   0  -1   0  -1  -1  -1  -1  -2  -2  -1  -1  -1  -1  -2  -1   1   5
W  -3  -3  -4  -4  -2  -2  -3  -2  -2  -3  -2  -3  -1   1  -4  -3  -2  11
Y  -2  -2  -2  -3  -2  -1  -2  -3   2  -1  -1  -2  -1   3  -3  -2  -2   2   7
V   0  -3  -3  -3  -1  -2  -2  -3  -3   3   1  -2   1  -1  -2  -2   0  -3  -1   4
X   0  -1  -1  -1  -2  -1  -1  -1  -1  -1  -1  -1  -1  -1  -2   0   0  -2  -1  -1  -1

Position-Specific Score Matrix
DAF-1: serine/threonine protein kinase catalytic loop
        A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
435 K  -1   0   0  -1  -2   3   0   3   0  -2  -2   1  -1  -1  -1  -1  -1  -1  -1  -2
436 E   0   1   0   2  -1   0   2  -1   0  -1  -1   0   0   0  -1   0   0  -1  -1  -1
437 S   0   0  -1   0   1   1   0   1   1   0  -1   0   0   0   2   0  -1  -1   0  -1
438 N  -1   0  -1  -1   1   0  -1   3   3  -1  -1   1  -1   0   0  -1  -1   1   1  -1
439 K  -2   1   1  -1  -2   0  -1  -2  -2  -1  -2   5   1  -2  -2  -1  -1  -2  -2  -1
440 P  -2  -2  -2  -2  -3  -2  -2  -2  -2  -1  -2  -1   0  -3   7  -1  -2  -3  -1  -1
441 A   3  -2   1  -2   0  -1   0   1  -2  -2  -2   0  -1  -2   3   1   0  -3  -3   0
442 M  -3  -4  -4  -4  -3  -4  -4  -5  -4   7   0  -4   1   0  -4  -4  -2  -4  -1   2
443 A   4  -4  -4  -4   0  -4  -4  -3  -4   4  -1  -4  -2  -3  -4  -1  -2  -4  -3   4
444 H  -4  -2  -1  -3  -5  -2  -2  -4  10  -6  -5  -3  -4  -3  -2  -3  -4  -5   0  -5
445 R  -4   8  -3  -4   0  -1  -2  -3  -2  -5  -4   0  -3  -2  -4  -3  -3   0  -4  -5
446 D  -4  -4  -1   8  -6  -2   0  -3  -3  -5  -6  -3  -5  -6  -4  -2  -3  -7  -5  -5
447 I  -4  -5  -6  -6  -3  -4  -5  -6  -5   3   5  -5   1   1  -5  -5  -3  -4  -3   1
448 K   0   0   1  -3  -5  -1  -1  -3  -3  -5  -5   7  -4  -5  -3  -1  -2  -5  -4  -4
449 S   0  -3  -2  -3   0  -2  -2  -3  -3  -4  -4  -2  -4  -5   2   6   2  -5  -4  -4
450 K   0   3   0   1  -5   0   0  -4  -1  -4  -3   4  -3  -2   2   1  -1  -5  -4  -4
451 N  -4  -3   8  -1  -5  -2  -2  -3  -1  -6  -6  -2  -4  -5  -4  -1  -2  -6  -4  -5
452 I  -3  -5  -5  -6   0  -5  -5  -6  -5   6   2  -5   2  -2  -5  -4  -3  -5  -3   3
453 M  -4  -4  -6  -6  -3  -4  -5  -6  -5   0   6  -5   1   0  -5  -4  -3  -4  -3   0
454 V  -3  -3  -5  -6  -3  -4  -5  -6  -5   3   3  -4   2  -2  -5  -4  -3  -5  -3   5
455 K  -2   1   1   4  -5   0  -1  -2   1  -4  -2   4  -3  -2  -3   0  -1  -5  -2  -3
456 N   1   1   3   0  -4  -1   1   0  -3  -4  -4   3  -2  -5  -2   2  -2  -5  -4  -4
457 D  -3  -2   5   5  -1  -1   1  -1   0  -5  -4   0  -2  -5  -1   0  -2  -6  -4  -5
458 L  -3  -1   0  -3   0  -3  -2   3  -4  -2   3   0   1   1  -2  -2  -3   5  -1  -3
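Scoring a segment with BLOSUM62 gives a residue pair the same score wherever it occurs, while a PSSM gives every query position its own column of scores. The sketch below transcribes the BLOSUM62 values and PSSM rows 444-451 (the HRDIKSKN part of the DAF-1 catalytic loop) from the tables above; the helper names are hypothetical and the "subject" segments are illustrative, not real database hits.

```python
AA = "ARNDCQEGHILKMFPSTWYV"

# Lower triangle of BLOSUM62 (rows A..V) from the table above; the X row is omitted.
BLOSUM62_LOWER = """\
4
-1 5
-2 0 6
-2 -2 1 6
0 -3 -3 -3 9
-1 1 0 0 -3 5
-1 0 0 2 -4 2 5
0 -2 0 -1 -3 -2 -2 6
-2 0 1 -1 -3 0 0 -2 8
-1 -3 -3 -3 -1 -3 -3 -4 -3 4
-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
-1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
-1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
-2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
-2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4"""

# PSSM rows 444-451 (query residues H R D I K S K N), columns in AA order.
PSSM_ROWS = {
    444: "-4 -2 -1 -3 -5 -2 -2 -4 10 -6 -5 -3 -4 -3 -2 -3 -4 -5 0 -5",
    445: "-4 8 -3 -4 0 -1 -2 -3 -2 -5 -4 0 -3 -2 -4 -3 -3 0 -4 -5",
    446: "-4 -4 -1 8 -6 -2 0 -3 -3 -5 -6 -3 -5 -6 -4 -2 -3 -7 -5 -5",
    447: "-4 -5 -6 -6 -3 -4 -5 -6 -5 3 5 -5 1 1 -5 -5 -3 -4 -3 1",
    448: "0 0 1 -3 -5 -1 -1 -3 -3 -5 -5 7 -4 -5 -3 -1 -2 -5 -4 -4",
    449: "0 -3 -2 -3 0 -2 -2 -3 -3 -4 -4 -2 -4 -5 2 6 2 -5 -4 -4",
    450: "0 3 0 1 -5 0 0 -4 -1 -4 -3 4 -3 -2 2 1 -1 -5 -4 -4",
    451: "-4 -3 8 -1 -5 -2 -2 -3 -1 -6 -6 -2 -4 -5 -4 -1 -2 -6 -4 -5",
}


def load_blosum62() -> dict[tuple[str, str], int]:
    """Expand the lower triangle into a symmetric lookup keyed by residue pairs."""
    matrix: dict[tuple[str, str], int] = {}
    for i, line in enumerate(BLOSUM62_LOWER.splitlines()):
        for j, value in enumerate(line.split()):
            matrix[AA[i], AA[j]] = matrix[AA[j], AA[i]] = int(value)
    return matrix


def blosum_score(query: str, subject: str, matrix: dict[tuple[str, str], int]) -> int:
    """Position-independent: the same residue pair always gets the same score."""
    return sum(matrix[q, s] for q, s in zip(query, subject))


def pssm_score(segment: str, first_row: int) -> int:
    """Position-specific: each PSSM row has its own score for every residue."""
    total = 0
    for offset, residue in enumerate(segment):
        row = [int(v) for v in PSSM_ROWS[first_row + offset].split()]
        total += row[AA.index(residue)]
    return total


if __name__ == "__main__":
    blosum = load_blosum62()
    for subject in ("HRDIKSKN", "HRDLKSKN"):
        print(subject, "BLOSUM62:", blosum_score("HRDIKSKN", subject, blosum),
              "PSSM:", pssm_score(subject, 444))
```

With these tables, replacing I by L at row 447 lowers the BLOSUM62 total (I/I scores 4, I/L scores 2) but raises the PSSM total, because that PSSM column scores L at 5 and I at 3.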
Position-Specific Score Matrix: catalytic loop
./blastpgp -i NP_499868.2 -d nr -j 3 -Q NP_499868.pssm
(-j 3 runs up to three iterations; -Q writes the PSSM in ASCII form.)

Local Alignment Statistics
High scores of local alignments between two random sequences follow the Extreme Value Distribution.
[Figure: number of alignments vs. score (S)]

Expect Value
E = the number of database hits you expect to find by chance with a score of S or higher, where S is your score.
[Figure: expected number of random hits vs. score]

Advanced BLAST Options: Nucleotide
Example Entrez queries:
- nucleotide all[Filter] NOT mammalia[Organism]
- green plants[Organism]
- biomol mrna[Properties]
- gbdiv est[Properties] AND rat[organism]
Other advanced: -e 10000 (expect value), -v 2000 (descriptions), -b 2000 (alignments)

Advanced BLAST Options: Protein
Matrix selection: PAM30 is the most stringent; BLOSUM45 is the least stringent.
Example Entrez queries:
- proteins all[Filter] NOT mammalia[Organism]
- green plants[Organism]
- srcdb refseq[Properties]
Other advanced: -e 10000 (expect value), -v 2000 (descriptions), -b 2000 (alignments)
Limit by taxon: Mus musculus[Organism], Mammalia[Organism], Viridiplantae[Organism]

Low Complexity Filtering
Example hit:
sp|P27476|NSR1_YEAST NUCLEAR LOCALIZATION SEQUENCE BINDING PROTEIN (P67)
Length = 414
Score = 40.2 bits (92), Expect = 0.013
Identities = 35/131 (26%), Positives = 56/131 (42%), Gaps = 4/131 (3%)
Query: 362 STTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLS-SQPQAIVTEDKTD 418
            S+S SSS+S SS+S+S S S+E K
Sbjct: 29  SSSSSESSSSSSSSSESESESESESESSSSSSSSDSESSSSSSSDSESEAETKKEESKDS 88
[Figure panels: Filtered vs. Unfiltered]

Other BLAST Algorithms
- MegaBLAST
- Discontiguous MegaBLAST
- PSI-BLAST
- PHI-BLAST

MegaBLAST: NCBI's Genome Annotator
- Long alignments of similar DNA sequences
- Greedy algorithm
- Concatenation of query sequences
- Faster than blastn; less sensitive

MegaBLAST & Word Size
Trade-off: sensitivity vs. speed

Discontiguous MegaBLAST
- Uses discontiguous word matches
- Better for cross-species comparisons
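Discontiguous MegaBLAST matches words through a template in which 1 marks a position that must match and 0 a position that is ignored. A minimal sketch, assuming the W = 11, t = 16 coding template 1101101101101101 from the discontiguous-word templates: the ignored positions fall on every third base, so two coding sequences that differ only at third codon positions still share a seed, while no contiguous 11-letter word matches. `seed_table` and `seed_hits` are hypothetical helpers, and the two sequences are made-up alanine codons (GCN) read in frame from the first base.

```python
CODING_W11_T16 = "1101101101101101"  # W = 11 matching positions, template length t = 16
CONTIGUOUS_W11 = "1" * 11             # an ordinary contiguous 11-letter word

# Hypothetical example: five alanine codons (GCN) that differ only at the third base.
QUERY = "GCTGCTGCTGCTGCTG"
SUBJECT = "GCAGCCGCGGCAGCCG"


def seed_table(seq: str, template: str) -> dict[str, list[int]]:
    """Key each window by the letters under the template's '1' positions."""
    care = [i for i, c in enumerate(template) if c == "1"]
    table: dict[str, list[int]] = {}
    for start in range(len(seq) - len(template) + 1):
        key = "".join(seq[start + i] for i in care)
        table.setdefault(key, []).append(start)
    return table


def seed_hits(query: str, subject: str, template: str) -> list[tuple[int, int]]:
    """(query_start, subject_start) pairs whose template-selected letters agree."""
    q, s = seed_table(query, template), seed_table(subject, template)
    return [(qi, si) for key, q_starts in q.items() if key in s
            for qi in q_starts for si in s[key]]


if __name__ == "__main__":
    print("discontiguous:", seed_hits(QUERY, SUBJECT, CODING_W11_T16))
    print("contiguous:   ", seed_hits(QUERY, SUBJECT, CONTIGUOUS_W11))
```

The non-coding templates work the same way; only the placement of the ignored positions changes.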
Templates for Discontiguous Words
W = word size (number of matching positions in the template); t = template length

W   t   Coding                  Non-coding
11  16  1101101101101101        1110010110110111
12  16  1111101101101101        1110110110110111
11  18  101101100101101101      111010010110010111
12  18  101101101101101101      111010110010110111
11  21  100101100101100101101   111010010100010010111
12  21  100101101101100101101   111010010110010010111

Reference: Ma B, Tromp J, Li M. PatternHunter: faster and more sensitive homology search. Bioinformatics. 2002 Mar;18(3):440-5.

Discontiguous (Cross-species) MegaBLAST
[Screenshot: discontiguous word options]

MegaBLAST vs Discontiguous MegaBLAST
Query: NM_017460, Homo sapiens cytochrome P450, family 3, subfamily A, polypeptide 4 (CYP3A4), transcript variant 1, mRNA (2768 letters), vs Drosophila
- MegaBLAST: "No significant similarity found."
- Discontiguous MegaBLAST: [hits shown in the screenshot]

Another Example
Query: NM_078651 Drosophila melanogaster CG18582-PA (mbt) mRNA (3244 bp); /note="mushroom bodies tiny; synonyms: Pak2, STE20, dPAK2"
Database: nr (nt), limited to Mammalia[orgn]
- MegaBLAST: "No significant similarity found."
- Discontiguous MegaBLAST: numerous hits
[Screenshots: discontiguous MegaBLAST and BLASTN results]

PSI-BLAST: Position-Specific Iterated BLAST
Example: confirming relationships of purine nucleotide metabolism proteins
Query:
gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE
MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGFVIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVDEQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAYRTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGAVRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKK
E-value cutoff for inclusion in the PSSM: 0.005

Results: Initial BLASTP
Same results as protein-protein BLAST; different format.

Results of First PSSM Search
Other purine nucleotide metabolizing enzymes not found by ordinary BLAST.

Tenth PSSM Search: Convergence
Just below threshold: another nucleotide metabolism enzyme.
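The PSI-BLAST example iterates: hits below the 0.005 inclusion cutoff are aligned, a PSSM is built from them, and the PSSM becomes the next query until no new sequences are found (convergence). The sketch below shows only the PSSM-building step, in a deliberately simplified form: uniform background frequencies, a flat pseudocount, and no sequence weighting. PSI-BLAST itself additionally weights sequences and derives pseudocounts from a substitution matrix, which this sketch omits; `build_pssm`, `score_segment` and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AA = "ARNDCQEGHILKMFPSTWYV"


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Column-wise log2-odds scores from equal-length aligned hits ('-' is ignored).

    Assumptions: uniform background frequency (1/20), the same pseudocount for
    every residue, and no sequence weighting.
    """
    background = 1 / len(AA)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in AA)
        total = sum(counts.values()) + pseudocount * len(AA)
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                     for aa in AA})
    return pssm


def score_segment(pssm: list[dict[str, float]], segment: str) -> float:
    """Score an ungapped segment against the PSSM, one column per residue."""
    return sum(column[residue] for column, residue in zip(pssm, segment))


if __name__ == "__main__":
    toy_hits = ["HRD", "HKD", "HRE"]  # hypothetical aligned hits
    pssm = build_pssm(toy_hits)
    for segment in ("HRD", "HKE", "AAA"):
        print(segment, round(score_segment(pssm, segment), 2))
```

Residues seen often in a column get positive scores and unseen residues get negative ones, which is what lets the next iteration pick up family members that plain BLOSUM62 scoring misses.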
Reverse PSI-BLAST (RPS-BLAST)
Adenosine/AMP deaminase domain

PHI-BLAST
Query:
gi|231729|sp|P30429|CED4_CAEEL CELL DEATH PROTEIN 4
MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRIYRRQASELIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGNVPKQMTCYIREYHVIKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDILKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVFDDVVQEETIRWAQELRLRCLVTTRDVEIASQTCEFIEVTSLEIDECYDFLEAYGMPMPVGEKEEDVLNKTIELSSGNPATLMMFFKSCEPKTFEK
Pattern: [GA]xxxxGK[ST]

What's New?

BLAST Databases
Nucleotide:
- refseq_rna = NM_*, XM_*
- refseq_genomic = NC_*, NG_*
- env_nt: environmental samples; filter, e.g., 16S rRNA
Protein:
- refseq = NP_*, XP_*
- env_nr

New Formatter
- Select lower case; select red
- Gray line = same database hit; HSPs color-coded independently

BLAST Output: Alignments & Filter
Low-complexity sequence filtered.

Advanced Options: Limit to Organism
Example Entrez queries:
- proteins all[Filter] NOT mammalia[Organism]
- ray finned fishes[Organism]
- srcdb refseq[Properties]
- Nucleotide only: biomol mrna[Properties], biomol genomic[Properties]
Other advanced: -e 10000 (expect value), -v 2000 (descriptions), -b 2000 (alignments)

Genome BLAST Examples
Example search pathway: hemochromatosis
Gene ("hemochromatosis") -> nucleotide sequence -> sample
Example: Human Genome BLAST
- Human Genome BLAST: results
- Human Genome BLAST: MapViewer

Map Oligos onto the Genome
Query: CCATGGCGACCCTGGAAAAGCNNNNNNNNNNCAGCAGCGGCTGTGCCTGCGG
Options: -W 7 -e 1000
- Genome BLAST results: primer alignments (forward primer, reverse primer)
- MapViewer sequence view (sv): forward and reverse primers

A Few Good Files
- ftp://ftp.ncbi.nih.gov/refseq/LocusLink/LL_tmpl.gz
- ftp://ftp.ncbi.nih.gov/refseq/release/
- ftp://ftp.ncbi.nih.gov/repository/UniGene/Hs.seq.uniq.gz
- cdd/
- ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/ncbi.tar.gz
- ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz

Service Addresses
- BLAST
- General Help
- Wayne Matten
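PHI-BLAST restricts the search to alignments that contain a pattern, and the pattern must occur in the query. The sketch below performs only that pattern-matching step on the CED-4 query, with the Walker A (P-loop) motif written as [GA]xxxxGK[ST]; it treats `x` as any residue and `[..]` as a set of allowed residues, which is an assumption about the compact notation on the slide. `pattern_to_regex` and `find_pattern` are hypothetical helpers, and no BLAST search is run.

```python
import re

# CED-4 query from the PHI-BLAST slide (gi|231729|sp|P30429|CED4_CAEEL).
CED4 = (
    "MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRI"
    "YRRQASELIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGN"
    "VPKQMTCYIREYHVIKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDS"
    "IVWLKDSGTAPKSTFDLFTDILKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVF"
    "DDVVQEETIRWAQELRLRCLVTTRDVEIASQTCEFIEVTSLEIDECYDFLEAYGMPMPVG"
    "EKEEDVLNKTIELSSGNPATLMMFFKSCEPKTFEK"
)


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: 'x' means any residue; [..] lists the allowed residues."""
    return re.compile(pattern.replace("x", "."))


def find_pattern(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """1-based start positions and matched text of non-overlapping occurrences."""
    return [(m.start() + 1, m.group()) for m in pattern_to_regex(pattern).finditer(sequence)]


if __name__ == "__main__":
    print(find_pattern("[GA]xxxxGK[ST]", CED4))
```

In this query the motif occurs once (GRAGSGKS); the same letters without brackets, GAxxxxGKST, would match nothing, so the bracketed alternatives are essential.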