Dragon Star Program Course: Information Retrieval, Course Overview & Background (CourseOverviewBackground.ppt, 65 pages)
Dragon Star Program Course: Information Retrieval
Course Overview & Background
ChengXiang Zhai (翟成祥)
Department of Computer Science
Graduate School of Library & Information Science
Institute for Genomic Biology, Statistics
University of Illinois, Urbana-Champaign
http://www-faculty.cs.uiuc.edu/czhai, czhai@cs.uiuc.edu
2008 ChengXiang Zhai, Dragon Star Lecture at Beijing University, June 21-30, 2008

Outline
- Course overview
- Essential background
  - Probability & statistics
  - Basic concepts in information theory
  - Natural language processing

Course Overview

Course Objectives
- Introduce the field of information retrieval (IR)
  - Foundation: basic concepts, principles, methods, etc.
  - Trends: frontier topics
- Prepare students to do research in IR and/or related fields
  - Research methodology (general and IR-specific)
  - Research proposal writing
  - Research project (to be finished after the lecture period)

Prerequisites
- Proficiency in programming (C++ is needed for assignments)
- Knowledge of basic probability & statistics (necessary for understanding the algorithms in depth)
- Big plus: knowledge of related areas
  - Machine learning
  - Natural language processing
  - Data mining

Course Management
- Teaching staff
  - Instructor: ChengXiang Zhai (UIUC)
  - Teaching assistants: Hongfei Yan (Peking Univ), Bo Peng (Peking Univ)
- Course website: http:/
- Group discussion: http:/
- Post questions on the group discussion forum first; if they remain unanswered, bring them to office hours (first office hour: June 23, 2:30-4:30pm)

Format & Requirements
- Lecture-based
  - Morning lectures: Foundation & Trends
  - Afternoon lectures: IR research methodology
  - Readings are usually available online
- 2 assignments (based on morning lectures)
  - Coding (C++), experimenting with data, analyzing results, open explorations (5 hours each)
- Final exam (based on morning lectures): 1:30-4:30pm, June 30
  - Practice questions will be available

Format & Requirements (cont.)
- Course project (Mini-TREC)
  - Work in teams
  - Phase I: create test collections (3 hours, done within the lecture period)
  - Phase II: develop algorithms and submit results (done in the summer)
- Research project proposal (based on afternoon lectures)
  - Work in teams
  - 2-page outline done within the lecture period
  - Full proposal (5 pages) due later

Coverage of Topics: IR vs. TIM
[Diagram labels: Text Information Management (TIM); Information Retrieval (IR); Multimedia, etc.]
- IR and TIM will be used interchangeably

What is Text Info. Management?
- TIM is concerned with technologies for managing and exploiting text information effectively and efficiently
- Importance of managing text information
  - The most natural way of encoding knowledge: think about scientific literature
  - The most common type of information: how much textual information do you produce and consume every day?
  - The most basic form of information: it can be used to describe other media of information
  - The most useful form of information!

Text Management Applications
- Access: select information
- Mining: create knowledge
- Organization: add structure/annotations

Examples of Text Management Applications
- Search
  - Web search engines (Google, Yahoo, ...)
  - Library systems
- Recommendation
  - News filter
  - Literature/movie recommender
- Categorization
  - Automatically sorting emails
- Mining/Extraction
  - Discovering major complaints from email in customer service
  - Business intelligence
  - Bioinformatics
- Many others

Elements of Text Info Management Technologies
[Diagram labels: Text; Natural Language Content Analysis; Search; Filtering; Categorization; Summarization; Clustering; Extraction; Mining; Visualization; Retrieval Applications; Mining Applications; Information Access; Knowledge Acquisition; Information Organization]

Text Management and Other Areas
[Diagram labels: TM Algorithms; TM Applications; User; Text; Storage; Compression; Probabilistic inference; Machine learning; Natural language processing; Human-computer interaction; Software engineering; Web; Computer science; Information Science]

Related Areas
[Diagram labels: Information Retrieval; Databases; Library & Info Science; Machine Learning; Pattern Recognition; Data Mining; Natural Language Processing; Applications (Web, Bioinformatics); Statistics; Optimization; Software engineering; Computer systems; Models; Algorithms; Applications; Systems]
Publications/Societies (Incomplete)
[Diagram labels. Areas: Learning/Mining; NLP; Applications; Statistics; Software/systems; Info. Science; Info Retrieval; Databases. Venues: ACM SIGIR, VLDB, PODS, ICDE, ASIS, COLING, EMNLP, ANLP, HLT, ICML, NIPS, UAI, RECOMB, PSB, JCDL, ACM CIKM, ACM SIGMOD, ACL, AAAI, ACM SIGKDD, ISMB, WWW, SOSP, OSDI, TREC]

Schedule: available at http:/

Essential Background 1: Probability & Statistics

Prob/Statistics & Text Management
- Probability & statistics provide a principled way to quantify the uncertainties associated with natural language
- They allow us to answer questions like:
  - Given that we observe "baseball" three times and "game" once in a news article, how likely is it that the article is about "sports"? (text categorization, information retrieval)
  - Given that a user is interested in sports news, how likely is the user to use "baseball" in a query? (information retrieval)
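As a rough illustration of the first question, the Python sketch below scores the observed word counts under a bag-of-words (multinomial unigram) model for each topic. It then applies Bayes' rule, which the probability slides introduce. The second topic "politics", the word probabilities in `WORD_PROBS` and the priors in `PRIORS` are all invented for illustration; none of them come from the lecture.

```python
import math

# Hypothetical unigram models P(word | topic); made-up numbers for illustration only.
WORD_PROBS = {
    "sports":   {"baseball": 0.02,   "game": 0.01},
    "politics": {"baseball": 0.0005, "game": 0.002},
}
# Hypothetical prior P(topic).
PRIORS = {"sports": 0.3, "politics": 0.7}

def topic_posterior(counts):
    """Return P(topic | counts) for each topic under a multinomial bag-of-words model.

    Every word in `counts` must appear in WORD_PROBS. The multinomial coefficient
    is identical for all topics, so it cancels during normalization and is omitted.
    """
    log_scores = {}
    for topic, probs in WORD_PROBS.items():
        log_score = math.log(PRIORS[topic])
        for word, c in counts.items():
            log_score += c * math.log(probs[word])  # c occurrences of `word`
        log_scores[topic] = log_score
    # Normalize in log space (subtract the max) for numerical stability.
    m = max(log_scores.values())
    unnorm = {t: math.exp(s - m) for t, s in log_scores.items()}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}

posterior = topic_posterior({"baseball": 3, "game": 1})
print(posterior)
```

With these made-up numbers the posterior is dominated by "sports", because "baseball" is assumed to be 40 times more likely under that topic. Under the same hypothetical model, the second question reduces to reading off P("baseball" | sports).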
Basic Concepts in Probability
- Random experiment: an experiment with an uncertain outcome (e.g., tossing a coin, picking a word from text)
- Sample space: all possible outcomes, e.g., tossing 2 fair coins, $S = \{HH, HT, TH, TT\}$
- Event: $E \subseteq S$; $E$ happens iff the outcome is in $E$, e.g.,
  - $E = \{HH\}$ (all heads)
  - $E = \{HH, TT\}$ (same face)
  - Impossible event ($\emptyset$), certain event ($S$)
- Probability of an event: $0 \le P(E) \le 1$, such that
  - $P(S) = 1$ (the outcome is always in $S$)
  - $P(A \cup B) = P(A) + P(B)$ if $A \cap B = \emptyset$ (e.g., $A$ = same face, $B$ = different face)

Basic Concepts of Prob. (cont.)
- Conditional probability: $P(B \mid A) = P(A \cap B) / P(A)$
  - $P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$
  - So, $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$ (Bayes' rule)
  - For independent events, $P(A \cap B) = P(A)\,P(B)$, so $P(A \mid B) = P(A)$
- Total probability: if $A_1, \ldots, A_n$ form a partition of $S$, then
  - $P(B) = P(B \cap S) = P(B \cap A_1) + \cdots + P(B \cap A_n)$ (why?)
  - So, $P(A_i \mid B) = \dfrac{P(B \mid A_i)\,P(A_i)}{P(B)} = \dfrac{P(B \mid A_i)\,P(A_i)}{P(B \mid A_1)\,P(A_1) + \cdots + P(B \mid A_n)\,P(A_n)}$
  - This allows us to compute $P(A_i \mid B)$ based on $P(B \mid A_i)$

Interpretation of Bayes Rule
- Hypothesis space: $H = \{H_1, \ldots, H_n\}$
- Evidence: $E$
- If we want to pick the most likely hypothesis $H^*$, we can drop $P(E)$, P…
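The definitions on these slides can be checked mechanically. Below is a minimal Python sketch that assumes equally likely outcomes. The helper names `P` and `P_given` and the event names are illustrative and do not come from the lecture. The sketch proceeds in four steps:
- it enumerates the two-coin sample space;
- it verifies additivity and independence;
- it applies total probability over the partition {same face, different face};
- it uses Bayes' rule to pick the most likely hypothesis given the evidence "all heads".

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two fair coins: S = {HH, HT, TH, TT}.
S = frozenset("".join(t) for t in product("HT", repeat=2))

def P(event):
    """P(E) for an event E ⊆ S, assuming all outcomes are equally likely (fair coins)."""
    assert event <= S, "an event must be a subset of the sample space"
    return Fraction(len(event), len(S))

def P_given(b, a):
    """Conditional probability P(B | A) = P(A ∩ B) / P(A), defined when P(A) > 0."""
    return P(a & b) / P(a)

same_face = frozenset({"HH", "TT"})
diff_face = S - same_face            # disjoint from same_face; together they partition S
all_heads = frozenset({"HH"})
first_heads = frozenset({"HH", "HT"})
second_heads = frozenset({"HH", "TH"})

# Axioms: P(S) = 1, P(empty event) = 0, additivity for disjoint events.
assert P(S) == 1 and P(frozenset()) == 0
assert P(same_face | diff_face) == P(same_face) + P(diff_face)

# Independence of the two tosses: P(A ∩ B) = P(A) P(B), hence P(A | B) = P(A).
assert P(first_heads & second_heads) == P(first_heads) * P(second_heads)
assert P_given(first_heads, second_heads) == P(first_heads)

# Total probability over the partition {same face, different face}.
partition = {"same face": same_face, "different face": diff_face}
p_evidence = sum(P_given(all_heads, a) * P(a) for a in partition.values())
assert p_evidence == P(all_heads)

# Bayes' rule: P(A_i | B) = P(B | A_i) P(A_i) / sum_j P(B | A_j) P(A_j).
posterior = {name: P_given(all_heads, a) * P(a) / p_evidence
             for name, a in partition.items()}
assert posterior["same face"] == P_given(same_face, all_heads)  # matches the direct definition
print(posterior)

# Most likely hypothesis H*: P(E) is the same for every hypothesis,
# so ranking by P(E | H) P(H) alone picks the same winner.
h_star = max(partition,
             key=lambda name: P_given(all_heads, partition[name]) * P(partition[name]))
print(h_star)
```

Exact `Fraction` arithmetic is used so that the identities can be compared with `==` instead of a floating-point tolerance.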