Advanced Search Methods: Graduation Thesis Foreign-Literature Translation
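Appendix A: Translation

Advanced search methods

A search for key words such as “linear” and “algebra” can easily turn up hundreds of documents, some of which may have nothing to do with linear algebra. If we increase the number of search words and require that all of them be matched, we run the risk of excluding some crucial linear algebra documents. Rather than matching every word of the expanded search list, our database search should give priority to documents that match most of the key words with high relative frequencies. To do this, we need to find the columns of the database matrix A that are closest to the search vector x. One way to measure how close two vectors are is to define the angle between them, which we will do in Section 1 of Chapter 5.

After we have learned about the singular value decomposition (Chapter 6, Section 5), we will return to this information retrieval application. The decomposition can be used to find a simpler approximation to the database matrix, which speeds up searches dramatically. It often has the added advantage of filtering out noise: using the approximate database matrix may automatically eliminate documents that use the key words in unwanted contexts. For example, a dental student and a mathematics student might both use calculus as one of their search words. A mathematics search using the approximate database matrix is likely to eliminate all documents relating to dentistry; likewise, mathematics documents would be filtered out of the dental student's search.

Web search and page ranking

Modern Web searches can easily involve billions of documents with hundreds of thousands of key words. Indeed, as of March 2004 there were more than 4 billion Web pages on the Internet, and it is not uncommon for search engines to acquire or update as many as 10 billion Web pages in a single day. Although the database matrix for Internet pages is extremely large, searches can be simplified dramatically because the matrices and search vectors are sparse; that is, most of the entries in any column are 0.

For Internet searches, the better search engines do simple matching searches to find all pages containing the key words, but they do not order the results by the relative frequency of the key words. Because of the commercial nature of the Internet, people who want to sell products may deliberately repeat key words so that their Web sites rank highly in any relative-frequency search. In fact, it is easy to list a key word covertly hundreds of times: if the font color of the word matches the background color of the page, the viewer will not notice the repetition. Web searches therefore need a more sophisticated algorithm for ranking the pages that contain all of the key search words. One approach uses a matrix model for assigning probabilities in certain random processes; this type of model is called a Markov process or Markov chain. In Section 3 of Chapter 6 we will see how to use Markov chains to model Web surfing and obtain rankings of Web pages.

Relative frequency searches

Searches of noncommercial databases generally find all documents containing the key search words and then order them by relative frequency. In this case, the entries of the database matrix should represent the relative frequencies of the key words in the documents. For example, suppose that in the alphabetically ordered dictionary of all key words of the database, the 6th word is algebra and the 8th word is applied. If document 9 in the database contains a total of 200 occurrences of dictionary key words, with algebra occurring 10 times and applied 6 times, then the relative frequencies of these words are 10/200 and 6/200, and these values are the corresponding entries of the database matrix.

Appendix B: Original text

Advanced search methods

A search for key words such as linear and algebra could easily turn up hundreds of documents, some of which may not even be about linear algebra. If we were to increase the number of search words and require that all search words be matched, then we could run a risk of excluding some crucial linear algebra documents. Rather than match all words of the expanded search list, our database search should give priority to those documents that match most of the key words with high relative frequencies. To accomplish this, we need to find the columns of the database matrix A that are “closest” to the search vector x. One way to measure how close two vectors are is to define the angle between the vectors. We will do this in Section 1 of Chapter 5.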
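The “closest column” idea can be illustrated with a minimal Python sketch. It assumes that closeness is scored by the cosine of the angle between a document column and the search vector x (a larger cosine means a smaller angle), with rows of A indexing key words and columns indexing documents. The four-word dictionary, the matrix entries, and the helper names `cosine` and `rank_documents` are hypothetical and exist only for illustration.

```python
import math


def cosine(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(A, x):
    """Return (document index, score) pairs sorted by decreasing cosine.

    A is stored as a list of rows (one row per key word), so column j
    of A describes document j.
    """
    n_docs = len(A[0])
    scores = []
    for j in range(n_docs):
        column = [row[j] for row in A]
        scores.append((j, cosine(column, x)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical dictionary (rows): calculus, dentistry, linear, algebra.
# Three hypothetical documents (columns); entries are relative frequencies.
A = [
    [0.0, 0.5, 0.1],  # calculus
    [0.0, 0.5, 0.0],  # dentistry
    [0.6, 0.0, 0.4],  # linear
    [0.4, 0.0, 0.5],  # algebra
]
x = [0, 0, 1, 1]  # search vector for "linear" and "algebra"

for doc, score in rank_documents(A, x):
    print(f"document {doc}: cos = {score:.3f}")
```

Document 1, which contains neither search word, scores 0 and is ranked last; documents 0 and 2 score almost equally because both columns point in nearly the same direction as x.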
We will also revisit the information retrieval application after we have learned about the singular value decomposition (Chapter 6, Section 5). This decomposition can be used to find a simpler approximation to the database matrix, which will speed up the searches dramatically. Often it has the added advantage of filtering out noise; that is, using the approximate version of the database matrix may automatically have the effect of eliminating documents that use key words in unwanted contexts. For example, a dental student and a mathematics student could both use calculus as one of their search words. Since the other words on the mathematics student's search list are mathematical terms, a mathematics search using an approximate database matrix is likely to eliminate all documents relating to dentistry. Similarly, the mathematics documents would be filtered out of the dental student's search.

Web search and page ranking

Modern Web searches could easily involve billions of documents with hundreds of thousands of key words. Indeed, as of March 2004, there were more than 4 billion Web pages on the Internet, and it is not uncommon for search engines to acquire or update as many as 10 billion Web pages in a single day. Although the database matrix for pages on the Internet is extremely large, searches can be simplified dramatically since the matrices and search vectors are sparse; that is, most of the entries in any column are 0s.

For Internet searches, the better search engines will do simple matching searches to find all pages matching the key words, but they will not order them on the basis of the relative frequency of the key words. Because of the commercial nature of the Internet, people who want to sell products may deliberately make repeated use of key words to ensure that their Web site is highly ranked in any relative frequency search. In fact, it is easy to surreptitiously list a key word hundreds of times.
If the font color of the word matches the background color of the page, then the viewer will not be aware that the word is listed repeatedly.

For Web searches, a more sophisticated algorithm is necessary for ranking the pages that contain all of the key search words. One approach uses a matrix model for assigning probabilities in certain random processes. This type of model is referred to as a Markov process or a Markov chain. In Section 3 of Chapter 6 we will see how to use Markov chains to model Web surfing and obtain rankings of Web pages.

Relative frequency searches

Searches of noncommercial databases generally find all documents containing the key search words and then order the documents based on the relative frequencies of those words. In this case the entries of the database matrix should represent the relative frequencies of the key words in the documents. For example, suppose that in the dictionary of all key words of the database the 6th word is algebra and the 8th word is applied, where all words are listed alphabetically. If, say, document 9 in the database contains a total of 200 occurrences of key words from the dictionary, and if the word algebra occurred 10 times in the document and the word applied occurred 6 times, then the relative frequencies for these words would be 10/200 and 6/200, and the corresponding entries in the database matrix would be $a_{69} = 10/200 = 0.05$ and $a_{89} = 6/200 = 0.03$.
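The relative-frequency entries in the example above can be computed mechanically. The following is a minimal Python sketch, assuming each document is available as a word-count mapping and that matrix rows follow the alphabetical order of the dictionary. Only the counts for algebra (10) and applied (6) and the total of 200 come from the text; the other dictionary words, their counts, and the helper name `relative_frequency_column` are hypothetical.

```python
from fractions import Fraction

# Hypothetical dictionary of key words. Sorting keeps it alphabetical, which
# places "algebra" at position 6 and "applied" at position 8 (1-based).
DICTIONARY = sorted([
    "abelian", "absolute", "accuracy", "addition", "adjoint",
    "algebra", "angle", "applied", "basis", "calculus",
])


def relative_frequency_column(word_counts, dictionary=DICTIONARY):
    """Build one column of the database matrix.

    Entry i is the number of occurrences of dictionary word i in the
    document divided by the total number of dictionary-word occurrences.
    Words that are not in the dictionary are ignored.
    """
    total = sum(word_counts.get(word, 0) for word in dictionary)
    if total == 0:
        return [Fraction(0)] * len(dictionary)
    return [Fraction(word_counts.get(word, 0), total) for word in dictionary]


# Document 9: the counts for "algebra" (10) and "applied" (6) and the total
# of 200 follow the text; the other counts are made-up filler summing to 184.
doc9_counts = {"algebra": 10, "applied": 6, "angle": 50, "basis": 70, "calculus": 64}
column9 = relative_frequency_column(doc9_counts)

for word in ("algebra", "applied"):
    row = DICTIONARY.index(word) + 1  # 1-based row number, as in the text
    entry = column9[row - 1]
    print(f"a_{row},9 = {entry} = {float(entry)}")
```

Keeping the entries as `Fraction` objects makes 10/200 and 6/200 exact (they reduce to 1/20 and 3/100); `float(...)` converts them to the decimal entries 0.05 and 0.03. By construction, every nonzero column sums to 1.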