Bioinformatics Databases (生物信息学数据库.ppt)
Slide deck shared by a site member on 三一办公 and readable online (225 pages).
Databases for Bioinformatics
Chen Yanjiong (陈艳炯), Department of Immunology and Pathogenic Biology, School of Medicine

Fundamentals of database systems
- Basic concepts of databases
- Development of data management systems
- Development of database technology
- Components of a database system
- Architecture of database application systems

Data
- Definition: symbolic records that describe objective things (objects).
- Kinds: text, graphics, images, sound.
- Characteristic: data cannot be separated from their semantics.

Data
The term data means groups of information that represent the qualitative or quantitative attributes of a variable or set of variables. Data (plural of datum, which is seldom used) are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which information and knowledge are derived.

How the concept of data has changed
- In kind: from simple to integrated; from private to shared.
- In quantity: from small volumes to large volumes to massive volumes.
- In role: from a subordinate position within software to a dominant one.

Information
Information is an abstract reflection of the things, events, and concepts that actually exist in the objective world, with data as its carrier.
Information = data + data processing

Data processing
Computer data processing is any process that uses a computer program to enter data and summarise, analyse or otherwise convert data into usable information. The process may be automated and run on a computer. It involves recording, analysing, sorting, summarising, calculating, disseminating and storing data. Because data are most useful when well-presented and actually informative, data-processing systems are often referred to as information systems.

Data analysis
When the domain from which the data are harvested is a science or an engineering discipline, "data processing" and "information systems" are considered too broad as terms, and the more specialized term "data analysis" is typically used. It focuses on the highly specialized and highly accurate algorithmic derivations and statistical calculations that are less often seen in the typical general business environment. Data analysis packages such as DAP, gretl or PSPP are often used.

Elements of data processing
In order to be processed by a computer, data first need to be converted into a machine-readable format. Once data are in digital format, various procedures can be applied to them to get useful information. Data processing may involve various processes, including:
- Data acquisition
- Data entry
- Data cleaning
- Data validation
- Data tabulation
- Statistical analysis
- Computer graphics
- Data warehousing
- Data mining

Data acquisition
In computer data processing, data acquisition is the sampling of real-world physical conditions and the conversion of the resulting samples into digital numeric values that can be manipulated by a computer. The components of data acquisition systems include:
- Sensors that convert physical parameters to electrical signals.
- Signal conditioning circuitry that converts sensor signals into a form that can be converted to digital values.
- Analog-to-digital converters, which convert conditioned sensor signals to digital values.
Depending on the application, acquired data may be displayed, analyzed, or recorded, or some combination thereof. Data acquisition applications may be controlled by commercial DAQ software or by custom programs developed in general-purpose programming languages such as BASIC or C. Specialized software environments used for data acquisition include EPICS, for building large-scale data acquisition systems; LabVIEW, which offers a graphical programming environment; and MATLAB, which provides graphical tools and libraries for data acquisition and analysis.

Data cleansing, or data scrubbing, is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying or deleting this dirty data. After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been caused by different data dictionary definitions of similar entities in different stores, by user entry errors, or by corruption in transmission or storage. Data cleansing differs from data validation in that validation almost invariably means data is rejected from the system at entry, and is performed at entry time rather than on batches of data. The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities. The validation may be strict (such as rejecting any address that does not have a valid postal code) or fuzzy (such as correcting records that partially match existing, known records).

A data entry clerk is a member of staff who reads hand-written or printed records and types them into a computer. They are sometimes employed on a temporary basis, but most large companies that hold large amounts of data hire them on a near-permanent basis.

In computer science, data validation is the process of ensuring that a program operates on clean, correct and useful data. It uses routines, often called validation rules or check routines, that check the correctness, meaningfulness, and security of data that are input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by the inclusion of explicit validation logic in the application program. Incorrect data validation can lead to data corruption or a security vulnerability. Data validation checks that data are valid, sensible, reasonable, and secure before they are processed.

Computer graphics are graphics created using computers and, more generally, the representation and manipulation of pictorial data by a computer. The development of computer graphics, often referred to simply as CG, has made computers easier to interact with and better for understanding and interpreting many types of data. Developments in computer graphics have had a profound impact on many types of media and have revolutionized the animation and video game industry.

Data mining is the process of extracting patterns from data. As more data are gathered, with the amount of data doubling every three years, data mining is becoming an increasingly important tool for transforming these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.

In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently. The logical representation and physical storage of a data structure are reflected in the data's logical structure, its storage structure, the methods (algorithms) used to process the data, and the results of that processing.

The two main structures of a database are TABLES and INDEXES. Tables are the structures that store your data in the database. Each table is composed of a number of FIELDS, also known as COLUMNS in some database engines. Indexes do not store data, and you do not use them directly. They are used internally by the database engine to speed up certain search operations.

Field names and types are defined when you create a table.

In order to create an index you have to define the table and the field to be indexed, and the indexing order (ascending or descending). Indexes can also be UNIQUE, in which case the indexed field does not allow duplicate data to be inserted in different records or rows. For example, you could not have two employees with the same userid value if the userid field is indexed as UNIQUE.

Data manipulation
- Inserting, deleting and updating data.
- Operations include classifying, merging, sorting, accessing, retrieving, input, output, and updating (which covers insertion, deletion, and modification).
A data manipulation language (DML) is a family of syntax elements, similar to a computer programming language, used for inserting, deleting and updating data in a database. Examples include Structured Query Language (SQL), which is used to retrieve and manipulate data in a relational database, and the DMLs used by IMS/DL/I and by CODASYL databases such as IDMS.

Data management
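
As a small illustration of managing data in a database, the table, UNIQUE index, and DML ideas described above can be sketched with Python's built-in sqlite3 module. This is only a minimal sketch under stated assumptions: the slides do not name a database engine, so SQLite is a stand-in, and the table name employees, the index name idx_employees_userid, the columns other than userid, and the sample rows are all hypothetical.

```python
import sqlite3


def run_demo():
    """Create a table and a UNIQUE index, then insert, update and delete rows."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # Field (column) names and types are defined when the table is created.
    # The table and its columns are hypothetical, chosen for illustration.
    cur.execute("CREATE TABLE employees (userid TEXT, name TEXT, dept TEXT)")

    # An index names the table, the field to index, and the order (ASC/DESC).
    # UNIQUE makes the engine reject duplicate userid values.
    cur.execute(
        "CREATE UNIQUE INDEX idx_employees_userid ON employees (userid ASC)"
    )

    # DML: insert rows.
    cur.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [("u001", "Alice", "Immunology"), ("u002", "Bob", "Genomics")],
    )

    # A second employee with an existing userid violates the UNIQUE index.
    try:
        cur.execute(
            "INSERT INTO employees VALUES (?, ?, ?)",
            ("u001", "Carol", "Genomics"),
        )
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    # DML: update one row, then delete another.
    cur.execute(
        "UPDATE employees SET dept = ? WHERE userid = ?", ("Proteomics", "u002")
    )
    cur.execute("DELETE FROM employees WHERE userid = ?", ("u001",))

    # Retrieval: the engine may use the index internally for this lookup order.
    rows = cur.execute(
        "SELECT userid, name, dept FROM employees ORDER BY userid"
    ).fetchall()
    conn.close()
    return duplicate_rejected, rows


if __name__ == "__main__":
    print(run_demo())
```

The index is never queried directly; it only constrains inserts and may speed up lookups on userid, which matches the description of indexes as structures used internally by the engine.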
- Keywords:
- bioinformatics databases
Link: https://www.31ppt.com/p-5368459.html