甲骨文《数据仓库概念》28页.ppt
,Xu Xin,Presales ConsultantOracle(China)Co.,Ltd.,数据仓库的概念,What is.,数据仓库(Data Warehouse)/数据集市(Data Mart)决策支持系统(Decision Support System)联机分析处理(OLAP)/ROLAP/MOLAP元数据(Meta Data)分析指标(Measure)/维(Dimension)星型模型(Star Schema)/雪花模型(Snow Schema)数据钻入/数据钻出(Drill Down/Drill Up)表旋转(Table Rotation)数据挖掘(Data Mining),数据仓库几大功能,Query/ReportDrill up/Drill DownCompareExceptionForcast,WhatifData Mining,数据仓库实施方法,建立数据仓库需要考虑的因素,扩展性 灵活性集成性可靠性,数据仓库专家的建议,需要业务人员的积极参与通过原型设计验证需求确定数据仓库的范围,不要试图Warehouse所有数据为不同需求选择合适工具控制风险利用外部Consultant的经验重点放在不同系统的集成,建立数据仓库举例,Use a Building Estate OLTP database as an example to illustrate the concepts and how to build a successful Data Warehouse which used to check and forecast the rental rate and sell amount in Hong Kong.,步骤1:确定数据仓库的问题范围,列出4月份香港地区每日房屋销售情况找出销售额大于4百万的居民住宅项目比较 Whampoo 和 Kornhill 地区上月销售情况找出售屋数量最多的前3个地区截止到当月的累计销售数量用图表反映最佳销售模式时间序列分析,确定数据仓库的问题范围,确定业务需求和用户需求:用户查询执行的频度系统保留数据的年限 用户主要希望从哪些角度,哪些层次分析数据数据源是哪些系统,步骤2:选择合适的软硬件平台,可靠的供应商 数据建模和管理工具易用性开放集中管理性能并行处理,选择数据库平台的依据:,前3位的考虑因素:易用性92.4%集中管理65.2%可靠的供应商65.2%,数据仓库的考虑因素,(Source:Data Warehouse Institute-February 96),MOLAP 还是 ROLAP?,ROLAP 和 MOLAP的功能区别,TransactionSystems,DecisionSupportSystems,Strategic,Tactical,MDB,RDBMS,Data Cache,linkage,步骤3:根据需要创建新的实体,#Code_no,No_of_transaction,Constructor_ID,Developer_ID,Building date,Purchase date,Purchase price,Address,Area,Apartment,#Code_no,#Transaction_no,Name/Company,HKID,Contact Phone#,Contact Address,Purchase Date,Purchase Price,Owner,#Code_no,#Flat,#Transaction_no,Name,HKID,Occupy_type(P,R),Contact Phone#,Contact Address,Date,Price,Occupant,Contractor_ID,Company Name,Address,Contact Phone#,Constructor,#Code_no,#Flat,No_of_trans,Type,Floor,Area(Building),Area(Actual),Flat Details,Developer_ID,Company Name,Address,Contact Phone#,Developer,Day,Month,Quarter,Year,Time,Territory,District,Region,Building/Estate,Geographic,Location,Type,Size,Area,Housing Types,步骤 4:确定维表删除不必要的表,步骤 5:建立层次结构,Date,1-Jan-94,13-Jun-95,12-Jan-96,12-Apr-96,15-Apr-96,20-Oct-96,20-Oct-96,12-Dec-96,1-Jan-97,31-Mar-97,15-Apr-97,?.,Time,Year,Quarter,Month,Day,Time Hierarchy,步骤 6:确定属性,Type Size AreaClass:Attributes of Housing Type,Housing Type,Occupant,Housing Type dimension lookup table,Attributes,步骤 7:建立FactTable,确定合适的粒度,Time,Location,Type,Area,Occupant Name,Purchase Price,Rent,?.,Sales Fact Table,步骤 8:建立数据仓库模型,Building Estate OLTP Environment,TimeLocationTypeAreaOccupant NamePurchase PriceRent?.,Sales Fact Table,DayMonthQuarterYear,Time,TerritoryDistrictRegionBuilding/Estate,Geographic Location,TypeSizeArea,Housing Types,#Code_no No_of_transaction Constructor_ID Developer_ID Building date Purchase date Purchase price Address Area,Apartment,#Code_no#Transaction_no Name/Company HKID Contact Phone#Contact AddressPurchase Date Purchase Price,Owner,#Code_no#Flat#Transaction_noNameHKIDOccupy_type(P,R)Contact Phone#Contact AddressDatePrice,Occupant,Contractor_IDCompany NameAddressContact Phone#,Constructor,#Code_no#Flat No_of_trans Housing Type Floor Area(Building)Area(Actual),Flat Details,Developer_IDCompany NameAddressContact Phone#,Developer,Transform,Building Estate Data WarehouseOLAP Environment,步骤 9:数据仓库模型优化,TimeLocationTypeAreaOccupant NamePurchase PriceRent?.,Sales Fact Table,TypeSizeArea,Housing Types,TimeLocationTypeAreaOccupant NamePurchase PriceRent?.,Sales Fact Table,DayMonthQuarterYear,Time,TerritoryDistrictRegionBuilding/Estate,Geographic Location,TypeSizeArea,Housing Types,Star schema,Snowflake schema,数据仓库设计优化的原则,避免数据实时汇总(建立汇总表)减少表连接操作(不要超过3-5个)用ID code作关键字 减少I/O竞争 利用分区技术提高性能和可管理性,估算数据仓库容量的算法,Estimated size of database=98*96*20*1000*0.75=141.12 Mb,步骤 10:从业务系统中抽取数据到数据仓库,数据抽取的要求:可访问各种数据源可满足时间要求 可满足数据转换要求可检测源系统中数据的变化,步骤 11:开发前端应用,步骤12:数据仓库的管理,安全管理备份和恢复高可用性数据时效,