A prototype model for data warehouse security based on metadata.doc
A Prototype Model for Data Warehouse Security Based on MetadataN. Katic1 G. Quirchmayr2 J. Schiefer1 M. Stolba1 A M. Tjoa11Institute of Software Technology (E188)Vienna University of TechnologyResselgasse 3/188, A-1040 ViennaAustriakatic, stolba, js, tjoa ifs.tuwien.ac.at 2Institute of Applied Computer Science and Information SystemsUniversity of ViennaLiebiggasse 4, A-1010 ViennaAustriaAbstractThe aim of this paper is to give an overview of security relevant aspects of existing OLAP/Data Warehouse solutions, an area which has seen rather little interest from product developers and is only beginning to be discussed in the research community. Following this description of the current situation, a metadata driven approach implemented as part of the WWW-EIS-DWH project is presented in detail. The prototype focuses on the technical realisation and is intended not to be open for use in different security policies.1. IntroductionFor a wide range of companies, in both private and public sectors, competitiveness and effectiveness depend on the quality of decision making; so it is of no surprise that many are looking to improve the quality of their decisions by learning from past business transactions and decisions. Accumulated data represent a priceless business asset.Providing analysts with wide access to the mass of corporate data in a data warehouse requires the organisation and integration of heterogeneous data in a heterogeneous environment. Furthermore the production of derived aggregated data of the data warehouse requires the continuous maintenance of integrity.These requirements imply several security issues to ensure that it is just authorised users who benefit from relevant data and that no unauthorised sources are used. Under many jurisdictions it is illegal to merge personal data unless anonymity can be ensured and especially in Europe stricter legislation has been proposed and is under review based on OECD and EC recommendations.The aim of this paper is to discuss requirements and impacts on the selection of an adequate security model for a data warehouse environment. This security model should support such features as controlled access to individual data items, selective encryption and patented security processes. For the right choice of the security model it is important to pay attention to the metadata of the data warehouse. They contain security information such as access rules, classifications of security objects or clearances of security subjects.This paper is composed of two parts: a theoretical part which deals with security in data warehouses in general and part 2, a description of an implementation of a security model prototype for a data warehouse environment based on metadata.2. Data Warehouse & Security A data warehouse is a collection of integrated databases designed to support managerial decision-making and problem-solving functions. It contains both highly detailed and summarised historical data relating to various categories, subjects, or areas 4. All units of data are relevant to appropriate time horizons. The data warehouse is an integral part of the enterprise-wide decision support system and does not ordinarily involve data updating. It empowers end-users to perform data access and analysis. It also gives an organisation certain competitive advantages, such as fostering a culture of information sharing, enabling employees to effectively and efficiently solve dynamic organisational problems, minimising operating costs and maximising revenue, attracting and maintaining market shares, and minimising the impact of employee turnovers. The security requirements of the data warehouse environment are similar to those of other distributed computing systems 3. Thus, having an internal control mechanism to ensure the confidentiality, integrity and availability of data in a distributed environment is of paramount importance. Confidentiality denotes the protection of information from unauthorised disclosure either by direct retrieval or by indirect logical inference. Integrity requires data to be protected from malicious or accidental modification, including the insertion of false data, the contamination of data, and the destruction of data. Availability is the characteristic that ensures data being available to authorised users when they need them. Availability is closely related to integrity. It includes denial of service of a system, i. e. a system is not functioning in accordance with its intended purpose 2.2.1. Security RestrictionsA data warehouse by nature is an open, accessible system. The aim of a data warehouse generally is to make large amounts of data easily accessible to users, thereby enabling them to extract information about the business as a whole. Any security restrictions can be seen as obstacles to that goal, and they become constraints on the design of the warehouse.There may be sound business reasons for any security restrictions applied to the data warehouse, but it is worth noting that they may lead to a potential loss of information. If analysts have restricted access to data in the data warehouse it may be impossible for them to get a complete picture of the trends within the analysed area. Checking security restrictions will of course have its price by affecting the performance of the data warehouse environment, because further security checks require additional CPU cycles and time to perform.2.2. Security RequirementsSecurity requirements describe all security conditions that have to be considered in the data warehouse environment.It is important to determine in an early stage any security requirements that will be enforced in the data warehouse, because they can seriously impair the organisation and design of the warehouse. It is very difficult to add security restrictions after a data warehouse has gone live. So it is important to capture the ultimate security requirements at the beginning and make them part of the system design. The first step for the definition of security requirements is to classify the security objects and security subjects of the data warehouse environment. Security objects can be classified in different ways. Which solution is suitable depends on the security level which should be achieved. Qualified classifications would be classification by sensitivity (public, confidential, top secret) or according to job functions (accounting data, personnel data). As with security objects, there is a number of ways in which security subjects can be classified. We can follow a top-down company view, with users classified by department, section, group, and so on. Another possible classification is role based, with people grouped across departments based on their role. This approach would classify all analysts as one group, irrespective of their department. If each department genuinely accesses different data, it is probably better to design the security access for each department separately. 2.2.1. Legal Requirements. It is vital to consider all legal requirements on the data being stored in the data warehouse. If individual customer data are being held, such as account details in a banking data warehouse, it may be required by law to enforce certain restrictions. In this context the following issues are to be clarified:· Which arrangements have to be made for being allowed to hold legally sensitive data? · Which data are subjected to legal restrictions? · Which separate handling does this data require concerning storage, access and maintenance? · Which analyses may be performed on this data? · If data held online is used for trend analysis, and is therefore held in summarised rather than detailed form, do any legal restrictions apply? · Which data may be used only for the companies own purposes and which data may be passed on third parties? · Can the analysis of legally sensitive data be limited in a way that no legal restrictions apply? These issues explain why the administrator of a data warehouse must have special know-how about the legal and business field in order to identify legally sensitive data and to accordingly limit the access to this data.2.2.2. Audit Requirements. Resulting audit information is the basis for further reviews and examinations in order to test the adequacy of system controls and to recommend any changes in the security policy.Auditing is a security feature that is often mandated by organisation. Given the high volume of data involved in a data warehouse, auditing can cause an extremely heavy overhead on the system. To make up for this overhead more hardware will be needed. Basically the following activities are interesting for auditing:· Connections· Disconnections· Access to data· Change of data· Deletion of dataFor each of these activities it may be necessary to audit success, failure or both. For security reasons the auditing of failures can be particularly important, since it can highlight any attempted unauthorised or fraudulent access.If data access is to be audited, it has to be established whether each access is to be audited separately, or whether it is sufficient to audit the fact that a user has accessed specific tables during a session. This has impacts on the audit information that needs to be held and implicitly avoids both space and I/O overhead.If data changes are being audited, it has to be determined whether it is sufficient to audit the fact that a change occurred, or whether it is required to capture the actual change that was made. 2.2.3. Network Requirements. Network requirements are a further important part of security requirements. For the transfer of data from the source system (usually an operational system) into a data warehouse they must mostly be transmitted over a network. For such a data transfer precautions must be taken, in order to retain the confidentiality and integrity of the data.It must be clarified and proved whether data have to be encoded before transmission into the data warehouse to prevent a manipulation during the transfer. If the data are transmitted for example over a public network, a secure connection between source system and data warehouse has to be constructed to transmit data in encoded form. The expenditure for data encryption and decryption can be very high regarding processing speed and delay. Particularly with large quantities of data this factor can affect the system performance of the source system as well as of the data warehouse system negatively.A further substantial fact is the reliability of the data communication. It should be guaranteed that data are transferred error free from the source system into the data warehouse. Connection interruptions should be prevented as far as possible, since incomplete transfers threaten the integrity of data in the data warehouse. Therefore measures must be taken, which make possible a complete rollback of the entire transfer process in case of an incomplete data transmission. 2.2.4. Increasing comfortability of use through access authorisation. The primary goal of security restrictions and access authorisation in data warehouses is the prevention of disclosure of protected data.However, authorisations in data warehouses have in contrast to operational systems a second important function: Users should and want to see mainly only “their“ (relevant) data of the information system, which are based on a data warehouse, since they then penetrate more directly to the information important for their daily work. The work with the data warehouse system thus becomes more comfortable by the fact that only data are offered, which are important for a certain user.2.3. Metadata & SecurityAn essential part of a data warehouse are metadata - data about the data contained within the data warehouse. Without metadata, locating information contained in the data warehouse becomes a daunting task, akin to searching for a person's telephone number without the aid of a telephone directory. Metadata not only describe the contents of the data warehouse but also provide the user with information useful in judging the quality of the content 1. They might describe each fact contained within the warehouse in terms of when it was last updated, the source of the fact and how it is derived from an organisation's operational systems. Metadata may also identify the hierarchies within dimensions (for example, identifying the sales territories that fall into each region).The role of metadata is rapidly expanding as organisations develop a data warehouse strategy that may result in the creation of operational data stores, integrated data warehouses and multiple data marts. The data warehouse is increasingly multiple databases that have common elements but serve different functions. Metadata must describe the enterprise warehouse even if it is no longer a single database residing on one server. The role of metadata is being redefined as providing data about distributed information resources. Metadata must isolate the user from the complexities of accessing distributed information resources, while facilitating the currency and synchronisation of multiple databases. Failing this, users are confronted with precisely the problems that data warehousing was intended to solve. Different answers to the same question and the resulting lack of confidence in the information obtained are just one example.Metadata management is provided via a metadata repository and accompanying software. Metadata repository management software can be used to map the source data to the target database, generate code for data transformations, integrate and transform the data, and control moving data to the warehouse. This software enables users to specify how the data should be transformed, such as data mapping, conversion, and summarisation.Metadata can also describe security mechanisms in a data warehouse environment. In this case access rules with corresponding information about security objects and subjects are stored as metadata. Security subjects are responsible for changes in the data warehouse and cause information to flow within different objects and subjects. When a user accesses data of the data warehouse, the secure query management layer has to check whether this access is allowed or not. To ensure that, it verifies the corresponding access authorisations by analysing the security metadata.In the following section a prototype is introduced which implements a security model based on metadata. It is an implementation of a security manager for administration, definition and description of users and user groups and a secure query management layer (SQML) which has the task to filter user queries by checking if they are allowed to be performed. The prototype is integrated into the WWW-EIS-DWH infor