Data warehouse or Data warehouse. Collection of data directed to a domain, integrated, non – volatile, and which varies in time . It helps the decision making of the company or organization. A data warehouse is, above all, a company file beyond transactional and operational information , stored in a database designed to favor the efficient analysis and disclosure of data (especially OLAP ).
[ hide ]
- 1 Features
- 2 Definitions
- 1 According to Bill Inmon
- 2 According to Ralph Kimball
- 3 Function
- 4 Structure
- 1 Information Cubes
- 2 Dimensions
- 3 Variables
- 5 Architecture of a Data Warehouse
- 6 Elements that make up a Data Warehouse
- 1 Metadata
- 2 Middleware
- 3 API
- 4 Extraction Mechanisms
- 5 Loading Mechanisms
- 7 Design of a Data Warehouse
- 8 See also
- 9 References
Data storage should not be used with current usage data. Data warehouses often contain large amounts of information that are sometimes subdivided into smaller logical units that are known as data-dependent malls.
Generally, two basic ideas guide the creation of a data warehouse:
- Integration of data from distributed and differently structured databases, facilitating comprehensive overview and comprehensive data warehouse analysis.
- Separation of the data used in daily operations from the data used in the warehouse for the purposes of disclosure, aid in decision making for analysis and control.
Periodically, data is imported from Enterprise Resource Planning Systems ( ERP ) and other business-related software systems into the data warehouse for further transformation. It is common practice to “perform” data before combining it in a warehouse.
Data Warehouse architecture.
The preprocessing program reads the effected data (often a business’s primary OLTP databases), performs qualitative preprocessing or filtering (denormalization, if deemed necessary including), and writes it to the store.
According to Bill Inmon
Bill Inmon was one of the first authors to write on the subject of data warehouses , he defines a data warehouse in terms of the characteristics of the data repository:
- Topic Oriented: The data in the database is organized so that all data elements related to the same event or real-world object are linked together.
- Variant over time: Changes produced in the data over time are recorded so that the reports that can be generated reflect those variations.
- Non-volatile: The information is not modified or deleted, once a data has been stored, it becomes read-only information and is kept for future reference.
- Integrated: The database contains data from all operational systems of the organization, and such data must be consistent.
Inmon defends a top-down methodology when designing a data warehouse, since in this way all corporate data will be considered better. In this methodology, Data marts will be created after the complete data warehouse of the organization has finished.
According to Ralph Kimball
Ralph Kimball is another well-known author on the subject of data warehouses. It defines a data warehouse as: “a copy of the data transactions specifically structured for consultation and analysis”. It was also Kimball who determined that a data warehouse was nothing more than “the union of all the Data marts of an entity”. Therefore, it defends a bottom-up methodology when designing a data warehouse.
A data warehouse must deliver the right information to the right people at the right time in the right format. The data warehouse responds to the needs of knowledgeable users , using decision support systems (DSS), Executive Information Systems (EIS) or tools for consultation or reporting. End users can easily query their data stores without touching or affecting the operation of the system.
The environment of a data warehouse is defined by the sum of the different integrated DataMarts, not only on a physical level but also on a logical level.
An Information Cube or DataMart is a logical view of the raw data , of the data provided by your operations / finance system to the data warehouse with the addition of new dimensions or calculated information. They are called DataMart, because they represent a set of data related to a particular topic such as: sales, operations, human resources, etc., and are available to “customers” who may be interested in it.
This information can be accessed by the Executive (Owner) through “Dynamic Tables” of MS- Excel or customized programs. Pivot Tables allow you to manipulate the views (junctions, filtering, organization) of the information very easily. Information cubes (DataMarts) are produced very quickly. The necessary access security rules apply to them.
Strategic information is classified into: dimensions and variables. The analysis is based on the dimensions and therefore it is called: Multidimensional analysis. Bringing these concepts to a DW, a Data Warehouse is a data collection that is made up of dimensions and variables, understanding as dimensions those elements that participate in the analysis and variables to the values that you want to analyze.
Variables are typically represented by numerical and detailed values for each instance of the object or event measured.
Conversely, dimensions are attributes relative to the variable, and are used to sort, group, or abbreviate their values. The dimensions have a smaller granularity and take as values a set of elements less than that of the variables.
They are attributes related to variables. They are the perspectives of analysis of the variables. They are part of the Dimension Table.
Also called “management indicators”, they are the data that is being analyzed. They are part of the Fact Table. More formally, variables represent some quantifiable or measurable aspect of the objects or events to be analyzed.
Data Warehouse Architecture
The functional blocks correspond to a complete information system that uses a data warehouse:
- Operational level: Contains primitive (operational) data that is being permanently updated, used by traditional operational systems that perform transactional operations.
- Data Warehouse: Contains primitive data corresponding to successive loads of the Data Warehouse and some derived data. Derived data is data generated from primitive data by applying some type of processing (summaries).
- Departmental level (Data Mart): Contains almost exclusively derived data. Each department of the company determines its departmental level with information of interest at that level. It will be the output target on which the data in the warehouse is organized and stored for direct consultation by end users, report developers and other applications.
- Individual level: Contains little data, the result of applying heuristics, statistical processes, etc., to the data contained in the previous level. The individual level is the ultimate goal of a Data Warehouse. From this level the end user will access and different hypotheses may be proposed, as well as navigate through the data contained in the Data Warehouse.
Elements that make up a Data Warehouse
One of the most important components of the architecture of a DW is the Metadata. It is commonly defined as: “data about data”, in the sense that it is data that describes the structure of the data and how they are related.
The Metadata documents exactly, among other things, what tables exist for that application, what columns each table has, and what type of data can be stored. The data is of interest to the end user , the Metadata is of interest to the programs that have to handle this data.
However, the role Metadata plays in a DW environment is very different from the role it plays in operational environments. In a DW environment the Metadata plays a fundamental role.
Middleware’s role is to ensure connectivity between all components of a DW architecture . Middleware can be seen as an API layer, on the basis of which developers can develop applications that work in different environments without worrying about the network and communication protocols in which they will be run. In this way, a better cost / performance ratio is offered that goes through the development of more complex applications, in less time.
Application Programmer Interface. Language and message format used by a program to activate and interact with the functions of another program or a physical equipment. middleware Ensures connectivity between all components of a computing infrastructure. It is the structure to link all applications in an integrated way.
Another component of the architecture of a DW is OLAP systems. These types of systems are aimed at carrying out strategic analyzes of the information contained in a DW in an ad-hoc manner. Strategic analyzes require a dynamic and multidimensional vision of the information that is different from that found in OLTP systems.
This type of analysis is aimed at processing large volumes of data so that the evolution of the business can be measured over time, by making comparisons, studying indicators, deviations, etc. This requires the possibility of Top Down analysis, that is, these systems must have the necessary dynamism to allow the reformulation of the query made according to the analysis of the results obtained in the first instance.
There are two basic ways to carry out this task:
- Simple Accumulation. Simple accumulation is undoubtedly the simplest and most common, and consists of summarizing or summarizing all the transactions included in the selected period of time and transporting the result as a single transaction to the DW.
- Rolling. The Rolling process, for its part, is applied in cases where it is chosen to maintain various levels of granularity. For this, summary information is stored at different levels, corresponding to different groupings of the time unit.
Design of a Data Warehouse
To build a Data Warehouse you need tools to help the migration and transformation of the data to the data warehouse. Already built, means are required to handle large volumes of information. Depending on the internal structure of the data warehouse and especially the type of queries to be made, the architecture of the data warehouse is designed .
With this criterion, the data must be distributed among numerous Data Marts. In order to tackle a Data Warehouse project, it is necessary to carry out some general issues of the Organization:
- Current Environment: Any proposed Data Warehouse solution must be highly oriented to business needs and must be compatible with the company’s existing and planned technical architecture.
- Business Environment: It is essential to have exact knowledge about the type of business of the Organization and the support that the information represents within its entire decision-making process.
- Technical Environment: Both the hardware environment aspect must be included: mainframes, servers, networks, as well as applications and tools. Decision Support Systems will be emphasized, if they currently exist, how they operate, etc.
- User expectations. : A Data Warehouse project is not a technological project, it is a way of life for organizations and as such, it must have the support of all users and their belief in its goodness.
- Stages of Development: With prior knowledge, the development of a conceptual strategy for the construction of a DW is already underway.
- Business Environment: It is essential to have exact knowledge about the type of business of the organization and the support that the information represents within its entire decision-making process.
- Prototype: A prototype is an effort to simulate as much as possible the product that will be delivered to users.
- Pilot: The Data Warehouse pilot is simply the first of many iterative efforts to be made to build a Data Warehouse.
- Technological Concept Test: This is an optional step that may be needed to determine if the specified DW architecture will ultimately perform as expected.