Which is a data warehouse that integrates data from databases across an entire enterprise?

Data Architecture

Rick Sherman, in Business Intelligence Guidebook, 2015

Physical Data Store Combinations

The EDW data distribution schema, data marts, OLAP cubes, and any other SOA data stores are logical, not physical; depending on the data use case, one or more of these data stores may not need to be made a persistent physical data store.

Each of the data stores may actually be split into federated entities. For example, the EDW may be split into federated DWs based on such criteria as geographic regions, business functions, and business organizational entities or to support structured versus non-structured data.

On the other end of the spectrum, the entire set of data stores may be implemented on a single database platform, with each data store being represented as a schema within that database.
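
As a hedged sketch of that single-platform option, the short Python snippet below simply emits the DDL that would carve one database into a schema per logical data store; the schema names (staging, edw, mart_sales, olap) are illustrative assumptions, not a prescribed layout.

# Sketch: one physical database platform, one schema per logical data store.
# The schema names are illustrative only.
LOGICAL_STORES = ["staging", "edw", "mart_sales", "olap"]

def schema_ddl(stores):
    """Return one CREATE SCHEMA statement per logical data store."""
    return [f"CREATE SCHEMA IF NOT EXISTS {name};" for name in stores]

if __name__ == "__main__":
    for statement in schema_ddl(LOGICAL_STORES):
        print(statement)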

URL: https://www.sciencedirect.com/science/article/pii/B978012411461600006X

Scalable Data Warehouse Architecture

Daniel Linstedt, Michael Olschimke, in Building a Scalable Data Warehouse with Data Vault 2.0, 2016

2.1.1 Workload

The enterprise data warehouse (EDW) is “by far the largest and most computationally intense business application” in a typical enterprise. EDW systems consist of huge databases, containing historical data in volumes ranging from multiple gigabytes to terabytes of storage [4]. Successful EDW systems face two issues regarding the workload of the system: first, they experience rapidly increasing data volumes and application workloads, and second, an increasing number of concurrent users [5]. In order to meet the performance requirements, EDW systems are implemented on large-scale parallel computers, such as massively parallel processing (MPP) or symmetric multiprocessor (SMP) system environments and clusters, and on parallel database software. In fact, most medium- to large-size data warehouses would not be implementable without large-scale parallel hardware and parallel database software to support them [4].

Handling the requested workload requires more than parallel hardware and parallel database software, however. The logical and physical design of the databases must also be optimized for the expected data volumes [6–8].
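
As one hedged example of what optimizing the physical design for the expected data volume can mean, the Python sketch below generates PostgreSQL-style range-partitioning DDL for a large fact table; the table name, the column name, and the monthly partitioning scheme are assumptions for illustration only.

# Sketch: generate monthly range partitions for a large fact table
# (PostgreSQL-style DDL; the table and column names are invented).
from datetime import date

def monthly_partitions(table, year):
    """Yield CREATE TABLE ... PARTITION OF statements, one per month."""
    for month in range(1, 13):
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        yield (
            f"CREATE TABLE {table}_{start:%Y_%m} PARTITION OF {table} "
            f"FOR VALUES FROM ('{start}') TO ('{end}');"
        )

if __name__ == "__main__":
    print("CREATE TABLE sales_fact (sale_date date, amount numeric) "
          "PARTITION BY RANGE (sale_date);")
    for ddl in monthly_partitions("sales_fact", 2015):
        print(ddl)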

URL: https://www.sciencedirect.com/science/article/pii/B9780128025109000027

Traditional Data Modeling Paradigms and Their Discontents

Ralph Hughes MA, PMP, CSM, in Agile Data Warehousing for the Enterprise, 2016

Summary

Every EDW team starting upon a new warehouse or major subject area is at a crossroads where they must choose to follow either traditional data modeling techniques or one of the new agile approaches. To understand the advantages of the agile techniques that are demonstrated in the following chapters, EDW team leaders must first understand the weaknesses of the two traditional approaches: standard normal forms and conformed dimensional forms.

The standard normal form implies a very traditionally structured data warehouse, one with an Integration layer and a Presentation layer. Designers will model a traditional Integration layer with tables in third, fourth, or fifth normal form. ETL will load this normalized Integration layer first before transforming it again to populate the star schemas of the Presentation layer, which better support user-friendly BI applications. The conformed dimensional data warehouse skips building much or all of the Integration layer in order to load the company’s operational data directly into star schemas.
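
To make the two-layer flow concrete, here is a minimal sketch in pure Python (with invented customer and region data) of denormalizing rows from normalized Integration-layer tables into a single star-schema dimension for the Presentation layer; a real ETL job would do this inside a database or an ETL tool.

# Sketch: denormalize two normalized (3NF-style) tables into one
# star-schema customer dimension. Data and column names are invented.
customers = [  # Integration layer: customer table
    {"customer_id": 1, "name": "Acme Corp", "region_id": 10},
    {"customer_id": 2, "name": "Globex", "region_id": 20},
]
regions = [    # Integration layer: region lookup table
    {"region_id": 10, "region_name": "East"},
    {"region_id": 20, "region_name": "West"},
]

def build_customer_dimension(customers, regions):
    """Join the normalized tables into one wide dimension table."""
    region_lookup = {r["region_id"]: r["region_name"] for r in regions}
    return [
        {
            "customer_key": c["customer_id"],  # surrogate keys omitted for brevity
            "customer_name": c["name"],
            "region_name": region_lookup[c["region_id"]],
        }
        for c in customers
    ]

if __name__ == "__main__":
    for row in build_customer_dimension(customers, regions):
        print(row)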

Both of these modeling approaches lead to data warehouses that are very expensive to modify once data is loaded into their data repositories, making them brittle in the face of changing business requirements. In order to provide a data warehouse that can evolve as fast as the business context can change, EDW team leaders will need to draw upon an agile approach to DW/BI design. The alternative delivery and data modeling techniques that will make such “agile data engineering” possible are presented later, but in the next chapter we first consider some provisional agile solutions that can be achieved even without adopting a new data modeling technique.

URL: https://www.sciencedirect.com/science/article/pii/B9780123964649000126

The Enterprise Data Warehouse

Charles D. Tupper, in Data Architecture, 2011

Publisher Summary

An enterprise data warehouse is a strategic repository that provides analytical information about the core operations of an enterprise. It is distinct from traditional data warehouses and marts, which are usually limited to departmental or divisional business intelligence. An enterprise data warehouse (EDW) supports enterprise-wide business needs and at the same time is critical to helping IT evolve and innovate while still adhering to the corporate directive to “provide more functionality with less investment.” Organizations that implement enterprise data warehouse initiatives can expect several benefits. The EDW provides a strategic weapon against the competition: the data needed to beat the competition to market are universally accessible in the structure needed to make agile business decisions. It addresses the data governance and data-quality issues that profoundly limit the operational and strategic use of cross-functional data, and it eliminates redundant purchasing of data. It addresses compliance requirements by validating and certifying the accuracy of the company’s financial data under Sarbanes-Oxley and other mandates. It improves alignment between IT and its business partners by enabling IT to deliver multiple initiatives, including data warehousing, data integration and synchronization, and master data management, all developed from the same data, which can be propagated and reused for other purposes. It ensures cross-functional and cross-enterprise collaboration by guaranteeing that data are provided with relevant business context and meaning, with a definitive meaning ascribed for each context. Finally, it increases business productivity by leveraging integrated data for business decision queries and reports, thereby reducing delivery costs and time.

URL: https://www.sciencedirect.com/science/article/pii/B9780123851260000206

Architecture Framework

Rick Sherman, in Business Intelligence Guidebook, 2015

Data Warehousing Replaces the Data Warehouse

Over the years, many technologies and approaches to providing analytics have prompted people to proclaim the death of the data warehouse. Although these DW killers have been able to provide analytics, they have not been able to support enterprise-wide analytics with its accompanying need for consistent, comprehensive, clean, conformed, and current data. Their best results might have provided terrific analytics, but they did it in yet another silo (that then needed to be integrated with the other silos!).

The key fallacy of the push to replace the concept of separating reporting data from transactional data was the assumption that the separation was only being done for technological reasons. If that were true, then data warehouses would have died long ago. The underlying reason for the separation is business and data needs. Business processes and applications have different business rules, data definitions, and transformations that create inconsistency. Data ages poorly, and its completeness varies based on business need. Many of these differences need to be discovered by the BI team to be used in data integration and business intelligence applications. If only it were so easy that business people could just access all their data sources in a BI tool that would magically know what needed to be transformed and how, then that tool would replace a DW. But, of course, it is not that easy.

The classic EDW as depicted in Figure 4.2 is a single, centralized database. The data workflow includes the following steps (a minimal code sketch follows the list):

Data being created, updated, and modified in the SORs

Data from SORs being integrated, transformed, and cleansed

Data being loaded into the EDW

Data being accessed by the BI tools for reporting and analysis
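
A minimal sketch of those four steps in pure Python, with invented records standing in for a real SOR and a real BI tool:

# Sketch of the classic workflow: SOR records are integrated and cleansed,
# loaded into an "EDW" table, then read for reporting. All data is invented.
sor_orders = [
    {"order_id": 1, "cust": " acme ", "amount": "100.0"},
    {"order_id": 2, "cust": "GLOBEX", "amount": "250.5"},
]

def transform(rows):
    """Integrate/cleanse: trim and standardize names, cast amounts."""
    return [
        {"order_id": r["order_id"],
         "customer": r["cust"].strip().title(),
         "amount": float(r["amount"])}
        for r in rows
    ]

edw_orders = transform(sor_orders)  # load into the EDW store

def report_total_by_customer(rows):
    """BI access: a trivial aggregate over the loaded data."""
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

if __name__ == "__main__":
    print(report_total_by_customer(edw_orders))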

Advances in technology have prompted the evolution from the classic EDW model to a more sophisticated data architecture. Figure 4.3 illustrates some of the other data stores being used today to replace an EDW-only structure. These data stores may include an ODS, MDM, data marts, OLAP cubes, and staging data stores, in addition to the EDW. The data stores’ characteristics vary: they may be persistent or transient; stored in a database, file structure, or memory; distributed or centralized; and so on. Each of these data stores has specific use cases that an enterprise will leverage based on its needs. We will discuss these in more detail in the follow-on chapters of the architecture section.

FIGURE 4.3. Data architecture workflow.

This evolution from a single centralized EDW to a set of architectural options is what I call the shift to data warehousing, i.e., many data stores, from a data warehouse. One of the best practices for a BI data architecture is to have the EDW serve two different data roles: systems of integration (SOI) and systems of analytics (SOA). Figure 4.4 depicts the three roles that occur in the BI data architecture. The purpose of each role is as follows:

FIGURE 4.4. BI data architecture—roles of data systems.

Systems of Record (SOR)—data is captured and updated in operational and transactional applications. These applications are designated as the SOR so that people and processes know what the authorized sources are for any particular data subject. This implies an expectation level regarding the integrity and legitimacy of the data. For example, an application would be designated as the SOR for accounting data.

System of Integration (SOI)—gathers, integrates, and transforms data from SORs into consistent, conformed, comprehensive, clean, and current information. Similar to the SOR, this designation implies a particular level of integrity and legitimacy of the integrated data. It also implies that if a person or process needs integrated data, then the SOI should be the source used.

System of Analytics (SOA)—provides business information that has been integrated and transformed to BI applications for business analysis. Similar to the SOR, this designation implies a particular level of integrity and legitimacy of the information being used in BI. Although BI applications will directly access SORs for operational reporting, if integrated and transformed data is needed, the SOA needs to be the source.

Note: The hub and spoke (or EDW to data marts) depicted in Figure 4.4 is a logical depiction of the BI architecture. This is done to simplify the diagram and focus on the data-related functions rather than display physical databases.

Just as the data sources depicted in Figure 4.4 are the SOR for operational processes, the BI architecture needs to establish the EDW as the SOI—where data gets integrated—and the SOA—where BI and analytical applications go for integrated data.
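
As a rough, simplified illustration of that sourcing rule (unintegrated operational data comes from the SOR; integrated data comes from the SOI, or from the SOA when it is consumed by BI), here is a small Python sketch; the routing function is an invention for illustration, not a prescription from the text.

# Sketch: route a data request to the appropriate role in the BI
# architecture. Role names follow the text; the rule itself is simplified.
SYSTEMS = {
    "SOR": "operational/transactional applications",
    "SOI": "EDW acting as the system of integration",
    "SOA": "EDW/data marts acting as systems of analytics",
}

def source_for(needs_integrated_data: bool, for_analytics: bool) -> str:
    """Pick the system a consumer should read from."""
    if not needs_integrated_data:
        return "SOR"  # operational reporting straight from the SOR
    return "SOA" if for_analytics else "SOI"

if __name__ == "__main__":
    print(source_for(needs_integrated_data=False, for_analytics=False))  # SOR
    print(source_for(needs_integrated_data=True, for_analytics=True))    # SOA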

URL: https://www.sciencedirect.com/science/article/pii/B9780124114616000046

Eliminating Risk Through Nested Iterations

Ralph Hughes MA, PMP, CSM, in Agile Data Warehousing for the Enterprise, 2016

Building an enterprise data warehouse following traditional techniques is fraught with risk, as empirical studies have shown that these projects fail more often than they succeed. Agile enterprise data warehousing (EDW) techniques mitigate this risk using three types of iterations, one stacked within another, with each style of iteration designed to detect a different type of hazard. At the lowest level, teams employ Scrum development iterations so that product owners can regularly review the application for errors in the coding concepts. On the next level, agile EDW teams hold a subrelease candidate review after every three or four iterations so that the project’s close stakeholders can review how application features map to the business problems they need to solve. Finally, EDW teams promote successful subrelease candidates into production so that end users can operate the software as part of their day-to-day activities, revealing flaws in the business concepts serving as the project’s high-level business goals.

URL: https://www.sciencedirect.com/science/article/pii/B9780123964649000060

Introduction to Data Warehousing

Daniel Linstedt, Michael Olschimke, in Building a Scalable Data Warehouse with Data Vault 2.0, 2016

1.2.1 Access

Access to the EDW requires that end users be able to connect to the data warehouse with the proposed client workstations. The connection must be immediate, on demand, and with high performance [12, p. xxiii]. However, access means much more to users than mere availability, especially to business users: it should be easy to understand the meaning of the information presented by the system. That includes the correct labeling of the data warehouse contents. It also includes the availability of appropriate applications to analyze, present, and use the information provided by the data warehouse [12, p. 3].

URL: https://www.sciencedirect.com/science/article/pii/B9780128025109000015

Why We Test and What Tests to Run

Ralph Hughes MA, PMP, CSM, in Agile Data Warehousing for the Enterprise, 2016

Summary

Deciding what to test for an enterprise data warehouse is challenging because of the complexity of the application. When one considers the large number of test types that a team could possibly execute, and then the combinations of those test types with the many modules to be tested, the resulting list far exceeds the quality work that the team has time or money to pursue. EDW teams need a framework to make quality planning a straightforward process and one that results in an economical but still robust validation process.

The agile approach is to perform both top-down and bottom-up planning and then to check that the two resulting plans support each other well. The top-down style asks the team to choose a small set of the most important test types and place them on a 2×2 matrix that combines the different audiences who wish to see test results versus the fundamental purpose of the tests. Teams can then reflect on whether the four quadrants of this 2×2 matrix are balanced. They can also consider how well it incorporates the six dimensions of testing, which include notions such as positive versus negative testing as well as progression versus regression tests.

Switching to the bottom-up path, the team should decide where to employ any of a dozen standard techniques for authoring unit test cases. It should also consider which of these can be implemented as reusable, parameter-driven test widgets that will save the team significant time in validating the lowest-level components of its warehouse. The team can also explore whether the test techniques selected for each type of ETL and BI units roll up easily into integration and system tests.
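
As one hedged example of such a reusable, parameter-driven test widget, the pytest-style sketch below reuses a single row-count reconciliation check across several warehouse tables; pytest, the table names, and the stubbed counts are all assumptions standing in for a team's real test framework and queries.

# Sketch: one parameter-driven test reused across many warehouse tables.
# The tables and the fake "row counts" are placeholders for real queries.
import pytest

FAKE_COUNTS = {  # stands in for SELECT COUNT(*) against source and target
    ("stage_customer", "dim_customer"): (1_000, 1_000),
    ("stage_sales", "fact_sales"): (50_000, 50_000),
}

def row_count(table_pair):
    return FAKE_COUNTS[table_pair]

@pytest.mark.parametrize("source,target", [
    ("stage_customer", "dim_customer"),
    ("stage_sales", "fact_sales"),
])
def test_load_is_complete(source, target):
    """Every row extracted from the source should land in the target."""
    source_rows, target_rows = row_count((source, target))
    assert source_rows == target_rows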

Finally, agile EDW teams should evaluate how well the two planning paths intersect and reinforce each other. They can consider whether the top-down notions of quality management, assurance, and control connect effectively with the integration and system tests that resulted from their bottom-up script consolidations. They can also ask where they can extend the test techniques employed for unit testing to validate more abstract notions such as epic- and theme-level stories. They can then factor in the remaining dimensions of testing to discover if notions such as negative and regression testing reveal oversights spanning the entire QA plan.

By understanding and authoring a quality plan from multiple perspectives, the agile EDW team can be reasonably assured that their plan is robust, actionable, and economical. This plan lists only test types, however. The next step is to plan how the test cases falling into those categories will actually get written, a topic we address in Chapter 17 when we consider the who, when, and where of agile EDW quality assurance planning.

URL: https://www.sciencedirect.com/science/article/pii/B9780123964649000163

Essential DW/BI Background and Definitions

Ralph Hughes MA, PMP, CSM, in Agile Data Warehousing for the Enterprise, 2016

Corporate Information Factory

The corporate information factory (CIF) is an enterprise data warehouse that follows a high-level data flow architecture advocated by Bill Inmon and Claudia Imhoff [Inmon & Imhoff 2001]. As popularly understood, a CIF gathers data from sources and transforms it into a repository in the integration layer of the reference architecture. From there, the information is subsetted out to departmental data marts, delivering the specific columns and rows needed by each one. In the CIF model, the data stored in the integration layer should be a “single version of the truth” within the company. Because most DW/BI designers suspect that duplicate information stored within a database inevitably allows data discrepancies to occur, most CIF integration layers are highly normalized; the normalization process leads to tables that make such redundancy impossible. The data in the integration layer is then de-normalized into a dimensionalized model and stored in an enterprise presentation layer of the warehouse. Data is later subsetted into small dimensional models as needed for specific users and is often structured to specifically support the needs of a particular class of data analysis, such as sales volumes and profitability.
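
A rough sketch of the subsetting step, in pure Python with invented data: a departmental data mart is carved out of a wider presentation-layer table by keeping only the columns and rows that department needs.

# Sketch: subset a presentation-layer table into a departmental data mart.
# Column names, the department filter, and the data are all invented.
presentation_sales = [
    {"region": "East", "product": "A", "units": 10, "margin": 2.5},
    {"region": "West", "product": "B", "units": 7, "margin": 1.2},
]

def build_mart(rows, keep_columns, region):
    """Keep only the requested columns and the rows for one region."""
    return [
        {col: r[col] for col in keep_columns}
        for r in rows
        if r["region"] == region
    ]

if __name__ == "__main__":
    east_mart = build_mart(presentation_sales, ["product", "units"], "East")
    print(east_mart)  # [{'product': 'A', 'units': 10}]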

URL: https://www.sciencedirect.com/science/article/pii/B9780123964649000047

Fully Agile EDW with Hyper Generalization

Ralph Hughes MA, PMP, CSM, in Agile Data Warehousing for the Enterprise, 2016

HGF Enables Model-Driven Development and Fast Deliveries

If we focus on just the creation of new enterprise data warehouses or at least adding new subject areas to an existing EDW, we can see that hyper generalization accelerates DW/BI deliveries in multiple ways, including the following:

Eliminating most of the logical and physical artifacts that other data modeling paradigms require

Allowing teams to build integration layers directly from a graphical business model

Enabling teams to update an existing data warehouse by making changes to the EDW’s graphical model

Eliminating Most Logical and Physical Data Modeling

Consider the logical data model for data integration layers that the hyper generalized paradigm utilizes, as shown in the top portion of Figure 15.1. That diagram depicts the logical data model for any enterprise data warehouse built using this approach, so for any DW/BI team building an enterprise data warehouse, the logical data modeling work is complete the minute they select their warehouse automation tool. The fact that data for the dimensional entities will be stored in either a table of associative triples or a table of name-value pairs means the physical data model for the nontransactional data is also already defined. Transaction tables will receive a structure that closely matches the format in which event data arrive to the data warehouse. For that reason, the physical data modeling for the EDW is also largely complete once the team has selected its automation tool. With the logical and physical data modeling reduced to a minimum, the development team can redirect its efforts elsewhere.
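
A tiny sketch of the hyper generalized storage idea follows; the exact table layouts vary by automation tool, so the structure below (attributes held as generic name-value pairs keyed by an object identifier) is an assumption for illustration.

# Sketch: attributes stored as generic name-value pairs (one layout a
# hyper generalized repository might use; the exact schema is tool-specific).
name_value_pairs = [
    # (object_id, attribute_name, value)
    (6001, "customer_name", "Acme Corp"),
    (6001, "city", "Denver"),
    (6002, "customer_name", "Globex"),
]

def attributes_of(object_id, pairs):
    """Reassemble one logical record from its name-value pairs."""
    return {name: value for oid, name, value in pairs if oid == object_id}

if __name__ == "__main__":
    print(attributes_of(6001, name_value_pairs))
    # {'customer_name': 'Acme Corp', 'city': 'Denver'}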

Controlling the EDW Design from a Business Model Diagram

The logical and physical models for a hyper generalized integration layer may already be set, but the records expressing the conceptual nature of the company’s information must still be entered into the model portion of the HGF repository. Where does the knowledge needed to make the correct entries into those entities come from? The answer is the business model for the EDW. The multiple forms that the hyper generalized data modeling paradigm uses for storing things, links, and attributes make it possible for the computer to read a graphical depiction of a business model and translate it into metadata entries in the model portion of the HGF repository. Moreover, once the EDW team supplements the business model with some business-level source-to-target column mappings, the data warehouse automation system can generate the ETL needed to capture the business data and translate it into instances of things, links, and attributes.

Figure 15.9 demonstrated how a simple entity diagram translates directly into records for the THING and LINK entities of an HGF data warehouse. Figure 15.13 depicts the diagram that a team would employ to define a larger portion of an enterprise data warehouse. This business model has been drawn using the business information modeler of the data warehouse automation system. The particular entities in the figure represent the standard normal form model shown in Figure 12.14 that serves as the starting point for the change cases I have been using to demonstrate the advantages of hyper modeled forms. The fifth normal form solution for dealerships has been included, but the fourth normal form violation still needs to be corrected. We will see how that violation is resolved using the HGF automation tool when we return to the four change cases later.

Figure 15.13. Data warehouse business model used for the change cases.

In Figure 15.13, 12 entities represent qualifier information the team wishes to capture, organized into six dimensions. The dark arrows point to the entities that hold the parent objects that dependent entities require and thus can be interpreted as the equivalent of the foreign-key constraints used in relational database management system (DBMS) schemas. Two transaction data sets have been defined for the Sales Fact, one for sales made directly through the company’s own website and the other for sales made through partner sites. These transaction sets have slightly different but overlapping fields for measures defined—in particular, no discounts are allowed for sales made through partner websites. The light arrows represent how these transaction records will connect to the dimensional information once the warehouse is loaded. For clarity, these links are shown for only one of the transaction data sets.

In this model, the developers have organized the qualifier entities into the dimensions they wish the final presentation layer to possess. The Sales Order and Ad Site entities will be denormalized into the Sales Dimension, for example, and the four components for dates will be consolidated into a Time Dimension. The company also desires to track subsidiary relationships between its customers, so the developers have declared a recursive relationship on the customer entity, with the dotted line indicating that some CUSTOMER instances may not have a parent record.

The entities show the attributes that the operational data will be able to provide. Similar to the examples in previous chapters, all customers will have values for names, social networking IDs, and their cities. The transaction data sets will both have quantities requested and installed, but only direct sales will have a measure for a discount on a sale.

Note that this model is expressed in business concepts. Every entity, attribute, and relationship drawn is a fact that the business subject matter experts working with the EDW developers can confirm or dispute as they review the diagram. This model, once drawn using the business modeling interface, can also be reviewed and interpreted by the HGF automation system. If the automation tool finds the business model complete and consistent, it will insert the records necessary to express that model into the logical entities shown in Figure 15.7. Once those configuration records have been inserted into the EDW’s physical repository, it is ready to receive qualifier data. The team can then build data loading routines to capture the data for the dimensions using extracts from the operational systems.
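
Here is a hedged sketch of that translation step: given a very small business-model description, emit the kind of THING and LINK metadata records the text describes. The record layout and the ID scheme are invented; a real automation tool defines its own.

# Sketch: translate a tiny business model (entities plus parent links)
# into THING and LINK metadata records. IDs and layout are invented.
business_model = {
    "entities": ["CUSTOMER", "ORDER", "AD SITE"],
    "links": [("ORDER", "CUSTOMER"), ("ORDER", "AD SITE")],
}

def to_metadata(model, start_id=6000):
    """Return THING records and LINK records for the given model."""
    things, ids = [], {}
    next_id = start_id
    for name in model["entities"]:
        next_id += 1
        ids[name] = next_id
        things.append({"oid": next_id, "thing_type": name})
    links = [
        {"child_oid": ids[child], "parent_oid": ids[parent]}
        for child, parent in model["links"]
    ]
    return things, links

if __name__ == "__main__":
    things, links = to_metadata(business_model)
    print(things)
    print(links)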

The fact that this model can be interpreted by both business partners and the DW/BI development tool takes enterprise data warehousing to a much higher level of IT-business alignment. Business assertions can be translated directly by the machine into a data store that will behave as the subject matter experts desire. Such direct translation of business knowledge not only eliminates logical and physical data modeling chores for the EDW developers but also prevents many time-consuming mistakes they can easily commit when following traditional development practices.

Driving Design Changes Using a Business Model

Perhaps more important, the HGF automation system allows the EDW developers to change the data warehouse’s structure by updating the very same business model they used to create the warehouse in the first place. When requirements change, the data warehouse administrators update the model and then publish the new version when they wish for it to take effect. The automation system will first retire the superseded records in the hyper generalized repository and insert the new records needed to express the updated model. It will then adjust the dimensional data so that existing entities will comply with the newly declared relationship patterns from that date forward. When the presentation layer objects are refreshed, the EDW team can choose whether to portray the business dimensions as they were in the past or as they are now, given the new data model.

Figure 15.14 shows the details of how an updated diagram of the EDW’s business model alters the entries made in the HGF things and link repository. The EDW team decided that, as of 7-October, the company should be able to categorize orders into electronic commerce segments without regard to which website they originated from. Until then, the originating website determined which market segment an order represented. In the business modeler, this change requires removing the arrow between AD SITE and eSEGMENT and replacing it with a direct link between orders and segments. In the model entities of the repository, the automation tool should retire the LINK_TYPE record that rolls up AD SITE and eSEGMENT and insert another relating ORDER directly to eSEGMENT. The bottom of the diagram shows how the automation system will interpret this request into actual data management actions. The record with OID 6014 (linking 6012 Ad Sites to 6011 eSegments) is given an end date of 7-October, and a record 10071 linking 6013 Orders directly to 6011 eSegments is inserted to take effect from that date onward.

Figure 15.14. Example of how graphical model changes impact the associative data store.
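
The snippet below replays that bookkeeping in plain Python. The OIDs (6014, 6012, 6011, 6013, 10071) and the 7-October effective date come from the example above; the record layout, the year, and the old record's begin date are assumptions.

# Sketch of the model change described above: end-date the old link type
# (OID 6014, Ad Sites -> eSegments) and insert the new one
# (OID 10071, Orders -> eSegments). The record layout is an assumption.
from datetime import date

link_types = [
    {"oid": 6014, "child_oid": 6012, "parent_oid": 6011,
     "begin_date": date(2015, 1, 1), "end_date": None},
]

def retire_and_replace(links, old_oid, new_record, effective):
    """End-date the retired link type and append its replacement."""
    for rec in links:
        if rec["oid"] == old_oid:
            rec["end_date"] = effective
    links.append({**new_record, "begin_date": effective, "end_date": None})

if __name__ == "__main__":
    retire_and_replace(
        link_types,
        old_oid=6014,
        new_record={"oid": 10071, "child_oid": 6013, "parent_oid": 6011},
        effective=date(2015, 10, 7),
    )
    for rec in link_types:
        print(rec)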

Again, this update was accomplished without any logical or physical modeling, saving the development team a tremendous amount of time and effort. This direct link between the business model and the data warehouse’s capabilities allows the EDW team to respond fluidly to new realizations regarding requirements, thus dramatically improving the DW/BI department’s agility. With the ability to make fixes quickly, a tremendous amount of EDW project risk is eliminated. The business model no longer has to be perfect before the team can begin building the data warehouse, allowing teams to safely start the data warehouse with a modest subrelease and add on small increments with each development iteration.

URL: https://www.sciencedirect.com/science/article/pii/B9780123964649000151

Which is a data warehouse that integrates data from databases across an entire enterprise?

Systems that integrate data from databases across an entire enterprise are called enterprise data warehouses (EDWs).

What is the enterprise data warehouse?

An enterprise data warehouse (EDW) is a relational data warehouse containing a company's business data, including information about its customers. An EDW enables data analytics, which can inform actionable insights.

Which data warehouse model provides enterprise-wide data integration?

The enterprise warehouse. It supports corporate-wide data integration, usually from one or more operational systems or external data providers, and it is cross-functional in scope.

What are the three types of data warehousing?

The three main types of data warehouses are enterprise data warehouse (EDW), operational data store (ODS), and data mart.