A source table has both individual and corporate customers. The requirement is that an ETL process should take only the corporate customers and populate the data in a target table. The test cases need to validate the ETL process by reconciling the source (input) and target (output) data. The transformation rule also specifies that the output should contain only corporate customers.

Physical Test 1: Count the corporate customers in the source, then count the customers in the target table. If the ETL transformation is correct, the counts should match exactly. This is a reasonable test; however, the count can be misleading if the same record is loaded more than once, because a count cannot distinguish between individual customers.

Physical Test 2: Compare each corporate customer in the source to the corresponding customer in the target. This kind of reconciliation can be done at the row or attribute level. It not only validates the counts but also proves that each customer is exactly the same on both sides.

There are various permutations and combinations of these types of rules, with increasing complexity. This example also shows that the concepts behind ETL testing are very different from those of GUI-based software testing.

How to do ETL Testing with iCEDQ?

As discussed earlier in the article (ETL Testing vs. Application Testing), ETL processes are background jobs that cannot be tested with conventional QA tools; they need specialized software. In the drawing below, we have a set of ETL processes that are reading, transforming, and loading customer, order, and shipment data. We will take these examples and create test cases and rules in iCEDQ to certify the processes. The examples below also illustrate the thought process and the principles behind ETL testing.

Data Pipeline with Seven (7) ETL Processes

ETL1: This process loads customer data from a file into a staging table (CUSTOMER STAGE).
See Test Rule 1 and Test Rule 2.

ETL2: This process loads sales transaction data from the OMS (Source B) into the SALES TRANSACTION table.
ETL3: This process loads product rating data from the OMS (Source B) into the PRODUCT RATING table.
ETL4: Product rating data is also sourced from an additional data source C (Market Data Vendor) into another PRODUCT RATING table.
ETL5: An ETL process loads data into the SHIPMENT table from a shipment system (Source D).
ETL6: This process reads data from the CUSTOMER STAGE table and populates the CUSTOMER DIM table.
ETL7: This ETL process is more complex than the others. It reads data from the SALES TRANSACTION table and populates the ORDER and DAILY SALES SUMMARY tables.
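The two physical tests described at the start can be sketched in plain Python. The record layout (id, name, type) and the sample data below are assumptions for illustration, not iCEDQ's actual API:

```python
# Hypothetical source and target records; layout is an assumption.
source = [
    {"id": 1, "name": "Acme Corp", "type": "CORPORATE"},
    {"id": 2, "name": "Jane Doe",  "type": "INDIVIDUAL"},
    {"id": 3, "name": "Globex",    "type": "CORPORATE"},
]
target = [
    {"id": 1, "name": "Acme Corp", "type": "CORPORATE"},
    {"id": 3, "name": "Globex",    "type": "CORPORATE"},
]

# Physical Test 1: counts must match, but duplicates could hide behind an equal count.
source_count = sum(1 for r in source if r["type"] == "CORPORATE")
assert source_count == len(target)

# Physical Test 2: row-level reconciliation also catches duplicates and attribute drift.
src_rows = {(r["id"], r["name"]) for r in source if r["type"] == "CORPORATE"}
tgt_rows = {(r["id"], r["name"]) for r in target}
missing_in_target = src_rows - tgt_rows
extra_in_target = tgt_rows - src_rows
assert not missing_in_target and not extra_in_target
```

Note how Test 2 subsumes Test 1: if the row sets match exactly, the counts must match too, but not vice versa.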
iCEDQ ETL Testing Rule Explanation

Test Rule 1: Source Data Validation

Ensure that the incoming data is valid and in the correct format. This data validation of the customer file is needed to ensure that ETL1 gets correct data.

What? This rule tests whether the data in the customer file is valid.
Why? Even though the downstream ETL processes are not responsible for the incoming data from the upstream source system, it is still important to validate the source data.
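A minimal sketch of such source data validation, assuming a hypothetical customer-file layout (id, name, type, email):

```python
import re

# Hypothetical column layout and allowed values; names are assumptions.
ALLOWED_TYPES = {"CORPORATE", "INDIVIDUAL"}

def validate_customer(row):
    """Return a list of validation failures for one customer record."""
    errors = []
    if not row.get("id", "").isdigit():
        errors.append("id must be numeric")
    if not row.get("name", "").strip():
        errors.append("name must not be empty")
    if row.get("type") not in ALLOWED_TYPES:
        errors.append("type must be CORPORATE or INDIVIDUAL")
    if row.get("email") and not re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", row["email"]):
        errors.append("email is malformed")
    return errors

print(validate_customer({"id": "42", "name": "Acme Corp", "type": "CORPORATE"}))  # []
print(validate_customer({"id": "x1", "name": "", "type": "VIP"}))
```

Each record either passes cleanly or yields a list of specific failures that can be reported back to the source system owner.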
Test Rule 2: Source vs. Target Data Reconciliation – ETL Testing

Ensure data is copied (transported) from system A to system B correctly. Source-to-target reconciliation will certify that ETL1 has not dropped data or added extra data while copying the data from the file to the stage table.

What? This rule tests whether ETL process 1 has loaded the source data correctly into the target. In this case, the CUSTOMER FILE is loaded into the staging area.
Why? Tests like this usually need to connect across two different systems; in this case, between a file server and a database. It is very difficult to implement this rule without a platform like iCEDQ's ETL Testing and Automation platform. Consider the alternative of manually extracting and comparing data from both systems.
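Conceptually, a file-to-database reconciliation boils down to comparing two row sets. Here is a sketch using an in-memory CSV and sqlite3 as a stand-in for the real file server and staging database; the table and column names are assumptions:

```python
import csv
import io
import sqlite3

# Hypothetical delimited customer file (stands in for the real file on the file server).
customer_file = io.StringIO("id,name\n1,Acme Corp\n3,Globex\n")
file_rows = {(r["id"], r["name"]) for r in csv.DictReader(customer_file)}

# sqlite3 stands in for the staging database loaded by ETL1.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer_stage (id TEXT, name TEXT)")
db.executemany("INSERT INTO customer_stage VALUES (?, ?)",
               [("1", "Acme Corp"), ("3", "Globex")])
stage_rows = set(db.execute("SELECT id, name FROM customer_stage"))

# ETL1 must neither drop rows nor invent them.
assert file_rows == stage_rows, (file_rows - stage_rows, stage_rows - file_rows)
```

The set differences in the assertion message pinpoint exactly which rows were dropped or added, rather than just signaling that something is wrong.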
Data Reconciliation Across Two Sources – ETL Testing

Raise an alarm if the same data coming from two different sources does not match. This ETL testing rule makes sure that the data processed by ETL3 and ETL4 from two different systems is within an acceptable tolerance.

What? Product rating data is delivered by both source B (Order Management System) and source C (Market Data Vendor). Since the products are the same, the ratings from the two sources should agree within tolerance.
Why? A data validation rule can be set up to flag data that falls outside the expected range, and that can work. However, if the product rating data is important to the business and another source can provide the same data, reconciling the two sources gives much stronger assurance.
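A two-source tolerance reconciliation can be sketched as follows; the product keys, rating values, and tolerance threshold are all assumptions for illustration:

```python
# Hypothetical ratings keyed by product id from the two sources.
ratings_oms = {"P1": 4.2, "P2": 3.8, "P3": 4.9}      # source B: Order Management System
ratings_vendor = {"P1": 4.25, "P2": 3.79, "P3": 4.6}  # source C: Market Data Vendor
TOLERANCE = 0.1  # acceptable absolute difference; an assumed business threshold

# Flag products whose ratings disagree beyond the tolerance.
mismatches = {
    pid: (ratings_oms[pid], ratings_vendor[pid])
    for pid in ratings_oms.keys() & ratings_vendor.keys()
    if abs(ratings_oms[pid] - ratings_vendor[pid]) > TOLERANCE
}
print(mismatches)  # {'P3': (4.9, 4.6)}
```

Only products present in both sources are compared; products missing from either side would be a separate reconciliation finding.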
If another source is not available, then the data can be reconciled against the previous day's data. This can catch sudden changes, or detect when the previous day's data is sent again by the source system.

Business Validation Rule – ETL Testing

Validate the data independent of the ETL process, based on business rules. These rules are independent of the ETL processes.

What? The ORDER table is populated by the ETL7 process, and that process can be very complicated. However, a simple rule that the business is aware of can reveal a data or processing issue.
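One such simple business rule might be that every order's amount equals quantity times unit price, regardless of how the ETL computed it. The column names and sample rows below are assumptions:

```python
# Hypothetical order rows; the business rule is amount == quantity * unit_price.
orders = [
    {"order_id": 1, "quantity": 2, "unit_price": 10.0, "amount": 20.0},
    {"order_id": 2, "quantity": 5, "unit_price": 3.0,  "amount": 14.0},
]

# Collect orders that violate the rule (with a tiny float tolerance).
violations = [o["order_id"] for o in orders
              if abs(o["quantity"] * o["unit_price"] - o["amount"]) > 1e-9]
print(violations)  # [2]
```

The check never inspects the ETL logic itself; it only asserts a fact the business already knows must hold in the output.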
Why? Why rely only on the specification given to the ETL process?
The ETL process might be correct, but the source itself might be providing wrong data.

Business Reconciliation – ETL Testing

Reconcile the data independent of the ETL processing logic, based on business rules. Audit whether the data between two business areas is consistent and reconciles correctly. These audit rules are independent of the ETL processes.

What? ETL2 populates the SALES TRANSACTION table and ETL5 populates the SHIPMENT table.
Why? Regardless of which system is used to process or store the data, the business rules are universal.
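For example, a business reconciliation between the two areas might assert that no order ships more units than were sold, even though different ETL processes loaded each table. The keys and quantities below are assumptions:

```python
# Hypothetical per-order unit totals from the two business areas.
sales = {"SO1": 10, "SO2": 4}       # units sold, from SALES TRANSACTION (ETL2)
shipments = {"SO1": 10, "SO2": 6}   # units shipped, from SHIPMENT (ETL5)

# Business rule: shipped units must not exceed sold units for any order.
over_shipped = {k for k in shipments if shipments[k] > sales.get(k, 0)}
print(over_shipped)  # {'SO2'}
```

An order flagged here indicates a problem in one of the feeds or in the business process itself, independent of how either ETL was specified.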
Reference Data Reconciliation – ETL Testing

Reconcile the data based on entity relationships, independent of the ETL processing or the data source. Audit whether the factual data and the reference data reconcile.

What? The PRODUCT table exists but has no ETL process populating it. It is probably a static table that gets populated manually on occasion.
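A minimal sketch of such a referential check, finding fact rows that point at product ids missing from the static reference table (keys and rows are assumptions):

```python
# Hypothetical reference keys and fact rows.
product_ids = {"P1", "P2", "P3"}           # keys in the static PRODUCT table
rating_rows = [("P1", 4.2), ("P4", 3.0)]   # (product_id, rating) fact rows

# Orphans: fact rows whose foreign key has no match in the reference table.
orphans = {pid for pid, _ in rating_rows if pid not in product_ids}
print(orphans)  # {'P4'}
```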
Why?
It is possible that master data or reference data is populated manually in the data warehouse. In such cases, if the operational system starts sending transactions, there can be data issues when the ETL process does not validate the foreign keys.

Physical Schema Reconciliation – ETL Testing

Reconcile data structures independently. This reconciliation makes sure that the physical schema matches between two systems.

What?
Why?
As the discussion above shows, even though we call it ETL testing, it is not just about the ETL. An understanding of ETL, data structures, and the business context is necessary to do ETL testing effectively. Any negligence on the part of the tester, or the absence of a capable audit rules engine, can and will cause havoc in your data-centric projects.

What are the types of validation in ETL?

As the general testing process above suggests, the main types of ETL testing include:

- Production Validation Testing
- Source to Target Data Testing
- Metadata Testing
- Performance Testing
- Data Transformation Testing
- Data Quality Testing
- Data Integration Testing
- Report Testing

What are the common ETL testing data validation scenarios?

ETL Test Scenarios and Test Cases
- Verify the mapping doc to check whether the corresponding ETL information is provided. A change log should be maintained in every mapping doc.
- Validate the source and target table structures against the corresponding mapping doc.
- Validate the names of the columns in the table against the mapping doc.
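Checking a real table's structure against the mapping doc can be sketched with sqlite3's table metadata; the expected layout below stands in for a hypothetical mapping document:

```python
import sqlite3

# Expected (column name, type) pairs, as a hypothetical mapping doc would specify.
expected = [("id", "TEXT"), ("name", "TEXT"), ("type", "TEXT")]

# sqlite3 stands in for the real target database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer_stage (id TEXT, name TEXT, type TEXT)")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
actual = [(row[1], row[2]) for row in db.execute("PRAGMA table_info(customer_stage)")]
assert actual == expected, f"schema drift: {actual}"
```

The same pattern, run against both source and target systems, gives the physical schema reconciliation described earlier.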
How would you validate data?

Common types of data validation checks include:

- Data Type Check: confirms that the data entered has the correct data type.
- Code Check: ensures that a field is selected from a valid list of values or follows certain formatting rules.
- Range Check
- Format Check
- Consistency Check
- Uniqueness Check
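A few of the checks listed above, sketched on made-up data:

```python
import re

# Range check: ages must fall within an assumed valid interval [0, 120].
ages = [25, 41, 130, 33]
out_of_range = [a for a in ages if not 0 <= a <= 120]
print(out_of_range)  # [130]

# Uniqueness check: customer ids must not repeat.
ids = ["C1", "C2", "C2", "C3"]
duplicates = {i for i in ids if ids.count(i) > 1}
print(duplicates)  # {'C2'}

# Format check: a US-style ZIP code must be exactly five digits.
print(bool(re.fullmatch(r"\d{5}", "0210")))  # False
```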