Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Vom Data Vault ins Star Schema:Ladekonzepte müssen reifen
Dani Schnider & Andrea Kennel
BASEL | BERN | BRUGG | BUCHAREST | COPENHAGEN | DÜSSELDORF | FRANKFURT A.M. | FREIBURG I.BR. GENEVA | HAMBURG | LAUSANNE | MANNHEIM | MUNICH | STUTTGART | VIENNA | ZURICH
Dani• Senior Principal Consultant at Trivadis AG in
Glattbrugg/Zurich
• Trainer of several Trivadis courses
• Co-Author of Books “Data Warehousing mitOracle” and “Data Warehouse Blueprints”
• Oracle ACE
@dani_schnider danischnider.wordpress.com
DR. ANDREA KENNEL
Dozentin DatenbankenFachcoach ProjektmanagementFachhochschule NordwestschweizBrugg/Windisch
[email protected]@infokennel.chwww.infokennel.ch
Planung der Produktion und Controlling
Muss mal wieder Etiketten bestellen. Kannst Du mal schauen, was wir da in
letzter Zeit benötigt haben?Dann schau ich mal, wieviele
Flaschen wir je Etikettenfarbe und Monat ausgeliefert haben.
Je Jahr genügt. Aber kannst Du mirgleichzeitig den Alkoholgehalt
angeben, damit ich die Biersteuerprüfen kann? Ok. Also die Anzahl der gelieferten
Flaschen, je Jahr, je Etikettenfarbe und je Alkoholstufe.
Planung der Produktion und Controlling
Sind wir in den letzten Jahren auchüberregional bekannter geworden?
Hmm, da prüfe ich wohl, welcheEinnahmen wir je Jahr in welcher
Region erzielt haben.
Ja. Das hilft schon mal. Interessantwäre nun aber auch, welche Bierstile
wo gut ankommen. Dann werte ich für 2018 mal aus, wie vielFlaschen wir je Bierstil in welche Region
geliefert haben
619
8282 194
194
194
164
20K27K
496433K
357K357K
458K 715K
458K486K
458K
458K
458K
22
458K
458K
Source:How to Create a Data Vault Modelhttps://youtu.be/Q1qj_LjEawc
Core-Modell:Data Vault
Datenmodell Data Mart (SCD1)
Laden der Dimension MART_D1_CUSTOMER
• V_CURR_S_CUSTOMER_ADDRESS
CREATE OR REPLACE VIEW v_curr_s_customer_addressASSELECT * FROM (
SELECT s.*, MAX(load_date) OVER (PARTITION BY h_customer_key) last_load_date
FROM s_customer_address s)WHERE load_date = last_load_date
INSERT /*+ APPEND */ INTO MART_D1_CUSTOMERSELECTcust.H_Customer_Key,cust_adr.Last_Name,cust_adr.First_Name,cust_adr.Street,cust_adr.Street_No,cust_adr.ZIP_Code,cust_adr.City,bill_adr.ZIP_Code bill_ZIP_Code,bill_adr.City bill_City,cust_info.Private_Person,cust_info.Reseller
FROM H_CUSTOMER custLEFT JOIN V_CURR_S_CUSTOMER_ADDRESS cust_adr
ON (cust.H_Customer_Key = cust_adr.H_Customer_Key) LEFT JOIN V_CURR_S_BILLING_ADDRESS bill_adr
ON (cust.H_Customer_Key = bill_adr.H_Customer_Key) LEFT JOIN V_CURR_S_CUSTOMER_INFO cust_info
ON (cust.H_Customer_Key = cust_info.H_Customer_Key)
Laden der Dimension MART_D1_CUSTOMER
INSERT /*+ APPEND */ INTO MART_F1_SALES_ORDERSELECTinvoice.Invoice_Date,l_oi_beer.H_Beer_Key,l_cust_order.H_Customer_Key,h_order.Order_no,order_item.Quantity,order_item.Price_per_Unit,order_item.Price_Total,invoice.Payment_Date
FROM V_CURR_S_ORDER_ITEM order_itemJOIN L_BEER_ORDER_ITEM l_oi_beer
ON (l_oi_beer.H_Order_Item_Key = order_item.H_Order_Item_Key) JOIN L_ORDER_ITEM l_order_item
ON (l_order_item.H_Order_Item_Key = order_item.H_Order_Item_Key) JOIN L_CUSTOMER_ORDER l_cust_order
ON (l_cust_order.H_Order_Key = l_order_item.H_Order_Key) JOIN H_ORDER h_order ON (h_order.H_Order_Key = l_order_item.H_Order_Key) LEFT JOIN V_CURR_S_INVOICE invoice
ON (invoice.H_Order_Key = l_order_item.H_Order_Key)
Laden der Fakten
Welcher Bierstil wurde im 2018 in welcher Ortschaft wie häufig bestellt?
SELECT d_beer.style, d_cust.city, SUM(f_sale_orde.Quantity) QuantityFROM MART_F1_SALES_ORDER f_sale_orde…
Eine Auswertung
Anzahl gelieferter Flaschen je Monat und Etikettenfarbe
Demo
SELECTTO_CHAR(f_sale_deli.Delivery_Date,'YYYY-MM') month,d_beer.Beer_Name,
…
Zugriffsschicht auf Data Vault
Für jeden Hub und zugehörige Satelliten:
• Point-in-Time Table (PIT Table)
• Current View (aktuelle Version)
• Version View (alle Versionen)
INSERT /*+ append */ INTO pit_customerWITH load_dates AS (
SELECT h_customer_key, load_date FROM s_customer_infoUNIONSELECT h_customer_key, load_date FROM s_customer_addressUNIONSELECT h_customer_key, load_date FROM s_billing_address
)SELECT ld.h_customer_key
, ld.load_date, LEAD(ld.load_date) OVER (PARTITION BY ld.h_customer_key ORDER BY ld.load_date) load_end_date, MAX (s1.load_date) OVER (PARTITION BY ld.h_customer_key ORDER BY ld.load_date) s1_load_date, MAX (s2.load_date) OVER (PARTITION BY ld.h_customer_key ORDER BY ld.load_date) s2_load_date, MAX (s3.load_date) OVER (PARTITION BY ld.h_customer_key ORDER BY ld.load_date) s3_load_date
FROM load_dates ldLEFT JOIN s_customer_info s1ON (s1.h_customer_key = ld.h_customer_key AND s1.load_date = ld.load_date)
LEFT JOIN s_customer_address s2ON (s2.h_customer_key = ld.h_customer_key AND s2.load_date = ld.load_date)
LEFT JOIN s_billing_address s3ON (s3.h_customer_key = ld.h_customer_key AND s3.load_date = ld.load_date
Laden der PIT Table
Source: Loading Dimensions from a Data Vault Modelhttps://danischnider.wordpress.com/2015/11/12/loading-dimensions-from-a-data-vault-model/
CREATE OR REPLACE VIEW v_curr_customer ASSELECT hub.h_customer_key
, hub.customer_no
, s1.preferred_contact, s1.e_mail_address, …
, s2.last_name cust_last_name, s2.first_name cust_first_name, …
, s3.last_name bill_last_name, s3.first_name bill_first_name, …
FROM h_customer hubJOIN pit_customer pit ON (hub.h_customer_key = pit.h_customer_key)LEFT JOIN s_customer_info s1ON (s1.h_customer_key = pit.h_customer_key AND s1.load_date = pit.s1_load_date)
LEFT JOIN s_customer_address s2ON (s2.h_customer_key = pit.h_customer_key AND s2.load_date = pit.s2_load_date)
LEFT JOIN s_billing_address s3ON (s3.h_customer_key = pit.h_customer_key AND s3.load_date = pit.s3_load_date)
WHERE pit.load_end_date IS NULL
Current View
CREATE OR REPLACE VIEW v_vers_customer ASSELECT hub.h_customer_key
, hub.customer_no, pit.load_date valid_from, NVL(pit.load_end_date, TO_DATE('31.12.9999', 'dd.mm.yyyy')) valid_to, s1.preferred_contact, s1.e_mail_address, …
, s2.last_name cust_last_name, s2.first_name cust_first_name, …
, s3.last_name bill_last_name, s3.first_name bill_first_name, …
FROM h_customer hubJOIN pit_customer pit ON (hub.h_customer_key = pit.h_customer_key)LEFT JOIN s_customer_info s1ON (s1.h_customer_key = pit.h_customer_key AND s1.load_date = pit.s1_load_date)
LEFT JOIN s_customer_address s2ON (s2.h_customer_key = pit.h_customer_key AND s2.load_date = pit.s2_load_date)
LEFT JOIN s_billing_address s3ON (s3.h_customer_key = pit.h_customer_key AND s3.load_date = pit.s3_load_date)
WHERE pit.load_end_date IS NULL
Version View
CREATE OR REPLACE VIEW v_mart_d1_customer ASSELECT cust.h_customer_key
, cust.cust_last_name last_name, cust.cust_first_name first_name, cust.cust_street street, cust.cust_street_no street_no, cust.cust_zip_code zip_code, cust.cust_city city, cust.bill_zip_code, cust.bill_city, cust.private_person, cust.reseller
FROM monster_dv.v_curr_customer cust
CREATE OR REPLACE VIEW v_mart_d1_beer_bottling ASSELECT bott.h_bottling_key
, bott.beer_name, bott.best_before_date, bott.bottling_date, bott.quantity_in_liter, bott.number_of_bottles, brew.brew_no, brew.brew_date, brew.brewer, beer.h_beer_key
FROM monster_dv.v_curr_bottling bottJOIN monster_dv.l_brew_bottling lbbrON (lbbr.h_bottling_key = bott.h_bottling_key)
JOIN monster_dv.v_curr_brew brewON (brew.h_brew_key = lbbr.h_brew_key)
JOIN monster_dv.l_beer_brew lbbeON (lbbe.h_brew_key = brew.h_brew_key)
JOIN monster_dv.v_curr_beer beerON (beer.h_beer_key = lbbe.h_beer_key)
Virtual Data Mart (SCD1)
Demo
CREATE OR REPLACE VIEW v_mart_f1_sales_order ASSELECT ordr.invoice_date
, beer.h_beer_key, cust.h_customer_key, ordr.order_no, orit.quantity, orit.price_per_unit, orit.price_total, ordr.payment_date
FROM monster_dv.v_curr_order ordrJOIN monster_dv.l_customer_order lcorON (lcor.h_order_key = ordr.h_order_key)
JOIN monster_dv.v_curr_customer custON (cust.h_customer_key = lcor.h_customer_key)
JOIN monster_dv.l_order_item loitON (loit.h_order_key = ordr.h_order_key)
JOIN monster_dv.v_curr_order_item oritON (orit.h_order_item_key = loit.h_order_item_key)
JOIN monster_dv.l_beer_order_item lboiON (lboi.h_order_item_key = orit.h_order_item_key)
JOIN monster_dv.v_curr_beer beerON (beer.h_beer_key = lboi.h_beer_key)
Datenmodell Data Mart (SCD2)
INSERT /*+ APPEND */ INTO MART_D2_CUSTOMERSELECTversion_key_seq.nextval VK_Customer,cust_pit.H_Customer_Key,cust_pit.load_date valid_from,NVL(cust_pit.load_end_date, TO_DATE('31.12.9999','dd.mm.yyyy')) valid_to,…
FROM PIT_CUSTOMER cust_pitLEFT JOIN S_CUSTOMER_INFO cust_info
ON (cust_info.h_customer_key = cust_pit.h_customer_keyAND cust_info.load_date = cust_pit.s1_load_date)
LEFT JOIN S_CUSTOMER_ADDRESS cust_adrON (cust_adr.h_customer_key = cust_pit.h_customer_keyAND cust_adr.load_date = cust_pit.s2_load_date)
LEFT JOIN S_BILLING_ADDRESS bill_adrON (bill_adr.h_customer_key = cust_pit.h_customer_keyAND bill_adr.load_date = cust_pit.s3_load_date)
Laden der Dimension MART_D2_CUSTOMER
INSERT /*+ APPEND */ INTO MART_F2_SALES_ORDERSELECTsale.Invoice_Date,beer.VK_Beer,sale.H_Beer_Key,cust.VK_Customer,sale.H_Customer_Key,sale.Order_No,sale.Quantity,sale.Price_per_Unit,sale.Price_Total,sale.Payment_Date
FROM MART_F1_SALES_ORDER sale JOIN MART_D2_BEER beer
ON (sale.H_Beer_Key = beer.H_Beer_KeyAND sale.Invoice_Date >= beer.valid_fromAND sale.Invoice_Date < beer.valid_to)
JOIN MART_D2_CUSTOMER custON (sale.H_Customer_Key = cust.H_Customer_KeyAND sale.Invoice_Date >= cust.valid_fromAND sale.Invoice_Date < cust.valid_to)
Laden der Fakten
Welcher Bierstil wurde im 2018 in welcher Ortschaft wie häufig bestellt?
SELECT d_beer.style, d_cust.city, SUM(f_sale_orde.Quantity) QuantityFROM MART_F2_SALES_ORDER f_sale_orde…
Eine Auswertung
Anzahl gelieferter Flaschen je Monat und Etikettenfarbe
Demo
SELECTTO_CHAR(f_sale_deli.Delivery_Date,'YYYY-MM') month,d_beer.Beer_Name,
…
CREATE OR REPLACE VIEW v_mart_d2_customer ASSELECT cust.h_customer_key
, cust.valid_from, cust.valid_to, cust.cust_last_name last_name, cust.cust_first_name first_name, cust.cust_street street, cust.cust_street_no street_no, cust.cust_zip_code zip_code, cust.cust_city city, cust.bill_zip_code, cust.bill_city, cust.private_person, cust.reseller
FROM monster_dv.v_vers_customer cust
Virtual Data Mart (SCD2)CREATE OR REPLACE VIEW v_mart_f2_sales_order ASSELECT ordr.invoice_date
, beer.h_beer_key, cust.h_customer_key, ordr.order_no, orit.quantity, orit.price_per_unit, orit.price_total, ordr.payment_date
FROM monster_dv.v_curr_order ordrJOIN monster_dv.l_customer_order lcorON (lcor.h_order_key = ordr.h_order_key)
JOIN monster_dv.v_vers_customer custON (cust.h_customer_key = lcor.h_customer_key)
JOIN monster_dv.l_order_item loitON (loit.h_order_key = ordr.h_order_key)
JOIN monster_dv.v_curr_order_item oritON (orit.h_order_item_key = loit.h_order_item_key)
JOIN monster_dv.l_beer_order_item lboiON (lboi.h_order_item_key = orit.h_order_item_key)
JOIN monster_dv.v_vers_beer beerON (beer.h_beer_key = lboi.h_beer_key)
WHERE ordr.invoice_date >= cust.valid_fromAND ordr.invoice_date < cust.valid_toAND ordr.invoice_date >= beer.valid_fromAND ordr.invoice_date < beer.valid_to
SELECT b.style, c.city, SUM(o.quantity) quantity
FROM v_mart_f1_sales_order oJOIN v_mart_d1_beer bON (o.h_beer_key = b.h_beer_key)
JOIN v_mart_d1_customer cON (o.h_customer_key = c.h_customer_key)
WHERE o.invoice_date >= DATE'2018-01-01'AND o.invoice_date <= DATE'2018-12-31'
GROUP BY b.style, c.cityORDER BY quantity DESC
SELECT b.style, c.city, SUM(o.quantity) quantity
FROM v_mart_f2_sales_order oJOIN v_mart_d2_beer bON (o.h_beer_key = b.h_beer_key
AND o.invoice_date >= b.valid_fromAND o.invoice_date < b.valid_to)
JOIN v_mart_d2_customer cON (o.h_customer_key = c.h_customer_key
AND o.invoice_date >= c.valid_fromAND o.invoice_date < c.valid_to)
WHERE o.invoice_date >= DATE'2018-01-01'AND o.invoice_date <= DATE'2018-12-31'
GROUP BY b.style, c.cityORDER BY quantity DESC
Auswertungen Virtual Data Mart SCD1 / SCD2
Welcher Bierstil wurde im 2018 in welcher Ortschaft wie häufig bestellt?
SCD1 SCD2
Demo
Performancevergleich
0
1
2
3
4
5
6
7
8
9
1 2 3 4 5 6
Seconds per Query
Mart SCD1 Mart SCD2 Views SCD1 Views SCD2
Fazit
• SCDI ist auch mit DV realtiv einfach• SCDII ist auch mit DV realtiv komplex• SCDII ist mit PIT-Tabellen realtiv einfach• Logik für Views und ETL gleich komplex• Daten persistent laden ist sinnvoll, ausser …• ... wenn aktuelle Fakten benötigt werden
Near Real-Time oder Real-Time
Wenn wir die Daten Real-Time benötigen müssen wir mit den Views
arbeiten Können wir nicht die Daten bis und mit gestern ab dem Mart lesen und
die aktuellen via View joinen?
Das wäre eine Art Lambda-Architektur?
Genau.Klingt nach einer guten Idee für die
nächste DOAG