九、退化维度
本节讨论一种称为退化维度的技术。该技术减少维度的数量,简化维度数据仓库模式。简单的模式比复杂的更容易理解,也有更好的查询性能。当一个维度没有数据仓库需要的任何数据时就可以退化此维度,此时需要把退化维度的相关数据迁移到事实表中,然后删除退化的维度。
1. 退化订单维度
本小节说明如何退化订单维度,包括对数据仓库模式和定期装载脚本的修改。使用维度退化技术时你首先要识别数据,分析从来不用的数据列。例如,订单维度的order_number列就可能是这样的一列。但如果用户想看事务的细节,还需要订单号。因此,在退化订单维度前,要把订单号迁移到sales_order_fact表。下图显示了迁移后的模式。
按顺序执行下面的四步退化order_dim维度表:
(1)给sales_order_fact表添加order_number列
(2)把order_dim表里的订单号迁移到sales_order_fact表
(3)删除sales_order_fact表里的order_sk列
(4)删除order_dim表
下面的脚本完成所有退化订单维度所需的步骤。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
| 1use dw;
2alter table sales_order_fact rename to sales_order_fact_old;
3create table sales_order_fact(
4 order_number int COMMENT 'order number',
5 customer_sk int COMMENT 'customer surrogate key',
6 product_sk int COMMENT 'product surrogate key',
7 order_date_sk int COMMENT 'order date surrogate key',
8 allocate_date_sk int COMMENT 'allocate date surrogate key',
9 allocate_quantity int COMMENT 'allocate quantity',
10 packing_date_sk int COMMENT 'packing date surrogate key',
11 packing_quantity int COMMENT 'packing quantity',
12 ship_date_sk int COMMENT 'ship date surrogate key',
13 ship_quantity int COMMENT 'ship quantity',
14 receive_date_sk int COMMENT 'receive date surrogate key',
15 receive_quantity int COMMENT 'receive quantity',
16 request_delivery_date_sk int COMMENT 'request delivery date surrogate key',
17 order_amount decimal(10,2) COMMENT 'order amount',
18 order_quantity int COMMENT 'order quantity')
19clustered by (order_number) into 8 buckets
20stored as orc tblproperties ('transactional'='true');
21
22insert into table sales_order_fact
23select t2.order_number,
24 t1.customer_sk,
25 t1.product_sk,
26 t1.order_date_sk,
27 t1.allocate_date_sk,
28 t1.allocate_quantity,
29 t1.packing_date_sk,
30 t1.packing_quantity,
31 t1.ship_date_sk,
32 t1.ship_quantity,
33 t1.receive_date_sk,
34 t1.receive_quantity,
35 t1.request_delivery_date_sk,
36 t1.order_amount,
37 t1.order_quantity
38 from sales_order_fact_old t1
39 inner join order_dim t2 on t1.order_sk = t2.order_sk;
40
41drop table sales_order_fact_old;
42drop table order_dim;
43 |
退化一个维度后需要做的另一件事就是修改定期装载脚本。修改后的脚本需要把订单号加入到销售订单事实表,而不再需要导入订单维度。下面显示了修改后的regular_etl.sql脚本文件内容。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381
| 1-- 设置变量以支持事务
2set hive.support.concurrency=true;
3set hive.exec.dynamic.partition.mode=nonstrict;
4set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
5set hive.compactor.initiator.on=true;
6set hive.compactor.worker.threads=1;
7
8USE dw;
9
10-- 设置SCD的生效时间和过期时间
11SET hivevar:cur_date = CURRENT_DATE();
12SET hivevar:pre_date = DATE_ADD(${hivevar:cur_date},-1);
13SET hivevar:max_date = CAST('2200-01-01' AS DATE);
14
15-- 设置CDC的上限时间
16INSERT OVERWRITE TABLE rds.cdc_time SELECT last_load, ${hivevar:cur_date} FROM rds.cdc_time;
17
18-- 装载customer维度
19-- 设置已删除记录和地址相关列上SCD2的过期,用<=>运算符处理NULL值。
20UPDATE customer_dim
21 SET expiry_date = ${hivevar:pre_date}
22 WHERE customer_dim.customer_sk IN
23(SELECT a.customer_sk
24 FROM (SELECT customer_sk,
25 customer_number,
26 customer_street_address,
27 customer_zip_code,
28 customer_city,
29 customer_state,
30 shipping_address,
31 shipping_zip_code,
32 shipping_city,
33 shipping_state
34 FROM customer_dim WHERE expiry_date = ${hivevar:max_date}) a LEFT JOIN
35 rds.customer b ON a.customer_number = b.customer_number
36 WHERE b.customer_number IS NULL OR
37 ( !(a.customer_street_address <=> b.customer_street_address)
38 OR !(a.customer_zip_code <=> b.customer_zip_code)
39 OR !(a.customer_city <=> b.customer_city)
40 OR !(a.customer_state <=> b.customer_state)
41 OR !(a.shipping_address <=> b.shipping_address)
42 OR !(a.shipping_zip_code <=> b.shipping_zip_code)
43 OR !(a.shipping_city <=> b.shipping_city)
44 OR !(a.shipping_state <=> b.shipping_state)
45 ));
46
47-- 处理customer_street_addresses列上SCD2的新增行
48INSERT INTO customer_dim
49SELECT
50 ROW_NUMBER() OVER (ORDER BY t1.customer_number) + t2.sk_max,
51 t1.customer_number,
52 t1.customer_name,
53 t1.customer_street_address,
54 t1.customer_zip_code,
55 t1.customer_city,
56 t1.customer_state,
57 t1.shipping_address,
58 t1.shipping_zip_code,
59 t1.shipping_city,
60 t1.shipping_state,
61 t1.version,
62 t1.effective_date,
63 t1.expiry_date
64FROM
65(
66SELECT
67 t2.customer_number customer_number,
68 t2.customer_name customer_name,
69 t2.customer_street_address customer_street_address,
70 t2.customer_zip_code customer_zip_code,
71 t2.customer_city customer_city,
72 t2.customer_state customer_state,
73 t2.shipping_address shipping_address,
74 t2.shipping_zip_code shipping_zip_code,
75 t2.shipping_city shipping_city,
76 t2.shipping_state shipping_state,
77 t1.version + 1 version,
78 ${hivevar:pre_date} effective_date,
79 ${hivevar:max_date} expiry_date
80 FROM customer_dim t1
81INNER JOIN rds.customer t2
82 ON t1.customer_number = t2.customer_number
83 AND t1.expiry_date = ${hivevar:pre_date}
84 LEFT JOIN customer_dim t3
85 ON t1.customer_number = t3.customer_number
86 AND t3.expiry_date = ${hivevar:max_date}
87WHERE (!(t1.customer_street_address <=> t2.customer_street_address)
88 OR !(t1.customer_zip_code <=> t2.customer_zip_code)
89 OR !(t1.customer_city <=> t2.customer_city)
90 OR !(t1.customer_state <=> t2.customer_state)
91 OR !(t1.shipping_address <=> t2.shipping_address)
92 OR !(t1.shipping_zip_code <=> t2.shipping_zip_code)
93 OR !(t1.shipping_city <=> t2.shipping_city)
94 OR !(t1.shipping_state <=> t2.shipping_state)
95 )
96 AND t3.customer_sk IS NULL) t1
97CROSS JOIN
98(SELECT COALESCE(MAX(customer_sk),0) sk_max FROM customer_dim) t2;
99
100-- 处理customer_name列上的SCD1
101-- 因为hive的update的set子句还不支持子查询,所以这里使用了一个临时表存储需要更新的记录,用先delete再insert代替update
102-- 因为SCD1本身就不保存历史数据,所以这里更新维度表里的所有customer_name改变的记录,而不是仅仅更新当前版本的记录
103DROP TABLE IF EXISTS tmp;
104CREATE TABLE tmp AS
105SELECT
106 a.customer_sk,
107 a.customer_number,
108 b.customer_name,
109 a.customer_street_address,
110 a.customer_zip_code,
111 a.customer_city,
112 a.customer_state,
113 a.shipping_address,
114 a.shipping_zip_code,
115 a.shipping_city,
116 a.shipping_state,
117 a.version,
118 a.effective_date,
119 a.expiry_date
120 FROM customer_dim a, rds.customer b
121 WHERE a.customer_number = b.customer_number AND !(a.customer_name <=> b.customer_name);
122DELETE FROM customer_dim WHERE customer_dim.customer_sk IN (SELECT customer_sk FROM tmp);
123INSERT INTO customer_dim SELECT * FROM tmp;
124
125-- 处理新增的customer记录
126INSERT INTO customer_dim
127SELECT
128 ROW_NUMBER() OVER (ORDER BY t1.customer_number) + t2.sk_max,
129 t1.customer_number,
130 t1.customer_name,
131 t1.customer_street_address,
132 t1.customer_zip_code,
133 t1.customer_city,
134 t1.customer_state,
135 t1.shipping_address,
136 t1.shipping_zip_code,
137 t1.shipping_city,
138 t1.shipping_state,
139 1,
140 ${hivevar:pre_date},
141 ${hivevar:max_date}
142FROM
143(
144SELECT t1.* FROM rds.customer t1 LEFT JOIN customer_dim t2 ON t1.customer_number = t2.customer_number
145 WHERE t2.customer_sk IS NULL) t1
146CROSS JOIN
147(SELECT COALESCE(MAX(customer_sk),0) sk_max FROM customer_dim) t2;
148
149-- 重载PA客户维度
150TRUNCATE TABLE pa_customer_dim;
151INSERT INTO pa_customer_dim
152SELECT
153 customer_sk
154, customer_number
155, customer_name
156, customer_street_address
157, customer_zip_code
158, customer_city
159, customer_state
160, shipping_address
161, shipping_zip_code
162, shipping_city
163, shipping_state
164, version
165, effective_date
166, expiry_date
167FROM customer_dim
168WHERE customer_state = 'PA' ;
169
170-- 装载product维度
171-- 设置已删除记录和product_name、product_category列上SCD2的过期
172UPDATE product_dim
173 SET expiry_date = ${hivevar:pre_date}
174 WHERE product_dim.product_sk IN
175(SELECT a.product_sk
176 FROM (SELECT product_sk,product_code,product_name,product_category
177 FROM product_dim WHERE expiry_date = ${hivevar:max_date}) a LEFT JOIN
178 rds.product b ON a.product_code = b.product_code
179 WHERE b.product_code IS NULL OR (a.product_name <> b.product_name OR a.product_category <> b.product_category));
180
181-- 处理product_name、product_category列上SCD2的新增行
182INSERT INTO product_dim
183SELECT
184 ROW_NUMBER() OVER (ORDER BY t1.product_code) + t2.sk_max,
185 t1.product_code,
186 t1.product_name,
187 t1.product_category,
188 t1.version,
189 t1.effective_date,
190 t1.expiry_date
191FROM
192(
193SELECT
194 t2.product_code product_code,
195 t2.product_name product_name,
196 t2.product_category product_category,
197 t1.version + 1 version,
198 ${hivevar:pre_date} effective_date,
199 ${hivevar:max_date} expiry_date
200 FROM product_dim t1
201INNER JOIN rds.product t2
202 ON t1.product_code = t2.product_code
203 AND t1.expiry_date = ${hivevar:pre_date}
204 LEFT JOIN product_dim t3
205 ON t1.product_code = t3.product_code
206 AND t3.expiry_date = ${hivevar:max_date}
207WHERE (t1.product_name <> t2.product_name OR t1.product_category <> t2.product_category) AND t3.product_sk IS NULL) t1
208CROSS JOIN
209(SELECT COALESCE(MAX(product_sk),0) sk_max FROM product_dim) t2;
210
211-- 处理新增的product记录
212INSERT INTO product_dim
213SELECT
214 ROW_NUMBER() OVER (ORDER BY t1.product_code) + t2.sk_max,
215 t1.product_code,
216 t1.product_name,
217 t1.product_category,
218 1,
219 ${hivevar:pre_date},
220 ${hivevar:max_date}
221FROM
222(
223SELECT t1.* FROM rds.product t1 LEFT JOIN product_dim t2 ON t1.product_code = t2.product_code
224 WHERE t2.product_sk IS NULL) t1
225CROSS JOIN
226(SELECT COALESCE(MAX(product_sk),0) sk_max FROM product_dim) t2;
227
228-- 装载销售订单事实表
229-- 前一天新增的销售订单
230INSERT INTO sales_order_fact
231SELECT
232 a.order_number,
233 customer_sk,
234 product_sk,
235 e.order_date_sk,
236 null,
237 null,
238 null,
239 null,
240 null,
241 null,
242 null,
243 null,
244 f.request_delivery_date_sk,
245 order_amount,
246 quantity
247 FROM
248 rds.sales_order a,
249 customer_dim c,
250 product_dim d,
251 order_date_dim e,
252 request_delivery_date_dim f,
253 rds.cdc_time g
254 WHERE
255 a.order_status = 'N'
256AND a.customer_number = c.customer_number
257AND a.status_date >= c.effective_date
258AND a.status_date < c.expiry_date
259AND a.product_code = d.product_code
260AND a.status_date >= d.effective_date
261AND a.status_date < d.expiry_date
262AND to_date(a.status_date) = e.order_date
263AND to_date(a.request_delivery_date) = f.request_delivery_date
264AND a.entry_date >= g.last_load AND a.entry_date < g.current_load ;
265
266-- 处理分配库房、打包、配送和收货四个状态
267DROP TABLE IF EXISTS tmp;
268CREATE TABLE tmp AS
269select t0.order_number order_number,
270 t0.customer_sk customer_sk,
271 t0.product_sk product_sk,
272 t0.order_date_sk order_date_sk,
273 t2.allocate_date_sk allocate_date_sk,
274 t1.quantity allocate_quantity,
275 t0.packing_date_sk packing_date_sk,
276 t0.packing_quantity packing_quantity,
277 t0.ship_date_sk ship_date_sk,
278 t0.ship_quantity ship_quantity,
279 t0.receive_date_sk receive_date_sk,
280 t0.receive_quantity receive_quantity,
281 t0.request_delivery_date_sk request_delivery_date_sk,
282 t0.order_amount order_amount,
283 t0.order_quantity order_quantity
284 from sales_order_fact t0,
285 rds.sales_order t1,
286 allocate_date_dim t2,
287 rds.cdc_time t4
288 where t0.order_number = t1.order_number and t1.order_status = 'A'
289 and to_date(t1.status_date) = t2.allocate_date
290 and t1.entry_date >= t4.last_load and t1.entry_date < t4.current_load;
291
292DELETE FROM sales_order_fact WHERE sales_order_fact.order_number IN (SELECT order_number FROM tmp);
293INSERT INTO sales_order_fact SELECT * FROM tmp;
294
295DROP TABLE IF EXISTS tmp;
296CREATE TABLE tmp AS
297select t0.order_number order_number,
298 t0.customer_sk customer_sk,
299 t0.product_sk product_sk,
300 t0.order_date_sk order_date_sk,
301 t0.allocate_date_sk allocate_date_sk,
302 t0.allocate_quantity allocate_quantity,
303 t2.packing_date_sk packing_date_sk,
304 t1.quantity packing_quantity,
305 t0.ship_date_sk ship_date_sk,
306 t0.ship_quantity ship_quantity,
307 t0.receive_date_sk receive_date_sk,
308 t0.receive_quantity receive_quantity,
309 t0.request_delivery_date_sk request_delivery_date_sk,
310 t0.order_amount order_amount,
311 t0.order_quantity order_quantity
312 from sales_order_fact t0,
313 rds.sales_order t1,
314 packing_date_dim t2,
315 rds.cdc_time t4
316 where t0.order_number = t1.order_number and t1.order_status = 'P'
317 and to_date(t1.status_date) = t2.packing_date
318 and t1.entry_date >= t4.last_load and t1.entry_date < t4.current_load;
319
320DELETE FROM sales_order_fact WHERE sales_order_fact.order_number IN (SELECT order_number FROM tmp);
321INSERT INTO sales_order_fact SELECT * FROM tmp;
322
323DROP TABLE IF EXISTS tmp;
324CREATE TABLE tmp AS
325select t0.order_number order_number,
326 t0.customer_sk customer_sk,
327 t0.product_sk product_sk,
328 t0.order_date_sk order_date_sk,
329 t0.allocate_date_sk allocate_date_sk,
330 t0.allocate_quantity allocate_quantity,
331 t0.packing_date_sk packing_date_sk,
332 t0.packing_quantity packing_quantity,
333 t2.ship_date_sk ship_date_sk,
334 t1.quantity ship_quantity,
335 t0.receive_date_sk receive_date_sk,
336 t0.receive_quantity receive_quantity,
337 t0.request_delivery_date_sk request_delivery_date_sk,
338 t0.order_amount order_amount,
339 t0.order_quantity order_quantity
340 from sales_order_fact t0,
341 rds.sales_order t1,
342 ship_date_dim t2,
343 rds.cdc_time t4
344 where t0.order_number = t1.order_number and t1.order_status = 'S'
345 and to_date(t1.status_date) = t2.ship_date
346 and t1.entry_date >= t4.last_load and t1.entry_date < t4.current_load;
347
348DELETE FROM sales_order_fact WHERE sales_order_fact.order_number IN (SELECT order_number FROM tmp);
349INSERT INTO sales_order_fact SELECT * FROM tmp;
350
351DROP TABLE IF EXISTS tmp;
352CREATE TABLE tmp AS
353select t0.order_number order_number,
354 t0.customer_sk customer_sk,
355 t0.product_sk product_sk,
356 t0.order_date_sk order_date_sk,
357 t0.allocate_date_sk allocate_date_sk,
358 t0.allocate_quantity allocate_quantity,
359 t0.packing_date_sk packing_date_sk,
360 t0.packing_quantity packing_quantity,
361 t0.ship_date_sk ship_date_sk,
362 t0.ship_quantity ship_quantity,
363 t2.receive_date_sk receive_date_sk,
364 t1.quantity receive_quantity,
365 t0.request_delivery_date_sk request_delivery_date_sk,
366 t0.order_amount order_amount,
367 t0.order_quantity order_quantity
368 from sales_order_fact t0,
369 rds.sales_order t1,
370 receive_date_dim t2,
371 rds.cdc_time t4
372 where t0.order_number = t1.order_number and t1.order_status = 'R'
373 and to_date(t1.status_date) = t2.receive_date
374 and t1.entry_date >= t4.last_load and t1.entry_date < t4.current_load;
375
376DELETE FROM sales_order_fact WHERE sales_order_fact.order_number IN (SELECT order_number FROM tmp);
377INSERT INTO sales_order_fact SELECT * FROM tmp;
378
379-- 更新时间戳表的last_load字段
380INSERT OVERWRITE TABLE rds.cdc_time SELECT current_load, current_load FROM rds.cdc_time;
381 |
(1)准备测试数据
测试使用具有分配库房、打包、配送和收货里程碑的两个新订单。所以每个订单需要添加五行。下面的脚本向源数据库里的sales_order表新增十行。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
| 1USE source;
2DROP TABLE IF EXISTS temp_sales_order_data;
3CREATE TABLE temp_sales_order_data AS SELECT * FROM sales_order WHERE 1=0;
4
5SET @start_date := unix_timestamp('2016-07-25');
6SET @end_date := unix_timestamp('2016-07-26');
7SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
8SET @amount := floor(1000 + rand() * 9000);
9SET @quantity := floor(10 + rand() * 90);
10INSERT INTO temp_sales_order_data VALUES (1, 131, 1, 1, @order_date, 'N', '2016-08-01', @order_date, @amount, @quantity);
11
12SET @start_date := unix_timestamp('2016-07-25');
13SET @end_date := unix_timestamp('2016-07-26');
14SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
15SET @amount := floor(1000 + rand() * 9000);
16SET @quantity := floor(10 + rand() * 90);
17INSERT INTO temp_sales_order_data VALUES (2, 132, 2, 2, @order_date, 'N', '2016-08-01', @order_date, @amount, @quantity);
18
19SET @start_date := unix_timestamp('2016-07-26');
20SET @end_date := unix_timestamp('2016-07-27');
21SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
22SET @amount := floor(1000 + rand() * 9000);
23SET @quantity := floor(10 + rand() * 90);
24INSERT INTO temp_sales_order_data VALUES (3, 131, 1, 1, @order_date, 'A', '2016-08-01', @order_date, @amount, @quantity);
25
26SET @start_date := unix_timestamp('2016-07-26');
27SET @end_date := unix_timestamp('2016-07-27');
28SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
29SET @amount := floor(1000 + rand() * 9000);
30SET @quantity := floor(10 + rand() * 90);
31INSERT INTO temp_sales_order_data VALUES (4, 132, 2, 2, @order_date, 'A', '2016-08-01', @order_date, @amount, @quantity);
32
33SET @start_date := unix_timestamp('2016-07-27');
34SET @end_date := unix_timestamp('2016-07-28');
35SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
36SET @amount := floor(1000 + rand() * 9000);
37SET @quantity := floor(10 + rand() * 90);
38INSERT INTO temp_sales_order_data VALUES (5, 131, 1, 1, @order_date, 'P', '2016-08-01', @order_date, @amount, @quantity);
39
40SET @start_date := unix_timestamp('2016-07-27');
41SET @end_date := unix_timestamp('2016-07-28');
42SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
43SET @amount := floor(1000 + rand() * 9000);
44SET @quantity := floor(10 + rand() * 90);
45INSERT INTO temp_sales_order_data VALUES (6, 132, 2, 2, @order_date, 'P', '2016-08-01', @order_date, @amount, @quantity);
46
47SET @start_date := unix_timestamp('2016-07-28');
48SET @end_date := unix_timestamp('2016-07-29');
49SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
50SET @amount := floor(1000 + rand() * 9000);
51SET @quantity := floor(10 + rand() * 90);
52INSERT INTO temp_sales_order_data VALUES (7, 131, 1, 1, @order_date, 'S', '2016-08-01', @order_date, @amount, @quantity);
53
54SET @start_date := unix_timestamp('2016-07-28');
55SET @end_date := unix_timestamp('2016-07-29');
56SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
57SET @amount := floor(1000 + rand() * 9000);
58SET @quantity := floor(10 + rand() * 90);
59INSERT INTO temp_sales_order_data VALUES (8, 132, 2, 2, @order_date, 'S', '2016-08-01', @order_date, @amount, @quantity);
60
61SET @start_date := unix_timestamp('2016-07-29');
62SET @end_date := unix_timestamp('2016-07-30');
63SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
64SET @amount := floor(1000 + rand() * 9000);
65SET @quantity := floor(10 + rand() * 90);
66INSERT INTO temp_sales_order_data VALUES (9, 131, 1, 1, @order_date, 'R', '2016-08-01', @order_date, @amount, @quantity);
67
68SET @start_date := unix_timestamp('2016-07-29');
69SET @end_date := unix_timestamp('2016-07-30');
70SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
71SET @amount := floor(1000 + rand() * 9000);
72SET @quantity := floor(10 + rand() * 90);
73INSERT INTO temp_sales_order_data VALUES (10, 132, 2, 2, @order_date, 'R', '2016-08-01', @order_date, @amount, @quantity);
74
75INSERT INTO sales_order
76select null,
77 order_number,
78 customer_number,
79 product_code,
80 status_date,
81 order_status,
82 request_delivery_date,
83 entry_date,
84 order_amount,
85 quantity
86 from temp_sales_order_data t1
87 order by t1.status_date;
88
89COMMIT ;
90 |
1 2 3
| 1use rds;
2INSERT OVERWRITE TABLE rds.cdc_time SELECT '2016-07-25', '2016-07-26' FROM rds.cdc_time;
3 |
1 2
| 1将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行改为SET hivevar:cur_date = '2016-07-26';
2 |
1 2 3
| 1use rds;
2INSERT OVERWRITE TABLE rds.cdc_time SELECT '2016-07-26', '2016-07-27' FROM rds.cdc_time;
3 |
1 2
| 1将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行改为SET hivevar:cur_date = '2016-07-27';
2 |
1 2 3
| 1use rds;
2INSERT OVERWRITE TABLE rds.cdc_time SELECT '2016-07-27', '2016-07-28' FROM rds.cdc_time;
3 |
1 2
| 1将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行改为SET hivevar:cur_date = '2016-07-28';
2 |
1 2 3
| 1use rds;
2INSERT OVERWRITE TABLE rds.cdc_time SELECT '2016-07-28', '2016-07-29' FROM rds.cdc_time;
3 |
1 2
| 1将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行改为SET hivevar:cur_date = '2016-07-29';
2 |
1 2 3
| 1use rds;
2INSERT OVERWRITE TABLE rds.cdc_time SELECT '2016-07-29', '2016-07-30' FROM rds.cdc_time;
3 |
1 2
| 1将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行改为SET hivevar:cur_date = '2016-07-30';
2 |
1 2
| 1 查询sales_order_fact表的两条订单。
2 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
| 1use dw;
2select t1.order_number orn,
3 t2.order_date od,
4 t1.order_quantity oq,
5 t3.allocate_date ad,
6 t1.allocate_quantity aq,
7 t4.packing_date pd,
8 t1.packing_quantity pq,
9 t5.ship_date sd,
10 t1.ship_quantity sq,
11 t6.receive_date rd,
12 t1.receive_quantity rq
13 from sales_order_fact t1
14 inner join order_date_dim t2 on t1.order_date_sk = t2.order_date_sk
15 left join allocate_date_dim t3 on t1.allocate_date_sk = t3.allocate_date_sk
16 left join packing_date_dim t4 on t1.packing_date_sk = t4.packing_date_sk
17 left join ship_date_dim t5 on t1.ship_date_sk = t5.ship_date_sk
18 left join receive_date_dim t6 on t1.receive_date_sk = t6.receive_date_sk
19 where t1.order_number IN (131 , 132);
20 |
测试完将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行恢复。