十二、间接数据源
本节讨论如何处理间接数据源。间接数据源与维度表具有不同的粒度,因此不能直接装载进数据仓库。在这里通过修改进阶技术(八)——“多路径和参差不齐的层次”里的促销源数据说明怎样处理间接数据源。
1 2 3 4 5 6 7 8 9 10 11 12 13 14
| 1CAMPAIGN SESSION,MONTH,YEAR
22016 First Campaign,1,2016
32016 First Campaign,2,2016
42016 First Campaign,3,2016
52016 First Campaign,4,2016
62016 Second Campaign,5,2016
72016 Second Campaign,6,2016
82016 Second Campaign,7,2016
92016 Third Campaign,8,2016
102016 Last Campaign,9,2016
112016 Last Campaign,10,2016
122016 Last Campaign,11,2016
132016 Last Campaign,12,2016
14 |
1 2
| 1 如上所示,促销期数据源的粒度是月,因为每行都有一个月份元素。而且一个促销期可能延续多个月,正如上面显示的2016年第一个促销期有四个月。这意味着促销期信息重复了四次,也就是四行。比方说希望简化促销期源数据的准备工作,每个促销期不管有多长,只准备一行数据。新的数据格式可以改成下面所示,存在non_campaign_session.csv文件中。
2 |
1 2 3 4 5
| 12016 First Campaign,1,2016,4,2016
22016 Second Campaign,5,2016,7,2016
32016 Third Campaign,8,2016,8,2016
42016 Last Campaign,9,2016,12,2016
5 |
需要一个不同的过渡表。使用下面的脚本创建它。
1 2 3 4 5 6 7 8 9 10
| 1USE rds;
2CREATE TABLE non_straight_campaign (
3 campaign_session CHAR(30),
4 start_month CHAR(9),
5 start_year INT,
6 end_month CHAR(9),
7 end_year INT
8)
9row format delimited fields terminated by ',' stored as textfile;
10 |
1 2
| 1 注意新的过渡表既有开始年月列也有结束年月列。下面给出了修改后的促销期装载脚本。
2 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
| 1use rds;
2load data local inpath '/root/non_campaign_session.csv' overwrite into table non_straight_campaign;
3
4use dw;
5drop table if exists tmp;
6create table tmp as
7select t1.month_sk,
8 t1.month,
9 t1.month_name,
10 t3.campaign_session,
11 t1.quarter,
12 t1.year
13 from month_dim t1, month_dim t2, rds.non_straight_campaign t3
14 where t1.year = t3.start_year
15 and t1.month >= t3.start_month
16 and t2.year = t3.end_year
17 and t2.month <= t3.end_month
18 and t1.year = t2.year
19 and t1.month = t2.month;
20delete from month_dim where month_dim.month_sk in (select month_sk from tmp);
21insert into month_dim select * from tmp;
22 |
执行修改后的促销期装载脚本之前,要执行下面的命令删除已装载的促销期数据。
1 2 3
| 1USE dw;
2UPDATE month_dim SET campaign_session = NULL;
3 |
1 2
| 1 执行修改后的促销期装载脚本后,查询month_dim表,确认它被正确地装载,查询语句如下。
2 |
1 2 3 4
| 1select month_sk m_sk, month_name, month m, campaign_session,quarter q
2 from dw.month_dim
3 where year = 2016;
4 |
查询结果如下图所示。