专注大数据培训
我们一直在领跑

最新面试题集锦(17)–百度

用户操作表:
    schema 定义
        字段名 字段类型    字段说明    字段例子    字段密级    列裁剪阈值
        user_id string  uid 1234567 L4  0
        client  string  端类型 android L3  0
        op_type string  接口类型    download    L4  0
        file_type   string  文件类型    mp4 L4  0
        file_md5    string  文件md5   video   L4  0
        event_hour  string  小时      L2  0
        event_day   string  日期  20180801    L2 
统计:
        a)按照user_id、file_md5、op_type、file_type区分,统计记录条数
        d)按照user_id、file_md5分组,统计记录条数
        c)讲 a)和b)统计的结果按照user_id 和file_md5进行关联,关键结果如下user_id, file_md5, op_type, file_type, num_userid_filemd5, num_user_file_optype_filetype
        c)并将结果写入HDFS
        
    HDFS:集群和 UGI:
        集群:yq01-wutai-hdfs.dmop.baidu.com:54310
        UGI: dt-udw-AAAA,aaaaa

a)

select
user_id,file_md5,op_type,file_type,count(1) as cnt
from t
group by user_id,file_md5,op_type,file_type;

b)

select
user_id,
file_md5,
count(1) as cnt
from t
group by user_id,file_md5;

c)

with a as (
select
user_id,file_md5,op_type,file_type,count(1) as cnt
from t
group by user_id,file_md5,op_type,file_type
),
b as (
select
user_id,file_md5,count(1) cnt
from t
group by user_id,file_md5
)

insert overwrite directory 'yq01-wutai-hdfs.dmop.baidu.com:54310/dt-udw-AAAA/aaaa/'
select
a.user_id,a.file_md5,a.op_type,a.file_type,a.cnt,
b.user_id,b.file_md5,b.cnt
from a  join b  on a.user_id=b.user_id and a.file_md5=b.file_md5;
欢迎分享,转载有奖:多易教育 » 最新面试题集锦(17)–百度