Skip to content

SQL166 每天的日活数及新用户占比

Static BadgeStatic Badge

题目描述

用户行为日志表tb_user_log

iduidartical_idin_timeout_timesign_cin
110190012021-10-31 10:00:002021-10-31 10:00:090
210290012021-10-31 10:00:002021-10-31 10:00:090
310102021-11-01 10:00:002021-11-01 10:00:421
410290012021-11-01 10:00:002021-11-01 10:00:090
510890012021-11-01 10:00:012021-11-01 10:00:500
610890012021-11-02 10:00:012021-11-02 10:00:500
710490012021-11-02 10:00:282021-11-02 10:00:500
810690012021-11-02 10:00:282021-11-02 10:00:500
910890012021-11-03 10:00:012021-11-03 10:00:500
1010990022021-11-03 11:00:552021-11-03 11:00:590
1110490032021-11-03 11:00:452021-11-03 11:00:550
1210590032021-11-03 11:00:532021-11-03 11:00:590
1310690032021-11-03 11:00:452021-11-03 11:00:550

(uid-用户ID, artical_id-文章ID, in_time-进入时间, out_time-离开时间, sign_in-是否签到)

问题:统计每天的日活数及新用户占比

  • 新用户占比=当天的新用户数÷当天活跃用户数(日活数)。
  • 如果in_time-进入时间out_time-离开时间跨天了,在两天里都记为该用户活跃过。
  • 新用户占比保留2位小数,结果按日期升序排序。

输出示例

示例数据的输出结果如下:

dtdauuv_new_ratio
2021-10-3021.00
2021-11-0130.33
2021-11-0230.67
2021-11-0350.40

解释:

2021年10月31日有2个用户活跃,都为新用户,新用户占比1.00;

2021年11月1日有3个用户活跃,其中1个新用户,新用户占比0.33;

SQL Schema

sql
DROP TABLE IF EXISTS tb_user_log;
CREATE TABLE tb_user_log
(
    id         INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid        INT NOT NULL COMMENT '用户ID',
    artical_id INT NOT NULL COMMENT '视频ID',
    in_time    datetime COMMENT '进入时间',
    out_time   datetime COMMENT '离开时间',
    sign_in    TINYINT DEFAULT 0 COMMENT '是否签到'
) CHARACTER SET utf8
  COLLATE utf8_bin;

INSERT INTO tb_user_log(uid, artical_id, in_time, out_time, sign_in)
VALUES (101, 9001, '2021-10-31 10:00:00', '2021-10-31 10:00:09', 0),
       (102, 9001, '2021-10-31 10:00:00', '2021-10-31 10:00:09', 0),
       (101, 0, '2021-11-01 10:00:00', '2021-11-01 10:00:42', 1),
       (102, 9001, '2021-11-01 10:00:00', '2021-11-01 10:00:09', 0),
       (108, 9001, '2021-11-01 10:00:01', '2021-11-01 10:01:50', 0),
       (108, 9001, '2021-11-02 10:00:01', '2021-11-02 10:01:50', 0),
       (104, 9001, '2021-11-02 10:00:28', '2021-11-02 10:00:50', 0),
       (106, 9001, '2021-11-02 10:00:28', '2021-11-02 10:00:50', 0),
       (108, 9001, '2021-11-03 10:00:01', '2021-11-03 10:01:50', 0),
       (109, 9002, '2021-11-03 11:00:55', '2021-11-03 11:00:59', 0),
       (104, 9003, '2021-11-03 11:00:45', '2021-11-03 11:00:55', 0),
       (105, 9003, '2021-11-03 11:00:53', '2021-11-03 11:00:59', 0),
       (106, 9003, '2021-11-03 11:00:45', '2021-11-03 11:00:55', 0);

答案

sql
SELECT dt, COUNT(uid) AS `dau`, ROUND(AVG(IF(dt = first_time, 1, 0)), 2) AS `uv_new_ratio`
FROM (SELECT uid, dt, MIN(dt) OVER (PARTITION BY uid ORDER BY dt) AS `first_time`
      FROM (SELECT uid, DATE_FORMAT(in_time, '%Y-%m-%d') AS `dt`
            FROM tb_user_log
            UNION
            SELECT uid, DATE_FORMAT(out_time, '%Y-%m-%d') AS `dt`
            FROM tb_user_log) a) b
GROUP BY dt
ORDER BY dt;