自学内容网 自学内容网

SQL面试题1:连续登陆问题

引言

场景介绍:

许多互联网平台为了提高用户的参与度和忠诚度,会推出各种连续登录奖励机制。例如,游戏平台会给连续登录的玩家发放游戏道具、金币等奖励;学习类 APP 会为连续登录学习的用户提供积分,积分可兑换课程或其他福利。通过这些激励措施,平台希望用户能够养成持续使用产品的习惯,从而提升产品的活跃度和留存率。同时,对于平台运营者来说,分析用户的连续登录数据可以了解用户的使用习惯和忠诚度,进而优化产品功能和运营策略。

题目描述:

假设我们有一个记录用户登录信息的表,表名为 login_table,其中包含两个字段:uid(用户 ID)和 dt(登录日期)。现在需要完成以下三个任务:

  1. 查询连续登录超过三天的用户:找出在一段时间内,连续登录天数大于三天的用户列表。这有助于平台识别出那些高度活跃且对产品有较高忠诚度的用户,以便进一步进行精细化运营和奖励。
  2. 查询每个用户连续登录的最大天数:对于每个用户,统计其在所有登录记录中连续登录的最长时间段,这能帮助我们了解不同用户的活跃程度差异,为个性化运营提供数据基础。
  3. 查询一个用户连续登录的最大天数(可隔一天):在计算用户连续登录天数时,允许中间间隔一天,只要整体登录天数最多,就是我们要找的结果。比如用户在 1、3、5、6 日登录,那么其连续登录的最大天数为 6 天。这种统计方式可以更灵活地评估用户的活跃程度,考虑到了用户可能因为某些特殊情况中断一天登录,但整体仍保持较高的使用频率。

数据准备与代码实现

数据准备

12025-01-01
12025-01-02
12025-01-03
22025-01-07
22025-01-08
32025-01-09
32025-01-10
32025-01-12
32025-01-13

1. 查询连续登录超过三天的用户

思路:

  1. 用户登录记录编号:利用row_number()函数按uid分区并依dt升序排序生成序号rn,实现对各用户登录时间进行排序编号
  2. 计算连续登录首日:利用date_add函数将dt减去rn,计算每行对应的连续登录起始日期first_day
  3. 选出连续登录超过三天大用户:利用group byuidfirst_day分组,结合having筛选出分组行数大于等于3的记录,实现找出连续登录超三天的用户uid
with data as (
    select 1 as uid,'2025-01-01' as dt union all
    select 1 as uid,'2025-01-02' as dt union all
    select 1 as uid,'2025-01-03' as dt union all
    select 2 as uid,'2025-01-07' as dt union all
    select 2 as uid,'2025-01-08' as dt union all
    select 3 as uid,'2025-01-09' as dt union all
    select 3 as uid,'2025-01-10' as dt union all
    select 3 as uid,'2025-01-12' as dt union all
    select 3 as uid,'2025-01-13' as dt
),
data2 as (
    select uid,dt,row_number() over (partition by uid order by dt) rn from data
),
data3 as (
    select uid,dt,rn,date_add(dt,-rn) as first_day from data2
)
select uid from data3 group by uid,first_day having count(1) >= 3;

2. 查询每个用户连续登录的最大天数

思路:

  1. 用户登录记录编号:利用窗口函数row_number(),按uid分区并依dt升序排序生成序号rn,实现对各用户登录时间进行排序编号。
  2. 计算连续登录首日:利用date_add函数将dt减去rn,计算每行对应的连续登录起始日期first_day
  3. 统计分组登录天数:利用group byuidfirst_day分组,通过count(*)统计同一组合的天数login_day,以此统计出每个用户每段连续登录的天数。
  4. 获取用户最大连续登录天数:再次使用group byuid进行分组,通过max(login_day)从每个用户的多段连续登录天数中选出最大值,最终得到每个用户连续登录的最大天数。
with data as (
    select 1 as uid,'2025-01-01' as dt union all
    select 1 as uid,'2025-01-02' as dt union all
    select 1 as uid,'2025-01-03' as dt union all
    select 2 as uid,'2025-01-07' as dt union all
    select 2 as uid,'2025-01-08' as dt union all
    select 3 as uid,'2025-01-09' as dt union all
    select 3 as uid,'2025-01-10' as dt union all
    select 3 as uid,'2025-01-12' as dt union all
    select 3 as uid,'2025-01-13' as dt
),
data2 as (
    select uid,dt,row_number() over (partition by uid order by dt) rn from data
),
data3 as (
    select uid,dt,rn,date_add(dt,-rn) as first_day from data2
),
data4 as (
    select uid,first_day,count(*) as login_day from data3 group by uid,first_day
    )
select uid,max(login_day) from data4 group by uid;

3. 查询一个用户连续登录的最大天数,可以隔一天。解释:1、3、5、6登录则最大登录天数为6天。

思路:

  1. 查找上次登录时间:利用lag函数按uid分区并依dt升序排序,实现获取每行记录的上一次登录时间prev_dt
  2. 打标判断连续登录:利用datediff函数计算dtprev_dt的时间差,根据差值情况打标flag,实现区分是否连续登录,如果差值小于2天或者null(表示第一天)标记为0,都则标记为1。
  3. 计算连续登录标识和:利用sum函数按uid分组并依dt升序对flag求和,生成sum_flag,实现标识连续登录段。
  4. 计算每组时间差值:利用datediff函数对uidsum_flag聚类分组后计算max(dt)min(dt)的差值,实现获取每个分组的时间跨度。
  5. 获取最大连续登录天数:利用分组和max函数选出每个用户的最大时间差值max(diff)+1,实现得到每个用户连续登录的最大天数max_login

核心点:将相差值小于等于2的分到同一组里,然后采用分段思想计算每个分组分段的天数即为连续登录的天数。

with data as (
    select 1 as uid,'2025-01-01' as dt union all
    select 1 as uid,'2025-01-02' as dt union all
    select 1 as uid,'2025-01-04' as dt union all
    select 2 as uid,'2025-01-07' as dt union all
    select 2 as uid,'2025-01-08' as dt union all
    select 2 as uid,'2025-01-11' as dt union all
    select 2 as uid,'2025-01-13' as dt union all
    select 2 as uid,'2025-01-15' as dt union all
    select 3 as uid,'2025-01-09' as dt union all
    select 3 as uid,'2025-01-10' as dt union all
    select 3 as uid,'2025-01-12' as dt union all
    select 3 as uid,'2025-01-15' as dt
),
data2 as (
    select uid,dt,lag(dt, 1) over (partition by uid order by dt) prev_dt from data
),
data3 as (
    select uid,dt,prev_dt,if(datediff(dt, prev_dt) <= 2 or datediff(dt, prev_dt) is null, 0 ,1) flag from data2
),
data4 as (
    select uid,dt,prev_dt,flag,sum(flag) over(partition by uid order by dt) as sum_flag from data3
),
data5 as (
    select uid,datediff(max(dt),min(dt)) diff from data4 group by uid,sum_flag
)
select uid,max(diff)+1 as max_login from data5 group by uid;

知识点总结

1.窗口函数:lag、row_number

https://blog.csdn.net/Ahuuua/article/details/127136611

基本语法:函数名(参数) OVER (PARTITION BY 子句 ORDER BY 子句 ROWS/RANGE子句)

  • 函数名:如sum、max、min、count、avg等聚合函数以及lead、lag行比较函数等;
  • over: 关键字,表示前面的函数是分析函数,不是普通的集合函数;
  • 分组子句:over关键字后面挂号内的内容
lag()比较窗口函数

lag/lead(arg1,arg2,arg3):其中arg1为列名;arg2为偏移值,不能为负,默认为1;arg3超出记录窗口时的默认值,当不指定默认值时,则为null。lag:向前取n行; lead:向后取n行
在这里插入图片描述

row_number()排序窗口函数

排序窗口函数的主要作用是为查询结果中的每一行数据生成一个唯一的行号。这个行号是基于特定的排序规则生成的,并且可以根据不同的分组条件进行独立编号。

rankrow_numberdense_rank
100111
100121
90332

2. 日期计算函数

日期的三种形式:

  • DATE:YYYY-MM-DD,CURRENT_DATE()
  • DATETIME:YYYY-MM-DD HH:MM:SS、CURRENT_TIMESTAMP()
  • TIMESTAMP:时间戳,1973-12-30 15:30:00为19731230153000,UNIX_TIMESTAMP()
    常见计算函数:
  • DATEDIFF(end,start):计算end-start,单位天数
  • TIMESTAMPDIFF(unit,start,end):计算end-start,单位unit
    • unit:second、minute、hour、day、week、month、quarter(季度)、year
  • DATE_ADD(date, num):计算date+num后的时间,num参数表示要增加的时间间隔数量,正数表示增加时间,负数表示减少时间。
select CURRENT_DATE(),CURRENT_TIMESTAMP(),UNIX_TIMESTAMP();

在这里插入图片描述


原文地址:https://blog.csdn.net/qq_44780372/article/details/145044501

免责声明:本站文章内容转载自网络资源,如侵犯了原著者的合法权益,可联系本站删除。更多内容请关注自学内容网(zxcms.com)!