自学内容网 自学内容网

力扣之1369.获取最近第二次的活动

  • 题目:

  • sql建表语句

  • Create table If Not Exists UserActivity (username varchar(30), activity varchar(30), startDate date, endDate date);
    Truncate table UserActivity;
    insert into UserActivity (username, activity, startDate, endDate) values ('Alice', 'Travel', '2020-02-12', '2020-02-20');
    insert into UserActivity (username, activity, startDate, endDate) values ('Alice', 'Dancing', '2020-02-21', '2020-02-23');
    insert into UserActivity (username, activity, startDate, endDate) values ('Alice', 'Travel', '2020-02-24', '2020-02-28');
    insert into UserActivity (username, activity, startDate, endDate) values ('Bob', 'Travel', '2020-02-11', '2020-02-18');
  • 分析:首先,我们先按照姓名分组,然后按照开始时间排序,找出每个人的活动次序,然后我们再算出每个人的最大次数,然后筛选出最大值不为1,而且排序问最大值-1的信息,或者最大值和排序都为1的信息,下面是图解分析:

  • sql实现:

    with t1 as (
        select username,activity,startDate,endDate,row_number() over (partition by username order by startDate) rn from UserActivity        -- 首先按照姓名分组,然后按照开始时间排序,找出每个人的活动次序
    ),
        t2 as (
            select username,activity,startDate,endDate,rn,max(rn) over(partition by username) max_rn from t1  -- 然后我们再算出每个人的最大次数
        )
    select username,activity,startDate,endDate from t2  where (rn=max_rn-1 and max_rn!=1) or (rn=1 and max_rn=1) --筛选出最大值不为1,而且排序问最大值-1的信息,或者最大值和排序都为1的信息
  • pandas例子:

  • data = [['Alice', 'Travel', '2020-02-12', '2020-02-20'], ['Alice', 'Dancing', '2020-02-21', '2020-02-23'], ['Alice', 'Travel', '2020-02-24', '2020-02-28'], ['Bob', 'Travel', '2020-02-11', '2020-02-18']]
    user_activity = pd.DataFrame(data, columns=['username', 'activity', 'startDate', 'endDate']).astype({'username':'object', 'activity':'object', 'startDate':'datetime64[ns]', 'endDate':'datetime64[ns]'})
    
  • Pandas 分析与sql分析一样:

  • pandas实现:

    import pandas as pd
    
    def second_most_recent(user_activity: pd.DataFrame) -> pd.DataFrame:
        user_activity['rn']=user_activity.groupby('username')['startDate'].rank()  --按照姓名分组,然后按照开始时间排序
    
        user_activity['max_rn']=user_activity.groupby('username')['rn'].transform('max') -- 算出每个人的最大次数
    
        user_activity=(user_activity[((user_activity['rn']==1) & (user_activity['max_rn']==1)) |((user_activity['rn']==user_activity['max_rn']-1) & (user_activity['max_rn']!=1))].reset_index(drop=True)) -- 筛选出最大值不为1,而且排序问最大值-1的信息,或者最大值和排序都为1的信息
    
        user_activity=user_activity[['username','activity','startDate','endDate']] -- 取出所需要的列
        return user_activity


原文地址:https://blog.csdn.net/qq_46038718/article/details/142862366

免责声明:本站文章内容转载自网络资源,如本站内容侵犯了原著者的合法权益,可联系本站删除。更多内容请关注自学内容网(zxcms.com)!