
Common methods of pyspark.sql.functions in PySpark (2): date and time functions

pyspark sql functions

current_database

Returns the current database.

spark.range(1).select(sf.current_database()).show()
+------------------+
|current_database()|
+------------------+
|           default|
+------------------+

current_date: get the current date

df.withColumn('ldsx',sf.current_date()).show()  
+---+----------+
| id|      ldsx|
+---+----------+
|  0|2024-10-09|
+---+----------+

add_months: add or subtract months

df = spark.createDataFrame([('2015-04-08', 2)], ['dt', 'add'])
df.show()
+----------+---+
|        dt|add|
+----------+---+
|2015-04-08|  2|
+----------+---+

df.select(sf.add_months(df.dt, 1).alias('next_month')).show()
+----------+
|next_month|
+----------+
|2015-05-08|
+----------+

df.select(sf.add_months('dt', -2).alias('prev_month')).show()
+----------+
|prev_month|
+----------+
|2015-02-08|
+----------+

date_add: add days to a date

df = spark.createDataFrame([('2015-04-08', 2,)], ['dt', 'add'])
# add one day
df.select(sf.date_add(df.dt, 1).alias('next_date')).collect()
[Row(next_date=datetime.date(2015, 4, 9))]
# subtract one day
df.select(sf.date_add('dt', -1).alias('prev_date')).collect()
[Row(prev_date=datetime.date(2015, 4, 7))]

date_sub: subtract days from a date (the inverse of date_add)

df = spark.createDataFrame([('2015-04-08', 2)], ['dt', 'sub'])
df.show()
+----------+---+
|        dt|sub|
+----------+---+
|2015-04-08|  2|
+----------+---+

df.select(sf.date_sub(df.dt, 1).alias('prev_date')).show()
+----------+
| prev_date|
+----------+
|2015-04-07|
+----------+

df.select(sf.date_sub(df.dt, 5).alias('prev_date')).show()
+----------+
| prev_date|
+----------+
|2015-04-03|
+----------+

df.select(sf.date_sub(df.dt, -5).alias('prev_date')).show()
+----------+
| prev_date|
+----------+
|2015-04-13|
+----------+

date_format: format a date/timestamp as a string

df.show()
+----------+
|        dt|
+----------+
|2015-04-08|
+----------+
df.select(sf.date_format('dt', 'MM/dd/yyyy').alias('date')).show()
+----------+
|      date|
+----------+
|04/08/2015|
+----------+

weekofyear: extract the week number of a given date as an integer

df = spark.createDataFrame([('2015-04-08',)], ['dt'])
df.show()
+----------+
|        dt|
+----------+
|2015-04-08|
+----------+
df.select(sf.weekofyear(df.dt).alias('week')).show()
+----+
|week|
+----+
|  15|
+----+
df = spark.createDataFrame([('2015-04-01',)], ['dt'])
df.select(sf.weekofyear(df.dt).alias('week')).show()
+----+
|week|
+----+
|  14|
+----+

year: extract the year

df.select(sf.year(df.dt).alias('year')).show()
+----+
|year|
+----+
|2015|
+----+

month: extract the month

df = spark.createDataFrame([('2015-04-08',)], ['dt'])
df.show()
+----------+
|        dt|
+----------+
|2015-04-08|
+----------+
df.select(sf.month(df.dt).alias('month')).show()
+-----+
|month|
+-----+
|    4|
+-----+

hour: extract the hour

df = spark.createDataFrame([(datetime.datetime(2015, 4, 8, 13, 8, 15),)], ['ts'])
df.show()
+-------------------+
|                 ts|
+-------------------+
|2015-04-08 13:08:15|
+-------------------+
df.select(sf.hour('ts').alias('hour')).show()
+----+
|hour|
+----+
|  13|
+----+

minute: extract the minute

df = spark.createDataFrame([(datetime.datetime(2015, 4, 8, 13, 8, 15),)], ['ts'])
df.show()
+-------------------+
|                 ts|
+-------------------+
|2015-04-08 13:08:15|
+-------------------+
df.select(sf.minute('ts').alias('minute')).show()
+------+
|minute|
+------+
|     8|
+------+

second: extract the second

import datetime
df = spark.createDataFrame([(datetime.datetime(2015, 4, 8, 13, 8, 15),)], ['ts'])
df.show()
+-------------------+
|                 ts|
+-------------------+
|2015-04-08 13:08:15|
+-------------------+
# returns the second
df.select(sf.second('ts').alias('second')).show()
+------+
|second|
+------+
|    15|
+------+

last_day: last day of the month the date belongs to

df = spark.createDataFrame([('2015-04-08',)], ['dt'])
df.show()
+----------+
|        dt|
+----------+
|2015-04-08|
+----------+
df.select(sf.last_day(df.dt).alias('date')).show()
+----------+
|      date|
+----------+
|2015-04-30|
+----------+

localtimestamp: returns the current local timestamp (without time zone)

df.select(sf.localtimestamp()).collect()
[Row(localtimestamp()=datetime.datetime(2024, 10, 9, 15, 45, 17, 57000))]

next_day: first date later than the given date that falls on the given day of the week

Valid day-of-week values: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"

df = spark.createDataFrame([('2015-07-27',)], ['d'])
# the next Sunday after the given date
df.select(sf.next_day(df.d, 'Sun').alias('date')).show()
+----------+
|      date|
+----------+
|2015-08-02|
+----------+
# the next Saturday after the given date
df.select(sf.next_day(df.d, 'Sat').alias('date')).show()
+----------+
|      date|
+----------+
|2015-08-01|
+----------+

make_date: build a date from its parts

Takes three columns: year, month, and day.

df = spark.createDataFrame([(2020, 6, 26)], ['Y', 'M', 'D'])
df.show()
+----+---+---+
|   Y|  M|  D|
+----+---+---+
|2020|  6| 26|
+----+---+---+

df.select(sf.make_date(df.Y, df.M, df.D).alias("datefield")).show()
+----------+
| datefield|
+----------+
|2020-06-26|
+----------+

df.select(sf.make_date(sf.lit('2020'), sf.lit('10'), sf.lit('13')).alias("datefield")).show()
+----------+
| datefield|
+----------+
|2020-10-13|
+----------+

to_date: convert to date type

Parameters: 1. the column to convert; 2. a date format string (optional).

Without a format, this is equivalent to col.cast("date").

df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
df.printSchema()
root
 |-- t: string (nullable = true)
spark.createDataFrame([('1997-02-28 10:30:00',)], ['t']).collect()
[Row(t='1997-02-28 10:30:00')]

# convert
df.select(sf.to_date(df.t).alias('date')).collect()
[Row(date=datetime.date(1997, 2, 28))]

# a format string can also be supplied
df.select(sf.to_date(df.t, 'yyyy-MM-dd HH:mm:ss').alias('date')).show()
+----------+
|      date|
+----------+
|1997-02-28|
+----------+

trunc: truncate a date to the given unit

Valid units: 'year', 'yyyy', 'yy' to truncate by year; 'month', 'mon', 'mm' to truncate by month. Other options: 'week', 'quarter'.

df = spark.createDataFrame([('1997-02-28',)], ['d'])
# truncate to the year: month and day reset to 01-01
df.select(sf.trunc(df.d, 'year').alias('year')).show()
+----------+
|      year|
+----------+
|1997-01-01|
+----------+
# truncate to the month: day resets to 01
df.select(sf.trunc(df.d, 'mon').alias('month')).show()
+----------+
|     month|
+----------+
|1997-02-01|
+----------+

weekday: day of the week for a date

Returns the day of the week of a date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).

df = spark.createDataFrame([('2015-04-08',)], ['dt'])
df.select(sf.weekday('dt').alias('day')).show()
+---+
|day|
+---+
|  2|
+---+

Original article: https://blog.csdn.net/weixin_43322583/article/details/142869739
