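Creating graphs in PySpark with GraphFrames
1. Installing graphframes
1.1 Check the Spark and Scala versions
Run spark-shell --version in a terminal. It prints the Spark version and the Scala version that Spark was built with.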
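To check the versions from a script instead of reading the banner by eye, the sketch below pulls them out of the captured text. It assumes the banner contains a line ending in "version X.Y.Z" for Spark and a "Using Scala version ..." line, which is what Spark 2.x prints. The sample text is an abridged illustration, and the function names are my own.

```python
import re


def parse_spark_versions(text):
    """Extract (spark_version, scala_version) from `spark-shell --version` output.

    Assumes the banner has "version X.Y.Z" for Spark and
    "Using Scala version A.B.C" for Scala.
    """
    scala = re.search(r"Using Scala version (\d+\.\d+(?:\.\d+)?)", text)
    # Spark's "version X.Y.Z": skip occurrences preceded by "Scala "
    spark = re.search(r"(?<!Scala )version (\d+\.\d+\.\d+)", text)
    if not spark or not scala:
        raise ValueError("could not find Spark/Scala versions in the text")
    return spark.group(1), scala.group(1)


def binary_version(full_version):
    """Keep only major.minor, e.g. '2.11.12' -> '2.11'."""
    return ".".join(full_version.split(".")[:2])


# Abridged, illustrative banner text; real output also contains ASCII art
sample = """\
      version 2.4.0
Using Scala version 2.11.12
"""
spark_v, scala_v = parse_spark_versions(sample)
print(binary_version(spark_v), binary_version(scala_v))  # 2.4 2.11
```

To feed it real output, you could run subprocess.run(["spark-shell", "--version"], capture_output=True, text=True) and parse result.stdout + result.stderr, since the banner may be written to stderr rather than stdout.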
1.2 Download the matching graphframes build from the Maven repository
https://mvnrepository.com/artifact/graphframes/graphframes
Here I need the build for Spark 2.4 and Scala 2.11:
https://mvnrepository.com/artifact/graphframes/graphframes/0.8.0-spark2.4-s_2.11
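The artifact names on that page follow a visible pattern: the graphframes version, then spark<major.minor>, then s_<Scala major.minor>. The hypothetical helper below (not part of graphframes) derives the expected version string, jar filename and Maven coordinate from the versions found in 1.1. It only builds names and does not check that such a build was actually published, so confirm it on the Maven page.

```python
def graphframes_artifact(gf_version, spark_version, scala_version):
    """Build the graphframes artifact version, jar name and Maven coordinate.

    Follows the naming pattern seen on mvnrepository:
    <graphframes version>-spark<major.minor>-s_<scala major.minor>
    """
    spark_mm = ".".join(spark_version.split(".")[:2])
    scala_mm = ".".join(scala_version.split(".")[:2])
    version = f"{gf_version}-spark{spark_mm}-s_{scala_mm}"
    return {
        "version": version,
        "jar": f"graphframes-{version}.jar",
        "coordinate": f"graphframes:graphframes:{version}",
    }


info = graphframes_artifact("0.8.0", "2.4.0", "2.11.12")
print(info["jar"])  # graphframes-0.8.0-spark2.4-s_2.11.jar
```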
1.3 Configure the graphframes jar in the PySpark environment
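import os
from pyspark.sql import SparkSession

# Both variables must be set before the SparkSession (and its JVM) is created.
os.environ['PYSPARK_PYTHON'] = 'Python3.7/bin/python'
# Use the jar you downloaded in 1.2 (this example uses the 0.8.1 build).
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars graphframes-0.8.1-spark2.4-s_2.11.jar pyspark-shell'

spark = (
    SparkSession.builder
    .appName("read_data")
    .config('spark.pyspark.python', 'Python3.7/bin/python')
    # Ship a zipped Python 3.7 env to YARN, unpacked under the alias Python3.7
    .config('spark.yarn.dist.archives', 'hdfs://ns62007/user/dmc_adm/_PYSPARK_ENV/Python3.7.zip#Python3.7')
    .config('spark.executorEnv.PYSPARK_PYTHON', 'Python3.7/bin/python')
    # -1 disables automatic broadcast joins
    .config('spark.sql.autoBroadcastJoinThreshold', '-1')
    .enableHiveSupport()
    .getOrCreate()
)
# The jar also bundles the graphframes Python package, so add it to the Python path
spark.sparkContext.addPyFile('graphframes-0.8.1-spark2.4-s_2.11.jar')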
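The PYSPARK_SUBMIT_ARGS string is easy to get subtly wrong once more jars are added. Below is a hedged helper (build_submit_args is a made-up name) that assumes two rules: --jars takes a comma-separated list, and when PySpark launches the JVM from a plain Python process the value has to end with pyspark-shell. It also assumes the jar paths contain no spaces.

```python
import os


def build_submit_args(jars, extra_args=()):
    """Build a PYSPARK_SUBMIT_ARGS value that ships the given jars.

    --jars takes a comma-separated list; "pyspark-shell" must come last.
    """
    if not jars:
        raise ValueError("at least one jar is required")
    parts = ["--jars", ",".join(jars), *extra_args, "pyspark-shell"]
    return " ".join(parts)


args = build_submit_args(["graphframes-0.8.1-spark2.4-s_2.11.jar"])
# Set this before creating the SparkSession; once the JVM runs it is ignored
os.environ["PYSPARK_SUBMIT_ARGS"] = args
print(args)  # --jars graphframes-0.8.1-spark2.4-s_2.11.jar pyspark-shell
```

Extra spark-submit options go in extra_args, for example ("--driver-memory", "2g"), and stay ahead of pyspark-shell.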
2. Import GraphFrame and create a graph
2.1 Import the package
from graphframes import GraphFrame
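This import only succeeds if Python can find the graphframes package. With the addPyFile call in 1.3, that package comes from the jar itself, which relies on the jar bundling a graphframes/ Python package at its root. A jar is a zip archive, so this can be checked with the standard library. jar_python_modules is a hypothetical helper, and the path in the usage comment assumes the jar from 1.2.

```python
import zipfile


def jar_python_modules(jar_path, package="graphframes"):
    """List the .py files of `package` bundled inside a jar (a jar is a zip).

    jar_path may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name for name in jar.namelist()
            if name.startswith(package + "/") and name.endswith(".py")
        )


# Usage (assumed path to the downloaded jar):
# print(jar_python_modules("graphframes-0.8.1-spark2.4-s_2.11.jar"))
```

An empty list suggests the import will fail with ModuleNotFoundError even if the JVM side loaded the jar fine.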
2.2 Examples of creating a graph
from graphframes import GraphFrame

# Method 1: vertices with attributes, edges with a relationship label
# (spark is the SparkSession created in 1.3)
v = spark.createDataFrame([
("a", "Alice", 34),
("b", "Bob", 36),
("c", "Charlie", 30),
], ["id", "name", "age"])
# Create an Edge DataFrame with "src" and "dst" columns
e = spark.createDataFrame([
("a", "b", "friend"),
("b", "c", "follow"),
("c", "b", "follow"),
], ["src", "dst", "relationship"])
# Create a GraphFrame
g = GraphFrame(v, e)
# Query: Get in-degree of each vertex.
g.inDegrees.show()
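To see what g.inDegrees computes, here is a plain-Python equivalent over the same edge list: it counts how often each vertex appears as dst. As the GraphFrames documentation notes, vertices with no incoming edges are left out of inDegrees, so "a" does not appear. Row order in show() is not guaranteed, hence the sorting. This is only an illustration, not how GraphFrames implements it.

```python
from collections import Counter


def in_degrees(edges):
    """Count incoming edges per vertex from (src, dst, ...) tuples.

    Like g.inDegrees, vertices with zero incoming edges are absent.
    """
    return dict(Counter(edge[1] for edge in edges))


edges = [("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow")]
print(sorted(in_degrees(edges).items()))  # [('b', 2), ('c', 1)]
```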
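The vertex and edge DataFrames can also be simpler. GraphFrame only requires an id column on the vertices and src and dst columns on the edges; any other column (such as num below) is treated as an attribute: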
# Method 2: edges with only src/dst, vertices with an extra num column
edges_df = spark.createDataFrame([
("a", "b"),
("b", "c"),
("c", "b"),
], ["src", "dst"])
nodes_df = spark.createDataFrame([
(1, "a"),
(2, "b"),
(3, "c")
], ["num", "id"])
graph = GraphFrame(nodes_df, edges_df)
graph.inDegrees.show()
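Method 2 works because GraphFrame only needs an id column on the vertices and src/dst on the edges. The sketch below is a hypothetical pre-flight check on plain Python lists: it reports missing required columns and, as an extra check I am adding (not something GraphFrame is documented to enforce), edge endpoints that have no matching vertex id.

```python
def check_graph_inputs(vertex_cols, edge_cols, vertex_ids, edge_pairs):
    """Hypothetical check before GraphFrame(v, e).

    Returns (missing_columns, dangling_endpoints).
    """
    missing = []
    if "id" not in vertex_cols:
        missing.append("vertices.id")
    for col in ("src", "dst"):
        if col not in edge_cols:
            missing.append(f"edges.{col}")
    # Endpoints referenced by edges but absent from the vertex ids
    dangling = sorted({v for pair in edge_pairs for v in pair} - set(vertex_ids))
    return missing, dangling


result = check_graph_inputs(
    ["num", "id"], ["src", "dst"],
    ["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "b")],
)
print(result)  # ([], [])
```

With real DataFrames the inputs would come from nodes_df.columns, edges_df.columns, [r.id for r in nodes_df.select("id").collect()] and the corresponding src/dst pairs; collecting to the driver is only reasonable for small graphs.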
Original article: https://blog.csdn.net/eylier/article/details/140505019