自学内容网 自学内容网

pyspark使用 graphframes创建图的方法

1、安装graphframes的步骤

1.1 查看 spark 和 scala版本

在终端输入: spark-shell --version 查看spark 和scala版本

1.2 在maven库中下载对应版本的graphframes

https://mvnrepository.com/artifact/graphframes/graphframes

我这里需要的是spark 2.4 scala 2.11版本

https://mvnrepository.com/artifact/graphframes/graphframes/0.8.0-spark2.4-s_2.11

1.3 在pyspark的环境中配置graphframe的jar包

os.environ['PYSPARK_PYTHON'] = 'Python3.7/bin/python'
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars graphframes-0.8.1-spark2.4-s_2.11.jar pyspark-shell'

spark = SparkSession \
        .builder \
        .appName("read_data") \
        .config('spark.pyspark.python', 'Python3.7/bin/python') \
        .config('spark.yarn.dist.archives', 'hdfs://ns62007/user/dmc_adm/_PYSPARK_ENV/Python3.7.zip#Python3.7') \
        .config('spark.executorEnv.PYSPARK_PYTHON', 'Python3.7/bin/python') \
        .config('spark.sql.autoBroadcastJoinThreshold', '-1') \
        .enableHiveSupport() \
        .getOrCreate()

spark.sparkContext.addPyFile('graphframes-0.8.1-spark2.4-s_2.11.jar')

2、导入GraphFrame创建图

2.1 导入包使用

from graphframes import GraphFrame

2.2 创建图的例子

from pyspark.sql.types import *
import pandas as pd
from graphframes import GraphFrame

#创建图的方法1
v = spark.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
], ["id", "name", "age"])

# Create an Edge DataFrame with "src" and "dst" columns
e = spark.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
], ["src", "dst", "relationship"])
# Create a GraphFrame
g = GraphFrame(v, e)

# Query: Get in-degree of each vertex.
g.inDegrees.show()

也可以简单化顶点和边:

#创建图的方法2
edges_df= spark.createDataFrame([
  ("a", "b"),
  ("b", "c"),
  ("c", "b"),
], ["src", "dst"])
nodes_df=spark.createDataFrame([
  (1, "a"),
  (2, "b"),
  (3, "c")
], ["num","id"])

graph=GraphFrame(nodes_df, edges_df)
graph.inDegrees.show()


原文地址:https://blog.csdn.net/eylier/article/details/140505019

免责声明:本站文章内容转载自网络资源,如本站内容侵犯了原著者的合法权益,可联系本站删除。更多内容请关注自学内容网(zxcms.com)!