
[Daily filler post] py4j.protocol.Py4JError: An error occurred while calling xx

Posting this so others can find it when they search for the error.

py4j.protocol.Py4JNetworkError: Answer from Java side is empty
py4j.protocol.Py4JError: An error occurred while calling

Full error log

2024-11-06T14:15:57.638+0800: 1.362: [GC (Allocation Failure) [PSYoungGen: 209920K->13072K(279552K)] 209920K->13088K(2027520K), 0.0144828 secs] [Times: user=0.02 sys=0.01, real=0.02 secs] 
2024-11-06T14:15:57.846+0800: 1.570: [GC (Metadata GC Threshold) [PSYoungGen: 43298K->6592K(489472K)] 43314K->6616K(2237440K), 0.0055518 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
2024-11-06T14:15:57.852+0800: 1.576: [Full GC (Metadata GC Threshold) [PSYoungGen: 6592K->0K(489472K)] [ParOldGen: 24K->6391K(309760K)] 6616K->6391K(799232K), [Metaspace: 20673K->20673K(1067008K)], 0.0246174 secs] [Times: user=0.06 sys=0.01, real=0.03 secs] 
2024-11-06T14:15:59.520+0800: 3.244: [GC (Metadata GC Threshold) [PSYoungGen: 225839K->13730K(489472K)] 232230K->20201K(799232K), 0.0128483 secs] [Times: user=0.03 sys=0.00, real=0.02 secs] 
2024-11-06T14:15:59.533+0800: 3.257: [Full GC (Metadata GC Threshold) [PSYoungGen: 13730K->0K(489472K)] [ParOldGen: 6471K->18273K(479744K)] 20201K->18273K(969216K), [Metaspace: 33753K->33753K(1079296K)], 0.0308338 secs] [Times: user=0.06 sys=0.01, real=0.03 secs] 
xxxx/pyspark.zip/pyspark/context.py:340: RuntimeWarning: Failed to add file [EEE/anti.py] specified in 'spark.submit.pyFiles' to Python path:
  YYY
  TTT
  UUU.jar
  xxxx
  xxxx/__pyfiles__
  xxxx/pyspark.zip
  xxxx/py4j-0.10.9.5-src.zip
  xxxx/executor_user_py_env/lib/python39.zip
  xxxx/executor_user_py_env/lib/python3.9
  xxxx/executor_user_py_env/lib/python3.9/lib-dynload
  xxxx/executor_user_py_env/lib/python3.9/site-packages
  warnings.warn(
xxxx/pyspark.zip/pyspark/context.py:340: RuntimeWarning: Failed to add file [EEE/common_define.py] specified in 'spark.submit.pyFiles' to Python path:
  YYY
  TTT
  UUU.jar
  xxxx
  xxxx/__pyfiles__
  xxxx/pyspark.zip
  xxxx/py4j-0.10.9.5-src.zip
  xxxx/executor_user_py_env/lib/python39.zip
  xxxx/executor_user_py_env/lib/python3.9
  xxxx/executor_user_py_env/lib/python3.9/lib-dynload
  xxxx/executor_user_py_env/lib/python3.9/site-packages
  warnings.warn(
xxxx/pyspark.zip/pyspark/context.py:340: RuntimeWarning: Failed to add file [EEE/config.py] specified in 'spark.submit.pyFiles' to Python path:
  YYY
  TTT
  UUU.jar
  xxxx
  xxxx/__pyfiles__
  xxxx/pyspark.zip
  xxxx/py4j-0.10.9.5-src.zip
  xxxx/executor_user_py_env/lib/python39.zip
  xxxx/executor_user_py_env/lib/python3.9
  xxxx/executor_user_py_env/lib/python3.9/lib-dynload
  xxxx/executor_user_py_env/lib/python3.9/site-packages
  warnings.warn(
xxxx/pyspark.zip/pyspark/context.py:340: RuntimeWarning: Failed to add file [EEE/utils.py] specified in 'spark.submit.pyFiles' to Python path:
  YYY
  TTT
  UUU.jar
  xxxx
  xxxx/__pyfiles__
  xxxx/pyspark.zip
  xxxx/py4j-0.10.9.5-src.zip
  xxxx/executor_user_py_env/lib/python39.zip
  xxxx/executor_user_py_env/lib/python3.9
  xxxx/executor_user_py_env/lib/python3.9/lib-dynload
  xxxx/executor_user_py_env/lib/python3.9/site-packages
  warnings.warn(
================================
[INFO] SQL DATA READING STARTING |
================================
2024-11-06T14:16:03.676+0800: 7.399: [GC (Metadata GC Threshold) [PSYoungGen: 333446K->23988K(641024K)] 351720K->42270K(1120768K), 0.0201206 secs] [Times: user=0.04 sys=0.01, real=0.02 secs] 
2024-11-06T14:16:03.696+0800: 7.419: [Full GC (Metadata GC Threshold) [PSYoungGen: 23988K->0K(641024K)] [ParOldGen: 18281K->31872K(658432K)] 42270K->31872K(1299456K), [Metaspace: 55449K->55446K(1101824K)], 0.0935589 secs] [Times: user=0.24 sys=0.02, real=0.09 secs] 
2024-11-06T14:16:08.425+0800: 12.149: [GC (Metadata GC Threshold) [PSYoungGen: 436446K->42196K(670720K)] 468318K->74076K(1329152K), 0.0335286 secs] [Times: user=0.06 sys=0.02, real=0.03 secs] 
2024-11-06T14:16:08.459+0800: 12.182: [Full GC (Metadata GC Threshold) [PSYoungGen: 42196K->0K(670720K)] [ParOldGen: 31880K->55897K(916480K)] 74076K->55897K(1587200K), [Metaspace: 93222K->93222K(1134592K)], 0.0877626 secs] [Times: user=0.18 sys=0.03, real=0.09 secs] 
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "xxxx/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 516, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "xxxx/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "xxxx/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 539, in send_command
    raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
Traceback (most recent call last):
  File "xxxx/main.py", line 142, in <module>
    df_spark.show(3, truncate=False)
  File "xxxx/pyspark.zip/pyspark/sql/dataframe.py", line 615, in show
  File "xxxx/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "xxxx/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
    def __init__(self, *args, **kwargs):
  File "xxxx/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 334, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o190.showString

Code that triggered the error:

from pyspark.sql import SparkSession

# Create a SparkSession
spark = (
    SparkSession.builder.appName("spark XX Demo")
    .master("local[1]")  # <----- deleting this line fixed it
    .getOrCreate()
)
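
For comparison, here is a minimal sketch of the same builder with the hard-coded master taken out. The helper name build_session and its optional master parameter are made up for illustration and are not part of PySpark. The only PySpark calls used are SparkSession.builder, appName, master and getOrCreate.

```python
from typing import Optional


def build_session(app_name: str, master: Optional[str] = None):
    """Create (or reuse) a SparkSession, setting the master only on request.

    `build_session` and its `master` parameter are hypothetical names used
    for illustration; they are not part of PySpark.
    """
    # Imported lazily so this module can be imported without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    if master is not None:
        # Override the master only when the caller passes one explicitly,
        # for example "local[*]" for a quick local test.
        builder = builder.master(master)
    return builder.getOrCreate()
```

Calling build_session("spark XX Demo") leaves the choice of master to whatever launches the job, while build_session("spark XX Demo", "local[*]") pins it for a local run.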

Cause:

When running PySpark, calling .master("local[1]") is not equivalent to leaving the master unset:

.master("local[1]")

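  • This setting runs the Spark application locally with a single worker thread, so tasks execute one after another.
  • The execution environment is tightly constrained in this mode. If the application relies on multiple threads or parallel execution, it can run short of resources or deadlock.
  • With local[1], the driver and the executor share one JVM and only one thread is available to run tasks. This may lead to Py4J communication problems, because too few threads are left to service the traffic between Python and the JVM.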

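No master specified

  • If the code does not set a master and nothing else does either (for example spark-submit --master or spark-defaults.conf), Spark defaults to local[*], which uses every available CPU core on the local machine. A master set in code takes precedence over those other sources.
  • Spark can then run tasks in parallel and make full use of a multi-core CPU. This is closer to how a cluster behaves and is less prone to resource contention.
  • Using several threads avoids some of the blocked network communication and resource contention that single-threaded execution can cause.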
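
To make the difference between the local master strings concrete, the sketch below maps a local master URL to the number of worker threads it asks for. It is an illustrative parser written for this post, not Spark's own code: the function name local_worker_threads is hypothetical, and os.cpu_count() only approximates the core count that the JVM would report for local[*].

```python
import os
import re
from typing import Optional

# Matches "local", "local[4]", "local[*]" and the "local[N,F]" form that also
# carries a task retry count. Illustrative only, not Spark's own parser.
_LOCAL_MASTER = re.compile(r"^local(?:\[(\*|\d+)(?:\s*,\s*\d+)?\])?$")


def local_worker_threads(master: str) -> Optional[int]:
    """Return the worker-thread count a local master URL asks for.

    Returns None for non-local masters such as "yarn" or "spark://host:7077".
    """
    match = _LOCAL_MASTER.match(master.strip())
    if match is None:
        return None
    count = match.group(1)
    if count is None:
        return 1  # plain "local" means a single worker thread
    if count == "*":
        return os.cpu_count() or 1  # one thread per logical core
    return int(count)
```

Under these assumptions, "local[1]" and plain "local" both map to one thread, while "local[*]" maps to one thread per logical core.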

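So once .master("local[1]") was removed, Spark ran with multiple threads, and the Py4J network communication error that appeared under single-threaded execution went away. For development and testing, local[*] usually gives better performance and fewer runtime problems.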
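
To confirm which master a session actually ended up with, a small helper along these lines may be useful. The name describe_master is hypothetical. It reads sparkContext.master and sparkContext.defaultParallelism, which are standard PySpark attributes, and treats any master of the form local or local[...] as local mode.

```python
def describe_master(spark) -> dict:
    """Summarise where a SparkSession is actually running.

    `describe_master` is a hypothetical helper written for illustration.
    """
    sc = spark.sparkContext
    master = sc.master
    return {
        "master": master,
        # In local mode this reflects the number of worker threads.
        "default_parallelism": sc.defaultParallelism,
        "is_local": master == "local" or master.startswith("local["),
    }
```

Printing describe_master(spark) right after the session is created shows whether a hard-coded local[1] is still in effect.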


Original article: https://blog.csdn.net/HaoZiHuang/article/details/143575902