How to set logLevel in a pyspark job
I'm trying to set the log level in a pyspark job. I'm not using the spark shell, so I can't just do what it advises and call sc.setLogLevel(newLevel), since I don't have an sc object.

A lot of sources say to just modify log4j.properties, but I don't know where to find or put that file. I used pip install pyspark in a virtual environment, so I haven't set a $SPARK_HOME environment variable (the sources usually say log4j.properties lives under $SPARK_HOME).

I hope I can call this programmatically, but I don't know where to call setLogLevel. Right now my setup code is just this:

spark = SparkSession.builder.master("local").appName("test-mf").getOrCreate()
Scare answered 27/3, 2018 at 22:26
The SparkSession object exposes its SparkContext, and calling setLogLevel on it does change the log level in use:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").appName("test-mf").getOrCreate()
spark.sparkContext.setLogLevel("DEBUG")
Scare answered 27/3, 2018 at 22:28 Comment(2)
How do you set that property before getOrCreate(), which already emits messages to stderr? — Manualmanubrium
This didn't work for me. The WARN level is still active. — Monogenetic
To suppress messages emitted by getOrCreate() itself, you need to pass config options to the SparkSession builder, i.e. appName(...).config(...).config(...).getOrCreate()

Use the following config properties to override the default log4j configuration, on both the driver and the executors:

    ...
    .config("spark.driver.extraJavaOptions", "-Dlog4j.configuration=file:custom_log4j.properties")
    .config("spark.executor.extraJavaOptions", "-Dlog4j.configuration=file:custom_log4j.properties")
    ...

The following custom_log4j.properties file is sufficient to mute warnings on my system:

# Set everything to be logged to the console
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set Spark's own (org.apache) loggers to ERROR
log4j.logger.org.apache=ERROR
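If you prefer not to ship a separate file with the job, the same properties can be generated from the script itself. Below is a minimal sketch; the helper name write_log4j_config and the default path custom_log4j.properties are my own choices, not anything Spark prescribes — the returned string is just the JVM option you would pass to the extraJavaOptions configs above.

```python
from pathlib import Path

# The log4j properties shown above, kept in one string so the file can
# be (re)written next to the job script at startup.
LOG4J_PROPERTIES = """\
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.apache=ERROR
"""

def write_log4j_config(path="custom_log4j.properties"):
    # Write the properties file and return the JVM flag Spark needs
    # in order to pick it up via extraJavaOptions.
    Path(path).write_text(LOG4J_PROPERTIES)
    return f"-Dlog4j.configuration=file:{path}"

opt = write_log4j_config()
print(opt)  # -Dlog4j.configuration=file:custom_log4j.properties
```

The returned flag would then go into both .config("spark.driver.extraJavaOptions", opt) and .config("spark.executor.extraJavaOptions", opt) before getOrCreate().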
Lovelorn answered 7/5, 2023 at 21:57

This is an old question, but I was able to lower much of the log output beyond Blake's answer. This is how I did it:

from pyspark.sql import SparkSession

engine = SparkSession.builder.appName(app_name) \
            .config("spark.log.level", "ERROR") \
            .getOrCreate()

There is still some startup logging, but right after it the level changes:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Setting Spark log level to "ERROR".
Bleeding answered 28/7, 2024 at 7:33

© 2022 - 2025 — McMap. All rights reserved.