How do I write messages to the output log on AWS Glue?

AWS Glue jobs log output and errors to two different CloudWatch logs, /aws-glue/jobs/error and /aws-glue/jobs/output by default. When I include print() statements in my scripts for debugging, they get written to the error log (/aws-glue/jobs/error).

I have tried using:

log4jLogger = sparkContext._jvm.org.apache.log4j 
log = log4jLogger.LogManager.getLogger(__name__) 
log.warn("Hello World!")

but "Hello World!" doesn't show up in either of the logs for the test job I ran.

Does anyone know how to go about writing debug log statements to the output log (/aws-glue/jobs/output)?

TIA!

EDIT:

It turns out the above actually does work. What was happening was that I was running the job in the AWS Glue Script editor window, which captures Command-F and searches only within the current script. So when I tried to search the page for the logging output, it looked as if nothing had been logged.

NOTE: While testing the first answer's suggestion, I discovered that AWS Glue scripts don't seem to output any log message below the WARN level!
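
For completeness, a minimal, self-contained version of the snippet above (assuming the usual Glue PySpark job boilerplate for creating the SparkContext and GlueContext; nothing here beyond that boilerplate is required for the logging itself) looks like this:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# obtain the JVM-side log4j logger via the Py4J gateway
log4jLogger = sc._jvm.org.apache.log4j
log = log4jLogger.LogManager.getLogger(__name__)
log.warn("Hello World!")  # WARN and above show up; lower levels may be filtered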

Aurora answered 21/2, 2018 at 19:51 Comment(3)
Do you need to import anything to use log4jLogger? Somehow, after adding these three lines to my script, my job hangs: the status shows running but no log is generated. – Sather
This does not work for me in the Glue job. I am outputting WARN-level logs and cannot see them in CloudWatch. Is there anything else you needed to get it working? Thanks – Cusack
@Cusack I had the same problem. When you view the logs, you need to search for the log text in the filter events search box. Log some nonsense text that will not appear in any other log records to test this. – Ie

Try the built-in Python logger from the logging module; by default it writes messages to the standard output stream.

import logging

MSG_FORMAT = '%(asctime)s %(levelname)s %(name)s: %(message)s'
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'
logging.basicConfig(format=MSG_FORMAT, datefmt=DATETIME_FORMAT)
logger = logging.getLogger(<logger-name-here>)

logger.setLevel(logging.INFO)

...

logger.info("Test log message")
Pleurisy answered 22/2, 2018 at 7:12 Comment(6)
Turns out the way I was originally trying to log works too. I also discovered that AWS Glue pyspark scripts won't output anything less than a WARN level (see edits above). I'll accept your answer since it works too. Thanks! – Aurora
What "<logger-name-here>" do I write so that CloudWatch sees my log? – Lively
Any meaningful string you want, e.g. the application name. This value will be used in place of %(name)s in the log message. – Pleurisy
Is it possible to write only the custom messages to S3? – Jehovah
Hi, I have a small question: logging.basicConfig(filename='s3://<bucketname>/spark.logs', level=logging.INFO) – can I store log info in an S3 bucket this way? I tried the above config, it didn't work @AlexeyBakulin – Groggery
What if I want to print out an intermediate data value, such as the input data, so that I can debug? I used logger.info(input_data) but it doesn't seem to work. – Nino

I know this question is not new, but maybe it will help someone: for me, logging in Glue works with the following lines of code:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

# create glue context
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
# set custom logging on
logger = glueContext.get_logger()
...
# write into the log file with:
logger.info("s3_key: " + your_value)
Hydria answered 9/7, 2019 at 11:29 Comment(5)
What does this s3 key mean here? @Lars, is it possible to write the error messages to a file in S3? – Jehovah
Official documentation on the subject: docs.aws.amazon.com/glue/latest/dg/… – Kith
Couple of things to note: 1. The Glue logger does not take message format strings; instead it expects full strings (so you have to handle the arguments yourself). 2. The Glue logger doesn't seem to be broadcastable out to workers, so if you're trying to log from UDFs you'll need to use the Python logger (see the sketch after these comments). – Vevine
What if I want to print out an intermediate data value, such as the input data, so that I can debug? I used logger.info(input_data) but it doesn't seem to work. – Nino
@Jehovah the s3 key here is just an example of the contents of a log message. You pass whatever you want into logger.info(). – Seabee
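
Following up on the comment about UDFs: since the Glue logger cannot be shipped to the workers, a minimal sketch of logging from inside a UDF with the standard Python logger might look like this. The logger name, function, and column are made up for illustration; messages emitted inside a UDF land in the executors' log streams, not the driver's.

import logging
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(StringType())
def normalize(value):
    # fetch/create the logger on the executor; driver-side logger config does not carry over
    log = logging.getLogger("my_udf")
    log.warning("normalizing %r", value)
    return value.strip().lower() if value else value

# hypothetical usage on a DataFrame with a 'name' column:
# df = df.withColumn("name_normalized", normalize(df["name"]))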

I noticed the above answers are written in Python. For Scala you could do the following:

import com.amazonaws.services.glue.log.GlueLogger

object GlueApp {
  def main(sysArgs: Array[String]) {
    val logger = new GlueLogger
    logger.info("info message")
    logger.warn("warn message")
    logger.error("error message")
  }
}

You can find both the Python and Scala solutions in the official documentation here.

Volscian answered 26/4, 2020 at 21:42 Comment(0)

Just in case this helps. This works to change the log level.

from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext()
sc.setLogLevel('DEBUG')
glueContext = GlueContext(sc)
logger = glueContext.get_logger()
logger.info('Hello Glue')
Toponymy answered 22/1, 2021 at 12:25 Comment(0)

This worked for INFO level in a Glue Python job:

import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)
root.info("check")

source

Cyndycynera answered 29/4, 2021 at 5:5 Comment(0)

I faced the same problem. I resolved it by adding logging.getLogger().addHandler(logging.StreamHandler(sys.stdout))

Before that there were no prints at all, not even at the ERROR level.

The idea was taken from here: https://medium.com/tieto-developers/how-to-do-application-logging-in-aws-745114ac6eb7

Another option would be to log to stdout and hook AWS Glue's logging up to stdout (using stdout is actually one of the best practices in cloud logging).

Update: it only works with setLevel("WARNING") and when printing ERROR or WARNING messages. I didn't find out how to make it work for the INFO level :(

Firdausi answered 4/3, 2020 at 14:3 Comment(3)
Did you check in the error log? That's where my stderr log events end up – Kith
...same for my stdout log events using a logging.basicConfig – Kith
My prints are not in stderr. – Firdausi

If you're just debugging, print() (Python) or println() (Scala) works just fine.

Pamplona answered 15/6, 2022 at 2:37 Comment(1)
print() works, kind of. But all print() statements land in a single line in the Glue log, which is not ideal. – Spenser
