convert spark dataframe to aws glue dynamic frame

I tried converting my Spark DataFrames to DynamicFrames so I can write them out as glueparquet files, but I'm getting this error:

'DataFrame' object has no attribute 'fromDF'

My code relies heavily on Spark DataFrames. Is there a way to convert a Spark DataFrame to a DynamicFrame so I can write it out as glueparquet? If so, could you please provide an example and point out what I'm doing wrong below?

code:

# importing libraries

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())

# updated 11/19/19 for error caused in error logging function

spark = glueContext.spark_session

from pyspark.sql import Window
from pyspark.sql.functions import col
from pyspark.sql.functions import first
from pyspark.sql.functions  import date_format
from pyspark.sql.functions import lit,StringType
from pyspark.sql.types import *
from pyspark.sql.functions import substring, length, min,when,format_number,dayofmonth,hour,dayofyear,month,year,weekofyear,date_format,unix_timestamp


base_pth='s3://test/'

bckt_pth1=base_pth+'test_write/glueparquet/'


test_df=glueContext.create_dynamic_frame.from_catalog(
                 database='test_inventory',
                 table_name='inventory_tz_inventory').toDF()

test_df.fromDF(test_df, glueContext, "test_nest")


glueContext.write_dynamic_frame.from_options(frame = test_nest,
                                             connection_type = "s3",
                                             connection_options = {"path": bckt_pth1+'inventory'},
                                             format = "glueparquet")

error:

'DataFrame' object has no attribute 'fromDF'
Traceback (most recent call last):
  File "/mnt/yarn/usercache/livy/appcache/application_1574556353910_0001/container_1574556353910_0001_01_000001/pyspark.zip/pyspark/sql/dataframe.py", line 1300, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'fromDF'
Bregma answered 24/11, 2019 at 4:25 Comment(1)
It looks like you are trying to create a dynamic frame from a dynamic frame. Can you confirm that test_df is a data frame? From the script, I see that you are creating it as a dynamic frame, not a data frame. – Arguello
C
38

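fromDF is a class method of DynamicFrame, not a method of the Spark DataFrame, so it has to be called on the DynamicFrame class rather than on test_df. Here is how you can convert a DataFrame to a DynamicFrame: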

from awsglue.dynamicframe import DynamicFrame

# fromDF returns the new DynamicFrame, so keep the result for the write step
test_nest = DynamicFrame.fromDF(test_df, glueContext, "test_nest")

AWS Docs
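
To tie this back to the code in the question, here is a minimal sketch of the convert-then-write step wrapped in one helper. The function name write_as_glueparquet and its parameter names are my own and not part of the Glue API; the calls inside it (DynamicFrame.fromDF and write_dynamic_frame.from_options) are the same ones used above and in the question. I'm assuming it runs inside a Glue job, where awsglue is available.

```python
def write_as_glueparquet(df, glue_context, path, name="converted"):
    """Convert a Spark DataFrame to a DynamicFrame and write it to S3.

    Hypothetical helper for illustration only; not part of awsglue.
    """
    # Imported inside the function because awsglue only exists
    # in the Glue runtime, not in a plain PySpark install.
    from awsglue.dynamicframe import DynamicFrame

    # fromDF is called on the DynamicFrame class, not on the DataFrame,
    # and its return value is the new DynamicFrame.
    dyf = DynamicFrame.fromDF(df, glue_context, name)

    # Same write call as in the question, now given a real DynamicFrame.
    glue_context.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={"path": path},
        format="glueparquet",
    )
    return dyf
```

With the names from the question, the call would look like write_as_glueparquet(test_df, glueContext, bckt_pth1 + 'inventory', 'test_nest').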

Coeval answered 9/1, 2020 at 16:36 Comment(0)
D
14

Just to consolidate the answers for Scala users too, here's how to convert a Spark DataFrame to a DynamicFrame (the fromDF method doesn't exist in the Scala API of DynamicFrame):

import com.amazonaws.services.glue.DynamicFrame
val dynamicFrame = DynamicFrame(df, glueContext)

I hope it helps!

Daggett answered 13/2, 2020 at 11:58 Comment(1)
Probably not the place for this question, but what are the benefits of Scala in Glue vs. PySpark for DataFrame transformations and loads? – Fuzee
G
0
# Import the DynamicFrame class
from awsglue.dynamicframe import DynamicFrame

# Convert from a Spark DataFrame to a Glue DynamicFrame
dyfCustomersConvert = DynamicFrame.fromDF(df, glueContext, "convert")

# Show the converted Glue DynamicFrame
dyfCustomersConvert.show()
Garzon answered 28/1 at 15:36 Comment(0)
