So it turns out AWS Glue/PySpark does have this feature, but it requires a little data wrangling and use of the script-editing feature in AWS Glue jobs.
You can use the input_file_name function to get the full file path. This can be mapped to a column like so:
ApplyMapping_node2 = ApplyMapping_node1.toDF().withColumn("path", input_file_name())
ApplyMapping_node3 = DynamicFrame.fromDF(ApplyMapping_node2, glueContext, "ApplyMapping_node3")
However, if you need to split the path to get a specific file name, you can do something like this:
ApplyMapping_node2 = ApplyMapping_node1.toDF().withColumn("path", input_file_name())
ApplyMapping_node3 = ApplyMapping_node2.withColumn("split_path", split_path_UDF(ApplyMapping_node2['path']))
ApplyMapping_node4 = DynamicFrame.fromDF(ApplyMapping_node3, glueContext, "ApplyMapping_node4")
Where the split_path function is registered as a UDF, like so:
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import input_file_name, udf
from pyspark.sql.types import StringType

def split_path(path):
    # Keep only the last path segment, i.e. the file name
    return path.split('/')[-1]

split_path_UDF = udf(split_path, StringType())
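The splitting logic itself is plain Python, so you can sanity-check it outside Spark before wrapping it in a UDF. A minimal sketch (the S3 URI below is just a made-up example of what input_file_name returns):

```python
def split_path(path):
    # Keep only the last '/'-separated segment (the file name)
    return path.split('/')[-1]

# Hypothetical full path, as input_file_name would produce for an S3 object
print(split_path("s3://my-bucket/raw/2023/data.csv"))  # data.csv
```

Note that this keeps the file extension; strip it with an extra split('.') step if you only want the base name.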