Where can I find an exhaustive list of actions for spark?

I want to know exactly what I can do in Spark without triggering the computation of the underlying RDD/DataFrame.

It's my understanding that only actions trigger the execution of the transformations that produce a DataFrame. The problem is that I can't find a comprehensive list of Spark actions.

The Spark documentation lists some actions, but the list is not exhaustive. For example, show is not there, even though it is considered an action.

  • Where can I find a full list of actions?
  • Can I assume that all methods listed in the pyspark.sql.DataFrame API reference are also actions?
Cusp answered 8/7, 2024 at 21:20
To answer your second question: the methods of pyspark.sql.DataFrame are not all actions. count and show are actions; select and join are not (they are transformations). – Melbourne

All the methods annotated with @group action are actions. They are listed together in the Dataset Scaladoc, under the Actions group, and each one can also be found in the source, where its definition looks like this:

    /**
     * ...
     * @group action
     * @since 1.6.0
     */
    def show(numRows: Int): Unit = show(numRows, truncate = true)
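
If you have a local copy of the source, the same search can be scripted. This is a minimal sketch, not a Spark tool: find_group_actions and SAMPLE are hypothetical names, the regex assumes the Scaladoc block is followed directly by the def (optionally after annotations), and SAMPLE is a made-up snippet in the style of Dataset.scala.

```python
import re

# Hypothetical helper (not part of Spark): finds Scaladoc blocks that contain
# "@group action" and captures the name of the def that follows the block.
# Annotations such as @scala.annotation.varargs between "*/" and "def" are skipped.
_GROUP_ACTION_RE = re.compile(
    r"/\*\*(?:(?!\*/).)*?@group action\b(?:(?!\*/).)*?\*/"
    r"\s*(?:@[\w.]+(?:\([^)]*\))?\s*)*def\s+(\w+)",
    re.DOTALL,
)


def find_group_actions(scala_source: str) -> list[str]:
    """Return method names tagged '@group action', in source order, without duplicates."""
    names: list[str] = []
    for name in _GROUP_ACTION_RE.findall(scala_source):
        if name not in names:  # overloads (e.g. several show methods) share one name
            names.append(name)
    return names


# Made-up snippet in the style of Dataset.scala, used only for illustration.
SAMPLE = """
  /**
   * @group action
   * @since 1.6.0
   */
  def show(numRows: Int): Unit = show(numRows, truncate = true)

  /**
   * @group basic
   */
  def cache(): this.type = persist()
"""

if __name__ == "__main__":
    print(find_group_actions(SAMPLE))  # ['show']
    # On a real checkout (the path is an assumption):
    # print(find_group_actions(open("Dataset.scala", encoding="utf-8").read()))
```

A regex is used instead of a Scala parser to keep the sketch dependency-free; it will miss methods whose doc comment and def are separated by anything other than whitespace and simple annotations.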

Additionally, some methods without that annotation also evaluate eagerly: those that call withAction. The checkpoint method, for example, performs an action but is not grouped as one in the docs:

    private[sql] def checkpoint(eager: Boolean, reliableCheckpoint: Boolean): Dataset[T] = {
      val actionName = if (reliableCheckpoint) "checkpoint" else "localCheckpoint"
      withAction(actionName, queryExecution) { physicalPlan =>
        val internalRdd = physicalPlan.execute().map(_.copy())
        if (reliableCheckpoint) {
          // ... (rest of the method elided)

To find all of them:

  1. Go to the source (Dataset.scala).
  2. Press Ctrl + F.
  3. Search for private def withAction.
  4. Click on withAction.
  5. On the right you should see a list of the methods that use it. This is how that list currently looks:

[Screenshot: current withAction methods]
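
The same list can be approximated offline. This is a rough sketch under stated assumptions: find_with_action_callers and SAMPLE are hypothetical names, each method's body is taken to be the text between its def and the next def (a textual heuristic, not a Scala parser), and SAMPLE is a simplified, made-up snippet rather than real Spark source.

```python
import re

# Every "def <name>"; the text up to the next def is treated as that method's body.
_DEF_RE = re.compile(r"\bdef\s+(\w+)")
# A call to withAction, with or without explicit type arguments.
_CALL_RE = re.compile(r"\bwithAction\s*[(\[]")


def find_with_action_callers(scala_source: str) -> list[str]:
    """Hypothetical helper: names of methods whose body text calls withAction."""
    defs = list(_DEF_RE.finditer(scala_source))
    callers: list[str] = []
    for i, match in enumerate(defs):
        end = defs[i + 1].start() if i + 1 < len(defs) else len(scala_source)
        name = match.group(1)
        body = scala_source[match.end():end]  # text after the method name
        if name != "withAction" and _CALL_RE.search(body) and name not in callers:
            callers.append(name)
    return callers


# Simplified, made-up snippet in the style of Dataset.scala.
SAMPLE = """
  def collect(): Array[T] = withAction("collect", queryExecution)(collectFromPlan)

  def filter(condition: Column): Dataset[T] = withTypedPlan {
    Filter(condition.expr, logicalPlan)
  }

  private def withAction[U](name: String, qe: QueryExecution)(action: SparkPlan => U) = {
    action(qe.executedPlan)
  }
"""

if __name__ == "__main__":
    print(find_with_action_callers(SAMPLE))  # ['collect']
```

Like the reference list on GitHub, this only reports direct callers, matched by name; a method that reaches withAction through another helper still has to be traced by hand.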

Dionnadionne answered 9/7, 2024 at 8:12

I don't think an exhaustive list of all Spark actions exists. It is more useful to build a mental model of the difference between transformations and actions, and to refer to the documentation when needed.

Calling a transformation on its own produces no output; Spark only starts computing results when you call an action. There are three kinds of actions (excerpt from Spark: The Definitive Guide):

  • Actions to view data in the console
  • Actions to collect data to native objects in the respective language
  • Actions to write to output data sources
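
To make that mental model concrete without a Spark installation, here is a toy model in plain Python. It assumes nothing about Spark's internals: the LazyFrame class and its methods are hypothetical and only mimic the idea that transformations (filter, map) record a plan while actions (show for viewing data, collect and count for returning native objects) execute it. Writing to an output source is left out.

```python
from typing import Any, Callable, Iterable, List, Optional

Step = Callable[[List[Any]], List[Any]]


class LazyFrame:
    """Toy model of lazy evaluation; NOT the Spark API.

    Transformations only append a step to the plan and return a new LazyFrame.
    Actions run the whole recorded plan against the source data.
    """

    def __init__(self, source: Iterable[Any], plan: Optional[List[Step]] = None):
        self._source = list(source)
        self._plan: List[Step] = list(plan) if plan else []
        self.executions = 0  # how many times an action has run this frame's plan

    # Transformations: build a new plan, compute nothing.
    def filter(self, pred: Callable[[Any], bool]) -> "LazyFrame":
        return LazyFrame(self._source, self._plan + [lambda rows: [r for r in rows if pred(r)]])

    def map(self, fn: Callable[[Any], Any]) -> "LazyFrame":
        return LazyFrame(self._source, self._plan + [lambda rows: [fn(r) for r in rows]])

    # Actions: execute the recorded plan from the source data.
    def _execute(self) -> List[Any]:
        self.executions += 1
        rows = self._source
        for step in self._plan:
            rows = step(rows)
        return rows

    def collect(self) -> List[Any]:  # collect to a native object
        return self._execute()

    def count(self) -> int:
        return len(self._execute())

    def show(self) -> None:  # view data in the console
        for row in self._execute():
            print(row)


if __name__ == "__main__":
    evens_x10 = LazyFrame(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * 10)
    print(evens_x10.executions)  # 0: building the plan computed nothing
    print(evens_x10.count())     # 5
    evens_x10.show()             # 0, 20, 40, 60, 80, one per line
    print(evens_x10.executions)  # 2: each action re-ran the plan
```

In this toy, every action recomputes the plan from scratch; nothing is cached between actions.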

The link you provided lists some actions, but it includes transformations as well.

Esperance answered 8/7, 2024 at 22:11

© 2022 - 2025 — McMap. All rights reserved.