How to show full column content in a Spark Dataframe?

I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the content:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("my.csv")
df.registerTempTable("tasks")
val results = sqlContext.sql("select col from tasks")
results.show()

The column content appears truncated:

scala> results.show();
+--------------------+
|                 col|
+--------------------+
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-06 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:21:...|
|2015-11-16 07:21:...|
|2015-11-16 07:21:...|
+--------------------+

How do I show the full content of the column?

Rutkowski answered 16/11, 2015 at 19:17 Comment(0)

results.show(20, false) will not truncate the column contents. Check the source

20 is the default number of rows displayed when show() is called without any arguments.
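For intuition about what "truncate" means here, below is a small pure-Python sketch (not Spark's actual code; the helper name truncate_cell is made up) of the cell-truncation rule that Spark's showString applies: with a positive width, cells longer than the width are cut and given a "..." suffix, which is exactly why the question's timestamps render as 2015-11-16 07:15:...

```python
def truncate_cell(value: str, truncate: int = 20) -> str:
    """Approximate Spark's show() cell truncation.

    With a positive truncate width, cells longer than the width are
    cut and suffixed with '...'; truncate <= 0 leaves the cell intact.
    """
    if truncate <= 0 or len(value) <= truncate:
        return value
    if truncate < 4:
        # too narrow to fit the '...' suffix, so just hard-cut
        return value[:truncate]
    return value[:truncate - 3] + "..."

print(truncate_cell("2015-11-16 07:15:00.000"))     # default width 20
print(truncate_cell("2015-11-16 07:15:00.000", 0))  # no truncation
```

With the default width of 20, the 23-character timestamp comes out as `2015-11-16 07:15:...`, matching the output shown in the question.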

Westward answered 16/11, 2015 at 19:24 Comment(9)
Not OP but this is indeed the right answer. Minor correction: the boolean should be False, not false.Tetracaine
It would be "False" in python, but "false" in scala/javaCollarbone
it's false (not False) in spark-shellUnderside
the equivalent for writing to stream in console mode is dataFrame.writeStream.outputMode("append").format("console").option("truncate", "false").start()Fatma
what is so special about 20? Why 20?Welldefined
OP asked how not to truncate the columns, so @Westward gave them the equivalent of df.show() (20 rows per default) that does not truncate the columns. I.e. df.show(20, false)Argentous
Use dataframe_name.show(truncate = False)Antony
FYI: there is another interesting option for show which is to show "vertically" -- the third optional argument: n=20, truncate=True, vertical=False. It's sometimes easier to read the data in this format.Uyekawa
any alternative for %%sql spark magic for this ?Bonhomie

If you call results.show(false), the results will not be truncated

Huberthuberto answered 8/4, 2016 at 19:2 Comment(5)
I imagine that the comment on TomTom101's answer about false applies here, too.Mongo
@Narendra Parmar the syntax should be results.show(20, False). The one you have mentioned will give error.Bondsman
@JaiPrakash, I gave this answer for Scala and you are talking about Python.Huberthuberto
@NarendraParmar sorry you are correct. In scala both the options are valid. results.show(false) and results.show(20, false)Bondsman
@JaiPrakash -- in ASA, "false" has to have a capital f: "False" is ok, but "false" gives an error.Mungovan

The code below displays all rows without truncating any column:

df.show(df.count(), False)
Tendance answered 5/2, 2017 at 1:21 Comment(3)
Same question I asked the prior answerer: does this cause df to be collected twice?Mediocrity
@javadba yes, I think count() will go through df once, and show() will go through it again.Tendance
As an alternative, you could give a very large number as the first parameter instead of df.count() in order to save on the requirement to persist. For example, if the row count of df is 1000, you could do df.show(1000000, false) and it will work. Tried the following and it worked: scala> println(df.count) res2: Long = 987 scala> df.show(990)Fumble

The other solutions are good. If these are your goals:

  1. No truncation of columns,
  2. No loss of rows,
  3. Fast and
  4. Efficient

then these two lines are useful:

    df.persist
    df.show(df.count.toInt, false) // Scala; in Python: df.show(df.count(), False)

By persisting first, the two executor actions, count and show, are faster and more efficient, because persist (or cache) keeps the interim underlying DataFrame within the executors. See the documentation on persist and cache for more.

Behalf answered 15/2, 2017 at 6:25 Comment(1)
Very nice. Thanks!Essene

results.show(20, False) or results.show(20, false), depending on whether you are running it in Python or in Java/Scala

Mendenhall answered 8/3, 2017 at 5:40 Comment(0)

In PySpark we can use

df.show(truncate=False), which will display the full content of the columns without truncation.

df.show(5, truncate=False), which will display the full content of the first five rows.
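As a rough guide to how PySpark interprets the truncate argument before the table is rendered, here is a hedged pure-Python sketch (the helper name resolve_truncate is made up, not part of PySpark): True maps to the default width of 20, any other value is coerced to an int, and a width of 0 (or False) disables truncation entirely.

```python
def resolve_truncate(truncate) -> int:
    """Approximate how PySpark's DataFrame.show maps its `truncate`
    argument to a column width: True -> default width 20; any other
    value -> int(value), where 0 (or False) means no truncation."""
    if isinstance(truncate, bool) and truncate:
        return 20           # default column width
    return int(truncate)    # 0 / False -> no truncation; N -> width N

print(resolve_truncate(True))   # 20
print(resolve_truncate(False))  # 0
print(resolve_truncate(5))      # 5
```

This is also why the later answer's df.show(truncate=0) works: 0 is treated the same as False.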

Plutus answered 12/7, 2021 at 21:39 Comment(0)

The following answer applies to a Spark Streaming application.

By setting the "truncate" option to false, you can tell the output sink to display the full column content.

val query = out.writeStream
          .outputMode(OutputMode.Update())
          .format("console")
          .option("truncate", false)
          .trigger(Trigger.ProcessingTime("5 seconds"))
          .start()
Detruncate answered 10/6, 2020 at 19:55 Comment(0)

In PySpark, remember:

  • if you have to display data from a DataFrame, use the show(truncate=False) method.
  • if you have to display data from a streaming DataFrame (Structured Streaming), use writeStream.format("console").option("truncate", False).start().

Hope this helps someone.

Intercross answered 5/4, 2022 at 12:13 Comment(0)

Within Databricks you can visualize the DataFrame in a tabular format with the command:

display(results)

It will render the results as an interactive table (screenshot omitted).

Brittain answered 10/9, 2018 at 9:12 Comment(1)
how with display() show only, for example, first 5 rows?Ingather

In C#, Option("truncate", false) prevents truncation of the data in the output.

StreamingQuery query = spark
                    .Sql("SELECT * FROM Messages")
                    .WriteStream()
                    .OutputMode("append")
                    .Format("console")
                    .Option("truncate", false)
                    .Start();
Hermaphrodite answered 1/4, 2020 at 19:37 Comment(0)

Try df.show(20,False)

Note that if you do not specify the number of rows you want to show, it will show 20 rows but will execute your whole DataFrame, which will take more time!

Penta answered 30/6, 2021 at 14:36 Comment(0)

try this command :

df.show(df.count())
Tildy answered 25/11, 2016 at 20:16 Comment(3)
Try this: df.show(some number) will work, but df.show(df.count()) will not, because df.count returns a Long, which df.show() does not accept, since it expects an Int.Hawsepipe
Example use df.show(2000). It will retrieve 2000 rowsHawsepipe
does this cause df to be collected twice?Mediocrity

results.show(false) will show you the full column content.

The show method limits output to 20 rows by default; passing a row count before false will show more rows.
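To make the interplay of the two arguments concrete, here is a toy pure-Python model (not Spark code; the helper name show_rows is invented for illustration): the first argument caps how many rows are printed, while a positive truncate width shortens long cells with a "..." suffix.

```python
def show_rows(rows, num_rows=20, truncate=20):
    """Toy model of show(numRows, truncate): cap the row count,
    then cut any cell longer than the truncate width, adding '...'."""
    out = []
    for value in rows[:num_rows]:
        if truncate and len(value) > truncate:
            value = value[:truncate - 3] + "..."
        out.append(value)
    return out

timestamps = [f"2015-11-16 07:15:{s:02d}.000" for s in range(30)]
print(show_rows(timestamps))        # 20 rows, cells cut to 20 chars
print(show_rows(timestamps, 5, 0))  # 5 rows, full cell content
```

So show(false) keeps the default 20-row cap while disabling the width cut, and show(100, false) raises the row cap as well.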

Oshiro answered 8/11, 2017 at 17:54 Comment(0)

results.show(20,false) did the trick for me in Scala.

Mellisa answered 16/4, 2018 at 18:32 Comment(0)

Tried this in PySpark:

df.show(truncate=0)
Log answered 18/9, 2020 at 12:29 Comment(0)

PYSPARK

In the code below, df is the name of the DataFrame. The first parameter shows all rows in the DataFrame dynamically rather than hardcoding a numeric value. The second parameter displays full column contents, since the value is set to False.

df.show(df.count(), False)


SCALA

In the code below, df is the name of the DataFrame. The first parameter shows all rows in the DataFrame dynamically rather than hardcoding a numeric value. The second parameter displays full column contents, since the value is set to false.

df.show(df.count().toInt, false)


Glia answered 13/1, 2021 at 4:41 Comment(0)

Try this in Scala:

df.show(df.count.toInt, false)

The show method accepts an Int and a Boolean, but df.count returns a Long, so a cast is required.

Cody answered 10/12, 2019 at 1:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.