I have installed VirtualBox (Ubuntu 18.04.2 64-bit) and PySpark 2.4.0. When I created the VM I set the maximum number of CPUs to 4.
How am I supposed to check how many cores Spark is using?
That depends on the master URL, which describes what runtime environment (cluster manager) to use.
Since this is such a low-level, infrastructure-oriented thing, you can find the answer by querying the SparkContext
instance.
E.g. if the master URL is local[*],
that means you want to use as many CPUs (that is what the star stands for) as are available to the local JVM.
$ ./bin/pyspark
Python 2.7.15 (default, Feb 19 2019, 09:17:37)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)] on darwin
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/
Using Python version 2.7.15 (default, Feb 19 2019 09:17:37)
SparkSession available as 'spark'.
>>> print sc.master
local[*]
>>> print sc.defaultParallelism
8
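For completeness, the same check can be done from a standalone PySpark script instead of the interactive shell. The snippet below is only a sketch, not part of the original answer: the local[4] master, the app name and the variable names are assumptions, with local[4] chosen to match the 4 CPUs given to the VM.

from pyspark.sql import SparkSession

# Ask for exactly 4 local cores (assumption: matching the VM's 4 CPUs).
spark = SparkSession.builder \
    .master("local[4]") \
    .appName("core-check") \
    .getOrCreate()

sc = spark.sparkContext
print(sc.master)              # local[4]
print(sc.defaultParallelism)  # 4, the number of cores Spark uses by default

spark.stop()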
Comment: ls -l bin/pyspark while in spark-2.4.0-bin-hadoop2.7?
Comment: ./bin/pyspark while in spark-2.4.0-bin-hadoop2.7?
Comment: python. Since I don't use pyspark I don't even know where in the docs they say that you should.
Comment: python on the command line? You should have the executable in PATH.
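To illustrate the PATH remark above (this snippet is not part of the original thread; it is only one assumed way to verify the point), a short Python 3 check shows whether a pyspark launcher is visible on PATH:

import shutil

# shutil.which (Python 3.3+) returns None when the executable is not on PATH.
launcher = shutil.which("pyspark")
if launcher is None:
    print("pyspark is not on PATH; run it as ./bin/pyspark from the Spark install directory")
else:
    print("pyspark found at " + launcher)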