How to check how many cores PySpark uses?
I have installed VirtualBox (Ubuntu 18.04.2 64-bit) and PySpark 2.4.0. When I created the VM, I allocated a maximum of 4 CPUs.

How am I supposed to check how many cores Spark is using?

Screak asked 24/2, 2019 at 16:08

That depends on the master URL, which describes which runtime environment (cluster manager) Spark uses.

Since this is a low-level, infrastructure-oriented property, you can find the answer by querying the SparkContext instance.

E.g. if it's local[*], that means Spark uses as many CPUs (that is what the star means) as are available to the local JVM.

$ ./bin/pyspark
Python 2.7.15 (default, Feb 19 2019, 09:17:37)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)] on darwin
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Python version 2.7.15 (default, Feb 19 2019 09:17:37)
SparkSession available as 'spark'.
>>> print sc.master
local[*]
>>> print sc.defaultParallelism
8
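
If you want the same check from a standalone script rather than the interactive shell, here is a minimal sketch (assuming a plain local[*] session; the reported number depends on your machine, so on a 4-CPU VM you should see 4):

from pyspark.sql import SparkSession

# Build (or reuse) a session that uses every CPU visible to the local JVM.
spark = SparkSession.builder.master("local[*]").getOrCreate()
sc = spark.sparkContext

print(sc.master)              # e.g. local[*]
print(sc.defaultParallelism)  # number of cores Spark will use, e.g. 4 on a 4-CPU VM

spark.stop()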
Embolism answered 24/2, 2019 at 20:37 · Comments (15)
Thank you Jacek for the answer. One more question (I know it is quite a frequent one, but I didn't find an answer): when I run ./bin/pyspark I get bash: ./bin/pyspark: No such file or directory. How do I overcome that? – Screak
Where do you run ./bin/pyspark? What's the directory? What's the OS? – Embolism
I tried to run it in the home directory, in the spark-2.4.0-bin-hadoop2.7 folder, and also in that folder's bin and python subfolders, and everywhere I get the error... – Screak
Can you run ls -l bin/pyspark while in spark-2.4.0-bin-hadoop2.7? – Embolism
So when I enter that folder and type that, I get the following: -rwxr-xr-x 1 name name 2987 Oct 29 07:36 bin/pyspark – Screak
Looks OK. What happens when you do ./bin/pyspark while in spark-2.4.0-bin-hadoop2.7? – Embolism
./bin/pyspark: line 45: python: command not found and env: 'python': No such file or directory – Screak
You should then install Python. Since I don't use pyspark, I don't even know where in the docs they say you should. – Embolism
Can you execute python on the command line? You should have the executable on your PATH. – Embolism
python no, python3 yes – Screak
Well, you have to symlink it then (see the sketch after these comments). I'm not sure if pyspark works with Python 3. – Embolism
How did you make the code run with python3, then, since you have trouble executing pyspark? – Embolism
I have Windows 10, so I created a VirtualBox VM and installed Ubuntu 18.04.2 (64-bit), then installed Python 3.6.7, Jupyter Notebook (where I run the code), Java 10.0.2, Scala 2.11.12, Py4J, and Spark 2.4.0. – Screak
Please ask a separate question to keep the discussion in its proper place. – Embolism
Okay, thank you very much for such an exhaustive answer and for the time you spent on this. – Screak
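
For anyone who hits the same python: command not found error from the comments above: Spark's launcher scripts look for a python executable, and Spark's documented PYSPARK_PYTHON environment variable lets you point them at python3 instead. A minimal sketch, assuming you are in the spark-2.4.0-bin-hadoop2.7 directory (Spark 2.4 supports Python 3):

$ export PYSPARK_PYTHON=python3
$ ./bin/pyspark

Symlinking python to python3, as suggested in the comments, also works, but setting the variable per shell is less invasive than changing the system-wide python.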
