I just got my new MacBook Pro with M1 Max chip and am setting up Python. I've tried several combinational settings to test speed - now I'm quite confused. First put my questions here:
- Why python run natively on M1 Max is greatly (~100%) slower than on my old MacBook Pro 2016 with Intel i5?
- On M1 Max, why there isn't significant speed difference between native run (by miniforge) and run via Rosetta (by anaconda) - which is supposed to be slower ~20%?
- On M1 Max and native run, why there isn't significant speed difference between conda installed Numpy and TensorFlow installed Numpy - which is supposed to be faster?
- On M1 Max, why run in PyCharm IDE is constantly slower ~20% than run from terminal, which doesn't happen on my old Intel Mac.
Evidence supporting my questions is as follows:
Here are the settings I've tried:
1. Python installed by
- Miniforge-arm64, so that python is natively run on M1 Max Chip. (Check from Activity Monitor,
Kind
of python process isApple
). - Anaconda. Then python is run via Rosseta. (Check from Activity Monitor,
Kind
of python process isIntel
).
2. Numpy installed by
conda install numpy
: numpy from original conda-forge channel, or pre-installed with anaconda.- Apple-TensorFlow: with python installed by miniforge, I directly install tensorflow, and numpy will also be installed. It's said that, numpy installed in this way is optimized for Apple M1 and will be faster. Here is the installation commands:
conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal
3. Run from
- Terminal.
- PyCharm (Apple Silicon version).
Here is the test code:
import time
import numpy as np
np.random.seed(42)
a = np.random.uniform(size=(300, 300))
runtimes = 10
timecosts = []
for _ in range(runtimes):
s_time = time.time()
for i in range(100):
a += 1
np.linalg.svd(a)
timecosts.append(time.time() - s_time)
print(f'mean of {runtimes} runs: {np.mean(timecosts):.5f}s')
and here are the results:
+-----------------------------------+-----------------------+--------------------+
| Python installed by (run on)→ | Miniforge (native M1) | Anaconda (Rosseta) |
+----------------------+------------+------------+----------+----------+---------+
| Numpy installed by ↓ | Run from → | Terminal | PyCharm | Terminal | PyCharm |
+----------------------+------------+------------+----------+----------+---------+
| Apple Tensorflow | 4.19151 | 4.86248 | / | / |
+-----------------------------------+------------+----------+----------+---------+
| conda install numpy | 4.29386 | 4.98370 | 4.10029 | 4.99271 |
+-----------------------------------+------------+----------+----------+---------+
This is quite slow. For comparison,
- run the same code on my old MacBook Pro 2016 with i5 chip - it costs
2.39917s
. - another post (but not in English) reports that run with M1 chip (not Pro or Max), miniforge+conda_installed_numpy is
2.53214s
, and miniforge+apple_tensorflow_numpy is1.00613s
. - you may also try on it your own.
Here is the CPU information details:
- My old i5:
$ sysctl -a | grep -e brand_string -e cpu.core_count
machdep.cpu.brand_string: Intel(R) Core(TM) i5-6360U CPU @ 2.00GHz
machdep.cpu.core_count: 2
- My new M1 Max:
% sysctl -a | grep -e brand_string -e cpu.core_count
machdep.cpu.brand_string: Apple M1 Max
machdep.cpu.core_count: 10
I follow instructions strictly from tutorials - but why would all these happen? Is it because of my installation flaws, or because of M1 Max chip? Since my work relies heavily on local runs, local speed is very important to me. Any suggestions to possible solution, or any data points on your own device would be greatly appreciated :)