I've been trying to get into ML and want to follow a course that requires TensorFlow, so I've been trying to get it working on my system. I have the 2021 14" MacBook Pro (M1 Pro chip, 16 GB RAM) running Ventura 13.1. I have been following this article and digging around for ways to get TensorFlow working on M1, but to no avail. I managed to install tensorflow-macos and tensorflow-metal in my environment, but when I try to run some sample code in Jupyter, I get an error that I do not understand. In Jupyter, when I run:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
I get
Num GPUs Available: 1
So it does seem like I have tensorflow and metal installed, but when I try to run the rest of the code, I get:
TensorFlow version: 2.11.0
Num GPUs Available: 1
Metal device set to: Apple M1 Pro
WARNING:tensorflow:AutoGraph could not transform <function normalize_img at 0x14a4cec10> and will run it as-is.
Cause: Unable to locate the source code of <function normalize_img at 0x14a4cec10>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
2022-12-13 13:54:33.658225: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-13 13:54:33.658309: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
WARNING:tensorflow:AutoGraph could not transform <function normalize_img at 0x14a4cec10> and will run it as-is.
Cause: Unable to locate the source code of <function normalize_img at 0x14a4cec10>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function normalize_img at 0x14a4cec10> and will run it as-is.
Cause: Unable to locate the source code of <function normalize_img at 0x14a4cec10>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Epoch 1/12
2022-12-13 13:54:34.162300: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-12-13 13:54:34.163015: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-13 13:54:35.383325: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.383350: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.389028: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.389049: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.401250: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.401274: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.405004: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.405025: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
File <timed exec>:45
File ~/conda/envs/mlp3/lib/python3.8/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
File ~/conda/envs/mlp3/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
50 try:
51 ctx.ensure_initialized()
---> 52 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
53 inputs, attrs, num_outputs)
54 except core._NotOkStatusException as e:
55 if name is not None:
NotFoundError: Graph execution error:
Detected at node 'StatefulPartitionedCall_6' defined at (most recent call last):
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel_launcher.py", line 17, in <module>
app.launch_new_instance()
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/traitlets/config/application.py", line 992, in launch_instance
app.start()
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 711, in start
self.io_loop.start()
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 215, in start
self.asyncio_loop.run_forever()
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue
await self.process_one()
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 499, in process_one
await dispatch(*args)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell
await result
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 729, in execute_request
reply_content = await reply_content
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/ipkernel.py", line 411, in do_execute
res = shell.run_cell(
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/zmqshell.py", line 531, in run_cell
return super().run_cell(*args, **kwargs)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2940, in run_cell
result = self._run_cell(
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2995, in _run_cell
return runner(coro)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
coro.send(None)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3194, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3373, in run_ast_nodes
if await self.run_code(code, result, async_=asy):
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3433, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/var/folders/k4/vgd34_w913ndkfkmvgssqgjr0000gn/T/ipykernel_16072/1016625245.py", line 1, in <module>
get_ipython().run_cell_magic('time', '', 'import tensorflow as tf\nimport tensorflow_datasets as tfds\nprint("TensorFlow version:", tf.__version__)\nprint("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices(\'GPU\')))\ntf.config.list_physical_devices(\'GPU\')\n(ds_train, ds_test), ds_info = tfds.load(\n \'mnist\',\n split=[\'train\', \'test\'],\n shuffle_files=True,\n as_supervised=True,\n with_info=True,\n)\ndef normalize_img(image, label):\n """Normalizes images: `uint8` -> `float32`."""\n return tf.cast(image, tf.float32) / 255., label\nbatch_size = 128\nds_train = ds_train.map(\n normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)\nds_train = ds_train.cache()\nds_train = ds_train.shuffle(ds_info.splits[\'train\'].num_examples)\nds_train = ds_train.batch(batch_size)\nds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)\nds_test = ds_test.map(\n normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)\nds_test = ds_test.batch(batch_size)\nds_test = ds_test.cache()\nds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)\nmodel = tf.keras.models.Sequential([\n tf.keras.layers.Conv2D(32, kernel_size=(3, 3),\n activation=\'relu\'),\n tf.keras.layers.Conv2D(64, kernel_size=(3, 3),\n activation=\'relu\'),\n tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),\n# tf.keras.layers.Dropout(0.25),\n tf.keras.layers.Flatten(),\n tf.keras.layers.Dense(128, activation=\'relu\'),\n# tf.keras.layers.Dropout(0.5),\n tf.keras.layers.Dense(10, activation=\'softmax\')\n])\nmodel.compile(\n loss=\'sparse_categorical_crossentropy\',\n optimizer=tf.keras.optimizers.Adam(0.001),\n metrics=[\'accuracy\'],\n)\nmodel.fit(\n ds_train,\n epochs=12,\n validation_data=ds_test,\n)\n')
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2417, in run_cell_magic
result = fn(*args, **kwargs)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/magics/execution.py", line 1321, in time
out = eval(code_2, glob, local_ns)
File "<timed exec>", line 45, in <module>
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/engine/training.py", line 1650, in fit
tmp_logs = self.train_function(iterator)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in train_function
return step_function(self, iterator)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/engine/training.py", line 1233, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/engine/training.py", line 1222, in run_step
outputs = model.train_step(data)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/engine/training.py", line 1027, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
self.apply_gradients(grads_and_vars)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
return super().apply_gradients(grads_and_vars, name=name)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
iteration = self._internal_apply_gradients(grads_and_vars)
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
return tf.__internal__.distribute.interim.maybe_merge_call(
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
distribution.extended.update(
File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_6'
could not find registered platform with id: 0x14a345660
[[{{node StatefulPartitionedCall_6}}]] [Op:__inference_train_function_1261]
Sorry for dumping the entire error output, but as you can see, something's gone awry. Training only seems to get as far as the first epoch, and I'm not sure what's going wrong. I've followed everything in that guide as well as the instructions for tensorflow-metal. I've looked seemingly everywhere, but this is as far as I've gotten after hours of battling. I updated my Mac today, so the Xcode command line tools should be up to date. Any advice, or help deciphering the error, would be greatly appreciated. I just want to learn machine learning, but I can't follow my course without this working.
I've uninstalled and reinstalled Conda Miniforge for M1 several times. I've repeated the steps in a blank environment. I've followed the steps in the guides I've linked above and gone through them multiple times. I was originally getting some issues with numpy, h5py, grpcio, and protobuf, but after tinkering with the versions I no longer get errors that mention them, so I'm not sure whether they are all fine now. I've also run
conda install -c conda-forge openblas
after looking at this Stack Overflow page from someone with a similar issue, but I'm still getting this error.
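Since I changed the versions of several packages along the way, here is a small sketch for listing what is actually installed in the environment. It only uses `importlib.metadata` from the standard library (available in Python 3.8). The helper name `installed_versions` and the package list are just my own choices for illustration, and I'm assuming these are the pip distribution names; none of this comes from the guides.

```python
from importlib import metadata

# Packages mentioned in this post; the list itself is an assumption,
# adjust it to whatever the environment is supposed to contain.
PACKAGES = [
    "tensorflow-macos",
    "tensorflow-metal",
    "numpy",
    "h5py",
    "grpcio",
    "protobuf",
]


def installed_versions(packages):
    """Map each distribution name to its installed version string.

    Packages that cannot be found in the current environment are
    reported as "not installed" instead of raising an error.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version}")
```

Running this in the same Jupyter kernel as the failing cell should show whether the notebook is really using the environment I think it is.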