I am trying to implement SAC with a custom environment in Stable Baselines3 and I keep getting the error in the title. The error occurs with any off policy algorithm not just SAC.
Traceback:
File "<MY PROJECT PATH>\src\main.py", line 70, in <module>
main()
File "<MY PROJECT PATH>\src\main.py", line 66, in main
model.learn(total_timesteps=timesteps, reset_num_timesteps=False, tb_log_name=f"sac_{num_cars}_cars")
File "<MY PROJECT PATH>\venv\lib\site-packages\stable_baselines3\sac\sac.py", line 309, in learn
return super().learn(
File "<MY PROJECT PATH>\venv\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py", line 375, in learn
self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
File "<MY PROJECT PATH>\venv\lib\site-packages\stable_baselines3\sac\sac.py", line 256, in train
current_q_values = self.critic(replay_data.observations, replay_data.actions)
File "<MY PROJECT PATH>\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "<MY PROJECT PATH>\venv\lib\site-packages\stable_baselines3\common\policies.py", line 885, in forward
return tuple(q_net(qvalue_input) for q_net in self.q_networks)
File "<MY PROJECT PATH>\venv\lib\site-packages\stable_baselines3\common\policies.py", line 885, in <genexpr>
return tuple(q_net(qvalue_input) for q_net in self.q_networks)
File "<MY PROJECT PATH>\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "<MY PROJECT PATH>\venv\lib\site-packages\torch\nn\modules\container.py", line 204, in forward
input = module(input)
File "<MY PROJECT PATH>\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "<MY PROJECT PATH>\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype
Action and observation spaces:
self.action_space = Box(low=-1., high=1., shape=(2,), dtype=np.float)
self.observation_space = Box(
np.array(
[-np.inf] * (9 * 40) + [-np.inf] * 3 + [-np.inf] * 3 + [-np.inf] * 3
+ [0.] + [0.] + [0.] + [-1.] + [0.] * 4 + [0.] * 4 + [0.] * 4,
dtype=np.float
),
np.array(
[np.inf] * (9 * 40) + [np.inf] * 3 + [np.inf] * 3 + [np.inf] * 3
+ [np.inf] + [1.] + [1.] + [1.] + [1.] * 4 + [np.inf] * 4 + [np.inf] * 4,
dtype=np.float
),
dtype=np.float
)
Observations are returned in the step and reset methods as a numpy array of floats.
Is there something I'm missing which is causing this error? If I use one of the environments that come with gym such as pendulum it works fine which is why I think I have a problem with my custom environment.
Thanks in advance for any help and please let me know if more information is required.
np.float
should return afloat64
by default. For some reason,F.linear
seems to struggle with precision. If your program allows the use of a lower precision, a quick fix might be to replacedtype=np.float
withdtype='float32'
. – Salic