ML-Agents release 19 training is freezing. Unity runs out of memory.
Hi, I'm having trouble training my 5-DOF robotic arm. I upgraded to the release 19 branch, and now I get this warning and I still run out of memory:
\ml-agents\mlagents\trainers\torch\networks.py:91: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at C:\Users\builder\tkoch\workspace\pytorch\pytorch_1647970138273\work\torch\csrc\utils\tensor_new.cpp:201.)
enc.update_normalization(torch.as_tensor(vec_input))
I don't know why this warning appears now, but it may be related to the lag (10 robot arms training at 19 fps in the editor).
Here is the config for the trainer:
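For context, the warning text itself only says that building a tensor from a Python list of NumPy arrays is slow, and suggests stacking the list into one array first. A minimal sketch of that suggestion follows; the helper name stack_observations and the sample data are made up for illustration, and this is not the actual code in networks.py. Whether this has anything to do with the memory problem is not established here.

```python
import numpy as np


def stack_observations(obs_list):
    """Combine a list of equally shaped 1-D arrays into one 2-D float32 array.

    Hypothetical helper for illustration only; not part of ML-Agents.
    """
    return np.asarray(obs_list, dtype=np.float32)


# Four fake observation vectors of length 3 (illustrative data).
obs = [np.full(3, i, dtype=np.float32) for i in range(4)]
stacked = stack_observations(obs)

try:
    import torch

    # Converting the single stacked array is the path the warning recommends.
    tensor = torch.as_tensor(stacked)
except ImportError:
    # torch is optional for this sketch; the stacking step is the point.
    tensor = None
```

The stacked array has one row per observation, so the conversion to a tensor happens once instead of once per list element.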
[INFO] Hyperparameters for behavior name arm:
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 7
      learning_rate_schedule: linear
      beta_schedule: linear
      epsilon_schedule: linear
    network_settings:
      normalize: True
      hidden_units: 32
      num_layers: 2
      vis_encode_type: simple
      memory: None
      goal_conditioning_type: hyper
      deterministic: False
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
        network_settings:
          normalize: False
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
          memory: None
          goal_conditioning_type: hyper
          deterministic: False
    init_path: results\scenario1\arm\checkpoint.pt
    keep_checkpoints: 5
    checkpoint_interval: 500000
    max_steps: 30000000
    time_horizon: 250
    summary_freq: 30000
    threaded: False
    self_play: None
    behavioral_cloning: None
I'm running Python 3.7.13.
Does anybody have a clue?
Hello, I've been having the same problem since yesterday. I'm not sure what I did wrong, but everything was good until I started receiving that error every time I tried to train my models.
Answer by aymenou190 · Jun 04 at 06:19 PM
Hey, I have sorted out the problem: I had to reinstall PyTorch because I was using the latest version, but ML-Agents only supports 1.7.1. To install it, run the following command in the command prompt: pip3 install torch~=1.7.1 -f https://download.pytorch.org/whl/torch_stable.html
During the installation, I ran into another problem: Windows rejects file paths longer than 260 characters. I fixed that by following this guide: https://www.howtogeek.com/266621/how-to-make-windows-10-accept-file-paths-over-260-characters/
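After reinstalling, it may be worth confirming which PyTorch version the training environment actually picks up. A minimal sketch, assuming the 1.7.1 release mentioned above is the target; torch_version_matches is a hypothetical helper written for this post, and torch.__version__ is the only PyTorch attribute it relies on:

```python
def torch_version_matches(installed: str, required: str = "1.7.1") -> bool:
    """Return True if the installed version equals the required release.

    A local build suffix such as '+cu110' or '+cpu' is ignored.
    Hypothetical helper for illustration only.
    """
    base = installed.split("+", 1)[0]
    return base == required


try:
    import torch

    if torch_version_matches(torch.__version__):
        print("PyTorch version matches:", torch.__version__)
    else:
        print("Unexpected PyTorch version:", torch.__version__)
except ImportError:
    print("PyTorch is not installed in this environment.")
```

Run it with the same Python interpreter that launches mlagents-learn, since a different virtual environment may have a different PyTorch installed.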