submitit INFO (2025-07-28 13:20:13,891) - Starting with JobEnvironment(job_id=525776, hostname=c137, local_rank=0(1), node=0(1), global_rank=0(1))
submitit INFO (2025-07-28 13:20:13,893) - Loading pickle: /data/horse/ws/pwinkler-mnist_tst/omniopt/runs/mnist_gpu_noall/2/single_runs/525776/525776_submitted.pkl
/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/networkx/utils/backends.py:135: RuntimeWarning: networkx backend defined more than once: nx-loopback
backends.update(_get_backends("networkx.backends"))
Traceback (most recent call last):
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/train", line 285, in main
model = SimpleMLP(
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 1355, in to
return self._apply(convert)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 942, in _apply
param_applied = fn(param)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 1341, in convert
return t.to(
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Parameters: {"epochs": 15, "lr": 0.028901296601295475, "batch_size": 571, "hidden_size": 417, "dropout": 0.4168514609336853, "num_dense_layers": 1, "weight_decay": 0.8655822277069092, "activation": "leaky_relu", "init": "normal"}
Debug-Infos:
========
DEBUG INFOS START:
Program-Code: python3 .tests/mnist/train --epochs 15 --learning_rate 0.02890129660129547515 --batch_size 571 --hidden_size 417 --dropout 0.41685146093368530273 --activation leaky_relu --num_dense_layers 1 --init normal --weight_decay 0.86558222770690917969
pwd: /data/horse/ws/pwinkler-mnist_tst/omniopt
File: .tests/mnist/train
UID: 2054851
GID: 200270
SLURM_JOB_ID: 525776
Status-Change-Time: 1753273859.0
Size: 12359 Bytes
Permissions: -rwxr-xr-x
Owner: pwinkler
Last access: 1753701619.0
Last modification: 1753270258.0
Hostname: c137
========
DEBUG INFOS END
python3 .tests/mnist/train --epochs 15 --learning_rate 0.02890129660129547515 --batch_size 571 --hidden_size 417 --dropout 0.41685146093368530273 --activation leaky_relu --num_dense_layers 1 --init normal --weight_decay 0.86558222770690917969
stdout:
Hyperparameters
╭──────────────────┬──────────────────────╮
│ Parameter        │ Value                │
├──────────────────┼──────────────────────┤
│ Device           │ cuda                 │
│ Epochs           │ 15                   │
│ Num Dense Layers │ 1                    │
│ Batch size       │ 571                  │
│ Learning rate    │ 0.028901296601295475 │
│ Hidden size      │ 417                  │
│ Dropout          │ 0.4168514609336853   │
│ Optimizer        │ adam                 │
│ Momentum         │ 0.9                  │
│ Weight Decay     │ 0.8655822277069092   │
│ Activation       │ leaky_relu           │
│ Init Method      │ normal               │
│ Seed             │ None                 │
╰──────────────────┴──────────────────────╯
Using device: cuda
An error occurred: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so
the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
stderr:
/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/networkx/utils/backends.py:135: RuntimeWarning: networkx backend defined more than once: nx-loopback
backends.update(_get_backends("networkx.backends"))
Traceback (most recent call last):
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/train", line 285, in main
model = SimpleMLP(
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 1355, in to
return self._apply(convert)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 942, in _apply
param_applied = fn(param)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 1341, in convert
return t.to(
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Result: {'VAL_ACC': None}
Final-results: {'VAL_ACC': None}
EXIT_CODE: 1
submitit INFO (2025-07-28 13:20:28,950) - Job completed successfully
submitit INFO (2025-07-28 13:20:28,951) - Exiting after successful completion