submitit INFO (2025-07-28 14:18:16,568) - Starting with JobEnvironment(job_id=525814, hostname=c137, local_rank=0(1), node=0(1), global_rank=0(1))
submitit INFO (2025-07-28 14:18:16,570) - Loading pickle: /data/horse/ws/pwinkler-mnist_tst/omniopt/runs/mnist_gpu_noall/5/single_runs/525814/525814_submitted.pkl
Traceback (most recent call last):
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/train", line 295, in main
).to(args.device)
^^^^^^^^^^^^^^^
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1355, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 942, in _apply
param_applied = fn(param)
^^^^^^^^^
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1341, in convert
return t.to(
^^^^^
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Parameters: {"epochs": 43, "lr": 0.025320670226514342, "batch_size": 1270, "hidden_size": 1739, "dropout": 0.32288113236427307, "num_dense_layers": 2, "weight_decay": 0.7341029644012451, "activation": "leaky_relu", "init": "normal"}
Debug-Infos:
========
DEBUG INFOS START:
Program-Code: python3 .tests/mnist/train --epochs 43 --learning_rate 0.02532067022651434199 --batch_size 1270 --hidden_size 1739 --dropout 0.32288113236427307129 --activation leaky_relu --num_dense_layers 2 --init normal --weight_decay 0.73410296440124511719
pwd: /data/horse/ws/pwinkler-mnist_tst/omniopt
File: .tests/mnist/train
UID: 2054851
GID: 200270
SLURM_JOB_ID: 525814
Status-Change-Time: 1753703502.0
Size: 12452 Bytes
Permissions: -rwxr-xr-x
Owner: pwinkler
Last access: 1753705105.0
Last modification: 1753703502.0
Hostname: c137
========
DEBUG INFOS END
python3 .tests/mnist/train --epochs 43 --learning_rate 0.02532067022651434199 --batch_size 1270 --hidden_size 1739 --dropout 0.32288113236427307129 --activation leaky_relu --num_dense_layers 2 --init normal --weight_decay 0.73410296440124511719
stdout:
Hyperparameters
╭──────────────────┬──────────────────────╮
│ Parameter        │ Value                │
├──────────────────┼──────────────────────┤
│ Device           │ cuda                 │
│ Epochs           │ 43                   │
│ Num Dense Layers │ 2                    │
│ Batch size       │ 1270                 │
│ Learning rate    │ 0.025320670226514342 │
│ Hidden size      │ 1739                 │
│ Dropout          │ 0.32288113236427307  │
│ Optimizer        │ adam                 │
│ Momentum         │ 0.9                  │
│ Weight Decay     │ 0.7341029644012451   │
│ Activation       │ leaky_relu           │
│ Init Method      │ normal               │
│ Seed             │ None                 │
╰──────────────────┴──────────────────────╯
Using device: cuda
An error occurred: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so
the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
stderr:
Traceback (most recent call last):
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/train", line 295, in main
).to(args.device)
^^^^^^^^^^^^^^^
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1355, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 942, in _apply
param_applied = fn(param)
^^^^^^^^^
File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1341, in convert
return t.to(
^^^^^
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Result: {'VAL_ACC': None}
Final-results: {'VAL_ACC': None}
EXIT_CODE: 1
submitit INFO (2025-07-28 14:18:37,144) - Job completed successfully
submitit INFO (2025-07-28 14:18:37,146) - Exiting after successful completion