OmniOpt2

Experiment overview

Setting	Value
Model for non-random steps	BOTORCH_MODULAR
Max. nr. evaluations	500
Number random steps	20
Nr. of workers (parameter)	20
Main process memory (GB)	8
Worker memory (GB)	10

Job Summary per Generation Node

Generation Node	Total	FAILED	RUNNING
SOBOL	5	3	2

Experiment parameters

Name	Type	Lower bound	Upper bound	Values	Value type	Log Scale?
epochs	range	10	200		int	No
lr	range	1e-05	0.1		float	No
batch_size	range	8	2048		int	No
hidden_size	range	8	2048		int	No
dropout	range	0	0.5		float	No
activation	fixed			leaky_relu
num_dense_layers	range	1	4		int	No
init	fixed			normal
weight_decay	range	0	1		float	No

Number of evaluations

Failed	Succeeded	Running	Total
3	0	2	5

Result names and types

name	min/max
VAL_ACC	max

Last progressbar status

2025-07-31 15:49:22: Sobol, failed: 3 ('VAL_ACC: <FLOAT>' not found), completed 1∑1 (0%/20), requested 1 jobs, got 1, 29.45 s/job
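
The failure reason `'VAL_ACC: <FLOAT>' not found` suggests that each job is expected to print a line of the form `VAL_ACC: <FLOAT>` to stdout. The following is a minimal sketch of such a check, assuming a regex-based parser; the pattern, the function name `parse_result` and the sample value `0.9731` are illustrative assumptions and are not taken from OmniOpt2's own code or from this run.

```python
import re
from typing import Optional

# Hypothetical pattern: the result name, a colon, then a float
# (optionally in scientific notation) alone on one line.
RESULT_PATTERN = re.compile(
    r"^\s*VAL_ACC:\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$",
    re.MULTILINE,
)

def parse_result(stdout: str) -> Optional[float]:
    """Return the reported VAL_ACC, or None if no matching line exists."""
    match = RESULT_PATTERN.search(stdout)
    return float(match.group(1)) if match else None

# Made-up stdout of a job that finished training and reported a result.
ok_stdout = "Using device: cuda\nVAL_ACC: 0.9731\n"
# Stdout of a job that crashed before reporting, as in the failed trials here.
failed_stdout = "Using device: cuda\nAn error occurred: CUDA error: out of memory\n"

print(parse_result(ok_stdout))
print(parse_result(failed_stdout))
```

A job that exits before printing the result line yields no value under this sketch, which matches the three failed trials reported above.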

Git-Version

Commit: f9547a580b93e0983ebff52a5b7750569294ad57

trial_index,submit_time,queue_time,start_time,end_time,run_time,program_string,VAL_ACC,exit_code,signal,hostname,OO_Info_SLURM_JOB_ID,arm_name,trial_status,generation_node,epochs,lr,batch_size,hidden_size,dropout,num_dense_layers,weight_decay,activation,init
0,1753969574,30,1753969604,1753969640,36,python3 .tests/mnist/train --epochs 102 --learning_rate 0.08048484589278698254 --batch_size 232 --hidden_size 735 --dropout 0.04738559573888778687 --activation leaky_relu --num_dense_layers 3 --init normal --weight_decay 0.03413222730159759521,,1,,c137,530989,0_0,FAILED,SOBOL,102,0.080484845892786982535227480184,232,735,0.047385595738887786865234375,3,0.03413222730159759521484375,leaky_relu,normal
1,1753969603,53,1753969656,1753969692,36,python3 .tests/mnist/train --epochs 179 --learning_rate 0.04291434645502828493 --batch_size 1854 --hidden_size 1896 --dropout 0.36651515169069170952 --activation leaky_relu --num_dense_layers 1 --init normal --weight_decay 0.7658537672832608223,,1,,c131,530990,1_0,FAILED,SOBOL,179,0.042914346455028284932353699332,1854,1896,0.366515151690691709518432617188,1,0.765853767283260822296142578125,leaky_relu,normal
2,1753969634,30,1753969664,1753969694,30,python3 .tests/mnist/train --epochs 127 --learning_rate 0.0669408453206624815 --batch_size 874 --hidden_size 481 --dropout 0.42825613263994455338 --activation leaky_relu --num_dense_layers 2 --init normal --weight_decay 0.25958294328302145004,,1,,c137,530991,2_0,FAILED,SOBOL,127,0.066940845320662481499063289903,874,481,0.428256132639944553375244140625,2,0.259582943283021450042724609375,leaky_relu,normal
3,,,,,,,,,,,,3_0,RUNNING,SOBOL,15,0.003351246024295687856581205111,1213,1131,0.235605723690241575241088867188,4,0.5577103309333324432373046875,leaky_relu,normal
4,,,,,,,,,,,,4_0,RUNNING,SOBOL,44,0.058332434732653211384434399633,1589,1326,0.16976323537528514862060546875,3,0.471416492946445941925048828125,leaky_relu,normal
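
The per-node job summary can be recomputed from the results table above. This is a minimal sketch using only the standard library; the CSV text is a trimmed copy of the table (three of its columns), and the function name `count_by_node_and_status` is an illustrative choice.

```python
import csv
import io
from collections import Counter

# Trimmed copy of the results table: only the columns needed for counting.
RESULTS_CSV = """trial_index,trial_status,generation_node
0,FAILED,SOBOL
1,FAILED,SOBOL
2,FAILED,SOBOL
3,RUNNING,SOBOL
4,RUNNING,SOBOL
"""

def count_by_node_and_status(csv_text):
    """Count trials per (generation_node, trial_status) pair."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[(row["generation_node"], row["trial_status"])] += 1
    return counts

counts = count_by_node_and_status(RESULTS_CSV)
for (node, status), n in sorted(counts.items()):
    print(node, status, n)
```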

To cancel, press CTRL c, then run 'scancel 530984'
Run-UUID: ec8e236d-54e4-4390-95ce-7c2cfb5e2b41
  _________________________________________________
 /                                                 \
| OmniOpt2 - Tuning so deep, even Lovecraft would b |
| e scared!                                         |
 \                                                 /
  =================================================
                                                 \
                                                  \
                                                    ^__^
                                                    (oo)\_______
                                                    (__)\       )\/\
                                                        ||----w |
                                                        ||     ||
⠋ Writing worker creation log...
omniopt --partition=alpha --experiment_name=mnist_gpu_noall --mem_gb=10 --time=2880 --worker_timeout=120 --max_eval=500 --num_parallel_jobs=20 --gpus=1 --num_random_steps=20 --follow --live_share --send_anonymized_usage_stats --result_names VAL_ACC=max --run_program=cHl0aG9uMyAudGVzdHMvbW5pc3QvdHJhaW4gLS1lcG9jaHMgJWVwb2NocyAtLWxlYXJuaW5nX3JhdGUgJWxyIC0tYmF0Y2hfc2l6ZSAlYmF0Y2hfc2l6ZSAtLWhpZGRlbl9zaXplICVoaWRkZW5fc2l6ZSAtLWRyb3BvdXQgJWRyb3BvdXQgLS1hY3RpdmF0aW9uICVhY3RpdmF0aW9uIC0tbnVtX2RlbnNlX2xheWVycyAlbnVtX2RlbnNlX2xheWVycyAtLWluaXQgJWluaXQgLS13ZWlnaHRfZGVjYXkgJXdlaWdodF9kZWNheQ== --cpus_per_task=1 --nodes_per_job=1 --revert_to_random_when_seemingly_exhausted --model=BOTORCH_MODULAR --n_estimators_randomforest=100 --run_mode=local --occ_type=euclid --main_process_gb=8 --max_nr_of_zero_results=50 --slurm_signal_delay_s=0 --max_failed_jobs=0 --max_attempts_for_generation=20 --num_restarts=20 --raw_samples=1024 --max_abandoned_retrial=20 --max_num_of_parallel_sruns=16 --parameter epochs range 10 200 int false --parameter lr range 0.00001 0.1 float false --parameter batch_size range 8 2048 int false --parameter hidden_size range 8 2048 int false --parameter dropout range 0 0.5 float false --parameter activation fixed leaky_relu --parameter num_dense_layers range 1 4 int false --parameter init fixed normal --parameter weight_decay range 0 1 float false --ui_url 
aHR0cHM6Ly9pbWFnZXNlZy5zY2Fkcy5kZS9vbW5pYXgvZ3VpP3BhcnRpdGlvbj1hbHBoYSZleHBlcmltZW50X25hbWU9bW5pc3RfZ3B1X25vYWxsJnJlc2VydmF0aW9uPSZhY2NvdW50PSZtZW1fZ2I9MTAmdGltZT0yODgwJndvcmtlcl90aW1lb3V0PTEyMCZtYXhfZXZhbD01MDAmbnVtX3BhcmFsbGVsX2pvYnM9MjAmZ3B1cz0xJm51bV9yYW5kb21fc3RlcHM9MjAmZm9sbG93PTEmbGl2ZV9zaGFyZT0xJnNlbmRfYW5vbnltaXplZF91c2FnZV9zdGF0cz0xJmNvbnN0cmFpbnRzPSZyZXN1bHRfbmFtZXM9VkFMX0FDQyUzRG1heCZydW5fcHJvZ3JhbT1weXRob24zJTIwLnRlc3RzJTJGbW5pc3QlMkZ0cmFpbiUyMC0tZXBvY2hzJTIwJTI1ZXBvY2hzJTIwLS1sZWFybmluZ19yYXRlJTIwJTI1bHIlMjAtLWJhdGNoX3NpemUlMjAlMjViYXRjaF9zaXplJTIwLS1oaWRkZW5fc2l6ZSUyMCUyNWhpZGRlbl9zaXplJTIwLS1kcm9wb3V0JTIwJTI1ZHJvcG91dCUyMC0tYWN0aXZhdGlvbiUyMCUyNWFjdGl2YXRpb24lMjAtLW51bV9kZW5zZV9sYXllcnMlMjAlMjVudW1fZGVuc2VfbGF5ZXJzJTIwLS1pbml0JTIwJTI1aW5pdCUyMC0td2VpZ2h0X2RlY2F5JTIwJTI1d2VpZ2h0X2RlY2F5JmNwdXNfcGVyX3Rhc2s9MSZub2Rlc19wZXJfam9iPTEmc2VlZD0mZHJ5cnVuPTAmZGVidWc9MCZyZXZlcnRfdG9fcmFuZG9tX3doZW5fc2VlbWluZ2x5X2V4aGF1c3RlZD0xJmdyaWRzZWFyY2g9MCZtb2RlbD1CT1RPUkNIX01PRFVMQVImZXh0ZXJuYWxfZ2VuZXJhdG9yPSZuX2VzdGltYXRvcnNfcmFuZG9tZm9yZXN0PTEwMCZpbnN0YWxsYXRpb25fbWV0aG9kPWNsb25lJnJ1bl9tb2RlPWxvY2FsJmRpc2FibGVfdHFkbT0wJnZlcmJvc2VfdHFkbT0wJmZvcmNlX2xvY2FsX2V4ZWN1dGlvbj0wJmF1dG9fZXhjbHVkZV9kZWZlY3RpdmVfaG9zdHM9MCZzaG93X3NpeGVsX2dlbmVyYWw9MCZzaG93X3NpeGVsX3RyaWFsX2luZGV4X3Jlc3VsdD0wJnNob3dfc2l4ZWxfc2NhdHRlcj0wJnNob3dfd29ya2VyX3BlcmNlbnRhZ2VfdGFibGVfYXRfZW5kPTAmb2NjPTAmb2NjX3R5cGU9ZXVjbGlkJm5vX3NsZWVwPTAmc2x1cm1fdXNlX3NydW49MCZ2ZXJib3NlX2JyZWFrX3J1bl9zZWFyY2hfdGFibGU9MCZhYmJyZXZpYXRlX2pvYl9uYW1lcz0wJm1haW5fcHJvY2Vzc19nYj04Jm1heF9ucl9vZl96ZXJvX3Jlc3VsdHM9NTAmc2x1cm1fc2lnbmFsX2RlbGF5X3M9MCZtYXhfZmFpbGVkX2pvYnM9MCZleGNsdWRlPSZ1c2VybmFtZT0mZ2VuZXJhdGlvbl9zdHJhdGVneT0mcm9vdF92ZW52X2Rpcj0md29ya2Rpcj0mZG9udF9qaXRfY29tcGlsZT0wJmZpdF9vdXRfb2ZfZGVzaWduPTAmcmVmaXRfb25fY3Y9MCZzaG93X2dlbmVyYXRlX3RpbWVfdGFibGU9MCZkb250X3dhcm1fc3RhcnRfcmVmaXR0aW5nPTAmbWF4X2F0dGVtcHRzX2Zvcl9nZW5lcmF0aW9uPTIwJm51bV9yZXN0YXJ0cz0yMCZyYXdfc2FtcGxlcz0xMDI0Jm1heF9hYmFuZG9uZWRfcmV0cmlhbD0yMCZtYXhfbnVtX29mX3BhcmFs
bGVsX3NydW5zPTE2JmZvcmNlX2Nob2ljZV9mb3JfcmFuZ2VzPTAmbm9fdHJhbnNmb3JtX2lucHV0cz0wJmZpdF9hYmFuZG9uZWQ9MCZub19ub3JtYWxpemVfeT0wJnZlcmJvc2U9MCZnZW5lcmF0ZV9hbGxfam9ic19hdF9vbmNlPTAmZmxhbWVfZ3JhcGg9MCZjaGVja291dF90b19sYXRlc3RfdGVzdGVkX3ZlcnNpb249MCZwYXJhbWV0ZXJfMF9uYW1lPWVwb2NocyZwYXJhbWV0ZXJfMF90eXBlPXJhbmdlJnBhcmFtZXRlcl8wX21pbj0xMCZwYXJhbWV0ZXJfMF9tYXg9MjAwJnBhcmFtZXRlcl8wX251bWJlcl90eXBlPWludCZwYXJhbWV0ZXJfMF9sb2dfc2NhbGU9ZmFsc2UmcGFyYW1ldGVyXzFfbmFtZT1sciZwYXJhbWV0ZXJfMV90eXBlPXJhbmdlJnBhcmFtZXRlcl8xX21pbj0wLjAwMDAxJnBhcmFtZXRlcl8xX21heD0wLjEmcGFyYW1ldGVyXzFfbnVtYmVyX3R5cGU9ZmxvYXQmcGFyYW1ldGVyXzFfbG9nX3NjYWxlPWZhbHNlJnBhcmFtZXRlcl8yX25hbWU9YmF0Y2hfc2l6ZSZwYXJhbWV0ZXJfMl90eXBlPXJhbmdlJnBhcmFtZXRlcl8yX21pbj04JnBhcmFtZXRlcl8yX21heD0yMDQ4JnBhcmFtZXRlcl8yX251bWJlcl90eXBlPWludCZwYXJhbWV0ZXJfMl9sb2dfc2NhbGU9ZmFsc2UmcGFyYW1ldGVyXzNfbmFtZT1oaWRkZW5fc2l6ZSZwYXJhbWV0ZXJfM190eXBlPXJhbmdlJnBhcmFtZXRlcl8zX21pbj04JnBhcmFtZXRlcl8zX21heD0yMDQ4JnBhcmFtZXRlcl8zX251bWJlcl90eXBlPWludCZwYXJhbWV0ZXJfM19sb2dfc2NhbGU9ZmFsc2UmcGFyYW1ldGVyXzRfbmFtZT1kcm9wb3V0JnBhcmFtZXRlcl80X3R5cGU9cmFuZ2UmcGFyYW1ldGVyXzRfbWluPTAmcGFyYW1ldGVyXzRfbWF4PTAuNSZwYXJhbWV0ZXJfNF9udW1iZXJfdHlwZT1mbG9hdCZwYXJhbWV0ZXJfNF9sb2dfc2NhbGU9ZmFsc2UmcGFyYW1ldGVyXzVfbmFtZT1hY3RpdmF0aW9uJnBhcmFtZXRlcl81X3R5cGU9Zml4ZWQmcGFyYW1ldGVyXzVfdmFsdWU9bGVha3lfcmVsdSZwYXJhbWV0ZXJfNl9uYW1lPW51bV9kZW5zZV9sYXllcnMmcGFyYW1ldGVyXzZfdHlwZT1yYW5nZSZwYXJhbWV0ZXJfNl9taW49MSZwYXJhbWV0ZXJfNl9tYXg9NCZwYXJhbWV0ZXJfNl9udW1iZXJfdHlwZT1pbnQmcGFyYW1ldGVyXzZfbG9nX3NjYWxlPWZhbHNlJnBhcmFtZXRlcl83X25hbWU9aW5pdCZwYXJhbWV0ZXJfN190eXBlPWZpeGVkJnBhcmFtZXRlcl83X3ZhbHVlPW5vcm1hbCZwYXJhbWV0ZXJfOF9uYW1lPXdlaWdodF9kZWNheSZwYXJhbWV0ZXJfOF90eXBlPXJhbmdlJnBhcmFtZXRlcl84X21pbj0wJnBhcmFtZXRlcl84X21heD0xJnBhcmFtZXRlcl84X251bWJlcl90eXBlPWZsb2F0JnBhcmFtZXRlcl84X2xvZ19zY2FsZT1mYWxzZSZwYXJ0aXRpb249YWxwaGEmbnVtX3BhcmFtZXRlcnM9OQ==
⠋ Disabling logging...
⠋ Setting run folder...
⠋ Creating folder /data/cat/ws/pwinkler-mnist_tst/omniopt/runs/mnist_gpu_noall/0...
⠋ Writing revert_to_random_when_seemingly_exhausted file ...
⠋ Writing username state file...
⠋ Writing result names file...
⠋ Writing result min/max file...
⠋ Saving state files...
Run-folder: /data/cat/ws/pwinkler-mnist_tst/omniopt/runs/mnist_gpu_noall/0
⠋ Printing run info...
⠋ Initializing NVIDIA-Logs...
⠋ Writing ui_url file if it is present...
⠋ Writing live_share file if it is present...
⠋ Writing job_start_time file...
⠹ Writing git info file...
⠋ Checking max_eval...
⠋ Calculating number of steps...
⠋ Adding excluded nodes...
⠋ Handling random steps...
⠋ Initializing ax_client...
[WARNING 07-31 15:45:26] ax.service.ax_client: Selecting a GenerationStrategy when using BatchTrials is in beta. Double check the recommended strategy matches your expectations.
⠋ Setting orchestrator...
You have 1 CPUs available for the main process. Using CUDA device NVIDIA H100. Generation strategy: SOBOL for 20 steps and then BOTORCH_MODULAR for 480 steps.
Run-Program: python3 .tests/mnist/train --epochs %epochs --learning_rate %lr --batch_size %batch_size --hidden_size %hidden_size --dropout %dropout --activation %activation --num_dense_layers %num_dense_layers --init %init --weight_decay %weight_decay
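
The `%name` placeholders in the Run-Program line are replaced by each trial's parameter values to form the `program_string` column of the results table. The following sketch reproduces that substitution for trial 0; the regex-based approach and the function name `fill_placeholders` are assumptions for illustration, not OmniOpt2's actual implementation.

```python
import re

# Template copied from the Run-Program line above.
RUN_PROGRAM = (
    "python3 .tests/mnist/train --epochs %epochs --learning_rate %lr "
    "--batch_size %batch_size --hidden_size %hidden_size --dropout %dropout "
    "--activation %activation --num_dense_layers %num_dense_layers "
    "--init %init --weight_decay %weight_decay"
)

# Values of trial 0, kept as strings so the digits match the logged command.
TRIAL_0 = {
    "epochs": "102",
    "lr": "0.08048484589278698254",
    "batch_size": "232",
    "hidden_size": "735",
    "dropout": "0.04738559573888778687",
    "activation": "leaky_relu",
    "num_dense_layers": "3",
    "init": "normal",
    "weight_decay": "0.03413222730159759521",
}

def fill_placeholders(template, values):
    """Replace every %name token with values[name]."""
    # \w+ consumes the whole identifier, so a placeholder is never
    # mistaken for a shorter key that happens to be its prefix.
    return re.sub(r"%(\w+)", lambda m: values[m.group(1)], template)

command = fill_placeholders(RUN_PROGRAM, TRIAL_0)
print(command)
```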
                                  Experiment parameters                                   
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ Name             ┃ Type  ┃ Lower bound ┃ Upper bound ┃ Values     ┃ Type  ┃ Log Scale? ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ epochs           │ range │ 10          │ 200         │            │ int   │ No         │
│ lr               │ range │ 1e-05       │ 0.1         │            │ float │ No         │
│ batch_size       │ range │ 8           │ 2048        │            │ int   │ No         │
│ hidden_size      │ range │ 8           │ 2048        │            │ int   │ No         │
│ dropout          │ range │ 0           │ 0.5         │            │ float │ No         │
│ activation       │ fixed │             │             │ leaky_relu │       │            │
│ num_dense_layers │ range │ 1           │ 4           │            │ int   │ No         │
│ init             │ fixed │             │             │ normal     │       │            │
│ weight_decay     │ range │ 0           │ 1           │            │ float │ No         │
└──────────────────┴───────┴─────────────┴─────────────┴────────────┴───────┴────────────┘
        Result-Names         
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Result-Name ┃ Min or max? ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ VAL_ACC     │         max │
└─────────────┴─────────────┘
See https://imageseg.scads.de/omniax/share?user_id=pwinkler&experiment_name=mnist_gpu_noall&run_nr=7 for live-results.

[QR code rendered in the terminal]
Sobol, failed: 3 ('VAL_ACC: <FLOAT>' not found), running 1∑1 (5%/20), getting new HP set:   0%|░░░░░░░░░░| 0/500 [03:21
Sobol, failed: 3 ('VAL_ACC: <FLOAT>' not found), completed 1∑1 (0%/20), requested 1 jobs, got 1, 29.45 s/job:   0%|░░░░░░░░░░| 0/500 [03:51

2025-07-31 15:45:31: SOBOL, Started OmniOpt2 run...
2025-07-31 15:45:49: Sobol, getting new HP set     
2025-07-31 15:45:59: Sobol, requested 1 jobs, got 1, 10.49 s/job
2025-07-31 15:46:04: Sobol, eval #1/1 start                     
2025-07-31 15:46:09: Sobol, starting new job                    
2025-07-31 15:46:15: Sobol, unknown 1∑1 (5%/20), started new job
2025-07-31 15:46:20: Sobol, pending 1∑1 (5%/20), getting new HP set
2025-07-31 15:46:29: Sobol, pending 1∑1 (5%/20), requested 1 jobs, got 1, 9.47 s/job
2025-07-31 15:46:34: Sobol, pending 1∑1 (5%/20), eval #1/1 start                    
2025-07-31 15:46:38: Sobol, pending 1∑1 (5%/20), starting new job                   
2025-07-31 15:46:44: Sobol, running/unknown 1/1∑2 (10%/20), started new job         
2025-07-31 15:46:49: Sobol, running/pending 1/1∑2 (10%/20), getting new HP set      
2025-07-31 15:46:58: Sobol, running/pending 1/1∑2 (10%/20), requested 1 jobs, got 1, 9.50 s/job
2025-07-31 15:47:03: Sobol, running/pending 1/1∑2 (10%/20), eval #1/1 start                    
2025-07-31 15:47:10: Sobol, running/pending 1/1∑2 (10%/20), starting new job                   
2025-07-31 15:47:15: Sobol, running/unknown 2/1∑3 (15%/20), started new job                    
2025-07-31 15:47:38: Sobol, completed/running/pending 1/1/1∑3 (10%/20), getting new HP set     
2025-07-31 15:47:59: Sobol, completed/running/pending 1/1/1∑3 (10%/20), requested 1 jobs, got 1, 38.44 s/job
2025-07-31 15:48:04: Sobol, completed/running 1/2∑3 (10%/20), eval #1/1 start                               
2025-07-31 15:48:09: Sobol, completed/running 1/2∑3 (10%/20), starting new job                              
2025-07-31 15:48:16: Sobol, completed/unknown 3/1∑4 (5%/20), started new job                                
2025-07-31 15:48:26: Sobol, completed/pending 3/1∑4 (5%/20), job_failed                                     
2025-07-31 15:48:26: Sobol, completed/pending 3/1∑4 (5%/20), job_failed                                     
2025-07-31 15:48:26: Sobol, completed/pending 3/1∑4 (5%/20), job_failed                                     
2025-07-31 15:48:46: Sobol, failed: 3 ('VAL_ACC: <FLOAT>' not found), running 1∑1 (5%/20), finishing jobs (_get_next_trials), finished 3 jobs
2025-07-31 15:48:53: Sobol, failed: 3 ('VAL_ACC: <FLOAT>' not found), running 1∑1 (5%/20), getting new HP set                                
2025-07-31 15:49:22: Sobol, failed: 3 ('VAL_ACC: <FLOAT>' not found), completed 1∑1 (0%/20), requested 1 jobs, got 1, 29.45 s/job

Arguments Overview

Key	Value
config_yaml	None
config_toml	None
config_json	None
num_random_steps	20
max_eval	500
run_program	[['cHl0aG9uMyAudGVzdHMvbW5pc3QvdHJhaW4gLS1lcG9jaHMgJWVwb2NocyAtLWxlYXJuaW5nX3JhdGUgJWxyIC0tYmF0Y2hfc2l6ZSAlYmF0Y2hfc2l6ZSAtLWhpZGRlbl9zaXplICVoaWRkZW5f…
experiment_name	mnist_gpu_noall
mem_gb	10
parameter	[['epochs', 'range', '10', '200', 'int', 'false'], ['lr', 'range', '0.00001', '0.1', 'float', 'false'], ['batch_size', 'range', '8', '2048', 'int',
	'false'], ['hidden_size', 'range', '8', '2048', 'int', 'false'], ['dropout', 'range', '0', '0.5', 'float', 'false'], ['activation', 'fixed',
	'leaky_relu'], ['num_dense_layers', 'range', '1', '4', 'int', 'false'], ['init', 'fixed', 'normal'], ['weight_decay', 'range', '0', '1', 'float',
	'false']]
continue_previous_job	None
experiment_constraints	None
run_dir	runs
seed	None
verbose_tqdm	False
model	BOTORCH_MODULAR
gridsearch	False
occ	False
show_sixel_scatter	False
show_sixel_general	False
show_sixel_trial_index_result	False
follow	True
send_anonymized_usage_stats	True
ui_url	aHR0cHM6Ly9pbWFnZXNlZy5zY2Fkcy5kZS9vbW5pYXgvZ3VpP3BhcnRpdGlvbj1hbHBoYSZleHBlcmltZW50X25hbWU9bW5pc3RfZ3B1X25vYWxsJnJlc2VydmF0aW9uPSZhY2NvdW50PSZtZW1fZ2I…
root_venv_dir	/home/pwinkler
exclude	None
main_process_gb	8
max_nr_of_zero_results	50
abbreviate_job_names	False
orchestrator_file	None
checkout_to_latest_tested_version	False
live_share	True
disable_tqdm	False
disable_previous_job_constraint	False
workdir
occ_type	euclid
result_names	['VAL_ACC=max']
minkowski_p	2
signed_weighted_euclidean_weights
generation_strategy	None
generate_all_jobs_at_once	False
revert_to_random_when_seemingly_exhausted	True
load_data_from_existing_jobs	[]
n_estimators_randomforest	100
max_attempts_for_generation	20
external_generator	None
username	None
max_failed_jobs	0
num_cpus_main_job	None
calculate_pareto_front_of_job	[]
show_generate_time_table	False
force_choice_for_ranges	False
max_abandoned_retrial	20
share_password	None
dryrun	False
db_url	None
run_program_once	None
dont_warm_start_refitting	False
refit_on_cv	False
fit_out_of_design	False
fit_abandoned	False
dont_jit_compile	False
num_restarts	20
raw_samples	1024
max_num_of_parallel_sruns	16
no_transform_inputs	False
no_normalize_y	False
transforms	[]
num_parallel_jobs	20
worker_timeout	120
slurm_use_srun	False
time	2880
partition	alpha
reservation	None
force_local_execution	False
slurm_signal_delay_s	0
nodes_per_job	1
cpus_per_task	1
account	None
gpus	1
run_mode	local
verbose	False
verbose_break_run_search_table	False
debug	False
flame_graph	False
no_sleep	False
tests	False
show_worker_percentage_table_at_end	False
auto_exclude_defective_hosts	False
run_tests_that_fail_on_taurus	False
raise_in_eval	False
show_ram_every_n_seconds	0
show_generation_and_submission_sixel	False
just_return_defaults	False
prettyprint	False
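
The `run_program` value in the table above is Base64-encoded. As a small sketch, the visible beginning of that string can be decoded with the standard library. Only the first 36 characters are used here, because the table truncates the value; the constant name `RUN_PROGRAM_B64_PREFIX` is an illustrative choice.

```python
import base64

# First 36 characters of the run_program value shown above.
# Base64 decodes in groups of 4 characters, so a prefix whose
# length is a multiple of 4 can be decoded on its own.
RUN_PROGRAM_B64_PREFIX = "cHl0aG9uMyAudGVzdHMvbW5pc3QvdHJhaW4g"

decoded_prefix = base64.b64decode(RUN_PROGRAM_B64_PREFIX).decode("utf-8")
print(repr(decoded_prefix))
```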

1753969531.7190268,20,0,0
1753969535.7268255,20,0,0
1753969535.8578537,20,0,0

timestamp,ram_usage_mb,cpu_usage_percent
1753969527,709.98046875,8.1
1753969531,710.98046875,9.7
1753969535,710.98046875,6.8
1753969535,710.98046875,9.1
1753969535,710.98046875,7.1
1753969535,711.48046875,16.5
1753969535,711.48046875,38.5
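
The resource samples above can be summarised with a few lines of standard-library Python. This sketch copies the rows as they appear and assumes that the `timestamp` column holds Unix seconds; the variable names are illustrative.

```python
import csv
import io
from datetime import datetime, timezone

# Copy of the main-process usage samples listed above.
USAGE_CSV = """timestamp,ram_usage_mb,cpu_usage_percent
1753969527,709.98046875,8.1
1753969531,710.98046875,9.7
1753969535,710.98046875,6.8
1753969535,710.98046875,9.1
1753969535,710.98046875,7.1
1753969535,711.48046875,16.5
1753969535,711.48046875,38.5
"""

rows = list(csv.DictReader(io.StringIO(USAGE_CSV)))

# Peak values over all samples.
peak_ram_mb = max(float(r["ram_usage_mb"]) for r in rows)
peak_cpu_percent = max(float(r["cpu_usage_percent"]) for r in rows)

# First sample time, converted from Unix seconds to UTC.
first_sample_utc = datetime.fromtimestamp(int(rows[0]["timestamp"]), tz=timezone.utc)

print(peak_ram_mb, peak_cpu_percent, first_sample_utc.isoformat())
```

The peak of about 711 MB is well below the 8 GB configured for the main process.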

submitit INFO (2025-07-31 15:46:39,691) - Starting with JobEnvironment(job_id=530989, hostname=c137, local_rank=0(1), node=0(1), global_rank=0(1))
submitit INFO (2025-07-31 15:46:39,691) - Loading pickle: /data/cat/ws/pwinkler-mnist_tst/omniopt/runs/mnist_gpu_noall/0/single_runs/530989/530989_submitted.pkl
Traceback (most recent call last):
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/train", line 301, in main
    ).to(args.device)
      ^^^^^^^^^^^^^^^
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 942, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1341, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Parameters: {"epochs": 102, "lr": 0.08048484589278698, "batch_size": 232, "hidden_size": 735, "dropout": 0.04738559573888779, "num_dense_layers": 3, "weight_decay": 0.034132227301597595, "activation": "leaky_relu", "init": "normal"}
Debug-Infos: 
========
DEBUG INFOS START:
Program-Code: python3 .tests/mnist/train --epochs 102 --learning_rate 0.08048484589278698254 --batch_size 232 --hidden_size 735 --dropout 0.04738559573888778687 --activation leaky_relu --num_dense_layers 3 --init normal --weight_decay 0.03413222730159759521
pwd: /data/cat/ws/pwinkler-mnist_tst/omniopt
File: .tests/mnist/train
UID: 2054851
GID: 200270
SLURM_JOB_ID: 530989
Status-Change-Time: 1753967537.933081
Size: 12760 Bytes
Permissions: -rwxr-xr-x
Owner: pwinkler
Last access: 1753968020.9583523
Last modification: 1753967537.933081
Hostname: c137
========
DEBUG INFOS END

python3 .tests/mnist/train --epochs 102 --learning_rate 0.08048484589278698254 --batch_size 232 --hidden_size 735 --dropout 0.04738559573888778687 --activation leaky_relu --num_dense_layers 3 --init normal --weight_decay 0.03413222730159759521
stdout:
 Available GPU memory: 0.00 MB reserved
Free GPU memory: 0.00 MB allocated
Max GPU memory allocated: 0.00 MB
              Hyperparameters              
╭──────────────────┬──────────────────────╮
│ Parameter        │ Value                │
├──────────────────┼──────────────────────┤
│ Device           │ cuda                 │
│ Epochs           │ 102                  │
│ Num Dense Layers │ 3                    │
│ Batch size       │ 232                  │
│ Learning rate    │ 0.08048484589278698  │
│ Hidden size      │ 735                  │
│ Dropout          │ 0.04738559573888779  │
│ Optimizer        │ adam                 │
│ Momentum         │ 0.9                  │
│ Weight Decay     │ 0.034132227301597595 │
│ Activation       │ leaky_relu           │
│ Init Method      │ normal               │
│ Seed             │ None                 │
╰──────────────────┴──────────────────────╯
Using device: cuda
An error occurred: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so 
the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


stderr:
 Traceback (most recent call last):
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/train", line 301, in main
    ).to(args.device)
      ^^^^^^^^^^^^^^^
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 942, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/data/cat/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1341, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Result: {'VAL_ACC': None}
Final-results: {'VAL_ACC': None}
EXIT_CODE: 1
submitit INFO (2025-07-31 15:47:20,860) - Job completed successfully
submitit INFO (2025-07-31 15:47:20,863) - Exiting after successful completion