OmniOpt2

Experiment overview

Setting	Value
Model for non-random steps	BOTORCH_MODULAR
Max. number of evaluations	500
Number of random steps	20
Number of workers (parameter)	20
Main process memory (GB)	8
Worker memory (GB)	10

Job Summary per Generation Node

Generation Node	Total	FAILED	RUNNING
SOBOL	4	3	1

Experiment parameters

Name	Type	Lower bound	Upper bound	Values	Number type	Log Scale?
epochs	range	10	200		int	No
lr	range	0.0001	0.1		float	No
batch_size	range	8	2048		int	No
hidden_size	range	8	4096		int	No
dropout	range	0	0.5		float	No
activation	fixed			leaky_relu
num_dense_layers	range	1	4		int	No
init	fixed			normal
weight_decay	range	0	1		float	No

Number of evaluations

Failed	Succeeded	Running	Total
3	0	1	4

Result names and types

name	min/max
VAL_ACC	max

Last progressbar status

2025-07-28 13:12:58: Sobol, failed: 3 ('VAL_ACC: <FLOAT>' not found), cancelled by 2054851 1∑1 (0%/20), finishing jobs (_get_next_trials), finished 1 job
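
The message 'VAL_ACC: <FLOAT>' not found suggests that each job is expected to print a line of the form "VAL_ACC: 0.9731" to stdout. The following is a minimal sketch of such a check, not OmniOpt2's actual parser: the function name extract_result and the exact pattern are assumptions made for illustration.

```python
import re
from typing import Optional


def extract_result(stdout: str, name: str = "VAL_ACC") -> Optional[float]:
    """Return the last '<name>: <float>' value printed in stdout, or None.

    Hypothetical helper: the pattern OmniOpt2 really uses may differ.
    """
    pattern = re.compile(
        rf"^\s*{re.escape(name)}:\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$",
        re.MULTILINE,
    )
    matches = pattern.findall(stdout)
    # Use the last occurrence so intermediate prints do not shadow the final value.
    return float(matches[-1]) if matches else None


# A job that crashes before printing a result yields None,
# which corresponds to "Result: {'VAL_ACC': None}" in the job log.
crashed = "Using device: cuda\nAn error occurred: CUDA error: out of memory\n"
print(extract_result(crashed))

# A job that finishes training and prints its metric yields a float.
finished = "Using device: cuda\nVAL_ACC: 0.9731\n"
print(extract_result(finished))
```

Under this assumption, all three failed trials are explained by the training script exiting before it ever reaches the line that prints VAL_ACC.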

Git-Version

Commit: 763ea96fbc9b7fc932e55190ab5e45707b9113f6 (7747)

trial_index,submit_time,queue_time,start_time,end_time,run_time,program_string,VAL_ACC,exit_code,signal,hostname,OO_Info_SLURM_JOB_ID,arm_name,trial_status,generation_node,epochs,lr,batch_size,hidden_size,dropout,num_dense_layers,weight_decay,activation,init
0,1753701037,9,1753701046,1753701058,12,python3 .tests/mnist/train --epochs 152 --learning_rate 0.08946310024261475147 --batch_size 745 --hidden_size 60 --dropout 0.31263169646263122559 --activation leaky_relu --num_dense_layers 2 --init normal --weight_decay 0.92956846952438354492,,1,,c137,525762,0_0,FAILED,SOBOL,152,0.089463100242614751467229439186,745,60,0.3126316964626312255859375,2,0.929568469524383544921875,leaky_relu,normal
1,1753701067,8,1753701075,1753701088,13,python3 .tests/mnist/train --epochs 45 --learning_rate 0.0142388096586801103 --batch_size 1540 --hidden_size 3392 --dropout 0.02955625485628843307 --activation leaky_relu --num_dense_layers 4 --init normal --weight_decay 0.20209008455276489258,,1,,c137,525763,1_0,FAILED,SOBOL,45,0.014238809658680110295514431584,1540,3392,0.029556254856288433074951171875,4,0.202090084552764892578125,leaky_relu,normal
2,1753701105,31,1753701136,1753701148,12,python3 .tests/mnist/train --epochs 105 --learning_rate 0.06659040905935689758 --batch_size 309 --hidden_size 1289 --dropout 0.24355002585798501968 --activation leaky_relu --num_dense_layers 3 --init normal --weight_decay 0.57932628598064184189,,1,,c137,525764,2_0,FAILED,SOBOL,105,0.066590409059356897580883583032,309,1289,0.243550025857985019683837890625,3,0.579326285980641841888427734375,leaky_relu,normal
3,,,,,,,,,,,,3_0,RUNNING,SOBOL,187,0.04171815225835889817673773905,1486,2062,0.413102468010038137435913085938,1,0.289224959909915924072265625,leaky_relu,normal
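
The results CSV above can be summarized with the standard library. The sketch below counts trials per status and derives queue and run times from the Unix timestamps. SAMPLE is a reduced copy of the table (only the columns used here, with the values of trials 0 and 3), and the function name summarize is a hypothetical helper, not part of OmniOpt2.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Reduced excerpt of the results CSV: only the columns used below.
SAMPLE = """trial_index,submit_time,start_time,end_time,trial_status
0,1753701037,1753701046,1753701058,FAILED
3,,,,RUNNING
"""


def summarize(csv_text):
    """Count trials per status and compute timings for trials that were submitted."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row["trial_status"] for row in rows)
    timings = {}
    for row in rows:
        if not row["submit_time"]:
            continue  # trial has no timestamps yet (e.g. still RUNNING)
        submit, start, end = (
            int(row[key]) for key in ("submit_time", "start_time", "end_time")
        )
        timings[int(row["trial_index"])] = {
            "submitted_utc": datetime.fromtimestamp(submit, tz=timezone.utc).isoformat(),
            "queue_s": start - submit,
            "run_s": end - start,
        }
    return counts, timings


counts, timings = summarize(SAMPLE)
print(dict(counts))
print(timings[0])
```

For trial 0 this reproduces the queue_time of 9 s and the run_time of 12 s listed in the table. The UTC submit time is two hours behind the timestamps in the progress log, which are evidently local time.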

To cancel, press CTRL c, then run 'scancel 525759'
⠋ Importing logging...
⠋ Importing warnings...
⠋ Importing argparse...
⠋ Importing datetime...
⠋ Importing dataclass...
⠋ Importing hashlib...
⠋ Importing socket...
⠋ Importing stat...
⠋ Importing pwd...
⠋ Importing signal...
⠋ Importing base64...
⠋ Importing json...
⠋ Importing yaml...
⠋ Importing toml...
⠋ Importing csv...
⠋ Importing ast...
⠋ Importing rich.table...
⠋ Importing rich print...
⠋ Importing rich.pretty...
⠋ Importing rich.prompt...
⠋ Importing types.FunctionType...
⠋ Importing typing...
⠋ Importing ThreadPoolExecutor...
⠋ Importing submitit.LocalExecutor...
⠋ Importing submitit.Job...
⠋ Importing importlib.util...
⠋ Importing inspect...
⠋ Importing platform...
⠋ Importing inspect frame info...
⠋ Importing pathlib.Path...
⠋ Importing uuid...
⠋ Importing traceback...
⠋ Importing cowsay...
⠋ Importing psutil...
⠋ Importing shutil...
⠋ Importing itertools.combinations...
⠋ Importing os.listdir...
⠋ Importing os.path...
⠋ Importing PIL.Image...
⠋ Importing sixel...
⠋ Importing subprocess...
⠋ Importing tqdm...
⠴ Importing beartype...
⠋ Importing statistics...
⠋ Trying to import pyfiglet...
⠙ Importing helpers...
⠋ Parsing arguments...
⠙ Importing torch...
⠋ Importing numpy...
⠋ Importing collections...
⠦ Importing ax...
⠋ Importing ax.core.generator_run...
⠋ Importing Cont_X_trans and Y_trans from ax.modelbridge.registry...
⠋ Importing ax.core.arm...
⠋ Importing ax.core.objective...
⠋ Importing ax.core.Metric...
⠋ Importing ax.exceptions.core...
⠋ Importing ax.exceptions.generation_strategy...
⠋ Importing CORE_DECODER_REGISTRY...
⠋ Trying ax.generation_strategy.generation_node...
⠋ Importing GenerationStep, GenerationStrategy from generation_strategy...
⠋ Importing GenerationNode from generation_node...
⠋ Importing ExternalGenerationNode...
⠋ Importing MaxTrials...
⠋ Importing GeneratorSpec...
⠋ Importing Models from ax.modelbridge.registry...
⠋ Importing get_pending_observation_features...
⠋ Importing load_experiment...
⠋ Importing save_experiment...
⠋ Importing save_experiment_to_db...
⠋ Importing TrialStatus...
⠋ Importing Data...
⠋ Importing Experiment...
⠋ Importing parameter types...
⠋ Importing TParameterization...
⠋ Importing pandas...
⠋ Importing AxClient and ObjectiveProperties...
⠋ Importing RandomForestRegressor...
⠋ Importing botorch...
⠋ Importing submitit...
⠋ Importing ax logger...
⠋ Importing SQL-Storage-Stuff...
Run-UUID: 53cd40e1-2ed2-4c38-8ffd-abefe35a71d2
 _____                 _ _____       _   _____ 
|  _  |               (_)  _  |     | | / __  \
| | | |_ __ ___  _ __  _| | | |_ __ | |_`' / /'
| | | | '_ ` _ \| '_ \| | | | | '_ \| __| / /  
\ \_/ / | | | | | | | | \ \_/ / |_) | |_./ /___
 \___/|_| |_| |_|_| |_|_|\___/| .__/ \__\_____/
                              | |              
                              |_|              

⠋ Writing worker creation log...
omniopt --partition=alpha --experiment_name=mnist_gpu_noall --mem_gb=10 --time=1440 --worker_timeout=120 --max_eval=500 --num_parallel_jobs=20 --gpus=1 --num_random_steps=20 --follow --live_share --send_anonymized_usage_stats --result_names VAL_ACC=max --run_program=cHl0aG9uMyAudGVzdHMvbW5pc3QvdHJhaW4gLS1lcG9jaHMgJWVwb2NocyAtLWxlYXJuaW5nX3JhdGUgJWxyIC0tYmF0Y2hfc2l6ZSAlYmF0Y2hfc2l6ZSAtLWhpZGRlbl9zaXplICVoaWRkZW5fc2l6ZSAtLWRyb3BvdXQgJWRyb3BvdXQgLS1hY3RpdmF0aW9uICVhY3RpdmF0aW9uIC0tbnVtX2RlbnNlX2xheWVycyAlbnVtX2RlbnNlX2xheWVycyAtLWluaXQgJWluaXQgLS13ZWlnaHRfZGVjYXkgJXdlaWdodF9kZWNheQ== --cpus_per_task=1 --nodes_per_job=1 --revert_to_random_when_seemingly_exhausted --model=BOTORCH_MODULAR --n_estimators_randomforest=100 --run_mode=local --occ_type=euclid --main_process_gb=8 --max_nr_of_zero_results=50 --slurm_signal_delay_s=0 --max_failed_jobs=0 --max_attempts_for_generation=20 --num_restarts=20 --raw_samples=1024 --max_abandoned_retrial=20 --max_num_of_parallel_sruns=16 --parameter epochs range 10 200 int false --parameter lr range 0.0001 0.1 float false --parameter batch_size range 8 2048 int false --parameter hidden_size range 8 4096 int false --parameter dropout range 0 0.5 float false --parameter activation fixed leaky_relu --parameter num_dense_layers range 1 4 int false --parameter init fixed normal --parameter weight_decay range 0 1 float false --ui_url 
aHR0cHM6Ly9pbWFnZXNlZy5zY2Fkcy5kZS9vbW5pYXgvZ3VpP3BhcnRpdGlvbj1hbHBoYSZleHBlcmltZW50X25hbWU9bW5pc3RfZ3B1X25vYWxsJnJlc2VydmF0aW9uPSZhY2NvdW50PSZtZW1fZ2I9MTAmdGltZT0xNDQwJndvcmtlcl90aW1lb3V0PTEyMCZtYXhfZXZhbD01MDAmbnVtX3BhcmFsbGVsX2pvYnM9MjAmZ3B1cz0xJm51bV9yYW5kb21fc3RlcHM9MjAmZm9sbG93PTEmbGl2ZV9zaGFyZT0xJnNlbmRfYW5vbnltaXplZF91c2FnZV9zdGF0cz0xJmNvbnN0cmFpbnRzPSZyZXN1bHRfbmFtZXM9VkFMX0FDQyUzRG1heCZydW5fcHJvZ3JhbT1weXRob24zJTIwLnRlc3RzJTJGbW5pc3QlMkZ0cmFpbiUyMC0tZXBvY2hzJTIwJTI1ZXBvY2hzJTIwLS1sZWFybmluZ19yYXRlJTIwJTI1bHIlMjAtLWJhdGNoX3NpemUlMjAlMjViYXRjaF9zaXplJTIwLS1oaWRkZW5fc2l6ZSUyMCUyNWhpZGRlbl9zaXplJTIwLS1kcm9wb3V0JTIwJTI1ZHJvcG91dCUyMC0tYWN0aXZhdGlvbiUyMCUyNWFjdGl2YXRpb24lMjAtLW51bV9kZW5zZV9sYXllcnMlMjAlMjVudW1fZGVuc2VfbGF5ZXJzJTIwLS1pbml0JTIwJTI1aW5pdCUyMC0td2VpZ2h0X2RlY2F5JTIwJTI1d2VpZ2h0X2RlY2F5JmNwdXNfcGVyX3Rhc2s9MSZub2Rlc19wZXJfam9iPTEmc2VlZD0mZHJ5cnVuPTAmZGVidWc9MCZyZXZlcnRfdG9fcmFuZG9tX3doZW5fc2VlbWluZ2x5X2V4aGF1c3RlZD0xJmdyaWRzZWFyY2g9MCZtb2RlbD1CT1RPUkNIX01PRFVMQVImZXh0ZXJuYWxfZ2VuZXJhdG9yPSZuX2VzdGltYXRvcnNfcmFuZG9tZm9yZXN0PTEwMCZpbnN0YWxsYXRpb25fbWV0aG9kPWNsb25lJnJ1bl9tb2RlPWxvY2FsJmRpc2FibGVfdHFkbT0wJnZlcmJvc2VfdHFkbT0wJmZvcmNlX2xvY2FsX2V4ZWN1dGlvbj0wJmF1dG9fZXhjbHVkZV9kZWZlY3RpdmVfaG9zdHM9MCZzaG93X3NpeGVsX2dlbmVyYWw9MCZzaG93X3NpeGVsX3RyaWFsX2luZGV4X3Jlc3VsdD0wJnNob3dfc2l4ZWxfc2NhdHRlcj0wJnNob3dfd29ya2VyX3BlcmNlbnRhZ2VfdGFibGVfYXRfZW5kPTAmb2NjPTAmb2NjX3R5cGU9ZXVjbGlkJm5vX3NsZWVwPTAmc2x1cm1fdXNlX3NydW49MCZ2ZXJib3NlX2JyZWFrX3J1bl9zZWFyY2hfdGFibGU9MCZhYmJyZXZpYXRlX2pvYl9uYW1lcz0wJm1haW5fcHJvY2Vzc19nYj04Jm1heF9ucl9vZl96ZXJvX3Jlc3VsdHM9NTAmc2x1cm1fc2lnbmFsX2RlbGF5X3M9MCZtYXhfZmFpbGVkX2pvYnM9MCZleGNsdWRlPSZ1c2VybmFtZT0mZ2VuZXJhdGlvbl9zdHJhdGVneT0mcm9vdF92ZW52X2Rpcj0md29ya2Rpcj0mZG9udF9qaXRfY29tcGlsZT0wJmZpdF9vdXRfb2ZfZGVzaWduPTAmcmVmaXRfb25fY3Y9MCZzaG93X2dlbmVyYXRlX3RpbWVfdGFibGU9MCZkb250X3dhcm1fc3RhcnRfcmVmaXR0aW5nPTAmbWF4X2F0dGVtcHRzX2Zvcl9nZW5lcmF0aW9uPTIwJm51bV9yZXN0YXJ0cz0yMCZyYXdfc2FtcGxlcz0xMDI0Jm1heF9hYmFuZG9uZWRfcmV0cmlhbD0yMCZtYXhfbnVtX29mX3BhcmFs
bGVsX3NydW5zPTE2JmZvcmNlX2Nob2ljZV9mb3JfcmFuZ2VzPTAmbm9fdHJhbnNmb3JtX2lucHV0cz0wJmZpdF9hYmFuZG9uZWQ9MCZub19ub3JtYWxpemVfeT0wJnZlcmJvc2U9MCZnZW5lcmF0ZV9hbGxfam9ic19hdF9vbmNlPTAmZmxhbWVfZ3JhcGg9MCZjaGVja291dF90b19sYXRlc3RfdGVzdGVkX3ZlcnNpb249MCZwYXJhbWV0ZXJfMF9uYW1lPWVwb2NocyZwYXJhbWV0ZXJfMF90eXBlPXJhbmdlJnBhcmFtZXRlcl8wX21pbj0xMCZwYXJhbWV0ZXJfMF9tYXg9MjAwJnBhcmFtZXRlcl8wX251bWJlcl90eXBlPWludCZwYXJhbWV0ZXJfMF9sb2dfc2NhbGU9ZmFsc2UmcGFyYW1ldGVyXzFfbmFtZT1sciZwYXJhbWV0ZXJfMV90eXBlPXJhbmdlJnBhcmFtZXRlcl8xX21pbj0wLjAwMDEmcGFyYW1ldGVyXzFfbWF4PTAuMSZwYXJhbWV0ZXJfMV9udW1iZXJfdHlwZT1mbG9hdCZwYXJhbWV0ZXJfMV9sb2dfc2NhbGU9ZmFsc2UmcGFyYW1ldGVyXzJfbmFtZT1iYXRjaF9zaXplJnBhcmFtZXRlcl8yX3R5cGU9cmFuZ2UmcGFyYW1ldGVyXzJfbWluPTgmcGFyYW1ldGVyXzJfbWF4PTIwNDgmcGFyYW1ldGVyXzJfbnVtYmVyX3R5cGU9aW50JnBhcmFtZXRlcl8yX2xvZ19zY2FsZT1mYWxzZSZwYXJhbWV0ZXJfM19uYW1lPWhpZGRlbl9zaXplJnBhcmFtZXRlcl8zX3R5cGU9cmFuZ2UmcGFyYW1ldGVyXzNfbWluPTgmcGFyYW1ldGVyXzNfbWF4PTQwOTYmcGFyYW1ldGVyXzNfbnVtYmVyX3R5cGU9aW50JnBhcmFtZXRlcl8zX2xvZ19zY2FsZT1mYWxzZSZwYXJhbWV0ZXJfNF9uYW1lPWRyb3BvdXQmcGFyYW1ldGVyXzRfdHlwZT1yYW5nZSZwYXJhbWV0ZXJfNF9taW49MCZwYXJhbWV0ZXJfNF9tYXg9MC41JnBhcmFtZXRlcl80X251bWJlcl90eXBlPWZsb2F0JnBhcmFtZXRlcl80X2xvZ19zY2FsZT1mYWxzZSZwYXJhbWV0ZXJfNV9uYW1lPWFjdGl2YXRpb24mcGFyYW1ldGVyXzVfdHlwZT1maXhlZCZwYXJhbWV0ZXJfNV92YWx1ZT1sZWFreV9yZWx1JnBhcmFtZXRlcl82X25hbWU9bnVtX2RlbnNlX2xheWVycyZwYXJhbWV0ZXJfNl90eXBlPXJhbmdlJnBhcmFtZXRlcl82X21pbj0xJnBhcmFtZXRlcl82X21heD00JnBhcmFtZXRlcl82X251bWJlcl90eXBlPWludCZwYXJhbWV0ZXJfNl9sb2dfc2NhbGU9ZmFsc2UmcGFyYW1ldGVyXzdfbmFtZT1pbml0JnBhcmFtZXRlcl83X3R5cGU9Zml4ZWQmcGFyYW1ldGVyXzdfdmFsdWU9bm9ybWFsJnBhcmFtZXRlcl84X25hbWU9d2VpZ2h0X2RlY2F5JnBhcmFtZXRlcl84X3R5cGU9cmFuZ2UmcGFyYW1ldGVyXzhfbWluPTAmcGFyYW1ldGVyXzhfbWF4PTEmcGFyYW1ldGVyXzhfbnVtYmVyX3R5cGU9ZmxvYXQmcGFyYW1ldGVyXzhfbG9nX3NjYWxlPWZhbHNlJnBhcnRpdGlvbj1hbHBoYSZudW1fcGFyYW1ldGVycz05
⠋ Disabling logging...
⠋ Setting run folder...
⠋ Creating folder /data/horse/ws/pwinkler-mnist_tst/omniopt/runs/mnist_gpu_noall/1...
⠋ Writing revert_to_random_when_seemingly_exhausted file ...
⠋ Writing username state file...
⠋ Writing result names file...
⠋ Writing result min/max file...
⠋ Saving state files...
Run-folder: /data/horse/ws/pwinkler-mnist_tst/omniopt/runs/mnist_gpu_noall/1
⠋ Printing run info...
⠋ Initializing NVIDIA-Logs...
⠋ Writing ui_url file if it is present...
⠋ Writing live_share file if it is present...
⠋ Writing job_start_time file...
⠹ Writing git info file...
⠋ Checking max_eval...
⠋ Calculating number of steps...
⠋ Adding excluded nodes...
⠋ Handling random steps...
⠋ Initializing ax_client...
[WARNING 07-28 13:09:45] ax.service.ax_client: Selecting a GenerationStrategy when using BatchTrials is in beta. Double check the recommended strategy matches your expectations.
⠋ Setting orchestrator...
You have 1 CPUs available for the main process. Using CUDA device NVIDIA H100. Generation strategy: SOBOL for 20 steps and then BOTORCH_MODULAR for 480 steps.
Run-Program: python3 .tests/mnist/train --epochs %epochs --learning_rate %lr --batch_size %batch_size --hidden_size %hidden_size --dropout %dropout --activation %activation --num_dense_layers %num_dense_layers --init %init --weight_decay %weight_decay
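
The Run-Program line is a template: each %name placeholder is replaced by the value chosen for that parameter, which yields the program_string recorded per trial. The following sketch illustrates that substitution under the assumption that placeholders are plain %name tokens; fill_template is a hypothetical helper, and OmniOpt2's own substitution logic may differ. TEMPLATE is shortened to three parameters for brevity.

```python
import re

# Shortened version of the Run-Program template from the log.
TEMPLATE = (
    "python3 .tests/mnist/train --epochs %epochs "
    "--learning_rate %lr --batch_size %batch_size"
)


def fill_template(template, values):
    """Replace every %name placeholder with values[name]."""

    def replace(match):
        name = match.group(1)
        if name not in values:
            # Fail loudly instead of passing a literal %name to the program.
            raise KeyError(f"no value for placeholder %{name}")
        return str(values[name])

    # Match whole identifiers so that %batch_size is never split into a
    # shorter placeholder plus leftover text.
    return re.sub(r"%(\w+)", replace, template)


# Values of trial 0; lr is passed as a string to keep the printed precision.
trial_0 = {"epochs": 152, "lr": "0.08946310024261475147", "batch_size": 745}
print(fill_template(TEMPLATE, trial_0))
```

Matching whole identifiers with a regular expression avoids the prefix problem that chained str.replace calls would have if one parameter name were a prefix of another.
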
                                  Experiment parameters                                   
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ Name             ┃ Type  ┃ Lower bound ┃ Upper bound ┃ Values     ┃ Type  ┃ Log Scale? ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ epochs           │ range │ 10          │ 200         │            │ int   │ No         │
│ lr               │ range │ 0.0001      │ 0.1         │            │ float │ No         │
│ batch_size       │ range │ 8           │ 2048        │            │ int   │ No         │
│ hidden_size      │ range │ 8           │ 4096        │            │ int   │ No         │
│ dropout          │ range │ 0           │ 0.5         │            │ float │ No         │
│ activation       │ fixed │             │             │ leaky_relu │       │            │
│ num_dense_layers │ range │ 1           │ 4           │            │ int   │ No         │
│ init             │ fixed │             │             │ normal     │       │            │
│ weight_decay     │ range │ 0           │ 1           │            │ float │ No         │
└──────────────────┴───────┴─────────────┴─────────────┴────────────┴───────┴────────────┘
        Result-Names         
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Result-Name ┃ Min or max? ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ VAL_ACC     │         max │
└─────────────┴─────────────┘
See https://imageseg.scads.de/omniax/share?user_id=pwinkler&experiment_name=mnist_gpu_noall&run_nr=1 for live-results.

█▀▀▀▀▀█ ▄ █ ▀▀▄▄▀ █▀▀ ▄▀ █▄▄  █▀▀▀▀▀█
█ ███ █ ▄▄▄ █▄ ▀▀▄ █▀▀▄▄█▄█▀▄ █ ███ █
█ ▀▀▀ █ ▄█ ▄▄▄▄▄█▀▀▀█  ▀ ▄▀▄█ █ ▀▀▀ █
▀▀▀▀▀▀▀ ▀▄█▄▀ ▀▄▀ ▀▄█▄█▄▀ █▄▀ ▀▀▀▀▀▀▀
██▀▀█ ▀▀▀█▄▀▄▀█▀▄▀▄▀█ ▀▄ █▀▀▄▀ █ █▄█ 
▄▀██  ▀█ ▀█ ▄▀ ▀█   ▄█▀ █  █▀▄▀▀▀█▀▄▀
  ▄▀  ▀ ▀▄█ ▀▀█ ▄█▀▄█▄▀▄ ▀▀ ▄▀▀█▀▀█▄▀
 █▄▄▄█▀█▀▄ █▀ █▀▄  ▄▄██   ▄▀▄▄█▄█▄▀▀█
▀▄█▀▄▄▀█ ▀ ▀▄█▀ ▄█▄█▄▄██▄▀█ ▄▀▀ ▀▀█▄▀
█▀▄█▄ ▀█ ▄▄ ▄▀▀▀█   █▄  ▀▀ ██▀██▀ ▀▀█
▀█▄▀▄▀▀██   ▀▄█▄██▄▄▄▄ ▄▀▀▀ ▄█▀▄▀███▀
▀▀▀▀▀▄▀▀██▀█▀▀███▀▄ █▄▀ ▄ █▀█▄▀█   ▀█
█▄▀██▄▀▄▄█▀▀▄█▀▀██▄█▄▄▀█▀█▄ ▄ ▀▄▀▀██▀
█ ▀ ▄▄▀▀ ▀▀ ▄▀▀█  ▀ ▀██▄ ▀█▀█▀ ▀▀  ▀█
▀ ▀   ▀ █▄▄ ▀▀█▀█▀ █▄ █▀ ▀█▄█▀▀▀██▀  
█▀▀▀▀▀█ ▀ ▀█▀▀ ▀█▀▀▄ █ ▄█ █▄█ ▀ █  ▀█
█ ███ █ █▄█▀██▄▀▄▀ ▄█▄█▄ ▀█▀▀████▀▀ ▀
█ ▀▀▀ █ █ ▀ ▄ █▀█ ▄ ██   ▀▄▄▄▄▄█  ▀▀█
▀▀▀▀▀▀▀ ▀▀▀  ▀  ▀     ▀ ▀▀▀   ▀▀ ▀▀▀▀
Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), completed/pending 1/1∑2 (5%/20), job_failed:   0%|░░░░░░░░░░| 0/500 [02:31
Sobol, failed: 3 ('VAL_ACC: <FLOAT>' not found), cancelled by 2054851 1∑1 (0%/20), finishing jobs (_get_next_trials), finished 1 job:   0%|░░░░░░░░░░| 0/500 [02:54

2025-07-28 13:10:03: SOBOL, Started OmniOpt2 run...
2025-07-28 13:10:17: Sobol, getting new HP set     
2025-07-28 13:10:25: Sobol, requested 1 jobs, got 1, 8.26 s/job
2025-07-28 13:10:29: Sobol, eval #1/1 start                    
2025-07-28 13:10:33: Sobol, starting new job                   
2025-07-28 13:10:38: Sobol, unknown 1∑1 (5%/20), started new job
2025-07-28 13:10:43: Sobol, running 1∑1 (5%/20), getting new HP set
2025-07-28 13:10:51: Sobol, running 1∑1 (5%/20), requested 1 jobs, got 1, 8.36 s/job
2025-07-28 13:10:55: Sobol, running 1∑1 (5%/20), eval #1/1 start                    
2025-07-28 13:11:03: Sobol, running 1∑1 (0%/20), starting new job                   
2025-07-28 13:11:08: Sobol, completed/unknown 1/1∑2 (5%/20), started new job        
2025-07-28 13:11:13: Sobol, completed/running 1/1∑2 (5%/20), job_failed             
2025-07-28 13:11:21: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), running 1∑1 (5%/20), finishing jobs (_get_next_trials), finished 1 job
2025-07-28 13:11:26: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), running 1∑1 (5%/20), getting new HP set                               
2025-07-28 13:11:34: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), running 1∑1 (0%/20), requested 1 jobs, got 1, 8.27 s/job              
2025-07-28 13:11:38: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), completed 1∑1 (0%/20), eval #1/1 start                                
2025-07-28 13:11:42: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), completed 1∑1 (0%/20), starting new job                               
2025-07-28 13:11:48: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), completed/unknown 1/1∑2 (5%/20), started new job                      
2025-07-28 13:11:52: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), completed/pending 1/1∑2 (5%/20), job_failed                           
2025-07-28 13:12:00: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), pending 1∑1 (5%/20), finishing jobs (_get_next_trials), finished 1 job
2025-07-28 13:12:05: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), pending 1∑1 (5%/20), getting new HP set                               
2025-07-28 13:12:13: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), pending 1∑1 (5%/20), requested 1 jobs, got 1, 8.21 s/job              
2025-07-28 13:12:17: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), running 1∑1 (5%/20), eval #1/1 start                                  
2025-07-28 13:12:21: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), running 1∑1 (5%/20), starting new job                                 
2025-07-28 13:12:26: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), running/unknown 1/1∑2 (10%/20), started new job                       
2025-07-28 13:12:35: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), completed/pending 1/1∑2 (5%/20), job_failed                           
2025-07-28 13:12:58: Sobol, failed: 3 ('VAL_ACC: <FLOAT>' not found), cancelled by 2054851 1∑1 (0%/20), finishing jobs (_get_next_trials), finished 1 job

Arguments Overview

Key	Value
config_yaml	None
config_toml	None
config_json	None
num_random_steps	20
max_eval	500
run_program	[['cHl0aG9uMyAudGVzdHMvbW5pc3QvdHJhaW4gLS1lcG9jaHMgJWVwb2NocyAtLWxlYXJuaW5nX3JhdGUgJWxyIC0tYmF0Y2hfc2l6ZSAlYmF0Y2hfc2l6ZSAtLWhpZGRlbl9zaXplICVoaWRkZW5f…
experiment_name	mnist_gpu_noall
mem_gb	10
parameter	[['epochs', 'range', '10', '200', 'int', 'false'], ['lr', 'range', '0.0001', '0.1', 'float', 'false'], ['batch_size', 'range', '8', '2048', 'int',
	'false'], ['hidden_size', 'range', '8', '4096', 'int', 'false'], ['dropout', 'range', '0', '0.5', 'float', 'false'], ['activation', 'fixed',
	'leaky_relu'], ['num_dense_layers', 'range', '1', '4', 'int', 'false'], ['init', 'fixed', 'normal'], ['weight_decay', 'range', '0', '1', 'float',
	'false']]
continue_previous_job	None
experiment_constraints	None
run_dir	runs
seed	None
verbose_tqdm	False
model	BOTORCH_MODULAR
gridsearch	False
occ	False
show_sixel_scatter	False
show_sixel_general	False
show_sixel_trial_index_result	False
follow	True
send_anonymized_usage_stats	True
ui_url	aHR0cHM6Ly9pbWFnZXNlZy5zY2Fkcy5kZS9vbW5pYXgvZ3VpP3BhcnRpdGlvbj1hbHBoYSZleHBlcmltZW50X25hbWU9bW5pc3RfZ3B1X25vYWxsJnJlc2VydmF0aW9uPSZhY2NvdW50PSZtZW1fZ2I…
root_venv_dir	/home/pwinkler
exclude	None
main_process_gb	8
max_nr_of_zero_results	50
abbreviate_job_names	False
orchestrator_file	None
checkout_to_latest_tested_version	False
live_share	True
disable_tqdm	False
disable_previous_job_constraint	False
workdir
occ_type	euclid
result_names	['VAL_ACC=max']
minkowski_p	2
signed_weighted_euclidean_weights
generation_strategy	None
generate_all_jobs_at_once	False
revert_to_random_when_seemingly_exhausted	True
load_data_from_existing_jobs	[]
n_estimators_randomforest	100
max_attempts_for_generation	20
external_generator	None
username	None
max_failed_jobs	0
num_cpus_main_job	None
calculate_pareto_front_of_job	[]
show_generate_time_table	False
force_choice_for_ranges	False
max_abandoned_retrial	20
share_password	None
dryrun	False
db_url	None
run_program_once	None
dont_warm_start_refitting	False
refit_on_cv	False
fit_out_of_design	False
fit_abandoned	False
dont_jit_compile	False
num_restarts	20
raw_samples	1024
max_num_of_parallel_sruns	16
no_transform_inputs	False
no_normalize_y	False
transforms	[]
num_parallel_jobs	20
worker_timeout	120
slurm_use_srun	False
time	1440
partition	alpha
reservation	None
force_local_execution	False
slurm_signal_delay_s	0
nodes_per_job	1
cpus_per_task	1
account	None
gpus	1
run_mode	local
verbose	False
verbose_break_run_search_table	False
debug	False
flame_graph	False
no_sleep	False
tests	False
show_worker_percentage_table_at_end	False
auto_exclude_defective_hosts	False
run_tests_that_fail_on_taurus	False
raise_in_eval	False
show_ram_every_n_seconds	0
show_generation_and_submission_sixel	False
just_return_defaults	False
prettyprint	False

1753701003.2988896,20,0,0
1753701010.1255257,20,0,0
1753701010.251857,20,0,0

timestamp,ram_usage_mb,cpu_usage_percent
1753700985,715.12109375,1.2
1753701003,716.12109375,0.7
1753701010,716.12109375,0.9
1753701010,716.12109375,1.8
1753701010,716.12109375,0.5

submitit INFO (2025-07-28 13:10:43,435) - Starting with JobEnvironment(job_id=525762, hostname=c137, local_rank=0(1), node=0(1), global_rank=0(1))
submitit INFO (2025-07-28 13:10:43,437) - Loading pickle: /data/horse/ws/pwinkler-mnist_tst/omniopt/runs/mnist_gpu_noall/1/single_runs/525762/525762_submitted.pkl
/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/networkx/utils/backends.py:135: RuntimeWarning: networkx backend defined more than once: nx-loopback
  backends.update(_get_backends("networkx.backends"))
Traceback (most recent call last):
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/train", line 285, in main
    model = SimpleMLP(
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 942, in _apply
    param_applied = fn(param)
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 1341, in convert
    return t.to(
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Parameters: {"epochs": 152, "lr": 0.08946310024261475, "batch_size": 745, "hidden_size": 60, "dropout": 0.3126316964626312, "num_dense_layers": 2, "weight_decay": 0.9295684695243835, "activation": "leaky_relu", "init": "normal"}
Debug-Infos: 
========
DEBUG INFOS START:
Program-Code: python3 .tests/mnist/train --epochs 152 --learning_rate 0.08946310024261475147 --batch_size 745 --hidden_size 60 --dropout 0.31263169646263122559 --activation leaky_relu --num_dense_layers 2 --init normal --weight_decay 0.92956846952438354492
pwd: /data/horse/ws/pwinkler-mnist_tst/omniopt
File: .tests/mnist/train
UID: 2054851
GID: 200270
SLURM_JOB_ID: 525762
Status-Change-Time: 1753273859.0
Size: 12359 Bytes
Permissions: -rwxr-xr-x
Owner: pwinkler
Last access: 1753701046.0
Last modification: 1753270258.0
Hostname: c137
========
DEBUG INFOS END

python3 .tests/mnist/train --epochs 152 --learning_rate 0.08946310024261475147 --batch_size 745 --hidden_size 60 --dropout 0.31263169646263122559 --activation leaky_relu --num_dense_layers 2 --init normal --weight_decay 0.92956846952438354492
stdout:
              Hyperparameters              
╭──────────────────┬─────────────────────╮
│ Parameter        │ Value               │
├──────────────────┼─────────────────────┤
│ Device           │ cuda                │
│ Epochs           │ 152                 │
│ Num Dense Layers │ 2                   │
│ Batch size       │ 745                 │
│ Learning rate    │ 0.08946310024261475 │
│ Hidden size      │ 60                  │
│ Dropout          │ 0.3126316964626312  │
│ Optimizer        │ adam                │
│ Momentum         │ 0.9                 │
│ Weight Decay     │ 0.9295684695243835  │
│ Activation       │ leaky_relu          │
│ Init Method      │ normal              │
│ Seed             │ None                │
╰──────────────────┴─────────────────────╯
Using device: cuda
An error occurred: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so 
the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


stderr:
 /data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/networkx/utils/backends.py:135: RuntimeWarning: networkx backend defined more than once: nx-loopback
  backends.update(_get_backends("networkx.backends"))
Traceback (most recent call last):
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/train", line 285, in main
    model = SimpleMLP(
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 942, in _apply
    param_applied = fn(param)
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 1341, in convert
    return t.to(
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Result: {'VAL_ACC': None}
Final-results: {'VAL_ACC': None}
EXIT_CODE: 1
submitit INFO (2025-07-28 13:10:58,619) - Job completed successfully
submitit INFO (2025-07-28 13:10:58,620) - Exiting after successful completion