OmniOpt2

Experiment overview

Setting	Value
Model for non-random steps	BOTORCH_MODULAR
Max. number of evaluations	500
Number of random steps	20
Number of workers (parameter)	20
Main process memory (GB)	8
Worker memory (GB)	10

Job Summary per Generation Node

Generation Node	Total	FAILED	RUNNING
SOBOL	4	2	2

Experiment parameters

Name	Type	Lower bound	Upper bound	Values	Value type	Log scale?
epochs	range	10	200		int	No
lr	range	1e-05	0.1		float	No
batch_size	range	8	2048		int	No
hidden_size	range	8	2048		int	No
dropout	range	0	0.5		float	No
activation	fixed			leaky_relu
num_dense_layers	range	1	4		int	No
init	fixed			normal
weight_decay	range	0	1		float	No

Number of evaluations

Failed	Succeeded	Running	Total
2	0	2	4
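These counts, like the per-node job summary above, can be re-derived from the `trial_status` and `generation_node` columns of the per-trial CSV further down. Below is a minimal sketch using Python's standard library. `summarize_trials` is a hypothetical helper, not part of OmniOpt2. Counting `COMPLETED` rows as "Succeeded" is an assumption, because no trial in this run has completed yet.

```python
import csv
import io
from collections import Counter


def summarize_trials(csv_text: str) -> tuple[Counter, Counter]:
    """Count trials by status overall and by (generation_node, status)."""
    by_status: Counter = Counter()
    by_node: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        status = row["trial_status"]
        by_status[status] += 1
        by_node[(row["generation_node"], status)] += 1
    return by_status, by_node


if __name__ == "__main__":
    # Only the columns needed here; values taken from the four trials in this report.
    sample = (
        "trial_index,trial_status,generation_node\n"
        "0,FAILED,SOBOL\n"
        "1,FAILED,SOBOL\n"
        "2,RUNNING,SOBOL\n"
        "3,RUNNING,SOBOL\n"
    )
    by_status, by_node = summarize_trials(sample)
    # Counter returns 0 for missing keys, so absent statuses count as 0.
    print("Failed", by_status["FAILED"], "Succeeded", by_status["COMPLETED"],
          "Running", by_status["RUNNING"], "Total", sum(by_status.values()))
    print(dict(by_node))
```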

Result names and types

name	min/max
VAL_ACC	max

Last progressbar status

2025-07-28 14:20:45: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), completed/unknown 1/1∑2 (5%/20), started new job
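The failure reason `'VAL_ACC: <FLOAT>' not found` indicates that OmniOpt2 looks for a line of the form `VAL_ACC: <number>` in the program's output and marks the trial as failed when none appears. Below is a minimal sketch of such a check, assuming the value is printed on its own line. The regular expression and `parse_result` are illustrative assumptions, not OmniOpt2's actual parser.

```python
import re
from typing import Optional


def parse_result(stdout: str, name: str = "VAL_ACC") -> Optional[float]:
    """Return the last '<name>: <float>' value found in stdout, or None if absent."""
    # Accepts plain decimals and scientific notation, e.g. "VAL_ACC: 0.97" or "VAL_ACC: 9.7e-1".
    pattern = re.compile(
        rf"^\s*{re.escape(name)}:\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s*$",
        re.MULTILINE,
    )
    matches = pattern.findall(stdout)
    return float(matches[-1]) if matches else None
```

Trial 0's stdout ends with `An error occurred: CUDA error: out of memory`. For that output, this returns `None`, which matches `Result: {'VAL_ACC': None}` in the job log. On success, the training script would need to print a line such as `VAL_ACC: 0.97`.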

Git-Version

Commit: f8cc2bb61388d40d59d6f905d04ace6fce06fc06 (7747-1-gf8cc2bb61)

trial_index,submit_time,queue_time,start_time,end_time,run_time,program_string,VAL_ACC,exit_code,signal,hostname,OO_Info_SLURM_JOB_ID,arm_name,trial_status,generation_node,epochs,lr,batch_size,hidden_size,dropout,num_dense_layers,weight_decay,activation,init
0,1753705087,11,1753705098,1753705117,19,python3 .tests/mnist/train --epochs 43 --learning_rate 0.02532067022651434199 --batch_size 1270 --hidden_size 1739 --dropout 0.32288113236427307129 --activation leaky_relu --num_dense_layers 2 --init normal --weight_decay 0.73410296440124511719,,1,,c137,525814,0_0,FAILED,SOBOL,43,0.025320670226514341988321987742,1270,1739,0.3228811323642730712890625,2,0.7341029644012451171875,leaky_relu,normal
1,1753705114,15,1753705129,1753705141,12,python3 .tests/mnist/train --epochs 141 --learning_rate 0.0650301386935543263 --batch_size 903 --hidden_size 474 --dropout 0.05287032248452305794 --activation leaky_relu --num_dense_layers 3 --init normal --weight_decay 0.08460972178727388382,,1,,c137,525815,1_0,FAILED,SOBOL,141,0.065030138693554326301260459786,903,474,0.052870322484523057937622070312,3,0.084609721787273883819580078125,leaky_relu,normal
2,1753705187,32,1753705219,1753705231,12,python3 .tests/mnist/train --epochs 200 --learning_rate 0.00939196249185130123 --batch_size 1719 --hidden_size 1358 --dropout 0.20223156921565532684 --activation leaky_relu --num_dense_layers 4 --init normal --weight_decay 0.8465766608715057373,,1,,c137,525817,2_0,RUNNING,SOBOL,200,0.009391962491851301234047078026,1719,1358,0.20223156921565532684326171875,4,0.8465766608715057373046875,leaky_relu,normal
3,,,,,,,,,,,,3_0,RUNNING,SOBOL,102,0.099511309040747591980746733498,459,599,0.425381433684378862380981445312,1,0.4648427776992321014404296875,leaky_relu,normal
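In the rows above, `queue_time` and `run_time` appear to be derived from the Unix timestamps as `queue_time = start_time - submit_time` and `run_time = end_time - start_time`. For trial 0, this gives 1753705098 - 1753705087 = 11 and 1753705117 - 1753705098 = 19. This relation is inferred from the data, not taken from OmniOpt2 documentation. The following sketch checks it for the three trials that have timestamps. `check_timing` is a hypothetical helper.

```python
# (trial_index, submit_time, queue_time, start_time, end_time, run_time) copied from the CSV above.
ROWS = [
    (0, 1753705087, 11, 1753705098, 1753705117, 19),
    (1, 1753705114, 15, 1753705129, 1753705141, 12),
    (2, 1753705187, 32, 1753705219, 1753705231, 12),
]


def check_timing(rows):
    """Return indices of trials whose queue/run times differ from the timestamp differences."""
    mismatches = []
    for idx, submit, queue, start, end, run in rows:
        if start - submit != queue or end - start != run:
            mismatches.append(idx)
    return mismatches


if __name__ == "__main__":
    # An empty list means the inferred formulas hold for every row.
    print(check_timing(ROWS))
```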

To cancel, press Ctrl+C, then run 'scancel 525812'
Run-UUID: 90b21b28-5c79-4d3a-a111-84d8b45cfc9c

omniopt --partition=alpha --experiment_name=mnist_gpu_noall --mem_gb=10 --time=1440 --worker_timeout=120 --max_eval=500 --num_parallel_jobs=20 --gpus=1 --num_random_steps=20 --follow --live_share --send_anonymized_usage_stats --result_names VAL_ACC=max --run_program=cHl0aG9uMyAudGVzdHMvbW5pc3QvdHJhaW4gLS1lcG9jaHMgJWVwb2NocyAtLWxlYXJuaW5nX3JhdGUgJWxyIC0tYmF0Y2hfc2l6ZSAlYmF0Y2hfc2l6ZSAtLWhpZGRlbl9zaXplICVoaWRkZW5fc2l6ZSAtLWRyb3BvdXQgJWRyb3BvdXQgLS1hY3RpdmF0aW9uICVhY3RpdmF0aW9uIC0tbnVtX2RlbnNlX2xheWVycyAlbnVtX2RlbnNlX2xheWVycyAtLWluaXQgJWluaXQgLS13ZWlnaHRfZGVjYXkgJXdlaWdodF9kZWNheQ== --cpus_per_task=1 --nodes_per_job=1 --revert_to_random_when_seemingly_exhausted --model=BOTORCH_MODULAR --n_estimators_randomforest=100 --run_mode=local --occ_type=euclid --main_process_gb=8 --max_nr_of_zero_results=50 --slurm_signal_delay_s=0 --max_failed_jobs=0 --max_attempts_for_generation=20 --num_restarts=20 --raw_samples=1024 --max_abandoned_retrial=20 --max_num_of_parallel_sruns=16 --parameter epochs range 10 200 int false --parameter lr range 0.00001 0.1 float false --parameter batch_size range 8 2048 int false --parameter hidden_size range 8 2048 int false --parameter dropout range 0 0.5 float false --parameter activation fixed leaky_relu --parameter num_dense_layers range 1 4 int false --parameter init fixed normal --parameter weight_decay range 0 1 float false --ui_url 
aHR0cHM6Ly9pbWFnZXNlZy5zY2Fkcy5kZS9vbW5pYXgvZ3VpP3BhcnRpdGlvbj1hbHBoYSZleHBlcmltZW50X25hbWU9bW5pc3RfZ3B1X25vYWxsJnJlc2VydmF0aW9uPSZhY2NvdW50PSZtZW1fZ2I9MTAmdGltZT0xNDQwJndvcmtlcl90aW1lb3V0PTEyMCZtYXhfZXZhbD01MDAmbnVtX3BhcmFsbGVsX2pvYnM9MjAmZ3B1cz0xJm51bV9yYW5kb21fc3RlcHM9MjAmZm9sbG93PTEmbGl2ZV9zaGFyZT0xJnNlbmRfYW5vbnltaXplZF91c2FnZV9zdGF0cz0xJmNvbnN0cmFpbnRzPSZyZXN1bHRfbmFtZXM9VkFMX0FDQyUzRG1heCZydW5fcHJvZ3JhbT1weXRob24zJTIwLnRlc3RzJTJGbW5pc3QlMkZ0cmFpbiUyMC0tZXBvY2hzJTIwJTI1ZXBvY2hzJTIwLS1sZWFybmluZ19yYXRlJTIwJTI1bHIlMjAtLWJhdGNoX3NpemUlMjAlMjViYXRjaF9zaXplJTIwLS1oaWRkZW5fc2l6ZSUyMCUyNWhpZGRlbl9zaXplJTIwLS1kcm9wb3V0JTIwJTI1ZHJvcG91dCUyMC0tYWN0aXZhdGlvbiUyMCUyNWFjdGl2YXRpb24lMjAtLW51bV9kZW5zZV9sYXllcnMlMjAlMjVudW1fZGVuc2VfbGF5ZXJzJTIwLS1pbml0JTIwJTI1aW5pdCUyMC0td2VpZ2h0X2RlY2F5JTIwJTI1d2VpZ2h0X2RlY2F5JmNwdXNfcGVyX3Rhc2s9MSZub2Rlc19wZXJfam9iPTEmc2VlZD0mZHJ5cnVuPTAmZGVidWc9MCZyZXZlcnRfdG9fcmFuZG9tX3doZW5fc2VlbWluZ2x5X2V4aGF1c3RlZD0xJmdyaWRzZWFyY2g9MCZtb2RlbD1CT1RPUkNIX01PRFVMQVImZXh0ZXJuYWxfZ2VuZXJhdG9yPSZuX2VzdGltYXRvcnNfcmFuZG9tZm9yZXN0PTEwMCZpbnN0YWxsYXRpb25fbWV0aG9kPWNsb25lJnJ1bl9tb2RlPWxvY2FsJmRpc2FibGVfdHFkbT0wJnZlcmJvc2VfdHFkbT0wJmZvcmNlX2xvY2FsX2V4ZWN1dGlvbj0wJmF1dG9fZXhjbHVkZV9kZWZlY3RpdmVfaG9zdHM9MCZzaG93X3NpeGVsX2dlbmVyYWw9MCZzaG93X3NpeGVsX3RyaWFsX2luZGV4X3Jlc3VsdD0wJnNob3dfc2l4ZWxfc2NhdHRlcj0wJnNob3dfd29ya2VyX3BlcmNlbnRhZ2VfdGFibGVfYXRfZW5kPTAmb2NjPTAmb2NjX3R5cGU9ZXVjbGlkJm5vX3NsZWVwPTAmc2x1cm1fdXNlX3NydW49MCZ2ZXJib3NlX2JyZWFrX3J1bl9zZWFyY2hfdGFibGU9MCZhYmJyZXZpYXRlX2pvYl9uYW1lcz0wJm1haW5fcHJvY2Vzc19nYj04Jm1heF9ucl9vZl96ZXJvX3Jlc3VsdHM9NTAmc2x1cm1fc2lnbmFsX2RlbGF5X3M9MCZtYXhfZmFpbGVkX2pvYnM9MCZleGNsdWRlPSZ1c2VybmFtZT0mZ2VuZXJhdGlvbl9zdHJhdGVneT0mcm9vdF92ZW52X2Rpcj0md29ya2Rpcj0mZG9udF9qaXRfY29tcGlsZT0wJmZpdF9vdXRfb2ZfZGVzaWduPTAmcmVmaXRfb25fY3Y9MCZzaG93X2dlbmVyYXRlX3RpbWVfdGFibGU9MCZkb250X3dhcm1fc3RhcnRfcmVmaXR0aW5nPTAmbWF4X2F0dGVtcHRzX2Zvcl9nZW5lcmF0aW9uPTIwJm51bV9yZXN0YXJ0cz0yMCZyYXdfc2FtcGxlcz0xMDI0Jm1heF9hYmFuZG9uZWRfcmV0cmlhbD0yMCZtYXhfbnVtX29mX3BhcmFs
bGVsX3NydW5zPTE2JmZvcmNlX2Nob2ljZV9mb3JfcmFuZ2VzPTAmbm9fdHJhbnNmb3JtX2lucHV0cz0wJmZpdF9hYmFuZG9uZWQ9MCZub19ub3JtYWxpemVfeT0wJnZlcmJvc2U9MCZnZW5lcmF0ZV9hbGxfam9ic19hdF9vbmNlPTAmZmxhbWVfZ3JhcGg9MCZjaGVja291dF90b19sYXRlc3RfdGVzdGVkX3ZlcnNpb249MCZwYXJhbWV0ZXJfMF9uYW1lPWVwb2NocyZwYXJhbWV0ZXJfMF90eXBlPXJhbmdlJnBhcmFtZXRlcl8wX21pbj0xMCZwYXJhbWV0ZXJfMF9tYXg9MjAwJnBhcmFtZXRlcl8wX251bWJlcl90eXBlPWludCZwYXJhbWV0ZXJfMF9sb2dfc2NhbGU9ZmFsc2UmcGFyYW1ldGVyXzFfbmFtZT1sciZwYXJhbWV0ZXJfMV90eXBlPXJhbmdlJnBhcmFtZXRlcl8xX21pbj0wLjAwMDAxJnBhcmFtZXRlcl8xX21heD0wLjEmcGFyYW1ldGVyXzFfbnVtYmVyX3R5cGU9ZmxvYXQmcGFyYW1ldGVyXzFfbG9nX3NjYWxlPWZhbHNlJnBhcmFtZXRlcl8yX25hbWU9YmF0Y2hfc2l6ZSZwYXJhbWV0ZXJfMl90eXBlPXJhbmdlJnBhcmFtZXRlcl8yX21pbj04JnBhcmFtZXRlcl8yX21heD0yMDQ4JnBhcmFtZXRlcl8yX251bWJlcl90eXBlPWludCZwYXJhbWV0ZXJfMl9sb2dfc2NhbGU9ZmFsc2UmcGFyYW1ldGVyXzNfbmFtZT1oaWRkZW5fc2l6ZSZwYXJhbWV0ZXJfM190eXBlPXJhbmdlJnBhcmFtZXRlcl8zX21pbj04JnBhcmFtZXRlcl8zX21heD0yMDQ4JnBhcmFtZXRlcl8zX251bWJlcl90eXBlPWludCZwYXJhbWV0ZXJfM19sb2dfc2NhbGU9ZmFsc2UmcGFyYW1ldGVyXzRfbmFtZT1kcm9wb3V0JnBhcmFtZXRlcl80X3R5cGU9cmFuZ2UmcGFyYW1ldGVyXzRfbWluPTAmcGFyYW1ldGVyXzRfbWF4PTAuNSZwYXJhbWV0ZXJfNF9udW1iZXJfdHlwZT1mbG9hdCZwYXJhbWV0ZXJfNF9sb2dfc2NhbGU9ZmFsc2UmcGFyYW1ldGVyXzVfbmFtZT1hY3RpdmF0aW9uJnBhcmFtZXRlcl81X3R5cGU9Zml4ZWQmcGFyYW1ldGVyXzVfdmFsdWU9bGVha3lfcmVsdSZwYXJhbWV0ZXJfNl9uYW1lPW51bV9kZW5zZV9sYXllcnMmcGFyYW1ldGVyXzZfdHlwZT1yYW5nZSZwYXJhbWV0ZXJfNl9taW49MSZwYXJhbWV0ZXJfNl9tYXg9NCZwYXJhbWV0ZXJfNl9udW1iZXJfdHlwZT1pbnQmcGFyYW1ldGVyXzZfbG9nX3NjYWxlPWZhbHNlJnBhcmFtZXRlcl83X25hbWU9aW5pdCZwYXJhbWV0ZXJfN190eXBlPWZpeGVkJnBhcmFtZXRlcl83X3ZhbHVlPW5vcm1hbCZwYXJhbWV0ZXJfOF9uYW1lPXdlaWdodF9kZWNheSZwYXJhbWV0ZXJfOF90eXBlPXJhbmdlJnBhcmFtZXRlcl84X21pbj0wJnBhcmFtZXRlcl84X21heD0xJnBhcmFtZXRlcl84X251bWJlcl90eXBlPWZsb2F0JnBhcmFtZXRlcl84X2xvZ19zY2FsZT1mYWxzZSZwYXJ0aXRpb249YWxwaGEmbnVtX3BhcmFtZXRlcnM9OQ==
Run-folder: /data/horse/ws/pwinkler-mnist_tst/omniopt/runs/mnist_gpu_noall/5
[WARNING 07-28 14:17:32] ax.service.ax_client: Selecting a GenerationStrategy when using BatchTrials is in beta. Double check the recommended strategy matches your expectations.
You have 1 CPUs available for the main process. Using CUDA device NVIDIA H100. Generation strategy: SOBOL for 20 steps and then BOTORCH_MODULAR for 480 steps.
Run-Program: python3 .tests/mnist/train --epochs %epochs --learning_rate %lr --batch_size %batch_size --hidden_size %hidden_size --dropout %dropout --activation %activation --num_dense_layers %num_dense_layers --init %init --weight_decay %weight_decay
                                  Experiment parameters                                   
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ Name             ┃ Type  ┃ Lower bound ┃ Upper bound ┃ Values     ┃ Type  ┃ Log Scale? ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ epochs           │ range │ 10          │ 200         │            │ int   │ No         │
│ lr               │ range │ 1e-05       │ 0.1         │            │ float │ No         │
│ batch_size       │ range │ 8           │ 2048        │            │ int   │ No         │
│ hidden_size      │ range │ 8           │ 2048        │            │ int   │ No         │
│ dropout          │ range │ 0           │ 0.5         │            │ float │ No         │
│ activation       │ fixed │             │             │ leaky_relu │       │            │
│ num_dense_layers │ range │ 1           │ 4           │            │ int   │ No         │
│ init             │ fixed │             │             │ normal     │       │            │
│ weight_decay     │ range │ 0           │ 1           │            │ float │ No         │
└──────────────────┴───────┴─────────────┴─────────────┴────────────┴───────┴────────────┘
        Result-Names         
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Result-Name ┃ Min or max? ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ VAL_ACC     │         max │
└─────────────┴─────────────┘
See https://imageseg.scads.de/omniax/share?user_id=pwinkler&experiment_name=mnist_gpu_noall&run_nr=3 for live results.

Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), completed/unknown 1/1∑2 (5%/20), started new job: 0%|░░░░░░░░░░| 0/500 [03:07]

2025-07-28 14:17:37: SOBOL, Started OmniOpt2 run...
2025-07-28 14:17:47: Sobol, getting new HP set     
2025-07-28 14:17:55: Sobol, requested 1 jobs, got 1, 8.22 s/job
2025-07-28 14:17:59: Sobol, eval #1/1 start                    
2025-07-28 14:18:03: Sobol, starting new job                   
2025-07-28 14:18:09: Sobol, unknown 1∑1 (5%/20), started new job
2025-07-28 14:18:13: Sobol, pending 1∑1 (5%/20), getting new HP set
2025-07-28 14:18:21: Sobol, running 1∑1 (5%/20), requested 1 jobs, got 1, 8.50 s/job
2025-07-28 14:18:26: Sobol, running 1∑1 (5%/20), eval #1/1 start                    
2025-07-28 14:18:29: Sobol, running 1∑1 (5%/20), starting new job                   
2025-07-28 14:18:35: Sobol, running/unknown 1/1∑2 (10%/20), started new job         
2025-07-28 14:18:43: Sobol, completed/pending 1/1∑2 (5%/20), job_failed             
2025-07-28 14:19:03: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), completed 1∑1 (0%/20), finishing jobs (_get_next_trials), finished 1 job
2025-07-28 14:19:23: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), completed 1∑1 (0%/20), getting new HP set                               
2025-07-28 14:19:33: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), completed 1∑1 (0%/20), requested 1 jobs, got 1, 22.51 s/job             
2025-07-28 14:19:37: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), completed 1∑1 (0%/20), eval #1/1 start                                  
2025-07-28 14:19:42: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), completed 1∑1 (0%/20), starting new job                                 
2025-07-28 14:19:48: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), completed/unknown 1/1∑2 (5%/20), started new job                        
2025-07-28 14:19:53: Sobol, failed: 1 ('VAL_ACC: <FLOAT>' not found), completed/pending 1/1∑2 (5%/20), job_failed                             
2025-07-28 14:20:03: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), pending 1∑1 (5%/20), finishing jobs (_get_next_trials), finished 1 job  
2025-07-28 14:20:09: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), pending 1∑1 (5%/20), getting new HP set                                 
2025-07-28 14:20:19: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), pending 1∑1 (5%/20), requested 1 jobs, got 1, 9.47 s/job                
2025-07-28 14:20:29: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), running 1∑1 (5%/20), eval #1/1 start                                    
2025-07-28 14:20:38: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), running 1∑1 (0%/20), starting new job                                   
2025-07-28 14:20:45: Sobol, failed: 2 ('VAL_ACC: <FLOAT>' not found), completed/unknown 1/1∑2 (5%/20), started new job
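The worker creation log passes `--run_program` as a base64-encoded string. Decoding it gives the `Run-Program` template printed above. Each trial's `program_string` is that template with every `%name` placeholder replaced by the trial's parameter value.

Below is a sketch of both steps using Python's `base64` module. `decode_run_program` and `fill_template` are hypothetical helpers. Formatting floats with `str()` is an assumption, and it does not reproduce the 20-digit formatting seen in `program_string` (for example, `0.02532067022651434199`).

```python
import base64

# --run_program value copied from the worker creation log above.
RUN_PROGRAM_B64 = "cHl0aG9uMyAudGVzdHMvbW5pc3QvdHJhaW4gLS1lcG9jaHMgJWVwb2NocyAtLWxlYXJuaW5nX3JhdGUgJWxyIC0tYmF0Y2hfc2l6ZSAlYmF0Y2hfc2l6ZSAtLWhpZGRlbl9zaXplICVoaWRkZW5fc2l6ZSAtLWRyb3BvdXQgJWRyb3BvdXQgLS1hY3RpdmF0aW9uICVhY3RpdmF0aW9uIC0tbnVtX2RlbnNlX2xheWVycyAlbnVtX2RlbnNlX2xheWVycyAtLWluaXQgJWluaXQgLS13ZWlnaHRfZGVjYXkgJXdlaWdodF9kZWNheQ=="


def decode_run_program(encoded: str) -> str:
    """Decode the base64-encoded --run_program argument into the command template."""
    return base64.b64decode(encoded).decode("utf-8")


def fill_template(template: str, params: dict) -> str:
    """Replace each %name placeholder with str(value).

    Longer names are replaced first, so a parameter whose name is a prefix of
    another (e.g. a hypothetical %lr and %lr_decay) cannot corrupt the longer one.
    """
    command = template
    for name in sorted(params, key=len, reverse=True):
        command = command.replace(f"%{name}", str(params[name]))
    return command


if __name__ == "__main__":
    template = decode_run_program(RUN_PROGRAM_B64)
    print(template)
    # Trial 0 parameters as listed under "Parameters:" in the job log.
    trial_0 = {
        "epochs": 43, "lr": 0.025320670226514342, "batch_size": 1270,
        "hidden_size": 1739, "dropout": 0.32288113236427307,
        "num_dense_layers": 2, "weight_decay": 0.7341029644012451,
        "activation": "leaky_relu", "init": "normal",
    }
    print(fill_template(template, trial_0))
```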

Arguments Overview

Key	Value
config_yaml	None
config_toml	None
config_json	None
num_random_steps	20
max_eval	500
run_program	[['cHl0aG9uMyAudGVzdHMvbW5pc3QvdHJhaW4gLS1lcG9jaHMgJWVwb2NocyAtLWxlYXJuaW5nX3JhdGUgJWxyIC0tYmF0Y2hfc2l6ZSAlYmF0Y2hfc2l6ZSAtLWhpZGRlbl9zaXplICVoaWRkZW5f…
experiment_name	mnist_gpu_noall
mem_gb	10
parameter	[['epochs', 'range', '10', '200', 'int', 'false'], ['lr', 'range', '0.00001', '0.1', 'float', 'false'], ['batch_size', 'range', '8', '2048', 'int',
	'false'], ['hidden_size', 'range', '8', '2048', 'int', 'false'], ['dropout', 'range', '0', '0.5', 'float', 'false'], ['activation', 'fixed',
	'leaky_relu'], ['num_dense_layers', 'range', '1', '4', 'int', 'false'], ['init', 'fixed', 'normal'], ['weight_decay', 'range', '0', '1', 'float',
	'false']]
continue_previous_job	None
experiment_constraints	None
run_dir	runs
seed	None
verbose_tqdm	False
model	BOTORCH_MODULAR
gridsearch	False
occ	False
show_sixel_scatter	False
show_sixel_general	False
show_sixel_trial_index_result	False
follow	True
send_anonymized_usage_stats	True
ui_url	aHR0cHM6Ly9pbWFnZXNlZy5zY2Fkcy5kZS9vbW5pYXgvZ3VpP3BhcnRpdGlvbj1hbHBoYSZleHBlcmltZW50X25hbWU9bW5pc3RfZ3B1X25vYWxsJnJlc2VydmF0aW9uPSZhY2NvdW50PSZtZW1fZ2I…
root_venv_dir	/home/pwinkler
exclude	None
main_process_gb	8
max_nr_of_zero_results	50
abbreviate_job_names	False
orchestrator_file	None
checkout_to_latest_tested_version	False
live_share	True
disable_tqdm	False
disable_previous_job_constraint	False
workdir
occ_type	euclid
result_names	['VAL_ACC=max']
minkowski_p	2
signed_weighted_euclidean_weights
generation_strategy	None
generate_all_jobs_at_once	False
revert_to_random_when_seemingly_exhausted	True
load_data_from_existing_jobs	[]
n_estimators_randomforest	100
max_attempts_for_generation	20
external_generator	None
username	None
max_failed_jobs	0
num_cpus_main_job	None
calculate_pareto_front_of_job	[]
show_generate_time_table	False
force_choice_for_ranges	False
max_abandoned_retrial	20
share_password	None
dryrun	False
db_url	None
run_program_once	None
dont_warm_start_refitting	False
refit_on_cv	False
fit_out_of_design	False
fit_abandoned	False
dont_jit_compile	False
num_restarts	20
raw_samples	1024
max_num_of_parallel_sruns	16
no_transform_inputs	False
no_normalize_y	False
transforms	[]
num_parallel_jobs	20
worker_timeout	120
slurm_use_srun	False
time	1440
partition	alpha
reservation	None
force_local_execution	False
slurm_signal_delay_s	0
nodes_per_job	1
cpus_per_task	1
account	None
gpus	1
run_mode	local
verbose	False
verbose_break_run_search_table	False
debug	False
flame_graph	False
no_sleep	False
tests	False
show_worker_percentage_table_at_end	False
auto_exclude_defective_hosts	False
run_tests_that_fail_on_taurus	False
raise_in_eval	False
show_ram_every_n_seconds	0
show_generation_and_submission_sixel	False
just_return_defaults	False
prettyprint	False
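The `parameter` entry above stores each `--parameter` flag as a list of strings. A range parameter has the form `name range lower upper type log_scale`, and a fixed parameter has the form `name fixed value`. Below is a minimal sketch that turns these lists into typed records. It handles only the `range` and `fixed` forms that occur in this run; any other parameter type raises an error. `SearchParam` and `parse_parameters` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Union

Number = Union[int, float]


@dataclass
class SearchParam:
    name: str
    kind: str                          # "range" or "fixed"
    lower: Optional[Number] = None
    upper: Optional[Number] = None
    value_type: Optional[str] = None   # "int" or "float" for ranges
    log_scale: bool = False
    value: Optional[str] = None        # set only for fixed parameters


def parse_parameters(raw: list[list[str]]) -> list[SearchParam]:
    """Convert OmniOpt2-style parameter lists into SearchParam records."""
    params = []
    for entry in raw:
        name, kind = entry[0], entry[1]
        if kind == "range":
            lower, upper, value_type, log_scale = entry[2:6]
            cast = int if value_type == "int" else float
            params.append(SearchParam(name, kind, cast(lower), cast(upper),
                                      value_type, log_scale.lower() == "true"))
        elif kind == "fixed":
            params.append(SearchParam(name, kind, value=entry[2]))
        else:
            raise ValueError(f"unsupported parameter type {kind!r} for {name!r}")
    return params


if __name__ == "__main__":
    raw = [["epochs", "range", "10", "200", "int", "false"],
           ["lr", "range", "0.00001", "0.1", "float", "false"],
           ["activation", "fixed", "leaky_relu"]]
    for p in parse_parameters(raw):
        print(p)
```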

1753705057.7139962,20,0,0
1753705062.1682985,20,0,0
1753705062.2960618,20,0,0

timestamp,ram_usage_mb,cpu_usage_percent
1753705053,711.859375,1.5
1753705057,711.859375,1.1
1753705062,711.859375,0.9
1753705062,712.359375,3.2
1753705062,712.359375,1.5
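The resource log above records main-process RAM in MB and CPU usage in percent per sample. Below is a small sketch that summarizes peak RAM, mean CPU, and the covered time span from that CSV using only the standard library. `summarize_usage` is a hypothetical helper. The three samples sharing timestamp 1753705062 are kept as separate samples.

```python
import csv
import io
import statistics


def summarize_usage(csv_text: str) -> dict:
    """Summarize a timestamp,ram_usage_mb,cpu_usage_percent CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    ram = [float(r["ram_usage_mb"]) for r in rows]
    cpu = [float(r["cpu_usage_percent"]) for r in rows]
    return {
        "samples": len(rows),
        "peak_ram_mb": max(ram),
        "mean_cpu_percent": statistics.fmean(cpu),
        # Seconds between the first and last sample.
        "duration_s": int(rows[-1]["timestamp"]) - int(rows[0]["timestamp"]),
    }


if __name__ == "__main__":
    data = (
        "timestamp,ram_usage_mb,cpu_usage_percent\n"
        "1753705053,711.859375,1.5\n"
        "1753705057,711.859375,1.1\n"
        "1753705062,711.859375,0.9\n"
        "1753705062,712.359375,3.2\n"
        "1753705062,712.359375,1.5\n"
    )
    print(summarize_usage(data))
```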

submitit INFO (2025-07-28 14:18:16,568) - Starting with JobEnvironment(job_id=525814, hostname=c137, local_rank=0(1), node=0(1), global_rank=0(1))
submitit INFO (2025-07-28 14:18:16,570) - Loading pickle: /data/horse/ws/pwinkler-mnist_tst/omniopt/runs/mnist_gpu_noall/5/single_runs/525814/525814_submitted.pkl

Parameters: {"epochs": 43, "lr": 0.025320670226514342, "batch_size": 1270, "hidden_size": 1739, "dropout": 0.32288113236427307, "num_dense_layers": 2, "weight_decay": 0.7341029644012451, "activation": "leaky_relu", "init": "normal"}
Debug-Infos: 
========
DEBUG INFOS START:
Program-Code: python3 .tests/mnist/train --epochs 43 --learning_rate 0.02532067022651434199 --batch_size 1270 --hidden_size 1739 --dropout 0.32288113236427307129 --activation leaky_relu --num_dense_layers 2 --init normal --weight_decay 0.73410296440124511719
pwd: /data/horse/ws/pwinkler-mnist_tst/omniopt
File: .tests/mnist/train
UID: 2054851
GID: 200270
SLURM_JOB_ID: 525814
Status-Change-Time: 1753703502.0
Size: 12452 Bytes
Permissions: -rwxr-xr-x
Owner: pwinkler
Last access: 1753705105.0
Last modification: 1753703502.0
Hostname: c137
========
DEBUG INFOS END

python3 .tests/mnist/train --epochs 43 --learning_rate 0.02532067022651434199 --batch_size 1270 --hidden_size 1739 --dropout 0.32288113236427307129 --activation leaky_relu --num_dense_layers 2 --init normal --weight_decay 0.73410296440124511719
stdout:
               Hyperparameters              
╭──────────────────┬──────────────────────╮
│ Parameter        │ Value                │
├──────────────────┼──────────────────────┤
│ Device           │ cuda                 │
│ Epochs           │ 43                   │
│ Num Dense Layers │ 2                    │
│ Batch size       │ 1270                 │
│ Learning rate    │ 0.025320670226514342 │
│ Hidden size      │ 1739                 │
│ Dropout          │ 0.32288113236427307  │
│ Optimizer        │ adam                 │
│ Momentum         │ 0.9                  │
│ Weight Decay     │ 0.7341029644012451   │
│ Activation       │ leaky_relu           │
│ Init Method      │ normal               │
│ Seed             │ None                 │
╰──────────────────┴──────────────────────╯
Using device: cuda
An error occurred: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


stderr:
 Traceback (most recent call last):
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/train", line 295, in main
    ).to(args.device)
      ^^^^^^^^^^^^^^^
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 942, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/data/horse/ws/pwinkler-mnist_tst/omniopt/.tests/mnist/.torch_venv_1bdd5e1e8b/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1341, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Result: {'VAL_ACC': None}
Final-results: {'VAL_ACC': None}
EXIT_CODE: 1
submitit INFO (2025-07-28 14:18:37,144) - Job completed successfully
submitit INFO (2025-07-28 14:18:37,146) - Exiting after successful completion