OmniOpt2 v2.0.3534

Exit-Codes and Bash-scripting

What are exit-codes?

Every program on Linux returns a value to the operating system when it finishes. This value, the exit code, tells whether the program succeeded and, if not, which error may have occurred.

0 means 'everything was fine'; every other value (1 to 255) means 'something went wrong'. A program can assign a specific error or a group of errors to each exit code. OmniOpt2 does this extensively to make scripting easier.
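
As a minimal sketch of how a shell sees exit codes (not specific to OmniOpt2), the snippet below captures the exit code of three commands. The helper name `exit_code_of` is made up for illustration; `true`, `false` and `exit` are standard shell commands.

```shell
#!/bin/bash

# Hypothetical helper: runs the given command and prints its exit code.
# The `|| code=$?` form also works when the script uses `set -e`.
exit_code_of() {
    code=0
    "$@" || code=$?
    echo "$code"
}

ok_code=$(exit_code_of true)                 # `true` always succeeds
fail_code=$(exit_code_of false)              # `false` always fails with 1
custom_code=$(exit_code_of sh -c 'exit 87')  # any value from 0 to 255 is possible

echo "true: $ok_code, false: $fail_code, custom: $custom_code"
```

Directly after a command, the same value is available in the special variable `$?`.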

Exit code groups in OmniOpt2

Depending on which error occurred, if any, OmniOpt2 ends with one of the following exit codes:

Exit Code Information

Exit Code   Error Group Description
0           Seems to have worked properly
1           There was an error regarding the python-syntax. Is your python-version too old? If you are on a slurm-system, this may mean that the job cannot be started. Check stdout.
2           Wrong CLI arguments
3           Invalid exit code detected
4           Failed loading modules
5           Errors regarding toml or yaml config files
6           Error creating .logs dir
7           Probably versioning error. Try removing virtualenv and try again
8           Probably something went wrong trying to plot sixel graphics
9           Probably something went wrong trying to use or define the ax_client or executor
10          Usually only returned by dier (for debugging)
11          Required program not found (check logs)
12          Error with pip, check logs
13          Run folder already exists
14          Error installing OmniOpt2 via install_omniax.sh
15          Unimplemented error
16          Wrongly called .py files: Probably you tried to call them directly instead of over the bash file
18          test_wronggoing_stuff program not found (only --tests)
19          Something was wrong with your parameters. See output for details
20          Something went wrong installing the required modules
21          requirements.txt or test_requirements.txt not found
22          Python header-files not found
23          Loading of Environment failed
31          Basic modules could not be loaded or you cancelled loading them
32          Error regarding the --signed_weighted_euclidean_weights parameter. Check output for details
44          Continuation of previous job failed
45          Could not create log dir
47          Missing checkpoint or defective file or state files (check output)
49          Something went wrong while creating the experiment
50          Something went wrong with the --result_names option (check output)
87          Search space exhausted or search was cancelled
88          Search was done according to ax
89          All jobs failed
90          Error creating the experiment_args
91          Error creating the experiment_args
92          Error loading torch: This can happen when your disk is full
93          Failed loading module. See output for specific errors.
99          It seems like the run folder was deleted during the run
100         --mem_gb or --gpus, which must be int, has received a value that is not int
101         Error using ax_client: it was not defined where it should have been
103         --time is not in minutes or HH:MM format
104         One of the parameters --mem_gb, --time, --run_program or --experiment_name is missing
105         Continued job error: previous job has missing state files
106         --num_parallel_jobs must be equal to or larger than 1
123         Something is wrong with the --generation_strategy
130         Interrupt-Signal detected
133         Error loading --config_toml, --config_json or --config_yaml
137         OOM-Killer on Slurm-Systems
138         USR-Signal detected
142         Error in Models like THOMPSON or EMPIRICAL_BAYES_THOMPSON. Not sure why
143         Slurm-Job was cancelled
146         CONT-Signal detected
181         Error parsing --parameter. Check output for more details
191         Could not create workdir
192         Unknown data type (--tests)
193         Error in printing logs. You may be on a read only file system or your hard disk is full
199         This happens on unstable file systems when trying to write a file
203         Unsupported --model
206         Invalid orchestrator file
210         Unknown orchestrator mode
211         Git checkout failed (--checkout_to_latest_tested_version)
233         No random steps set
242         Error at fetching new trials
243         Job was not found in squeue anymore, it may have been cancelled before it ran
244         get_executor() failed. See logs for more details.
245         python3 is not installed
246         A path that should have been a file is actually a folder. Check output for more details.
247         Trying to continue a job which was started with --generation_strategy. This is currently not possible.
255         sbatch error
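
For wrapper scripts that write logs, a small lookup can turn some of these codes into readable text. This is only a sketch: `describe_exit_code` is a hypothetical helper, not part of OmniOpt2, and it covers just a handful of the codes listed above.

```shell
#!/bin/bash

# Hypothetical helper (not part of OmniOpt2): maps a few exit codes
# from the table above to their descriptions.
describe_exit_code() {
    case "$1" in
        0)   echo "Seems to have worked properly" ;;
        2)   echo "Wrong CLI arguments" ;;
        13)  echo "Run folder already exists" ;;
        87)  echo "Search space exhausted or search was cancelled" ;;
        89)  echo "All jobs failed" ;;
        130) echo "Interrupt-Signal detected" ;;
        137) echo "OOM-Killer on Slurm-Systems" ;;
        143) echo "Slurm-Job was cancelled" ;;
        255) echo "sbatch error" ;;
        *)   echo "Exit code $1: see the table above" ;;  # fallback for unlisted codes
    esac
}

describe_exit_code 87
describe_exit_code 42
```

Codes that the helper does not know fall through to the last branch, so the script still prints something useful for them.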

How to script OmniOpt2 with exit codes

This example runs OmniOpt2 and then branches on its exit code: on success it continues the run, and on exit code 87 it reports that the search space is exhausted.

#!/bin/bash
./omniopt \
	--partition=alpha \
	--experiment_name=my_experiment \
	--mem_gb=1 \
	--time=60 \
	--worker_timeout=30 \
	--max_eval=500 \
	--num_parallel_jobs=20 \
	--gpus=0 \
	--num_random_steps=20 \
	--follow \
	--show_sixel_graphics \
	--run_program="$(echo -n "bash /path/to/my_experiment/run.sh --epochs=%(epochs) --learning_rate=%(learning_rate) --layers=%(layers)" | base64 -w 0)" \
	--cpus_per_task=1 \
	--send_anonymized_usage_stats \
	--model=BOTORCH_MODULAR \
	--parameter learning_rate range 0 0.5 float \
	--parameter epochs choice 1,10,20,30,100 \
	--parameter layers fixed 10

exit_code=$? # $? is a special bash variable holding the exit code of the last command (here: ./omniopt)

if [[ $exit_code -eq 0 ]]; then
    ./omniopt --continue runs/my_experiment/0 # Run again with the same parameters, but load previous data
elif [[ $exit_code -eq 87 ]]; then # 87 = Search space exhausted
    echo "The search space was exhausted. Trying further will not find new points."
    # OmniOpt2 call for expanded search space here
fi
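
When more than two outcomes matter, a `case` statement keeps the branching readable. The sketch below groups some exit codes from the table into actions. The function name `next_action` and the action labels are assumptions made for illustration and are not part of OmniOpt2; the commented lines show how it might be wired to an actual run.

```shell
#!/bin/bash

# Hypothetical helper: decides what a wrapper script should do next,
# based on the exit code of an OmniOpt2 run.
next_action() {
    case "$1" in
        0)       echo "continue" ;;      # run worked, it can be continued
        87|88)   echo "stop" ;;          # search space exhausted or search done
        137)     echo "more_memory" ;;   # OOM-Killer: a larger --mem_gb may help
        130|143) echo "cancelled" ;;     # interrupted or Slurm job cancelled
        *)       echo "inspect_logs" ;;  # anything else: read the output
    esac
}

# Usage sketch (the actual ./omniopt call is omitted here):
#   ./omniopt ... ; exit_code=$?
#   action=$(next_action "$exit_code")
for code in 0 87 137 143 2; do
    echo "exit code $code -> $(next_action "$code")"
done
```

Keeping the decision in one function means the mapping from exit codes to actions can change without touching the OmniOpt2 call itself.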