# Hyperparameter Searches

## The Basics

Hyperparameters control different aspects of the learning process of your model. Optimizing hyperparameters is one way to improve the accuracy of your model. Spell makes it easy to automate hyperparameter searches with the `spell hyper` command.

## The Long Version

A hyperparameter is a high-level property of a machine learning model that typically governs the training process itself (e.g., learning rate, number of hidden layers in a neural network). Thus, a hyperparameter cannot generally be optimized with a single training of the model. Rather, the same model must be trained numerous times while varying the hyperparameter values to determine optimal values. Spell implements a number of features to help you automate this process.

## Anatomy of a Hyperparameter Command

The `spell hyper` command kicks off your hyperparameter search. You can choose between a grid search, a random search, or a Bayesian search.

The `spell hyper` command is very similar to the `spell run` command and takes all of the same command line options, with the addition of hyperparameter specifications. For more info on the `spell run` command, see What Is a Run.

Let's take a look at the example command below.

```
$ spell hyper grid -t K80 \
--param rate=0.001,0.01,0.1,1 \
--param layers=2,3,5,10 -- \
"python train.py --learning_rate :rate: --num_layers :layers:"
```

The first part should be familiar. We request a grid search running on `K80` machines.

Next are two `--param` options, which list the values that we want our hyperparameter search to test for each specified parameter. Here we specify two parameters, `rate` and `layers`, and the values we want for each. The way values are specified differs between the types of hyperparameter searches. For details, skip down to grid search, random search, or Bayesian search.

Finally, we have our Python command: `python train.py --learning_rate :rate: --num_layers :layers:`. The parameters in colon-bracket form, `:rate:` and `:layers:`, are replaced in individual runs with specific values of the respective parameter.
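The substitution itself is plain string replacement. Here is a minimal sketch of how one run's command could be expanded — the `substitute` helper is hypothetical, not part of Spell:

```python
# Sketch: each run's command is the template with every :name:
# placeholder replaced by that run's concrete value.
def substitute(template, params):
    for name, value in params.items():
        template = template.replace(f":{name}:", str(value))
    return template

cmd = substitute(
    "python train.py --learning_rate :rate: --num_layers :layers:",
    {"rate": 0.01, "layers": 5},
)
# cmd == "python train.py --learning_rate 0.01 --num_layers 5"
```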

## Grid Search

In grid search, a set of discrete values are provided for each hyperparameter and a run is created for all possible combinations of hyperparameters (i.e., if there are *n* hyperparameters, a run is created for each resultant *n*-tuple of the Cartesian product of the *n* hyperparameter value sets). For example:

```
$ spell hyper grid \
--param rate=0.001,0.01,0.1,1 \
--param layers=2,3,5,10 -- \
python train.py --learning_rate :rate: --num_layers :layers:
Everything up-to-date
💫 Casting hyperparameter search #59…
rate layers Run ID
0.001 2 362
0.001 3 363
0.001 5 364
0.001 10 365
0.01 2 366
0.01 3 367
0.01 5 368
0.01 10 369
0.1 2 370
0.1 3 371
0.1 5 372
0.1 10 373
1 2 374
1 3 375
1 5 376
1 10 377
```
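The sixteen runs above are simply the Cartesian product of the two value lists; you can reproduce the expansion in a few lines of Python:

```python
import itertools

rates = [0.001, 0.01, 0.1, 1]
layers = [2, 3, 5, 10]

# One run per element of the Cartesian product: 4 x 4 = 16 combinations.
combos = list(itertools.product(rates, layers))
print(len(combos))  # 16
print(combos[0])    # (0.001, 2)
```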

Hyperparameters are specified with the `--param NAME=VALUE[,VALUE,VALUE...]` option flag. `NAME` corresponds to the name of the hyperparameter. One or more comma-separated `VALUE`s can be provided after the `=`, corresponding to the values for the hyperparameter. The values can be strings, integers, or floating point numbers.

**Note**

The hyperparameter `NAME` provided must exist in the run command surrounded by colons. This tells Spell where to substitute specific values for the hyperparameter in the run command when making the individual runs for the hyperparameter search.

## Random Search

In random search, each hyperparameter is randomly sampled to determine specific values for each run. Additionally, the `--num-runs` option must be specified to indicate the total number of runs to create. Hyperparameters are specified with the `--param` option flag, and the specification can consist of either:

- **A set of discrete values**, specified with `--param NAME=VALUE[,VALUE,VALUE...]`, similar to grid search. In this case one of the discrete values is randomly selected for the hyperparameter value when constituting a run.
- **A range specification**, specified with `--param NAME=MIN:MAX[:SCALING[:TYPE]]`. In this case the hyperparameter value is randomly selected from the specified range when constituting a run. `MIN` and `MAX` are required and correspond to the minimum and maximum value of the range of this hyperparameter.

  `SCALING` is optional and can consist of 3 different values (`linear` is the default if not specified):

  - `linear`: the hyperparameter range (i.e., `MIN` to `MAX`) is sampled uniformly at random to determine a hyperparameter value.
  - `log`: the hyperparameter range is scaled logarithmically during the sampling (i.e., the range `log(MIN)` to `log(MAX)` is sampled uniformly at random and then exponentiated to yield the hyperparameter value). This results in a higher probability density for the sampling towards the lower end of the range.
  - `reverse_log`: this is the opposite scaling as that described in `log`, resulting in a higher probability density for the sampling at the higher end of the range.

  `TYPE` is optional and can consist of 2 different values (`float` is the default if not specified):

  - `float`: the resultant hyperparameter value is a floating point number.
  - `int`: the resultant hyperparameter value is an integer. If this option is specified, the randomly sampled value is rounded to the nearest integer to yield the final hyperparameter value.
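The sampling behavior described above can be sketched as follows — an illustration of the scaling semantics, not Spell's actual implementation:

```python
import math
import random

def sample(lo, hi, scaling="linear", type_="float"):
    """Sketch of range sampling for --param NAME=MIN:MAX[:SCALING[:TYPE]]."""
    if scaling == "linear":
        # Uniform over [lo, hi].
        value = random.uniform(lo, hi)
    elif scaling == "log":
        # Uniform in log space, then exponentiated: denser near lo.
        value = math.exp(random.uniform(math.log(lo), math.log(hi)))
    elif scaling == "reverse_log":
        # Mirror of log scaling: denser near hi.
        value = lo + hi - math.exp(random.uniform(math.log(lo), math.log(hi)))
    else:
        raise ValueError(f"unknown scaling: {scaling}")
    # TYPE=int rounds the sampled value to the nearest integer.
    return round(value) if type_ == "int" else value

rate = sample(0.001, 1.0, scaling="log")      # like --param rate=.001:1.0:log
layers = sample(2, 100, type_="int")          # like --param layers=2:100:linear:int
```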

An example random hyperparameter search is as follows:

```
$ spell hyper random \
--num-runs 10 \
--param rate=.001:1.0:log \
--param layers=2:100:linear:int \
--param cell=gru,lstm,rnn -- \
python train.py --learning_rate :rate: --num_layers :layers: --cell_type :cell:
Everything up-to-date
💫 Casting hyperparameter search #60…
rate layers cell Run ID
0.535637 68 lstm 378
0.192321 21 gru 379
0.501205 34 lstm 380
0.00103308 40 gru 381
0.0976437 49 gru 382
0.0131644 36 rnn 383
0.00139867 27 lstm 384
0.0274699 3 lstm 385
0.350886 9 rnn 386
0.23146 66 lstm 387
```

## Bayesian Search

Bayesian search uses the results of prior runs to try to pick new parameters to test intelligently. It will often either note that a large part of the parameter space is unexplored and pick something in that region or it will observe a prior success and pick something near that. This can help you save on the total number of iterations needed to find good parameters.

We treat a given objective function (e.g., the accuracy of your model) as a random function and, using the previously tested parameter samples and the resulting accuracy of your model after training, create a posterior distribution over that objective function. From that we create an acquisition function, which is our best guess of the potential of a specific sample. We then choose the sample that maximizes the acquisition function. There are a number of popular types of acquisition functions; our tool uses an upper confidence bound.
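As a toy illustration of the upper-confidence-bound idea (a sketch, not Spell's implementation): given a posterior mean and standard deviation for each candidate parameter setting, UCB scores candidates so that both points predicted to do well and points in unexplored, uncertain regions score highly.

```python
# Upper confidence bound: mean rewards exploitation, stddev rewards
# exploration; kappa trades one off against the other.
def ucb(mean, stddev, kappa=2.0):
    return mean + kappa * stddev

# Hypothetical posterior estimates for two candidate learning rates.
candidates = [
    {"rate": 0.01, "mean": 0.92, "stddev": 0.01},  # well explored, good
    {"rate": 0.50, "mean": 0.85, "stddev": 0.08},  # barely explored
]
best = max(candidates, key=lambda c: ucb(c["mean"], c["stddev"]))
print(best["rate"])  # 0.5 — its uncertainty makes the unexplored point win
```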

If that's a lot to take in, no need to worry: you only need to add a couple of things to a random search and we will do the rest for you.

Similar to the random search, you must specify one or more parameters via the `--param` flag. These take the form `--param NAME=MIN:MAX[:TYPE]`, where `MIN` is the lowest value the parameter is allowed to take, `MAX` is the highest, and `TYPE` is either `int` or `float`.

You must also inform the search of the name of the metric you would like to optimize via `--metric`. To learn more about using metrics with Spell, you can check out the docs on Metrics. In addition, you need to specify how Spell should interpret the observed values of this metric. This is the `--metric-agg` option, which can be `min`, `max`, `last`, or `avg`. For example, if you select the Keras metric `keras/val_acc` and aggregation type `last`, Spell will use the last validation accuracy recorded in a given run and treat that as the success of your model for those parameters. The search will attempt to maximize this value, so make sure to select a metric and aggregation type appropriately.
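The four aggregation modes reduce a run's metric history to a single score; a sketch of what each one computes (illustrative only — Spell does this for you):

```python
# Reduce a run's recorded metric values to one number per --metric-agg mode.
def aggregate(values, agg):
    if agg == "min":
        return min(values)
    if agg == "max":
        return max(values)
    if agg == "last":
        return values[-1]
    if agg == "avg":
        return sum(values) / len(values)
    raise ValueError(f"unknown aggregation: {agg}")

val_acc = [0.61, 0.74, 0.79, 0.78]  # e.g. keras/val_acc per epoch
print(aggregate(val_acc, "last"))   # 0.78
print(aggregate(val_acc, "max"))    # 0.79
```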

Lastly, in addition to the `--num-runs` option that you would specify for any hyperparameter search, Bayesian search requires you to select a number of `--parallel-runs` as well. This is the maximum number of trials that Spell will run in parallel. This reflects a tradeoff: if you choose a lower number, the search will proceed incrementally and will take longer to complete. If you choose a higher number, many runs will be in progress when a new run is launched, and the new run's parameters will be selected without the benefit of knowing how well the in-progress trials do.

An example bayesian hyperparameter search is as follows:

```
$ spell hyper bayesian \
--num-runs 12 \
--parallel-runs 3 \
--metric keras/val_acc \
--metric-agg avg \
--param rate=.001:1.0 \
--param layers=2:100:int -- \
python train.py --learning_rate :rate: --num_layers :layers:
Everything up-to-date
💫 Casting hyperparameter search #61…
rate layers Run ID
0.343882 23 388
0.294112 72 389
0.587557 64 390
```

You can also check out our blog post to see an example of bayesian search in action.

## Viewing Your Hypersearch on the Web

You can view the results of your hyperparameter run on the web.

The web visualization updates in real time, so you can see how each run is performing as they launch.

## Python API

The Spell Python API also supports creating hyperparameter searches. See Spell Python API for more information on the Spell Python API in general, and Hyperparameter Searches for the hyperparameter search functionality.