Calibration

GEB has a prebuilt calibration function. It can be used for both one- and multi-objective optimization. To run the calibration follow the steps below. Use the command below, setting the -wd command to the folder that includes your model.yml file. See also the .sh script for Linux cluster runs below.

Determine which objectives/factors you want to calibrate. Add a function which calculates the score to calibrate.py and conditionally add the function to the code in calibrate.py below. Lastly, activate the function under calibration_targets in the model.yml file. For the rest of the steps, we will assume calibration on discharge data.
```
for score in config['calibration']['calibration_targets']:
    if score == 'KGE_discharge':
        scores.append(get_KGE_discharge(run_directory, individual, config, gauges, observed_streamflow))
```
Determine the location of the gauge(s) for which you have observed discharge data. Make sure the coordinates match with the location of the river in the model. Do this by checking the upstream_area file of your model (input/routing/upstream_area) and checking the location of the gauge(s). The location of the gauge should be in the river. The observed discharge data should be stored as a .csv file where the file name is the coordinates without a comma separation (in this example: “75.8477777 17.4130555.csv”). The first column should be “date” and the second “flow”, i.e.:
```
date, flow
2001-07-21,184.8000031
```
Make sure to check which symbol is used as the seperator in the csv. file. In the calibration script, the comma (,) is assumed to be the seperator in the .csv file.
Store this file in the location below:
```
original_data/calibration/streamflow/75.8477777 17.4130555.csv
```
If this folder does not yet exist, create the folder manually and (if needed) change the path in the model.yml file.
Next, add this location under gauges in the model.yml file and under target_variables and report_hydrology. See the example of the model.yml file below.
For multiple observed locations of observed data the process is similar, just repeat step 2 to 4.
Add the parameters you want calibrated under parameters in the model.yml file. First set a min and max. After, set the variable name, which refers to the location in the overall model.yml file, e.g.:
```
expenditure_cap:
    variable: agent_settings.expected_utility.decisions.expenditure_cap
    min: 0.05
    max: 0.4
```
would setup calibration for x between 0.05 and 0.4 for the following variable:
```
agent_settings:
    expected_utility:
        decisions:
            expenditure cap: x
```
Next, set the DEAP parameters. ngen specifies the number of total generations, mu the initial population size, lambda the number of children to produce at each generation, and select_best controls the number of best individuals selected from the population for the next generation.

Lastly, run the model with the command below, or use a bash script to run it on the cluster.

geb calibrate -wd location/of/your_model.yml/folder

# An example of a Linux bash script calling the calibration. Note: the cpus-per-task command sets the number of parallel runs if multiprocessing is enabled.

#!/bin/bash
#SBATCH --job-name=
#SBATCH --output=logs/calibrate-%j.out
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=30
#SBATCH --mem=120G
#SBATCH --time=600:00:00
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=

echo $1

source ~/.bashrc

SCRIPT_DIR="$HOME/GEB/GEB_models/"
cd $SCRIPT_DIR

conda activate geeb  # activate conda environment

cd models/
geb calibrate -wd $1/base

echo "done"

# Running the bash script with your study area of choice (here the Bhima Basin):

sbatch "path_to/script/bash.sh" bhima

# Example model.yml with settings for the calibration

gauges:
    - [75.8477777, 17.41305556]
calibration:
  pre_spinup_time: 1980-01-01
  spinup_time: 1980-01-01
  start_time: 2001-01-01
  end_time: 2011-12-31
  path: calibration_multi_5
  gpus: 0
  scenario: adaptation
  monthly: false
  calibration_targets:
    KGE_discharge: 1
  DEAP:
    use_multiprocessing: true
    ngen: 10
    mu: 60
    lambda_: 25
    select_best: 10
  target_variables:
    # Variables required to calculate calibration score from cwatm, e.g. discharge at a certain gauge
    report_hydrology:
        75.8477777 17.41305556:
            varname: data.grid.discharge
            function: sample_coord,75.8477777,17.41305556
            format: csv
            save: save
    # Variables required to calculate calibration from GEB, e.g. yield ratio
    report:
        yield_ratio:
            type: farmers
            function: mean
            varname: yearly_yield_ratio[:,1]
            save: save
            format: csv
            frequency:
              every: month
              day: 1
    # The to be calibrated parameters
    parameters:
        soildepth_factor:
            variable: parameters.soildepth_factor
            min: 0.8
            max: 1.8
        preferentialFlowConstant:
            variable: parameters.preferentialFlowConstant
            min: 0.5
            max: 8
        arnoBeta_add:
            variable: parameters.arnoBeta_add
            min: 0.01
            max: 1.0
        factor_interflow:
            variable: parameters.factor_interflow
            min: 0.33
            max: 3.0
        recessionCoeff_factor:
            variable: parameters.recessionCoeff_factor
            min: 0.05
            max: 10
        manningsN:
            variable: parameters.manningsN
            min: 0.1
            max: 10.0
        lakeAFactor:
            variable: parameters.lakeAFactor
            min: 0.333
            max: 5.0
        lakeEvaFactor:
            variable: parameters.lakeEvaFactor
            min: 0.8
            max: 3.0
        max_reservoir_release_factor:
            variable: agent_settings.reservoir_operators.max_reservoir_release_factor
            min: 0.01
            max: 0.05