Settings#

The settings of the model are defined in the settings.py script.

The table below summarizes available settings.

Setting	Possible values	Default value	Description
AGGREGATE	`True` / `False`	`True`	Flag indicating whether results should be aggregated.
GROUP_BY_COLUMN	a string	`None`	The column in the ‘main’ model point set by which aggregated results are grouped.
ID_COLUMN	a string	`id`	The column in the ‘main’ model point set containing identifiers of the model points.
MULTIPROCESSING	`True` / `False`	`False`	Flag indicating whether multiple CPUs should be used for calculations.
OUTPUT_COLUMNS	empty list or list of strings	`[]`	List of variables to be included in the output. If the list is empty, all variables are included.
SAVE_DIAGNOSTIC	`True` / `False`	`True`	Flag indicating whether a diagnostic file should be created.
SAVE_LOG	`True` / `False`	`True`	Flag indicating whether a log file should be created.
SAVE_OUTPUT	`True` / `False`	`True`	Flag indicating whether an output file should be created.
T_MAX_CALCULATION	integer	`720`	The last month of the calculation.
T_MAX_OUTPUT	integer	`720`	The last month included in the output file.

AGGREGATE#

The AGGREGATE setting is a flag that determines whether results are aggregated across model points.

If AGGREGATE is set to False, results are reported separately for each model point:

<timestamp>_output.csv#

t,fund_value
0,15000.0
1,15030.0
2,15060.06
3,15090.18
0,3000.0
1,3006.0
2,3012.01
3,3018.03
0,9000.0
1,9018.0
2,9036.04
3,9054.11

There are results for 3 separate model points.

If the AGGREGATE setting is set to True, the results will be aggregated:

<timestamp>_output.csv#

t,fund_value
0,27000.0
1,27054.0
2,27108.11
3,27162.32

There is only one set of results, which is the sum over all model points.
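The relationship between the two files can be checked by hand with pandas. The sketch below is an illustration of the arithmetic only, not the package's internal implementation; it loads the individual results shown earlier and sums them for each period.

```python
import pandas as pd

# Individual results for the three model points (AGGREGATE = False)
individual = pd.DataFrame({
    "t": [0, 1, 2, 3] * 3,
    "fund_value": [
        15000.0, 15030.0, 15060.06, 15090.18,
        3000.0, 3006.0, 3012.01, 3018.03,
        9000.0, 9018.0, 9036.04, 9054.11,
    ],
})

# Summing across model points for each period gives the aggregated view
aggregated = individual.groupby("t", as_index=False)["fund_value"].sum().round(2)
print(aggregated)
```

The summed values match the aggregated file: 27000.0, 27054.0, 27108.11, and 27162.32.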

GROUP_BY_COLUMN#

The GROUP_BY_COLUMN setting is used to specify the column for grouping the aggregated results. By default, this setting is configured as None, which means that results are aggregated for all model points without grouping.

When you specify a column from the ‘main’ model point set that defines groups, the results are grouped by the values in that column.

For instance, if you want to group the results by the product_code, you can set the GROUP_BY_COLUMN in your configuration file, settings.py, as follows:

settings.py#

settings = {
    ...
    "GROUP_BY_COLUMN": "product_code",
    ...
}

Ensure that there is a corresponding column in your ‘main’ model point set, as shown in input.py:

input.py#

import pandas as pd
from cashflower import ModelPointSet

main = ModelPointSet(data=pd.DataFrame({
    "id": [1, 2, 3],
    "product_code": ["A", "B", "A"]
}))

The resulting output will contain aggregated results grouped by the specified column, as demonstrated in the following CSV output:

<timestamp>_output.csv#

t,product_code,fund_value
0,A,24000
1,A,24048
2,A,24096.1
3,A,24144.29
0,B,3000
1,B,3006
2,B,3012.01
3,B,3018.03

By setting the GROUP_BY_COLUMN appropriately, you can conveniently aggregate and group your results according to your specific needs.
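The grouped figures can be reproduced with a pandas group-by. This is a minimal sketch of the arithmetic, not the package's own code; the "id" column on the individual results is an assumption made here so that each result row can be matched to its model point.

```python
import pandas as pd

# 'main' model point set with the grouping column, as in input.py
main = pd.DataFrame({"id": [1, 2, 3], "product_code": ["A", "B", "A"]})

# Individual results per model point; the "id" column is an assumption
# added for this illustration so rows can be joined to the model points
individual = pd.DataFrame({
    "id": [1] * 4 + [2] * 4 + [3] * 4,
    "t": [0, 1, 2, 3] * 3,
    "fund_value": [
        15000.0, 15030.0, 15060.06, 15090.18,
        3000.0, 3006.0, 3012.01, 3018.03,
        9000.0, 9018.0, 9036.04, 9054.11,
    ],
})

# Attach the product code to each row, then sum within each group and period
grouped = (
    individual.merge(main, on="id")
    .groupby(["product_code", "t"], as_index=False)["fund_value"]
    .sum()
    .round(2)
)
print(grouped)
```

Model points 1 and 3 share product code A, so their fund values are added together, while model point 2 forms group B on its own.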

ID_COLUMN#

Each model point set must have a key column used to identify the model points. This column is also used to connect records when there are multiple model point sets.

By default, the column must be named id. The name can be changed using the ID_COLUMN setting.

Warning

Column names are case-sensitive: id is not the same as ID.

The default value for the ID_COLUMN setting is id.

settings.py#

settings = {
    ...
    "ID_COLUMN": "id",
    ...
}

The model point set must have a column with this name.

input.py#

import pandas as pd
from cashflower import ModelPointSet

main = ModelPointSet(data=pd.DataFrame({"id": [1, 2]}))

The key column can have a different name.

settings.py#

settings = {
    ...
    "ID_COLUMN": "policy_number",
    ...
}

The model point set must have a column with this name.

input.py#

import pandas as pd
from cashflower import ModelPointSet

main = ModelPointSet(data=pd.DataFrame({"policy_number": [1, 2]}))
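Because column names are case-sensitive, it can help to check the data before running the model. The helper below, check_id_column, is a hypothetical function written for this illustration and is not part of cashflower; it only uses pandas.

```python
import pandas as pd


def check_id_column(data: pd.DataFrame, id_column: str = "id") -> None:
    """Raise an error if the data has no column with exactly this name."""
    if id_column not in data.columns:
        raise ValueError(f"No column named '{id_column}' in the model point data.")


data = pd.DataFrame({"policy_number": [1, 2]})

# Exact match: the check passes silently
check_id_column(data, id_column="policy_number")

# A name that differs only by case is treated as a different column
try:
    check_id_column(data, id_column="POLICY_NUMBER")
except ValueError as error:
    print(error)
```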

MULTIPROCESSING#

By default, the model is evaluated sequentially, one model point after another. If the computer has multiple cores, the calculations can be performed in parallel.

If MULTIPROCESSING is turned on, the model splits the model points into several parts (as many as there are cores), calculates the parts in parallel on separate cores, and then merges the results into a single output.

This reduces the runtime. In general, the more cores are available, the faster the calculation.

MULTIPROCESSING is recommended once the model is stable, because log messages are less detailed in this mode. During development, using a single core is recommended.
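The split, calculate, and merge pattern described above can be sketched with Python's standard multiprocessing module. This is an illustration of the general idea under stated assumptions, not cashflower's actual implementation: the function names are hypothetical, and calculate_part is a placeholder that grows each fund value by 0.2% for one month.

```python
import multiprocessing


def split_into_parts(model_points, n_parts):
    """Split a list into n_parts contiguous parts of near-equal size."""
    size, remainder = divmod(len(model_points), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < remainder else 0)
        parts.append(model_points[start:end])
        start = end
    return parts


def calculate_part(part):
    """Placeholder calculation: grow each fund value by 0.2% for one month."""
    return [round(value * 1.002, 2) for value in part]


def calculate_in_parallel(model_points, n_parts):
    """Calculate each part in its own process and merge the results in order."""
    parts = split_into_parts(model_points, n_parts)
    with multiprocessing.Pool(processes=n_parts) as pool:
        results = pool.map(calculate_part, parts)
    return [value for part_result in results for value in part_result]


if __name__ == "__main__":
    fund_values = [15000.0, 3000.0, 9000.0]
    # Use no more processes than there are model points
    n_parts = min(multiprocessing.cpu_count(), len(fund_values))
    print(calculate_in_parallel(fund_values, n_parts))
```

Because pool.map returns results in the order of its inputs, the merged output keeps the original order of the model points.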

OUTPUT_COLUMNS#

By default, the model outputs all variables. If you do not need all of them, provide the list of variables that should be in the output.

The default value of the OUTPUT_COLUMNS setting is the empty list ([]). All variables are saved in the output.

settings.py#

settings = {
    ...
    "OUTPUT_COLUMNS": [],
    ...
}

If the model has 3 variables, all of them will be in the output.

model.py#

from cashflower import variable

@variable()
def a(t):
    return 1*t

@variable()
def b(t):
    return 2*t

@variable()
def c(t):
    return 3*t

The result contains all columns.

<timestamp>_output.csv#

t,a,b,c
0,0,0,0
1,1,2,3
2,2,4,6
3,3,6,9
0,0,0,0
1,1,2,3
2,2,4,6
3,3,6,9

The user can choose a subset of columns.

settings.py#

settings = {
    ...
    "OUTPUT_COLUMNS": ["a", "c"],
    ...
}

Only the chosen columns are in the output.

<timestamp>_output.csv#

t,a,c
0,0,0
1,1,3
2,2,6
3,3,9
0,0,0
1,1,3
2,2,6
3,3,9
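The selection rule (an empty list keeps everything, otherwise only the listed variables are kept) can be expressed in a few lines of pandas. The function select_output_columns is hypothetical and only mirrors the behavior described here; it is not the package's code.

```python
import pandas as pd


def select_output_columns(results: pd.DataFrame, output_columns: list) -> pd.DataFrame:
    """Keep all columns if the list is empty, otherwise 't' plus the chosen ones."""
    if not output_columns:
        return results
    return results[["t"] + output_columns]


results = pd.DataFrame({
    "t": [0, 1, 2, 3],
    "a": [0, 1, 2, 3],
    "b": [0, 2, 4, 6],
    "c": [0, 3, 6, 9],
})

print(select_output_columns(results, []).columns.tolist())
print(select_output_columns(results, ["a", "c"]).columns.tolist())
```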

SAVE_DIAGNOSTIC#

The SAVE_DIAGNOSTIC setting is a boolean flag that determines whether the model should save diagnostic information.

By default, the setting is set to True.

settings.py#

settings = {
    ...
    "SAVE_DIAGNOSTIC": True,
    ...
}

When the SAVE_DIAGNOSTIC setting is set to True, the model saves a file named <timestamp>_diagnostic.csv in the output folder:

.
└── output/
    └── <timestamp>_diagnostic.csv

If you set SAVE_DIAGNOSTIC to False, the diagnostic file will not be created.

The diagnostic file contains various pieces of information about the model’s variables, such as:

<timestamp>_diagnostic.csv#

variable,calc_order,cycle,calc_direction,type,runtime
a,1,False,irrelevant,default,5.4
c,2,False,backward,constant,2.7
b,3,False,forward,array,7.1

This file helps you understand the model’s behavior, identify the variables that take the most processing time, and optimize them for better performance.
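One way to find the slowest variables is to load the diagnostic file with pandas and sort it by the runtime column. The sketch below reads the sample rows from above out of a string so that it is self-contained; with a real run, the path of the diagnostic file would be passed to pd.read_csv instead.

```python
import io

import pandas as pd

# Sample content copied from the diagnostic file shown above
sample = io.StringIO(
    "variable,calc_order,cycle,calc_direction,type,runtime\n"
    "a,1,False,irrelevant,default,5.4\n"
    "c,2,False,backward,constant,2.7\n"
    "b,3,False,forward,array,7.1\n"
)
diagnostic = pd.read_csv(sample)

# Sort by runtime so the slowest variables appear first
slowest = diagnostic.sort_values("runtime", ascending=False)
print(slowest[["variable", "runtime"]])
```

In this sample, b has the longest runtime and would be the first candidate for optimization.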

SAVE_LOG#

The SAVE_LOG setting is a boolean flag that controls whether the model should save its log to a file.

By default, the setting is set to True.

settings.py#

settings = {
    ...
    "SAVE_LOG": True,
    ...
}

When SAVE_LOG is set to True, the model will save a file named <timestamp>_log.txt in the output folder:

.
└── output/
    └── <timestamp>_log.txt

If you change the SAVE_LOG setting to False, no log file will be created.

The log file contains the log messages that are printed to the console during the model’s execution. It provides a record of key events and settings, which is useful for troubleshooting and for tracking the model’s behavior.

Here is an example of the content of the log file (<timestamp>_log.txt):

<timestamp>_log.txt#

09:40:49 | Building model 'example'
09:40:49 | Timestamp: 20230920_094049
09:40:49 | Settings:
           AGGREGATE: True
           MULTIPROCESSING: False
           OUTPUT_COLUMNS: []
           ID_COLUMN: id
           SAVE_DIAGNOSTIC: True
           SAVE_LOG: True
           SAVE_OUTPUT: True
           T_MAX_CALCULATION: 720
           T_MAX_OUTPUT: 720
09:40:49 | Reading model components
09:40:49 | Total number of model points: 1
09:40:49 | Preparing output
09:40:49 | Finished
09:40:49 | Saving output file:
           output/20230920_094049_output.csv
09:40:49 | Saving diagnostic file:
           output/20230920_094049_diagnostic.csv
09:40:49 | Saving log file:
           output/20230920_094049_log.txt

The log file is a valuable resource for understanding the model’s execution flow and can be particularly useful for diagnosing issues or reviewing the model’s behavior at a later time.

SAVE_OUTPUT#

The SAVE_OUTPUT setting is a boolean flag that determines whether the model should save its results to a file.

By default, the setting is set to True.

settings.py#

settings = {
    ...
    "SAVE_OUTPUT": True,
    ...
}

When SAVE_OUTPUT is set to True, the model will save a file named <timestamp>_output.csv in the output folder:

.
└── output/
    └── <timestamp>_output.csv

If you change the SAVE_OUTPUT setting to False, no output file will be created.

You can use this setting to customize output file creation or perform other actions with the results, such as saving them to a database.

To create custom output files, you can utilize the output variable in the run.py script.

run.py#

import sys
from cashflower import run
from settings import settings

if __name__ == "__main__":
    output = run(settings, sys.argv)
    output.to_csv("results/my_awesome_results.csv")

The output variable contains a data frame with the results. The example above writes it to a CSV file named my_awesome_results.csv in the results folder, which must already exist:

.
└── results/
    └── my_awesome_results.csv

You can leverage this feature to tailor the output to your specific needs or further process the results as required.
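As an example of saving results to a database, the data frame can be written to SQLite with pandas. This is a sketch under stated assumptions: the output data frame below is a stand-in for the one returned by run, the table name results is arbitrary, and an in-memory database is used to keep the example self-contained.

```python
import sqlite3

import pandas as pd

# Stand-in for the data frame returned by run(); values from the sample output
output = pd.DataFrame({
    "t": [0, 1, 2, 3],
    "fund_value": [27000.0, 27054.0, 27108.11, 27162.32],
})

# In-memory database for illustration; pass a file path to persist the data
connection = sqlite3.connect(":memory:")
output.to_sql("results", connection, index=False, if_exists="replace")

# Read the row count back to confirm that the table was written
row_count = connection.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(row_count)
connection.close()
```

With SAVE_OUTPUT set to False, a step like this can replace the default CSV file entirely.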

T_MAX_CALCULATION#

The T_MAX_CALCULATION setting is the last month of the calculation.

The model calculates results for all periods from 0 to T_MAX_CALCULATION.

By default, the setting is 720 months (60 years).

T_MAX_OUTPUT#

The T_MAX_OUTPUT setting is the last month included in the output file.

By default, the model will save results for 720 months.

settings.py#

settings = {
    ...
    "T_MAX_OUTPUT": 720,
    ...
}

Changing the setting changes the number of rows in the output file.

settings.py#

settings = {
    ...
    "T_MAX_OUTPUT": 3,
    ...
}

The file contains results only for months 0 to 3.

<timestamp>_output.csv#

t,fund_value
0,27000.0
1,27054.0
2,27108.11
3,27162.32

T_MAX_OUTPUT can’t be greater than T_MAX_CALCULATION.

Warning

The model sets T_MAX_OUTPUT to min(T_MAX_OUTPUT, T_MAX_CALCULATION).
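The rule in the warning can be written as a one-line helper. The function name effective_t_max_output is hypothetical, and the row count assumes that periods run from 0 to the maximum inclusive, as in the sample output file with T_MAX_OUTPUT set to 3.

```python
def effective_t_max_output(t_max_output: int, t_max_calculation: int) -> int:
    """Return the last output month after capping it at the calculation horizon."""
    return min(t_max_output, t_max_calculation)


# The requested output horizon exceeds the calculation horizon, so it is capped
t_max = effective_t_max_output(t_max_output=720, t_max_calculation=120)
print(t_max)

# Periods run from 0 to t_max inclusive, so each result set has t_max + 1 rows
rows_per_result_set = t_max + 1
print(rows_per_result_set)
```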