Settings#
The settings of the model are defined in the settings.py script.
The table below summarizes the available settings.
Setting | Possible values | Default value | Description |
---|---|---|---|
AGGREGATE | True / False | True | Flag indicating whether results should be aggregated. |
GROUP_BY_COLUMN | a string | None | The column in the ‘main’ model point set used to group aggregated results. |
ID_COLUMN | a string | id | The column in the ‘main’ model point set containing identifiers of the model points. |
MULTIPROCESSING | True / False | False | Flag indicating whether multiple CPUs should be used for calculations. |
OUTPUT_COLUMNS | empty list or list of strings | [] | List of variables to be included in the output. If the list is empty, all variables are included. |
SAVE_DIAGNOSTIC | True / False | True | Flag indicating whether a diagnostic file should be created. |
SAVE_LOG | True / False | True | Flag indicating whether a log file should be created. |
SAVE_OUTPUT | True / False | True | Flag indicating whether an output file should be created. |
T_MAX_CALCULATION | integer | 720 | The maximal month for calculation. |
T_MAX_OUTPUT | integer | 720 | The maximal month for the output file. |
AGGREGATE#
The AGGREGATE setting is a flag that determines whether the results are aggregated across model points.
If the setting is set to False, the results are reported at the level of individual model points:
t,fund_value
0,15000.0
1,15030.0
2,15060.06
3,15090.18
0,3000.0
1,3006.0
2,3012.01
3,3018.03
0,9000.0
1,9018.0
2,9036.04
3,9054.11
There are results for 3 separate model points.
If the AGGREGATE setting is set to True, the results will be aggregated:
t,fund_value
0,27000.0
1,27054.0
2,27108.11
3,27162.32
There is only one set of results, which is the sum of the results of all model points for each period.
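The relationship between the two outputs can be sketched in plain Python. This is only an illustration of the summation per period, not the package's own implementation; the function name aggregate is hypothetical and the figures are copied from the individual results listed earlier in this section.

```python
from collections import defaultdict

# Individual results copied from the AGGREGATE = False example:
# (t, fund_value) rows for three model points.
individual_rows = [
    (0, 15000.0), (1, 15030.0), (2, 15060.06), (3, 15090.18),
    (0, 3000.0), (1, 3006.0), (2, 3012.01), (3, 3018.03),
    (0, 9000.0), (1, 9018.0), (2, 9036.04), (3, 9054.11),
]


def aggregate(rows):
    """Sum fund_value over all model points for each period t."""
    totals = defaultdict(float)
    for t, fund_value in rows:
        totals[t] += fund_value
    # Round to 2 decimals to remove floating-point noise from the sums.
    return {t: round(total, 2) for t, total in sorted(totals.items())}


aggregated = aggregate(individual_rows)
print("t,fund_value")
for t, fund_value in aggregated.items():
    print(f"{t},{fund_value}")
```

Each aggregated value is the sum of the three individual values for the same t, for example 15000.0 + 3000.0 + 9000.0 = 27000.0 for t = 0.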
GROUP_BY_COLUMN#
The GROUP_BY_COLUMN setting is used to specify the column for grouping the aggregated results.
By default, this setting is configured as None, which means that results are aggregated for all model points without grouping.
When you specify a column from the ‘main’ model point set that defines groups, the results will be grouped based on the values in this column.
For instance, if you want to group the results by the product_code, you can set the GROUP_BY_COLUMN in your configuration file, settings.py, as follows:
settings = {
    ...
    "GROUP_BY_COLUMN": "product_code",
    ...
}
Ensure that there is a corresponding column in your ‘main’ model point set, as shown in input.py:
import pandas as pd
from cashflower import ModelPointSet

main = ModelPointSet(data=pd.DataFrame({
    "id": [1, 2, 3],
    "product_code": ["A", "B", "A"]
}))
The resulting output will contain aggregated results grouped by the specified column, as demonstrated in the following CSV output:
t,product_code,fund_value
0,A,24000
1,A,24048
2,A,24096.1
3,A,24144.29
0,B,3000
1,B,3006
2,B,3012.01
3,B,3018.03
By setting the GROUP_BY_COLUMN appropriately, you can aggregate and group your results according to your specific needs.
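The grouping can be sketched in plain Python as a sum per (group, t) pair. This is an illustration rather than the package's implementation. The function name aggregate_by_group is hypothetical, and the assignment of the individual fund values to ids 1, 2 and 3 is an assumption chosen to be consistent with the grouped output shown in this section.

```python
from collections import defaultdict

# Group of each model point, as in the 'main' model point set.
product_code_by_id = {1: "A", 2: "B", 3: "A"}

# Assumed per-model-point results as (id, t, fund_value).
individual_rows = [
    (1, 0, 15000.0), (1, 1, 15030.0), (1, 2, 15060.06), (1, 3, 15090.18),
    (2, 0, 3000.0), (2, 1, 3006.0), (2, 2, 3012.01), (2, 3, 3018.03),
    (3, 0, 9000.0), (3, 1, 9018.0), (3, 2, 9036.04), (3, 3, 9054.11),
]


def aggregate_by_group(rows, group_by_id):
    """Sum fund_value per (group, t) instead of over all model points."""
    totals = defaultdict(float)
    for model_point_id, t, fund_value in rows:
        totals[(group_by_id[model_point_id], t)] += fund_value
    # Round to 2 decimals to remove floating-point noise from the sums.
    return {key: round(total, 2) for key, total in sorted(totals.items())}


grouped = aggregate_by_group(individual_rows, product_code_by_id)
print("t,product_code,fund_value")
for (product_code, t), fund_value in grouped.items():
    print(f"{t},{product_code},{fund_value}")
```

Model points 1 and 3 share product code A, so their values are added together, while model point 2 forms group B on its own.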
ID_COLUMN#
Each model point set must have a key column used to identify model points. This column is also used to connect records when there are multiple model point sets.
By default, the column must be named id.
The name can be changed using the ID_COLUMN setting.
Warning
Column names are case-sensitive: id is not the same as ID.
The default value for the ID_COLUMN setting is id.
settings = {
    ...
    "ID_COLUMN": "id",
    ...
}
The model point set must have a column with this name.
import pandas as pd
from cashflower import ModelPointSet

main = ModelPointSet(data=pd.DataFrame({"id": [1, 2]}))
The key column can have a different name.
settings = {
    ...
    "ID_COLUMN": "policy_number",
    ...
}
The model point set must have a column with this name.
import pandas as pd
from cashflower import ModelPointSet

main = ModelPointSet(data=pd.DataFrame({"policy_number": [1, 2]}))
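To illustrate how a key column can connect records across model point sets, here is a minimal sketch in plain Python. It does not use the package's API: the second set named coverage, its sum_assured column, the age column and the helper records_for are all hypothetical.

```python
# Name of the key column, as it would be configured in ID_COLUMN.
ID_COLUMN = "policy_number"

# Hypothetical 'main' model point set: one record per model point.
main = [
    {"policy_number": 1, "age": 40},
    {"policy_number": 2, "age": 55},
]

# Hypothetical second set: possibly several records per model point.
coverage = [
    {"policy_number": 1, "sum_assured": 100_000},
    {"policy_number": 1, "sum_assured": 50_000},
    {"policy_number": 2, "sum_assured": 20_000},
]


def records_for(model_point, other_set, id_column=ID_COLUMN):
    """Return the records in other_set that share the model point's key."""
    key = model_point[id_column]
    return [record for record in other_set if record[id_column] == key]


for model_point in main:
    linked = records_for(model_point, coverage)
    print(model_point[ID_COLUMN], [r["sum_assured"] for r in linked])

# Column names are case-sensitive: this lookup finds nothing.
print("POLICY_NUMBER" in main[0])
```

The same key column name has to appear in every set for the records to be matched.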
MULTIPROCESSING#
By default, the model is evaluated sequentially, one model point after another. If the computer has multiple cores, it’s possible to perform calculations in parallel.
If MULTIPROCESSING is turned on, the model splits the model points into several parts (as many as the number of cores), calculates them in parallel on separate cores, and then merges the results into a single output.
This decreases the runtime: the more cores, the faster the calculation.
It is recommended to use MULTIPROCESSING when the model is stable, because the log messages are less detailed.
For the development phase, it is recommended to use a single core.
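The split-and-merge idea can be sketched as follows. This is an assumption-laden illustration, not the package's code: split_into_parts, calculate_part and run_in_parts are hypothetical names, the 0.2% growth is a placeholder calculation, and the built-in map is used so that the parts are processed one after another here.

```python
import os


def split_into_parts(model_points, n_parts):
    """Split a list into n_parts contiguous parts of near-equal size."""
    size, remainder = divmod(len(model_points), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `remainder` parts receive one extra element.
        end = start + size + (1 if i < remainder else 0)
        parts.append(model_points[start:end])
        start = end
    return parts


def calculate_part(part):
    """Placeholder calculation: grow each fund value by a made-up 0.2%."""
    return [fund_value * 1.002 for fund_value in part]


def run_in_parts(model_points, n_parts, map_fn=map):
    """Calculate each part with map_fn, then merge results in input order."""
    parts = split_into_parts(model_points, n_parts)
    merged = []
    for part_result in map_fn(calculate_part, parts):
        merged.extend(part_result)
    return merged


n_cores = os.cpu_count() or 1
results = run_in_parts([15000.0, 3000.0, 9000.0], n_parts=n_cores)
print(results)
```

To actually use several cores, map_fn could be the map method of a concurrent.futures.ProcessPoolExecutor, created inside an if __name__ == "__main__": guard. The merged result is the same as in a sequential run because the parts are combined in their original order.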
OUTPUT_COLUMNS#
By default, the model outputs all variables. If you do not need all of them, provide the list of variables that should be in the output.
The default value of the OUTPUT_COLUMNS setting is the empty list ([]), which means that all variables are saved in the output.
settings = {
    ...
    "OUTPUT_COLUMNS": [],
    ...
}
If the model has 3 variables, all of them will be in the output.
from cashflower import variable

@variable()
def a(t):
    return 1*t

@variable()
def b(t):
    return 2*t

@variable()
def c(t):
    return 3*t
The result contains all columns.
t,a,b,c
0,0,0,0
1,1,2,3
2,2,4,6
3,3,6,9
0,0,0,0
1,1,2,3
2,2,4,6
3,3,6,9
The user can choose a subset of columns.
settings = {
...
"OUTPUT_COLUMNS": ["a", "c"],
...
}
Only the chosen columns are in the output.
t,a,c
0,0,0
1,1,3
2,2,6
3,3,9
0,0,0
1,1,3
2,2,6
3,3,9
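The selection rule can be sketched in plain Python. This is an illustration only; the helper filter_columns is hypothetical, and keeping the t column in every case is inferred from the example outputs in this section.

```python
# Results for one model point with variables a, b and c, as in the example.
rows = [{"t": t, "a": 1 * t, "b": 2 * t, "c": 3 * t} for t in range(4)]


def filter_columns(rows, output_columns):
    """Keep all columns if output_columns is empty, else t plus the chosen ones."""
    if not output_columns:
        return [dict(row) for row in rows]
    keep = ["t"] + list(output_columns)
    return [{name: row[name] for name in keep} for row in rows]


print(filter_columns(rows, [])[1])
print(filter_columns(rows, ["a", "c"])[1])
```

With an empty list, every variable is kept; with ["a", "c"], the b column is dropped from each row.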
SAVE_DIAGNOSTIC#
The SAVE_DIAGNOSTIC setting is a boolean flag that determines whether the model should save diagnostic information.
By default, the setting is set to True.
settings = {
    ...
    "SAVE_DIAGNOSTIC": True,
    ...
}
When the SAVE_DIAGNOSTIC setting is set to True, the model saves a file named <timestamp>_diagnostic.csv in the output folder:
.
└── output/
    └── <timestamp>_diagnostic.csv
If you set SAVE_DIAGNOSTIC to False, the diagnostic file will not be created.
The diagnostic file contains various pieces of information about the model’s variables, such as:
variable,calc_order,cycle,calc_direction,type,runtime
a,1,False,irrelevant,default,5.4
c,2,False,backward,constant,2.7
b,3,False,forward,array,7.1
This file can be valuable for gaining insights into the model’s behavior, identifying the variables that require the most processing time, and optimizing them for better performance.
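As a small illustration of how the diagnostic file might be used, the sketch below reads the sample content shown in this section with the standard csv module and lists the variables from slowest to fastest. The sample is embedded as a string so the example is self-contained; in practice the file would be opened from the output folder.

```python
import csv
import io

# Sample content copied from the diagnostic file example.
diagnostic_csv = """variable,calc_order,cycle,calc_direction,type,runtime
a,1,False,irrelevant,default,5.4
c,2,False,backward,constant,2.7
b,3,False,forward,array,7.1
"""

rows = list(csv.DictReader(io.StringIO(diagnostic_csv)))

# Sort by runtime, largest first, to find optimization candidates.
slowest_first = sorted(rows, key=lambda row: float(row["runtime"]), reverse=True)
for row in slowest_first:
    print(row["variable"], row["runtime"])
```

In this sample, variable b has the largest runtime and would be the first candidate for optimization.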
SAVE_LOG#
The SAVE_LOG setting is a boolean flag that controls whether the model should save its log to a file.
By default, the setting is set to True.
settings = {
    ...
    "SAVE_LOG": True,
    ...
}
When SAVE_LOG is set to True, the model will save a file named <timestamp>_log.txt in the output folder:
.
└── output/
    └── <timestamp>_log.txt
If you change the SAVE_LOG setting to False, no log file will be created.
The log file contains saved log messages that are printed to the console during the model’s execution. It provides a record of key events and settings, which can be valuable for troubleshooting and tracking the model’s behavior.
Here is an example of the content of the log file (<timestamp>_log.txt):
09:40:49 | Building model 'example'
09:40:49 | Timestamp: 20230920_094049
09:40:49 | Settings:
AGGREGATE: True
MULTIPROCESSING: False
OUTPUT_COLUMNS: []
ID_COLUMN: id
SAVE_DIAGNOSTIC: True
SAVE_LOG: True
SAVE_OUTPUT: True
T_MAX_CALCULATION: 720
T_MAX_OUTPUT: 720
09:40:49 | Reading model components
09:40:49 | Total number of model points: 1
09:40:49 | Preparing output
09:40:49 | Finished
09:40:49 | Saving output file:
output/20230920_094049_output.csv
09:40:49 | Saving diagnostic file:
output/20230920_094049_diagnostic.csv
09:40:49 | Saving log file:
output/20230920_094049_log.txt
The log file is a valuable resource for understanding the model’s execution flow and can be particularly useful for diagnosing issues or reviewing the model’s behavior at a later time.
SAVE_OUTPUT#
The SAVE_OUTPUT setting is a boolean flag that determines whether the model should save its results to a file.
By default, the setting is set to True.
settings = {
    ...
    "SAVE_OUTPUT": True,
    ...
}
When SAVE_OUTPUT is set to True, the model will save a file named <timestamp>_output.csv in the output folder:
.
└── output/
    └── <timestamp>_output.csv
If you change the SAVE_OUTPUT setting to False, no output file will be created.
You can use this setting to customize output file creation or perform other actions with the results, such as saving them to a database.
To create custom output files, you can use the output variable in the run.py script.
if __name__ == "__main__":
    output = run(settings, sys.argv)
    # The results folder must already exist; to_csv does not create it.
    output.to_csv("results/my_awesome_results.csv")
The output variable contains a data frame with the results. In the example above, it will create a CSV file named my_awesome_results.csv in the results folder:
.
└── results/
    └── my_awesome_results.csv
You can leverage this feature to tailor the output to your specific needs or further process the results as required.
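For the database case mentioned in this section, a minimal sketch using the standard sqlite3 module might look like this. The table name results, the in-memory database and the list of tuples standing in for the rows of the output data frame are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical stand-in for the rows of the output data frame.
rows = [(0, 27000.0), (1, 27054.0), (2, 27108.11), (3, 27162.32)]

# An in-memory database keeps the example self-contained;
# pass a file path instead to persist the results.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE results (t INTEGER, fund_value REAL)")
connection.executemany("INSERT INTO results VALUES (?, ?)", rows)
connection.commit()

# Read the rows back to confirm they were stored.
stored = connection.execute(
    "SELECT t, fund_value FROM results ORDER BY t"
).fetchall()
print(stored)
connection.close()
```

Since the output variable is a pandas data frame, its to_sql method, which accepts a sqlite3 connection, could replace the manual table creation and inserts.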
T_MAX_CALCULATION#
The T_MAX_CALCULATION setting is the maximal month of the calculation.
The model will calculate results for all time periods from 0 to T_MAX_CALCULATION.
By default, the setting is set to 720 months (60 years).
T_MAX_OUTPUT#
The T_MAX_OUTPUT setting is the maximal month in the output file.
By default, the model will save results for 720 months.
settings = {
    ...
    "T_MAX_OUTPUT": 720,
    ...
}
If the setting is changed, the number of rows in the output file changes accordingly.
settings = {
    ...
    "T_MAX_OUTPUT": 3,
    ...
}
The file contains only the results for t from 0 to 3.
t,fund_value
0,27000.0
1,27054.0
2,27108.11
3,27162.32
T_MAX_OUTPUT can’t be greater than T_MAX_CALCULATION.
Warning
The model will set T_MAX_OUTPUT to min(T_MAX_OUTPUT, T_MAX_CALCULATION).
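The capping rule can be sketched in a few lines of Python. The helper output_periods is hypothetical, and the inclusive upper bound is inferred from the example output in this section, where a T_MAX_OUTPUT of 3 yields rows for t = 0, 1, 2 and 3.

```python
def output_periods(t_max_output, t_max_calculation):
    """Return the periods written to the output, from 0 to the capped maximum."""
    effective_t_max_output = min(t_max_output, t_max_calculation)
    # The upper bound is inclusive, so there is one more row than the setting.
    return list(range(effective_t_max_output + 1))


print(output_periods(3, 720))
print(len(output_periods(720, 720)))
print(len(output_periods(1000, 720)))
```

Under this assumption, the default settings give 721 rows per model point or group, and a T_MAX_OUTPUT above T_MAX_CALCULATION has no additional effect.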