Learning
- learning.changepoint_scores(df, feats, target, d1, d2, w_train, w_val, w_test)
Given a dataframe and a reference interval in which a changepoint may lie, trains a regression model on a window before the reference interval, validates the model on a window before the reference interval, and tests it on a window after the reference interval.
- Args:
df: The input dataframe.
feats: A list of names of columns of df indicating the feature variables.
target: The name of a column of df indicating the dependent variable.
d1: The first date in the reference interval.
d2: The last date in the reference interval.
w_train: The number of days defining the training set.
w_val: The number of days defining the validation set.
w_test: The number of days defining the test set.
- Returns:
y_pred_train: The array of predicted values in the training set.
score_train: An array of scores for the training set: the coefficient of determination R², the mean absolute error, the mean error, the mean absolute percentage error and the mean percentage error.
y_pred_val: The array of predicted values in the validation set.
score_val: An array of the same five scores for the validation set.
y_pred_test: The array of predicted values in the test set.
score_test: An array of the same five scores for the test set.
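The docstring does not state exactly where the three windows sit relative to `d1` and `d2`. The sketch below assumes one plausible layout: the validation window ends the day before `d1`, the training window ends the day before the validation window, and the test window starts the day after `d2`, giving `[train][val][d1 ... d2][test]`. The helper `split_windows` is hypothetical and is not part of the `learning` module.

```python
from datetime import date, timedelta


def split_windows(d1, d2, w_train, w_val, w_test):
    """Hypothetical helper: inclusive (start, end) date pairs for the training,
    validation and test windows around the reference interval [d1, d2].

    Assumed layout (not confirmed by the docstring): [train][val][d1 ... d2][test]
    """
    one_day = timedelta(days=1)
    # Validation window: the w_val days ending the day before d1.
    val_end = d1 - one_day
    val_start = val_end - timedelta(days=w_val - 1)
    # Training window: the w_train days ending the day before the validation window.
    train_end = val_start - one_day
    train_start = train_end - timedelta(days=w_train - 1)
    # Test window: the w_test days starting the day after d2.
    test_start = d2 + one_day
    test_end = test_start + timedelta(days=w_test - 1)
    return (train_start, train_end), (val_start, val_end), (test_start, test_end)


train, val, test = split_windows(
    date(2020, 3, 10), date(2020, 3, 12), w_train=5, w_val=2, w_test=3
)
print(train, val, test)
```

With a date-indexed dataframe, each pair could then select its rows with `df.loc[start:end]`, whose label slicing includes both endpoints.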
- learning.fit_linear_model(df, feats, target, a=0.0001, deg=3)
Fits a Ridge regression model with polynomial features on a given dataframe, and returns the model, the predicted values and the associated scores.
- Args:
df: The input dataframe.
feats: A list of names of columns of df. These are the feature variables.
target: The name of a column of df corresponding to the dependent variable.
a: A positive float giving the regularization strength, i.e. the weight of the l2-norm penalty added to the linear least-squares loss function.
deg: The degree of the regression polynomial.
- Returns:
pipeline: The regression model, an instance of Pipeline.
y_pred: An array with the predicted values.
r_sq: The coefficient of determination R².
mae: The mean absolute error.
me: The mean error.
mape: The mean absolute percentage error.
mpe: The mean percentage error.
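A minimal sketch of what this function describes, assuming scikit-learn's `PolynomialFeatures` followed by `Ridge` inside a `Pipeline`. The function name `fit_linear_model_sketch`, the sign convention for the error (true minus predicted) and the scaling of the percentage errors to percent are assumptions for illustration, not taken from the module.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def fit_linear_model_sketch(df, feats, target, a=0.0001, deg=3):
    """Hypothetical sketch: Ridge regression on polynomial features, plus five scores."""
    X = df[feats].to_numpy()
    y = df[target].to_numpy()
    # Polynomial expansion of the features, then l2-regularized least squares.
    pipeline = make_pipeline(PolynomialFeatures(degree=deg), Ridge(alpha=a))
    pipeline.fit(X, y)
    y_pred = pipeline.predict(X)
    err = y - y_pred  # assumed sign convention: true minus predicted
    r_sq = r2_score(y, y_pred)
    mae = mean_absolute_error(y, y_pred)
    me = float(np.mean(err))
    # Percentage errors (assumed to be in percent) are undefined if y contains zeros.
    mape = float(100 * np.mean(np.abs(err / y)))
    mpe = float(100 * np.mean(err / y))
    return pipeline, y_pred, r_sq, mae, me, mape, mpe


# Usage: a noiseless quadratic should be fitted almost exactly with deg=2.
df = pd.DataFrame({"x": np.arange(1.0, 11.0)})
df["y"] = 1.0 + df["x"] ** 2
pipeline, y_pred, r_sq, mae, me, mape, mpe = fit_linear_model_sketch(df, ["x"], "y", deg=2)
```

Because `Ridge` fits an unpenalized intercept by default, the mean error on the training data is zero up to floating-point rounding, so `me` is mostly informative on held-out data.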
- learning.get_line_and_slope(values)
Fits a straight line to the two-dimensional graph of a regularly sampled time series, given as a sequence of real values.
- Args:
values: A list of real values.
- Returns:
line: The list of values predicted by the linear model.
slope: The slope of the line.
intercept: The intercept of the line.
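A sketch of the line fit using closed-form ordinary least squares, assuming the time axis is the position in the sequence (0, 1, ..., n-1). The module may use a different time axis or a library routine; `get_line_and_slope_sketch` is a hypothetical name.

```python
def get_line_and_slope_sketch(values):
    """Hypothetical sketch: least-squares line through (i, values[i]) for i = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a line")
    xs = range(n)  # assumed time axis: positions in the sequence
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Ordinary least squares: slope = cov(x, y) / var(x).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    line = [intercept + slope * x for x in xs]
    return line, slope, intercept


line, slope, intercept = get_line_and_slope_sketch([1.0, 3.0, 5.0, 7.0])
print(slope, intercept)  # prints 2.0 1.0
```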
- learning.predict(df_test, model, feats, target)
Applies a regression model to a dataframe to predict values of the dependent variable from the given features.
- Args:
df_test: The input dataframe.
model: The regression model, an instance of Pipeline.
feats: A list of strings, each the name of a column of df_test.
target: The name of the column of df_test corresponding to the dependent variable.
- Returns:
y_pred: Array of predicted values.
- learning.predict_on_sliding_windows(df, win_size, step, model, feats, target)
Given a regression model, predicts values at each position of a sliding window over a dataframe, and returns the scores, the predictions and the windows.
- Args:
df: The input dataframe.
win_size: The size of the sliding window, as a number of days.
step: The sliding step, i.e. how far the window advances between consecutive positions.
model: The regression model.
feats: A list of names of columns of df indicating the feature variables.
target: The name of a column of df indicating the dependent variable.
- Returns:
scores: An array of arrays of scores, one array for each window, each containing the coefficient of determination R², the mean absolute error, the mean error, the mean absolute percentage error and the mean percentage error.
preds_test: A list of predictions: one list of predicted values for each window.
windows: A list of starting/ending dates, one for each window.
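A sketch of how the windows could be enumerated, assuming both `win_size` and `step` are measured in days and that only windows lying entirely inside the data's date range are kept. The helper `sliding_windows` is hypothetical; in the documented function each window's rows would then be passed to the model and scored.

```python
from datetime import date, timedelta


def sliding_windows(first_day, last_day, win_size, step):
    """Hypothetical helper: inclusive (start, end) pairs of win_size days,
    advanced by step days, that fit entirely within [first_day, last_day]."""
    if win_size < 1 or step < 1:
        raise ValueError("win_size and step must be positive")
    span = timedelta(days=win_size - 1)
    windows = []
    start = first_day
    while start + span <= last_day:
        windows.append((start, start + span))
        start += timedelta(days=step)  # assumed: step is a number of days
    return windows


wins = sliding_windows(date(2021, 1, 1), date(2021, 1, 10), win_size=4, step=3)
```

With `step` smaller than `win_size`, consecutive windows overlap by `win_size - step` days, as in the usage above.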
- learning.train_on_reference_points(df, w_train, ref_points, feats, target, random_state=0)
Trains a regression model on segments of a dataframe. Each segment is defined by a starting date, and all segments share the same duration. In each segment, a subset of points is chosen at random as the training set, and the remaining points form the validation set.
- Args:
df: The input dataframe.
w_train: The duration, as a number of days, of each segment on which the model is trained.
ref_points: A list containing the starting date of each segment.
feats: A list of names of columns of df corresponding to the feature variables.
target: The name of a column of df corresponding to the dependent variable.
random_state: Seed for the random number generator used to select the validation points within each segment.
- Returns:
model: The regression model, an instance of Pipeline.
training_scores: An array of scores for the training set: the coefficient of determination R², the mean absolute error, the mean error and the mean absolute percentage error.
validation_scores: An array of the same four scores for the validation set.
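A sketch of the per-segment random split, assuming each segment covers the half-open range `[ref, ref + w_train days)` and that a fixed fraction of each segment goes to validation. The helper `split_segments` and its `val_fraction` parameter are hypothetical; the docstring does not say how large the validation subset is.

```python
import random
from datetime import date, timedelta


def split_segments(dates, ref_points, w_train, val_fraction=0.2, random_state=0):
    """Hypothetical helper: for each segment starting at a reference point,
    randomly assign val_fraction of its dates to validation, the rest to training."""
    rng = random.Random(random_state)  # seeded so the split is reproducible
    train, val = [], []
    for ref in ref_points:
        end = ref + timedelta(days=w_train)  # assumed half-open segment [ref, end)
        segment = [d for d in dates if ref <= d < end]
        n_val = int(round(val_fraction * len(segment)))
        val_set = set(rng.sample(segment, n_val))
        for d in segment:
            (val if d in val_set else train).append(d)
    return train, val


dates = [date(2022, 1, 1) + timedelta(days=i) for i in range(30)]
ref_points = [date(2022, 1, 1), date(2022, 1, 15)]
train, val = split_segments(dates, ref_points, w_train=10, random_state=0)
```

Sampling inside each segment, rather than over the pooled dates, keeps every segment represented in both sets.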