Learning

learning.changepoint_scores(df, feats, target, d1, d2, w_train, w_val, w_test)

Given a dataframe and a reference interval in which a changepoint may lie, trains a regression model on a window before the reference interval, validates the model on a window that also precedes the reference interval, and tests the model on a window after the reference interval.

Args:
    df: The input dataframe.
    feats: A list of names of columns of df indicating the feature variables.
    target: The name of a column of df indicating the dependent variable.
    d1: The first date in the reference interval.
    d2: The last date in the reference interval.
    w_train: The number of days defining the training set.
    w_val: The number of days defining the validation set.
    w_test: The number of days defining the test set.
Returns:
    y_pred_train: The array of predicted values in the training set.
    score_train: An array containing scores for the training set: the coefficient of determination “R squared”, the mean absolute error, the mean error, the mean absolute percentage error, the mean percentage error.
    y_pred_val: The array of predicted values in the validation set.
    score_val: An array containing the same five scores for the validation set.
    y_pred_test: The array of predicted values in the test set.
    score_test: An array containing the same five scores for the test set.
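
The window arithmetic implied by d1, d2, w_train, w_val and w_test can be sketched as follows. This is only an illustration: the helper name changepoint_windows is hypothetical, and the layout (training window, then validation window ending at d1, then the test window starting the day after d2) is an assumption, since the description above says only that training and validation precede the interval and testing follows it.

```python
from datetime import date, timedelta


def changepoint_windows(d1, d2, w_train, w_val, w_test):
    """Return (start, end) date pairs for the train, validation and test windows.

    Assumed layout (not confirmed by the source):
        [train][validation] d1 ... d2 [test]
    Each pair is half-open: start is included, end is excluded.
    """
    val_end = d1
    val_start = val_end - timedelta(days=w_val)
    train_end = val_start
    train_start = train_end - timedelta(days=w_train)
    test_start = d2 + timedelta(days=1)  # first day after the reference interval
    test_end = test_start + timedelta(days=w_test)
    return (train_start, train_end), (val_start, val_end), (test_start, test_end)


train, val, test = changepoint_windows(
    date(2021, 3, 1), date(2021, 3, 7), w_train=30, w_val=7, w_test=7
)
```

If the library places the validation window elsewhere, only the two lines computing val_start and val_end would need to change.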

learning.fit_linear_model(df, feats, target, a=0.0001, deg=3)

Fits a regression model on a given dataframe, and returns the model, the predicted values and the associated scores. Applies Ridge Regression with polynomial features.

Args:
    df: The input dataframe.
    feats: A list of names of columns of df. These are the feature variables.
    target: The name of a column of df corresponding to the dependent variable.
    a: A positive float. Regularization strength for the linear least squares loss function, where the regularization term is given by the l2-norm.
    deg: The degree of the regression polynomial.
Returns:
    pipeline: The regression model. This is an instance of Pipeline.
    y_pred: An array with the predicted values.
    r_sq: The coefficient of determination “R squared”.
    mae: The mean absolute error.
    me: The mean error.
    mape: The mean absolute percentage error.
    mpe: The mean percentage error.
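
A minimal sketch of Ridge Regression with polynomial features, assuming the Pipeline mentioned above is scikit-learn's. The function name fit_poly_ridge, the step names, the use of plain arrays instead of a dataframe, and the sign convention of the errors (prediction minus observation) are all assumptions made for illustration, not the library's implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


def fit_poly_ridge(X, y, a=0.0001, deg=3):
    """Fit polynomial features followed by an l2-regularized linear model."""
    pipeline = Pipeline([
        ("poly", PolynomialFeatures(degree=deg)),  # expand features up to degree deg
        ("ridge", Ridge(alpha=a)),                 # a is the regularization strength
    ])
    pipeline.fit(X, y)
    y_pred = pipeline.predict(X)

    err = y_pred - y                  # assumed sign convention
    r_sq = pipeline.score(X, y)       # coefficient of determination
    mae = np.mean(np.abs(err))        # mean absolute error
    me = np.mean(err)                 # mean error
    mape = np.mean(np.abs(err / y))   # mean absolute percentage error (y must be nonzero)
    mpe = np.mean(err / y)            # mean percentage error
    return pipeline, y_pred, r_sq, mae, me, mape, mpe


# Noise-free cubic data, shifted so that y never reaches zero.
x = np.linspace(-1.0, 1.0, 50)
X = x.reshape(-1, 1)
y = 5.0 + 2.0 * x - x ** 3
pipeline, y_pred, r_sq, mae, me, mape, mpe = fit_poly_ridge(X, y)
```

The percentage-based scores divide by the observed values, so they are only meaningful when the target stays away from zero.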

learning.get_line_and_slope(values)

Fits a line on the 2-dimensional graph of a regular time series, defined by a sequence of real values.

Args:
    values: A list of real values.
Returns:
    line: The list of values as predicted by the linear model.
    slope: Slope of the line.
    intercept: Intercept of the line.
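
One way to fit such a line is an ordinary least-squares fit of the values against their positions. The sketch below assumes the time axis is the index 0, 1, 2, ... (a natural reading of "regular time series", but not stated in the source); the name line_and_slope is hypothetical.

```python
import numpy as np


def line_and_slope(values):
    """Least-squares line through the points (0, v0), (1, v1), ..."""
    x = np.arange(len(values))                   # assumed time axis: 0, 1, 2, ...
    slope, intercept = np.polyfit(x, values, 1)  # degree-1 polynomial fit
    line = (slope * x + intercept).tolist()      # fitted value at each position
    return line, slope, intercept


line, slope, intercept = line_and_slope([1.0, 3.0, 5.0, 7.0])
```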

learning.predict(df_test, model, feats, target)

Applies a regression model to predict values of a dependent variable for a given dataframe and given features.

Args:
    df_test: The input dataframe.
    model: The regression model. This is an instance of Pipeline.
    feats: A list of strings: each string is the name of a column of df_test.
    target: The name of the column of df_test corresponding to the dependent variable.
Returns:
    y_pred: Array of predicted values.

learning.predict_on_sliding_windows(df, win_size, step, model, feats, target)

Given a regression model, predicts values on a sliding window in a dataframe and outputs scores, a list of predictions and a list of windows.

Args:
    df: The input dataframe.
    win_size: The size of the sliding window, as a number of days.
    step: The sliding step.
    model: The regression model.
    feats: A list of names of columns of df indicating the feature variables.
    target: The name of a column of df indicating the dependent variable.
Returns:
    scores: An array of arrays of scores: one array for each window containing the coefficient of determination “R squared”, the mean absolute error, the mean error, the mean absolute percentage error, the mean percentage error.
    preds_test: A list of predictions: one list of predicted values for each window.
    windows: A list of starting/ending dates: one for each window.
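
The windows themselves can be pictured with the short sketch below. It assumes that step, like win_size, is measured in days and that only windows lying entirely inside the data range are kept; neither detail is confirmed by the source, and the helper name sliding_windows is hypothetical.

```python
from datetime import date, timedelta


def sliding_windows(start, end, win_size, step):
    """List (window_start, window_end) pairs of win_size days, advanced by step days.

    Assumption: windows that would extend past `end` are dropped.
    """
    if step <= 0:
        raise ValueError("step must be a positive number of days")
    windows = []
    current = start
    while current + timedelta(days=win_size) <= end:
        windows.append((current, current + timedelta(days=win_size)))
        current += timedelta(days=step)
    return windows


windows = sliding_windows(date(2021, 1, 1), date(2021, 1, 15), win_size=7, step=3)
```

With a step smaller than win_size, as here, consecutive windows overlap, so each observation can contribute to more than one array of scores.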

learning.train_on_reference_points(df, w_train, ref_points, feats, target, random_state=0)

Trains a regression model on a training set defined by segments of a dataframe. These segments are defined by a set of starting points and a parameter indicating their duration. In each segment, one subset of points is randomly chosen as the training set and the remaining points define the validation set.

Args:
    df: The input dataframe.
    w_train: The duration, given as a number of days, of the segments where the model is trained.
    ref_points: A list containing the starting date of each segment where the model is trained.
    feats: A list of names of columns of df corresponding to the feature variables.
    target: The name of a column of df corresponding to the dependent variable.
    random_state: Seed for a random number generator, which is used in randomly selecting the validation set among the points in a fixed segment.
Returns:
    model: The regression model. This is an instance of Pipeline.
    training_scores: An array containing scores for the training set. It contains the coefficient of determination “R squared”, the mean absolute error, the mean error, the mean absolute percentage error.
    validation_scores: An array containing scores for the validation set. It contains the coefficient of determination “R squared”, the mean absolute error, the mean error, the mean absolute percentage error.
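
The segment construction and the seeded random split can be illustrated as below. This sketch works on daily dates rather than dataframe rows, and the parameter val_fraction (the share of each segment held out for validation) is a hypothetical addition: the source does not say how large the validation subset is. The name split_segments is likewise an assumption.

```python
import random
from datetime import date, timedelta


def split_segments(ref_points, w_train, val_fraction=0.2, random_state=0):
    """Split each w_train-day segment into random training and validation days."""
    rng = random.Random(random_state)  # seeded, so the split is reproducible
    train_days, val_days = [], []
    for start in ref_points:
        # All days in the segment that begins at this reference point.
        days = [start + timedelta(days=i) for i in range(w_train)]
        rng.shuffle(days)
        n_val = int(len(days) * val_fraction)  # hypothetical validation share
        val_days.extend(sorted(days[:n_val]))
        train_days.extend(sorted(days[n_val:]))
    return train_days, val_days


ref_points = [date(2021, 1, 1), date(2021, 6, 1)]
train_days, val_days = split_segments(ref_points, w_train=10)
```

Calling the function again with the same random_state reproduces the same split, which is the role the seed plays in the description above.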
