Preprocessing

preprocessing.add_noise_to_series(series, noise_max=9e-05)[source]

Add uniform noise to a time series.

Args:

series: The time series to which noise is added.
noise_max: The upper limit of the amount of noise that can be added to a time series point.

Return:

DataFrame with noise
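As a rough illustration of the idea, the following sketch adds uniform noise to a pandas Series. It is not the library's implementation: the function name add_uniform_noise, the seed parameter, and the choice to draw noise from [0, noise_max) are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def add_uniform_noise(series: pd.Series, noise_max: float = 9e-05, seed=None) -> pd.Series:
    """Hypothetical helper: add noise drawn uniformly from [0, noise_max)."""
    rng = np.random.default_rng(seed)
    # One independent noise value per point (assumption: noise is non-negative).
    noise = rng.uniform(0.0, noise_max, size=len(series))
    return series + noise


s = pd.Series([1.0, 2.0, 3.0])
noisy = add_uniform_noise(s, seed=0)
```

A small noise_max such as the default keeps the perturbed values very close to the originals, which can help break exact ties between points.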

preprocessing.add_noise_to_series_md(df, noise_max=9e-05)[source]

Add uniform noise to a multidimensional time series that is given as a pandas DataFrame.

Args:

df: The DataFrame that contains the multidimensional time series.
noise_max: The upper limit of the amount of noise that can be added to a time series point.

Return:

The DataFrame with noise added to all columns

preprocessing.change_granularity(df, granularity='30s', size=10000000, chunk=True)[source]

Change the offset (sampling frequency) of a time series. This is done with chunk_interpolate, which divides the time series into pieces and interpolates each piece.

Args:

df: Date/Time DataFrame.
granularity: The offset to which the user wants to resample the time series.
size: The chunk size used to divide the DataFrame, according to the global index of the set. The default value is 10 million.
chunk: If True, chunk_interpolate is applied.

Return:

The interpolated DataFrame/TimeSeries
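To make the resampling step concrete, here is a minimal sketch that resamples a datetime-indexed DataFrame to a new offset and fills the resulting gaps by linear interpolation. It ignores chunking entirely, and the function name resample_and_interpolate and the use of the bin mean are assumptions, not the library's actual behaviour.

```python
import pandas as pd


def resample_and_interpolate(df: pd.DataFrame, granularity: str = "30s") -> pd.DataFrame:
    """Hypothetical helper: resample to a regular grid, then fill the gaps."""
    # Empty bins on the new grid become NaN (assumption: aggregate bins with the mean).
    resampled = df.resample(granularity).mean()
    # Fill the NaN rows introduced by the finer grid.
    return resampled.interpolate(method="linear", limit_direction="both")


idx = pd.to_datetime(
    ["2021-01-01 00:00:00", "2021-01-01 00:01:00", "2021-01-01 00:02:00"]
)
df = pd.DataFrame({"value": [0.0, 2.0, 4.0]}, index=idx)
out = resample_and_interpolate(df, "30s")
```

In this example the one-minute series gains two new rows at the half-minute marks, each lying midway between its neighbours.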

preprocessing.chunk_interpolate(df, size=1000000, method='linear', axis=0, limit_direction='both', limit=1)[source]

After chunker splits the DataFrame into pieces according to the index, each piece is interpolated with the arguments of pandas DataFrame.interpolate(), and the pieces are then merged back together. This step is crucial for interpolating the complete data without RAM problems, especially for large datasets.

Args:

df: Date/Time DataFrame or any given DataFrame.
size: The chunk size used to divide the DataFrame, according to the global index of the set. The default value is 1 million.
method, axis, limit_direction, limit: Arguments passed to pandas DataFrame.interpolate().

Return:

The Interpolated DataFrame
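The split, interpolate, and merge pattern described for chunk_interpolate can be sketched as follows. This is an illustrative version under stated assumptions: the name interpolate_in_chunks is hypothetical, chunks are taken by row position, and keyword arguments are forwarded unchanged to DataFrame.interpolate().

```python
import numpy as np
import pandas as pd


def interpolate_in_chunks(df: pd.DataFrame, size: int = 1_000_000, **interp_kwargs) -> pd.DataFrame:
    """Hypothetical helper: interpolate a DataFrame piece by piece."""
    pieces = [
        # Each piece is interpolated on its own, so only `size` rows are processed at a time.
        df.iloc[pos:pos + size].interpolate(**interp_kwargs)
        for pos in range(0, len(df), size)
    ]
    if not pieces:  # empty input: nothing to interpolate
        return df.copy()
    # Merge the pieces back together in their original order.
    return pd.concat(pieces)


df = pd.DataFrame({"value": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0]})
filled = interpolate_in_chunks(
    df, size=3, method="linear", axis=0, limit_direction="both", limit=1
)
```

One consequence of this design is that each piece only sees its own rows, so a gap that straddles a chunk boundary is filled without using values from the neighbouring piece.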

preprocessing.chunker(seq, size)[source]

Divide a file, DataFrame, etc. into pieces for better handling of RAM.

Args:

seq: Sequence, folder, Date/Time DataFrame or any given DataFrame.
size: The chunk size used to divide the sequence, folder or DataFrame.

Return:

The divided groups
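A common way to divide a sequence into fixed-size pieces is a generator over slices. The sketch below shows that general technique; the name chunks is hypothetical and the code is not necessarily how chunker itself is written.

```python
def chunks(seq, size):
    """Hypothetical helper: yield consecutive slices of at most `size` items."""
    for pos in range(0, len(seq), size):
        # The final slice may be shorter than `size`.
        yield seq[pos:pos + size]


pieces = list(chunks(list(range(7)), 3))
```

Because the pieces are produced lazily, only one piece needs to be held in memory at a time when the caller iterates over the generator directly.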

preprocessing.enumerate2(start, end, step=1)[source]

Args:

start: The starting point.
end: The ending point.
step: The step of the process.

Return:

The interpolated DataFrame/TimeSeries

preprocessing.filter_col(df, col, less_than=None, bigger_than=None)[source]

Remove rows of the DataFrame whose value in a given column is below a lower threshold, above an upper threshold, or both.

Args:

df: Date/Time DataFrame.
col: The column of the DataFrame to filter on.
less_than: Rows whose value in the column is below this value are dropped.
bigger_than: Rows whose value in the column is above this value are dropped.

Return:

The filtered TimeSeries/DataFrame
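The filtering described for filter_col can be expressed with a boolean mask. The following sketch follows the parameter descriptions above; the function name drop_outside is hypothetical, and keeping rows that are exactly equal to a threshold is an assumption.

```python
import pandas as pd


def drop_outside(df: pd.DataFrame, col: str, less_than=None, bigger_than=None) -> pd.DataFrame:
    """Hypothetical helper: drop rows below `less_than` and/or above `bigger_than`."""
    mask = pd.Series(True, index=df.index)
    if less_than is not None:
        # Keep rows at or above the lower threshold (assumption: bound is inclusive).
        mask &= df[col] >= less_than
    if bigger_than is not None:
        # Keep rows at or below the upper threshold (assumption: bound is inclusive).
        mask &= df[col] <= bigger_than
    return df[mask]


df = pd.DataFrame({"value": [1, 5, 10, 20]})
kept = drop_outside(df, "value", less_than=5, bigger_than=10)
```

Passing only one of the two thresholds filters on a single side and leaves the other side unbounded.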

preprocessing.filter_dates(df, start, end)[source]

Remove rows of the dataframe that are not in the [start, end] interval.

Args:

df: DataFrame that has a datetime index.
start: Date that signifies the start of the interval.
end: Date that signifies the end of the interval.

Return:

The filtered TimeSeries/DataFrame
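For a DataFrame with a datetime index, keeping only the rows inside a closed interval can be sketched like this. The name keep_between is hypothetical, and accepting date strings as well as timestamps is an assumption of this example.

```python
import pandas as pd


def keep_between(df: pd.DataFrame, start, end) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose index lies in [start, end]."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Both ends of the interval are inclusive.
    inside = (df.index >= start) & (df.index <= end)
    return df.loc[inside]


idx = pd.date_range("2021-01-01", periods=5, freq="D")
df = pd.DataFrame({"value": range(5)}, index=idx)
kept = keep_between(df, "2021-01-02", "2021-01-04")
```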

preprocessing.filter_df(df, filter_dict)[source]

Creates a filtered DataFrame with multiple columns.

Args:

df: Date/Time DataFrame or any given DataFrame.
filter_dict: A dictionary of the columns the user wants to filter.

Return:

Filtered DataFrame

preprocessing.filter_dispersed(df, window, eps)[source]

Look at windows of consecutive rows and calculate the mean and variance of each. For each window, if the index of dispersion of the given column is within the given threshold, the last row of the window remains in the DataFrame.

Args:

df: Date/Time DataFrame or any given DataFrame.
window: The number of consecutive rows in each window.
eps: A small value used to avoid division by zero (see is_stable).

Return:

The filtered DataFrame

preprocessing.is_stable(*args, epsilon)[source]

Args:

epsilon: A small value used to avoid division by zero.

Return:

A boolean vector obtained from dividing the variance of a column by its mean.
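The ratio of variance to mean is the index of dispersion. As a rough sketch of a windowed stability check built on it, the code below computes the ratio over rolling windows and compares it with a cutoff. Everything here is an assumption made for illustration: the name dispersion_mask, the separate threshold parameter, and the way epsilon is added to the denominator are not taken from the library.

```python
import pandas as pd


def dispersion_mask(series: pd.Series, window: int, threshold: float, epsilon: float = 1e-9) -> pd.Series:
    """Hypothetical helper: True where the rolling index of dispersion is below `threshold`."""
    rolling = series.rolling(window)
    # epsilon keeps the denominator away from zero when the window mean is 0.
    index_of_dispersion = rolling.var() / (rolling.mean().abs() + epsilon)
    # Incomplete windows give NaN, which compares as False.
    return index_of_dispersion < threshold


s = pd.Series([10.0, 10.0, 10.0, 10.0, 1.0, 50.0, 10.0])
mask = dispersion_mask(s, window=3, threshold=0.5)
stable_rows = s[mask]
```

Each True entry marks the last row of a window whose values are tightly clustered around their mean, so indexing with the mask keeps only those rows.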

preprocessing.normalize(df)[source]

This function transforms an input dataframe by rescaling values to the range [0,1].

Args:

df: Date/Time DataFrame, or any DataFrame with a specific column to normalize.

Return:

The normalized array

preprocessing.scale_df(df)[source]

Scale each column of a DataFrame to the [0, 1] range using min-max scaling.

Args:

df: The DataFrame to be scaled.

Return:

Scaled DataFrame
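Min-max scaling maps each value x in a column to (x - min) / (max - min). A minimal sketch with plain pandas operations follows; the name min_max_scale is hypothetical, and mapping constant columns to 0 is an assumption made to avoid dividing by zero.

```python
import pandas as pd


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rescale every column to the [0, 1] range."""
    col_min = df.min()
    col_range = df.max() - col_min
    # A constant column has range 0; use 1 instead so it scales to 0 (assumption).
    col_range = col_range.replace(0, 1)
    return (df - col_min) / col_range


df = pd.DataFrame({"a": [0, 5, 10], "b": [2.0, 4.0, 6.0], "c": [3, 3, 3]})
scaled = min_max_scale(df)
```

Each column is scaled independently, so columns with very different units end up on the same [0, 1] scale.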