quantbullet.research.jump_model#

Module for statistical jump models

Module Contents#

Classes#

DiscreteJumpModel

Statistical Jump Model with Discrete States

ContinuousJumpModel

Continuous Jump Model with Soft State Assignments

FeatureGenerator

Enrich univariate time series with features

SimulationGenerator

Generate simulated returns that follow a hidden Markov process.

TestingUtils

Parameters and plotting functions for testing

Attributes#

logger

quantbullet.research.jump_model.logger#
class quantbullet.research.jump_model.DiscreteJumpModel[source]#

Statistical Jump Model with Discrete States

fixed_states_optimize(y, s, k=2)[source]#

Optimize the parameters of a discrete jump model with states fixed first.

Parameters:
  • y (np.ndarray) – Observed data of shape (T x n_features).

  • s (np.ndarray) – State sequence of shape (T x 1).

  • theta_guess (np.ndarray) – Initial guess for theta of shape (k x n_features).

  • k (int) – Number of states.

Returns:

  • np.ndarray: Optimized parameters of shape (k x n_features).

  • float: Optimal value of the objective function.

Return type:

tuple
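
As an illustration of the fixed-state step, the sketch below assumes the loss is the squared Euclidean distance between each observation and its state's parameter vector. Under that assumption the optimal theta for a state is the mean of the observations assigned to it. The function name `fit_centroids` is hypothetical and is not part of the module.

```python
import numpy as np

def fit_centroids(y, s, k=2):
    """Per-state mean of y; minimizes the summed squared distance (assumed loss)."""
    y = np.asarray(y, dtype=float)
    s = np.asarray(s).ravel().astype(int)
    theta = np.full((k, y.shape[1]), np.nan)
    loss = 0.0
    for state in range(k):
        members = y[s == state]
        if len(members) == 0:
            continue  # empty state: its row stays NaN in this sketch
        theta[state] = members.mean(axis=0)
        loss += np.sum((members - theta[state]) ** 2)
    return theta, loss

y = np.array([[0.0], [2.0], [10.0], [12.0]])
s = np.array([0, 0, 1, 1])
theta, loss = fit_centroids(y, s, k=2)
```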

generate_loss_matrix(y, theta)[source]#

Generate the loss matrix for a discrete jump model for fixed theta

Parameters:
  • y (np.ndarray) – observed data (T x n_features)

  • theta (np.ndarray) – parameters (k x n_features)

  • k (int) – number of states

Returns:

loss matrix (T x k)

Return type:

loss (np.ndarray)
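
A minimal sketch of how a (T x k) loss matrix can be built, assuming plain squared Euclidean distance between each observation and each state's parameters. The module may scale or define the loss differently; `squared_distance_loss` is a hypothetical name.

```python
import numpy as np

def squared_distance_loss(y, theta):
    """Entry [t, j] is the squared distance between y[t] and theta[j]."""
    y = np.asarray(y, dtype=float)            # (T, n_features)
    theta = np.asarray(theta, dtype=float)    # (k, n_features)
    diff = y[:, None, :] - theta[None, :, :]  # (T, k, n_features) via broadcasting
    return np.sum(diff ** 2, axis=2)          # (T, k)

y = np.array([[0.0, 0.0], [3.0, 4.0]])
theta = np.array([[0.0, 0.0], [3.0, 0.0]])
loss_matrix = squared_distance_loss(y, theta)
```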

fixed_theta_optimize(lossMatrix, lambda_)[source]#

Optimize the state sequence of a discrete jump model with fixed parameters

Parameters:
  • lossMatrix (np.ndarray) – loss matrix (T x k)

  • lambda_ (float) – regularization parameter

Returns:

  • s (np.ndarray): optimal state sequence (T,).

  • v (float): optimal value of the objective function.

Return type:

tuple
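
With theta fixed, a state sequence that minimizes the summed loss plus a penalty per state change can be found by dynamic programming. The sketch below assumes the objective is the sum of per-step losses plus lambda_ for every jump between consecutive states; `optimal_state_path` is a hypothetical name and the module's implementation may differ.

```python
import numpy as np

def optimal_state_path(loss_matrix, lambda_):
    loss_matrix = np.asarray(loss_matrix, dtype=float)
    T, k = loss_matrix.shape
    # jump_cost[i, j]: penalty for moving from state i to state j
    jump_cost = lambda_ * (1.0 - np.eye(k))
    value = np.empty((T, k))
    value[T - 1] = loss_matrix[T - 1]
    # Backward pass: value[t, i] is the best cost from t onward given state i at t
    for t in range(T - 2, -1, -1):
        value[t] = loss_matrix[t] + np.min(value[t + 1][None, :] + jump_cost, axis=1)
    # Forward pass: follow the minimizing transitions
    s = np.empty(T, dtype=int)
    s[0] = np.argmin(value[0])
    for t in range(1, T):
        s[t] = np.argmin(value[t] + jump_cost[s[t - 1]])
    return s, float(value[0, s[0]])

loss_matrix = np.array([[0, 1], [0, 1], [1, 0], [0, 1], [0, 1]], dtype=float)
s_free, v_free = optimal_state_path(loss_matrix, lambda_=0.0)
s_sticky, v_sticky = optimal_state_path(loss_matrix, lambda_=10.0)
```

With no penalty the path follows the pointwise minimum and jumps twice; with a large penalty it stays in one state and accepts the higher loss at the middle step.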

initialize_kmeans_plusplus(data, k)[source]#

Initialize the centroids using the k-means++ method.

Parameters:
  • data – ndarray of shape (n_samples, n_features)

  • k – number of clusters

Returns:

ndarray of shape (k, n_features)

Return type:

centroids
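
For reference, a sketch of the standard k-means++ seeding rule: the first centroid is drawn uniformly, and each later one is drawn with probability proportional to the squared distance to the nearest centroid chosen so far. The `seed` argument and the name `kmeans_plusplus_init` are assumptions added for reproducibility, not part of the documented signature.

```python
import numpy as np

def kmeans_plusplus_init(data, k, seed=None):
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0]
    centroids = [data[rng.integers(n_samples)]]  # first centroid: uniform draw
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen centroid
        diffs = data[:, None, :] - np.array(centroids)[None, :, :]
        d2 = np.min(np.sum(diffs ** 2, axis=2), axis=1)
        probs = d2 / d2.sum()  # assumes the data are not all identical
        centroids.append(data[rng.choice(n_samples, p=probs)])
    return np.array(centroids)

data = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                 [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centroids = kmeans_plusplus_init(data, k=2, seed=0)
```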

classify_data_to_states(data, centroids)[source]#

Classify data points to the states based on the centroids.

Parameters:
  • data – ndarray of shape (n_samples, n_features)

  • centroids – centroids or means of the states, ndarray of shape (k, n_features)

Returns:

ndarray of shape (n_samples,), indices of the states to which each data point is assigned

Return type:

state_assignments
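
A short sketch of nearest-centroid assignment, assuming Euclidean distance is the classification criterion; `nearest_centroid` is a hypothetical name.

```python
import numpy as np

def nearest_centroid(data, centroids):
    data = np.asarray(data, dtype=float)            # (n_samples, n_features)
    centroids = np.asarray(centroids, dtype=float)  # (k, n_features)
    d2 = np.sum((data[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    return np.argmin(d2, axis=1)                    # (n_samples,)

data = np.array([[0.1, 0.0], [0.9, 1.1], [0.0, 0.2]])
centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
state_assignments = nearest_centroid(data, centroids)
```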

infer_states_stats(ts_returns, states)[source]#

Compute the mean and standard deviation of returns for each state

Parameters:
  • ts_returns (np.ndarray) – observed returns (T x 1)

  • states (np.ndarray) – state sequence (T x 1)

Returns:

mean and standard deviation of returns for each state

Return type:

state_features (dict)
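
The per-state statistics can be sketched as below. The dictionary layout (state label mapped to a `(mean, std)` tuple) and the use of the population standard deviation are assumptions; the module's `state_features` dict may be organized differently.

```python
import numpy as np

def per_state_stats(ts_returns, states):
    r = np.asarray(ts_returns, dtype=float).ravel()
    s = np.asarray(states).ravel().astype(int)
    # One (mean, std) pair per state label that actually occurs
    return {int(k): (r[s == k].mean(), r[s == k].std()) for k in np.unique(s)}

returns = np.array([0.01, 0.03, -0.02, -0.06])
states = np.array([0, 0, 1, 1])
state_features = per_state_stats(returns, states)
```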

remapResults(optimized_s, optimized_theta, ts_returns)[source]#

Remap the results of the optimization.

The states are ordered by increasing volatility of returns, because volatility has smaller variance than returns and so gives a more stable ordering. A warning is triggered if ordering by volatility and ordering by returns identify different states.
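
One way to relabel states by volatility is sketched below. It only reorders the state sequence and returns the label mapping; the documented method also takes `optimized_theta`, which this hypothetical helper `relabel_by_volatility` does not handle.

```python
import numpy as np

def relabel_by_volatility(states, ts_returns):
    s = np.asarray(states).ravel().astype(int)
    r = np.asarray(ts_returns, dtype=float).ravel()
    labels = np.unique(s)
    vols = np.array([r[s == k].std() for k in labels])
    order = labels[np.argsort(vols)]  # old labels, calmest state first
    mapping = {int(old): new for new, old in enumerate(order)}
    return np.array([mapping[int(x)] for x in s]), mapping

states = np.array([0, 0, 0, 1, 1, 1])
returns = np.array([0.05, -0.06, 0.04, 0.001, -0.002, 0.001])
new_states, mapping = relabel_by_volatility(states, returns)
```

Here the original state 0 is the volatile one, so it becomes state 1 after relabeling.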

cleanResults(raw_result, ts_returns, rearrange=False)[source]#

Clean the results of the optimization.

This extracts the best results from the ten trials based on the loss.

single_run(y, k, lambda_)[source]#

Run a single trial of the optimization. Each trial uses a different initialization of the centroids.

Parameters:
  • y (np.ndarray) – observed data (T x n_features)

  • k (int) – number of states

  • lambda_ (float) – regularization parameter

Returns:

  • cur_s (np.ndarray): optimal state sequence (T x 1).

  • loss (float): optimal value of the objective function.

  • cur_theta (np.ndarray): optimal parameters (k x n_features).

Return type:

tuple

fit(y, k=2, lambda_=100, rearrange=False, n_trials=10)[source]#

Fit the discrete jump model.

Note

A multiprocessing implementation is used to speed up the optimization. Ten trials with k-means++ initialization are run by default (n_trials=10).

Parameters:
  • y (np.ndarray) – observed data (T x n_features)

  • k (int) – number of states

  • lambda_ (float) – regularization parameter

  • rearrange (bool) – whether to rearrange the states in increasing order of volatility

Returns:

  • best_s (np.ndarray): optimal state sequence (T x 1).

  • best_loss (float): optimal value of the objective function.

  • best_theta (np.ndarray): optimal parameters (k x n_features).

  • optimized_s (list): state sequences from all trials (n_trials x T).

  • optimized_loss (list): objective function values from all trials (n_trials x 1).

  • optimized_theta (list): parameters from all trials (n_trials x k x n_features).

Return type:

tuple

evaluate(true, pred, plot=False)[source]#

Evaluate the model using balanced accuracy score

Parameters:
  • true (np.ndarray) – true state sequence (T x 1)

  • pred (np.ndarray) – predicted state sequence (T x 1)

  • plot (bool) – whether to plot the true and predicted state sequences

Returns:

evaluation results

Return type:

res (dict)
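
Balanced accuracy is the average of the per-class recalls, so a state that occurs rarely counts as much as a frequent one. The sketch below computes it directly with NumPy; `balanced_accuracy` is a hypothetical helper, and the contents of the `res` dict returned by the method are not reproduced here.

```python
import numpy as np

def balanced_accuracy(true, pred):
    true = np.asarray(true).ravel()
    pred = np.asarray(pred).ravel()
    # Recall for each class present in the true labels
    recalls = [np.mean(pred[true == c] == c) for c in np.unique(true)]
    return float(np.mean(recalls))

true = np.array([0, 0, 0, 0, 1, 1])
pred = np.array([0, 0, 0, 1, 1, 0])
score = balanced_accuracy(true, pred)      # (0.75 + 0.5) / 2
plain_accuracy = float(np.mean(true == pred))
```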

class quantbullet.research.jump_model.ContinuousJumpModel[source]#

Bases: DiscreteJumpModel

Continuous Jump Model with Soft State Assignments

fixed_states_optimize(y, s, k=None)[source]#

Optimize theta given fixed states

Parameters:
  • y – (T, n_features) array of observations

  • s – (T, k) array of state assignments

Returns:

(k, n_features) array of optimal parameters

Return type:

theta

Note

s is assumed to have each row sum to 1
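
With soft assignments, and assuming a squared-distance loss weighted by the state probabilities, the optimal theta for each state is the probability-weighted mean of the observations. A minimal sketch, with the hypothetical name `weighted_centroids`:

```python
import numpy as np

def weighted_centroids(y, s):
    y = np.asarray(y, dtype=float)  # (T, n_features)
    s = np.asarray(s, dtype=float)  # (T, k), each row sums to 1
    weights = s.sum(axis=0)         # total weight per state, shape (k,)
    return (s.T @ y) / weights[:, None]  # (k, n_features)

y = np.array([[0.0], [2.0], [10.0]])
s = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
theta = weighted_centroids(y, s)
```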

generate_loss_matrix(y, theta)[source]#

Identical to the loss function in the discrete case

generate_C(k, grid_size=0.05)[source]#

Generate a uniform sample of states distributed on a grid

Parameters:

k (int) – number of states

Returns:

K x N matrix of states

Return type:

matrix (np.ndarray)
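
One plausible construction of such a matrix is to enumerate every probability vector whose entries are multiples of `grid_size` and sum to 1, storing one candidate per column. This is an assumption about what the grid contains; the module may sample or order the candidates differently, and `simplex_grid` is a hypothetical name.

```python
import itertools
import numpy as np

def simplex_grid(k, grid_size=0.05):
    n = int(round(1.0 / grid_size))  # number of grid steps along each axis
    columns = [
        combo + (n - sum(combo),)    # last entry makes the column sum to n
        for combo in itertools.product(range(n + 1), repeat=k - 1)
        if sum(combo) <= n
    ]
    return np.array(columns, dtype=float).T / n  # shape (k, N)

C = simplex_grid(2, grid_size=0.05)
C3 = simplex_grid(3, grid_size=0.5)
```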

fixed_theta_optimize(lossMatrix, lambda_, C)[source]#

Optimize the state sequence of a continuous jump model with fixed parameters

Parameters:
  • lossMatrix (np.ndarray) – loss matrix (T x K)

  • C (np.ndarray) – K x N matrix of states

  • lambda_ (float) – regularization parameter

Returns:

  • s_hat (np.ndarray): optimal state sequence with probability distribution (T x K).

  • v_hat (float): loss value.

Return type:

tuple
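
The soft-assignment step can be sketched as a dynamic program over the N candidate columns of C. Two assumptions are made here and may not match the module: the per-step cost of a candidate is its expected loss (loss row times the probability vector), and the jump penalty is lambda_/4 times the squared L1 distance between consecutive candidates. `soft_state_path` is a hypothetical name.

```python
import numpy as np

def soft_state_path(loss_matrix, lambda_, C):
    loss_matrix = np.asarray(loss_matrix, dtype=float)  # (T, K)
    C = np.asarray(C, dtype=float)                      # (K, N)
    T = loss_matrix.shape[0]
    cand_loss = loss_matrix @ C                         # (T, N) expected loss per candidate
    # Assumed penalty: lambda_/4 times squared L1 distance between candidates
    l1 = np.abs(C[:, :, None] - C[:, None, :]).sum(axis=0)  # (N, N)
    jump_cost = lambda_ / 4.0 * l1 ** 2
    value = np.empty_like(cand_loss)
    value[-1] = cand_loss[-1]
    for t in range(T - 2, -1, -1):                      # backward pass
        value[t] = cand_loss[t] + np.min(value[t + 1][None, :] + jump_cost, axis=1)
    idx = np.empty(T, dtype=int)
    idx[0] = np.argmin(value[0])
    for t in range(1, T):                               # forward pass
        idx[t] = np.argmin(value[t] + jump_cost[idx[t - 1]])
    return C[:, idx].T, float(value[0, idx[0]])         # (T, K), scalar

C = np.array([[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]])
loss_matrix = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
s_hat, v_hat = soft_state_path(loss_matrix, lambda_=0.0, C=C)
```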

fit(y, k=2, lambda_=100, rearrange=False, n_trials=10, max_iter=20)[source]#

Fit the continuous jump model.

Note

A multiprocessing implementation is used to speed up the optimization. Ten trials with k-means++ initialization are run by default (n_trials=10).

Parameters:
  • y (np.ndarray) – observed data (T x n_features)

  • k (int) – number of states

  • lambda_ (float) – regularization parameter

  • rearrange (bool) – whether to rearrange the states in increasing order of volatility

Returns:

  • best_s (np.ndarray): optimal state sequence (T x 1).

  • best_loss (float): optimal value of the objective function.

  • best_theta (np.ndarray): optimal parameters (k x n_features).

  • optimized_s (list): state sequences from all trials (n_trials x T).

  • optimized_loss (list): objective function values from all trials (n_trials x 1).

  • optimized_theta (list): parameters from all trials (n_trials x k x n_features).

Return type:

tuple

class quantbullet.research.jump_model.FeatureGenerator[source]#

Enrich univariate time series with features

enrich_features(time_series)[source]#

Enrich a single time series with features

Parameters:

time_series (np.ndarray) – time series (T x 1)

Returns:

features (T x n_features)

Return type:

features (np.ndarray)

standarize_features(X)[source]#

Standardize features using sklearn’s StandardScaler
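
For intuition, the transformation StandardScaler applies with its default settings is a column-wise z-score using the population standard deviation. The NumPy sketch below reproduces that arithmetic without sklearn; `zscore_columns` is a hypothetical name, and unlike StandardScaler it does not guard against zero-variance columns.

```python
import numpy as np

def zscore_columns(X):
    X = np.asarray(X, dtype=float)
    # Subtract the column mean and divide by the column std (ddof=0)
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = zscore_columns(X)
```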

class quantbullet.research.jump_model.SimulationGenerator[source]#

Generate simulated returns that follow a hidden Markov process.

stationary_distribution(transition_matrix)[source]#

Computes the stationary distribution for a given Markov transition matrix.

Parameters:

transition_matrix (numpy array) – The Markov transition matrix.

Returns:

The stationary distribution.

Return type:

numpy array
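
A stationary distribution pi satisfies pi P = pi, so it is a left eigenvector of the transition matrix for eigenvalue 1. The sketch below assumes a row-stochastic matrix (each row sums to 1) with a unique stationary distribution; `stationary_from_eig` is a hypothetical name and the module may compute it another way.

```python
import numpy as np

def stationary_from_eig(transition_matrix):
    P = np.asarray(transition_matrix, dtype=float)
    eigvals, eigvecs = np.linalg.eig(P.T)  # eigenvectors of P.T are left eigenvectors of P
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()                     # normalize to a probability vector

P = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = stationary_from_eig(P)
```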

simulate_markov(transition_matrix, initial_distribution, steps)[source]#

Simulates a Markov process.

Parameters:
  • transition_matrix (numpy array) – The Markov transition matrix.

  • initial_distribution (numpy array) – The initial state distribution.

  • steps (int) – The number of steps to simulate.

Returns:

The states at each step.

Return type:

states (list)
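
A Markov chain can be simulated by drawing the first state from the initial distribution and each later state from the row of the transition matrix indexed by the current state. In this sketch `steps` is taken to be the total number of states returned, and the `seed` argument and the name `simulate_chain` are assumptions.

```python
import numpy as np

def simulate_chain(transition_matrix, initial_distribution, steps, seed=None):
    rng = np.random.default_rng(seed)
    P = np.asarray(transition_matrix, dtype=float)
    k = P.shape[0]
    states = [int(rng.choice(k, p=initial_distribution))]
    for _ in range(steps - 1):
        # Next state is drawn from the row of the current state
        states.append(int(rng.choice(k, p=P[states[-1]])))
    return states

P = np.array([[0.9, 0.1], [0.2, 0.8]])
states = simulate_chain(P, [0.5, 0.5], steps=50, seed=0)
```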

generate_conditional_data(states, parameters)[source]#

Generate data using normal distribution conditional on the states.

Parameters:
  • states (list) – The list of states

  • parameters (dict) – Parameters for each state with means and standard deviations

Returns:

Simulated data conditional on the states.

Return type:

data (list)
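
Sampling conditional on the states can be sketched as one normal draw per step, using the mean and standard deviation of that step's state. The `parameters` layout used here (state mapped to a `(mean, std)` tuple) and the values are hypothetical; the module's dict may be structured differently.

```python
import numpy as np

def sample_conditional(states, parameters, seed=None):
    rng = np.random.default_rng(seed)
    # parameters[s] is assumed to be a (mean, std) pair for state s
    return [float(rng.normal(*parameters[s])) for s in states]

states = [0, 0, 1, 1, 0]
parameters = {0: (0.0, 1.0), 1: (5.0, 2.0)}  # hypothetical values
data = sample_conditional(states, parameters, seed=0)
```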

run(steps, transition_matrix, norm_params)[source]#

Run the simulation, return the simulated states and conditional data

Note

The simulated sequence must visit every state; if it does not, the simulation is re-run.

Parameters:
  • steps (int) – number of steps to simulate

  • transition_matrix (np.ndarray) – transition matrix (k x k)

  • norm_params (dict) – parameters for the normal distribution for each state

Returns:

  • simulated_states (list): simulated states.

  • simulated_data (list): simulated data conditional on states.

Return type:

tuple

class quantbullet.research.jump_model.TestingUtils[source]#

Parameters and plotting functions for testing

daily()[source]#

Parameters for simulated daily return data, sourced from the paper

plot_returns(returns, shade_list=None)[source]#

Plot both the cumulative returns and returns on separate subplots sharing the x-axis.

Parameters:

returns (np.ndarray) – An array of returns.

plot_state_probs(states, prices)[source]#

Plot the state probabilities and stock prices on the same plot.

Parameters:
  • states (np.ndarray) – An n x k array of state probabilities.

  • prices (pd.DataFrame) – A series of prices, indexed by date.

plot_averages(data_dict)[source]#

Plot the average of numbers for each key in the dictionary using a line plot with x-axis labels in the form of 10^x.

Parameters:

data_dict (dict) – A dictionary where keys are labels and values are lists of numbers.