quantbullet.research.jump_model
Module for statistical jump models
Module Contents#
Classes#
| DiscreteJumpModel | Statistical Jump Model with Discrete States |
| ContinuousJumpModel | Continuous Jump Model with Soft State Assignments |
| FeatureGenerator | Enrich univariate time series with features |
| SimulationGenerator | Generate simulated returns that follow a Hidden Markov process. |
| TestingUtils | Parameters and plotting functions for testing |
Attributes#
- quantbullet.research.jump_model.logger#
- class quantbullet.research.jump_model.DiscreteJumpModel[source]#
Statistical Jump Model with Discrete States
- fixed_states_optimize(y, s, k=2)[source]#
Optimize the parameters of a discrete jump model with the state sequence held fixed.
- Parameters:
y (np.ndarray) – Observed data of shape (T x n_features).
s (np.ndarray) – State sequence of shape (T x 1).
theta_guess (np.ndarray) – Initial guess for theta of shape (k x n_features).
k (int) – Number of states.
- Returns:
np.ndarray: Optimized parameters of shape (k x n_features).
float: Optimal value of the objective function.
- Return type:
tuple
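As an illustration of this step: if the per-observation loss is the squared Euclidean distance to the state's parameter vector (an assumption; this page does not state the loss), then with the states held fixed the optimal `theta` for each state is the mean of the observations assigned to it. A minimal sketch, where `fit_theta_given_states` is a hypothetical helper and not the library's method:

```python
import numpy as np

def fit_theta_given_states(y, s, k=2):
    """Return (theta, objective) under an assumed squared loss, states held fixed."""
    y = np.asarray(y, dtype=float)
    s = np.asarray(s).ravel().astype(int)
    theta = np.zeros((k, y.shape[1]))
    for state in range(k):
        mask = s == state
        if mask.any():  # states with no observations are left at zero
            theta[state] = y[mask].mean(axis=0)
    # Assumed objective: half the summed squared distance to the assigned theta
    objective = 0.5 * np.sum((y - theta[s]) ** 2)
    return theta, float(objective)

y = np.array([[0.0], [2.0], [10.0], [12.0]])
s = np.array([0, 0, 1, 1])
theta, objective = fit_theta_given_states(y, s, k=2)
```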
- generate_loss_matrix(y, theta)[source]#
Generate the loss matrix for a discrete jump model for fixed theta
- Parameters:
y (np.ndarray) – observed data (T x n_features)
theta (np.ndarray) – parameters (k x n_features)
k (int) – number of states
- Returns:
loss matrix (T x k)
- Return type:
loss (np.ndarray)
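A loss matrix of shape (T x k) can be sketched with NumPy broadcasting. The example below assumes the loss is half the squared Euclidean distance between each observation and each state's parameters; the actual loss used by the library is not stated here, and `squared_loss_matrix` is a hypothetical name.

```python
import numpy as np

def squared_loss_matrix(y, theta):
    """loss[t, j] = 0.5 * ||y[t] - theta[j]||^2 (assumed form of the loss)."""
    y = np.asarray(y, dtype=float)
    theta = np.asarray(theta, dtype=float)
    diff = y[:, None, :] - theta[None, :, :]  # shape (T, k, n_features)
    return 0.5 * np.sum(diff ** 2, axis=2)    # shape (T, k)

y = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
theta = np.array([[0.0, 0.0], [4.0, 4.0]])
loss = squared_loss_matrix(y, theta)
```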
- fixed_theta_optimize(lossMatrix, lambda_)[source]#
Optimize the state sequence of a discrete jump model with fixed parameters
- Parameters:
lossMatrix (np.ndarray) – loss matrix (T x k)
lambda_ (float) – regularization parameter
- Returns:
s (np.ndarray): optimal state sequence (T,)
v (float): optimal value of the objective function
- Return type:
tuple
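With the parameters fixed, the state sequence can be found by dynamic programming over the loss matrix. The sketch below assumes the objective is the sum of the per-step losses plus a flat penalty of `lambda_` for every change of state, which is a common form for jump models but is not confirmed by this page. `viterbi_jump_path` is a hypothetical name.

```python
import numpy as np

def viterbi_jump_path(loss_matrix, lambda_):
    """Minimise sum_t loss[t, s_t] + lambda_ * (number of state changes)."""
    loss_matrix = np.asarray(loss_matrix, dtype=float)
    T, k = loss_matrix.shape
    penalty = lambda_ * (1.0 - np.eye(k))  # penalty[i, j]: cost of moving i -> j
    value = np.empty((T, k))               # value[t, j]: best cost ending in j at t
    backptr = np.zeros((T, k), dtype=int)
    value[0] = loss_matrix[0]
    for t in range(1, T):
        candidates = value[t - 1][:, None] + penalty
        backptr[t] = np.argmin(candidates, axis=0)
        value[t] = loss_matrix[t] + candidates.min(axis=0)
    # Trace the optimal path backwards from the cheapest final state
    s = np.empty(T, dtype=int)
    s[-1] = int(np.argmin(value[-1]))
    for t in range(T - 1, 0, -1):
        s[t - 1] = backptr[t, s[t]]
    return s, float(value[-1].min())

loss = np.array([[0.0, 5.0], [0.0, 5.0], [3.0, 2.0], [5.0, 0.0], [5.0, 0.0]])
s_free, v_free = viterbi_jump_path(loss, lambda_=0.0)        # jumps cost nothing
s_sticky, v_sticky = viterbi_jump_path(loss, lambda_=100.0)  # jumps are expensive
```

With no penalty the path follows the row-wise minimum of the loss matrix; with a large penalty it stays in the single state that is cheapest overall.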
- initialize_kmeans_plusplus(data, k)[source]#
Initialize the centroids using the k-means++ method.
- Parameters:
data – ndarray of shape (n_samples, n_features)
k – number of clusters
- Returns:
ndarray of shape (k, n_features)
- Return type:
centroids
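For reference, the standard k-means++ seeding picks the first centroid uniformly at random, then picks each further centroid with probability proportional to its squared distance from the nearest centroid chosen so far. The sketch below follows that textbook recipe; the `seed` argument and the name `kmeans_plusplus_init` are hypothetical and the library's own implementation may differ.

```python
import numpy as np

def kmeans_plusplus_init(data, k, seed=0):
    """Textbook k-means++ seeding; assumes the data points are not all identical."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0]
    centroids = [data[rng.integers(n_samples)]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen centroid
        diffs = data[:, None, :] - np.asarray(centroids)[None, :, :]
        d2 = np.min(np.sum(diffs ** 2, axis=2), axis=1)
        probs = d2 / d2.sum()
        centroids.append(data[rng.choice(n_samples, p=probs)])
    return np.vstack(centroids)

data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = kmeans_plusplus_init(data, k=2)
```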
- classify_data_to_states(data, centroids)[source]#
Classify data points to the states based on the centroids.
- Parameters:
data – ndarray of shape (n_samples, n_features)
centroids – centroids or means of the states, ndarray of shape (k, n_features)
- Returns:
ndarray of shape (n_samples,), indices of the states to which each data point is assigned
- Return type:
state_assignments
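A nearest-centroid assignment can be sketched as follows. It assumes Euclidean distance, which this page does not state explicitly, and `assign_to_nearest_centroid` is a hypothetical name.

```python
import numpy as np

def assign_to_nearest_centroid(data, centroids):
    """Return, for each row of data, the index of the closest centroid."""
    data = np.asarray(data, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    d2 = np.sum((data[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    return np.argmin(d2, axis=1)

data = np.array([[0.0, 0.0], [9.0, 9.0], [1.0, 0.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
state_assignments = assign_to_nearest_centroid(data, centroids)
```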
- infer_states_stats(ts_returns, states)[source]#
Compute the mean and standard deviation of returns for each state
- Parameters:
ts_returns (np.ndarray) – observed returns (T x 1)
states (np.ndarray) – state sequence (T x 1)
- Returns:
mean and standard deviation of returns for each state
- Return type:
state_features (dict)
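The per-state statistics can be illustrated with a small sketch. The layout of the returned dictionary, `{state: (mean, std)}`, is an assumption made for this example, as is the name `per_state_mean_std`; the library's `state_features` may be organised differently.

```python
import numpy as np

def per_state_mean_std(ts_returns, states):
    """Map each state label to the (mean, std) of the returns observed in it."""
    r = np.asarray(ts_returns, dtype=float).ravel()
    s = np.asarray(states).ravel().astype(int)
    # np.std defaults to the population standard deviation (ddof=0)
    return {int(k): (float(r[s == k].mean()), float(r[s == k].std()))
            for k in np.unique(s)}

ts_returns = np.array([0.01, 0.03, -0.02, -0.04])
states = np.array([0, 0, 1, 1])
stats = per_state_mean_std(ts_returns, states)
```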
- remapResults(optimized_s, optimized_theta, ts_returns)[source]#
Remap the results of the optimization.
We would like the states to be in increasing order of the volatility of returns, because volatility has smaller variance than returns. A warning is triggered if the states identified by volatility and by returns are different.
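Relabelling states by volatility can be sketched as below. This covers only the state sequence; a full remap would reorder the rows of theta with the same mapping. `relabel_by_volatility` is a hypothetical helper, not the library's `remapResults`.

```python
import numpy as np

def relabel_by_volatility(states, ts_returns):
    """Relabel states so that label 0 has the lowest return volatility."""
    s = np.asarray(states).ravel().astype(int)
    r = np.asarray(ts_returns, dtype=float).ravel()
    labels = np.unique(s)
    vols = np.array([r[s == k].std() for k in labels])
    order = labels[np.argsort(vols)]  # old labels, lowest volatility first
    mapping = {int(old): new for new, old in enumerate(order)}
    return np.array([mapping[int(k)] for k in s]), mapping

states = np.array([0, 0, 0, 1, 1, 1])
ts_returns = np.array([0.05, -0.06, 0.04, 0.001, -0.001, 0.002])
new_states, mapping = relabel_by_volatility(states, ts_returns)
```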
- cleanResults(raw_result, ts_returns, rearrange=False)[source]#
Clean the results of the optimization.
This extracts the best results from the ten trials based on the loss.
- single_run(y, k, lambda_)[source]#
Run a single trial of the optimization. Each trial uses a different initialization of the centroids.
- Parameters:
y (np.ndarray) – observed data (T x n_features)
k (int) – number of states
lambda_ (float) – regularization parameter
- Returns:
cur_s (np.ndarray): optimal state sequence (T x 1)
loss (float): optimal value of the objective function
cur_theta (np.ndarray): optimal parameters (k x n_features)
- Return type:
tuple
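One plausible shape for a single trial is an alternating scheme: fix theta and solve for the states, then fix the states and update theta, until the state sequence stops changing. The sketch below assumes a squared loss and a flat penalty of `lambda_` per state change; all function names and the `max_iter` argument are hypothetical, and the library's `single_run` may be organised differently.

```python
import numpy as np

def loss_matrix(y, theta):
    """Assumed loss: half the squared distance from each row of y to each row of theta."""
    return 0.5 * np.sum((y[:, None, :] - theta[None, :, :]) ** 2, axis=2)

def best_path(loss, lambda_):
    """Dynamic program for the state path with a flat penalty lambda_ per jump."""
    T, k = loss.shape
    penalty = lambda_ * (1.0 - np.eye(k))
    value = loss[0].copy()
    back = np.zeros((T, k), dtype=int)
    for t in range(1, T):
        cand = value[:, None] + penalty
        back[t] = cand.argmin(axis=0)
        value = loss[t] + cand.min(axis=0)
    s = np.empty(T, dtype=int)
    s[-1] = value.argmin()
    for t in range(T - 1, 0, -1):
        s[t - 1] = back[t, s[t]]
    return s, float(value.min())

def alternate_once(y, theta0, lambda_, max_iter=20):
    """Alternate state and parameter updates from one initial theta."""
    y = np.asarray(y, dtype=float)
    theta = np.array(theta0, dtype=float)
    prev = None
    for _ in range(max_iter):
        s = best_path(loss_matrix(y, theta), lambda_)[0]
        if prev is not None and np.array_equal(s, prev):
            break  # state sequence is stable, so theta would not change either
        for state in range(theta.shape[0]):
            if np.any(s == state):
                theta[state] = y[s == state].mean(axis=0)
        prev = s
    # Recompute so the returned states and loss match the final theta
    s, loss = best_path(loss_matrix(y, theta), lambda_)
    return s, loss, theta

y = np.array([[0.0], [0.1], [-0.1], [5.0], [5.1], [4.9]])
cur_s, loss, cur_theta = alternate_once(y, theta0=[[0.0], [1.0]], lambda_=1.0)
```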
- fit(y, k=2, lambda_=100, rearrange=False, n_trials=10)[source]#
Fit a discrete jump model.
Note
A multiprocessing implementation is used to speed up the optimization. By default, ten trials with k-means++ initialization are run.
- Parameters:
y (np.ndarray) – observed data (T x n_features)
k (int) – number of states
lambda_ (float) – regularization parameter
rearrange (bool) – whether to rearrange the states in increasing order of volatility
n_trials (int) – number of trials
- Returns:
best_s (np.ndarray): optimal state sequence (T x 1)
best_loss (float): optimal value of the objective function
best_theta (np.ndarray): optimal parameters (k x n_features)
optimized_s (list): state sequences from all trials (10 x T)
optimized_loss (list): objective function values from all trials (10 x 1)
optimized_theta (list): parameters from all trials (10 x k x n_features)
- Return type:
tuple
- evaluate(true, pred, plot=False)[source]#
Evaluate the model using balanced accuracy score
- Parameters:
true (np.ndarray) – true state sequence (T x 1)
pred (np.ndarray) – predicted state sequence (T x 1)
plot (bool) – whether to plot the true and predicted state sequences
- Returns:
evaluation results
- Return type:
res (dict)
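Balanced accuracy is the mean of the per-class recall, so it is not dominated by the most frequent state. The sketch below computes it by hand to show the definition; `balanced_accuracy` is a hypothetical helper and the keys of the library's result dictionary are not shown on this page.

```python
import numpy as np

def balanced_accuracy(true, pred):
    """Mean over classes of the fraction of that class predicted correctly."""
    true = np.asarray(true).ravel()
    pred = np.asarray(pred).ravel()
    recalls = [np.mean(pred[true == c] == c) for c in np.unique(true)]
    return float(np.mean(recalls))

true = np.array([0, 0, 0, 0, 1, 1])
pred = np.array([0, 0, 0, 1, 1, 0])
score = balanced_accuracy(true, pred)  # recall 3/4 for state 0, 1/2 for state 1
```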
- class quantbullet.research.jump_model.ContinuousJumpModel[source]#
Bases:
DiscreteJumpModel
Continuous Jump Model with Soft State Assignments
- fixed_states_optimize(y, s, k=None)[source]#
Optimize theta given fixed states
- Parameters:
y – (T, n_features) array of observations
s – (T, k) array of state assignments
- Returns:
(k, n_features) array of optimal parameters
- Return type:
theta
Note
s is assumed to have each row sum to 1
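With soft assignments, and assuming a squared loss weighted by the assignment probabilities (an assumption; this page does not state the loss), the optimal theta for each state is the probability-weighted mean of the observations. A minimal sketch with a hypothetical helper name:

```python
import numpy as np

def fit_theta_soft(y, s):
    """Weighted means: theta[j] = sum_t s[t, j] * y[t] / sum_t s[t, j]."""
    y = np.asarray(y, dtype=float)  # shape (T, n_features)
    s = np.asarray(s, dtype=float)  # shape (T, k), each row sums to 1
    weights = s.sum(axis=0)         # total weight per state; assumed non-zero
    return (s.T @ y) / weights[:, None]

y = np.array([[0.0], [2.0], [10.0]])
s = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
theta = fit_theta_soft(y, s)
```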
- generate_C(k, grid_size=0.05)[source]#
Uniformly sample states distributed on a grid
- Parameters:
k (int) – number of states
- Returns:
K x N matrix of states
- Return type:
matrix (np.ndarray)
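One way to build such a matrix is to enumerate every probability vector whose entries are multiples of `grid_size` and sum to 1. This deterministic enumeration is an assumption for illustration; the library may sample the grid differently. `simplex_grid` is a hypothetical name.

```python
from itertools import combinations

import numpy as np

def simplex_grid(k, grid_size=0.05):
    """Return a (k x N) matrix whose columns are grid points on the probability simplex."""
    n = int(round(1.0 / grid_size))  # number of grid steps per unit of probability
    columns = []
    # Stars and bars: choose k-1 bar positions among n+k-1 slots
    for bars in combinations(range(n + k - 1), k - 1):
        edges = (-1,) + bars + (n + k - 1,)
        counts = [edges[i + 1] - edges[i] - 1 for i in range(k)]
        columns.append(counts)
    return np.array(columns, dtype=float).T / n

C2 = simplex_grid(2, grid_size=0.05)
C3 = simplex_grid(3, grid_size=0.05)
```

For k = 2 this gives the 21 columns (0, 1), (0.05, 0.95), ..., (1, 0). The number of columns grows quickly with k, which is worth keeping in mind when choosing `grid_size`.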
- fixed_theta_optimize(lossMatrix, lambda_, C)[source]#
Optimize the state sequence of a continuous jump model with fixed parameters
- Parameters:
lossMatrix (np.ndarray) – loss matrix (T x K)
C (np.ndarray) – K x N matrix of states
lambda_ (float) – regularization parameter
- Returns:
s_hat (np.ndarray): optimal state sequence with probability dist (T x K)
v_hat (float): loss value
- Return type:
tuple
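The same dynamic-programming idea extends to soft states by treating each column of C as a candidate state. The sketch below assumes the cost of a candidate at time t is the loss row weighted by that candidate's probabilities, and that moving between candidates costs `lambda_ / 4` times the squared L1 distance between them. Both are assumptions for illustration, not the library's confirmed objective, and `soft_state_path` is a hypothetical name.

```python
import numpy as np

def soft_state_path(loss_matrix, lambda_, C):
    """Pick one column of C per time step, minimising loss plus jump penalty."""
    loss_matrix = np.asarray(loss_matrix, dtype=float)  # shape (T, K)
    C = np.asarray(C, dtype=float)                      # shape (K, N)
    T, N = loss_matrix.shape[0], C.shape[1]
    cand_loss = loss_matrix @ C                         # shape (T, N)
    l1 = np.abs(C[:, :, None] - C[:, None, :]).sum(axis=0)  # pairwise L1, (N, N)
    penalty = 0.25 * lambda_ * l1 ** 2                  # assumed penalty form
    value = cand_loss[0].copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = value[:, None] + penalty
        back[t] = cand.argmin(axis=0)
        value = cand_loss[t] + cand.min(axis=0)
    path = np.empty(T, dtype=int)
    path[-1] = value.argmin()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return C[:, path].T, float(value.min())             # (T, K) and a scalar

C = np.array([[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]])
loss = np.array([[0.0, 4.0], [0.0, 4.0], [4.0, 0.0], [4.0, 0.0]])
s_hat, v_hat = soft_state_path(loss, lambda_=0.0, C=C)
_, v_sticky = soft_state_path(loss, lambda_=100.0, C=C)
```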
- fit(y, k=2, lambda_=100, rearrange=False, n_trials=10, max_iter=20)[source]#
Fit a continuous jump model.
Note
A multiprocessing implementation is used to speed up the optimization. By default, ten trials with k-means++ initialization are run.
- Parameters:
y (np.ndarray) – observed data (T x n_features)
k (int) – number of states
lambda_ (float) – regularization parameter
rearrange (bool) – whether to rearrange the states in increasing order of volatility
n_trials (int) – number of trials
max_iter (int) – maximum number of iterations
- Returns:
best_s (np.ndarray): optimal state sequence (T x 1)
best_loss (float): optimal value of the objective function
best_theta (np.ndarray): optimal parameters (k x n_features)
optimized_s (list): state sequences from all trials (10 x T)
optimized_loss (list): objective function values from all trials (10 x 1)
optimized_theta (list): parameters from all trials (10 x k x n_features)
- Return type:
tuple
- class quantbullet.research.jump_model.FeatureGenerator[source]#
Enrich univariate time series with features
- class quantbullet.research.jump_model.SimulationGenerator[source]#
Generate simulated returns that follow a Hidden Markov process.
- stationary_distribution(transition_matrix)[source]#
Computes the stationary distribution for a given Markov transition matrix.
- Parameters:
transition_matrix (numpy array) – The Markov transition matrix.
- Returns:
The stationary distribution.
- Return type:
numpy array
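The stationary distribution pi satisfies pi P = pi, so it is a left eigenvector of the transition matrix for eigenvalue 1. One common way to compute it is shown below; this is a sketch under the assumption that the chain has a unique stationary distribution, and the library may use a different method. `stationary_from_eig` is a hypothetical name.

```python
import numpy as np

def stationary_from_eig(transition_matrix):
    """Left eigenvector of P for eigenvalue 1, normalised to sum to 1."""
    P = np.asarray(transition_matrix, dtype=float)
    eigvals, eigvecs = np.linalg.eig(P.T)  # left eigenvectors of P
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()

P = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = stationary_from_eig(P)
```

For this matrix the balance condition 0.1 * pi[0] = 0.2 * pi[1] gives pi = (2/3, 1/3).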
- simulate_markov(transition_matrix, initial_distribution, steps)[source]#
Simulates a Markov process.
- Parameters:
transition_matrix (numpy array) – The Markov transition matrix.
initial_distribution (numpy array) – The initial state distribution.
steps (int) – The number of steps to simulate.
- Returns:
The states at each step.
- Return type:
states (list)
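A Markov chain can be simulated by drawing the first state from the initial distribution and each subsequent state from the row of the transition matrix belonging to the current state. In this sketch the `seed` argument and the name `simulate_chain` are hypothetical, and returning exactly `steps` states is an assumption about what "steps" means.

```python
import numpy as np

def simulate_chain(transition_matrix, initial_distribution, steps, seed=0):
    """Return a list of `steps` states sampled from the chain."""
    rng = np.random.default_rng(seed)
    P = np.asarray(transition_matrix, dtype=float)
    k = P.shape[0]
    states = [int(rng.choice(k, p=initial_distribution))]
    for _ in range(steps - 1):
        # The next state depends only on the current one
        states.append(int(rng.choice(k, p=P[states[-1]])))
    return states

# A chain that always flips state makes the output easy to check
flip = simulate_chain([[0.0, 1.0], [1.0, 0.0]], [1.0, 0.0], steps=4)
sticky = simulate_chain([[0.95, 0.05], [0.10, 0.90]], [0.5, 0.5], steps=50)
```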
- generate_conditional_data(states, parameters)[source]#
Generate data using normal distribution conditional on the states.
- Parameters:
states (list) – The list of states
parameters (dict) – Parameters for each state with means and standard deviations
- Returns:
Simulated data conditional on the states.
- Return type:
data (list)
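Drawing data conditional on the states can be sketched as below. The layout of `parameters`, `{state: (mean, std)}`, is an assumption made for this example; the library's dictionary may be structured differently. `draw_conditional_normal` and `seed` are hypothetical names.

```python
import numpy as np

def draw_conditional_normal(states, parameters, seed=0):
    """For each state, draw one value from that state's normal distribution."""
    rng = np.random.default_rng(seed)
    # parameters[s] is assumed to be a (mean, std) pair
    return [float(rng.normal(*parameters[s])) for s in states]

states = [0, 1, 1, 0]
# Zero standard deviation makes every draw equal to the state's mean
degenerate = draw_conditional_normal(states, {0: (1.0, 0.0), 1: (-2.0, 0.0)})
noisy = draw_conditional_normal(states, {0: (0.001, 0.01), 1: (-0.002, 0.03)})
```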
- run(steps, transition_matrix, norm_params)[source]#
Run the simulation, return the simulated states and conditional data
Note
The simulated sequence must visit every state; if it does not, the simulation is re-run.
- Parameters:
steps (int) – number of steps to simulate
transition_matrix (np.ndarray) – transition matrix (k x k)
norm_params (dict) – parameters for the normal distribution for each state
- Returns:
simulated_states (list): simulated states
simulated_data (list): simulated data conditional on states
- Return type:
tuple
- class quantbullet.research.jump_model.TestingUtils[source]#
Parameters and plotting functions for testing
- plot_returns(returns, shade_list=None)[source]#
Plot both the cumulative returns and returns on separate subplots sharing the x-axis.
- Parameters:
returns (np.ndarray) – An array of returns.