:py:mod:`quantbullet.research.jump_model`
=========================================

.. py:module:: quantbullet.research.jump_model

.. autoapi-nested-parse::

   Module for statistical jump models


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   quantbullet.research.jump_model.DiscreteJumpModel
   quantbullet.research.jump_model.ContinuousJumpModel
   quantbullet.research.jump_model.FeatureGenerator
   quantbullet.research.jump_model.SimulationGenerator
   quantbullet.research.jump_model.TestingUtils


Attributes
~~~~~~~~~~

.. autoapisummary::

   quantbullet.research.jump_model.logger


.. py:data:: logger

.. py:class:: DiscreteJumpModel

   Statistical Jump Model with Discrete States

   .. py:method:: fixed_states_optimize(y, s, k=2)

      Optimize the parameters of a discrete jump model with the state sequence held fixed.

      :param y: Observed data of shape (T x n_features).
      :type y: np.ndarray
      :param s: State sequence of shape (T x 1).
      :type s: np.ndarray
      :param k: Number of states.
      :type k: int

      :returns: - np.ndarray: Optimized parameters of shape (k x n_features).
                - float: Optimal value of the objective function.
      :rtype: tuple

   .. py:method:: generate_loss_matrix(y, theta)

      Generate the loss matrix of a discrete jump model for fixed theta.

      :param y: observed data (T x n_features)
      :type y: np.ndarray
      :param theta: parameters (k x n_features)
      :type theta: np.ndarray

      :returns: loss matrix (T x k)
      :rtype: loss (np.ndarray)

   .. py:method:: fixed_theta_optimize(lossMatrix, lambda_)

      Optimize the state sequence of a discrete jump model with fixed parameters.

      :param lossMatrix: loss matrix (T x k)
      :type lossMatrix: np.ndarray
      :param lambda_: regularization parameter
      :type lambda_: float

      :returns: optimal state sequence (T,)
                v (float): optimal value of the objective function
      :rtype: s (np.ndarray)

   .. py:method:: initialize_kmeans_plusplus(data, k)

      Initialize the centroids using the k-means++ method.

      :param data: ndarray of shape (n_samples, n_features)
      :param k: number of clusters

      :returns: ndarray of shape (k, n_features)
      :rtype: centroids

   .. py:method:: classify_data_to_states(data, centroids)

      Classify data points into states based on the centroids.

      :param data: ndarray of shape (n_samples, n_features)
      :param centroids: centroids or means of the states, ndarray of shape (k, n_features)

      :returns: ndarray of shape (n_samples,), indices of the states to which each data point is assigned
      :rtype: state_assignments

   .. py:method:: infer_states_stats(ts_returns, states)

      Compute the mean and standard deviation of returns for each state.

      :param ts_returns: observed returns (T x 1)
      :type ts_returns: np.ndarray
      :param states: state sequence (T x 1)
      :type states: np.ndarray

      :returns: mean and standard deviation of returns for each state
      :rtype: state_features (dict)

   .. py:method:: remapResults(optimized_s, optimized_theta, ts_returns)

      Remap the results of the optimization so that states are ordered by increasing volatility of returns; volatility is used because it has a smaller variance than returns. A warning is triggered if the states identified by volatility and by returns differ.

   .. py:method:: cleanResults(raw_result, ts_returns, rearrange=False)

      Clean the results of the optimization by extracting the best result across trials based on the loss.

   .. py:method:: single_run(y, k, lambda_)

      Run a single trial of the optimization. Each trial uses a different initialization of the centroids.

      :param y: observed data (T x n_features)
      :type y: np.ndarray
      :param k: number of states
      :type k: int
      :param lambda_: regularization parameter
      :type lambda_: float

      :returns: optimal state sequence (T x 1)
                loss (float): optimal value of the objective function
                cur_theta (np.ndarray): optimal parameters (k x n_features)
      :rtype: cur_s (np.ndarray)

   .. py:method:: fit(y, k=2, lambda_=100, rearrange=False, n_trials=10)

      Fit the discrete jump model.

      .. note::

         A multiprocessing implementation is used to speed up the optimization.
         Multiple trials (``n_trials``) with k-means++ initialization are run.

      :param y: observed data (T x n_features)
      :type y: np.ndarray
      :param k: number of states
      :type k: int
      :param lambda_: regularization parameter
      :type lambda_: float
      :param rearrange: whether to rearrange the states in increasing order of volatility
      :type rearrange: bool
      :param n_trials: number of trials with different initializations
      :type n_trials: int

      :returns: optimal state sequence (T x 1)
                best_loss (float): optimal value of the objective function
                best_theta (np.ndarray): optimal parameters (k x n_features)
                optimized_s (list): state sequences from all trials (n_trials x T)
                optimized_loss (list): objective function values from all trials (n_trials x 1)
                optimized_theta (list): parameters from all trials (n_trials x k x n_features)
      :rtype: best_s (np.ndarray)

   .. py:method:: evaluate(true, pred, plot=False)

      Evaluate the model using the balanced accuracy score.

      :param true: true state sequence (T x 1)
      :type true: np.ndarray
      :param pred: predicted state sequence (T x 1)
      :type pred: np.ndarray
      :param plot: whether to plot the true and predicted state sequences
      :type plot: bool

      :returns: evaluation results
      :rtype: res (dict)
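A minimal usage sketch for the discrete model follows. It is illustrative only: it assumes ``fit`` accepts any (T x n_features) array and returns the tuple documented above (best state sequence first), and the synthetic regime parameters are invented for the example.

.. code-block:: python

   import numpy as np
   from quantbullet.research.jump_model import DiscreteJumpModel, FeatureGenerator

   # Synthetic returns: a calm regime followed by a volatile one (illustrative values)
   rng = np.random.default_rng(0)
   returns = np.concatenate([
       rng.normal(0.0005, 0.008, 300),
       rng.normal(-0.001, 0.025, 200),
   ])

   # Enrich the univariate series into a (T x n_features) feature matrix
   features = FeatureGenerator().enrich_features(returns)

   # Fit the discrete jump model; per the docstring, the first element of the
   # returned tuple is the best state sequence across trials
   model = DiscreteJumpModel()
   result = model.fit(features, k=2, lambda_=100, rearrange=True)
   best_s = result[0]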
.. py:class:: ContinuousJumpModel

   Bases: :py:obj:`DiscreteJumpModel`

   Continuous Jump Model with Soft State Assignments

   .. py:method:: fixed_states_optimize(y, s, k=None)

      Optimize theta given fixed states.

      :param y: (T, n_features) array of observations
      :param s: (T, k) array of state assignments

      :returns: (k, n_features) array of optimal parameters
      :rtype: theta

      .. note::

         s is assumed to have each row sum to 1.

   .. py:method:: generate_loss_matrix(y, theta)

      Identical to the loss function in the discrete case.

   .. py:method:: generate_C(k, grid_size=0.05)

      Uniformly sample states distributed on a grid.

      :param k: number of states
      :type k: int

      :returns: K x N matrix of states
      :rtype: matrix (np.ndarray)

   .. py:method:: fixed_theta_optimize(lossMatrix, lambda_, C)

      Optimize the state sequence of a continuous jump model with fixed parameters.

      :param lossMatrix: loss matrix (T x K)
      :type lossMatrix: np.ndarray
      :param C: K x N matrix of states
      :type C: np.ndarray
      :param lambda_: regularization parameter
      :type lambda_: float

      :returns: optimal state sequence with probability distributions (T x K)
                v_hat (float): loss value
      :rtype: s_hat (np.ndarray)

   .. py:method:: fit(y, k=2, lambda_=100, rearrange=False, n_trials=10, max_iter=20)

      Fit the continuous jump model.

      .. note::

         A multiprocessing implementation is used to speed up the optimization.
         Multiple trials (``n_trials``) with k-means++ initialization are run.

      :param y: observed data (T x n_features)
      :type y: np.ndarray
      :param k: number of states
      :type k: int
      :param lambda_: regularization parameter
      :type lambda_: float
      :param rearrange: whether to rearrange the states in increasing order of volatility
      :type rearrange: bool
      :param n_trials: number of trials with different initializations
      :type n_trials: int
      :param max_iter: maximum number of iterations per trial
      :type max_iter: int

      :returns: optimal state sequence (T x 1)
                best_loss (float): optimal value of the objective function
                best_theta (np.ndarray): optimal parameters (k x n_features)
                optimized_s (list): state sequences from all trials (n_trials x T)
                optimized_loss (list): objective function values from all trials (n_trials x 1)
                optimized_theta (list): parameters from all trials (n_trials x k x n_features)
      :rtype: best_s (np.ndarray)
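The continuous variant can be exercised in the same way. The sketch below is a guess at typical usage, assuming ``generate_C`` and ``fit`` behave as documented above; the interpretation of the C matrix as candidate probability vectors is inferred from the docstrings, not verified against the implementation, and the feature data is a placeholder.

.. code-block:: python

   import numpy as np
   from quantbullet.research.jump_model import ContinuousJumpModel

   cjm = ContinuousJumpModel()

   # Candidate soft state assignments on a grid: a K x N matrix, assumed here
   # to hold probability vectors over the k states in its columns
   C = cjm.generate_C(k=2, grid_size=0.05)

   # Placeholder (T x n_features) data purely for illustration
   rng = np.random.default_rng(1)
   features = rng.normal(size=(500, 3))

   # max_iter is assumed to bound the alternating-optimization iterations per trial
   result = cjm.fit(features, k=2, lambda_=100, n_trials=10, max_iter=20)
   soft_states = result[0]  # documented as the best state sequence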
.. py:class:: FeatureGenerator

   Enrich a univariate time series with features

   .. py:method:: enrich_features(time_series)

      Enrich a single time series with features.

      :param time_series: time series (T x 1)
      :type time_series: np.ndarray

      :returns: features (T x n_features)
      :rtype: features (np.ndarray)

   .. py:method:: standarize_features(X)

      Standardize features using sklearn's StandardScaler.

.. py:class:: SimulationGenerator

   Generate simulated returns that follow a Hidden Markov process.

   .. py:method:: stationary_distribution(transition_matrix)

      Computes the stationary distribution for a given Markov transition matrix.

      :param transition_matrix: The Markov transition matrix.
      :type transition_matrix: numpy array

      :returns: The stationary distribution.
      :rtype: numpy array

   .. py:method:: simulate_markov(transition_matrix, initial_distribution, steps)

      Simulates a Markov process.

      :param transition_matrix: The Markov transition matrix.
      :type transition_matrix: numpy array
      :param initial_distribution: The initial state distribution.
      :type initial_distribution: numpy array
      :param steps: The number of steps to simulate.
      :type steps: int

      :returns: The states at each step.
      :rtype: states (list)

   .. py:method:: generate_conditional_data(states, parameters)

      Generate data from a normal distribution conditional on the states.

      :param states: The list of states
      :type states: list
      :param parameters: Parameters for each state with means and standard deviations
      :type parameters: dict

      :returns: Simulated data conditional on the states.
      :rtype: data (list)

   .. py:method:: run(steps, transition_matrix, norm_params)

      Run the simulation and return the simulated states and conditional data.

      .. note::

         The simulated path must visit every state; if it does not, the simulation is re-run.

      :param steps: number of steps to simulate
      :type steps: int
      :param transition_matrix: transition matrix (k x k)
      :type transition_matrix: np.ndarray
      :param norm_params: parameters of the normal distribution for each state
      :type norm_params: dict

      :returns: simulated states
                simulated_data (list): simulated data conditional on states
      :rtype: simulated_states (list)
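A hedged sketch of generating a regime-switching return series follows. The exact structure expected for ``norm_params`` (a per-state dict of mean and standard deviation) is an assumption for illustration; ``TestingUtils.daily()`` provides the parameterization used in the package's own tests.

.. code-block:: python

   import numpy as np
   from quantbullet.research.jump_model import SimulationGenerator

   sim = SimulationGenerator()

   # Two-state transition matrix: both regimes are persistent, so each
   # regime lasts many steps on average
   transition_matrix = np.array([
       [0.99, 0.01],
       [0.02, 0.98],
   ])

   # Assumed format: per-state mean/std of the conditional normal distribution
   norm_params = {
       0: {"mean": 0.0005, "std": 0.008},
       1: {"mean": -0.001, "std": 0.02},
   }

   # Per the docstring, run() returns the simulated states and the data
   # generated conditional on those states
   states, data = sim.run(
       steps=1000,
       transition_matrix=transition_matrix,
       norm_params=norm_params,
   )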
.. py:class:: TestingUtils

   Parameters and plotting functions for testing

   .. py:method:: daily()

      Parameters for simulated daily return data, sourced from the paper.

   .. py:method:: plot_returns(returns, shade_list=None)

      Plot both the cumulative returns and the returns on separate subplots sharing the x-axis.

      :param returns: An array of returns.
      :type returns: np.ndarray

   .. py:method:: plot_state_probs(states, prices)

      Plot the state probabilities and stock prices on the same plot.

      :param states: An n x k array of state probabilities.
      :type states: np.ndarray
      :param prices: A series of prices, indexed by date.
      :type prices: pd.DataFrame

   .. py:method:: plot_averages(data_dict)

      Plot the average of the numbers for each key in the dictionary using a line plot, with x-axis labels in the form of 10^x.

      :param data_dict: A dictionary where keys are labels and values are lists of numbers.
      :type data_dict: dict
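Finally, a brief sketch of the plotting helpers. The placeholder return series and the dictionary passed to ``plot_averages`` are invented for illustration; in particular, the key labels below are hypothetical and their expected format should be checked against the implementation.

.. code-block:: python

   import numpy as np
   from quantbullet.research.jump_model import TestingUtils

   utils = TestingUtils()

   # Placeholder return series purely for illustration
   returns = np.random.default_rng(2).normal(0.0, 0.01, 500)

   # Cumulative returns and raw returns on stacked subplots sharing the x-axis
   utils.plot_returns(returns)

   # The average of each list is plotted as a line; keys are hypothetical labels
   utils.plot_averages({
       "lambda=10": [0.91, 0.88, 0.93],
       "lambda=100": [0.95, 0.94, 0.96],
   })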