:py:mod:`quantbullet.research.jump_model`
=========================================

.. py:module:: quantbullet.research.jump_model

.. autoapi-nested-parse::

   Module for statistical jump models


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   quantbullet.research.jump_model.DiscreteJumpModel
   quantbullet.research.jump_model.ContinuousJumpModel
   quantbullet.research.jump_model.FeatureGenerator
   quantbullet.research.jump_model.SimulationGenerator
   quantbullet.research.jump_model.TestingUtils


Attributes
~~~~~~~~~~

.. autoapisummary::

   quantbullet.research.jump_model.logger


.. py:data:: logger

.. py:class:: DiscreteJumpModel

   Statistical Jump Model with Discrete States

   .. py:method:: fixed_states_optimize(y, s, k=2)

      Optimize the parameters of a discrete jump model with the state sequence held fixed.

      :param y: Observed data of shape (T x n_features).
      :type y: np.ndarray
      :param s: State sequence of shape (T x 1).
      :type s: np.ndarray
      :param k: Number of states.
      :type k: int

      :returns: - np.ndarray: Optimized parameters of shape (k x n_features).
                - float: Optimal value of the objective function.
      :rtype: tuple

   .. py:method:: generate_loss_matrix(y, theta)

      Generate the loss matrix of a discrete jump model for fixed theta.

      :param y: observed data (T x n_features)
      :type y: np.ndarray
      :param theta: parameters (k x n_features)
      :type theta: np.ndarray

      :returns: loss matrix (T x k)
      :rtype: loss (np.ndarray)

   .. py:method:: fixed_theta_optimize(lossMatrix, lambda_)

      Optimize the state sequence of a discrete jump model with fixed parameters.

      :param lossMatrix: loss matrix (T x k)
      :type lossMatrix: np.ndarray
      :param lambda_: regularization parameter
      :type lambda_: float

      :returns: optimal state sequence (T,)
                v (float): optimal value of the objective function
      :rtype: s (np.ndarray)

   .. py:method:: initialize_kmeans_plusplus(data, k)

      Initialize the centroids using the k-means++ method.

      :param data: ndarray of shape (n_samples, n_features)
      :param k: number of clusters

      :returns: ndarray of shape (k, n_features)
      :rtype: centroids

   .. py:method:: classify_data_to_states(data, centroids)

      Classify data points into states based on the centroids.

      :param data: ndarray of shape (n_samples, n_features)
      :param centroids: centroids or means of the states, ndarray of shape (k, n_features)

      :returns: ndarray of shape (n_samples,), indices of the states to which each data point is assigned
      :rtype: state_assignments

   .. py:method:: infer_states_stats(ts_returns, states)

      Compute the mean and standard deviation of returns for each state.

      :param ts_returns: observed returns (T x 1)
      :type ts_returns: np.ndarray
      :param states: state sequence (T x 1)
      :type states: np.ndarray

      :returns: mean and standard deviation of returns for each state
      :rtype: state_features (dict)

   .. py:method:: remapResults(optimized_s, optimized_theta, ts_returns)

      Remap the results of the optimization so that states are ordered by increasing volatility of returns; volatility is used because it has a smaller variance than returns. A warning is triggered if the states identified by volatility and by returns differ.

   .. py:method:: cleanResults(raw_result, ts_returns, rearrange=False)

      Clean the results of the optimization by extracting the best result across trials based on the loss.

   .. py:method:: single_run(y, k, lambda_)

      Run a single trial of the optimization. Each trial uses a different initialization of the centroids.

      :param y: observed data (T x n_features)
      :type y: np.ndarray
      :param k: number of states
      :type k: int
      :param lambda_: regularization parameter
      :type lambda_: float

      :returns: optimal state sequence (T x 1)
                loss (float): optimal value of the objective function
                cur_theta (np.ndarray): optimal parameters (k x n_features)
      :rtype: cur_s (np.ndarray)

   .. py:method:: fit(y, k=2, lambda_=100, rearrange=False, n_trials=10)

      Fit the discrete jump model.

      .. note::

         A multiprocessing implementation is used to speed up the optimization.
         Multiple trials (``n_trials``) with k-means++ initialization are run.

      :param y: observed data (T x n_features)
      :type y: np.ndarray
      :param k: number of states
      :type k: int
      :param lambda_: regularization parameter
      :type lambda_: float
      :param rearrange: whether to rearrange the states in increasing order of volatility
      :type rearrange: bool
      :param n_trials: number of trials with different initializations
      :type n_trials: int

      :returns: optimal state sequence (T x 1)
                best_loss (float): optimal value of the objective function
                best_theta (np.ndarray): optimal parameters (k x n_features)
                optimized_s (list): state sequences from all trials (n_trials x T)
                optimized_loss (list): objective function values from all trials (n_trials x 1)
                optimized_theta (list): parameters from all trials (n_trials x k x n_features)
      :rtype: best_s (np.ndarray)

   .. py:method:: evaluate(true, pred, plot=False)

      Evaluate the model using the balanced accuracy score.

      :param true: true state sequence (T x 1)
      :type true: np.ndarray
      :param pred: predicted state sequence (T x 1)
      :type pred: np.ndarray
      :param plot: whether to plot the true and predicted state sequences
      :type plot: bool

      :returns: evaluation results
      :rtype: res (dict)
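A minimal usage sketch for the discrete model follows. It is illustrative only: it assumes ``fit`` accepts any (T x n_features) array and returns the tuple documented above (best state sequence first), and the synthetic regime parameters are invented for the example.

.. code-block:: python

   import numpy as np
   from quantbullet.research.jump_model import DiscreteJumpModel, FeatureGenerator

   # Synthetic returns: a calm regime followed by a volatile one (illustrative values)
   rng = np.random.default_rng(0)
   returns = np.concatenate([
       rng.normal(0.0005, 0.008, 300),
       rng.normal(-0.001, 0.025, 200),
   ])

   # Enrich the univariate series into a (T x n_features) feature matrix
   features = FeatureGenerator().enrich_features(returns)

   # Fit the discrete jump model; per the docstring, the first element of the
   # returned tuple is the best state sequence across trials
   model = DiscreteJumpModel()
   result = model.fit(features, k=2, lambda_=100, rearrange=True)
   best_s = result[0]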
.. py:class:: ContinuousJumpModel

   Bases: :py:obj:`DiscreteJumpModel`

   Continuous Jump Model with Soft State Assignments

   .. py:method:: fixed_states_optimize(y, s, k=None)

      Optimize theta given fixed states.

      :param y: (T, n_features) array of observations
      :param s: (T, k) array of state assignments

      :returns: (k, n_features) array of optimal parameters
      :rtype: theta

      .. note::

         s is assumed to have each row sum to 1.

   .. py:method:: generate_loss_matrix(y, theta)

      Identical to the loss function in the discrete case.

   .. py:method:: generate_C(k, grid_size=0.05)

      Uniformly sample states distributed on a grid.

      :param k: number of states
      :type k: int

      :returns: K x N matrix of states
      :rtype: matrix (np.ndarray)

   .. py:method:: fixed_theta_optimize(lossMatrix, lambda_, C)

      Optimize the state sequence of a continuous jump model with fixed parameters.

      :param lossMatrix: loss matrix (T x K)
      :type lossMatrix: np.ndarray
      :param C: K x N matrix of states
      :type C: np.ndarray
      :param lambda_: regularization parameter
      :type lambda_: float

      :returns: optimal state sequence with probability distributions (T x K)
                v_hat (float): loss value
      :rtype: s_hat (np.ndarray)

   .. py:method:: fit(y, k=2, lambda_=100, rearrange=False, n_trials=10, max_iter=20)

      Fit the continuous jump model.

      .. note::

         A multiprocessing implementation is used to speed up the optimization.
         Multiple trials (``n_trials``) with k-means++ initialization are run.

      :param y: observed data (T x n_features)
      :type y: np.ndarray
      :param k: number of states
      :type k: int
      :param lambda_: regularization parameter
      :type lambda_: float
      :param rearrange: whether to rearrange the states in increasing order of volatility
      :type rearrange: bool
      :param n_trials: number of trials with different initializations
      :type n_trials: int
      :param max_iter: maximum number of iterations per trial
      :type max_iter: int

      :returns: optimal state sequence (T x 1)
                best_loss (float): optimal value of the objective function
                best_theta (np.ndarray): optimal parameters (k x n_features)
                optimized_s (list): state sequences from all trials (n_trials x T)
                optimized_loss (list): objective function values from all trials (n_trials x 1)
                optimized_theta (list): parameters from all trials (n_trials x k x n_features)
      :rtype: best_s (np.ndarray)
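The continuous variant can be exercised in the same way. The sketch below is a guess at typical usage, assuming ``generate_C`` and ``fit`` behave as documented above; the interpretation of the C matrix as candidate probability vectors is inferred from the docstrings, not verified against the implementation, and the feature data is a placeholder.

.. code-block:: python

   import numpy as np
   from quantbullet.research.jump_model import ContinuousJumpModel

   cjm = ContinuousJumpModel()

   # Candidate soft state assignments on a grid: a K x N matrix, assumed here
   # to hold probability vectors over the k states in its columns
   C = cjm.generate_C(k=2, grid_size=0.05)

   # Placeholder (T x n_features) data purely for illustration
   rng = np.random.default_rng(1)
   features = rng.normal(size=(500, 3))

   # max_iter is assumed to bound the alternating-optimization iterations per trial
   result = cjm.fit(features, k=2, lambda_=100, n_trials=10, max_iter=20)
   soft_states = result[0]  # documented as the best state sequence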
.. py:class:: FeatureGenerator

   Enrich a univariate time series with features

   .. py:method:: enrich_features(time_series)

      Enrich a single time series with features.

      :param time_series: time series (T x 1)
      :type time_series: np.ndarray

      :returns: features (T x n_features)
      :rtype: features (np.ndarray)

   .. py:method:: standarize_features(X)

      Standardize features using sklearn's StandardScaler.

.. py:class:: SimulationGenerator

   Generate simulated returns that follow a Hidden Markov process.

   .. py:method:: stationary_distribution(transition_matrix)

      Computes the stationary distribution for a given Markov transition matrix.

      :param transition_matrix: The Markov transition matrix.
      :type transition_matrix: numpy array

      :returns: The stationary distribution.
      :rtype: numpy array

   .. py:method:: simulate_markov(transition_matrix, initial_distribution, steps)

      Simulates a Markov process.

      :param transition_matrix: The Markov transition matrix.
      :type transition_matrix: numpy array
      :param initial_distribution: The initial state distribution.
      :type initial_distribution: numpy array
      :param steps: The number of steps to simulate.
      :type steps: int

      :returns: The states at each step.
      :rtype: states (list)

   .. py:method:: generate_conditional_data(states, parameters)

      Generate data from a normal distribution conditional on the states.

      :param states: The list of states
      :type states: list
      :param parameters: Parameters for each state with means and standard deviations
      :type parameters: dict

      :returns: Simulated data conditional on the states.
      :rtype: data (list)

   .. py:method:: run(steps, transition_matrix, norm_params)

      Run the simulation and return the simulated states and conditional data.

      .. note::

         The simulated path must visit every state; if it does not, the simulation is re-run.

      :param steps: number of steps to simulate
      :type steps: int
      :param transition_matrix: transition matrix (k x k)
      :type transition_matrix: np.ndarray
      :param norm_params: parameters of the normal distribution for each state
      :type norm_params: dict

      :returns: simulated states
                simulated_data (list): simulated data conditional on states
      :rtype: simulated_states (list)
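A hedged sketch of generating a regime-switching return series follows. The exact structure expected for ``norm_params`` (a per-state dict of mean and standard deviation) is an assumption for illustration; ``TestingUtils.daily()`` provides the parameterization used in the package's own tests.

.. code-block:: python

   import numpy as np
   from quantbullet.research.jump_model import SimulationGenerator

   sim = SimulationGenerator()

   # Two-state transition matrix: both regimes are persistent, so each
   # regime lasts many steps on average
   transition_matrix = np.array([
       [0.99, 0.01],
       [0.02, 0.98],
   ])

   # Assumed format: per-state mean/std of the conditional normal distribution
   norm_params = {
       0: {"mean": 0.0005, "std": 0.008},
       1: {"mean": -0.001, "std": 0.02},
   }

   # Per the docstring, run() returns the simulated states and the data
   # generated conditional on those states
   states, data = sim.run(
       steps=1000,
       transition_matrix=transition_matrix,
       norm_params=norm_params,
   )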
.. py:class:: TestingUtils

   Parameters and plotting functions for testing

   .. py:method:: daily()

      Parameters for simulated daily return data, sourced from the paper.

   .. py:method:: plot_returns(returns, shade_list=None)

      Plot both the cumulative returns and the returns on separate subplots sharing the x-axis.

      :param returns: An array of returns.
      :type returns: np.ndarray

   .. py:method:: plot_state_probs(states, prices)

      Plot the state probabilities and stock prices on the same plot.

      :param states: An n x k array of state probabilities.
      :type states: np.ndarray
      :param prices: A series of prices, indexed by date.
      :type prices: pd.DataFrame

   .. py:method:: plot_averages(data_dict)

      Plot the average of the numbers for each key in the dictionary using a line plot, with x-axis labels in the form of 10^x.

      :param data_dict: A dictionary where keys are labels and values are lists of numbers.
      :type data_dict: dict
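Finally, a brief sketch of the plotting helpers. The placeholder return series and the dictionary passed to ``plot_averages`` are invented for illustration; in particular, the key labels below are hypothetical and their expected format should be checked against the implementation.

.. code-block:: python

   import numpy as np
   from quantbullet.research.jump_model import TestingUtils

   utils = TestingUtils()

   # Placeholder return series purely for illustration
   returns = np.random.default_rng(2).normal(0.0, 0.01, 500)

   # Cumulative returns and raw returns on stacked subplots sharing the x-axis
   utils.plot_returns(returns)

   # The average of each list is plotted as a line; keys are hypothetical labels
   utils.plot_averages({
       "lambda=10": [0.91, 0.88, 0.93],
       "lambda=100": [0.95, 0.94, 0.96],
   })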