🧰 Toolbox

humbldata.toolbox

Context: Toolbox.

A category to group all of the technical indicators available in the Toolbox().

Technical indicators rely on statistical transformations of time series data. These are raw math operations.

toolbox_helpers

Context: Toolbox || Category: Helpers.

These Toolbox() helpers are used in various calculations in the toolbox context. Most of the helpers will be mathematical transformations of data. These functions should be DUMB functions.

log_returns

log_returns(data: Series | DataFrame | LazyFrame | None = None, _column_name: str = 'adj_close', *, _drop_nulls: bool = True, _sort: bool = True) -> Series | DataFrame | LazyFrame

Context: Toolbox || Category: Helpers || Command: log_returns.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY. Calculates the logarithmic returns for a given Polars Series, DataFrame, or LazyFrame. Logarithmic returns are widely used in the financial industry to measure the rate of return on investments over time. This function supports calculations on both individual series and dataframes containing financial time series data.
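The arithmetic is r_t = ln(P_t) - ln(P_{t-1}) = ln(P_t / P_{t-1}). The plain-Python sketch below illustrates it on a single price sequence, the same shape as the Series branch. It is not the humbldata implementation, and `log_returns_sketch` is a hypothetical name. The first price has no predecessor, which is why `.diff()` yields a null there and `_drop_nulls=True` removes it.

```python
import math


def log_returns_sketch(prices, drop_first=True):
    """Hypothetical helper: ln(P_t / P_{t-1}) for consecutive prices.

    The first entry has no previous price, so it is None (the analogue of
    the null produced by Polars' .diff()) unless drop_first removes it.
    """
    returns = [None]
    for prev, curr in zip(prices, prices[1:]):
        returns.append(math.log(curr) - math.log(prev))
    return returns[1:] if drop_first else returns


print(log_returns_sketch([100, 105, 103]))  # roughly [0.04879, -0.019231]
```

Because log returns are additive, summing them recovers the log of the overall price ratio: here 0.04879 + (-0.019231) is ln(103 / 100).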

Parameters:

Name	Type	Description	Default
`data`	`Series \| DataFrame \| LazyFrame`	The input data for which to calculate the log returns. Default is None.	`None`
`_drop_nulls`	`bool`	Whether to drop null values from the result. Default is True.	`True`
`_column_name`	`str`	The column name to use for log return calculations in DataFrame or LazyFrame. Default is "adj_close".	`'adj_close'`
`_sort`	`bool`	If True, sorts the DataFrame or LazyFrame by `date` and `symbol` before calculation. If you want a DUMB function, set to False. Default is True.	`True`

Returns:

Type	Description
`Series \| DataFrame \| LazyFrame`	For a Series, a Series of log returns. For a DataFrame or LazyFrame, the original `data` with an extra `log_returns` column. The return type matches the input type.

Raises:

Type	Description
`HumblDataError`	If `data` is not a Series, DataFrame, or LazyFrame, or if `_sort` is True and the data lacks the 'symbol' and 'date' columns needed for sorting.

Examples:

>>> series = pl.Series([100, 105, 103])
>>> log_returns(data=series)
shape: (2,)
Series: '' [f64]
[
	0.04879
	-0.019231
]

>>> df = pl.DataFrame({"adj_close": [100.0, 105.0, 103.0]})
>>> log_returns(data=df, _sort=False)  # no 'symbol'/'date' columns to sort by
shape: (2, 2)
┌───────────┬─────────────┐
│ adj_close ┆ log_returns │
│ ---       ┆ ---         │
│ f64       ┆ f64         │
╞═══════════╪═════════════╡
│ 105.0     ┆ 0.04879     │
│ 103.0     ┆ -0.019231   │
└───────────┴─────────────┘

Improvements

Add a parameter _sort_cols: list[str] | None = None to make the function even dumber. This way you could specify certain columns to sort by instead of using default date and symbol. If _sort_cols=None and _sort=True, then the function will use the default date and symbol columns for sorting.
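One possible shape for the `_sort_cols` improvement is sketched below. It is entirely hypothetical: `resolve_sort_cols` does not exist in humbldata, the default order mirrors the `_set_sort_cols(data, "symbol", "date")` call in the source, and raising on missing columns is an assumption modeled on the existing `HumblDataError` check.

```python
def resolve_sort_cols(columns, *, sort=True, sort_cols=None, default=("symbol", "date")):
    """Hypothetical resolver for the proposed `_sort_cols` parameter.

    Returns the columns to sort by, or [] when sorting is disabled.
    """
    if not sort:
        return []
    # Fall back to the default columns only when no explicit list is given
    wanted = list(default) if sort_cols is None else list(sort_cols)
    missing = [c for c in wanted if c not in columns]
    if missing:
        raise ValueError(f"Cannot sort by missing column(s): {missing}")
    return wanted


print(resolve_sort_cols(["symbol", "date", "adj_close"]))  # ['symbol', 'date']
print(resolve_sort_cols(["date", "adj_close"], sort_cols=["date"]))  # ['date']
```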

Source code in src/humbldata/toolbox/toolbox_helpers.py

def log_returns(
    data: pl.Series | pl.DataFrame | pl.LazyFrame | None = None,
    _column_name: str = "adj_close",
    *,
    _drop_nulls: bool = True,
    _sort: bool = True,
) -> pl.Series | pl.DataFrame | pl.LazyFrame:
    """
    Context: Toolbox || Category: Helpers || **Command: log_returns**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.
    Calculates the logarithmic returns for a given Polars Series, DataFrame, or
    LazyFrame. Logarithmic returns are widely used in the financial
    industry to measure the rate of return on investments over time. This
    function supports calculations on both individual series and dataframes
    containing financial time series data.

    Parameters
    ----------
    data : pl.Series | pl.DataFrame | pl.LazyFrame, optional
        The input data for which to calculate the log returns. Default is None.
    _drop_nulls : bool, optional
        Whether to drop null values from the result. Default is True.
    _column_name : str, optional
        The column name to use for log return calculations in DataFrame or
        LazyFrame. Default is "adj_close".
    _sort : bool, optional
        If True, sorts the DataFrame or LazyFrame by `date` and `symbol` before
        calculation. If you want a DUMB function, set to False.
        Default is True.

    Returns
    -------
    pl.Series | pl.DataFrame | pl.LazyFrame
        For a Series, a Series of log returns. For a DataFrame or LazyFrame,
        the original `data` with an extra `log_returns` column. The return
        type matches the input type.

    Raises
    ------
    HumblDataError
        If `data` is not a Series, DataFrame, or LazyFrame, or if `_sort` is
        True and the data lacks the 'symbol' and 'date' columns for sorting.

    Examples
    --------
    >>> series = pl.Series([100, 105, 103])
    >>> log_returns(data=series)
    shape: (2,)
    Series: '' [f64]
    [
    	0.04879
    	-0.019231
    ]

    >>> df = pl.DataFrame({"adj_close": [100.0, 105.0, 103.0]})
    >>> log_returns(data=df, _sort=False)  # no 'symbol'/'date' columns to sort by
    shape: (2, 2)
    ┌───────────┬─────────────┐
    │ adj_close ┆ log_returns │
    │ ---       ┆ ---         │
    │ f64       ┆ f64         │
    ╞═══════════╪═════════════╡
    │ 105.0     ┆ 0.04879     │
    │ 103.0     ┆ -0.019231   │
    └───────────┴─────────────┘

    Improvements
    ------------
    Add a parameter `_sort_cols: list[str] | None = None` to make the function even
    dumber. This way you could specify certain columns to sort by instead of
    using default `date` and `symbol`. If `_sort_cols=None` and `_sort=True`,
    then the function will use the default `date` and `symbol` columns for
    sorting.

    """
    # Calculation for Polars Series
    if isinstance(data, pl.Series):
        out = data.log().diff()
        if _drop_nulls:
            out = out.drop_nulls()
    # Calculation for Polars DataFrame or LazyFrame
    elif isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)
        elif _sort and not sort_cols:
            msg = "Data must contain 'symbol' and 'date' columns for sorting."
            raise HumblDataError(msg)

        if "log_returns" not in data.collect_schema().names():
            out = data.with_columns(
                pl.col(_column_name).log().diff().alias("log_returns")
            )
        else:
            out = data
        if _drop_nulls:
            out = out.drop_nulls(subset="log_returns")
    else:
        msg = "No valid data type was provided for `log_returns()` calculation."
        raise HumblDataError(msg)

    return out

detrend

detrend(data: DataFrame | LazyFrame | Series, _detrend_col: str = 'log_returns', _detrend_value_col: str | Series | None = 'window_mean', *, _sort: bool = False) -> DataFrame | LazyFrame | Series

Context: Toolbox || Category: Helpers || Command: detrend.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

Detrends a column in a DataFrame, LazyFrame, or Series by subtracting the values of another column from it. Optionally sorts the data by 'symbol' and 'date' before detrending if _sort is True.

Parameters:

Name	Type	Description	Default
`data`	`Union[DataFrame, LazyFrame, Series]`	The data structure containing the columns to be processed.	required
`_detrend_col`	`str`	The name of the column from which values will be subtracted.	`'log_returns'`
`_detrend_value_col`	`str \| Series \| None`	The name of the column whose values will be subtracted. If `data` is a `pl.Series`, pass a second `pl.Series` here instead; it is subtracted from the first.	`'window_mean'`
`_sort`	`bool`	If True, sorts the data by 'symbol' and 'date' before detrending. Default is False.	`False`

Returns:

Type	Description
`Union[DataFrame, LazyFrame, Series]`	The detrended data structure with the same type as the input, with an added column named `f"detrended_{_detrend_col}"`.

Notes

Function doesn't use .over() in calculation. Once the data is sorted, subtracting _detrend_value_col from _detrend_col is a simple element-wise operation that doesn't need to be grouped, because the sorting has already aligned the rows for subtraction.
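Because the trend values (for example, a broadcast `window_mean`) already sit on the same rows as the values they detrend, the step reduces to an element-wise subtraction. A plain-Python sketch, assuming both sequences are already aligned row for row; `detrend_sketch` is a hypothetical name, not part of humbldata:

```python
def detrend_sketch(values, trend):
    """Subtract trend[i] from values[i] for every row.

    Assumes both sequences are aligned row for row (e.g. sorted the same
    way), so no grouping is required at this step.
    """
    if len(values) != len(trend):
        raise ValueError("values and trend must have the same length")
    return [v - t for v, t in zip(values, trend)]


# Log returns and their per-window means, already broadcast to each row
returns = [0.02, 0.04, -0.01, 0.03]
window_means = [0.03, 0.03, 0.01, 0.01]
print(detrend_sketch(returns, window_means))  # approximately [-0.01, 0.01, -0.02, 0.02]
```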

Source code in src/humbldata/toolbox/toolbox_helpers.py

def detrend(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    _detrend_col: str = "log_returns",
    _detrend_value_col: str | pl.Series | None = "window_mean",
    *,
    _sort: bool = False,
) -> pl.DataFrame | pl.LazyFrame | pl.Series:
    """
    Context: Toolbox || Category: Helpers || **Command: detrend**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

    Detrends a column in a DataFrame, LazyFrame, or Series by subtracting the
    values of another column from it. Optionally sorts the data by 'symbol' and
    'date' before detrending if _sort is True.

    Parameters
    ----------
    data : Union[pl.DataFrame, pl.LazyFrame, pl.Series]
        The data structure containing the columns to be processed.
    _detrend_col : str
        The name of the column from which values will be subtracted.
    _detrend_value_col : str | pl.Series | None, optional
        The name of the column whose values will be subtracted. If `data` is
        a `pl.Series`, pass a second `pl.Series` here instead; it is
        subtracted from the first.
    _sort : bool, optional
        If True, sorts the data by 'symbol' and 'date' before detrending.
        Default is False.

    Returns
    -------
    Union[pl.DataFrame, pl.LazyFrame, pl.Series]
        The detrended data structure with the same type as the input,
        with an added column named `f"detrended_{_detrend_col}"`.

    Notes
    -----
    Function doesn't use `.over()` in calculation. Once the data is sorted,
    subtracting _detrend_value_col from _detrend_col is a simple element-wise
    operation that doesn't need to be grouped, because the sorting has
    already aligned the rows for subtraction.
    """
    if isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)
        elif _sort and not sort_cols:
            msg = "Data must contain 'symbol' and 'date' columns for sorting."
            raise HumblDataError(msg)

    if isinstance(data, pl.DataFrame | pl.LazyFrame):
        col_names = data.collect_schema().names()
        if _detrend_value_col not in col_names or _detrend_col not in col_names:
            msg = f"Both {_detrend_value_col} and {_detrend_col} must be columns in the data."
            raise HumblDataError(msg)
        detrended = data.with_columns(
            (pl.col(_detrend_col) - pl.col(_detrend_value_col)).alias(
                f"detrended_{_detrend_col}"
            )
        )
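    elif isinstance(data, pl.Series):
        if not isinstance(_detrend_value_col, pl.Series):
            msg = "When 'data' is a Series, '_detrend_value_col' must also be a Series."
            raise HumblDataError(msg)
        # Series.rename returns a new Series, so the result must be reassigned
        detrended = (data - _detrend_value_col).rename(f"detrended_{_detrend_col}")
    else:
        msg = "No DataFrame/LazyFrame/Series was provided."
        raise HumblDataError(msg)

    return detrended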

cum_sum

cum_sum(data: DataFrame | LazyFrame | Series | None = None, _column_name: str = 'detrended_returns', *, _sort: bool = True, _mandelbrot_usage: bool = True) -> LazyFrame | DataFrame | Series

Context: Toolbox || Category: Helpers || Command: cum_sum.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

Calculate the cumulative sum of a series or column in a DataFrame or LazyFrame.
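When `symbol` and `window_index` columns are present, the source applies `.cum_sum().over(...)`, so the running total restarts in every window. A plain-Python sketch of that grouped behavior, assuming rows are dicts already sorted by symbol and date; `grouped_cum_sum` is a hypothetical name:

```python
def grouped_cum_sum(rows, value_key, group_keys=("symbol", "window_index")):
    """Running sum of row[value_key] that restarts for every group.

    Rows keep their order; each one receives the running total of its own
    group so far, like .cum_sum().over(group_keys) on pre-sorted data.
    """
    totals = {}
    out = []
    for row in rows:
        key = tuple(row[g] for g in group_keys)
        totals[key] = totals.get(key, 0.0) + row[value_key]
        out.append(totals[key])
    return out


rows = [
    {"symbol": "A", "window_index": 0, "detrended_returns": 1.0},
    {"symbol": "A", "window_index": 0, "detrended_returns": 2.0},
    {"symbol": "A", "window_index": 1, "detrended_returns": 5.0},
    {"symbol": "A", "window_index": 1, "detrended_returns": -1.0},
]
print(grouped_cum_sum(rows, "detrended_returns"))  # [1.0, 3.0, 5.0, 4.0]
```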

Parameters:

Name	Type	Description	Default
`data`	`DataFrame \| LazyFrame \| Series \| None`	The data to process.	`None`
`_column_name`	`str`	The name of the column to calculate the cumulative sum on; applies only when a DataFrame or LazyFrame is provided.	`'detrended_returns'`
`_sort`	`bool`	If True, sorts the DataFrame or LazyFrame by date and symbol before calculation. Default is True.	`True`
`_mandelbrot_usage`	`bool`	If True, performs additional checks specific to the Mandelbrot Channel calculation. This should be set to True when you have a cumulative deviate series, and False when not. Please check 'Notes' for more information. Default is True.	`True`

Returns:

Type	Description
`DataFrame \| LazyFrame \| Series`	For a DataFrame or LazyFrame, the input with the cumulative sum added as a new `cum_sum` column; for a Series, the cumulative sum as a Series named `cum_sum`.

Notes

This function is used to calculate the cumulative sum for the deviate series of detrended returns for the data in the pipeline for calc_mandelbrot_channel.

So, although it calculates a cumulative sum, the result is known as a cumulative deviate because it is a cumulative sum over a deviate series (returns with the window mean removed). Because those deviations sum to zero within each window, the cumulative sum should end at 0 for each window. The _mandelbrot_usage parameter enables checks that the data is suitable for Mandelbrot Channel calculations, i.e., that the deviate series was calculated correctly (each window ends at 0), meaning the trend (the mean over the window_index) was successfully removed from the data.
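The zero-ending property follows from the definition: within a window of n values with mean m, the deviations x_i - m sum to (sum of x_i) - n*m = 0, so their running total finishes at 0. The sketch below shows this in plain Python together with a hypothetical end-of-window check in the spirit of `_mandelbrot_usage`; it is not the actual `_cumsum_check`, whose details are not shown here.

```python
import math
from itertools import accumulate


def cumulative_deviate(window):
    """Demean one window, then take the running sum of the deviations."""
    mean = sum(window) / len(window)
    return list(accumulate(x - mean for x in window))


def ends_at_zero(cum_dev, tol=1e-9):
    """Hypothetical check: a correctly detrended window finishes at ~0."""
    return math.isclose(cum_dev[-1], 0.0, abs_tol=tol)


window = [0.01, -0.02, 0.03, 0.00]
cum_dev = cumulative_deviate(window)
print(ends_at_zero(cum_dev))                   # True
print(ends_at_zero(list(accumulate(window))))  # False: raw returns were not demeaned
```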

Source code in src/humbldata/toolbox/toolbox_helpers.py

def cum_sum(
    data: pl.DataFrame | pl.LazyFrame | pl.Series | None = None,
    _column_name: str = "detrended_returns",
    *,
    _sort: bool = True,
    _mandelbrot_usage: bool = True,
) -> pl.LazyFrame | pl.DataFrame | pl.Series:
    """
    Context: Toolbox || Category: Helpers || **Command: cum_sum**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

    Calculate the cumulative sum of a series or column in a DataFrame or
    LazyFrame.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame | pl.Series | None
        The data to process.
    _column_name : str
        The name of the column to calculate the cumulative sum on; applies
        only when a DataFrame or LazyFrame is provided.
    _sort : bool, optional
        If True, sorts the DataFrame or LazyFrame by date and symbol before
        calculation. Default is True.
    _mandelbrot_usage : bool, optional
        If True, performs additional checks specific to the Mandelbrot Channel
        calculation. This should be set to True when you have a cumulative
        deviate series, and False when not. Please check 'Notes' for more
        information. Default is True.

    Returns
    -------
    pl.DataFrame | pl.LazyFrame | pl.Series
        For a DataFrame or LazyFrame, the input with the cumulative sum added
        as a new `cum_sum` column; for a Series, the cumulative sum as a
        Series named `cum_sum`.

    Notes
    -----
    This function is used to calculate the cumulative sum for the deviate series
    of detrended returns for the data in the pipeline for
    `calc_mandelbrot_channel`.

    So, although it calculates a cumulative sum, the result is known as a
    cumulative deviate because it is a cumulative sum over a deviate series
    (returns with the window mean removed). Because those deviations sum to
    zero within each window, the cumulative sum should end at 0 for each
    window. The _mandelbrot_usage parameter enables checks that the data is
    suitable for Mandelbrot Channel calculations, i.e., that the deviate
    series was calculated correctly (each window ends at 0), meaning the trend
    (the mean over the window_index) was successfully removed from the data.
    """
    if isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)

        over_cols = _set_over_cols(data, "symbol", "window_index")
        if over_cols:
            out = data.with_columns(
                pl.col(_column_name).cum_sum().over(over_cols).alias("cum_sum")
            )
        else:
            out = data.with_columns(
                pl.col(_column_name).cum_sum().alias("cum_sum")
            )
    elif isinstance(data, pl.Series):
        out = data.cum_sum().alias("cum_sum")
    else:
        msg = "No DataFrame/LazyFrame/Series was provided."
        raise HumblDataError(msg)

    if _mandelbrot_usage:
        _cumsum_check(out, _column_name="cum_sum")

    return out

std

std(data: LazyFrame | DataFrame | Series, _column_name: str = 'cum_sum', *, _sort: bool = True) -> LazyFrame | DataFrame | Series

Context: Toolbox || Category: Helpers || Command: std.

Calculate the standard deviation of the cumulative deviate series within each window of the dataset.

Parameters:

Name	Type	Description	Default
`data`	`LazyFrame \| DataFrame \| Series`	The data from which to calculate the standard deviation.	required
`_column_name`	`str`	The name of the column from which to calculate the standard deviation.	`'cum_sum'`
`_sort`	`bool`	If True, sorts the DataFrame or LazyFrame by date and symbol before calculation. Default is True.	`True`

Returns:

Type	Description
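`LazyFrame \| DataFrame \| Series`	The input frame with the standard deviation of `_column_name` added as `f"{_column_name}_std"` per `symbol`/`window_index` window (or as a column named "S", computed over the whole column, when those columns are absent). For Series input, the standard deviation is returned as a single value.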

Improvements

To make the function even dumber, parametrize the .over() call so that it can skip the per-window_index calculation.
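The suggested parametrization could look like the plain-Python sketch below, where a hypothetical `group_keys` argument replaces the hard-coded `.over()` call: `None` gives one standard deviation for the whole column, a tuple of keys gives one per window. It uses `statistics.stdev`, the sample standard deviation (ddof=1), which matches the default of Polars' `std()`. Groups with fewer than two values get `None` here because the sample standard deviation is undefined for them.

```python
import statistics


def std_sketch(rows, value_key, group_keys=None):
    """Sample standard deviation (ddof=1) broadcast back to every row.

    group_keys=None: one value for the whole column.
    group_keys=("symbol", "window_index"): one value per window.
    Groups with fewer than two values get None (sample std is undefined).
    """
    def key(row):
        return tuple(row[g] for g in group_keys) if group_keys else ()

    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row[value_key])
    stds = {k: statistics.stdev(v) if len(v) >= 2 else None for k, v in groups.items()}
    return [stds[key(row)] for row in rows]


rows = [
    {"window_index": 0, "cum_sum": 1.0},
    {"window_index": 0, "cum_sum": 2.0},
    {"window_index": 0, "cum_sum": 3.0},
    {"window_index": 1, "cum_sum": 10.0},
    {"window_index": 1, "cum_sum": 20.0},
]
print(std_sketch(rows, "cum_sum", group_keys=("window_index",)))
print(std_sketch(rows, "cum_sum"))  # same value repeated on every row
```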

Source code in src/humbldata/toolbox/toolbox_helpers.py

def std(
    data: pl.LazyFrame | pl.DataFrame | pl.Series,
    _column_name: str = "cum_sum",
    *,
    _sort: bool = True,
) -> pl.LazyFrame | pl.DataFrame | pl.Series:
    """
    Context: Toolbox || Category: Helpers || **Command: std**.

    Calculate the standard deviation of the cumulative deviate series within
    each window of the dataset.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame | pl.Series
        The data from which to calculate the standard deviation.
    _column_name : str, optional
        The name of the column from which to calculate the standard deviation.
        Default is "cum_sum".
    _sort : bool, optional
        If True, sorts the DataFrame or LazyFrame by date and symbol before
        calculation. Default is True.

    Returns
    -------
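    pl.LazyFrame | pl.DataFrame | pl.Series
        The input frame with the standard deviation of `_column_name` added as
        `f"{_column_name}_std"` per `symbol`/`window_index` window (or as "S",
        over the whole column, when those columns are absent). For Series
        input, the standard deviation is returned as a single value.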

    Improvements
    ------------
    To make the function even dumber, parametrize the `.over()` call so that
    it can skip the per-`window_index` calculation.
    """
    if isinstance(data, pl.Series):
        out = data.std()
    elif isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        over_cols = _set_over_cols(data, "symbol", "window_index")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)

        if over_cols:
            out = data.with_columns(
                [
                    pl.col(_column_name)
                    .std()
                    .over(over_cols)
                    .alias(f"{_column_name}_std"),  # used to be 'S'
                ]
            )
        else:
            out = data.with_columns(
                pl.col(_column_name).std().alias("S"),
            )
    else:
        msg = "No DataFrame/LazyFrame/Series was provided."
        raise HumblDataError(msg)

    return out

mean

mean(data: DataFrame | LazyFrame | Series, _column_name: str = 'log_returns', *, _sort: bool = True) -> DataFrame | LazyFrame

Context: Toolbox || Category: Helpers || Function: mean.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

This function calculates the mean of a column (_column_name) over each window in the dataset, if there are any. The window is intended to be the window that is passed to the calc_mandelbrot_channel() function, so each window within the time series gets its own mean, which can then be used to normalize the data (i.e., remove the mean) within each window section.
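The per-window mean behaves like `.mean().over(["symbol", "window_index"])` inside `with_columns()`: every row receives the mean of its own window and the row count does not change. A plain-Python sketch of that broadcast, assuming rows are dicts; `window_mean_sketch` is a hypothetical name:

```python
def window_mean_sketch(rows, value_key, group_keys=("symbol", "window_index")):
    """Return copies of rows with each group's mean added as 'window_mean'."""
    def key(row):
        return tuple(row[g] for g in group_keys)

    sums, counts = {}, {}
    for row in rows:
        k = key(row)
        sums[k] = sums.get(k, 0.0) + row[value_key]
        counts[k] = counts.get(k, 0) + 1
    # Broadcast: one output row per input row, not one row per window
    return [{**row, "window_mean": sums[key(row)] / counts[key(row)]} for row in rows]


rows = [
    {"symbol": "A", "window_index": 0, "log_returns": 1.0},
    {"symbol": "A", "window_index": 0, "log_returns": 3.0},
    {"symbol": "A", "window_index": 1, "log_returns": 10.0},
    {"symbol": "B", "window_index": 0, "log_returns": 4.0},
]
print([r["window_mean"] for r in window_mean_sketch(rows, "log_returns")])  # [2.0, 2.0, 10.0, 4.0]
```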

Parameters:

Name	Type	Description	Default
`data`	`DataFrame \| LazyFrame`	The DataFrame or LazyFrame to calculate the mean on.	required
`_column_name`	`str`	The name of the column to calculate the mean on.	`'log_returns'`
`_sort`	`bool`	If True, sorts the DataFrame or LazyFrame by symbol and date before calculation. Default is True.	`True`

Returns:

Type	Description
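`DataFrame \| LazyFrame`	The original DataFrame or LazyFrame with an added `window_mean` column holding the mean of `_column_name` per `symbol`/`window_index` window (named `mean` when those columns are absent). For Series input, the mean is returned as a single value.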

Notes

Although the mean is an aggregation, it is computed with .over() inside with_columns(), so each window's mean is broadcast back to every row of that window. The number of rows is unchanged and the original data is kept alongside the new column.

Source code in src/humbldata/toolbox/toolbox_helpers.py

def mean(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    _column_name: str = "log_returns",
    *,
    _sort: bool = True,
) -> pl.DataFrame | pl.LazyFrame:
    """
    Context: Toolbox || Category: Helpers || **Function: mean**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

    This function calculates the mean of a column (`_column_name`) over
    each window in the dataset, if there are any.
    This window is intended to be the `window` that is passed to the
    `calc_mandelbrot_channel()` function. The mean calculated is meant to be
    used as the mean of each `window` within the time series. This
    way, each window has its own mean, which can then be used to
    normalize the data (i.e., remove the mean) within each window section.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The DataFrame or LazyFrame to calculate the mean on.
    _column_name : str
        The name of the column to calculate the mean on.
    _sort : bool
        If True, sorts the DataFrame or LazyFrame by symbol and date before
        calculation. Default is True.

    Returns
    -------
    pl.DataFrame | pl.LazyFrame
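        The original DataFrame or LazyFrame with an added `window_mean` column
        holding the mean of `_column_name` per `symbol`/`window_index` window
        (named `mean` when those columns are absent). For Series input, the
        mean is returned as a single value.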


    Notes
    -----
    Although the mean is an aggregation, it is computed with `.over()` inside
    `with_columns()`, so each window's mean is broadcast back to every row of
    that window. The number of rows is unchanged and the original data is
    kept alongside the new column.

    """
    if isinstance(data, pl.Series):
        out = data.mean()
    else:
        if data is None:
            msg = "No DataFrame was passed to the `mean()` function."
            raise HumblDataError(msg)
        sort_cols = _set_sort_cols(data, "symbol", "date")
        over_cols = _set_over_cols(data, "symbol", "window_index")
        if _sort and sort_cols:  # Check if _sort is True
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)
        if over_cols:
            out = data.with_columns(
                pl.col(_column_name).mean().over(over_cols).alias("window_mean")
            )
        else:
            out = data.with_columns(pl.col(_column_name).mean().alias("mean"))
        if _sort and sort_cols:
            out = out.sort(sort_cols)
    return out

range_

range_(data: LazyFrame | DataFrame | Series, _column_name: str = 'cum_sum', *, _sort: bool = True) -> LazyFrame | DataFrame | Series

Context: Toolbox || Category: Technical || Sub-Category: MandelBrot Channel || Sub-Category: Helpers || Function: mandelbrot_range.

Calculate the range (max - min) of the cumulative deviate values of a specified column in a DataFrame for each window in the dataset, if there are any.
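With window columns present, the range is each window's maximum minus its minimum, repeated on every row of that window (the `_min`, `_max`, and `_range` columns in the source). A plain-Python sketch assuming dict rows; `range_sketch` is a hypothetical name:

```python
def range_sketch(rows, value_key, group_keys=("symbol", "window_index")):
    """Per-group max - min, repeated on every row of the group."""
    def key(row):
        return tuple(row[g] for g in group_keys)

    lo, hi = {}, {}
    for row in rows:
        k, v = key(row), row[value_key]
        lo[k] = min(lo.get(k, v), v)
        hi[k] = max(hi.get(k, v), v)
    return [hi[key(row)] - lo[key(row)] for row in rows]


rows = [
    {"symbol": "A", "window_index": 0, "cum_sum": 0.5},
    {"symbol": "A", "window_index": 0, "cum_sum": -1.0},
    {"symbol": "A", "window_index": 0, "cum_sum": 2.0},
    {"symbol": "A", "window_index": 1, "cum_sum": 4.0},
    {"symbol": "A", "window_index": 1, "cum_sum": 4.0},
]
print(range_sketch(rows, "cum_sum"))  # [3.0, 3.0, 3.0, 0.0, 0.0]
```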

Parameters:

Name	Type	Description	Default
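`data`	`LazyFrame \| DataFrame \| Series`	The data to calculate the range from.	required
`_column_name`	`str`	The column to calculate the range from.	`'cum_sum'`
`_sort`	`bool`	If True, sorts the DataFrame or LazyFrame by symbol and date before calculation. Default is True.	`True`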

Returns:

Type	Description
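`LazyFrame \| DataFrame \| Series`	The input frame with `f"{_column_name}_min"`, `f"{_column_name}_max"`, and `f"{_column_name}_range"` columns computed per `symbol`/`window_index` window. For Series input, the range is returned as a single value.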

Source code in src/humbldata/toolbox/toolbox_helpers.py

def range_(
    data: pl.LazyFrame | pl.DataFrame | pl.Series,
    _column_name: str = "cum_sum",
    *,
    _sort: bool = True,
) -> pl.LazyFrame | pl.DataFrame | pl.Series:
    """
    Context: Toolbox || Category: Technical || Sub-Category: MandelBrot Channel || Sub-Category: Helpers || **Function: mandelbrot_range**.

    Calculate the range (max - min) of the cumulative deviate values of a
    specified column in a DataFrame for each window in the dataset, if there are any.

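    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame | pl.Series
        The data to calculate the range from.
    _column_name : str, optional
        The column to calculate the range from, by default "cum_sum".
    _sort : bool, optional
        If True, sorts the DataFrame or LazyFrame by symbol and date before
        calculation. Default is True.

    Returns
    -------
    pl.LazyFrame | pl.DataFrame | pl.Series
        The input frame with `{_column_name}_min`, `{_column_name}_max`, and
        `{_column_name}_range` columns computed per `symbol`/`window_index`
        window. For Series input, the range is returned as a single value.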
    """
    if isinstance(data, pl.Series):
        out = data.max() - data.min()
    elif isinstance(data, pl.LazyFrame | pl.DataFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        over_cols = _set_over_cols(data, "symbol", "window_index")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)
        if over_cols:
            out = (
                data.with_columns(
                    [
                        pl.col(_column_name)
                        .min()
                        .over(over_cols)
                        .alias(f"{_column_name}_min"),
                        pl.col(_column_name)
                        .max()
                        .over(over_cols)
                        .alias(f"{_column_name}_max"),
                    ]
                )
                .sort(sort_cols)
                .with_columns(
                    (
                        pl.col(f"{_column_name}_max")
                        - pl.col(f"{_column_name}_min")
                    ).alias(f"{_column_name}_range"),  # used to be 'R'
                )
            )
        else:
            # No window columns: take min/max over the whole column
            out = data.with_columns(
                [
                    pl.col(_column_name).min().alias(f"{_column_name}_min"),
                    pl.col(_column_name).max().alias(f"{_column_name}_max"),
                ]
            ).with_columns(
                (
                    pl.col(f"{_column_name}_max")
                    - pl.col(f"{_column_name}_min")
                ).alias(f"{_column_name}_range"),
            )
            # Only sort when there are sort columns to sort by
            if sort_cols:
                out = out.sort(sort_cols)
    else:
        msg = "No DataFrame/LazyFrame/Series was provided."
        raise HumblDataError(msg)

    return out

toolbox_controller

Context: Toolbox.

The Toolbox Controller Module.

Toolbox

Bases: ToolboxQueryParams

A top-level controller for data analysis tools in humblDATA.

This module serves as the primary controller, routing user-specified ToolboxQueryParams as core arguments that are used to fetch time series data.

The Toolbox controller also gives access to all sub-modules and their functions.

It is designed to facilitate the collection of data across various types such as stocks, options, or alternative time series by requiring minimal input from the user.

Submodules

The Toolbox controller is composed of the following submodules:

technical:
quantitative:
fundamental:

Parameters:

Name	Type	Description	Default
`symbol`	`str`	The symbol or ticker of the stock.	required
`interval`	`str`	The interval of the data. Defaults to '1d'.	`'1d'`
`start_date`	`str`	The start date for the data query.	required
`end_date`	`str`	The end date for the data query.	required
`provider`	`str`	The provider to use for the data query. Defaults to 'yfinance'.	`'yfinance'`

Parameter Notes

The parameters (symbol, interval, start_date, end_date) are the ToolboxQueryParams. They are used for data collection further down the pipeline in other commands and are intended for executing operations on core data sets. This approach enables composable, standardized querying while accommodating data-specific collection logic.

Source code in src/humbldata/toolbox/toolbox_controller.py

class Toolbox(ToolboxQueryParams):
    """

    A top-level <context> controller for data analysis tools in `humblDATA`.

    This module serves as the primary controller, routing user-specified
    ToolboxQueryParams as core arguments that are used to fetch time series
    data.

    The `Toolbox` controller also gives access to all sub-modules and their
    functions.

    It is designed to facilitate the collection of data across various types such as
    stocks, options, or alternative time series by requiring minimal input from the user.

    Submodules
    ----------
    The `Toolbox` controller is composed of the following submodules:

    - `technical`:
    - `quantitative`:
    - `fundamental`:

    Parameters
    ----------
    symbol : str
        The symbol or ticker of the stock.
    interval : str, optional
        The interval of the data. Defaults to '1d'.
    start_date : str
        The start date for the data query.
    end_date : str
        The end date for the data query.
    provider : str, optional
        The provider to use for the data query. Defaults to 'yfinance'.

    Parameter Notes
    ---------------
    The parameters (`symbol`, `interval`, `start_date`, `end_date`)
    are the `ToolboxQueryParams`. They are used for data collection further
    down the pipeline in other commands and are intended for executing
    operations on core data sets. This approach enables composable,
    standardized querying while accommodating data-specific collection logic.
    """

    def __init__(self, *args, **kwargs):
        """
        Initialize the Toolbox module.

        All positional and keyword arguments are forwarded to
        `ToolboxQueryParams`; the method returns nothing.
        """
        super().__init__(*args, **kwargs)

    @property
    def technical(self):
        """
        The technical submodule of the Toolbox controller.

        Access to all the technical indicators. When the Toolbox class is
        instantiated the parameters are initialized with the ToolboxQueryParams
        class, which holds all the fields needed for the context_params, like
        the symbol, interval, start_date, and end_date.
        """
        return Technical(context_params=self)

    @property
    def fundamental(self):
        """
        The fundamental submodule of the Toolbox controller.

        Access to all the Fundamental indicators. When the Toolbox class is
        instantiated the parameters are initialized with the ToolboxQueryParams
        class, which holds all the fields needed for the context_params, like the
        symbol, interval, start_date, and end_date.
        """
        return Fundamental(context_params=self)

__init__

__init__(*args, **kwargs)

Initialize the Toolbox module.

All positional and keyword arguments are forwarded to ToolboxQueryParams; the method returns nothing.

Source code in src/humbldata/toolbox/toolbox_controller.py

def __init__(self, *args, **kwargs):
    """
    Initialize the Toolbox module.

    All positional and keyword arguments are forwarded to
    `ToolboxQueryParams`; the method returns nothing.
    """
    super().__init__(*args, **kwargs)

technical `property`

technical

The technical submodule of the Toolbox controller.

Access to all the technical indicators. When the Toolbox class is instantiated the parameters are initialized with the ToolboxQueryParams class, which holds all the fields needed for the context_params, like the symbol, interval, start_date, and end_date.

fundamental `property`

fundamental

The fundamental submodule of the Toolbox controller.

Access to all the Fundamental indicators. When the Toolbox class is instantiated the parameters are initialized with the ToolboxQueryParams class, which holds all the fields needed for the context_params, like the symbol, interval, start_date, and end_date.
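The property-based wiring described above can be illustrated with stand-ins. In this sketch `QueryParamsStub`, `TechnicalStub`, and `ToolboxStub` are hypothetical simplifications, not humbldata's classes; the field defaults follow the parameter table above, and the real `ToolboxQueryParams` may validate or coerce fields differently.

```python
from dataclasses import dataclass


@dataclass
class QueryParamsStub:
    """Hypothetical stand-in for ToolboxQueryParams (fields from the table above)."""

    symbol: str
    start_date: str
    end_date: str
    interval: str = "1d"
    provider: str = "yfinance"


class TechnicalStub:
    """Hypothetical submodule that only keeps a reference to the shared params."""

    def __init__(self, context_params: QueryParamsStub):
        self.context_params = context_params


class ToolboxStub(QueryParamsStub):
    """Controller that is itself the query-params object, like Toolbox."""

    @property
    def technical(self) -> TechnicalStub:
        # Built on each access and handed this instance as context_params
        return TechnicalStub(context_params=self)


toolbox = ToolboxStub(symbol="AAPL", start_date="2024-01-01", end_date="2024-06-30")
print(toolbox.technical.context_params.interval)  # 1d
```

Because the controller subclasses the parameter model, every submodule sees the same symbol, dates, interval, and provider without the caller repeating them.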

fundamental

Context: Toolbox || Category: Fundamental.

A category to group all of the fundamental indicators available in the Toolbox().

Fundamental indicators rely on earnings data, valuation models of companies, balance sheet metrics, etc.

fundamental_controller

Context: Toolbox || Category: Fundamental.

A controller to manage and compile all of the Fundamental models available in the toolbox context. It is exposed as a @property on the Toolbox() class, giving access to the Fundamental module and its functions.

Fundamental ¤

Module for all Fundamental analysis.

Attributes:

Name	Type	Description
`context_params`	`ToolboxQueryParams`	The standard query parameters for toolbox data.

Methods:

Name	Description
`humbl_compass`	Execute the HumblCompass command.

Source code in src/humbldata/toolbox/fundamental/fundamental_controller.py

class Fundamental:
    """
    Module for all Fundamental analysis.

    Attributes
    ----------
    context_params : ToolboxQueryParams
        The standard query parameters for toolbox data.

    Methods
    -------
    humbl_compass(**kwargs)
        Execute the HumblCompass command.

    """

    def __init__(self, context_params: ToolboxQueryParams):
        self.context_params = context_params

    def humbl_compass(self, **kwargs):
        """
        Execute the HumblCompass command.

        Parameters
        ----------
        country : str
            The country or group of countries to analyze
        recommendations : bool, optional
            Whether to include investment recommendations based on the HUMBL regime
        chart : bool, optional
            Whether to return a chart object
        template : str, optional
            The template/theme to use for the plotly figure
        z_score : str, optional
            The time window for z-score calculation

        Returns
        -------
        HumblObject
            The HumblObject containing the transformed data and metadata
        """
        from humbldata.core.standard_models.toolbox.fundamental.humbl_compass import (
            HumblCompassFetcher,
            HumblCompassQueryParams,
        )

        # Convert kwargs to HumblCompassQueryParams
        command_params = HumblCompassQueryParams(**kwargs)

        # Instantiate the Fetcher with the query parameters
        fetcher = HumblCompassFetcher(
            context_params=self.context_params, command_params=command_params
        )

        # Use the fetcher to get the data
        return fetcher.fetch_data()

humbl_compass ¤

humbl_compass(**kwargs)

Execute the HumblCompass command.

Parameters:

Name	Type	Description	Default
`country`	`str`	The country or group of countries to analyze.	required
`recommendations`	`bool`	Whether to include investment recommendations based on the HUMBL regime.	optional
`chart`	`bool`	Whether to return a chart object.	optional
`template`	`str`	The template/theme to use for the plotly figure.	optional
`z_score`	`str`	The time window for the z-score calculation.	optional

Returns:

Type	Description
`HumblObject`	The HumblObject containing the transformed data and metadata

Source code in src/humbldata/toolbox/fundamental/fundamental_controller.py

def humbl_compass(self, **kwargs):
    """
    Execute the HumblCompass command.

    Parameters
    ----------
    country : str
        The country or group of countries to analyze
    recommendations : bool, optional
        Whether to include investment recommendations based on the HUMBL regime
    chart : bool, optional
        Whether to return a chart object
    template : str, optional
        The template/theme to use for the plotly figure
    z_score : str, optional
        The time window for z-score calculation

    Returns
    -------
    HumblObject
        The HumblObject containing the transformed data and metadata
    """
    from humbldata.core.standard_models.toolbox.fundamental.humbl_compass import (
        HumblCompassFetcher,
        HumblCompassQueryParams,
    )

    # Convert kwargs to HumblCompassQueryParams
    command_params = HumblCompassQueryParams(**kwargs)

    # Instantiate the Fetcher with the query parameters
    fetcher = HumblCompassFetcher(
        context_params=self.context_params, command_params=command_params
    )

    # Use the fetcher to get the data
    return fetcher.fetch_data()
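
The controller method does little on its own: it turns its keyword arguments into a `HumblCompassQueryParams` object, passes that object together with the shared `context_params` to `HumblCompassFetcher`, and returns whatever `fetch_data()` produces. The sketch below mimics that flow with standard-library stand-ins so it runs without humbldata installed; `CompassParams`, `CompassFetcher`, and `FundamentalSketch` are hypothetical names, and their field defaults are assumptions rather than the library's real defaults.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class CompassParams:
    """Hypothetical stand-in for HumblCompassQueryParams (defaults are assumed)."""

    country: str
    recommendations: bool = False
    chart: bool = False
    template: str = "plotly"
    z_score: Optional[str] = None


class CompassFetcher:
    """Hypothetical stand-in for HumblCompassFetcher."""

    def __init__(self, context_params: dict[str, Any], command_params: CompassParams):
        self.context_params = context_params
        self.command_params = command_params

    def fetch_data(self) -> dict[str, Any]:
        # A real fetcher would query, transform, and optionally chart the data;
        # this stand-in just echoes its inputs so the data flow is visible.
        return {"context": self.context_params, "command": self.command_params}


class FundamentalSketch:
    """Mirrors Fundamental.humbl_compass(): kwargs -> params -> fetcher -> data."""

    def __init__(self, context_params: dict[str, Any]):
        self.context_params = context_params

    def humbl_compass(self, **kwargs: Any) -> dict[str, Any]:
        # Build the params object first so bad keyword arguments fail early.
        command_params = CompassParams(**kwargs)
        fetcher = CompassFetcher(self.context_params, command_params)
        return fetcher.fetch_data()


# The symbol and country values are illustrative only.
result = FundamentalSketch({"symbol": "AAPL"}).humbl_compass(country="united_states", chart=True)
print(result["command"])
```

Because the stand-in is a dataclass, a misspelled keyword such as `chrat=True` raises `TypeError` when the params object is built. Whether the real `HumblCompassQueryParams` rejects or silently ignores unknown keywords depends on how that model is configured, which this page does not show.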

humbl_compass ¤

helpers ¤

Context: Toolbox || Category: Fundamental || Command: humbl_compass.

The HumblCompass Helpers Module.

model ¤

Context: Toolbox || Category: Fundamental || Command: humbl_compass.

The humbl_compass Command Module. This is typically used in the .transform_data() method of the HumblCompassFetcher class.

humbl_compass ¤

humbl_compass()

Context: Toolbox || Category: Fundamental || Command: humbl_compass.

Execute the humbl_compass command.

Parameters:

This function takes no parameters.

Returns:

Type	Description
`None`	The function body is currently a placeholder (`pass`).

Source code in src/humbldata/toolbox/fundamental/humbl_compass/model.py

def humbl_compass():
    """
    Context: Toolbox || Category: Fundamental || **Command: humbl_compass**.

    Execute the humbl_compass command.

    Returns
    -------
    None
        Placeholder; the function body is currently `pass`.
    """
    pass

view ¤

Context: Toolbox || Category: Fundamental || Command: humbl_compass.

The HumblCompass View Module.

create_humbl_compass_plot ¤

create_humbl_compass_plot(data: DataFrame, template: ChartTemplate = ChartTemplate.plotly) -> Figure

Generate a HumblCompass plot from the provided data.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	The dataframe containing the data to be plotted.	required
`template`	`ChartTemplate`	The template to be used for styling the plot.	`plotly`

Returns:

Type	Description
`Figure`	A plotly figure object representing the HumblCompass plot.
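
In this plot the x-axis is the 3-month change in inflation (`cpi_3m_delta`) and the y-axis is the 3-month change in growth (`cli_3m_delta`). The quadrant labels drawn in the source code map sign combinations to regimes: falling inflation with rising growth is humblBOOM, both rising is humblBOUNCE, rising inflation with falling growth is humblBLOAT, and both falling is humblBUST. The hypothetical helper `compass_regime` below restates that mapping in plain Python; the plot does not say how a point lying exactly on an axis is labelled, so treating a zero delta as "rising" is an assumption.

```python
def compass_regime(cpi_3m_delta: float, cli_3m_delta: float) -> str:
    """Hypothetical helper: map a (CPI delta, CLI delta) point to its plot quadrant.

    Assumption: a delta of exactly 0 counts as "rising"; the plot itself leaves
    points on the axes unlabelled.
    """
    inflation_rising = cpi_3m_delta >= 0
    growth_rising = cli_3m_delta >= 0
    if growth_rising and not inflation_rising:
        return "humblBOOM"  # top-left quadrant (green)
    if growth_rising and inflation_rising:
        return "humblBOUNCE"  # top-right quadrant (light blue)
    if inflation_rising:
        return "humblBLOAT"  # bottom-right quadrant (orange)
    return "humblBUST"  # bottom-left quadrant (red)


print(compass_regime(-0.12, 0.25))  # humblBOOM
```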

Source code in src/humbldata/toolbox/fundamental/humbl_compass/view.py

def create_humbl_compass_plot(
    data: pl.DataFrame,
    template: ChartTemplate = ChartTemplate.plotly,
) -> go.Figure:
    """
    Generate a HumblCompass plot from the provided data.

    Parameters
    ----------
    data : pl.DataFrame
        The dataframe containing the data to be plotted.
    template : ChartTemplate
        The template to be used for styling the plot.

    Returns
    -------
    go.Figure
        A plotly figure object representing the HumblCompass plot.
    """
    # Sort data by date and create a color scale
    data = data.sort("date_month_start")
    full_color_scale = sequential.Reds
    custom_colorscale = sample_colorscale(full_color_scale, [0.2, 0.8])

    fig = go.Figure()

    # Calculate the range for x and y axes based on data
    x_min, x_max = data["cpi_3m_delta"].min(), data["cpi_3m_delta"].max()
    y_min, y_max = data["cli_3m_delta"].min(), data["cli_3m_delta"].max()

    # Ensure minimum range of -0.3 to 0.3 on both axes
    x_min = min(x_min if x_min is not None else 0, -0.3)
    x_max = max(x_max if x_max is not None else 0, 0.3)
    y_min = min(y_min if y_min is not None else 0, -0.3)
    y_max = max(y_max if y_max is not None else 0, 0.3)

    # Add some padding to the ranges (e.g., 10% on each side)
    x_padding = max((x_max - x_min) * 0.1, 0.05)  # Ensure minimum padding
    y_padding = max((y_max - y_min) * 0.1, 0.05)  # Ensure minimum padding

    # Calculate tick values (e.g., every 0.1)
    x_ticks = [
        round(i * 0.1, 1)
        for i in range(int(x_min * 10) - 1, int(x_max * 10) + 2)
    ]
    y_ticks = [
        round(i * 0.1, 1)
        for i in range(int(y_min * 10) - 1, int(y_max * 10) + 2)
    ]

    # Add colored quadrants from -10 to 10
    quadrants = [
        {
            "x": [0, 10],
            "y": [0, 10],
            "fillcolor": "rgba(173, 216, 230, 0.3)",
        },  # Light blue
        {
            "x": [-10, 0],
            "y": [0, 10],
            "fillcolor": "rgba(144, 238, 144, 0.3)",
        },  # Green
        {
            "x": [0, 10],
            "y": [-10, 0],
            "fillcolor": "rgba(255, 165, 0, 0.3)",
        },  # Orange
        {
            "x": [-10, 0],
            "y": [-10, 0],
            "fillcolor": "rgba(255, 99, 71, 0.3)",
        },  # Red
    ]

    for quadrant in quadrants:
        fig.add_shape(
            type="rect",
            x0=quadrant["x"][0],
            y0=quadrant["y"][0],
            x1=quadrant["x"][1],
            y1=quadrant["y"][1],
            fillcolor=quadrant["fillcolor"],
            line_color="rgba(0,0,0,0)",
            layer="below",
        )

    # Create a color array based on the date order
    color_array = list(range(len(data)))

    fig.add_trace(
        go.Scatter(
            x=data["cpi_3m_delta"],
            y=data["cli_3m_delta"],
            mode="lines+markers+text",
            name="HumblCompass Data",
            text=[
                d.strftime("%b %Y") if isinstance(d, datetime.date) else ""
                for d in data["date_month_start"]
            ],
            textposition="top center",
            textfont={"size": 10, "color": "white"},
            marker={
                "size": 10,
                "color": color_array,
                "colorscale": custom_colorscale,
                "showscale": False,
            },
            line={
                "color": "white",
                "shape": "spline",
                "smoothing": 1.3,
            },
            hovertemplate="<b>%{text}</b><br>CPI 3m Δ: %{x:.2f}<br>CLI 3m Δ: %{y:.2f}<extra></extra>",
        )
    )

    # Add axis lines with tick marks
    fig.add_shape(
        type="line",
        x0=x_min - x_padding,
        y0=0,
        x1=x_max + x_padding,
        y1=0,
        line=dict(color="white", width=1),
    )
    fig.add_shape(
        type="line",
        x0=0,
        y0=y_min - y_padding,
        x1=0,
        y1=y_max + y_padding,
        line=dict(color="white", width=1),
    )

    # Add tick marks and labels to the x-axis
    for x in x_ticks:
        if x != 0:  # Skip the center point
            fig.add_shape(
                type="line",
                x0=x,
                y0=-0.005,
                x1=x,
                y1=0.005,
                line=dict(color="white", width=1),
            )
            fig.add_annotation(
                x=x,
                y=0,
                text=f"{x:.1f}",
                showarrow=False,
                yshift=-15,
                font=dict(size=8, color="white"),
            )

    # Add tick marks and labels to the y-axis
    for y in y_ticks:
        if y != 0:  # Skip the center point
            fig.add_shape(
                type="line",
                x0=-0.005,
                y0=y,
                x1=0.005,
                y1=y,
                line=dict(color="white", width=1),
            )
            fig.add_annotation(
                x=0,
                y=y,
                text=f"{y:.1f}",
                showarrow=False,
                xshift=-15,
                font=dict(size=8, color="white"),
            )

    # Calculate the center of each visible quadrant
    x_center_pos = (x_max + x_padding + 0) / 2
    x_center_neg = (x_min - x_padding + 0) / 2
    y_center_pos = (y_max + y_padding + 0) / 2
    y_center_neg = (y_min - y_padding + 0) / 2

    # Add quadrant labels
    quadrant_labels = [
        {
            "text": "humblBOOM",
            "x": x_center_neg,
            "y": y_center_pos,
            "color": "rgba(144, 238, 144, 0.5)",  # Changed opacity to 0.5
        },
        {
            "text": "humblBOUNCE",
            "x": x_center_pos,
            "y": y_center_pos,
            "color": "rgba(173, 216, 230, 0.5)",  # Changed opacity to 0.5
        },
        {
            "text": "humblBLOAT",
            "x": x_center_pos,
            "y": y_center_neg,
            "color": "rgba(255, 165, 0, 0.5)",  # Changed opacity to 0.5
        },
        {
            "text": "humblBUST",
            "x": x_center_neg,
            "y": y_center_neg,
            "color": "rgba(255, 99, 71, 0.5)",  # Changed opacity to 0.5
        },
    ]

    for label in quadrant_labels:
        fig.add_annotation(
            x=label["x"],
            y=label["y"],
            text=label["text"],
            showarrow=False,
            font={"size": 20, "color": label["color"]},
            opacity=0.5,  # Changed opacity to 0.5
        )

    # Add custom watermark
    fig.add_annotation(
        x=0,
        y=0,
        text="humblDATA",
        showarrow=False,
        font={"size": 40, "color": "rgba(255, 255, 255, 0.1)"},
        textangle=-25,
        xanchor="center",
        yanchor="middle",
        xref="x",
        yref="y",
    )

    # Create a copy of the template without the watermark
    custom_template = pio.templates[template.value].to_plotly_json()
    if (
        "layout" in custom_template
        and "annotations" in custom_template["layout"]
    ):
        custom_template["layout"]["annotations"] = [
            ann
            for ann in custom_template["layout"]["annotations"]
            if ann.get("name") != "draft watermark"
        ]

    # Update layout
    fig.update_layout(
        title="humblCOMPASS: CLI 3m Delta vs CPI 3m Delta",
        title_font_color="white",
        xaxis_title="Inflation (CPI) 3-Month Delta",
        yaxis_title="Growth (CLI) 3-Month Delta",
        xaxis={
            "color": "white",
            "showgrid": False,
            "zeroline": False,
            "range": [x_min - x_padding, x_max + x_padding],
            "showticklabels": False,  # Hide default tick labels
            "ticks": "",  # Hide default ticks
        },
        yaxis={
            "color": "white",
            "showgrid": False,
            "zeroline": False,
            "range": [y_min - y_padding, y_max + y_padding],
            "showticklabels": False,  # Hide default tick labels
            "ticks": "",  # Hide default ticks
        },
        template=custom_template,  # Use the custom template without watermark
        hovermode="closest",
        plot_bgcolor="rgba(0,0,0,0)",
        paper_bgcolor="rgba(0,0,0,0)",
        font={"color": "white"},
        margin={"l": 50, "r": 50, "t": 50, "b": 50},
    )

    return fig
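
The axis logic in `create_humbl_compass_plot()` guarantees that each axis spans at least -0.3 to 0.3, pads each side by 10% of the span (never less than 0.05), and places a tick every 0.1, starting one step below the lower bound and ending one step above the upper bound. The standalone function below copies just those lines for a single axis so the numbers can be checked by hand; `axis_layout` is a hypothetical name, and the `None` fallback mirrors the source's handling of an empty column.

```python
def axis_layout(data_min, data_max):
    """Reproduce the per-axis range, padding, and tick logic of the plot."""
    # Force a minimum visible range of [-0.3, 0.3]; None (no data) counts as 0.
    lo = min(data_min if data_min is not None else 0, -0.3)
    hi = max(data_max if data_max is not None else 0, 0.3)
    # Pad each side by 10% of the span, but never by less than 0.05.
    padding = max((hi - lo) * 0.1, 0.05)
    # One tick every 0.1, extended one step beyond each end of the range.
    ticks = [round(i * 0.1, 1) for i in range(int(lo * 10) - 1, int(hi * 10) + 2)]
    return lo, hi, padding, ticks


lo, hi, padding, ticks = axis_layout(-0.12, 0.45)
print(lo, hi, round(padding, 3))  # -0.3 0.45 0.075
print(ticks)
```

Note that the ticks are derived from the unpadded range, exactly as in the plotting function; the padding only widens the visible axis.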

generate_plots ¤

generate_plots(data: LazyFrame, template: ChartTemplate = ChartTemplate.plotly) -> List[Chart]

Context: Toolbox || Category: Fundamental || Command: humbl_compass || Function: generate_plots().

Generate plots from the given dataframe.

Parameters:

Name	Type	Description	Default
`data`	`LazyFrame`	The LazyFrame containing the data to be plotted.	required
`template`	`ChartTemplate`	The template/theme to use for the plotly figure.	`plotly`

Returns:

Type	Description
`List[Chart]`	A list of Chart objects, each representing a plot.

Source code in src/humbldata/toolbox/fundamental/humbl_compass/view.py

def generate_plots(
    data: pl.LazyFrame,
    template: ChartTemplate = ChartTemplate.plotly,
) -> List[Chart]:
    """
    Context: Toolbox || Category: Fundamental || Command: humbl_compass || **Function: generate_plots()**.

    Generate plots from the given dataframe.

    Parameters
    ----------
    data : pl.LazyFrame
        The LazyFrame containing the data to be plotted.
    template : ChartTemplate
        The template/theme to use for the plotly figure.

    Returns
    -------
    List[Chart]
        A list of Chart objects, each representing a plot.
    """
    collected_data = data.collect()
    plot = create_humbl_compass_plot(collected_data, template)
    return [Chart(content=plot.to_json(), fig=plot)]

technical ¤

technical_controller ¤

Context: Toolbox || Category: Technical.

A controller to manage and compile all of the technical indicator models available. This will be passed as a @property to the Toolbox() class, giving access to the technical module and its functions.

Technical ¤

Module for all technical analysis.

Attributes:

Name	Type	Description
`context_params`	`ToolboxQueryParams`	The standard query parameters for toolbox data.

Methods:

Name	Description
`mandelbrot_channel`	Calculate the rescaled range statistics.

Source code in src/humbldata/toolbox/technical/technical_controller.py

class Technical:
    """
    Module for all technical analysis.

    Attributes
    ----------
    context_params : ToolboxQueryParams
        The standard query parameters for toolbox data.

    Methods
    -------
    mandelbrot_channel(**kwargs)
        Calculate the rescaled range statistics.

    """

    def __init__(self, context_params: ToolboxQueryParams):
        self.context_params = context_params

    def mandelbrot_channel(self, **kwargs: MandelbrotChannelQueryParams):
        """
        Calculate the Mandelbrot Channel.

        Parameters
        ----------
        window : str, optional
            The width of the window used for splitting the data into sections for
            detrending. Defaults to "1mo".
        rv_adjustment : bool, optional
            Whether to adjust the calculation for realized volatility. If True, the
            data is filtered to only include observations in the same volatility bucket
            that the stock is currently in. Defaults to True.
        rv_method : str, optional
            The method used to calculate the realized volatility. Only needs to
            be set when rv_adjustment is True. Defaults to "std".
        rs_method : Literal["RS", "RS_min", "RS_max", "RS_mean"], optional
            The method used for the Range/STD (R/S) calculation: "RS_min",
            "RS_max", or "RS_mean" take the min, max, or mean of all R/S ranges
            per window, while "RS" uses only the most recent R/S window.
            Defaults to "RS".
        rv_grouped_mean : bool, optional
            Whether to calculate the mean value of realized volatility over
            multiple window lengths. Defaults to False.
        live_price : bool, optional
            Whether to calculate the ranges using the current live price, or the
            most recent 'close' observation. Defaults to False.
        historical : bool, optional
            Whether to calculate the Historical Mandelbrot Channel (over-time), and
            return a time-series of channels from the start to the end date. If
            False, the Mandelbrot Channel calculation is done aggregating all of the
            data into one observation. If True, then it will enable daily
            observations over-time. Defaults to False.
        chart : bool, optional
            Whether to return a chart object. Defaults to False.
        template : str, optional
            The template/theme to use for the plotly figure. Defaults to "humbl_dark".

        Returns
        -------
        HumblObject
            An object containing the Mandelbrot Channel data and metadata.
        """
        from humbldata.core.standard_models.toolbox.technical.mandelbrot_channel import (
            MandelbrotChannelFetcher,
        )

        # Instantiate the Fetcher with the query parameters
        fetcher = MandelbrotChannelFetcher(
            context_params=self.context_params, command_params=kwargs
        )

        # Use the fetcher to get the data
        return fetcher.fetch_data()

mandelbrot_channel ¤

mandelbrot_channel(**kwargs: MandelbrotChannelQueryParams)

Calculate the Mandelbrot Channel.

Parameters:

Name	Type	Description	Default
`window`	`str`	The width of the window used to split the data into sections for detrending. Defaults to "1mo".	`"1mo"`
`rv_adjustment`	`bool`	Whether to adjust the calculation for realized volatility. If True, the data is filtered to include only observations in the same volatility bucket that the stock is currently in. Defaults to True.	`True`
`rv_method`	`str`	The method used to calculate the realized volatility. Only needs to be set when rv_adjustment is True. Defaults to "std".	`"std"`
`rs_method`	`Literal[RS, RS_min, RS_max, RS_mean]`	The method used for the Range/STD (R/S) calculation: "RS_min", "RS_max", or "RS_mean" take the min, max, or mean of all R/S ranges per window, while "RS" uses only the most recent R/S window. Defaults to "RS".	`"RS"`
`rv_grouped_mean`	`bool`	Whether to calculate the mean value of realized volatility over multiple window lengths. Defaults to False.	`False`
`live_price`	`bool`	Whether to calculate the ranges using the current live price or the most recent 'close' observation. Defaults to False.	`False`
`historical`	`bool`	Whether to calculate the Historical Mandelbrot Channel over time and return a time series of channels from the start to the end date. If False, all of the data is aggregated into a single observation; if True, daily observations are produced over time. Defaults to False.	`False`
`chart`	`bool`	Whether to return a chart object. Defaults to False.	`False`
`template`	`str`	The template/theme to use for the plotly figure. Defaults to "humbl_dark".	`"humbl_dark"`

Returns:

Type	Description
`HumblObject`	An object containing the Mandelbrot Channel data and metadata.
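
The `rv_adjustment` option is described as keeping only the observations whose volatility bucket matches the bucket the stock is in now. Below is a minimal sketch of that filter on plain Python rows, assuming each row already carries a `vol_bucket` label (such as the one `vol_buckets()` adds) and that "now" means the row with the latest date; `Row` and `filter_current_bucket` are hypothetical names, not the library's implementation.

```python
from datetime import date
from typing import NamedTuple


class Row(NamedTuple):
    obs_date: date
    realized_volatility: float
    vol_bucket: str  # assumed labels: "low", "mid", or "high"


def filter_current_bucket(rows: list[Row]) -> list[Row]:
    """Keep only rows in the same volatility bucket as the most recent row."""
    if not rows:
        return []
    # The "current" bucket is taken from the latest observation by date.
    current_bucket = max(rows, key=lambda r: r.obs_date).vol_bucket
    return [r for r in rows if r.vol_bucket == current_bucket]


rows = [
    Row(date(2024, 1, 31), 0.12, "low"),
    Row(date(2024, 2, 29), 0.31, "high"),
    Row(date(2024, 3, 31), 0.14, "low"),
]
print([r.obs_date.isoformat() for r in filter_current_bucket(rows)])  # ['2024-01-31', '2024-03-31']
```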

Source code in src/humbldata/toolbox/technical/technical_controller.py

def mandelbrot_channel(self, **kwargs: MandelbrotChannelQueryParams):
    """
    Calculate the Mandelbrot Channel.

    Parameters
    ----------
    window : str, optional
        The width of the window used for splitting the data into sections for
        detrending. Defaults to "1mo".
    rv_adjustment : bool, optional
        Whether to adjust the calculation for realized volatility. If True, the
        data is filtered to only include observations in the same volatility bucket
        that the stock is currently in. Defaults to True.
    rv_method : str, optional
        The method used to calculate the realized volatility. Only needs to
        be set when rv_adjustment is True. Defaults to "std".
    rs_method : Literal["RS", "RS_min", "RS_max", "RS_mean"], optional
        The method used for the Range/STD (R/S) calculation: "RS_min",
        "RS_max", or "RS_mean" take the min, max, or mean of all R/S ranges
        per window, while "RS" uses only the most recent R/S window.
        Defaults to "RS".
    rv_grouped_mean : bool, optional
        Whether to calculate the mean value of realized volatility over
        multiple window lengths. Defaults to False.
    live_price : bool, optional
        Whether to calculate the ranges using the current live price, or the
        most recent 'close' observation. Defaults to False.
    historical : bool, optional
        Whether to calculate the Historical Mandelbrot Channel (over-time), and
        return a time-series of channels from the start to the end date. If
        False, the Mandelbrot Channel calculation is done aggregating all of the
        data into one observation. If True, then it will enable daily
        observations over-time. Defaults to False.
    chart : bool, optional
        Whether to return a chart object. Defaults to False.
    template : str, optional
        The template/theme to use for the plotly figure. Defaults to "humbl_dark".

    Returns
    -------
    HumblObject
        An object containing the Mandelbrot Channel data and metadata.
    """
    from humbldata.core.standard_models.toolbox.technical.mandelbrot_channel import (
        MandelbrotChannelFetcher,
    )

    # Instantiate the Fetcher with the query parameters
    fetcher = MandelbrotChannelFetcher(
        context_params=self.context_params, command_params=kwargs
    )

    # Use the fetcher to get the data
    return fetcher.fetch_data()
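
The `rs_method` option combines rescaled-range (R/S) statistics computed per window. The sketch below uses the textbook R/S definition (the range of the cumulative mean-adjusted returns divided by their standard deviation) on plain lists of returns. This page does not show whether humbldata uses the population or sample standard deviation, or exactly this range definition, so `rescaled_range` and `aggregate_rs` are hypothetical illustrations rather than the library's code.

```python
from itertools import accumulate
from statistics import mean, pstdev


def rescaled_range(returns: list[float]) -> float:
    """Textbook R/S for one window: range of cumulative deviations / std."""
    mu = mean(returns)
    # Cumulative sum of deviations from the window mean.
    cumulative = list(accumulate(r - mu for r in returns))
    spread = max(cumulative) - min(cumulative)
    # Assumption: population standard deviation; a constant window (std 0)
    # is not handled here and would raise ZeroDivisionError.
    sigma = pstdev(returns)
    return spread / sigma


def aggregate_rs(windows: list[list[float]], rs_method: str = "RS") -> float:
    """Combine per-window R/S values the way the rs_method options describe."""
    values = [rescaled_range(w) for w in windows]
    if rs_method == "RS":
        return values[-1]  # most recent window only
    if rs_method == "RS_min":
        return min(values)
    if rs_method == "RS_max":
        return max(values)
    if rs_method == "RS_mean":
        return mean(values)
    raise ValueError(f"unknown rs_method: {rs_method!r}")


windows = [[0.01, -0.02, 0.015, 0.005], [0.002, 0.004, -0.001, 0.003]]
print(aggregate_rs(windows, "RS_mean"))
```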

mandelbrot_channel ¤

helpers ¤

Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || Sub-Category: Helpers.

These Toolbox() helpers are used in various calculations in the toolbox context. Most of the helpers will be mathematical transformations of data. These functions should be DUMB functions.

add_window_index ¤

add_window_index(data: LazyFrame | DataFrame, window: str) -> LazyFrame | DataFrame

Context: Toolbox || Category: Mandelbrot Channel || Sub-Category: Helpers || Command: add_window_index.

Add a column to the dataframe indicating the window grouping for each row in a time series.

Parameters:

Name	Type	Description	Default
`data`	`LazyFrame \| DataFrame`	The input data frame or lazy frame to which the window index will be added.	required
`window`	`str`	The window size as a string, used to determine the grouping of rows into windows.	required

Returns:

Type	Description
`LazyFrame \| DataFrame`	The original data frame or lazy frame with an additional column named "window_index" indicating the window grouping for each row.

Notes

This function is essential for calculating the Mandelbrot Channel, where the dataset is split into numerous 'windows', and statistics are calculated for each window.
The function adds a dummy symbol column if the data contains only one symbol, to avoid errors in the group_by_dynamic() function.
It is utilized within the log_mean() and calc_mandelbrot_channel() functions for window-based calculations.

Examples:

>>> from datetime import date
>>> data = pl.DataFrame(
...     {
...         "date": [date(2021, 1, 15), date(2021, 2, 15), date(2021, 3, 15)],
...         "symbol": ["AAPL", "AAPL", "AAPL"],
...         "value": [1, 2, 3],
...     }
... )
>>> add_window_index(data, window="1mo")["window_index"].to_list()
[2, 1, 0]

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/helpers.py

def add_window_index(
    data: pl.LazyFrame | pl.DataFrame, window: str
) -> pl.LazyFrame | pl.DataFrame:
    """
    Context: Toolbox || Category: Mandelbrot Channel || Sub-Category: Helpers || Command: **add_window_index**.

    Add a column to the dataframe indicating the window grouping for each row in
    a time series.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The input data frame or lazy frame to which the window index will be
        added.
    window : str
        The window size as a string, used to determine the grouping of rows into
        windows.

    Returns
    -------
    pl.LazyFrame | pl.DataFrame
        The original data frame or lazy frame with an additional column named
        "window_index" indicating the window grouping for each row.

    Notes
    -----
    - This function is essential for calculating the Mandelbrot Channel, where
      the dataset is split into numerous 'windows', and statistics are
      calculated for each window.
    - The function adds a dummy `symbol` column if the data contains only one
      symbol, to avoid errors in the `group_by_dynamic()` function.
    - It is utilized within the `log_mean()` and `calc_mandelbrot_channel()`
      functions for window-based calculations.

    Examples
    --------
    >>> from datetime import date
    >>> data = pl.DataFrame(
    ...     {
    ...         "date": [date(2021, 1, 15), date(2021, 2, 15), date(2021, 3, 15)],
    ...         "symbol": ["AAPL", "AAPL", "AAPL"],
    ...         "value": [1, 2, 3],
    ...     }
    ... )
    >>> add_window_index(data, window="1mo")["window_index"].to_list()
    [2, 1, 0]
    """

    def _create_monthly_window_index(col: str, k: int = 1):
        year_diff = pl.col(col).last().dt.year() - pl.col(col).dt.year()
        month_diff = pl.col(col).last().dt.month() - pl.col(col).dt.month()
        day_indicator = pl.col(col).dt.day() > pl.col(col).last().dt.day()
        return (12 * year_diff + month_diff - day_indicator) // k

    # Clean the window into standardized strings (i.e "1month"/"1 month" = "1mo")
    window = _window_format(window, _return_timedelta=False)  # returns `str`

    if "w" in window or "d" in window:
        msg = "The window cannot include 'd' or 'w', the window needs to be larger than 1 month!"
        raise HumblDataError(msg)

    window_monthly = _window_format_monthly(window)

    data = data.with_columns(
        _create_monthly_window_index(col="date", k=window_monthly)
        .alias("window_index")
        .over("symbol")
    )

    return data
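
`_create_monthly_window_index()` counts whole months back from each symbol's most recent date: 12 times the year difference plus the month difference, minus one when the row's day of month is later than the last date's day, all integer-divided by the window length `k`. The sketch below applies the same arithmetic in plain Python to the `datetime.date` values of a single symbol. `monthly_window_index` is a hypothetical name, and `k` stands in for whatever `_window_format_monthly()` returns, assumed here to be the window length in months (for example 1 for "1mo").

```python
from datetime import date


def monthly_window_index(dates: list[date], k: int = 1) -> list[int]:
    """Window index per date, counted back from the last date (index 0)."""
    # Mirrors pl.col(col).last(); assumes dates are sorted ascending.
    last = dates[-1]
    out = []
    for d in dates:
        year_diff = last.year - d.year
        month_diff = last.month - d.month
        # Subtract one month when this day of month falls after the last date's.
        day_indicator = int(d.day > last.day)
        out.append((12 * year_diff + month_diff - day_indicator) // k)
    return out


print(monthly_window_index([date(2021, 1, 20), date(2021, 2, 15), date(2021, 3, 10)]))  # [1, 0, 0]
```

In that example, 15 February falls in the same window as 10 March because fewer than one full month separates them, which is what the day-of-month correction is for.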

vol_buckets ¤

vol_buckets(data: DataFrame | LazyFrame, lo_quantile: float = 0.4, hi_quantile: float = 0.8, _column_name_volatility: str = 'realized_volatility', *, _boundary_group_down: bool = False) -> LazyFrame

Context: Toolbox || Category: Mandelbrot Channel || Sub-Category: Helpers || Command: vol_buckets.

Splits data observations into three volatility buckets: low, mid, and high. This is done separately for each symbol present in the data.

Parameters:

Name	Type	Description	Default
`data`	`LazyFrame \| DataFrame`	The input dataframe or lazy frame.	required
`lo_quantile`	`float`	The lower quantile for bucketing. Default is 0.4.	`0.4`
`hi_quantile`	`float`	The higher quantile for bucketing. Default is 0.8.	`0.8`
`_column_name_volatility`	`str`	The name of the column to apply volatility bucketing. Default is "realized_volatility".	`'realized_volatility'`
`_boundary_group_down`	`bool`	If True, group boundary values down to the lower bucket, using `vol_buckets_alt()`. If False, group boundary values up to the higher bucket, using the Polars `.qcut()` method. Default is False.	`False`

Returns:

Type	Description
`LazyFrame`	The `data` with an additional column: `vol_bucket`

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/helpers.py

def vol_buckets(
    data: pl.DataFrame | pl.LazyFrame,
    lo_quantile: float = 0.4,
    hi_quantile: float = 0.8,
    _column_name_volatility: str = "realized_volatility",
    *,
    _boundary_group_down: bool = False,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Mandelbrot Channel || Sub-Category: Helpers || Command: **vol_buckets**.

    Splits data observations into three volatility buckets: low, mid, and
    high. This is done separately for each `symbol` present in the data.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The input dataframe or lazy frame.
    lo_quantile : float
        The lower quantile for bucketing. Default is 0.4.
    hi_quantile : float
        The higher quantile for bucketing. Default is 0.8.
    _column_name_volatility : str
        The name of the column to apply volatility bucketing. Default is
        "realized_volatility".
    _boundary_group_down : bool, optional
        If True, group boundary values down to the lower bucket, using
        `vol_buckets_alt()`. If False, group boundary values up to the
        higher bucket, using the Polars `.qcut()` method.
        Default is False.

    Returns
    -------
    pl.LazyFrame
        The `data` with an additional column: `vol_bucket`
    """
    _check_required_columns(data, _column_name_volatility, "symbol")

    if not _boundary_group_down:
        # Grouping Boundary Values in Higher Bucket
        out = data.lazy().with_columns(
            pl.col(_column_name_volatility)
            .qcut(
                [lo_quantile, hi_quantile],
                labels=["low", "mid", "high"],
                left_closed=False,
                allow_duplicates=True,
            )
            .over("symbol")
            .alias("vol_bucket")
            .cast(pl.Utf8)
        )
    else:
        out = vol_buckets_alt(
            data, lo_quantile, hi_quantile, _column_name_volatility
        )

    return out

vol_buckets_alt ¤

vol_buckets_alt(data: DataFrame | LazyFrame, lo_quantile: float = 0.4, hi_quantile: float = 0.8, _column_name_volatility: str = 'realized_volatility') -> LazyFrame

Context: Toolbox || Category: Mandelbrot Channel || Sub-Category: Helpers || Command: vol_buckets_alt.

This is an alternative implementation of vol_buckets() that uses expressions instead of .qcut(). The biggest difference is how the function groups values on the boundaries of quantiles: this function groups boundary values down to the lower bucket. Like vol_buckets(), it splits data observations into three volatility buckets (low, mid, and high) for each symbol present in the data.

Parameters:

Name	Type	Description	Default
`data`	`LazyFrame \| DataFrame`	The input dataframe or lazy frame.	required
`lo_quantile`	`float`	The lower quantile for bucketing. Default is 0.4.	`0.4`
`hi_quantile`	`float`	The higher quantile for bucketing. Default is 0.8.	`0.8`
`_column_name_volatility`	`str`	The name of the column to apply volatility bucketing. Default is "realized_volatility".	`'realized_volatility'`

Returns:

Type	Description
`LazyFrame`	The `data` with an additional column: `vol_bucket`

Notes

The biggest difference is how the function groups values on the boundaries of quantiles. This function groups boundary values down to the lower bucket. So, if there is a value that lies on the mid/low border, this function will group it with low, whereas vol_buckets() will group it with mid.

This function is also slightly less performant.
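The two boundary conventions can be illustrated without Polars. In the sketch below, `bucket_down` mirrors the `<=` chain in the `vol_buckets_alt()` source, while `bucket_up` is a hypothetical helper standing in for the behaviour described for `vol_buckets()`; it is not how Polars implements `.qcut()`. Both take precomputed quantile thresholds `lo` and `hi` instead of computing them per symbol.

```python
def bucket_down(value: float, lo: float, hi: float) -> str:
    """Boundary values fall into the lower bucket (same `<=` chain as vol_buckets_alt)."""
    if value <= lo:
        return "low"
    if value <= hi:
        return "mid"
    return "high"


def bucket_up(value: float, lo: float, hi: float) -> str:
    """Hypothetical helper: boundary values fall into the higher bucket."""
    if value < lo:
        return "low"
    if value < hi:
        return "mid"
    return "high"


# A value sitting exactly on the low/mid threshold lands in different buckets
lo, hi = 0.20, 0.50
print(bucket_down(0.20, lo, hi), bucket_up(0.20, lo, hi))  # low mid
```

Values strictly between thresholds get the same label under both conventions; only exact ties with a threshold differ.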

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/helpers.py

def vol_buckets_alt(
    data: pl.DataFrame | pl.LazyFrame,
    lo_quantile: float = 0.4,
    hi_quantile: float = 0.8,
    _column_name_volatility: str = "realized_volatility",
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **vol_buckets_alt**.

    This is an alternative implementation of `vol_buckets()` that uses
    expressions instead of `.qcut()`.
    The biggest difference is how the function groups values on the boundaries
    of quantiles: this function groups boundary values down to the lower
    bucket. It splits data observations into 3 volatility buckets (low, mid
    and high) for each `symbol` present in the data.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The input dataframe or lazy frame.
    lo_quantile : float
        The lower quantile for bucketing. Default is 0.4.
    hi_quantile : float
        The higher quantile for bucketing. Default is 0.8.
    _column_name_volatility : str
        The name of the column to apply volatility bucketing. Default is "realized_volatility".

    Returns
    -------
    pl.LazyFrame
        The `data` with an additional column: `vol_bucket`

    Notes
    -----
    The biggest difference is how the function groups values on the boundaries
    of quantiles. This function __groups boundary values down__ to the lower bucket.
    So, if there is a value that lies on the mid/low border, this function will
    group it with `low`, whereas `vol_buckets()` will group it with `mid`.

    This function is also slightly less performant.
    """
    # Calculate low and high quantiles for each symbol
    low_vol = pl.col(_column_name_volatility).quantile(lo_quantile)
    high_vol = pl.col(_column_name_volatility).quantile(hi_quantile)

    # Determine the volatility bucket for each row using expressions
    vol_bucket = (
        pl.when(pl.col(_column_name_volatility) <= low_vol)
        .then(pl.lit("low"))
        .when(pl.col(_column_name_volatility) <= high_vol)
        .then(pl.lit("mid"))
        .otherwise(pl.lit("high"))
        .alias("vol_bucket")
    )

    # Add the volatility bucket column to the data
    out = data.lazy().with_columns(vol_bucket.over("symbol"))

    return out

vol_filter ¤

vol_filter(data: DataFrame | LazyFrame) -> LazyFrame

Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: vol_filter.

Used when rv_adjustment is True in calc_mandelbrot_channel(): filters the data to only include rows that are in the same vol_bucket as the last row for each symbol (the most recent observation when the data is sorted by date).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame \| LazyFrame`	The input dataframe or lazy frame. This should be the output of `vol_buckets()` function in `calc_mandelbrot_channel()`.	required

Returns:

Type	Description
`LazyFrame`	The data with only the observations that share the volatility bucket of each symbol's most recent observation.
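The filtering rule can be sketched with plain Python lists of rows. This is a minimal illustration, not the library's code: `vol_filter_sketch` is a hypothetical name, and it assumes rows are already sorted by date within each symbol, so the last row seen for a symbol is its most recent observation (the Polars version relies on the same ordering through `.last().over("symbol")`).

```python
def vol_filter_sketch(rows: list[dict]) -> list[dict]:
    """Keep rows whose vol_bucket matches the last row's vol_bucket for the same symbol.

    Assumes `rows` are already sorted by date within each symbol.
    """
    last_bucket: dict[str, str] = {}
    for row in rows:
        last_bucket[row["symbol"]] = row["vol_bucket"]  # later rows overwrite earlier ones
    return [row for row in rows if row["vol_bucket"] == last_bucket[row["symbol"]]]


rows = [
    {"symbol": "AAA", "date": 1, "vol_bucket": "low"},
    {"symbol": "AAA", "date": 2, "vol_bucket": "high"},
    {"symbol": "AAA", "date": 3, "vol_bucket": "low"},
    {"symbol": "BBB", "date": 1, "vol_bucket": "mid"},
    {"symbol": "BBB", "date": 2, "vol_bucket": "high"},
]
print([(r["symbol"], r["date"]) for r in vol_filter_sketch(rows)])
# [('AAA', 1), ('AAA', 3), ('BBB', 2)]
```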

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/helpers.py

def vol_filter(
    data: pl.DataFrame | pl.LazyFrame,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **vol_filter**.

    Used when `rv_adjustment` is True in `calc_mandelbrot_channel()`: filters
    the data to only include rows that are in the same `vol_bucket` as the
    last row for each symbol (the most recent observation when the data is
    sorted by date).

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input dataframe or lazy frame. This should be the output of
        `vol_buckets()` function in `calc_mandelbrot_channel()`.

    Returns
    -------
    pl.LazyFrame
        The data with only the observations that share the volatility bucket
        of each symbol's most recent observation.
    """
    _check_required_columns(data, "vol_bucket", "symbol")

    data = data.lazy().with_columns(
        pl.col("vol_bucket").last().over("symbol").alias("last_vol_bucket")
    )

    out = data.filter(
        (pl.col("vol_bucket") == pl.col("last_vol_bucket")).over("symbol")
    ).drop("last_vol_bucket")

    return out

price_range ¤

price_range(data: LazyFrame | DataFrame, recent_price_data: DataFrame | LazyFrame | None = None, rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', _detrended_returns: str = 'detrended_log_returns', _column_name_cum_sum_max: str = 'cum_sum_max', _column_name_cum_sum_min: str = 'cum_sum_min', *, _rv_adjustment: bool = False, _sort: bool = True, **kwargs) -> LazyFrame

Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: price_range.

Calculate the price range for a given dataset using the Mandelbrot method.

This function computes the price range based on the recent price data, cumulative sum max and min, and RS method specified. It supports adjustments for realized volatility and sorting of the data based on symbols and dates.

Parameters:

Name	Type	Description	Default
`data`	`LazyFrame \| DataFrame`	The dataset containing the financial data.	required
`recent_price_data`	`DataFrame \| LazyFrame \| None`	The dataset containing the most recent price data. If None, the most recent prices are extracted from `data`.	`None`
`rs_method`	`Literal['RS', 'RS_mean', 'RS_max', 'RS_min']`	The RS value to use. Must be one of 'RS', 'RS_mean', 'RS_max', 'RS_min'. RS is the column that is the Range/STD of the detrended returns.	`"RS"`
`_detrended_returns`	`str`	The column name for detrended returns in `data`	`"detrended_log_returns"`
`_column_name_cum_sum_max`	`str`	The column name for cumulative sum max in `data`	`"cum_sum_max"`
`_column_name_cum_sum_min`	`str`	The column name for cumulative sum min in `data`	`"cum_sum_min"`
`_rv_adjustment`	`bool`	If True, calculates the `std()` over all observations (since they have already been filtered by volatility bucket). If False, calculates the `std()` for the most recent `window_index` and uses that to adjust the price range.	`False`
`_sort`	`bool`	If True, sorts the data based on symbols and dates.	`True`
`**kwargs`		Arbitrary keyword arguments.	`{}`

Returns:

Type	Description
`LazyFrame`	The dataset with calculated price range, including columns for top and bottom prices.

Raises:

Type	Description
`HumblDataError`	If the RS method specified is not supported.

Examples:

>>> price_range_data = price_range(data, recent_price_data=None, rs_method="RS")
>>> print(price_range_data.columns)
['symbol', 'bottom_price', 'recent_price', 'top_price']

Notes

The rs_method choice affects the Mandelbrot channel that is produced. Selecting RS uses the most recent RS value to calculate the price range, whereas selecting RS_mean, RS_max, or RS_min uses the mean, max, or min of the RS values, respectively.
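For a single symbol, the aggregation and the range formula from the source (`pl.col(rs_method) * std * recent_price`) can be sketched with plain lists. This is an illustration under stated assumptions: `rs_statistic` and `price_range_width` are hypothetical names, the sample standard deviation (ddof=1) matches the default of Polars' `.std()`, the sketch raises `ValueError` where the library raises `HumblDataError`, and it does not reproduce `_price_range_engine()`, which turns the range into top and bottom prices.

```python
import statistics


def rs_statistic(rs_values: list[float], rs_method: str = "RS") -> float:
    """Collapse a symbol's RS series to one number, as the rs_method options describe."""
    if rs_method == "RS":
        return rs_values[-1]  # most recent RS value
    if rs_method == "RS_mean":
        return statistics.fmean(rs_values)
    if rs_method == "RS_max":
        return max(rs_values)
    if rs_method == "RS_min":
        return min(rs_values)
    raise ValueError("rs_method must be one of 'RS', 'RS_mean', 'RS_max', 'RS_min'")


def price_range_width(
    rs_values: list[float],
    detrended_returns: list[float],
    recent_price: float,
    rs_method: str = "RS",
) -> float:
    """RS statistic * sample std of detrended returns * recent price."""
    rs = rs_statistic(rs_values, rs_method)
    return rs * statistics.stdev(detrended_returns) * recent_price


rs = [1.0, 2.0, 3.0]
print([rs_statistic(rs, m) for m in ("RS", "RS_mean", "RS_max", "RS_min")])  # [3.0, 2.0, 3.0, 1.0]
```

With a positive standard deviation and price, the range scales linearly with the chosen RS statistic.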

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/helpers.py

def price_range(
    data: pl.LazyFrame | pl.DataFrame,
    recent_price_data: pl.DataFrame | pl.LazyFrame | None = None,
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    _detrended_returns: str = "detrended_log_returns",  # Parameterized detrended_returns column
    _column_name_cum_sum_max: str = "cum_sum_max",
    _column_name_cum_sum_min: str = "cum_sum_min",
    *,
    _rv_adjustment: bool = False,
    _sort: bool = True,
    **kwargs,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **price_range**.

    Calculate the price range for a given dataset using the Mandelbrot method.

    This function computes the price range based on the recent price data,
    cumulative sum max and min, and RS method specified. It supports adjustments
    for realized volatility and sorting of the data based on symbols and dates.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The dataset containing the financial data.
    recent_price_data : pl.DataFrame | pl.LazyFrame | None
        The dataset containing the most recent price data. If None, the most recent prices are extracted from `data`.
    rs_method : Literal["RS", "RS_mean", "RS_max", "RS_min"], default "RS"
        The RS value to use. Must be one of 'RS', 'RS_mean', 'RS_max', 'RS_min'.
        RS is the column that is the Range/STD of the detrended returns.
    _detrended_returns : str, default "detrended_log_returns"
        The column name for detrended returns in `data`
    _column_name_cum_sum_max : str, default "cum_sum_max"
        The column name for cumulative sum max in `data`
    _column_name_cum_sum_min : str, default "cum_sum_min"
        The column name for cumulative sum min in `data`
    _rv_adjustment : bool, default False
        If True, calculates the `std()` over all observations (since they have
        already been filtered by volatility bucket). If False, calculates
        the `std()` for the most recent `window_index`
        and uses that to adjust the price range.
    _sort : bool, default True
        If True, sorts the data based on symbols and dates.
    **kwargs
        Arbitrary keyword arguments.

    Returns
    -------
    pl.LazyFrame
        The dataset with calculated price range, including columns for top and
        bottom prices.

    Raises
    ------
    HumblDataError
        If the RS method specified is not supported.

    Examples
    --------
    >>> price_range_data = price_range(data, recent_price_data=None, rs_method="RS")
    >>> print(price_range_data.columns)
    ['symbol', 'bottom_price', 'recent_price', 'top_price']

    Notes
    -----
    The `rs_method` choice affects the Mandelbrot channel that is produced.
    Selecting RS uses the most recent RS value to calculate the price range,
    whereas selecting RS_mean, RS_max, or RS_min uses the mean, max, or min of
    the RS values, respectively.
    """
    # Check if RS_method is one of the allowed values
    if rs_method not in RS_METHODS:
        msg = "RS_method must be one of 'RS', 'RS_mean', 'RS_max', 'RS_min'"
        raise HumblDataError(msg)

    if isinstance(data, pl.DataFrame):
        data = data.lazy()

    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort:
        # LazyFrame.sort() returns a new frame, so reassign it
        data = data.sort(sort_cols)

    # Define Polars Expressions ================================================
    last_cum_sum_max = (
        pl.col(_column_name_cum_sum_max).last().alias("last_cum_sum_max")
    )
    last_cum_sum_min = (
        pl.col(_column_name_cum_sum_min).last().alias("last_cum_sum_min")
    )
    # Define a conditional expression for std_detrended_returns based on _rv_adjustment
    std_detrended_returns_expr = (
        pl.col(_detrended_returns).std().alias(f"std_{_detrended_returns}")
        if _rv_adjustment
        else pl.col(_detrended_returns)
        .filter(pl.col("window_index") == pl.col("window_index").min())
        .std()
        .alias(f"std_{_detrended_returns}")
    )
    # If rv_adjustment isn't used, only the most recent window is used
    # when calculating the std for the price_range
    date_expr = pl.col("date").max()
    # ===========================================================================

    if rs_method == "RS":
        rs_expr = pl.col("RS").last().alias("RS")
    elif rs_method == "RS_mean":
        rs_expr = pl.col("RS").mean().alias("RS_mean")
    elif rs_method == "RS_max":
        rs_expr = pl.col("RS").max().alias("RS_max")
    elif rs_method == "RS_min":
        rs_expr = pl.col("RS").min().alias("RS_min")

    if recent_price_data is None:
        # If no recent_price_data is passed, pull the most recent prices from the data
        recent_price_expr = pl.col("close").last().alias("recent_price")
        # Perform a single group_by operation to calculate both STD of detrended returns and RS statistics
        price_range_data = (
            data.group_by("symbol")
            .agg(
                [
                    date_expr,
                    # Conditional STD calculation based on _rv_adjustment
                    std_detrended_returns_expr,
                    # Recent Price Data
                    recent_price_expr,
                    # cum_sum_max/min last
                    last_cum_sum_max,
                    last_cum_sum_min,
                    # RS statistics
                    rs_expr,
                ]
            )
            # Price range = RS statistic * std of detrended returns * recent price
            .with_columns(
                (
                    pl.col(rs_method)
                    * pl.col(f"std_{_detrended_returns}")
                    * pl.col("recent_price")
                ).alias("price_range")
            )
            .sort("symbol")
        )
    else:
        price_range_data = (
            data.group_by("symbol")
            .agg(
                [
                    date_expr,
                    # Conditional STD calculation based on _rv_adjustment
                    std_detrended_returns_expr,
                    # cum_sum_max/min last
                    last_cum_sum_max,
                    last_cum_sum_min,
                    # RS statistics
                    rs_expr,
                ]
            )
            # Join with recent_price_data on symbol
            .join(recent_price_data.lazy(), on="symbol")
            .with_columns(
                (
                    pl.col(rs_method)
                    * pl.col(f"std_{_detrended_returns}")
                    * pl.col("recent_price")
                ).alias("price_range")
            )
            .sort("symbol")
        )
    # Relative Position Modifier
    out = _price_range_engine(price_range_data)

    return out

model ¤

Context: Toolbox || Category: Technical || Command: calc_mandelbrot_channel.

A command to generate a Mandelbrot Channel for any time series.

calc_mandelbrot_channel ¤

calc_mandelbrot_channel(data: DataFrame | LazyFrame, window: str = '1m', rv_method: str = 'std', rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, rv_adjustment: bool = True, rv_grouped_mean: bool = True, live_price: bool = True, **kwargs) -> LazyFrame

Context: Toolbox || Category: Technical || Command: calc_mandelbrot_channel.

This command calculates the Mandelbrot Channel for a given time series, utilizing various parameters to adjust the calculation. The Mandelbrot Channel provides insights into the volatility and price range of a stock over a specified window.
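The pipeline comments in the source name the classical rescaled-range steps: log returns, a window mean, de-trending, a cumulative deviate series, its range, and a standard deviation, with RS = range / std. A minimal stdlib sketch for one window follows. It is the textbook R/S statistic, not the library's code: it takes S from the detrended returns, whereas the source computes `cum_sum_std` through `std(data6, _column_name="cum_sum")`, so the library's S may be defined differently. `rescaled_range` is a hypothetical name.

```python
import itertools
import math
import statistics


def rescaled_range(prices: list[float]) -> float:
    """Classical R/S statistic for one window of prices."""
    # Log returns between consecutive prices
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    # De-trend by subtracting the window mean
    mean = statistics.fmean(log_returns)
    detrended = [r - mean for r in log_returns]
    # Cumulative deviate series and its range (R)
    cum_dev = list(itertools.accumulate(detrended))
    r = max(cum_dev) - min(cum_dev)
    # Sample standard deviation (S) of the detrended returns
    s = statistics.stdev(detrended)
    return r / s


print(rescaled_range([1.0, math.e, 1.0]))  # ~0.7071 (1 / sqrt(2))
```

Because log returns ignore the price level, scaling every price by a constant leaves the statistic unchanged.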

Parameters:

Name	Type	Description	Default
`data`	`DataFrame \| LazyFrame`	The time series data for which to calculate the Mandelbrot Channel. There needs to be a `close` and `date` column.	required
`window`	`str`	The window size for the calculation, specified as a string. This determines the period over which the channel is calculated.	`'1m'`
`rv_adjustment`	`bool`	Adjusts the calculation for realized volatility. If True, filters the data to include only observations within the current volatility bucket of the stock.	`True`
`rv_grouped_mean`	`bool`	Determines whether to use the grouped mean in the realized volatility calculation.	`True`
`rv_method`	`str`	Specifies the method for calculating realized volatility, applicable only if `rv_adjustment` is True.	`'std'`
`rs_method`	`Literal['RS', 'RS_mean', 'RS_max', 'RS_min']`	Defines the method for calculating the range over standard deviation, affecting the width of the Mandelbrot Channel. Options include RS, RS_mean, RS_min, and RS_max.	`'RS'`
`live_price`	`bool`	Indicates whether to incorporate live price data into the calculation, which may extend the calculation time by 1-3 seconds.	`True`
`**kwargs`		Additional keyword arguments to pass to the function, if you want to change the behavior or pass parameters to internal functions.	`{}`

Returns:

Type	Description
`LazyFrame`	A LazyFrame containing the calculated Mandelbrot Channel data for the specified time series.

Notes

The function returns a pl.LazyFrame; remember to call .collect() on the result to obtain a DataFrame. This lazy evaluation strategy postpones the calculation until it is explicitly requested.

Example

To calculate the Mandelbrot Channel for a yearly window with adjustments for realized volatility using the 'yz' method, and incorporating live price data:

mandelbrot_channel = calc_mandelbrot_channel(
    data,
    window="1y",
    rv_adjustment=True,
    rv_method="yz",
    rv_grouped_mean=False,
    rs_method="RS",
    live_price=True
).collect()

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/model.py

def calc_mandelbrot_channel(  # noqa: PLR0913
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_method: str = "std",
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    rv_adjustment: bool = True,
    rv_grouped_mean: bool = True,
    live_price: bool = True,
    **kwargs,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || **Command: calc_mandelbrot_channel**.

    This command calculates the Mandelbrot Channel for a given time series, utilizing various parameters to adjust the calculation. The Mandelbrot Channel provides insights into the volatility and price range of a stock over a specified window.

    Parameters
    ----------
    data: pl.DataFrame | pl.LazyFrame
        The time series data for which to calculate the Mandelbrot Channel.
        There needs to be a `close` and `date` column.
    window: str, default "1m"
        The window size for the calculation, specified as a string. This
        determines the period over which the channel is calculated.
    rv_adjustment: bool, default True
        Adjusts the calculation for realized volatility. If True, filters the
        data to include only observations within the current volatility bucket
        of the stock.
    rv_grouped_mean: bool, default True
        Determines whether to use the grouped mean in the realized volatility
        calculation.
    rv_method: str, default "std"
        Specifies the method for calculating realized volatility, applicable
        only if `rv_adjustment` is True.
    rs_method: str, default "RS"
        Defines the method for calculating the range over standard deviation,
        affecting the width of the Mandelbrot Channel. Options include RS,
        RS_mean, RS_min, and RS_max.
    live_price: bool, default True
        Indicates whether to incorporate live price data into the calculation,
        which may extend the calculation time by 1-3 seconds.
    **kwargs
        Additional keyword arguments to pass to the function, if you want to
        change the behavior or pass parameters to internal functions.

    Returns
    -------
    pl.LazyFrame
        A LazyFrame containing the calculated Mandelbrot Channel data for the specified time series.

    Notes
    -----
    The function returns a pl.LazyFrame; remember to call `.collect()` on the result to obtain a DataFrame. This lazy evaluation strategy postpones the calculation until it is explicitly requested.

    Example
    -------
    To calculate the Mandelbrot Channel for a yearly window with adjustments for realized volatility using the 'yz' method, and incorporating live price data:

    ```python
    mandelbrot_channel = calc_mandelbrot_channel(
        data,
        window="1y",
        rv_adjustment=True,
        rv_method="yz",
        rv_grouped_mean=False,
        rs_method="RS",
        live_price=True
    ).collect()
    ```
    """
    # Setup ====================================================================
    # window_datetime = _window_format(window, _return_timedelta=True)
    sort_cols = _set_sort_cols(data, "symbol", "date")

    data = data.lazy()
    # Step 1: Collect Price Data -----------------------------------------------
    # Step X: Add window bins --------------------------------------------------
    # We want date grouping, non-overlapping window bins
    data1 = add_window_index(data, window=window)

    # Step X: Calculate Log Returns + Rvol -------------------------------------
    if "log_returns" not in data1.collect_schema().names():
        data2 = log_returns(data1, _column_name="close")
    else:
        data2 = data1

    # Step X: Calculate Log Mean Series ----------------------------------------
    if isinstance(data2, pl.DataFrame | pl.LazyFrame):
        data3 = mean(data2)
    else:
        msg = "A series was passed to `mean()` calculation. Please provide a DataFrame or LazyFrame."
        raise HumblDataError(msg)
    # Step X: Calculate Mean De-trended Series ---------------------------------
    data4 = detrend(
        data3, _detrend_value_col="window_mean", _detrend_col="log_returns"
    )
    # Step X: Calculate Cumulative Deviate Series ------------------------------
    data5 = cum_sum(data4, _column_name="detrended_log_returns")
    # Step X: Calculate Mandelbrot Range ---------------------------------------
    data6 = range_(data5, _column_name="cum_sum")
    # Step X: Calculate Standard Deviation -------------------------------------
    data7 = std(data6, _column_name="cum_sum")
    # Step X: Calculate Range (R) & Standard Deviation (S) ---------------------
    if rv_adjustment:
        # Step 8.1: Calculate Realized Volatility ------------------------------
        data7 = calc_realized_volatility(
            data=data7,
            window=window,
            method=rv_method,
            grouped_mean=rv_grouped_mean,
        )
        # rename col for easy selection
        for col in data7.collect_schema().names():
            if "volatility_pct" in col:
                data7 = data7.rename({col: "realized_volatility"})
        # Step 8.2: Calculate Volatility Bucket Stats --------------------------
        data7 = vol_buckets(data=data7, lo_quantile=0.3, hi_quantile=0.65)
        # Remove rows that aren't in the same vol bucket as the latest observation
        data7 = vol_filter(data7)

    # Step X: Calculate RS -----------------------------------------------------
    data8 = data7.sort(sort_cols).with_columns(
        (pl.col("cum_sum_range") / pl.col("cum_sum_std")).alias("RS")
    )

    # Step X: Collect Recent Prices --------------------------------------------
    if live_price:
        symbols = (
            data.select("symbol").unique().sort("symbol").collect().to_series()
        )
        recent_prices = get_latest_price(symbols)
    else:
        recent_prices = None

    # Step X: Calculate Rescaled Price Range ----------------------------------
    out = price_range(
        data=data8,
        recent_price_data=recent_prices,
        rs_method=rs_method,
        _rv_adjustment=rv_adjustment,
    )

    return out

acalc_mandelbrot_channel `async` ¤

acalc_mandelbrot_channel(data: DataFrame | LazyFrame, window: str = '1m', rv_method: str = 'std', rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, rv_adjustment: bool = True, rv_grouped_mean: bool = True, live_price: bool = True, **kwargs) -> DataFrame | LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || Command: acalc_mandelbrot_channel.

Asynchronous wrapper for calc_mandelbrot_channel. This function allows calc_mandelbrot_channel to be called in an async context.

Notes

This does not make calc_mandelbrot_channel() non-blocking or asynchronous: the wrapper calls it synchronously, so the event loop is blocked until the calculation finishes.
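If the goal is to keep an event loop responsive, one common option is to offload the synchronous call to a worker thread with `asyncio.to_thread()` (Python 3.9+). The sketch below contrasts the two patterns using a hypothetical `slow_calc()` stand-in; neither wrapper is part of humbldata, and whether a thread helps in practice depends on how much of the work releases the GIL.

```python
import asyncio
import time


def slow_calc(x: int) -> int:
    """Hypothetical stand-in for a blocking call such as calc_mandelbrot_channel()."""
    time.sleep(0.05)
    return x * 2


async def blocking_wrapper(x: int) -> int:
    # Same pattern as acalc_mandelbrot_channel(): the sync call runs on the
    # event-loop thread, so nothing else on the loop runs until it returns.
    return slow_calc(x)


async def threaded_wrapper(x: int) -> int:
    # Offload the sync call to a worker thread so the event loop stays free.
    return await asyncio.to_thread(slow_calc, x)


async def main() -> list[int]:
    return await asyncio.gather(threaded_wrapper(1), threaded_wrapper(2))


print(asyncio.run(main()))  # [2, 4]
```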

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/model.py

async def acalc_mandelbrot_channel(  # noqa: PLR0913
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_method: str = "std",
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    rv_adjustment: bool = True,
    rv_grouped_mean: bool = True,
    live_price: bool = True,
    **kwargs,
) -> pl.DataFrame | pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || **Command: acalc_mandelbrot_channel**.

    Asynchronous wrapper for calc_mandelbrot_channel.
    This function allows calc_mandelbrot_channel to be called in an async context.

    Notes
    -----
    This does not make `calc_mandelbrot_channel()` non-blocking or asynchronous:
    the wrapper calls it synchronously, so the event loop is blocked until the
    calculation finishes.
    """
    # Directly call the synchronous calc_mandelbrot_channel function

    return calc_mandelbrot_channel(
        data=data,
        window=window,
        rv_adjustment=rv_adjustment,
        rv_method=rv_method,
        rs_method=rs_method,
        rv_grouped_mean=rv_grouped_mean,
        live_price=live_price,
        **kwargs,
    )

calc_mandelbrot_channel_historical ¤

calc_mandelbrot_channel_historical(data: DataFrame | LazyFrame, window: str = '1m', rv_method: str = 'std', rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, rv_adjustment: bool = True, rv_grouped_mean: bool = True, live_price: bool = True, **kwargs) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || Command: calc_mandelbrot_channel_historical.

This function calculates the Mandelbrot Channel for historical data.

Synchronous wrapper for the asynchronous Mandelbrot Channel historical calculation.
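The source delegates to a `run_async()` helper whose implementation is not shown in this section. A common way to drive a coroutine from synchronous code is sketched below; `run_sync` is a hypothetical name, and this shows the general pattern as an assumption, not humbldata's `run_async()`.

```python
import asyncio
import concurrent.futures
from collections.abc import Coroutine
from typing import Any, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code (hypothetical helper)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running in this thread: asyncio.run() creates and closes one.
        return asyncio.run(coro)
    # A loop is already running (e.g. in Jupyter): run the coroutine on a fresh
    # loop in a worker thread and block until it finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def double(x: int) -> int:
    await asyncio.sleep(0)
    return 2 * x


print(run_sync(double(21)))  # 42
```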

Parameters:

Name	Type	Description	Default
`The`			required
`Please`			required
`description`			required

Returns:

Type	Description
`LazyFrame`	A LazyFrame containing the historical Mandelbrot Channel calculations.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/model.py

def calc_mandelbrot_channel_historical(  # noqa: PLR0913
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_method: str = "std",
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    rv_adjustment: bool = True,
    rv_grouped_mean: bool = True,
    live_price: bool = True,
    **kwargs,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || **Command: calc_mandelbrot_channel_historical**.

    This function calculates the Mandelbrot Channel for historical data.

    Synchronous wrapper for the asynchronous Mandelbrot Channel historical calculation.

    Parameters
    ----------
    The parameters for this function are the same as those for calc_mandelbrot_channel().
    Please refer to the documentation of calc_mandelbrot_channel() for a detailed
    description of each parameter.

    Returns
    -------
    pl.LazyFrame
        A LazyFrame containing the historical Mandelbrot Channel calculations.
    """
    return run_async(
        _acalc_mandelbrot_channel_historical_engine(
            data=data,
            window=window,
            rv_adjustment=rv_adjustment,
            rv_method=rv_method,
            rs_method=rs_method,
            rv_grouped_mean=rv_grouped_mean,
            live_price=live_price,
            **kwargs,
        )
    )

calc_mandelbrot_channel_historical_mp ¤

calc_mandelbrot_channel_historical_mp(data: DataFrame | LazyFrame, window: str = '1m', rv_adjustment: bool = True, rv_method: str = 'std', rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, rv_grouped_mean: bool = True, live_price: bool = True, n_processes: int = 1, **kwargs) -> LazyFrame

Calculate the Mandelbrot Channel historically using multiprocessing.

Parameters:

Name	Type	Description	Default
`n_processes`	`int`	Number of processes to use. If None, `multiprocessing.Pool` uses all available cores.	`1`

Other parameters are the same as calc_mandelbrot_channel_historical.
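Each parallel job computes the channel for one date, and only dates at least one window after the first observation are eligible. A minimal stdlib sketch of that selection step follows, assuming dates are `datetime.date` values and the window is already a `timedelta` (the source converts the window string with `_window_format()`). `dates_to_compute` is a hypothetical name; in the source, each selected date is then passed to a per-date function bound with `functools.partial` and mapped over with `Pool.map`.

```python
from datetime import date, timedelta


def dates_to_compute(all_dates: list[date], window: timedelta) -> list[date]:
    """Unique, sorted dates on or after (first date + window)."""
    start = min(all_dates) + window
    end = max(all_dates)
    if start >= end:
        # Mirrors the source's guard: at least one full window is required
        raise ValueError(
            f"Need at least one window of data: start + window is {start}, "
            f"but the data ends {end}."
        )
    return sorted({d for d in all_dates if d >= start})


days = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
print(dates_to_compute(days, timedelta(days=7)))  # Jan 8, 9 and 10, 2024
```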

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/model.py

def calc_mandelbrot_channel_historical_mp(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_adjustment: bool = True,
    rv_method: str = "std",
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    rv_grouped_mean: bool = True,
    live_price: bool = True,
    n_processes: int = 1,
    **kwargs,
) -> pl.LazyFrame:
    """
    Calculate the Mandelbrot Channel historically using multiprocessing.

    Parameters
    ----------
    n_processes : int, default 1
        Number of processes to use. If None, `multiprocessing.Pool` uses all
        available cores.

    Other parameters are the same as calc_mandelbrot_channel_historical.
    """
    window_days = _window_format(window, _return_timedelta=True)
    start_date = data.lazy().select(pl.col("date")).min().collect().row(0)[0]
    start_date = start_date + window_days
    end_date = data.lazy().select("date").max().collect().row(0)[0]

    if start_date >= end_date:
        msg = (
            "You set <historical=True>\n"
            "This calculation needs *at least* one window of data.\n"
            f"The (start date + window) is: {start_date} and the dataset "
            f"ended: {end_date}.\n"
            "Please adjust dates accordingly."
        )
        raise HumblDataError(msg)

    dates = (
        data.lazy()
        .select(pl.col("date"))
        .filter(pl.col("date") >= start_date)
        .unique()
        .sort("date")
        .collect()
        .to_series()
    )

    # Prepare the partial function with all arguments except the date
    calc_func = partial(
        _calc_mandelbrot_for_date,
        data=data,
        window=window,
        rv_adjustment=rv_adjustment,
        rv_method=rv_method,
        rs_method=rs_method,
        rv_grouped_mean=rv_grouped_mean,
        live_price=live_price,
        **kwargs,
    )

    # Use multiprocessing to calculate in parallel
    with multiprocessing.Pool(processes=n_processes) as pool:
        results = pool.map(calc_func, dates)

    # Combine results
    out = pl.concat(results, how="vertical").sort(["symbol", "date"])

    return out.lazy()

calc_mandelbrot_channel_historical_concurrent ¤

calc_mandelbrot_channel_historical_concurrent(data: DataFrame | LazyFrame, window: str = '1m', rv_method: str = 'std', rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, rv_adjustment: bool = True, rv_grouped_mean: bool = True, live_price: bool = True, max_workers: int | None = None, use_processes: bool = False, **kwargs) -> LazyFrame

Calculate the Mandelbrot Channel historically using concurrent.futures.

Parameters:

Name	Type	Description	Default
`max_workers`	`int \| None`	Maximum number of workers to use. If None, the executor's default is used: the number of processors on the machine for ProcessPoolExecutor, or min(32, os.cpu_count() + 4) for ThreadPoolExecutor (Python 3.8+).	`None`
`use_processes`	`bool`	If True, use ProcessPoolExecutor; otherwise use ThreadPoolExecutor.	`False`

Other parameters are the same as calc_mandelbrot_channel_historical.
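The source collects results with `concurrent.futures.as_completed()`, which yields futures in completion order, so the final sort by symbol and date is what makes the output deterministic. A sketch of the same submit, collect and sort pattern follows, with a hypothetical `calc_for_date()` in place of the library's per-date function and a ThreadPoolExecutor (the source's default, since `use_processes` defaults to False).

```python
from __future__ import annotations

import concurrent.futures
from functools import partial


def calc_for_date(day: int, *, offset: int) -> dict:
    """Hypothetical per-date job standing in for _calc_mandelbrot_for_date()."""
    return {"date": day, "value": day + offset}


def run_all(days: list[int], offset: int, max_workers: int | None = None) -> list[dict]:
    # Bind every argument except the date, as the source does with partial()
    job = partial(calc_for_date, offset=offset)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(job, day) for day in days]
        # as_completed() yields futures in completion order, not submission order
        results = [f.result() for f in concurrent.futures.as_completed(futures)]
    # Restore a deterministic order afterwards, like the final sort in the source
    return sorted(results, key=lambda r: r["date"])


print(run_all([3, 1, 2], offset=10))
# [{'date': 1, 'value': 11}, {'date': 2, 'value': 12}, {'date': 3, 'value': 13}]
```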

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/model.py

def calc_mandelbrot_channel_historical_concurrent(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_method: str = "std",
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    rv_adjustment: bool = True,
    rv_grouped_mean: bool = True,
    live_price: bool = True,
    max_workers: int | None = None,
    use_processes: bool = False,
    **kwargs,
) -> pl.LazyFrame:
    """
    Calculate the Mandelbrot Channel historically using concurrent.futures.

    Parameters
    ----------
    max_workers : int | None, default None
        Maximum number of workers to use. If None, the executor's default is
        used: the number of processors on the machine for ProcessPoolExecutor,
        or min(32, os.cpu_count() + 4) for ThreadPoolExecutor (Python 3.8+).
    use_processes : bool, default False
        If True, use ProcessPoolExecutor, otherwise use ThreadPoolExecutor.

    Other parameters are the same as calc_mandelbrot_channel_historical.
    """
    window_days = _window_format(window, _return_timedelta=True)
    start_date = data.lazy().select(pl.col("date")).min().collect().row(0)[0]
    start_date = start_date + window_days
    end_date = data.lazy().select("date").max().collect().row(0)[0]

    if start_date >= end_date:
        msg = (
            "You set <historical=True>\n"
            "This calculation needs *at least* one window of data.\n"
            f"The (start date + window) is: {start_date} and the dataset "
            f"ended: {end_date}.\n"
            "Please adjust dates accordingly."
        )
        raise HumblDataError(msg)

    dates = (
        data.lazy()
        .select(pl.col("date"))
        .filter(pl.col("date") >= start_date)
        .unique()
        .sort("date")
        .collect()
        .to_series()
    )

    # Prepare the partial function with all arguments except the date
    calc_func = partial(
        _calc_mandelbrot_for_date,
        data=data,
        window=window,
        rv_adjustment=rv_adjustment,
        rv_method=rv_method,
        rs_method=rs_method,
        rv_grouped_mean=rv_grouped_mean,
        live_price=live_price,
        **kwargs,
    )

    # Choose the appropriate executor
    executor_class = (
        concurrent.futures.ProcessPoolExecutor
        if use_processes
        else concurrent.futures.ThreadPoolExecutor
    )

    # Use concurrent.futures to calculate in parallel
    with executor_class(max_workers=max_workers) as executor:
        futures = [executor.submit(calc_func, date) for date in dates]
        results = [
            future.result()
            for future in concurrent.futures.as_completed(futures)
        ]

    # Combine results
    out = pl.concat(results, how="vertical").sort(["symbol", "date"])

    return out.lazy()

view ¤

create_historical_plot ¤

create_historical_plot(data: DataFrame, symbol: str, template: ChartTemplate = ChartTemplate.plotly) -> Figure

Generate a historical plot for a given symbol from the provided data.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	The dataframe containing historical data, including the `date`, `bottom_price`, `recent_price`, and `top_price` columns.	required
`symbol`	`str`	The symbol for which the historical plot is to be generated.	required
`template`	`ChartTemplate`	The template to be used for styling the plot.	`plotly`

Returns:

Type	Description
`Figure`	A plotly figure object representing the historical data of the given symbol.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/view.py

def create_historical_plot(
    data: pl.DataFrame,
    symbol: str,
    template: ChartTemplate = ChartTemplate.plotly,
) -> go.Figure:
    """
    Generate a historical plot for a given symbol from the provided data.

    Parameters
    ----------
    data : pl.DataFrame
        The dataframe containing historical data, including the `date`,
        `bottom_price`, `recent_price`, and `top_price` columns.
    symbol : str
        The symbol for which the historical plot is to be generated.
    template : ChartTemplate
        The template to be used for styling the plot.

    Returns
    -------
    go.Figure
        A plotly figure object representing the historical data of the given symbol.
    """
    filtered_data = data.filter(pl.col("symbol") == symbol)

    fig = go.Figure()
    fig.add_trace(
        go.Scatter(
            x=filtered_data.select("date").to_series(),
            y=filtered_data.select("bottom_price").to_series(),
            name="Bottom Price",
            line=dict(color="green"),
        )
    )
    fig.add_trace(
        go.Scatter(
            x=filtered_data.select("date").to_series(),
            y=filtered_data.select("recent_price").to_series(),
            name="Recent Price",
            line=dict(color="blue"),
        )
    )
    fig.add_trace(
        go.Scatter(
            x=filtered_data.select("date").to_series(),
            y=filtered_data.select("top_price").to_series(),
            name="Top Price",
            line=dict(color="red"),
        )
    )
    fig.update_layout(
        title=f"Historical Mandelbrot Channel for {symbol}",
        xaxis_title="Date",
        yaxis_title="Price",
        template=template,
    )
    return fig

create_current_plot ¤

create_current_plot(data: DataFrame, equity_data: DataFrame, symbol: str, template: ChartTemplate = ChartTemplate.plotly) -> Figure

Generate a current plot for a given symbol from the provided data and equity data.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	The dataframe containing historical data including top and bottom prices.	required
`equity_data`	`DataFrame`	The dataframe containing current equity data including dates and close prices.	required
`symbol`	`str`	The symbol for which the current plot is to be generated.	required
`template`	`ChartTemplate`	The template to be used for styling the plot.	`plotly`

Returns:

Type	Description
`Figure`	A plotly figure object representing the current data of the given symbol.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/view.py

def create_current_plot(
    data: pl.DataFrame,
    equity_data: pl.DataFrame,
    symbol: str,
    template: ChartTemplate = ChartTemplate.plotly,
) -> go.Figure:
    """
    Generate a current plot for a given symbol from the provided data and equity data.

    Parameters
    ----------
    data : pl.DataFrame
        The dataframe containing historical data including top and bottom prices.
    equity_data : pl.DataFrame
        The dataframe containing current equity data including dates and close prices.
    symbol : str
        The symbol for which the current plot is to be generated.
    template : ChartTemplate
        The template to be used for styling the plot.

    Returns
    -------
    go.Figure
        A plotly figure object representing the current data of the given symbol.
    """
    filtered_data = data.filter(pl.col("symbol") == symbol)
    equity_data = equity_data.filter(pl.col("symbol") == symbol)
    fig = go.Figure()
    fig.add_trace(
        go.Scatter(
            x=equity_data.select("date").to_series(),
            y=equity_data.select("close").to_series(),
            name="Recent Price",
            line=dict(color="blue"),
        )
    )
    fig.add_hline(
        y=filtered_data.select("top_price").row(0)[0],
        line=dict(color="red", width=2),
        name="Top Price",
    )
    fig.add_hline(
        y=filtered_data.select("bottom_price").row(0)[0],
        line=dict(color="green", width=2),
        name="Bottom Price",
    )
    fig.update_layout(
        title=f"Current Mandelbrot Channel for {symbol}",
        xaxis_title="Date",
        yaxis_title="Price",
        template=template,
    )
    return fig

is_historical_data ¤

is_historical_data(data: DataFrame) -> bool

Check if the provided dataframe contains historical data based on the uniqueness of dates.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	The dataframe to check for historical data presence.	required

Returns:

Type	Description
`bool`	Returns True if the dataframe contains historical data (more than one unique date), otherwise False.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/view.py

def is_historical_data(data: pl.DataFrame) -> bool:
    """
    Check if the provided dataframe contains historical data based on the uniqueness of dates.

    Parameters
    ----------
    data : pl.DataFrame
        The dataframe to check for historical data presence.

    Returns
    -------
    bool
        Returns True if the dataframe contains historical data (more than one unique date), otherwise False.
    """
    return data.select("date").to_series().unique().shape[0] > 1

generate_plot_for_symbol ¤

generate_plot_for_symbol(data: DataFrame, equity_data: DataFrame, symbol: str, template: ChartTemplate = ChartTemplate.plotly) -> Chart

Generate a plot for a single symbol, filtered from the full dataframe.

This function checks whether the provided data is historical or current Mandelbrot Channel data. Data with more than one unique date is treated as historical and produces a historical plot; otherwise it produces a current plot.
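The dispatch rule can be illustrated in plain Python. A minimal sketch, assuming rows are dicts with a `date` key; `is_historical` and `choose_plot` are hypothetical names mirroring is_historical_data and the branch in generate_plot_for_symbol. Because the check runs on the unfiltered data, the dates of all symbols are counted together.

```python
from typing import Any


def is_historical(rows: list[dict[str, Any]]) -> bool:
    """True when the rows contain more than one unique date."""
    return len({row["date"] for row in rows}) > 1


def choose_plot(rows: list[dict[str, Any]]) -> str:
    # Historical data: channel bounds plotted over time.
    # Current data: recent prices with horizontal top/bottom lines.
    return "historical" if is_historical(rows) else "current"
```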

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	The dataframe containing Mandelbrot channel data for all symbols.	required
`equity_data`	`DataFrame`	The dataframe containing equity data for all symbols.	required
`symbol`	`str`	The symbol for which to generate the plot.	required
`template`	`ChartTemplate`	The template/theme to use for the plotly figure. Options are: "humbl_light", "humbl_dark", "plotly_light", "plotly_dark", "ggplot2", "seaborn", "simple_white", "none"	`plotly`

Returns:

Type	Description
`Chart`	A Chart object containing the generated plot for the specified symbol.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/view.py

def generate_plot_for_symbol(
    data: pl.DataFrame,
    equity_data: pl.DataFrame,
    symbol: str,
    template: ChartTemplate = ChartTemplate.plotly,
) -> Chart:
    """
    Generate a plot for a single symbol, filtered from the full dataframe.

    This function checks whether the provided data is historical or current
    Mandelbrot Channel data. Data with more than one unique date is treated
    as historical and produces a historical plot; otherwise it produces a
    current plot.

    Parameters
    ----------
    data : pl.DataFrame
        The dataframe containing Mandelbrot channel data for all symbols.
    equity_data : pl.DataFrame
        The dataframe containing equity data for all symbols.
    symbol : str
        The symbol for which to generate the plot.
    template : ChartTemplate
        The template/theme to use for the plotly figure. Options are:
        "humbl_light", "humbl_dark", "plotly_light", "plotly_dark", "ggplot2", "seaborn", "simple_white", "none"

    Returns
    -------
    Chart
        A Chart object containing the generated plot for the specified symbol.

    """
    if is_historical_data(data):
        out = create_historical_plot(data, symbol, template)
    else:
        out = create_current_plot(data, equity_data, symbol, template)

    return Chart(content=out.to_json(), fig=out)

generate_plots ¤

generate_plots(data: LazyFrame, equity_data: LazyFrame, template: ChartTemplate = ChartTemplate.plotly) -> list[Chart]

Context: Toolbox || Category: Technical || Subcategory: Mandelbrot Channel || Command: generate_plots().

Generate plots for each unique symbol in the given dataframes.

Parameters:

Name	Type	Description	Default
`data`	`LazyFrame`	The LazyFrame containing the symbols and MandelbrotChannelData.	required
`equity_data`	`LazyFrame`	The LazyFrame containing equity data for the symbols.	required
`template`	`ChartTemplate`	The template/theme to use for the plotly figure.	`plotly`

Returns:

Type	Description
`list[Chart]`	A list of Chart objects, each representing a plot for a unique symbol.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/view.py

def generate_plots(
    data: pl.LazyFrame,
    equity_data: pl.LazyFrame,
    template: ChartTemplate = ChartTemplate.plotly,
) -> list[Chart]:
    """
    Context: Toolbox || Category: Technical || Subcategory: Mandelbrot Channel || **Command: generate_plots()**.

    Generate plots for each unique symbol in the given dataframes.

    Parameters
    ----------
    data : pl.LazyFrame
        The LazyFrame containing the symbols and MandelbrotChannelData.
    equity_data : pl.LazyFrame
        The LazyFrame containing equity data for the symbols.
    template : ChartTemplate
        The template/theme to use for the plotly figure.

    Returns
    -------
    list[Chart]
        A list of Chart objects, each representing a plot for a unique symbol.

    """
    # Collect each LazyFrame once, rather than once per symbol
    data_df = data.collect()
    equity_df = equity_data.collect()
    symbols = data_df.select("symbol").unique().to_series()

    plots = [
        generate_plot_for_symbol(data_df, equity_df, symbol, template)
        for symbol in symbols
    ]
    return plots

volatility ¤

realized_volatility_helpers ¤

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers.

This module contains the volatility estimators used by calc_realized_volatility(). Each one calculates the realized volatility of financial data using a different method.

std ¤

std(data: DataFrame | LazyFrame | Series, window: str = '1m', trading_periods=252, _drop_nulls: bool = True, _avg_trading_days: bool = False, _column_name_returns: str = 'log_returns', _sort: bool = True) -> LazyFrame | Series

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: std.

This function computes the rolling standard deviation of returns, a common measure of volatility, over a given window. The window size can optionally be based on the average number of trading days, and the result is annualized and expressed as a percentage.

Parameters:

Name	Type	Description	Default
`data`	`Union[DataFrame, LazyFrame, Series]`	The input data containing the returns. It can be a DataFrame, LazyFrame, or Series.	required
`window`	`str`	The rolling window size for calculating the standard deviation. The default is "1m" (one month).	`'1m'`
`trading_periods`	`int`	The number of trading periods in a year, used for annualizing the volatility. The default is 252.	`252`
`_drop_nulls`	`bool`	If True, null values will be dropped from the result. The default is True.	`True`
`_avg_trading_days`	`bool`	If True, the average number of trading days will be used when calculating the window size. The default is False.	`False`
`_column_name_returns`	`str`	The name of the column containing the returns. This parameter is used when `data` is a DataFrame or LazyFrame. The default is "log_returns".	`'log_returns'`
`_sort`	`bool`	If True, sorts the data by `symbol` and `date` (where present) before calculation. Ignored for Series input. The default is True.	`True`

Returns:

Type	Description
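`LazyFrame \| Series`	For DataFrame or LazyFrame input, a LazyFrame with an additional column `std_volatility_pct_{N}D` holding the annualized rolling standard deviation in percent (N is the window length in days). For Series input, the raw rolling standard deviation over N rows, without annualization.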

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py

def std(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    window: str = "1m",
    trading_periods=252,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _column_name_returns: str = "log_returns",
    _sort: bool = True,
) -> pl.LazyFrame | pl.Series:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: std**.

    This function computes the rolling standard deviation of returns, a common
    measure of volatility, over a given window. The window size can optionally
    be based on the average number of trading days, and the result is
    annualized and expressed as a percentage.

    Parameters
    ----------
    data : Union[pl.DataFrame, pl.LazyFrame, pl.Series]
        The input data containing the returns. It can be a DataFrame, LazyFrame,
        or Series.
    window : str, optional
        The rolling window size for calculating the standard deviation.
        The default is "1m" (one month).
    trading_periods : int, optional
        The number of trading periods in a year, used for annualizing the
        volatility. The default is 252.
    _drop_nulls : bool, optional
        If True, null values will be dropped from the result.
        The default is True.
    _avg_trading_days : bool, optional
        If True, the average number of trading days will be used when
        calculating the window size. The default is False.
    _column_name_returns : str, optional
        The name of the column containing the returns. This parameter is used
        when `data` is a DataFrame or LazyFrame. The default is "log_returns".
    _sort : bool, optional
        If True, sort by `symbol` and `date` (where present) before
        calculation. Ignored for Series input. The default is True.

    Returns
    -------
    pl.LazyFrame | pl.Series
        For DataFrame or LazyFrame input, a LazyFrame with an additional
        "std_volatility_pct_{N}D" column holding the annualized rolling
        standard deviation in percent. For Series input, the raw rolling
        standard deviation (not annualized), using a window of N rows.
    """
    window_timedelta = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    )
    if isinstance(data, pl.Series):
        return data.rolling_std(
            window_size=window_timedelta.days, min_periods=1
        )
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)

    # Rolling std over a time-based window keyed on `date`, then annualized
    result = data.lazy().with_columns(
        (
            pl.col(_column_name_returns).rolling_std_by(
                window_size=window_timedelta,
                min_periods=2,  # min_periods=2 because with min_periods=1 the first value would be 0
                by="date",
            )
            * math.sqrt(trading_periods)
            * 100
        ).alias(f"std_volatility_pct_{window_timedelta.days}D")
    )
    if _drop_nulls:
        return result.drop_nulls(
            subset=f"std_volatility_pct_{window_timedelta.days}D"
        )
    return result

parkinson ¤

parkinson(data: DataFrame | LazyFrame, window: str = '1m', _column_name_high: str = 'high', _column_name_low: str = 'low', *, _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame

Calculate Parkinson's volatility over a specified window.

Parkinson's volatility uses each period's high and low prices rather than only close-to-close prices. This makes it particularly useful for capturing large price movements within the day.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame \| LazyFrame`	The input data containing the stock prices.	required
`window`	`str`	The rolling window size for calculating volatility, by default "1m".	`'1m'`
`_column_name_high`	`str`	The name of the column containing the high prices, by default "high".	`'high'`
`_column_name_low`	`str`	The name of the column containing the low prices, by default "low".	`'low'`
`_drop_nulls`	`bool`	Whether to drop null values from the result, by default True.	`True`
`_avg_trading_days`	`bool`	Whether to use the average number of trading days when calculating the window size, by default False.	`False`
`_sort`	`bool`	If True, sorts the data by `symbol` and `date` (where present) before calculation, by default True.	`True`

Returns:

Type	Description
`LazyFrame`	The input data with an additional column "parkinson_volatility_pct_{window_int}D" containing the annualized Parkinson's volatility in percent.

Notes

This function requires the input data to have high and low price columns (named by `_column_name_high` and `_column_name_low`). It takes the logarithm of their ratio, squares it, and scales it by the constant 1 / (4 ln 2) to estimate the per-period variance. The result is then annualized and expressed as a percentage.
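A stdlib sketch of the estimator over a single window. The constant 1 / (4 ln 2) and the squared log range come from the source; the annualization step, sqrt(trading_periods * mean per-bar variance), is an assumption, since the source delegates it to an internal `_annual_vol` helper whose body is not shown here.

```python
import math


def parkinson_vol_pct(
    highs: list[float], lows: list[float], trading_periods: int = 252
) -> float:
    """Annualized Parkinson volatility, in percent, over one window of bars."""
    k = 1.0 / (4.0 * math.log(2.0))  # the same constant as `var1` in the source
    # Per-bar variance estimate: k * ln(high / low)^2
    per_bar = [k * math.log(hi / lo) ** 2 for hi, lo in zip(highs, lows)]
    mean_var = sum(per_bar) / len(per_bar)
    # Assumed annualization: sqrt(periods per year * mean per-bar variance)
    return math.sqrt(trading_periods * mean_var) * 100
```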

Usage

If you pass "1m" as the window argument with _avg_trading_days=False, the window spans 30 days. With _avg_trading_days=True, it spans 21 days.

Examples:

>>> data = pl.DataFrame({'high': [120, 125], 'low': [115, 120]})
>>> parkinson(data).collect()
A DataFrame with the calculated Parkinson's volatility.

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py

def parkinson(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    *,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Calculate Parkinson's volatility over a specified window.

    Parkinson's volatility uses each period's high and low prices rather than
    only close-to-close prices. This makes it particularly useful for
    capturing large price movements within the day.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data containing the stock prices.
    window : str, optional
        The rolling window size for calculating volatility, by default "1m".
    _column_name_high : str, optional
        The name of the column containing the high prices, by default "high".
    _column_name_low : str, optional
        The name of the column containing the low prices, by default "low".
    _drop_nulls : bool, optional
        Whether to drop null values from the result, by default True.
    _avg_trading_days : bool, optional
        Whether to use the average number of trading days when calculating the
        window size, by default False.
    _sort : bool, optional
        If True, sort by `symbol` and `date` (where present) before
        calculation, by default True.

    Returns
    -------
    pl.LazyFrame
        The calculated Parkinson's volatility, with an additional column
        "parkinson_volatility_pct_{window_int}D"
        indicating the percentage volatility.

    Notes
    -----
    This function requires the input data to have high and low price columns
    (named by `_column_name_high` and `_column_name_low`). It takes the
    logarithm of their ratio, squares it, and scales it by the constant
    1 / (4 ln 2) to estimate the per-period variance. The result is then
    annualized and expressed as a percentage.

    Usage
    -----
    If you pass `"1m"` as the `window` argument with `_avg_trading_days=False`,
    the window spans `30` days. With `_avg_trading_days=True`, it spans `21`
    days.

    Examples
    --------
    >>> data = pl.DataFrame({'high': [120, 125], 'low': [115, 120]})
    >>> parkinson(data).collect()
    A DataFrame with the calculated Parkinson's volatility.
    """
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)

    var1 = 1.0 / (4.0 * math.log(2.0))
    var2 = (
        data.lazy()
        .select((pl.col(_column_name_high) / pl.col(_column_name_low)).log())
        .collect()
        .to_series()
    )
    rs = var1 * var2**2

    window_int: int = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    ).days
    result = data.lazy().with_columns(
        (
            rs.rolling_map(_annual_vol, window_size=window_int, min_periods=1)
            * 100
        ).alias(f"parkinson_volatility_pct_{window_int}D")
    )
    if _drop_nulls:
        return result.drop_nulls(
            subset=f"parkinson_volatility_pct_{window_int}D"
        )

    return result

garman_klass ¤

garman_klass(data: DataFrame | LazyFrame, window: str = '1m', _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', _column_name_close: str = 'close', _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: garman_klass.

Calculates the Garman-Klass volatility for a given dataset.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame \| LazyFrame`	The input data containing the price information.	required
`window`	`str`	The rolling window size for volatility calculation, by default "1m".	`'1m'`
`_column_name_high`	`str`	The name of the column containing the high prices, by default "high".	`'high'`
`_column_name_low`	`str`	The name of the column containing the low prices, by default "low".	`'low'`
`_column_name_open`	`str`	The name of the column containing the opening prices, by default "open".	`'open'`
`_column_name_close`	`str`	The name of the column containing the adjusted closing prices, by default "close".	`'close'`
`_drop_nulls`	`bool`	Whether to drop null values from the result, by default True.	`True`
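`_avg_trading_days`	`bool`	Whether to use the average number of trading days when calculating the window size, by default False.	`False`
`_sort`	`bool`	If True, sorts the data by `symbol` and `date` (where present) before calculation, by default True.	`True`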

Returns:

Type	Description
`LazyFrame`	The input data with an additional column "gk_volatility_pct_{window_int}D" containing the annualized Garman-Klass volatility in percent.

Notes

Garman-Klass volatility extends Parkinson’s volatility by considering the opening and closing prices in addition to the high and low prices. This approach provides a more accurate estimation of volatility, especially in markets with significant activity at the opening and closing of trading sessions.
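The per-bar term computed in the source is 0.5 * ln(H/L)^2 - (2 ln 2 - 1) * ln(C/O)^2. The stdlib sketch below applies it over one window; as with the other estimators, the annualization via sqrt(trading_periods * mean per-bar variance) is an assumption standing in for the internal `_annual_vol` helper.

```python
import math


def garman_klass_vol_pct(
    opens: list[float],
    highs: list[float],
    lows: list[float],
    closes: list[float],
    trading_periods: int = 252,
) -> float:
    """Annualized Garman-Klass volatility, in percent, over one window of bars."""
    per_bar = [
        # 0.5 * ln(H/L)^2 - (2 ln 2 - 1) * ln(C/O)^2, as in the source's `rs`
        0.5 * math.log(hi / lo) ** 2
        - (2 * math.log(2) - 1) * math.log(c / o) ** 2
        for o, hi, lo, c in zip(opens, highs, lows, closes)
    ]
    mean_var = sum(per_bar) / len(per_bar)
    # Assumed annualization: sqrt(periods per year * mean per-bar variance)
    return math.sqrt(trading_periods * mean_var) * 100
```

Since the open and close both lie within [L, H], |ln(C/O)| never exceeds ln(H/L), and 0.5 is larger than 2 ln 2 - 1 (about 0.386), so each per-bar term is non-negative and the square root is always defined.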

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py

def garman_klass(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    _column_name_close: str = "close",
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: garman_klass**.

    Calculates the Garman-Klass volatility for a given dataset.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data containing the price information.
    window : str, optional
        The rolling window size for volatility calculation, by default "1m".
    _column_name_high : str, optional
        The name of the column containing the high prices, by default "high".
    _column_name_low : str, optional
        The name of the column containing the low prices, by default "low".
    _column_name_open : str, optional
        The name of the column containing the opening prices, by default "open".
    _column_name_close : str, optional
        The name of the column containing the adjusted closing prices, by
        default "close".
    _drop_nulls : bool, optional
        Whether to drop null values from the result, by default True.
    _avg_trading_days : bool, optional
        Whether to use the average number of trading days when calculating the
        window size, by default False.
    _sort : bool, optional
        If True, sort by `symbol` and `date` (where present) before
        calculation, by default True.

    Returns
    -------
    pl.LazyFrame
        The input data with an additional column
        "gk_volatility_pct_{window_int}D" containing the annualized
        Garman-Klass volatility in percent.

    Notes
    -----
    Garman-Klass volatility extends Parkinson’s volatility by considering the
    opening and closing prices in addition to the high and low prices. This
    approach provides a more accurate estimation of volatility, especially in
    markets with significant activity at the opening and closing of trading
    sessions.
    """
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)
    log_hi_lo = (
        data.lazy()
        .select((pl.col(_column_name_high) / pl.col(_column_name_low)).log())
        .collect()
        .to_series()
    )
    log_close_open = (
        data.lazy()
        .select((pl.col(_column_name_close) / pl.col(_column_name_open)).log())
        .collect()
        .to_series()
    )
    rs: pl.Series = 0.5 * log_hi_lo**2 - (2 * np.log(2) - 1) * log_close_open**2

    window_int: int = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    ).days
    result = data.lazy().with_columns(
        (
            rs.rolling_map(_annual_vol, window_size=window_int, min_periods=1)
            * 100
        ).alias(f"gk_volatility_pct_{window_int}D")
    )
    if _drop_nulls:
        return result.drop_nulls(subset=f"gk_volatility_pct_{window_int}D")
    return result

hodges_tompkins ¤

hodges_tompkins(data: DataFrame | LazyFrame | Series, window: str = '1m', trading_periods=252, _column_name_returns: str = 'log_returns', *, _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame | Series

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: hodges_tompkins.

Hodges-Tompkins volatility applies a bias correction to the rolling standard deviation when it is estimated from overlapping windows. The correction yields unbiased estimates and a substantial gain in efficiency.
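The correction factor computed in the source can be isolated as a small function. Here h is the window length in days and n = count - h + 1 is the number of overlapping windows; in the source, the factor multiplies the rolling standard deviation before the scaling by 100. This sketch covers only that arithmetic, and its `ValueError` guard is an addition not present in the source.

```python
def hodges_tompkins_factor(count: int, h: int) -> float:
    """Bias-correction factor for volatility estimated from overlapping windows."""
    n = count - h + 1  # number of overlapping windows of length h
    if n <= 0:
        # Guard added for this sketch; the source does not check this case
        raise ValueError("need at least one full window of observations")
    return 1.0 / (1.0 - (h / n) + ((h**2 - 1) / (3 * n**2)))
```

For a 21-day window over 252 observations, n = 232 and the factor is slightly above 1, so the corrected volatility is a little higher than the raw rolling estimate.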

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py

def hodges_tompkins(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    window: str = "1m",
    trading_periods=252,
    _column_name_returns: str = "log_returns",
    *,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame | pl.Series:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: hodges_tompkins**.

    Hodges-Tompkins volatility applies a bias correction to the rolling
    standard deviation when it is estimated from overlapping windows. The
    correction yields unbiased estimates and a substantial gain in efficiency.
    """
    # Note: calculating rv_mean needs a different adjustment factor so that
    # the window does not influence the volatility mean.

    # Define Window Size
    window_timedelta = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    )
    # Calculate STD, assigned to `vol`
    if isinstance(data, pl.Series):
        vol = data.rolling_std(window_size=window_timedelta.days, min_periods=1)
    else:
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort and sort_cols:
            data = data.lazy().sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)
        vol = data.lazy().select(
            pl.col(_column_name_returns).rolling_std_by(
                window_size=window_timedelta, min_periods=1, by="date"
            )
            * np.sqrt(trading_periods)
        )

    # Assign window size to h for adjustment
    h: int = window_timedelta.days

    if isinstance(data, pl.Series):
        count = data.len()
    elif isinstance(data, pl.LazyFrame):
        count = data.collect().shape[0]
    else:
        count = data.shape[0]

    n = (count - h) + 1
    adj_factor = 1.0 / (1.0 - (h / n) + ((h**2 - 1) / (3 * n**2)))

    if isinstance(data, pl.Series):
        return (vol * adj_factor) * 100
    else:
        result = data.lazy().with_columns(
            ((vol.collect() * adj_factor) * 100)
            .to_series()
            .alias(f"ht_volatility_pct_{h}D")
        )
    if _drop_nulls:
        result = result.drop_nulls(subset=f"ht_volatility_pct_{h}D")
    return result

rogers_satchell ¤

rogers_satchell(data: DataFrame | LazyFrame, window: str = '1m', _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', _column_name_close: str = 'close', _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: rogers_satchell.

Rogers-Satchell is a volatility estimator for securities whose average return is not zero. Unlike the Parkinson and Garman-Klass estimators, which assume zero drift, Rogers-Satchell remains valid when a drift term (nonzero mean return) is present. This function calculates the Rogers-Satchell volatility estimator over a specified window and optionally drops null values from the result.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame \| LazyFrame`	The input data for which to calculate the Rogers-Satchell volatility estimator. This can be either a DataFrame or a LazyFrame. The data must contain OHLC columns.	required
`window`	`str`	The window over which to calculate the volatility estimator. The window is specified as a string, such as "1m" for one month.	`"1m"`
`_column_name_high`	`str`	The name of the column representing the high prices in the data.	`"high"`
`_column_name_low`	`str`	The name of the column representing the low prices in the data.	`"low"`
`_column_name_open`	`str`	The name of the column representing the opening prices in the data.	`"open"`
`_column_name_close`	`str`	The name of the column representing the adjusted closing prices in the data.	`"close"`
`_drop_nulls`	`bool`	Whether to drop null values from the result. If True, rows with null values in the calculated volatility column will be removed from the output.	`True`
`_avg_trading_days`	`bool`	Indicates whether to use the average number of trading days per window. This affects how the window size is interpreted. i.e instead of "1mo" returning `timedelta(days=31)`, it will return `timedelta(days=21)`.	`True`

Returns:

Type	Description
`LazyFrame`	The input data as a LazyFrame with an additional `rs_volatility_pct_{window}D` column containing the Rogers-Satchell volatility estimate in percent, where `{window}` is the window length in days. The intermediate columns (`log_ho`, `log_lo`, `log_co`, `rs`) are also kept.

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py

def rogers_satchell(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    _column_name_close: str = "close",
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: rogers_satchell**.

    Rogers-Satchell is a range-based volatility estimator that, unlike the
    Parkinson and Garman-Klass estimators, allows for a non-zero drift (mean
    return), so a strong trend does not inflate the estimate. This function
    calculates the Rogers-Satchell volatility estimator over a specified
    rolling window and optionally drops null values from the result.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data for which to calculate the Rogers-Satchell volatility
        estimator. This can be either a DataFrame or a LazyFrame. The data must
        contain open, high, low, and close (OHLC) columns.
    window : str, default "1m"
        The window over which to calculate the volatility estimator. The
        window is specified as a string, such as "1m" for one month.
    _column_name_high : str, default "high"
        The name of the column representing the high prices in the data.
    _column_name_low : str, default "low"
        The name of the column representing the low prices in the data.
    _column_name_open : str, default "open"
        The name of the column representing the opening prices in the data.
    _column_name_close : str, default "close"
        The name of the column representing the adjusted closing prices in the
        data.
    _drop_nulls : bool, default True
        Whether to drop null values from the result. If True, rows with null
        values in the calculated volatility column will be removed from the
        output.
    _avg_trading_days : bool, default False
        Indicates whether to use the average number of trading days per window.
        This affects how the window size is interpreted, e.g., instead of "1mo"
        returning `timedelta(days=31)`, it will return `timedelta(days=21)`.
    _sort : bool, default True
        If True, sorts the data by `symbol` and `date` (when those columns are
        present) before calculation.

    Returns
    -------
    pl.LazyFrame
        The input data as a LazyFrame with an additional
        `rs_volatility_pct_{window}D` column containing the Rogers-Satchell
        volatility estimate in percent, where `{window}` is the window length
        in days. The intermediate columns (`log_ho`, `log_lo`, `log_co`, `rs`)
        are also kept.
    """
    # Check if all required columns are present in the DataFrame
    _check_required_columns(
        data,
        _column_name_high,
        _column_name_low,
        _column_name_open,
        _column_name_close,
    )
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)
    # assign window
    window_int: int = _window_format(
        window=window,
        _return_timedelta=True,
        _avg_trading_days=_avg_trading_days,
    ).days

    data = (
        data.lazy()
        .with_columns(
            [
                (pl.col(_column_name_high) / pl.col(_column_name_open))
                .log()
                .alias("log_ho"),
                (pl.col(_column_name_low) / pl.col(_column_name_open))
                .log()
                .alias("log_lo"),
                (pl.col(_column_name_close) / pl.col(_column_name_open))
                .log()
                .alias("log_co"),
            ]
        )
        .with_columns(
            (
                pl.col("log_ho") * (pl.col("log_ho") - pl.col("log_co"))
                + pl.col("log_lo") * (pl.col("log_lo") - pl.col("log_co"))
            ).alias("rs")
        )
    )
    result = data.lazy().with_columns(
        (
            pl.col("rs").rolling_map(
                _annual_vol, window_size=window_int, min_periods=1
            )
            * 100
        ).alias(f"rs_volatility_pct_{window_int}D")
    )
    if _drop_nulls:
        result = result.drop_nulls(subset=f"rs_volatility_pct_{window_int}D")
    return result

yang_zhang

yang_zhang(data: DataFrame | LazyFrame, window: str = '1m', trading_periods: int = 252, _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', _column_name_close: str = 'close', _avg_trading_days: bool = False, _drop_nulls: bool = True, _sort: bool = True) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: yang_zhang.

Yang-Zhang volatility combines the overnight (close-to-open) volatility with a weighted average of the Rogers-Satchell volatility and the day's open-to-close volatility.
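
The final step of the source below (the weight `k` and the `yz_volatility_pct_{window}D` column) can be summarized as follows. Here $\sigma_o^2$, $\sigma_c^2$, and $\sigma_{RS}^2$ stand for the `open_vol`, `close_vol`, and `window_rs` columns produced by `_yang_zhang_engine`, whose body is not shown in this excerpt. $n$ is the window length in days and $T$ is `trading_periods`:

$$
\sigma_{YZ} = 100\,\sqrt{T}\,\sqrt{\sigma_o^2 + k\,\sigma_c^2 + (1-k)\,\sigma_{RS}^2},
\qquad
k = \frac{0.34}{1.34 + \frac{n+1}{n-1}}
$$

With a 21-day window, for instance, $k = 0.34 / (1.34 + 22/20) \approx 0.139$. The weight is undefined for a one-day window, because $n - 1 = 0$.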

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py

def yang_zhang(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    trading_periods: int = 252,
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    _column_name_close: str = "close",
    _avg_trading_days: bool = False,
    _drop_nulls: bool = True,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: yang_zhang**.

    Yang-Zhang volatility combines the overnight (close-to-open) volatility
    with a weighted average of the Rogers-Satchell volatility and the day's
    open-to-close volatility.
    """
    # check required columns
    _check_required_columns(
        data,
        _column_name_high,
        _column_name_low,
        _column_name_open,
        _column_name_close,
    )
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)

    # assign window
    window_int: int = _window_format(
        window=window,
        _return_timedelta=True,
        _avg_trading_days=_avg_trading_days,
    ).days

    data = (
        data.lazy()
        .with_columns(
            [
                (pl.col(_column_name_high) / pl.col(_column_name_open))
                .log()
                .alias("log_ho"),
                (pl.col(_column_name_low) / pl.col(_column_name_open))
                .log()
                .alias("log_lo"),
                (pl.col(_column_name_close) / pl.col(_column_name_open))
                .log()
                .alias("log_co"),
                (pl.col(_column_name_open) / pl.col(_column_name_close).shift())
                .log()
                .alias("log_oc"),
                (
                    pl.col(_column_name_close)
                    / pl.col(_column_name_close).shift()
                )
                .log()
                .alias("log_cc"),
            ]
        )
        .with_columns(
            [
                (pl.col("log_oc") ** 2).alias("log_oc_sq"),
                (pl.col("log_cc") ** 2).alias("log_cc_sq"),
                (
                    pl.col("log_ho") * (pl.col("log_ho") - pl.col("log_co"))
                    + pl.col("log_lo") * (pl.col("log_lo") - pl.col("log_co"))
                ).alias("rs"),
            ]
        )
    )

    k = 0.34 / (1.34 + (window_int + 1) / (window_int - 1))
    data = _yang_zhang_engine(data=data, window=window_int)
    result = (
        data.lazy()
        .with_columns(
            (
                (
                    pl.col("open_vol")
                    + k * pl.col("close_vol")
                    + (1 - k) * pl.col("window_rs")
                ).sqrt()
                * np.sqrt(trading_periods)
                * 100
            ).alias(f"yz_volatility_pct_{window_int}D")
        )
        .select(
            pl.exclude(
                [
                    "log_ho",
                    "log_lo",
                    "log_co",
                    "log_oc",
                    "log_cc",
                    "log_oc_sq",
                    "log_cc_sq",
                    "rs",
                    "close_vol",
                    "open_vol",
                    "window_rs",
                ]
            )
        )
    )
    if _drop_nulls:
        return result.drop_nulls(subset=f"yz_volatility_pct_{window_int}D")
    return result

squared_returns

squared_returns(data: DataFrame | LazyFrame, window: str = '1m', trading_periods: int = 252, _drop_nulls: bool = True, _avg_trading_days: bool = False, _column_name_returns: str = 'log_returns', _sort: bool = True) -> LazyFrame

Calculate the rolling mean of squared percentage log returns.
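
The source below converts each log return to a percentage, squares it, and takes a rolling mean with `min_periods=1`. No square root or annualization is applied. The plain-Python sketch mirrors that computation on a list so the partial-window behaviour at the start is easy to see. The function name is hypothetical.

```python
from typing import List


def rolling_squared_returns_pct(log_returns: List[float], window: int) -> List[float]:
    """Rolling mean of squared percentage log returns.

    Like `min_periods=1`, the first `window - 1` values average over the
    shorter run of returns available so far instead of being null.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    squared = [(r * 100) ** 2 for r in log_returns]
    out: List[float] = []
    for i in range(len(squared)):
        chunk = squared[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


result = rolling_squared_returns_pct([0.01, -0.02, 0.03], window=2)
# result ≈ [1.0, 2.5, 6.5]
```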

Parameters:

Name	Type	Description	Default
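`data`	`DataFrame \| LazyFrame`	The input data containing the returns column named by `_column_name_returns`.	required
`window`	`str`	The rolling window size for calculating squared returns, by default "1m".	`'1m'`
`trading_periods`	`int`	The number of trading periods in a year, by default 252. Currently accepted but not used in the calculation, so the result is not annualized.	`252`
`_drop_nulls`	`bool`	Whether to drop null values from the result, by default True.	`True`
`_avg_trading_days`	`bool`	Whether to interpret `window` using the average number of trading days per period instead of calendar days, by default False.	`False`
`_column_name_returns`	`str`	The name of the column containing the log returns, by default "log_returns".	`'log_returns'`
`_sort`	`bool`	If True, sorts the data by `symbol` and `date` (when those columns are present) before calculation, by default True.	`True`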

Returns:

Type	Description
`LazyFrame`	The input data as a LazyFrame with an additional `sq_volatility_pct_{window}D` column holding the rolling mean of squared percentage log returns, where `{window}` is the window length in days.

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py

def squared_returns(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    trading_periods: int = 252,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _column_name_returns: str = "log_returns",
    _sort: bool = True,
) -> pl.LazyFrame:
    """
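    Calculate the rolling mean of squared percentage log returns.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data containing the returns column named by
        `_column_name_returns`.
    window : str, optional
        The rolling window size for calculating squared returns, by default "1m".
    trading_periods : int, optional
        The number of trading periods in a year, by default 252. Currently
        accepted but not used in the calculation, so the result is not
        annualized.
    _drop_nulls : bool, optional
        Whether to drop null values from the result, by default True.
    _avg_trading_days : bool, optional
        Whether to interpret `window` using the average number of trading days
        per period instead of calendar days, by default False.
    _column_name_returns : str, optional
        The name of the column containing the log returns, by default
        "log_returns".
    _sort : bool, optional
        If True, sorts the data by `symbol` and `date` (when those columns are
        present) before calculation, by default True.

    Returns
    -------
    pl.LazyFrame
        The input data as a LazyFrame with an additional
        `sq_volatility_pct_{window}D` column holding the rolling mean of
        squared percentage log returns, where `{window}` is the window length
        in days.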
    """
    _check_required_columns(data, _column_name_returns)

    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)

    # assign window
    window_int: int = _window_format(
        window=window,
        _return_timedelta=True,
        _avg_trading_days=_avg_trading_days,
    ).days

    data = data.lazy().with_columns(
        ((pl.col(_column_name_returns) * 100) ** 2).alias("sq_log_returns_pct")
    )
    # Calculate rolling squared returns
    result = (
        data.lazy()
        .with_columns(
            pl.col("sq_log_returns_pct")
            .rolling_mean(window_size=window_int, min_periods=1)
            .alias(f"sq_volatility_pct_{window_int}D")
        )
        .drop("sq_log_returns_pct")
    )
    if _drop_nulls:
        result = result.drop_nulls(subset=f"sq_volatility_pct_{window_int}D")
    return result

realized_volatility_model

Context: Toolbox || Category: Technical || Command: calc_realized_volatility.

A command to generate Realized Volatility for any time series, using a complete set of volatility estimators based on Euan Sinclair's Volatility Trading.

calc_realized_volatility

calc_realized_volatility(data: DataFrame | LazyFrame, window: str = '1m', method: Literal['std', 'parkinson', 'garman_klass', 'gk', 'hodges_tompkins', 'ht', 'rogers_satchell', 'rs', 'yang_zhang', 'yz', 'squared_returns', 'sq'] = 'std', grouped_mean: list[int] | None = None, _trading_periods: int = 252, _column_name_returns: str = 'log_returns', _column_name_close: str = 'close', _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', *, _sort: bool = True) -> LazyFrame | DataFrame

Context: Toolbox || Category: Technical || Command: calc_realized_volatility.

Calculates the Realized Volatility of a time series using the estimator selected by `method` and the other parameters below. This function adds ONE rolling volatility column to the input data.
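
Steps 1 to 3 in the source below look up the estimator in `VOLATILITY_METHODS` and forward only the arguments that its signature accepts. The self-contained sketch below shows that dispatch pattern with `inspect.signature`. The estimators, the `METHODS` registry, and `dispatch` are hypothetical stand-ins, and `ValueError` replaces `HumblDataError`.

```python
import inspect
from typing import Any, Callable, Dict


# Hypothetical estimators: each echoes the arguments it received.
def std(data: Any, window: str = "1m", _column_name_returns: str = "log_returns",
        _sort: bool = True) -> tuple:
    return ("std", window, _column_name_returns, _sort)


def rogers_satchell(data: Any, window: str = "1m", _column_name_high: str = "high",
                    _sort: bool = True) -> tuple:
    return ("rogers_satchell", window, _column_name_high, _sort)


# Hypothetical registry: a full name and its abbreviation share one function.
METHODS: Dict[str, Callable[..., Any]] = {
    "std": std,
    "rogers_satchell": rogers_satchell,
    "rs": rogers_satchell,
}


def dispatch(data: Any, window: str = "1m", method: str = "std",
             _column_name_returns: str = "log_returns",
             _column_name_high: str = "high", *, _sort: bool = True) -> Any:
    call_args = dict(locals())  # snapshot of this call's arguments only
    func = METHODS.get(method)
    if func is None:
        raise ValueError(f"Volatility method: '{method}' is not supported.")
    accepted = inspect.signature(func).parameters
    # Forward only the arguments the chosen estimator declares.
    kwargs = {k: v for k, v in call_args.items() if k in accepted}
    return func(**kwargs)
```

Taking the `locals()` snapshot on the first line keeps helper variables such as `func` out of the candidate arguments. Filtering by signature lets every estimator declare only the columns it needs.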

Parameters:

Name	Type	Description	Default
`data`	`DataFrame \| LazyFrame`	The time series data for which to calculate the Realized Volatility.	required
`window`	`str`	The window size for a rolling volatility calculation, default is `"1m"` (1 month).	`'1m'`
`method`	`Literal['std', 'parkinson', 'garman_klass', 'hodges_tompkins', 'rogers_satchell', 'yang_zhang', 'squared_returns']`	The volatility estimator to use. You can also use abbreviations to access the same methods. The abbreviations are: `gk` for `garman_klass`, `ht` for `hodges_tompkins`, `rs` for `rogers_satchell`, `yz` for `yang_zhang`, `sq` for `squared_returns`.	`'std'`
`grouped_mean`	`list[int] \| None`	Not yet implemented (work in progress). Intended as a list of window sizes: the volatility method would be calculated over each window and the results averaged. Leave as `None` to use the single window given by the `window` parameter.	`None`
`_sort`	`bool`	If True, the data will be sorted before calculation. Default is True.	`True`
`_trading_periods`	`int`	The number of trading periods in a year, default is 252 (the typical number of trading days in a year).	`252`
`_column_name_returns`	`str`	The name of the column containing the returns. Default is "log_returns".	`'log_returns'`
`_column_name_close`	`str`	The name of the column containing the close prices. Default is "close".	`'close'`
`_column_name_high`	`str`	The name of the column containing the high prices. Default is "high".	`'high'`
`_column_name_low`	`str`	The name of the column containing the low prices. Default is "low".	`'low'`
`_column_name_open`	`str`	The name of the column containing the open prices. Default is "open".	`'open'`

Returns:

Type	Description
`LazyFrame \| DataFrame`	The input data with one additional rolling volatility column, as returned by the selected estimator.

Notes

Rolling calculations produce a time series of recent volatility in which each value uses only the data points inside the window. Seeing how, for example, 1-month or 3-month rolling volatility evolved over time gives more granular insight into a stock's volatility than a single full-period figure.
This function does not accept pl.Series because several estimators (such as Rogers-Satchell and Yang-Zhang) need the high, low, open, and close columns together. Passing each series as a separate argument would be too cumbersome, so the function only accepts pl.DataFrame or pl.LazyFrame as input.
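
The rolling idea described above can be sketched with a plain close-to-close estimator. Each value is the annualized sample standard deviation of the most recent `window` log returns. This conventional formulation is chosen only for illustration, and the function name is hypothetical. The library's own `std` helper is documented separately and may differ, for example in how it treats partial windows.

```python
import math
import statistics
from typing import List, Optional


def rolling_std_vol_pct(log_returns: List[float], window: int,
                        trading_periods: int = 252) -> List[Optional[float]]:
    """Annualized rolling volatility in percent.

    Each value is the sample standard deviation of the latest `window` log
    returns, scaled by sqrt(trading_periods). Values are None until a full
    window is available.
    """
    if window < 2:
        raise ValueError("window must be at least 2 for a sample std")
    out: List[Optional[float]] = []
    for i in range(len(log_returns)):
        if i + 1 < window:
            out.append(None)
            continue
        chunk = log_returns[i - window + 1: i + 1]
        out.append(statistics.stdev(chunk) * math.sqrt(trading_periods) * 100)
    return out


vols = rolling_std_vol_pct([0.01, -0.01, 0.01, -0.01], window=2)
```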

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_model.py

def calc_realized_volatility(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    method: Literal[  # used to be rvol_method
        "std",
        "parkinson",
        "garman_klass",
        "gk",
        "hodges_tompkins",
        "ht",
        "rogers_satchell",
        "rs",
        "yang_zhang",
        "yz",
        "squared_returns",
        "sq",
    ] = "std",
    grouped_mean: list[int] | None = None,  # used to be rv_mean
    _trading_periods: int = 252,
    _column_name_returns: str = "log_returns",
    _column_name_close: str = "close",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    *,
    _sort: bool = True,
) -> pl.LazyFrame | pl.DataFrame:
    """
    Context: Toolbox || Category: Technical || **Command: calc_realized_volatility**.

    Calculates the Realized Volatility of a time series using the estimator
    selected by `method` and the other parameters below. This function adds
    ONE rolling volatility column to the input data.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The time series data for which to calculate the Realized Volatility.
    window : str
        The window size for a rolling volatility calculation, default is `"1m"`
        (1 month).
    method : Literal["std", "parkinson", "garman_klass", "hodges_tompkins", "rogers_satchell", "yang_zhang", "squared_returns"]
        The volatility estimator to use. You can also use abbreviations to
        access the same methods. The abbreviations are: `gk` for `garman_klass`,
        `ht` for `hodges_tompkins`, `rs` for `rogers_satchell`, `yz` for
        `yang_zhang`, `sq` for `squared_returns`.
    grouped_mean : list[int] | None
        Not yet implemented (work in progress). Intended as a list of window
        sizes: the volatility method would be calculated over each window and
        the results averaged. Leave as `None` to use the single window given
        by the `window` parameter.
    _sort : bool
        If True, the data will be sorted before calculation. Default is True.
    _trading_periods : int
        The number of trading periods in a year, default is 252 (the typical
        number of trading days in a year).
    _column_name_returns : str
        The name of the column containing the returns. Default is "log_returns".
    _column_name_close : str
        The name of the column containing the close prices. Default is "close".
    _column_name_high : str
        The name of the column containing the high prices. Default is "high".
    _column_name_low : str
        The name of the column containing the low prices. Default is "low".
    _column_name_open : str
        The name of the column containing the open prices. Default is "open".

    Returns
    -------
    pl.LazyFrame | pl.DataFrame
        The input data with one additional rolling volatility column, as
        returned by the selected estimator.

    Notes
    -----
    - Rolling calculations produce a time series of recent volatility in which
      each value uses only the data points inside the window. Seeing how, for
      example, 1-month or 3-month rolling volatility evolved over time gives
      more granular insight into a stock's volatility than a single
      full-period figure.

    - This function does not accept `pl.Series` because several estimators
      (such as Rogers-Satchell and Yang-Zhang) need the high, low, open, and
      close columns together. Passing each series as a separate argument would
      be too cumbersome, so the function only accepts `pl.DataFrame` or
      `pl.LazyFrame` as input.
    """  # noqa: W505
    # Step 1: Get the correct realized volatility function =====================
    func = VOLATILITY_METHODS.get(method)
    if not func:
        msg = f"Volatility method: '{method}' is not supported."
        raise HumblDataError(msg)

    # Step 2: Get the names of the parameters that the function accepts ========
    func_params = inspect.signature(func).parameters

    # Step 3: Filter out the parameters not accepted by the function ===========
    args_to_pass = {
        key: value for key, value in locals().items() if key in func_params
    }

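    # Step 4: Calculate Realized Volatility ====================================
    if grouped_mean:
        # TODO: calculate volatility over multiple windows and average the
        # result into a new column. Until then, fail loudly instead of
        # falling through to an undefined `out`.
        msg = "`grouped_mean` is not implemented yet (work in progress)."
        raise NotImplementedError(msg)

    return func(**args_to_pass)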

quantitative

Context: Toolbox || Category: Quantitative.

Quantitative indicators rely on statistical transformations of time series data.
