Skip to content

🧰 Toolbox¤

humbldata.toolbox ¤

Context: Toolbox.

A category to group all of the technical indicators available in the Toolbox()

Technical indicators rely on statistical transformations of time series data. These are raw math operations.

toolbox_helpers ¤

Context: Toolbox || Category: Helpers.

These Toolbox() helpers are used in various calculations in the toolbox context. Most of the helpers will be mathematical transformations of data. These functions should be DUMB functions.

log_returns ¤

log_returns(data: Series | DataFrame | LazyFrame | None = None, _column_name: str = 'adj_close', *, _drop_nulls: bool = True, _sort: bool = True) -> Series | DataFrame | LazyFrame

Context: Toolbox || Category: Helpers || Command: log_returns.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY. Calculates the logarithmic returns for a given Polars Series, DataFrame, or LazyFrame. Logarithmic returns are widely used in the financial industry to measure the rate of return on investments over time. This function supports calculations on both individual series and dataframes containing financial time series data.

Parameters:

Name Type Description Default
data Series | DataFrame | LazyFrame

The input data for which to calculate the log returns. Default is None.

None
_drop_nulls bool

Whether to drop null values from the result. Default is True.

True
_column_name str

The column name to use for log return calculations in DataFrame or LazyFrame. Default is "adj_close".

'adj_close'
_sort bool

If True, sorts the DataFrame or LazyFrame by date and symbol before calculation. If you want a DUMB function, set to False. Default is True.

True

Returns:

Type Description
Series | DataFrame | LazyFrame

The original data, with an extra column of log returns of the input data. The return type matches the input type.

Raises:

Type Description
HumblDataError

If neither a series, DataFrame, nor LazyFrame is provided as input.

Examples:

>>> series = pl.Series([100, 105, 103])
>>> log_returns(data=series)
series([-inf, 0.048790, -0.019418])
>>> df = pl.DataFrame({"adj_close": [100, 105, 103]})
>>> log_returns(data=df)
shape: (3, 2)
┌───────────┬────────────┐
│ adj_close ┆ log_returns│
│ ---       ┆ ---        │
│ f64       ┆ f64        │
╞═══════════╪════════════╡
│ 100.0     ┆ NaN        │
├───────────┼────────────┤
│ 105.0     ┆ 0.048790   │
├───────────┼────────────┤
│ 103.0     ┆ -0.019418  │
└───────────┴────────────┘
Improvements

Add a parameter _sort_cols: list[str] | None = None to make the function even dumber. This way you could specify certain columns to sort by instead of using default date and symbol. If _sort_cols=None and _sort=True, then the function will use the default date and symbol columns for sorting.

Source code in src/humbldata/toolbox/toolbox_helpers.py
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
def log_returns(
    data: pl.Series | pl.DataFrame | pl.LazyFrame | None = None,
    _column_name: str = "adj_close",
    *,
    _drop_nulls: bool = True,
    _sort: bool = True,
) -> pl.Series | pl.DataFrame | pl.LazyFrame:
    """
    Context: Toolbox || Category: Helpers || **Command: log_returns**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.
    Calculates the logarithmic returns for a given Polars Series, DataFrame, or
    LazyFrame. Logarithmic returns are widely used in the financial
    industry to measure the rate of return on investments over time. This
    function supports calculations on both individual series and dataframes
    containing financial time series data.

    Parameters
    ----------
    data : pl.Series | pl.DataFrame | pl.LazyFrame, optional
        The input data for which to calculate the log returns. Default is None.
    _drop_nulls : bool, optional
        Whether to drop null values from the result. Default is True.
    _column_name : str, optional
        The column name to use for log return calculations in DataFrame or
        LazyFrame. Default is "adj_close".
    _sort : bool, optional
        If True, sorts the DataFrame or LazyFrame by `date` and `symbol` before
        calculation. If you want a DUMB function, set to False.
        Default is True.

    Returns
    -------
    pl.Series | pl.DataFrame | pl.LazyFrame
        The original `data`, with an extra column of `log returns` of the input
        data. The return type matches the input type.

    Raises
    ------
    HumblDataError
        If neither a series, DataFrame, nor LazyFrame is provided as input.

    Examples
    --------
    >>> series = pl.Series([100, 105, 103])
    >>> log_returns(data=series)
    series([-inf, 0.048790, -0.019418])

    >>> df = pl.DataFrame({"adj_close": [100, 105, 103]})
    >>> log_returns(data=df)
    shape: (3, 2)
    ┌───────────┬────────────┐
    │ adj_close ┆ log_returns│
    │ ---       ┆ ---        │
    │ f64       ┆ f64        │
    ╞═══════════╪════════════╡
    │ 100.0     ┆ NaN        │
    ├───────────┼────────────┤
    │ 105.0     ┆ 0.048790   │
    ├───────────┼────────────┤
    │ 103.0     ┆ -0.019418  │
    └───────────┴────────────┘

    Improvements
    -----------
    Add a parameter `_sort_cols: list[str] | None = None` to make the function even
    dumber. This way you could specify certain columns to sort by instead of
    using default `date` and `symbol`. If `_sort_cols=None` and `_sort=True`,
    then the function will use the default `date` and `symbol` columns for
    sorting.

    """
    # Calculation for Polars Series
    if isinstance(data, pl.Series):
        out = data.log().diff()
        if _drop_nulls:
            out = out.drop_nulls()
    # Calculation for Polars DataFrame or LazyFrame
    elif isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)
        elif _sort and not sort_cols:
            msg = "Data must contain 'symbol' and 'date' columns for sorting."
            raise HumblDataError(msg)

        if "log_returns" not in data.collect_schema().names():
            out = data.with_columns(
                pl.col(_column_name).log().diff().alias("log_returns")
            )
        else:
            out = data
        if _drop_nulls:
            out = out.drop_nulls(subset="log_returns")
    else:
        msg = "No valid data type was provided for `log_returns()` calculation."
        raise HumblDataError(msg)

    return out

detrend ¤

detrend(data: DataFrame | LazyFrame | Series, _detrend_col: str = 'log_returns', _detrend_value_col: str | Series | None = 'window_mean', *, _sort: bool = False) -> DataFrame | LazyFrame | Series

Context: Toolbox || Category: Helpers || Command: detrend.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

Detrends a column in a DataFrame, LazyFrame, or Series by subtracting the values of another column from it. Optionally sorts the data by 'symbol' and 'date' before detrending if _sort is True.

Parameters:

Name Type Description Default
data Union[DataFrame, LazyFrame, Series]

The data structure containing the columns to be processed.

required
_detrend_col str

The name of the column from which values will be subtracted.

'log_returns'
_detrend_value_col str | Series | None

The name of the column whose values will be subtracted OR if you pass a pl.Series to the data parameter, then you can use this to pass a second pl.Series to subtract from the first.

'window_mean'
_sort bool

If True, sorts the data by 'symbol' and 'date' before detrending. Default is False.

False

Returns:

Type Description
Union[DataFrame, LazyFrame, Series]

The detrended data structure with the same type as the input, with an added column named f"detrended_{_detrend_col}".

Notes

Function doesn't use .over() in calculation. Once the data is sorted, subtracting _detrend_value_col from _detrend_col is a simple operation that doesn't need to be grouped, because the sorting has already aligned the rows for subtraction

Source code in src/humbldata/toolbox/toolbox_helpers.py
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
def detrend(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    _detrend_col: str = "log_returns",
    _detrend_value_col: str | pl.Series | None = "window_mean",
    *,
    _sort: bool = False,
) -> pl.DataFrame | pl.LazyFrame | pl.Series:
    """
    Context: Toolbox || Category: Helpers || **Command: detrend**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

    Detrends a column in a DataFrame, LazyFrame, or Series by subtracting the
    values of another column from it. Optionally sorts the data by 'symbol' and
    'date' before detrending if _sort is True.

    Parameters
    ----------
    data : Union[pl.DataFrame, pl.LazyFrame, pl.Series]
        The data structure containing the columns to be processed.
    _detrend_col : str
        The name of the column from which values will be subtracted.
    _detrend_value_col : str | pl.Series | None, optional
        The name of the column whose values will be subtracted OR if you
        pass a pl.Series to the `data` parameter, then you can use this to
        pass a second `pl.Series` to subtract from the first.
    _sort : bool, optional
        If True, sorts the data by 'symbol' and 'date' before detrending.
        Default is False.

    Returns
    -------
    Union[pl.DataFrame, pl.LazyFrame, pl.Series]
        The detrended data structure with the same type as the input,
        with an added column named `f"detrended_{_detrend_col}"`.

    Notes
    -----
    Function doesn't use `.over()` in calculation. Once the data is sorted,
    subtracting _detrend_value_col from _detrend_col is a simple operation
    that doesn't need to be grouped, because the sorting has already aligned
    the rows for subtraction
    """
    if isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)
        elif _sort and not sort_cols:
            msg = "Data must contain 'symbol' and 'date' columns for sorting."
            raise HumblDataError(msg)

    if isinstance(data, pl.DataFrame | pl.LazyFrame):
        col_names = data.collect_schema().names()
        if _detrend_value_col not in col_names or _detrend_col not in col_names:
            msg = f"Both {_detrend_value_col} and {_detrend_col} must be columns in the data."
            raise HumblDataError(msg)
        detrended = data.with_columns(
            (pl.col(_detrend_col) - pl.col(_detrend_value_col)).alias(
                f"detrended_{_detrend_col}"
            )
        )
    elif isinstance(data, pl.Series):
        if not isinstance(_detrend_value_col, pl.Series):
            msg = "When 'data' is a Series, '_detrend_value_col' must also be a Series."
            raise HumblDataError(msg)
        detrended = data - _detrend_value_col
        detrended.rename(f"detrended_{_detrend_col}")

    return detrended

cum_sum ¤

cum_sum(data: DataFrame | LazyFrame | Series | None = None, _column_name: str = 'detrended_returns', *, _sort: bool = True, _mandelbrot_usage: bool = True) -> LazyFrame | DataFrame | Series

Context: Toolbox || Category: Helpers || Command: cum_sum.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

Calculate the cumulative sum of a series or column in a DataFrame or LazyFrame.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame | Series | None

The data to process.

None
_column_name str

The name of the column to calculate the cumulative sum on, applicable if df is provided.

'detrended_returns'
_sort bool

If True, sorts the DataFrame or LazyFrame by date and symbol before calculation. Default is True.

True
_mandelbrot_usage bool

If True, performs additional checks specific to the Mandelbrot Channel calculation. This should be set to True when you have a cumulative deviate series, and False when not. Please check 'Notes' for more information. Default is True.

True

Returns:

Type Description
DataFrame | LazyFrame | Series

The DataFrame or Series with the cumulative deviate series added as a new column or as itself.

Notes

This function is used to calculate the cumulative sum for the deviate series of detrended returns for the data in the pipeline for calc_mandelbrot_channel.

So, although it is calculating a cumulative sum, it is known as a cumulative deviate because it is a cumulative sum on a deviate series, meaning that the cumulative sum should = 0 for each window. The _mandelbrot_usage parameter allows for checks to ensure the data is suitable for Mandelbrot Channel calculations, i.e that the deviate series was calculated correctly by the end of each series being 0, meaning the trend (the mean over the window_index) was successfully removed from the data.

Source code in src/humbldata/toolbox/toolbox_helpers.py
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
def cum_sum(
    data: pl.DataFrame | pl.LazyFrame | pl.Series | None = None,
    _column_name: str = "detrended_returns",
    *,
    _sort: bool = True,
    _mandelbrot_usage: bool = True,
) -> pl.LazyFrame | pl.DataFrame | pl.Series:
    """
    Context: Toolbox || Category: Helpers || **Command: cum_sum**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

    Calculate the cumulative sum of a series or column in a DataFrame or
    LazyFrame.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame | pl.Series | None
        The data to process.
    _column_name : str
        The name of the column to calculate the cumulative sum on,
        applicable if df is provided.
    _sort : bool, optional
        If True, sorts the DataFrame or LazyFrame by date and symbol before
        calculation. Default is True.
    _mandelbrot_usage : bool, optional
        If True, performs additional checks specific to the Mandelbrot Channel
        calculation. This should be set to True when you have a cumulative
        deviate series, and False when not. Please check 'Notes' for more
        information. Default is True.

    Returns
    -------
    pl.DataFrame | pl.LazyFrame | pl.Series
        The DataFrame or Series with the cumulative deviate series added as a
        new column or as itself.

    Notes
    -----
    This function is used to calculate the cumulative sum for the deviate series
    of detrended returns for the data in the pipeline for
    `calc_mandelbrot_channel`.

    So, although it is calculating a cumulative sum, it is known as a cumulative
    deviate because it is a cumulative sum on a deviate series, meaning that the
    cumulative sum should = 0 for each window. The _mandelbrot_usage parameter
    allows for checks to ensure the data is suitable for Mandelbrot Channel
    calculations, i.e that the deviate series was calculated correctly by the
    end of each series being 0, meaning the trend (the mean over the
    window_index) was successfully removed from the data.
    """
    if isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)

        over_cols = _set_over_cols(data, "symbol", "window_index")
        if over_cols:
            out = data.with_columns(
                pl.col(_column_name).cum_sum().over(over_cols).alias("cum_sum")
            )
        else:
            out = data.with_columns(
                pl.col(_column_name).cum_sum().alias("cum_sum")
            )
    elif isinstance(data, pl.Series):
        out = data.cum_sum().alias("cum_sum")
    else:
        msg = "No DataFrame/LazyFrame/Series was provided."
        raise HumblDataError(msg)

    if _mandelbrot_usage:
        _cumsum_check(out, _column_name="cum_sum")

    return out

std ¤

std(data: LazyFrame | DataFrame | Series, _column_name: str = 'cum_sum', *, _sort: bool = True) -> LazyFrame | DataFrame | Series

Context: Toolbox || Category: Helpers || Command: std.

Calculate the standard deviation of the cumulative deviate series within each window of the dataset.

Parameters:

Name Type Description Default
df LazyFrame

The LazyFrame from which to calculate the standard deviation.

required
_column_name str

The name of the column from which to calculate the standard deviation, with "cumdev" as the default value.

'cum_sum'
_sort bool

If True, sorts the DataFrame or LazyFrame by date and symbol before calculation. Default is True.

True

Returns:

Type Description
LazyFrame

A LazyFrame with the standard deviation of the specified column for each window, added as a new column named "S".

Improvements

Just need to parametrize .over() call in the function if want an even dumber function, that doesn't calculate each window_index.

Source code in src/humbldata/toolbox/toolbox_helpers.py
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
def std(
    data: pl.LazyFrame | pl.DataFrame | pl.Series,
    _column_name: str = "cum_sum",
    *,
    _sort: bool = True,
) -> pl.LazyFrame | pl.DataFrame | pl.Series:
    """
    Context: Toolbox || Category: Helpers || **Command: std**.

    Calculate the standard deviation of the cumulative deviate series within
    each window of the dataset.

    Parameters
    ----------
    df : pl.LazyFrame
        The LazyFrame from which to calculate the standard deviation.
    _column_name : str, optional
        The name of the column from which to calculate the standard deviation,
        with "cumdev" as the default value.
    _sort : bool, optional
        If True, sorts the DataFrame or LazyFrame by date and symbol before
        calculation. Default is True.

    Returns
    -------
    pl.LazyFrame
        A LazyFrame with the standard deviation of the specified column for each
        window, added as a new column named "S".

    Improvements
    -----------
    Just need to parametrize `.over()` call in the function if want an even
    dumber function, that doesn't calculate each `window_index`.
    """
    if isinstance(data, pl.Series):
        out = data.std()
    elif isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        over_cols = _set_over_cols(data, "symbol", "window_index")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)

        if over_cols:
            out = data.with_columns(
                [
                    pl.col(_column_name)
                    .std()
                    .over(over_cols)
                    .alias(f"{_column_name}_std"),  # used to be 'S'
                ]
            )
        else:
            out = data.with_columns(
                pl.col(_column_name).std().alias("S"),
            )

    return out

mean ¤

mean(data: DataFrame | LazyFrame | Series, _column_name: str = 'log_returns', *, _sort: bool = True) -> DataFrame | LazyFrame

Context: Toolbox || Category: Helpers || Function: mean.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

This function calculates the mean of a column (<_column_name>) over a each window in the dataset, if there are any. This window is intended to be the window that is passed in the calc_mandelbrot_channel() function. The mean calculated is meant to be used as the mean of each window within the time series. This way, each block of windows has their own mean, which can then be used to normalize the data (i.e remove the mean) from each window section.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The DataFrame or LazyFrame to calculate the mean on.

required
_column_name str

The name of the column to calculate the mean on.

'log_returns'
_sort bool

If True, sorts the DataFrame or LazyFrame by date before calculation. Default is False.

True

Returns:

Type Description
DataFrame | LazyFrame

The original DataFrame or LazyFrame with a window_mean & date column, which contains the mean of 'log_returns' per range/window.

Notes

Since this function is an aggregation function, it reduces the # of observations in the dataset,thus, unless I take each value and iterate each window_mean value to correlate to the row in the original dataframe, the function will return a dataframe WITHOUT the original data.

Source code in src/humbldata/toolbox/toolbox_helpers.py
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
def mean(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    _column_name: str = "log_returns",
    *,
    _sort: bool = True,
) -> pl.DataFrame | pl.LazyFrame:
    """
    Context: Toolbox || Category: Helpers || **Function: mean**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

    This function calculates the mean of a column (<_column_name>) over a
    each window in the dataset, if there are any.
    This window is intended to be the `window` that is passed in the
    `calc_mandelbrot_channel()` function. The mean calculated is meant to be
    used as the mean of each `window` within the time series. This
    way, each block of windows has their own mean, which can then be used to
    normalize the data (i.e remove the mean) from each window section.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The DataFrame or LazyFrame to calculate the mean on.
    _column_name : str
        The name of the column to calculate the mean on.
    _sort : bool
        If True, sorts the DataFrame or LazyFrame by date before calculation.
        Default is False.

    Returns
    -------
    pl.DataFrame | pl.LazyFrame
        The original DataFrame or LazyFrame with a `window_mean` & `date` column,
        which contains the mean of 'log_returns' per range/window.


    Notes
    -----
    Since this function is an aggregation function, it reduces the # of
    observations in the dataset,thus, unless I take each value and iterate each
    window_mean value to correlate to the row in the original dataframe, the
    function will return a dataframe WITHOUT the original data.

    """
    if isinstance(data, pl.Series):
        out = data.mean()
    else:
        if data is None:
            msg = "No DataFrame was passed to the `mean()` function."
            raise HumblDataError(msg)
        sort_cols = _set_sort_cols(data, "symbol", "date")
        over_cols = _set_over_cols(data, "symbol", "window_index")
        if _sort and sort_cols:  # Check if _sort is True
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)
        if over_cols:
            out = data.with_columns(
                pl.col(_column_name).mean().over(over_cols).alias("window_mean")
            )
        else:
            out = data.with_columns(pl.col(_column_name).mean().alias("mean"))
        if _sort and sort_cols:
            out = out.sort(sort_cols)
    return out

range_ ¤

range_(data: LazyFrame | DataFrame | Series, _column_name: str = 'cum_sum', *, _sort: bool = True) -> LazyFrame | DataFrame | Series

Context: Toolbox || Category: Technical || Sub-Category: MandelBrot Channel || Sub-Category: Helpers || Function: mandelbrot_range.

Calculate the range (max - min) of the cumulative deviate values of a specified column in a DataFrame for each window in the dataset, if there are any.

Parameters:

Name Type Description Default
data LazyFrame

The DataFrame to calculate the range from.

required
_column_name str

The column to calculate the range from, by default "cumdev".

'cum_sum'

Returns:

Type Description
LazyFrame | DataFrame

A DataFrame with the range of the specified column for each window.

Source code in src/humbldata/toolbox/toolbox_helpers.py
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
def range_(
    data: pl.LazyFrame | pl.DataFrame | pl.Series,
    _column_name: str = "cum_sum",
    *,
    _sort: bool = True,
) -> pl.LazyFrame | pl.DataFrame | pl.Series:
    """
    Context: Toolbox || Category: Technical || Sub-Category: MandelBrot Channel || Sub-Category: Helpers || **Function: mandelbrot_range**.

    Calculate the range (max - min) of the cumulative deviate values of a
    specified column in a DataFrame for each window in the dataset, if there are any.

    Parameters
    ----------
    data : pl.LazyFrame
        The DataFrame to calculate the range from.
    _column_name : str, optional
        The column to calculate the range from, by default "cumdev".

    Returns
    -------
    pl.LazyFrame | pl.DataFrame
        A DataFrame with the range of the specified column for each window.
    """
    if isinstance(data, pl.Series):
        out = data.max() - data.min()

    if isinstance(data, pl.LazyFrame | pl.DataFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        over_cols = _set_over_cols(data, "symbol", "window_index")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)
        if over_cols:
            out = (
                data.with_columns(
                    [
                        pl.col(_column_name)
                        .min()
                        .over(over_cols)
                        .alias(f"{_column_name}_min"),
                        pl.col(_column_name)
                        .max()
                        .over(over_cols)
                        .alias(f"{_column_name}_max"),
                    ]
                )
                .sort(sort_cols)
                .with_columns(
                    (
                        pl.col(f"{_column_name}_max")
                        - pl.col(f"{_column_name}_min")
                    ).alias(f"{_column_name}_range"),  # used to be 'R'
                )
            )
    else:
        out = (
            data.with_columns(
                [
                    pl.col(_column_name).min().alias(f"{_column_name}_min"),
                    pl.col(_column_name).max().alias(f"{_column_name}_max"),
                ]
            )
            .sort(sort_cols)
            .with_columns(
                (
                    pl.col(f"{_column_name}_max")
                    - pl.col(f"{_column_name}_min")
                ).alias(f"{_column_name}_range"),
            )
        )

    return out

toolbox_controller ¤

Context: Toolbox.

The Toolbox Controller Module.

Toolbox ¤

Bases: ToolboxQueryParams

A top-level controller for data analysis tools in humblDATA.

This module serves as the primary controller, routing user-specified ToolboxQueryParams as core arguments that are used to fetch time series data.

The Toolbox controller also gives access to all sub-modules adn their functions.

It is designed to facilitate the collection of data across various types such as stocks, options, or alternative time series by requiring minimal input from the user.

Submodules

The Toolbox controller is composed of the following submodules:

  • technical:
  • quantitative:
  • fundamental:

Parameters:

Name Type Description Default
symbol str

The symbol or ticker of the stock.

required
interval str

The interval of the data. Defaults to '1d'.

required
start_date str

The start date for the data query.

required
end_date str

The end date for the data query.

required
provider str

The provider to use for the data query. Defaults to 'yfinance'.

required
Parameter Notes

The parameters (symbol, interval, start_date, end_date) are the ToolboxQueryParams. They are used for data collection further down the pipeline in other commands. Intended to execute operations on core data sets. This approach enables composable and standardized querying while accommodating data-specific collection logic.

Source code in src/humbldata/toolbox/toolbox_controller.py
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
class Toolbox(ToolboxQueryParams):
    """

    A top-level <context> controller for data analysis tools in `humblDATA`.

    This module serves as the primary controller, routing user-specified
    ToolboxQueryParams as core arguments that are used to fetch time series
    data.

    The `Toolbox` controller also gives access to all sub-modules adn their
    functions.

    It is designed to facilitate the collection of data across various types such as
    stocks, options, or alternative time series by requiring minimal input from the user.

    Submodules
    ----------
    The `Toolbox` controller is composed of the following submodules:

    - `technical`:
    - `quantitative`:
    - `fundamental`:

    Parameters
    ----------
    symbol : str
        The symbol or ticker of the stock.
    interval : str, optional
        The interval of the data. Defaults to '1d'.
    start_date : str
        The start date for the data query.
    end_date : str
        The end date for the data query.
    provider : str, optional
        The provider to use for the data query. Defaults to 'yfinance'.

    Parameter Notes
    -----
    The parameters (`symbol`, `interval`, `start_date`, `end_date`)
    are the `ToolboxQueryParams`. They are used for data collection further
    down the pipeline in other commands. Intended to execute operations on core
    data sets. This approach enables composable and standardized querying while
    accommodating data-specific collection logic.
    """

    def __init__(self, *args, **kwargs):
        """
        Initialize the Toolbox module.

        This method does not take any parameters and does not return anything.
        """
        super().__init__(*args, **kwargs)

    @property
    def technical(self):
        """
        The technical submodule of the Toolbox controller.

        Access to all the technical indicators. WHen the Toolbox class is
        instatiated the parameters are initialized with the ToolboxQueryParams
        class, which hold all the fields needed for the context_params, like the
        symbol, interval, start_date, and end_date.
        """
        return Technical(context_params=self)
__init__ ¤
__init__(*args, **kwargs)

Initialize the Toolbox module.

This method does not take any parameters and does not return anything.

Source code in src/humbldata/toolbox/toolbox_controller.py
56
57
58
59
60
61
62
def __init__(self, *args, **kwargs):
    """
    Initialize the Toolbox module.

    This method does not take any parameters and does not return anything.
    """
    super().__init__(*args, **kwargs)
technical property ¤
technical

The technical submodule of the Toolbox controller.

Access to all the technical indicators. WHen the Toolbox class is instatiated the parameters are initialized with the ToolboxQueryParams class, which hold all the fields needed for the context_params, like the symbol, interval, start_date, and end_date.

fundamental ¤

Context: Toolbox || Category: Fundamental.

A category to group all of the fundamental indicators available in the Toolbox().

Fundamental indicators relies on earnings data, valuation models of companies, balance sheet metrics etc...

quantitative ¤

Context: Toolbox || Category: Quantitative.

Quantitative indicators rely on statistical transformations of time series data.

technical ¤

technical_controller ¤

Context: Toolbox || Category: Technical.

A controller to manage and compile all of the technical indicator models available. This will be passed as a @property to the Toolbox() class, giving access to the technical module and its functions.

Technical ¤

Module for all technical analysis.

Attributes:

Name Type Description
context_params ToolboxQueryParams

The standard query parameters for toolbox data.

Methods:

Name Description
mandelbrot_channel

Calculate the rescaled range statistics.

Source code in src/humbldata/toolbox/technical/technical_controller.py
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
class Technical:
    """
    Module for all technical analysis.

    Attributes
    ----------
    context_params : ToolboxQueryParams
        The standard query parameters for toolbox data.

    Methods
    -------
    mandelbrot_channel(command_params: MandelbrotChannelQueryParams)
        Calculate the rescaled range statistics.

    """

    def __init__(self, context_params: ToolboxQueryParams):
        self.context_params = context_params

    def mandelbrot_channel(self, **kwargs: MandelbrotChannelQueryParams):
        """
        Calculate the Mandelbrot Channel.

        Parameters
        ----------
        window : str, optional
            The width of the window used for splitting the data into sections for
            detrending. Defaults to "1mo".
        rv_adjustment : bool, optional
            Whether to adjust the calculation for realized volatility. If True, the
            data is filtered to only include observations in the same volatility bucket
            that the stock is currently in. Defaults to True.
        rv_method : str, optional
            The method to calculate the realized volatility. Only need to define
            when rv_adjustment is True. Defaults to "std".
        rs_method : Literal["RS", "RS_min", "RS_max", "RS_mean"], optional
            The method to use for Range/STD calculation. This is either min, max
            or mean of all RS ranges per window. If not defined, just used the
            most recent RS window. Defaults to "RS".
        rv_grouped_mean : bool, optional
            Whether to calculate the mean value of realized volatility over
            multiple window lengths. Defaults to False.
        live_price : bool, optional
            Whether to calculate the ranges using the current live price, or the
            most recent 'close' observation. Defaults to False.
        historical : bool, optional
            Whether to calculate the Historical Mandelbrot Channel (over-time), and
            return a time-series of channels from the start to the end date. If
            False, the Mandelbrot Channel calculation is done aggregating all of the
            data into one observation. If True, then it will enable daily
            observations over-time. Defaults to False.
        chart : bool, optional
            Whether to return a chart object. Defaults to False.
        template : str, optional
            The template/theme to use for the plotly figure. Defaults to "humbl_dark".

        Returns
        -------
        HumblObject
            An object containing the Mandelbrot Channel data and metadata.
        """
        from humbldata.core.standard_models.toolbox.technical.mandelbrot_channel import (
            MandelbrotChannelFetcher,
        )

        # Instantiate the Fetcher with the query parameters
        fetcher = MandelbrotChannelFetcher(
            context_params=self.context_params, command_params=kwargs
        )

        # Use the fetcher to get the data
        return fetcher.fetch_data()
mandelbrot_channel ¤
mandelbrot_channel(**kwargs: MandelbrotChannelQueryParams)

Calculate the Mandelbrot Channel.

Parameters:

Name Type Description Default
window str

The width of the window used for splitting the data into sections for detrending. Defaults to "1mo".

required
rv_adjustment bool

Whether to adjust the calculation for realized volatility. If True, the data is filtered to only include observations in the same volatility bucket that the stock is currently in. Defaults to True.

required
rv_method str

The method to calculate the realized volatility. Only need to define when rv_adjustment is True. Defaults to "std".

required
rs_method Literal[RS, RS_min, RS_max, RS_mean]

The method to use for Range/STD calculation. This is either min, max or mean of all RS ranges per window. If not defined, just used the most recent RS window. Defaults to "RS".

required
rv_grouped_mean bool

Whether to calculate the mean value of realized volatility over multiple window lengths. Defaults to False.

required
live_price bool

Whether to calculate the ranges using the current live price, or the most recent 'close' observation. Defaults to False.

required
historical bool

Whether to calculate the Historical Mandelbrot Channel (over-time), and return a time-series of channels from the start to the end date. If False, the Mandelbrot Channel calculation is done aggregating all of the data into one observation. If True, then it will enable daily observations over-time. Defaults to False.

required
chart bool

Whether to return a chart object. Defaults to False.

required
template str

The template/theme to use for the plotly figure. Defaults to "humbl_dark".

required

Returns:

Type Description
HumblObject

An object containing the Mandelbrot Channel data and metadata.

Source code in src/humbldata/toolbox/technical/technical_controller.py
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
def mandelbrot_channel(self, **kwargs: MandelbrotChannelQueryParams):
    """
    Calculate the Mandelbrot Channel.

    Parameters
    ----------
    window : str, optional
        The width of the window used for splitting the data into sections for
        detrending. Defaults to "1mo".
    rv_adjustment : bool, optional
        Whether to adjust the calculation for realized volatility. If True, the
        data is filtered to only include observations in the same volatility bucket
        that the stock is currently in. Defaults to True.
    rv_method : str, optional
        The method to calculate the realized volatility. Only need to define
        when rv_adjustment is True. Defaults to "std".
    rs_method : Literal["RS", "RS_min", "RS_max", "RS_mean"], optional
        The method to use for Range/STD calculation. This is either min, max
        or mean of all RS ranges per window. If not defined, just used the
        most recent RS window. Defaults to "RS".
    rv_grouped_mean : bool, optional
        Whether to calculate the mean value of realized volatility over
        multiple window lengths. Defaults to False.
    live_price : bool, optional
        Whether to calculate the ranges using the current live price, or the
        most recent 'close' observation. Defaults to False.
    historical : bool, optional
        Whether to calculate the Historical Mandelbrot Channel (over-time), and
        return a time-series of channels from the start to the end date. If
        False, the Mandelbrot Channel calculation is done aggregating all of the
        data into one observation. If True, then it will enable daily
        observations over-time. Defaults to False.
    chart : bool, optional
        Whether to return a chart object. Defaults to False.
    template : str, optional
        The template/theme to use for the plotly figure. Defaults to "humbl_dark".

    Returns
    -------
    HumblObject
        An object containing the Mandelbrot Channel data and metadata.
    """
    from humbldata.core.standard_models.toolbox.technical.mandelbrot_channel import (
        MandelbrotChannelFetcher,
    )

    # Instantiate the Fetcher with the query parameters
    fetcher = MandelbrotChannelFetcher(
        context_params=self.context_params, command_params=kwargs
    )

    # Use the fetcher to get the data
    return fetcher.fetch_data()

mandelbrot_channel ¤

model ¤

Context: Toolbox || Category: Technical || Command: calc_mandelbrot_channel.

A command to generate a Mandelbrot Channel for any time series.

calc_mandelbrot_channel ¤
calc_mandelbrot_channel(data: DataFrame | LazyFrame, window: str = '1m', rv_method: str = 'std', rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, rv_adjustment: bool = True, rv_grouped_mean: bool = True, live_price: bool = True, **kwargs) -> LazyFrame

Context: Toolbox || Category: Technical || Command: calc_mandelbrot_channel.

This command calculates the Mandelbrot Channel for a given time series, utilizing various parameters to adjust the calculation. The Mandelbrot Channel provides insights into the volatility and price range of a stock over a specified window.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The time series data for which to calculate the Mandelbrot Channel. There needs to be a close and date column.

required
window str

The window size for the calculation, specified as a string. This determines the period over which the channel is calculated.

'1m'
rv_adjustment bool

Adjusts the calculation for realized volatility. If True, filters the data to include only observations within the current volatility bucket of the stock.

True
rv_grouped_mean bool

Determines whether to use the grouped mean in the realized volatility calculation.

True
rv_method str

Specifies the method for calculating realized volatility, applicable only if rv_adjustment is True.

'std'
rs_method Literal['RS', 'RS_mean', 'RS_max', 'RS_min']

Defines the method for calculating the range over standard deviation, affecting the width of the Mandelbrot Channel. Options include RS, RS_mean, RS_min, and RS_max.

'RS'
live_price bool

Indicates whether to incorporate live price data into the calculation, which may extend the calculation time by 1-3 seconds.

True
**kwargs

Additional keyword arguments to pass to the function, if you want to change the behavior or pass parameters to internal functions.

{}

Returns:

Type Description
LazyFrame

A LazyFrame containing the calculated Mandelbrot Channel data for the specified time series.

Notes

The function returns a pl.LazyFrame; remember to call .collect() on the result to obtain a DataFrame. This lazy evaluation strategy postpones the calculation until it is explicitly requested.

Example

To calculate the Mandelbrot Channel for a yearly window with adjustments for realized volatility using the 'yz' method, and incorporating live price data:

mandelbrot_channel = calc_mandelbrot_channel(
    data,
    window="1y",
    rv_adjustment=True,
    rv_method="yz",
    rv_grouped_mean=False,
    rs_method="RS",
    live_price=True
).collect()
Source code in src/humbldata/toolbox/technical/mandelbrot_channel/model.py
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
def calc_mandelbrot_channel(  # noqa: PLR0913
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_method: str = "std",
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    rv_adjustment: bool = True,
    rv_grouped_mean: bool = True,
    live_price: bool = True,
    **kwargs,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || **Command: calc_mandelbrot_channel**.

    This command calculates the Mandelbrot Channel for a given time series, utilizing various parameters to adjust the calculation. The Mandelbrot Channel provides insights into the volatility and price range of a stock over a specified window.

    Parameters
    ----------
    data: pl.DataFrame | pl.LazyFrame
        The time series data for which to calculate the Mandelbrot Channel.
        There needs to be a `close` and `date` column.
    window: str, default "1m"
        The window size for the calculation, specified as a string. This
        determines the period over which the channel is calculated.
    rv_adjustment: bool, default True
        Adjusts the calculation for realized volatility. If True, filters the
        data to include only observations within the current volatility bucket
        of the stock.
    rv_grouped_mean: bool, default True
        Determines whether to use the grouped mean in the realized volatility
        calculation.
    rv_method: str, default "std"
        Specifies the method for calculating realized volatility, applicable
        only if `rv_adjustment` is True.
    rs_method: str, default "RS"
        Defines the method for calculating the range over standard deviation,
        affecting the width of the Mandelbrot Channel. Options include RS,
        RS_mean, RS_min, and RS_max.
    live_price: bool, default True
        Indicates whether to incorporate live price data into the calculation,
        which may extend the calculation time by 1-3 seconds.
    **kwargs
        Additional keyword arguments to pass to the function, if you want to
        change the behavior or pass parameters to internal functions.

    Returns
    -------
    pl.LazyFrame
        A LazyFrame containing the calculated Mandelbrot Channel data for the specified time series.

    Notes
    -----
    The function returns a pl.LazyFrame; remember to call `.collect()` on the result to obtain a DataFrame. This lazy evaluation strategy postpones the calculation until it is explicitly requested.

    Example
    -------
    To calculate the Mandelbrot Channel for a yearly window with adjustments for realized volatility using the 'yz' method, and incorporating live price data:

    ```python
    mandelbrot_channel = calc_mandelbrot_channel(
        data,
        window="1y",
        rv_adjustment=True,
        rv_method="yz",
        rv_grouped_mean=False,
        rs_method="RS",
        live_price=True
    ).collect()
    ```
    """
    # Setup ====================================================================
    # window_datetime = _window_format(window, _return_timedelta=True)
    sort_cols = _set_sort_cols(data, "symbol", "date")

    data = data.lazy()
    # Step 1: Collect Price Data -----------------------------------------------
    # Step X: Add window bins --------------------------------------------------
    # We want date grouping, non-overlapping window bins
    data1 = add_window_index(data, window=window)

    # Step X: Calculate Log Returns + Rvol -------------------------------------
    if "log_returns" not in data1.collect_schema().names():
        data2 = log_returns(data1, _column_name="close")
    else:
        data2 = data1

    # Step X: Calculate Log Mean Series ----------------------------------------
    if isinstance(data2, pl.DataFrame | pl.LazyFrame):
        data3 = mean(data2)
    else:
        msg = "A series was passed to `mean()` calculation. Please provide a DataFrame or LazyFrame."
        raise HumblDataError(msg)
    # Step X: Calculate Mean De-trended Series ---------------------------------
    data4 = detrend(
        data3, _detrend_value_col="window_mean", _detrend_col="log_returns"
    )
    # Step X: Calculate Cumulative Deviate Series ------------------------------
    data5 = cum_sum(data4, _column_name="detrended_log_returns")
    # Step X: Calculate Mandelbrot Range ---------------------------------------
    data6 = range_(data5, _column_name="cum_sum")
    # Step X: Calculate Standard Deviation -------------------------------------
    data7 = std(data6, _column_name="cum_sum")
    # Step X: Calculate Range (R) & Standard Deviation (S) ---------------------
    if rv_adjustment:
        # Step 8.1: Calculate Realized Volatility ------------------------------
        data7 = calc_realized_volatility(
            data=data7,
            window=window,
            method=rv_method,
            grouped_mean=rv_grouped_mean,
        )
        # rename col for easy selection
        for col in data7.collect_schema().names():
            if "volatility_pct" in col:
                data7 = data7.rename({col: "realized_volatility"})
        # Step 8.2: Calculate Volatility Bucket Stats --------------------------
        data7 = vol_buckets(data=data7, lo_quantile=0.3, hi_quantile=0.65)
        data7 = vol_filter(
            data7
        )  # removes rows that arent in the same vol bucket

    # Step X: Calculate RS -----------------------------------------------------
    data8 = data7.sort(sort_cols).with_columns(
        (pl.col("cum_sum_range") / pl.col("cum_sum_std")).alias("RS")
    )

    # Step X: Collect Recent Prices --------------------------------------------
    if live_price:
        symbols = (
            data.select("symbol").unique().sort("symbol").collect().to_series()
        )
        recent_prices = get_latest_price(symbols)
    else:
        recent_prices = None

    # Step X: Calculate Rescaled Price Range ----------------------------------
    out = price_range(
        data=data8,
        recent_price_data=recent_prices,
        rs_method=rs_method,
        _rv_adjustment=rv_adjustment,
    )

    return out
acalc_mandelbrot_channel async ¤
acalc_mandelbrot_channel(data: DataFrame | LazyFrame, window: str = '1m', rv_method: str = 'std', rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, rv_adjustment: bool = True, rv_grouped_mean: bool = True, live_price: bool = True, **kwargs) -> DataFrame | LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || Command: acalc_mandelbrot_channel.

Asynchronous wrapper for calc_mandelbrot_channel. This function allows calc_mandelbrot_channel to be called in an async context.

Notes

This does not make calc_mandelbrot_channel() non-blocking or asynchronous.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/model.py
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
async def acalc_mandelbrot_channel(  # noqa: PLR0913
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_method: str = "std",
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    rv_adjustment: bool = True,
    rv_grouped_mean: bool = True,
    live_price: bool = True,
    **kwargs,
) -> pl.DataFrame | pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || **Command: acalc_mandelbrot_channel**.

    Asynchronous wrapper for calc_mandelbrot_channel.
    This function allows calc_mandelbrot_channel to be called in an async context.

    Notes
    -----
    This does not make `calc_mandelbrot_channel()` non-blocking or asynchronous.
    """
    # Directly call the synchronous calc_mandelbrot_channel function

    return calc_mandelbrot_channel(
        data=data,
        window=window,
        rv_adjustment=rv_adjustment,
        rv_method=rv_method,
        rs_method=rs_method,
        rv_grouped_mean=rv_grouped_mean,
        live_price=live_price,
        **kwargs,
    )
calc_mandelbrot_channel_historical ¤
calc_mandelbrot_channel_historical(data: DataFrame | LazyFrame, window: str = '1m', rv_method: str = 'std', rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, rv_adjustment: bool = True, rv_grouped_mean: bool = True, live_price: bool = True, **kwargs) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || Command: calc_mandelbrot_channel_historical.

This function calculates the Mandelbrot Channel for historical data.

Synchronous wrapper for the asynchronous Mandelbrot Channel historical calculation.

Parameters:

Name Type Description Default
The
required
Please
required
description
required

Returns:

Type Description
LazyFrame

A LazyFrame containing the historical Mandelbrot Channel calculations.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/model.py
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
def calc_mandelbrot_channel_historical(  # noqa: PLR0913
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_method: str = "std",
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    rv_adjustment: bool = True,
    rv_grouped_mean: bool = True,
    live_price: bool = True,
    **kwargs,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || **Command: calc_mandelbrot_channel_historical**.

    This function calculates the Mandelbrot Channel for historical data.

    Synchronous wrapper for the asynchronous Mandelbrot Channel historical calculation.

    Parameters
    ----------
    The parameters for this function are the same as those for calc_mandelbrot_channel().
    Please refer to the documentation of calc_mandelbrot_channel() for a detailed
    description of each parameter.

    Returns
    -------
    pl.LazyFrame
        A LazyFrame containing the historical Mandelbrot Channel calculations.
    """
    return run_async(
        _acalc_mandelbrot_channel_historical_engine(
            data=data,
            window=window,
            rv_adjustment=rv_adjustment,
            rv_method=rv_method,
            rs_method=rs_method,
            rv_grouped_mean=rv_grouped_mean,
            live_price=live_price,
            **kwargs,
        )
    )
calc_mandelbrot_channel_historical_mp ¤
calc_mandelbrot_channel_historical_mp(data: DataFrame | LazyFrame, window: str = '1m', rv_adjustment: bool = True, rv_method: str = 'std', rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, rv_grouped_mean: bool = True, live_price: bool = True, n_processes: int = 1, **kwargs) -> LazyFrame

Calculate the Mandelbrot Channel historically using multiprocessing.

Parameters:

n_processes : int, optional Number of processes to use. If None, it uses all available cores.

Other parameters are the same as calc_mandelbrot_channel_historical.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/model.py
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
def calc_mandelbrot_channel_historical_mp(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_adjustment: bool = True,
    rv_method: str = "std",
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    rv_grouped_mean: bool = True,
    live_price: bool = True,
    n_processes: int = 1,
    **kwargs,
) -> pl.LazyFrame:
    """
    Calculate the Mandelbrot Channel historically using multiprocessing.

    Parameters:
    -----------
    n_processes : int, optional
        Number of processes to use. If None, it uses all available cores.

    Other parameters are the same as calc_mandelbrot_channel_historical.
    """
    window_days = _window_format(window, _return_timedelta=True)
    start_date = data.lazy().select(pl.col("date")).min().collect().row(0)[0]
    start_date = start_date + window_days
    end_date = data.lazy().select("date").max().collect().row(0)[0]

    if start_date >= end_date:
        msg = f"You set <historical=True> \n\
        This calculation needs *at least* one window of data. \n\
        The (start date + window) is: {start_date} and the dataset ended: {end_date}. \n\
        Please adjust dates accordingly."
        raise HumblDataError(msg)

    dates = (
        data.lazy()
        .select(pl.col("date"))
        .filter(pl.col("date") >= start_date)
        .unique()
        .sort("date")
        .collect()
        .to_series()
    )

    # Prepare the partial function with all arguments except the date
    calc_func = partial(
        _calc_mandelbrot_for_date,
        data=data,
        window=window,
        rv_adjustment=rv_adjustment,
        rv_method=rv_method,
        rs_method=rs_method,
        rv_grouped_mean=rv_grouped_mean,
        live_price=live_price,
        **kwargs,
    )

    # Use multiprocessing to calculate in parallel
    with multiprocessing.Pool(processes=n_processes) as pool:
        results = pool.map(calc_func, dates)

    # Combine results
    out = pl.concat(results, how="vertical").sort(["symbol", "date"])

    return out.lazy()
calc_mandelbrot_channel_historical_concurrent ¤
calc_mandelbrot_channel_historical_concurrent(data: DataFrame | LazyFrame, window: str = '1m', rv_method: str = 'std', rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, rv_adjustment: bool = True, rv_grouped_mean: bool = True, live_price: bool = True, max_workers: int | None = None, use_processes: bool = False, **kwargs) -> LazyFrame

Calculate the Mandelbrot Channel historically using concurrent.futures.

Parameters:

max_workers : int, optional Maximum number of workers to use. If None, it uses the default for ProcessPoolExecutor or ThreadPoolExecutor (usually the number of processors on the machine, multiplied by 5). use_processes : bool, default True If True, use ProcessPoolExecutor, otherwise use ThreadPoolExecutor.

Other parameters are the same as calc_mandelbrot_channel_historical.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/model.py
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
def calc_mandelbrot_channel_historical_concurrent(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_method: str = "std",
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    rv_adjustment: bool = True,
    rv_grouped_mean: bool = True,
    live_price: bool = True,
    max_workers: int | None = None,
    use_processes: bool = False,
    **kwargs,
) -> pl.LazyFrame:
    """
    Calculate the Mandelbrot Channel historically using concurrent.futures.

    Parameters:
    -----------
    max_workers : int, optional
        Maximum number of workers to use. If None, it uses the default for ProcessPoolExecutor
        or ThreadPoolExecutor (usually the number of processors on the machine, multiplied by 5).
    use_processes : bool, default True
        If True, use ProcessPoolExecutor, otherwise use ThreadPoolExecutor.

    Other parameters are the same as calc_mandelbrot_channel_historical.
    """
    window_days = _window_format(window, _return_timedelta=True)
    start_date = data.lazy().select(pl.col("date")).min().collect().row(0)[0]
    start_date = start_date + window_days
    end_date = data.lazy().select("date").max().collect().row(0)[0]

    if start_date >= end_date:
        msg = f"You set <historical=True> \n\
        This calculation needs *at least* one window of data. \n\
        The (start date + window) is: {start_date} and the dataset ended: {end_date}. \n\
        Please adjust dates accordingly."
        raise HumblDataError(msg)

    dates = (
        data.lazy()
        .select(pl.col("date"))
        .filter(pl.col("date") >= start_date)
        .unique()
        .sort("date")
        .collect()
        .to_series()
    )

    # Prepare the partial function with all arguments except the date
    calc_func = partial(
        _calc_mandelbrot_for_date,
        data=data,
        window=window,
        rv_adjustment=rv_adjustment,
        rv_method=rv_method,
        rs_method=rs_method,
        rv_grouped_mean=rv_grouped_mean,
        live_price=live_price,
        **kwargs,
    )

    # Choose the appropriate executor
    executor_class = (
        concurrent.futures.ProcessPoolExecutor
        if use_processes
        else concurrent.futures.ThreadPoolExecutor
    )

    # Use concurrent.futures to calculate in parallel
    with executor_class(max_workers=max_workers) as executor:
        futures = [executor.submit(calc_func, date) for date in dates]
        results = [
            future.result()
            for future in concurrent.futures.as_completed(futures)
        ]

    # Combine results
    out = pl.concat(results, how="vertical").sort(["symbol", "date"])

    return out.lazy()
view ¤
create_historical_plot ¤
create_historical_plot(data: DataFrame, symbol: str, template: ChartTemplate = ChartTemplate.plotly) -> Figure

Generate a historical plot for a given symbol from the provided data.

Parameters:

Name Type Description Default
data DataFrame

The dataframe containing historical data including dates, bottom prices, close prices, and top prices.

required
symbol str

The symbol for which the historical plot is to be generated.

required
template ChartTemplate

The template to be used for styling the plot.

plotly

Returns:

Type Description
Figure

A plotly figure object representing the historical data of the given symbol.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/view.py
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
def create_historical_plot(
    data: pl.DataFrame,
    symbol: str,
    template: ChartTemplate = ChartTemplate.plotly,
) -> go.Figure:
    """
    Generate a historical plot for a given symbol from the provided data.

    Parameters
    ----------
    data : pl.DataFrame
        The dataframe containing historical data including dates, bottom prices, close prices, and top prices.
    symbol : str
        The symbol for which the historical plot is to be generated.
    template : ChartTemplate
        The template to be used for styling the plot.

    Returns
    -------
    go.Figure
        A plotly figure object representing the historical data of the given symbol.
    """
    filtered_data = data.filter(pl.col("symbol") == symbol)

    fig = go.Figure()
    fig.add_trace(
        go.Scatter(
            x=filtered_data.select("date").to_series(),
            y=filtered_data.select("bottom_price").to_series(),
            name="Bottom Price",
            line=dict(color="green"),
        )
    )
    fig.add_trace(
        go.Scatter(
            x=filtered_data.select("date").to_series(),
            y=filtered_data.select("recent_price").to_series(),
            name="Recent Price",
            line=dict(color="blue"),
        )
    )
    fig.add_trace(
        go.Scatter(
            x=filtered_data.select("date").to_series(),
            y=filtered_data.select("top_price").to_series(),
            name="Top Price",
            line=dict(color="red"),
        )
    )
    fig.update_layout(
        title=f"Historical Mandelbrot Channel for {symbol}",
        xaxis_title="Date",
        yaxis_title="Price",
        template=template,
    )
    return fig
create_current_plot ¤
create_current_plot(data: DataFrame, equity_data: DataFrame, symbol: str, template: ChartTemplate = ChartTemplate.plotly) -> Figure

Generate a current plot for a given symbol from the provided data and equity data.

Parameters:

Name Type Description Default
data DataFrame

The dataframe containing historical data including top and bottom prices.

required
equity_data DataFrame

The dataframe containing current equity data including dates and close prices.

required
symbol str

The symbol for which the current plot is to be generated.

required
template ChartTemplate

The template to be used for styling the plot.

plotly

Returns:

Type Description
Figure

A plotly figure object representing the current data of the given symbol.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/view.py
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
def create_current_plot(
    data: pl.DataFrame,
    equity_data: pl.DataFrame,
    symbol: str,
    template: ChartTemplate = ChartTemplate.plotly,
) -> go.Figure:
    """
    Generate a current plot for a given symbol from the provided data and equity data.

    Parameters
    ----------
    data : pl.DataFrame
        The dataframe containing historical data including top and bottom prices.
    equity_data : pl.DataFrame
        The dataframe containing current equity data including dates and close prices.
    symbol : str
        The symbol for which the current plot is to be generated.
    template : ChartTemplate
        The template to be used for styling the plot.

    Returns
    -------
    go.Figure
        A plotly figure object representing the current data of the given symbol.
    """
    filtered_data = data.filter(pl.col("symbol") == symbol)
    equity_data = equity_data.filter(pl.col("symbol") == symbol)
    fig = go.Figure()
    fig.add_trace(
        go.Scatter(
            x=equity_data.select("date").to_series(),
            y=equity_data.select("close").to_series(),
            name="Recent Price",
            line=dict(color="blue"),
        )
    )
    fig.add_hline(
        y=filtered_data.select("top_price").row(0)[0],
        line=dict(color="red", width=2),
        name="Top Price",
    )
    fig.add_hline(
        y=filtered_data.select("bottom_price").row(0)[0],
        line=dict(color="green", width=2),
        name="Bottom Price",
    )
    fig.update_layout(
        title=f"Current Mandelbrot Channel for {symbol}",
        xaxis_title="Date",
        yaxis_title="Price",
        template=template,
    )
    return fig
is_historical_data ¤
is_historical_data(data: DataFrame) -> bool

Check if the provided dataframe contains historical data based on the uniqueness of dates.

Parameters:

Name Type Description Default
data DataFrame

The dataframe to check for historical data presence.

required

Returns:

Type Description
bool

Returns True if the dataframe contains historical data (more than one unique date), otherwise False.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/view.py
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
def is_historical_data(data: pl.DataFrame) -> bool:
    """
    Check if the provided dataframe contains historical data based on the uniqueness of dates.

    Parameters
    ----------
    data : pl.DataFrame
        The dataframe to check for historical data presence.

    Returns
    -------
    bool
        Returns True if the dataframe contains historical data (more than one unique date), otherwise False.
    """
    return data.select("date").to_series().unique().shape[0] > 1
generate_plot_for_symbol ¤
generate_plot_for_symbol(data: DataFrame, equity_data: DataFrame, symbol: str, template: ChartTemplate = ChartTemplate.plotly) -> Chart

Generate a plot for a specific symbol that is filtered from the original DF.

This function will check if the data provided is a Historical or Current Mandelbrot Channel data. If it is historical, it will generate a historical plot. If it is current, it will generate a current plot.

Parameters:

Name Type Description Default
data DataFrame

The dataframe containing Mandelbrot channel data for all symbols.

required
equity_data DataFrame

The dataframe containing equity data for all symbols.

required
symbol str

The symbol for which to generate the plot.

required
template ChartTemplate

The template/theme to use for the plotly figure. Options are: "humbl_light", "humbl_dark", "plotly_light", "plotly_dark", "ggplot2", "seaborn", "simple_white", "none"

plotly

Returns:

Type Description
Chart

A Chart object containing the generated plot for the specified symbol.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/view.py
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
def generate_plot_for_symbol(
    data: pl.DataFrame,
    equity_data: pl.DataFrame,
    symbol: str,
    template: ChartTemplate = ChartTemplate.plotly,
) -> Chart:
    """
    Generate a plot for a specific symbol that is filtered from the original DF.

    This function will check if the data provided is a Historical or Current
    Mandelbrot Channel data. If it is historical, it will generate a historical
    plot. If it is current, it will generate a current plot.

    Parameters
    ----------
    data : pl.DataFrame
        The dataframe containing Mandelbrot channel data for all symbols.
    equity_data : pl.DataFrame
        The dataframe containing equity data for all symbols.
    symbol : str
        The symbol for which to generate the plot.
    template : ChartTemplate
        The template/theme to use for the plotly figure. Options are:
        "humbl_light", "humbl_dark", "plotly_light", "plotly_dark", "ggplot2", "seaborn", "simple_white", "none"

    Returns
    -------
    Chart
        A Chart object containing the generated plot for the specified symbol.

    """
    if is_historical_data(data):
        out = create_historical_plot(data, symbol, template)
    else:
        out = create_current_plot(data, equity_data, symbol, template)

    return Chart(
        content=out.to_json(), fig=out
    )  # TODO: use to_json() instead of to_plotly_json()
generate_plots ¤
generate_plots(data: LazyFrame, equity_data: LazyFrame, template: ChartTemplate = ChartTemplate.plotly) -> list[Chart]

Context: Toolbox || Category: Technical || Subcategory: Mandelbrot Channel || Command: generate_plots().

Generate plots for each unique symbol in the given dataframes.

Parameters:

Name Type Description Default
data LazyFrame

The LazyFrame containing the symbols and MandelbrotChannelData

required
equity_data LazyFrame

The LazyFrame containing equity data for the symbols.

required
template ChartTemplate

The template/theme to use for the plotly figure.

plotly

Returns:

Type Description
list[Chart]

A list of Chart objects, each representing a plot for a unique symbol.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/view.py
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
def generate_plots(
    data: pl.LazyFrame,
    equity_data: pl.LazyFrame,
    template: ChartTemplate = ChartTemplate.plotly,
) -> list[Chart]:
    """
    Context: Toolbox || Category: Technical || Subcategory: Mandelbrot Channel || **Command: generate_plots()**.

    Generate plots for each unique symbol in the given dataframes.

    Parameters
    ----------
    data : pl.LazyFrame
        The LazyFrame containing the symbols and MandelbrotChannelData
    equity_data : pl.LazyFrame
        The LazyFrame containing equity data for the symbols.
    template : ChartTemplate
        The template/theme to use for the plotly figure.

    Returns
    -------
    list[Chart]
        A list of Chart objects, each representing a plot for a unique symbol.

    """
    symbols = data.select("symbol").unique().collect().to_series()

    plots = [
        generate_plot_for_symbol(
            data.collect(), equity_data.collect(), symbol, template
        )
        for symbol in symbols
    ]
    return plots
helpers ¤

Context: Toolbox || Category: Technical || Sub-Category: MandelBrot Channel || Sub-Category: Helpers.

These Toolbox() helpers are used in various calculations in the toolbox context. Most of the helpers will be mathematical transformations of data. These functions should be DUMB functions.

add_window_index ¤
add_window_index(data: LazyFrame | DataFrame, window: str) -> LazyFrame | DataFrame
Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **add_window_index**.

Add a column to the dataframe indicating the window grouping for each row in a time series.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

The input data frame or lazy frame to which the window index will be added.

required
window str

The window size as a string, used to determine the grouping of rows into windows.

required

Returns:

Type Description
LazyFrame | DataFrame

The original data frame or lazy frame with an additional column named "window_index" indicating the window grouping for each row.

Notes
  • This function is essential for calculating the Mandelbrot Channel, where the dataset is split into numerous 'windows', and statistics are calculated for each window.
  • The function adds a dummy symbol column if the data contains only one symbol, to avoid errors in the group_by_dynamic() function.
  • It is utilized within the log_mean() and calc_mandelbrot_channel() functions for window-based calculations.

Examples:

>>> data = pl.DataFrame({"date": ["2021-01-01", "2021-01-02"], "symbol": ["AAPL", "AAPL"], "value": [1, 2]})
>>> window = "1d"
>>> add_window_index(data, window)
shape: (2, 4)
┌────────────┬────────┬───────┬──────────────┐
│ date       ┆ symbol ┆ value ┆ window_index │
│ ---        ┆ ---    ┆ ---   ┆ ---          │
│ date       ┆ str    ┆ i64   ┆ i64          │
╞════════════╪════════╪═══════╪══════════════╡
│ 2021-01-01 ┆ AAPL   ┆ 1     ┆ 0            │
├────────────┼────────┼───────┼──────────────┤
│ 2021-01-02 ┆ AAPL   ┆ 2     ┆ 1            │
└────────────┴────────┴───────┴──────────────┘
Source code in src/humbldata/toolbox/technical/mandelbrot_channel/helpers.py
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
def add_window_index(
    data: pl.LazyFrame | pl.DataFrame, window: str
) -> pl.LazyFrame | pl.DataFrame:
    """
        Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **add_window_index**.

    Add a column to the dataframe indicating the window grouping for each row in
    a time series.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The input data frame or lazy frame to which the window index will be
        added.
    window : str
        The window size as a string, used to determine the grouping of rows into
        windows.

    Returns
    -------
    pl.LazyFrame | pl.DataFrame
        The original data frame or lazy frame with an additional column named
        "window_index" indicating
        the window grouping for each row.

    Notes
    -----
    - This function is essential for calculating the Mandelbrot Channel, where
    the dataset is split into
    numerous 'windows', and statistics are calculated for each window.
    - The function adds a dummy `symbol` column if the data contains only one
    symbol, to avoid errors in the `group_by_dynamic()` function.
    - It is utilized within the `log_mean()` and `calc_mandelbrot_channel()`
    functions for window-based calculations.

    Examples
    --------
    >>> data = pl.DataFrame({"date": ["2021-01-01", "2021-01-02"], "symbol": ["AAPL", "AAPL"], "value": [1, 2]})
    >>> window = "1d"
    >>> add_window_index(data, window)
    shape: (2, 4)
    ┌────────────┬────────┬───────┬──────────────┐
    │ date       ┆ symbol ┆ value ┆ window_index │
    │ ---        ┆ ---    ┆ ---   ┆ ---          │
    │ date       ┆ str    ┆ i64   ┆ i64          │
    ╞════════════╪════════╪═══════╪══════════════╡
    │ 2021-01-01 ┆ AAPL   ┆ 1     ┆ 0            │
    ├────────────┼────────┼───────┼──────────────┤
    │ 2021-01-02 ┆ AAPL   ┆ 2     ┆ 1            │
    └────────────┴────────┴───────┴──────────────┘
    """

    def _create_monthly_window_index(col: str, k: int = 1):
        year_diff = pl.col(col).last().dt.year() - pl.col(col).dt.year()
        month_diff = pl.col(col).last().dt.month() - pl.col(col).dt.month()
        day_indicator = pl.col(col).dt.day() > pl.col(col).last().dt.day()
        return (12 * year_diff + month_diff - day_indicator) // k

    # Clean the window into standardized strings (i.e "1month"/"1 month" = "1mo")
    window = _window_format(window, _return_timedelta=False)  # returns `str`

    if "w" in window or "d" in window:
        msg = "The window cannot include 'd' or 'w', the window needs to be larger than 1 month!"
        raise HumblDataError(msg)

    window_monthly = _window_format_monthly(window)

    data = data.with_columns(
        _create_monthly_window_index(col="date", k=window_monthly)
        .alias("window_index")
        .over("symbol")
    )

    return data
vol_buckets ¤
vol_buckets(data: DataFrame | LazyFrame, lo_quantile: float = 0.4, hi_quantile: float = 0.8, _column_name_volatility: str = 'realized_volatility', *, _boundary_group_down: bool = False) -> LazyFrame

Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: vol_buckets.

Splitting data observations into 3 volatility buckets: low, mid and high. The function does this for each symbol present in the data.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

The input dataframe or lazy frame.

required
lo_quantile float

The lower quantile for bucketing. Default is 0.4.

0.4
hi_quantile float

The higher quantile for bucketing. Default is 0.8.

0.8
_column_name_volatility str

The name of the column to apply volatility bucketing. Default is "realized_volatility".

'realized_volatility'
_boundary_group_down bool

If True, then group boundary values down to the lower bucket, using vol_buckets_alt() If False, then group boundary values up to the higher bucket, using the Polars .qcut() method. Default is False.

False

Returns:

Type Description
LazyFrame

The data with an additional column: vol_bucket

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/helpers.py
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
def vol_buckets(
    data: pl.DataFrame | pl.LazyFrame,
    lo_quantile: float = 0.4,
    hi_quantile: float = 0.8,
    _column_name_volatility: str = "realized_volatility",
    *,
    _boundary_group_down: bool = False,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **vol_buckets**.

    Splitting data observations into 3 volatility buckets: low, mid and high.
    The function does this for each `symbol` present in the data.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The input dataframe or lazy frame.
    lo_quantile : float
        The lower quantile for bucketing. Default is 0.4.
    hi_quantile : float
        The higher quantile for bucketing. Default is 0.8.
    _column_name_volatility : str
        The name of the column to apply volatility bucketing. Default is
        "realized_volatility".
    _boundary_group_down: bool = False
        If True, then group boundary values down to the lower bucket, using
        `vol_buckets_alt()` If False, then group boundary values up to the
        higher bucket, using the Polars `.qcut()` method.
        Default is False.

    Returns
    -------
    pl.LazyFrame
        The `data` with an additional column: `vol_bucket`
    """
    _check_required_columns(data, _column_name_volatility, "symbol")

    if not _boundary_group_down:
        # Grouping Boundary Values in Higher Bucket
        out = data.lazy().with_columns(
            pl.col(_column_name_volatility)
            .qcut(
                [lo_quantile, hi_quantile],
                labels=["low", "mid", "high"],
                left_closed=False,
                allow_duplicates=True,
            )
            .over("symbol")
            .alias("vol_bucket")
            .cast(pl.Utf8)
        )
    else:
        out = vol_buckets_alt(
            data, lo_quantile, hi_quantile, _column_name_volatility
        )

    return out
vol_buckets_alt ¤
vol_buckets_alt(data: DataFrame | LazyFrame, lo_quantile: float = 0.4, hi_quantile: float = 0.8, _column_name_volatility: str = 'realized_volatility') -> LazyFrame

Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: vol_buckets_alt.

This is an alternative implementation of vol_buckets() using expressions, and not using .qcut(). The biggest difference is how the function groups values on the boundaries of quantiles. This function groups boundary values down Splitting data observations into 3 volatility buckets: low, mid and high. The function does this for each symbol present in the data.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

The input dataframe or lazy frame.

required
lo_quantile float

The lower quantile for bucketing. Default is 0.4.

0.4
hi_quantile float

The higher quantile for bucketing. Default is 0.8.

0.8
_column_name_volatility str

The name of the column to apply volatility bucketing. Default is "realized_volatility".

'realized_volatility'

Returns:

Type Description
LazyFrame

The data with an additional column: vol_bucket

Notes

The biggest difference is how the function groups values on the boundaries of quantiles. This function groups boundary values down to the lower bucket. So, if there is a value that lies on the mid/low border, this function will group it with low, whereas vol_buckets() will group it with mid

This function is also slightly less performant.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/helpers.py
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
def vol_buckets_alt(
    data: pl.DataFrame | pl.LazyFrame,
    lo_quantile: float = 0.4,
    hi_quantile: float = 0.8,
    _column_name_volatility: str = "realized_volatility",
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **vol_buckets_alt**.

    This is an alternative implementation of `vol_buckets()` using expressions,
    and not using `.qcut()`.
    The biggest difference is how the function groups values on the boundaries
    of quantiles. This function groups boundary values down
    Splitting data observations into 3 volatility buckets: low, mid and high.
    The function does this for each `symbol` present in the data.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The input dataframe or lazy frame.
    lo_quantile : float
        The lower quantile for bucketing. Default is 0.4.
    hi_quantile : float
        The higher quantile for bucketing. Default is 0.8.
    _column_name_volatility : str
        The name of the column to apply volatility bucketing. Default is "realized_volatility".

    Returns
    -------
    pl.LazyFrame
        The `data` with an additional column: `vol_bucket`

    Notes
    -----
    The biggest difference is how the function groups values on the boundaries
    of quantiles. This function __groups boundary values down__ to the lower bucket.
    So, if there is a value that lies on the mid/low border, this function will
    group it with `low`, whereas `vol_buckets()` will group it with `mid`

    This function is also slightly less performant.
    """
    # Calculate low and high quantiles for each symbol
    low_vol = pl.col(_column_name_volatility).quantile(lo_quantile)
    high_vol = pl.col(_column_name_volatility).quantile(hi_quantile)

    # Determine the volatility bucket for each row using expressions
    vol_bucket = (
        pl.when(pl.col(_column_name_volatility) <= low_vol)
        .then(pl.lit("low"))
        .when(pl.col(_column_name_volatility) <= high_vol)
        .then(pl.lit("mid"))
        .otherwise(pl.lit("high"))
        .alias("vol_bucket")
    )

    # Add the volatility bucket column to the data
    out = data.lazy().with_columns(vol_bucket.over("symbol"))

    return out
vol_filter ¤
vol_filter(data: DataFrame | LazyFrame) -> LazyFrame

Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: vol_filter.

If _rv_adjustment is True, then filter the data to only include rows that are in the same vol_bucket as the latest row for each symbol.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The input dataframe or lazy frame. This should be the output of vol_buckets() function in calc_mandelbrot_channel().

required

Returns:

Type Description
LazyFrame

The data with only observations in the same volatility bucket as the most recent data observation

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/helpers.py
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
def vol_filter(
    data: pl.DataFrame | pl.LazyFrame,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **vol_filter**.

    If `_rv_adjustment` is True, then filter the data to only include rows
    that are in the same vol_bucket as the latest row for each symbol.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input dataframe or lazy frame. This should be the output of
        `vol_buckets()` function in `calc_mandelbrot_channel()`.

    Returns
    -------
    pl.LazyFrame
        The data with only observations in the same volatility bucket as the
        most recent data observation
    """
    _check_required_columns(data, "vol_bucket", "symbol")

    data = data.lazy().with_columns(
        pl.col("vol_bucket").last().over("symbol").alias("last_vol_bucket")
    )

    out = data.filter(
        (pl.col("vol_bucket") == pl.col("last_vol_bucket")).over("symbol")
    ).drop("last_vol_bucket")

    return out
price_range ¤
price_range(data: LazyFrame | DataFrame, recent_price_data: DataFrame | LazyFrame | None = None, rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', _detrended_returns: str = 'detrended_log_returns', _column_name_cum_sum_max: str = 'cum_sum_max', _column_name_cum_sum_min: str = 'cum_sum_min', *, _rv_adjustment: bool = False, _sort: bool = True, **kwargs) -> LazyFrame

Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: price_range.

Calculate the price range for a given dataset using the Mandelbrot method.

This function computes the price range based on the recent price data, cumulative sum max and min, and RS method specified. It supports adjustments for real volatility and sorting of the data based on symbols and dates.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

The dataset containing the financial data.

required
recent_price_data DataFrame | LazyFrame | None

The dataset containing the most recent price data. If None, the most recent prices are extracted from data.

None
rs_method Literal['RS', 'RS_mean', 'RS_max', 'RS_min']

The RS value to use. Must be one of 'RS', 'RS_mean', 'RS_max', 'RS_min'. RS is the column that is the Range/STD of the detrended returns.

"RS"
_detrended_returns str

The column name for detrended returns in data

"detrended_log_returns"
_column_name_cum_sum_max str

The column name for cumulative sum max in data

"cum_sum_max"
_column_name_cum_sum_min str

The column name for cumulative sum min in data

"cum_sum_min"
_rv_adjustment bool

If True, calculated the std() for all observations (since they have already been filtered by volatility bucket). If False, then calculates the std() for the most recent window_index and uses that to adjust the price range.

False
_sort bool

If True, sorts the data based on symbols and dates.

True
**kwargs

Arbitrary keyword arguments.

{}

Returns:

Type Description
LazyFrame

The dataset with calculated price range, including columns for top and bottom prices.

Raises:

Type Description
HumblDataError

If the RS method specified is not supported.

Examples:

>>> price_range_data = price_range(data, recent_price_data=None, rs_method="RS")
>>> print(price_range_data.columns)
['symbol', 'bottom_price', 'recent_price', 'top_price']
Notes

For rs_method, you should know how this affects the mandelbrot channel that is produced. Selecting RS uses the most recent RS value to calculate the price range, whereas selecting RS_mean, RS_max, or RS_min uses the mean, max, or min of the RS values, respectively.

Source code in src/humbldata/toolbox/technical/mandelbrot_channel/helpers.py
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
def price_range(
    data: pl.LazyFrame | pl.DataFrame,
    recent_price_data: pl.DataFrame | pl.LazyFrame | None = None,
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    _detrended_returns: str = "detrended_log_returns",  # Parameterized detrended_returns column
    _column_name_cum_sum_max: str = "cum_sum_max",
    _column_name_cum_sum_min: str = "cum_sum_min",
    *,
    _rv_adjustment: bool = False,
    _sort: bool = True,
    **kwargs,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **price_range**.

    Calculate the price range for a given dataset using the Mandelbrot method.

    This function computes the price range based on the recent price data,
    cumulative sum max and min, and RS method specified. It supports adjustments
    for real volatility and sorting of the data based on symbols and dates.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The dataset containing the financial data.
    recent_price_data : pl.DataFrame | pl.LazyFrame | None
        The dataset containing the most recent price data. If None, the most recent prices are extracted from `data`.
    rs_method : Literal["RS", "RS_mean", "RS_max", "RS_min"], default "RS"
        The RS value to use. Must be one of 'RS', 'RS_mean', 'RS_max', 'RS_min'.
        RS is the column that is the Range/STD of the detrended returns.
    _detrended_returns : str, default "detrended_log_returns"
        The column name for detrended returns in `data`
    _column_name_cum_sum_max : str, default "cum_sum_max"
        The column name for cumulative sum max in `data`
    _column_name_cum_sum_min : str, default "cum_sum_min"
        The column name for cumulative sum min in `data`
    _rv_adjustment : bool, default False
        If True, calculated the `std()` for all observations (since they have
        already been filtered by volatility bucket). If False, then calculates
        the `std()` for the most recent `window_index`
        and uses that to adjust the price range.
    _sort : bool, default True
        If True, sorts the data based on symbols and dates.
    **kwargs
        Arbitrary keyword arguments.

    Returns
    -------
    pl.LazyFrame
        The dataset with calculated price range, including columns for top and
        bottom prices.

    Raises
    ------
    HumblDataError
        If the RS method specified is not supported.

    Examples
    --------
    >>> price_range_data = price_range(data, recent_price_data=None, rs_method="RS")
    >>> print(price_range_data.columns)
    ['symbol', 'bottom_price', 'recent_price', 'top_price']

    Notes
    -----
    For `rs_method`, you should know how this affects the mandelbrot channel
    that is produced. Selecting RS uses the most recent RS value to calculate
    the price range, whereas selecting RS_mean, RS_max, or RS_min uses the mean,
    max, or min of the RS values, respectively.
    """
    # Check if RS_method is one of the allowed values
    if rs_method not in RS_METHODS:
        msg = "RS_method must be one of 'RS', 'RS_mean', 'RS_max', 'RS_min'"
        raise HumblDataError(msg)

    if isinstance(data, pl.DataFrame):
        data = data.lazy()

    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort:
        data.sort(sort_cols)

    # Define Polars Expressions ================================================
    last_cum_sum_max = (
        pl.col(_column_name_cum_sum_max).last().alias("last_cum_sum_max")
    )
    last_cum_sum_min = (
        pl.col(_column_name_cum_sum_min).last().alias("last_cum_sum_min")
    )
    # Define a conditional expression for std_detrended_returns based on _rv_adjustment
    std_detrended_returns_expr = (
        pl.col(_detrended_returns).std().alias(f"std_{_detrended_returns}")
        if _rv_adjustment
        else pl.col(_detrended_returns)
        .filter(pl.col("window_index") == pl.col("window_index").min())
        .std()
        .alias(f"std_{_detrended_returns}")
    )
    # if rv_adjustment isnt used, then use the most recent window will be used
    # for calculating the price_range
    date_expr = pl.col("date").max()
    # ===========================================================================

    if rs_method == "RS":
        rs_expr = pl.col("RS").last().alias("RS")
    elif rs_method == "RS_mean":
        rs_expr = pl.col("RS").mean().alias("RS_mean")
    elif rs_method == "RS_max":
        rs_expr = pl.col("RS").max().alias("RS_max")
    elif rs_method == "RS_min":
        rs_expr = pl.col("RS").min().alias("RS_min")

    if recent_price_data is None:
        # if no recent_prices_data is passed, then pull the most recent prices from the data
        recent_price_expr = pl.col("close").last().alias("recent_price")
        # Perform a single group_by operation to calculate both STD of detrended returns and RS statistics
        price_range_data = (
            data.group_by("symbol")
            .agg(
                [
                    date_expr,
                    # Conditional STD calculation based on _rv_adjustment
                    std_detrended_returns_expr,
                    # Recent Price Data
                    recent_price_expr,
                    # cum_sum_max/min last
                    last_cum_sum_max,
                    last_cum_sum_min,
                    # RS statistics
                    rs_expr,
                ]
            )
            # Join with recent_price_data on symbol
            .with_columns(
                (
                    pl.col(rs_method)
                    * pl.col("std_detrended_log_returns")
                    * pl.col("recent_price")
                ).alias("price_range")
            )
            .sort("symbol")
        )
    else:
        price_range_data = (
            data.group_by("symbol")
            .agg(
                [
                    date_expr,
                    # Conditional STD calculation based on _rv_adjustment
                    std_detrended_returns_expr,
                    # cum_sum_max/min last
                    last_cum_sum_max,
                    last_cum_sum_min,
                    # RS statistics
                    rs_expr,
                ]
            )
            # Join with recent_price_data on symbol
            .join(recent_price_data.lazy(), on="symbol")
            .with_columns(
                (
                    pl.col(rs_method)
                    * pl.col("std_detrended_log_returns")
                    * pl.col("recent_price")
                ).alias("price_range")
            )
            .sort("symbol")
        )
    # Relative Position Modifier
    out = _price_range_engine(price_range_data)

    return out

volatility ¤

realized_volatility_helpers ¤

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers.

All of the volatility estimators used in calc_realized_volatility(). These are various methods to calculate the realized volatility of financial data.

std ¤
std(data: DataFrame | LazyFrame | Series, window: str = '1m', trading_periods=252, _drop_nulls: bool = True, _avg_trading_days: bool = False, _column_name_returns: str = 'log_returns', _sort: bool = True) -> LazyFrame | Series

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: _std.

This function computes the standard deviation of returns, which is a common measure of volatility.It calculates the rolling standard deviation for a given window size, optionally adjusting for the average number of trading days and scaling the result to an annualized volatility percentage.

Parameters:

Name Type Description Default
data Union[DataFrame, LazyFrame, Series]

The input data containing the returns. It can be a DataFrame, LazyFrame, or Series.

required
window str

The rolling window size for calculating the standard deviation. The default is "1m" (one month).

'1m'
trading_periods int

The number of trading periods in a year, used for annualizing the volatility. The default is 252.

252
_drop_nulls bool

If True, null values will be dropped from the result. The default is True.

True
_avg_trading_days bool

If True, the average number of trading days will be used when calculating the window size. The default is True.

False
_column_name_returns str

The name of the column containing the returns. This parameter is used when data is a DataFrame or LazyFrame. The default is "log_returns".

'log_returns'

Returns:

Type Description
Union[DataFrame, LazyFrame, Series]

The input data structure with an additional column for the rolling standard deviation of returns, or the modified Series with the rolling standard deviation values.

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
def std(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    window: str = "1m",
    trading_periods=252,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _column_name_returns: str = "log_returns",
    _sort: bool = True,
) -> pl.LazyFrame | pl.Series:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: _std**.

    This function computes the standard deviation of returns, which is a common
    measure of volatility.It calculates the rolling standard deviation for a
    given window size, optionally adjusting for the average number of trading
    days and scaling the result to an annualized volatility percentage.

    Parameters
    ----------
    data : Union[pl.DataFrame, pl.LazyFrame, pl.Series]
        The input data containing the returns. It can be a DataFrame, LazyFrame,
        or Series.
    window : str, optional
        The rolling window size for calculating the standard deviation.
        The default is "1m" (one month).
    trading_periods : int, optional
        The number of trading periods in a year, used for annualizing the
        volatility. The default is 252.
    _drop_nulls : bool, optional
        If True, null values will be dropped from the result.
        The default is True.
    _avg_trading_days : bool, optional
        If True, the average number of trading days will be used when
        calculating the window size. The default is True.
    _column_name_returns : str, optional
        The name of the column containing the returns. This parameter is used
        when `data` is a DataFrame or LazyFrame. The default is "log_returns".

    Returns
    -------
    Union[pl.DataFrame, pl.LazyFrame, pl.Series]
        The input data structure with an additional column for the rolling
        standard deviation of returns, or the modified Series with the rolling
        standard deviation values.
    """
    window_timedelta = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    )
    if isinstance(data, pl.Series):
        return data.rolling_std(
            window_size=window_timedelta.days, min_periods=1
        )
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)

    # convert window_timedelta to days to use fixed window
    result = data.lazy().with_columns(
        (
            pl.col(_column_name_returns).rolling_std_by(
                window_size=window_timedelta,
                min_periods=2,  # using min_periods=2, bc if min_periods=1, the first value will be 0.
                by="date",
            )
            * math.sqrt(trading_periods)
            * 100
        ).alias(f"std_volatility_pct_{window_timedelta.days}D")
    )
    if _drop_nulls:
        return result.drop_nulls(
            subset=f"std_volatility_pct_{window_timedelta.days}D"
        )
    return result
parkinson ¤
parkinson(data: DataFrame | LazyFrame, window: str = '1m', _column_name_high: str = 'high', _column_name_low: str = 'low', *, _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame

Calculate Parkinson's volatility over a specified window.

Parkinson's volatility is a measure that uses the stock's high and low prices of the day rather than just close to close prices. It is particularly useful for capturing large price movements during the day.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The input data containing the stock prices.

required
window int

The rolling window size for calculating volatility, by default 30.

'1m'
trading_periods int

The number of trading periods in a year, by default 252.

required
_column_name_high str

The name of the column containing the high prices, by default "high".

'high'
_column_name_low str

The name of the column containing the low prices, by default "low".

'low'
_drop_nulls bool

Whether to drop null values from the result, by default True.

True
_avg_trading_days bool

Whether to use the average number of trading days when calculating the window size, by default True.

False

Returns:

Type Description
DataFrame | LazyFrame

The calculated Parkinson's volatility, with an additional column "parkinson_volatility_pct_{window_int}D" indicating the percentage volatility.

Notes

This function requires the input data to have 'high' and 'low' columns to calculate the logarithm of their ratio, which is squared and scaled by a constant to estimate volatility. The result is then annualized and expressed as a percentage.

Usage

If you pass "1m as a window argument and _avg_trading_days=False. The result will be 30. If _avg_trading_days=True, the result will be 21.

Examples:

>>> data = pl.DataFrame({'high': [120, 125], 'low': [115, 120]})
>>> _parkinson(data)
A DataFrame with the calculated Parkinson's volatility.
Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
def parkinson(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    *,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Calculate Parkinson's volatility over a specified window.

    Parkinson's volatility is a measure that uses the stock's high and low prices
    of the day rather than just close to close prices. It is particularly useful
    for capturing large price movements during the day.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data containing the stock prices.
    window : int, optional
        The rolling window size for calculating volatility, by default 30.
    trading_periods : int, optional
        The number of trading periods in a year, by default 252.
    _column_name_high : str, optional
        The name of the column containing the high prices, by default "high".
    _column_name_low : str, optional
        The name of the column containing the low prices, by default "low".
    _drop_nulls : bool, optional
        Whether to drop null values from the result, by default True.
    _avg_trading_days : bool, optional
        Whether to use the average number of trading days when calculating the
        window size, by default True.

    Returns
    -------
    pl.DataFrame | pl.LazyFrame
        The calculated Parkinson's volatility, with an additional column
        "parkinson_volatility_pct_{window_int}D"
        indicating the percentage volatility.

    Notes
    -----
    This function requires the input data to have 'high' and 'low' columns to
    calculate
    the logarithm of their ratio, which is squared and scaled by a constant to
    estimate
    volatility. The result is then annualized and expressed as a percentage.

    Usage
    -----
    If you pass `"1m` as a `window` argument and  `_avg_trading_days=False`.
    The result will be `30`. If `_avg_trading_days=True`, the result will be
    `21`.

    Examples
    --------
    >>> data = pl.DataFrame({'high': [120, 125], 'low': [115, 120]})
    >>> _parkinson(data)
    A DataFrame with the calculated Parkinson's volatility.
    """
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)

    var1 = 1.0 / (4.0 * math.log(2.0))
    var2 = (
        data.lazy()
        .select((pl.col(_column_name_high) / pl.col(_column_name_low)).log())
        .collect()
        .to_series()
    )
    rs = var1 * var2**2

    window_int: int = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    ).days
    result = data.lazy().with_columns(
        (
            rs.rolling_map(_annual_vol, window_size=window_int, min_periods=1)
            * 100
        ).alias(f"parkinson_volatility_pct_{window_int}D")
    )
    if _drop_nulls:
        return result.drop_nulls(
            subset=f"parkinson_volatility_pct_{window_int}D"
        )

    return result
garman_klass ¤
garman_klass(data: DataFrame | LazyFrame, window: str = '1m', _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', _column_name_close: str = 'close', _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: _garman_klass.

Calculates the Garman-Klass volatility for a given dataset.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The input data containing the price information.

required
window str

The rolling window size for volatility calculation, by default "1m".

'1m'
_column_name_high str

The name of the column containing the high prices, by default "high".

'high'
_column_name_low str

The name of the column containing the low prices, by default "low".

'low'
_column_name_open str

The name of the column containing the opening prices, by default "open".

'open'
_column_name_close str

The name of the column containing the adjusted closing prices, by default "close".

'close'
_drop_nulls bool

Whether to drop null values from the result, by default True.

True
_avg_trading_days bool

Whether to use the average number of trading days when calculating the window size, by default True.

False

Returns:

Type Description
DataFrame | LazyFrame | Series

The calculated Garman-Klass volatility, with an additional column "volatility_pct" indicating the percentage volatility.

Notes

Garman-Klass volatility extends Parkinson’s volatility by considering the opening and closing prices in addition to the high and low prices. This approach provides a more accurate estimation of volatility, especially in markets with significant activity at the opening and closing of trading sessions.

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
def garman_klass(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    _column_name_close: str = "close",
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: _garman_klass**.

    Calculates the Garman-Klass volatility for a given dataset.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data containing the price information.
    window : str, optional
        The rolling window size for volatility calculation, by default "1m".
    _column_name_high : str, optional
        The name of the column containing the high prices, by default "high".
    _column_name_low : str, optional
        The name of the column containing the low prices, by default "low".
    _column_name_open : str, optional
        The name of the column containing the opening prices, by default "open".
    _column_name_close : str, optional
        The name of the column containing the adjusted closing prices, by
        default "close".
    _drop_nulls : bool, optional
        Whether to drop null values from the result, by default True.
    _avg_trading_days : bool, optional
        Whether to use the average number of trading days when calculating the
        window size, by default True.

    Returns
    -------
    pl.DataFrame | pl.LazyFrame | pl.Series
        The calculated Garman-Klass volatility, with an additional column
        "volatility_pct" indicating the percentage volatility.

    Notes
    -----
    Garman-Klass volatility extends Parkinson’s volatility by considering the
    opening and closing prices in addition to the high and low prices. This
    approach provides a more accurate estimation of volatility, especially in
    markets with significant activity at the opening and closing of trading
    sessions.
    """
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)
    log_hi_lo = (
        data.lazy()
        .select((pl.col(_column_name_high) / pl.col(_column_name_low)).log())
        .collect()
        .to_series()
    )
    log_close_open = (
        data.lazy()
        .select((pl.col(_column_name_close) / pl.col(_column_name_open)).log())
        .collect()
        .to_series()
    )
    rs: pl.Series = 0.5 * log_hi_lo**2 - (2 * np.log(2) - 1) * log_close_open**2

    window_int: int = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    ).days
    result = data.lazy().with_columns(
        (
            rs.rolling_map(_annual_vol, window_size=window_int, min_periods=1)
            * 100
        ).alias(f"gk_volatility_pct_{window_int}D")
    )
    if _drop_nulls:
        return result.drop_nulls(subset=f"gk_volatility_pct_{window_int}D")
    return result
hodges_tompkins ¤
hodges_tompkins(data: DataFrame | LazyFrame | Series, window: str = '1m', trading_periods=252, _column_name_returns: str = 'log_returns', *, _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame | Series

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: _hodges_tompkins.

Hodges-Tompkins volatility is a bias correction for estimation using an overlapping data sample that produces unbiased estimates and a substantial gain in efficiency.

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
def hodges_tompkins(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    window: str = "1m",
    trading_periods=252,
    _column_name_returns: str = "log_returns",
    *,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame | pl.Series:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: _hodges_tompkins**.

    Hodges-Tompkins volatility is a bias correction for estimation using an
    overlapping data sample that produces unbiased estimates and a
    substantial gain in efficiency.
    """
    # When calculating rv_mean, need a different adjustment factor,
    # so window doesn't influence the Volatility_mean
    # RV_MEAN

    # Define Window Size
    window_timedelta = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    )
    # Calculate STD, assigned to `vol`
    if isinstance(data, pl.Series):
        vol = data.rolling_std(window_size=window_timedelta.days, min_periods=1)
    else:
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort and sort_cols:
            data = data.lazy().sort(sort_cols)
            for col in sort_cols:
                data = data.set_sorted(col)
        vol = data.lazy().select(
            pl.col(_column_name_returns).rolling_std_by(
                window_size=window_timedelta, min_periods=1, by="date"
            )
            * np.sqrt(trading_periods)
        )

    # Assign window size to h for adjustment
    h: int = window_timedelta.days

    if isinstance(data, pl.Series):
        count = data.len()
    elif isinstance(data, pl.LazyFrame):
        count = data.collect().shape[0]
    else:
        count = data.shape[0]

    n = (count - h) + 1
    adj_factor = 1.0 / (1.0 - (h / n) + ((h**2 - 1) / (3 * n**2)))

    if isinstance(data, pl.Series):
        return (vol * adj_factor) * 100
    else:
        result = data.lazy().with_columns(
            ((vol.collect() * adj_factor) * 100)
            .to_series()
            .alias(f"ht_volatility_pct_{h}D")
        )
    if _drop_nulls:
        result = result.drop_nulls(subset=f"ht_volatility_pct_{h}D")
    return result
rogers_satchell ¤
rogers_satchell(data: DataFrame | LazyFrame, window: str = '1m', _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', _column_name_close: str = 'close', _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: _rogers_satchell.

Rogers-Satchell is an estimator for measuring the volatility of securities with an average return not equal to zero. Unlike Parkinson and Garman-Klass estimators, Rogers-Satchell incorporates a drift term (mean return not equal to zero). This function calculates the Rogers-Satchell volatility estimator over a specified window and optionally drops null values from the result.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The input data for which to calculate the Rogers-Satchell volatility estimator. This can be either a DataFrame or a LazyFrame. There need to be OHLC columns present in the data.

required
window str

The window over which to calculate the volatility estimator. The window is specified as a string, such as "1m" for one month.

"1m"
_column_name_high str

The name of the column representing the high prices in the data.

"high"
_column_name_low str

The name of the column representing the low prices in the data.

"low"
_column_name_open str

The name of the column representing the opening prices in the data.

"open"
_column_name_close str

The name of the column representing the adjusted closing prices in the data.

"close"
_drop_nulls bool

Whether to drop null values from the result. If True, rows with null values in the calculated volatility column will be removed from the output.

True
_avg_trading_days bool

Indicates whether to use the average number of trading days per window. This affects how the window size is interpreted. i.e instead of "1mo" returning timedelta(days=31), it will return timedelta(days=21).

True

Returns:

Type Description
DataFrame | LazyFrame

The input data with an additional column containing the calculated Rogers-Satchell volatility estimator. The return type matches the input type (DataFrame or LazyFrame).

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
def rogers_satchell(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    _column_name_close: str = "close",
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: _rogers_satchell**.

    Rogers-Satchell is an estimator for measuring the volatility of
    securities with an average return not equal to zero. Unlike Parkinson
    and Garman-Klass estimators, Rogers-Satchell incorporates a drift term
    (mean return not equal to zero). This function calculates the
    Rogers-Satchell volatility estimator over a specified window and optionally
    drops null values from the result.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data for which to calculate the Rogers-Satchell volatility
        estimator. This can be either a DataFrame or a LazyFrame. There need to
        be OHLC columns present in the data.
    window : str, default "1m"
        The window over which to calculate the volatility estimator. The
        window is specified as a string, such as "1m" for one month.
    _column_name_high : str, default "high"
        The name of the column representing the high prices in the data.
    _column_name_low : str, default "low"
        The name of the column representing the low prices in the data.
    _column_name_open : str, default "open"
        The name of the column representing the opening prices in the data.
    _column_name_close : str, default "close"
        The name of the column representing the adjusted closing prices in the
        data.
    _drop_nulls : bool, default True
        Whether to drop null values from the result. If True, rows with null
        values in the calculated volatility column will be removed from the
        output.
    _avg_trading_days : bool, default True
        Indicates whether to use the average number of trading days per window.
        This affects how the window size is interpreted. i.e instead of "1mo"
        returning `timedelta(days=31)`, it will return `timedelta(days=21)`.

    Returns
    -------
    pl.DataFrame | pl.LazyFrame
        The input data with an additional column containing the calculated
        Rogers-Satchell volatility estimator. The return type matches the input
        type (DataFrame or LazyFrame).
    """
    # Check if all required columns are present in the DataFrame
    _check_required_columns(
        data,
        _column_name_high,
        _column_name_low,
        _column_name_open,
        _column_name_close,
    )
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)
    # assign window
    window_int: int = _window_format(
        window=window,
        _return_timedelta=True,
        _avg_trading_days=_avg_trading_days,
    ).days

    data = (
        data.lazy()
        .with_columns(
            [
                (pl.col(_column_name_high) / pl.col(_column_name_open))
                .log()
                .alias("log_ho"),
                (pl.col(_column_name_low) / pl.col(_column_name_open))
                .log()
                .alias("log_lo"),
                (pl.col(_column_name_close) / pl.col(_column_name_open))
                .log()
                .alias("log_co"),
            ]
        )
        .with_columns(
            (
                pl.col("log_ho") * (pl.col("log_ho") - pl.col("log_co"))
                + pl.col("log_lo") * (pl.col("log_lo") - pl.col("log_co"))
            ).alias("rs")
        )
    )
    result = data.lazy().with_columns(
        (
            pl.col("rs").rolling_map(
                _annual_vol, window_size=window_int, min_periods=1
            )
            * 100
        ).alias(f"rs_volatility_pct_{window_int}D")
    )
    if _drop_nulls:
        result = result.drop_nulls(subset=f"rs_volatility_pct_{window_int}D")
    return result
yang_zhang ¤
yang_zhang(data: DataFrame | LazyFrame, window: str = '1m', trading_periods: int = 252, _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', _column_name_close: str = 'close', _avg_trading_days: bool = False, _drop_nulls: bool = True, _sort: bool = True) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: _yang_zhang.

Yang-Zhang volatility is the combination of the overnight (close-to-open volatility), a weighted average of the Rogers-Satchell volatility and the day’s open-to-close volatility.

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
def yang_zhang(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    trading_periods: int = 252,
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    _column_name_close: str = "close",
    _avg_trading_days: bool = False,
    _drop_nulls: bool = True,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: _yang_zhang**.

    Yang-Zhang volatility is the combination of the overnight
    (close-to-open volatility), a weighted average of the Rogers-Satchell
    volatility and the day’s open-to-close volatility.
    """
    # check required columns
    _check_required_columns(
        data,
        _column_name_high,
        _column_name_low,
        _column_name_open,
        _column_name_close,
    )
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)

    # assign window
    window_int: int = _window_format(
        window=window,
        _return_timedelta=True,
        _avg_trading_days=_avg_trading_days,
    ).days

    data = (
        data.lazy()
        .with_columns(
            [
                (pl.col(_column_name_high) / pl.col(_column_name_open))
                .log()
                .alias("log_ho"),
                (pl.col(_column_name_low) / pl.col(_column_name_open))
                .log()
                .alias("log_lo"),
                (pl.col(_column_name_close) / pl.col(_column_name_open))
                .log()
                .alias("log_co"),
                (pl.col(_column_name_open) / pl.col(_column_name_close).shift())
                .log()
                .alias("log_oc"),
                (
                    pl.col(_column_name_close)
                    / pl.col(_column_name_close).shift()
                )
                .log()
                .alias("log_cc"),
            ]
        )
        .with_columns(
            [
                (pl.col("log_oc") ** 2).alias("log_oc_sq"),
                (pl.col("log_cc") ** 2).alias("log_cc_sq"),
                (
                    pl.col("log_ho") * (pl.col("log_ho") - pl.col("log_co"))
                    + pl.col("log_lo") * (pl.col("log_lo") - pl.col("log_co"))
                ).alias("rs"),
            ]
        )
    )

    k = 0.34 / (1.34 + (window_int + 1) / (window_int - 1))
    data = _yang_zhang_engine(data=data, window=window_int)
    result = (
        data.lazy()
        .with_columns(
            (
                (
                    pl.col("open_vol")
                    + k * pl.col("close_vol")
                    + (1 - k) * pl.col("window_rs")
                ).sqrt()
                * np.sqrt(trading_periods)
                * 100
            ).alias(f"yz_volatility_pct_{window_int}D")
        )
        .select(
            pl.exclude(
                [
                    "log_ho",
                    "log_lo",
                    "log_co",
                    "log_oc",
                    "log_cc",
                    "log_oc_sq",
                    "log_cc_sq",
                    "rs",
                    "close_vol",
                    "open_vol",
                    "window_rs",
                ]
            )
        )
    )
    if _drop_nulls:
        return result.drop_nulls(subset=f"yz_volatility_pct_{window_int}D")
    return result
squared_returns ¤
squared_returns(data: DataFrame | LazyFrame, window: str = '1m', trading_periods: int = 252, _drop_nulls: bool = True, _avg_trading_days: bool = False, _column_name_returns: str = 'log_returns', _sort: bool = True) -> LazyFrame

Calculate squared returns over a rolling window.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The input data containing the price information.

required
window str

The rolling window size for calculating squared returns, by default "1m".

'1m'
trading_periods int

The number of trading periods in a year, used for scaling the result. The default is 252.

252
_drop_nulls bool

Whether to drop null values from the result, by default True.

True
_column_name_returns str

The name of the column containing the price data, by default "close".

'log_returns'

Returns:

Type Description
DataFrame | LazyFrame

The input data structure with an additional column for the rolling squared returns.

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_helpers.py
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
def squared_returns(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    trading_periods: int = 252,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _column_name_returns: str = "log_returns",
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Calculate squared returns over a rolling window.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data containing the price information.
    window : str, optional
        The rolling window size for calculating squared returns, by default "1m".
    trading_periods : int, optional
        The number of trading periods in a year, used for scaling the result.
        The default is 252.
    _drop_nulls : bool, optional
        Whether to drop null values from the result, by default True.
    _column_name_returns : str, optional
        The name of the column containing the price data, by default "close".

    Returns
    -------
    pl.DataFrame | pl.LazyFrame
        The input data structure with an additional column for the rolling
        squared returns.
    """
    _check_required_columns(data, _column_name_returns)

    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort and sort_cols:
        data = data.lazy().sort(sort_cols)
        for col in sort_cols:
            data = data.set_sorted(col)

    # assign window
    window_int: int = _window_format(
        window=window,
        _return_timedelta=True,
        _avg_trading_days=_avg_trading_days,
    ).days

    data = data.lazy().with_columns(
        ((pl.col(_column_name_returns) * 100) ** 2).alias("sq_log_returns_pct")
    )
    # Calculate rolling squared returns
    result = (
        data.lazy()
        .with_columns(
            pl.col("sq_log_returns_pct")
            .rolling_mean(window_size=window_int, min_periods=1)
            .alias(f"sq_volatility_pct_{window_int}D")
        )
        .drop("sq_log_returns_pct")
    )
    if _drop_nulls:
        result = result.drop_nulls(subset=f"sq_volatility_pct_{window_int}D")
    return result
realized_volatility_model ¤

Context: Toolbox || Category: Technical || Command: calc_realized_volatility.

A command to generate Realized Volatility for any time series. A complete set of volatility estimators based on Euan Sinclair's Volatility Trading

calc_realized_volatility ¤
calc_realized_volatility(data: DataFrame | LazyFrame, window: str = '1m', method: Literal['std', 'parkinson', 'garman_klass', 'gk', 'hodges_tompkins', 'ht', 'rogers_satchell', 'rs', 'yang_zhang', 'yz', 'squared_returns', 'sq'] = 'std', grouped_mean: list[int] | None = None, _trading_periods: int = 252, _column_name_returns: str = 'log_returns', _column_name_close: str = 'close', _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', *, _sort: bool = True) -> LazyFrame | DataFrame

Context: Toolbox || Category: Technical || Command: calc_realized_volatility.

Calculates the Realized Volatility for a given time series based on the provided standard and extra parameters. This function adds ONE rolling volatility column to the input DataFrame.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The time series data for which to calculate the Realized Volatility.

required
window str

The window size for a rolling volatility calculation, default is "1m" (1 month).

'1m'
method Literal['std', 'parkinson', 'garman_klass', 'hodges_tompkins', 'rogers_satchell', 'yang_zhang', 'squared_returns']

The volatility estimator to use. You can also use abbreviations to access the same methods. The abbreviations are: gk for garman_klass, ht for hodges_tompkins, rs for rogers_satchell, yz for yang_zhang, sq for squared_returns.

'std'
grouped_mean list[int] | None

A list of window sizes to use for calculating volatility. If provided, the volatility method will be calculated across these various windows, and then an averaged value of all the windows will be returned. If None, a single window size specified by window parameter will be used.

None
_sort bool

If True, the data will be sorted before calculation. Default is True.

True
_trading_periods int

The number of trading periods in a year, default is 252 (the typical number of trading days in a year).

252
_column_name_returns str

The name of the column containing the returns. Default is "log_returns".

'log_returns'
_column_name_close str

The name of the column containing the close prices. Default is "close".

'close'
_column_name_high str

The name of the column containing the high prices. Default is "high".

'high'
_column_name_low str

The name of the column containing the low prices. Default is "low".

'low'
_column_name_open str

The name of the column containing the open prices. Default is "open".

'open'

Returns:

Type Description
VolatilityData

The calculated Realized Volatility data for the given time series.

Notes
  • Rolling calculations are used to show a time series of recent volatility that captures only a certain number of data points. The window size is used to determine the number of data points to use in the calculation. We do this because when looking at the volatility of a stock, you get a better insight (more granular) into the characteristics of the volatility seeing how 1-month or 3-month rolling volatility looked over time.

  • This function does not accept pl.Series because the methods used to calculate volatility require, high, low, close, open columns for the data. It would be too cumbersome to pass each series needed for the calculation as a separate argument. Therefore, the function only accepts pl.DataFrame or pl.LazyFrame as input.

Source code in src/humbldata/toolbox/technical/volatility/realized_volatility_model.py
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
def calc_realized_volatility(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    method: Literal[  # used to be rvol_method
        "std",
        "parkinson",
        "garman_klass",
        "gk",
        "hodges_tompkins",
        "ht",
        "rogers_satchell",
        "rs",
        "yang_zhang",
        "yz",
        "squared_returns",
        "sq",
    ] = "std",
    grouped_mean: list[int] | None = None,  # used to be rv_mean
    _trading_periods: int = 252,
    _column_name_returns: str = "log_returns",
    _column_name_close: str = "close",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    *,
    _sort: bool = True,
) -> pl.LazyFrame | pl.DataFrame:
    """
    Context: Toolbox || Category: Technical || **Command: calc_realized_volatility**.

    Calculates the Realized Volatility for a given time series based on the
    provided standard and extra parameters. This function adds ONE rolling
    volatility column to the input DataFrame.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The time series data for which to calculate the Realized Volatility.
    window : str
        The window size for a rolling volatility calculation, default is `"1m"`
        (1 month).
    method : Literal["std", "parkinson", "garman_klass", "hodges_tompkins","rogers_satchell", "yang_zhang", "squared_returns"]
        The volatility estimator to use. You can also use abbreviations to
        access the same methods. The abbreviations are: `gk` for `garman_klass`,
        `ht` for `hodges_tompkins`, `rs` for `rogers_satchell`, `yz` for
        `yang_zhang`, `sq` for `squared_returns`.
    grouped_mean : list[int] | None
        A list of window sizes to use for calculating volatility. If provided,
        the volatility method will be calculated across these various windows,
        and then an averaged value of all the windows will be returned. If `None`,
        a single window size specified by `window` parameter will be used.
    _sort : bool
        If True, the data will be sorted before calculation. Default is True.
    _trading_periods : int
        The number of trading periods in a year, default is 252 (the typical
        number of trading days in a year).
    _column_name_returns : str
        The name of the column containing the returns. Default is "log_returns".
    _column_name_close : str
        The name of the column containing the close prices. Default is "close".
    _column_name_high : str
        The name of the column containing the high prices. Default is "high".
    _column_name_low : str
        The name of the column containing the low prices. Default is "low".
    _column_name_open : str
        The name of the column containing the open prices. Default is "open".

    Returns
    -------
    VolatilityData
        The calculated Realized Volatility data for the given time series.

    Notes
    -----
    - Rolling calculations are used to show a time series of recent volatility
    that captures only a certain number of data points. The window size is
    used to determine the number of data points to use in the calculation. We do
    this because when looking at the volatility of a stock, you get a better
    insight (more granular) into the characteristics of the volatility seeing how 1-month or
    3-month rolling volatility looked over time.

    - This function does not accept `pl.Series` because the methods used to
    calculate volatility require, high, low, close, open columns for the data.
    It would be too cumbersome to pass each series needed for the calculation
    as a separate argument. Therefore, the function only accepts `pl.DataFrame`
    or `pl.LazyFrame` as input.
    """  # noqa: W505
    # Step 1: Get the correct realized volatility function =====================
    func = VOLATILITY_METHODS.get(method)
    if not func:
        msg = f"Volatility method: '{method}' is not supported."
        raise HumblDataError(msg)

    # Step 2: Get the names of the parameters that the function accepts ========
    func_params = inspect.signature(func).parameters

    # Step 3: Filter out the parameters not accepted by the function ===========
    args_to_pass = {
        key: value for key, value in locals().items() if key in func_params
    }

    # Step 4: Calculate Realized Volatility ====================================
    if grouped_mean:
        # calculate volatility over multiple windows and average the result, add to a new column
        print("🚧 WIP!")
    else:
        out = func(**args_to_pass)

    return out