SignalEncodings

SignalEncodings.jl is a Julia package for discretizing numeric signals into bins (a.k.a. quantization).

It provides a common interface for converting continuous values into integer bin indices using three different strategies:

  • Uniform: equally spaced bins between the minimum and maximum values
  • Quantile: bins based on empirical quantiles
  • Jenks: iterative natural-breaks binning that minimizes within-bin deviation

The package is designed to work with several input layouts, including:

  • scalar vectors
  • tabular data (n_samples × n_features matrices)
  • time series stored as cells of vectors
  • images stored as cells of matrices
  • arbitrary tensors stored as cells of N-D arrays
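The layouts above can be built with plain Julia arrays. This sketch assumes that "cells" means a Matrix whose entries are themselves arrays, which matches the encode(config, X::Matrix{<:AbstractArray{T}}) method documented in the API section. The variable names and sizes are illustrative only.

```julia
# Illustrative inputs for each layout (names and sizes are arbitrary).
x_vec  = rand(Float32, 1_000)                                # scalar vector
X_tab  = rand(Float32, 100, 4)                               # 100 samples × 4 features
X_ts   = [rand(Float32, 50) for i in 1:100, j in 1:4]        # cells of vectors (time series)
X_img  = [rand(Float32, 28, 28) for i in 1:100, j in 1:2]    # cells of matrices (images)
X_tens = [rand(Float32, 8, 8, 3) for i in 1:100, j in 1:1]   # cells of N-D arrays (tensors)

# A comprehension over two iterators yields a Matrix, so every cell layout
# is itself an n_samples × n_features matrix whose entries are arrays.
@show size(X_ts) typeof(X_ts)
```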

Quick start

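using SignalEncodings

X = rand(Float32, 100, 4)   # 100 samples × 4 features

# Equal-width bins between each feature's minimum and maximum
config = Uniform(; nbins=16)
X_bin, edges = encode(config, X)

# Bins at empirical quantiles; type selects the interpolation scheme
config = Quantile(; nbins=16, type=:linear)
X_bin, edges = encode(config, X)

# Jenks natural breaks, scoring classes by the sum of absolute deviations (:l1)
config = Jenks(; nbins=16, errornorm=:l1)
X_bin, edges = encode(config, X)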

Available algorithms

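| Config   | Strategy                                        | Main parameters                 |
|----------|-------------------------------------------------|---------------------------------|
| Uniform  | Linearly spaced edges between min and max       | nbins                           |
| Quantile | Edges at empirical quantiles                    | nbins, type                     |
| Jenks    | Iterative optimization of within-bin deviation  | nbins, maxiter, flux, errornorm |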

All configurations share a common interface through nbins, max_nobs, and rng.

Output format

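encode(config, X) returns:

  • X_bin: 1-based bin indices stored as UInt8 (for tabular input, one vector per feature)
  • edges: one edge vector per feature

For multidimensional inputs (cells of vectors, matrices, or N-D arrays), the original shape of every item is preserved.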

Documentation

  • See Algorithms for method details.
  • See the API documentation below for full type and function references.
SignalEncodings.AlphaBetaQuantiles (Constant)

Mapping of quantile interpolation type to (alpha, beta) used by Statistics.quantile.

Supported types:

  • :linear => (1.0, 1.0) (default)
  • :inverted => (0.0, 0.0)
  • :average => (0.0, 1.0)
  • :median => (1//3, 1//3)
  • :normal => (3//8, 3//8)
  • :matlab => (0.5, 0.5)
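To see what the (alpha, beta) pairs change, the pairs can be passed straight to Statistics.quantile, which accepts alpha and beta as keyword arguments. The dictionary below is a local copy of the documented mapping made for illustration; it is not imported from the package.

```julia
using Statistics

# Local copy of the documented mapping, for illustration only.
alpha_beta = Dict(
    :linear   => (1.0, 1.0),
    :inverted => (0.0, 0.0),
    :average  => (0.0, 1.0),
    :median   => (1/3, 1/3),
    :normal   => (3/8, 3/8),
    :matlab   => (0.5, 0.5),
)

x = collect(1.0:10.0)
for kind in (:linear, :inverted, :matlab)
    α, β = alpha_beta[kind]
    # Statistics.quantile takes the interpolation parameters as keywords.
    println(kind, " => ", quantile(x, 0.25; alpha=α, beta=β))
end
```

For this input the first quartile is 3.25 with :linear, 2.75 with :inverted, and 3.0 with :matlab, so the choice of type shifts the learned edges even on evenly spaced data.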

SignalEncodings.JenksErrNorm (Constant)

Mapping of Jenks error norms to deviation functions.

  • :l1 uses lin_deviation (sum of absolute deviations)
  • :l2 uses sq_deviation (sum of squared deviations)

SignalEncodings.Jenks (Type)
Jenks(; nbins=64, maxiter=200, flux=0.1, fluxadjust=1.03,
        fluxadjust_bothways=true, errornorm=:l1, max_nobs=1000,
        rng=Xoshiro(42))
Jenks <: AbstractBinningConfig

Jenks-style iterative discretization config.

Fields:

  • nbins::UInt8: number of bins (2 ≤ nbins ≤ 255).
  • maxiter::Int: maximum optimization iterations.
  • flux::Real: initial boundary-shift ratio.
  • fluxadjust::Real: multiplicative flux adaptation factor.
  • fluxadjust_bothways::Bool: allow both increase/decrease of flux.
  • errornorm::Base.Callable: deviation function (lin_deviation or sq_deviation).
  • max_nobs::Int: shared-interface sampling parameter.
  • rng::AbstractRNG: shared-interface RNG parameter.

Constructor keywords:

  • nbins, maxiter, flux, fluxadjust, fluxadjust_bothways.
  • errornorm: :l1 or :l2.
  • max_nobs, rng.

SignalEncodings.Quantile (Type)
Quantile(; type=:linear, nbins=64, max_nobs=1000, rng=Xoshiro(42))
Quantile <: AbstractBinningConfig

Quantile-based discretization config.

Fields:

  • nbins::UInt8: number of bins (2 ≤ nbins ≤ 255).
  • alpha::Float16, beta::Float16: quantile interpolation parameters.
  • max_nobs::Int: sampling budget per bin for quantile estimation.
  • rng::AbstractRNG: RNG for reproducible subsampling.

Constructor keywords:

  • type: interpolation mode key in AlphaBetaQuantiles.
  • nbins: number of bins.
  • max_nobs: observations-per-bin cap factor.
  • rng: random generator used when sampling is required.

SignalEncodings.Uniform (Type)
Uniform(; nbins=64, max_nobs=1000, rng=Xoshiro(42))
Uniform <: AbstractBinningConfig

Uniform-width discretization config.

Fields:

  • nbins::UInt8: number of bins (2 ≤ nbins ≤ 255).
  • max_nobs::Int: sampling budget per bin used by edge estimation (effective cap: max_nobs * nbins).
  • rng::AbstractRNG: RNG for reproducible subsampling.

Constructor keywords:

  • nbins: number of bins.
  • max_nobs: observations-per-bin cap factor.
  • rng: random generator used when sampling is required.

SignalEncodings.encode (Method)
encode(config::Jenks, x)

Discretize a numeric vector x with an iterative Jenks-style optimization.

The algorithm adjusts class breaks to reduce within-bin deviation, using the configured deviation function (errornorm) and the flux parameters. Returns (x_bin, edges), where x_bin holds 1-based bin indices and edges holds the learned break values.
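The loop below is a simplified, hypothetical sketch of Jenks-style break adjustment, not the package's implementation. It starts from equal-count classes, repeatedly shrinks the class with the largest deviation by handing a flux-sized share of its values to a neighbour, and multiplies or divides flux by fluxadjust depending on whether the move lowered the total deviation (always adjusting both ways, as with fluxadjust_bothways=true). The names jenks_sketch and lin_dev are made up for illustration, and the sketch returns only the internal break values.

```julia
using Statistics

# Hypothetical L1 deviation: sum of absolute deviations from the mean.
lin_dev(v) = sum(abs.(v .- mean(v)))

# Simplified Jenks-style optimizer (illustrative sketch only).
# Class k covers the sorted positions cuts[k]+1 : cuts[k+1].
function jenks_sketch(x::AbstractVector{<:Real}, nbins::Integer;
                      maxiter=200, flux=0.1, fluxadjust=1.03, dev=lin_dev)
    xs = sort(x)
    n = length(xs)
    n >= nbins || throw(ArgumentError("need at least nbins observations"))
    cuts = [div(k * n, nbins) for k in 0:nbins]      # equal-count starting classes
    classdev(c, k) = dev(view(xs, c[k]+1:c[k+1]))
    total(c) = sum(classdev(c, k) for k in 1:nbins)
    best = total(cuts)
    for _ in 1:maxiter
        devs = [classdev(cuts, k) for k in 1:nbins]
        k = argmax(devs)                              # class with the largest deviation
        size_k = cuts[k+1] - cuts[k]
        size_k < 2 && break                           # nothing left to move
        step = clamp(round(Int, flux * size_k), 1, size_k - 1)
        trial = copy(cuts)
        # Hand values to the neighbour with the smaller deviation.
        if k > 1 && (k == nbins || devs[k-1] <= devs[k+1])
            trial[k] += step                          # lowest values move to class k-1
        else
            trial[k+1] -= step                        # highest values move to class k+1
        end
        err = total(trial)
        if err < best
            cuts, best = trial, err
            flux *= fluxadjust                        # success: take larger steps
        else
            step == 1 && break                        # smallest possible move failed: stop
            flux /= fluxadjust                        # failure: take smaller steps
        end
    end
    return [xs[cuts[k+1]] for k in 1:nbins-1]         # upper value of every class but the last
end

jenks_sketch([100.0, 1, 3, 2, 101, 4], 2)   # [4.0]
```

In the usage line the equal-count start puts 4 in the high class; the first iteration moves it down, cutting the total L1 deviation from about 130.7 to 5, and the next attempted move is rejected.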

SignalEncodings.encode (Method)
encode(config::Quantile, x)

Discretize a numeric vector x using quantile-based bins.

Internal edges are computed from quantiles of sampled observations (see get_idxs), with interpolation controlled by the alpha and beta fields of config. Returns (x_bin, edges), where x_bin contains 1-based bin indices.
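A minimal sketch of the idea, assuming internal edges at the probabilities 1/nbins, ..., (nbins-1)/nbins and left-closed bins; quantile_bins_sketch is a hypothetical name, and the get_idxs subsampling step is skipped for brevity.

```julia
using Statistics

# Hypothetical sketch of quantile binning (not the package's internal code).
function quantile_bins_sketch(x::AbstractVector{<:Real}, nbins::Integer;
                              alpha=1.0, beta=1.0)
    # nbins - 1 internal edges at probabilities 1/nbins, ..., (nbins-1)/nbins
    probs = (1:nbins-1) ./ nbins
    edges = quantile(x, probs; alpha=alpha, beta=beta)
    # Assumed convention: bin k holds values in [edges[k-1], edges[k]), so indices run 1..nbins
    x_bin = UInt8[searchsortedlast(edges, v) + 1 for v in x]
    return x_bin, edges
end

x_bin, edges = quantile_bins_sketch(collect(1.0:8.0), 4)
# edges ≈ [2.75, 4.5, 6.25]; x_bin == UInt8[1, 1, 2, 2, 3, 3, 4, 4]
```

With the default (alpha, beta) = (1, 1), each of the four bins receives two of the eight values, which is the equal-frequency behaviour quantile binning aims for.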

SignalEncodings.encode (Method)
encode(config::Uniform, x)

Discretize a numeric vector x into uniformly spaced bins.

Edges are linearly spaced between minimum(x) and maximum(x). Returns:

  • x_bin::Vector{UInt8}: 1-based bin index for each value in x
  • edges::Vector: bin edge values used for discretization
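A minimal sketch of uniform-width binning under two assumptions: edges include both endpoints (nbins + 1 values), and bins are left-closed with the maximum assigned to the last bin. uniform_bins_sketch is a hypothetical name.

```julia
# Hypothetical sketch of uniform-width binning (not the package's internal code).
function uniform_bins_sketch(x::AbstractVector{<:Real}, nbins::Integer)
    lo, hi = extrema(x)
    edges = collect(LinRange(lo, hi, nbins + 1))   # nbins + 1 equally spaced edges
    inner = view(edges, 2:nbins)                   # the nbins - 1 internal boundaries
    # Values below inner[1] map to bin 1; values at or above inner[end]
    # (including the maximum) map to bin nbins.
    x_bin = UInt8[searchsortedlast(inner, v) + 1 for v in x]
    return x_bin, edges
end

x_bin, edges = uniform_bins_sketch([0.0, 0.1, 0.5, 0.9, 1.0], 4)
# edges == [0.0, 0.25, 0.5, 0.75, 1.0]; x_bin == UInt8[1, 1, 3, 4, 4]
```

Searching only the internal boundaries keeps every index in 1..nbins without a separate clamp for the maximum value.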

SignalEncodings.encode (Method)
encode(config, X::AbstractArray{T})

Feature-wise binning for tabular numeric data (n_samples × n_features).

Each column is binned independently (threaded), returning:

  • X_bin::Vector{Vector{UInt8}}: one binned vector per feature
  • edges::Vector{Vector}: one edge vector per feature
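The column-wise, threaded pattern can be sketched as follows. encode_columns_sketch is hypothetical, and median_split is a made-up two-bin stand-in so the block runs on its own; in practice the per-column binner would be a vector method such as encode(config, x).

```julia
using Statistics

# Toy per-vector binner, used only to keep this example self-contained:
# two bins split at the median (hypothetical, not one of the package's configs).
function median_split(x::AbstractVector{<:Real})
    m = median(x)
    return UInt8[v <= m ? 1 : 2 for v in x], [Float64(m)]
end

# Hypothetical sketch of feature-wise, threaded binning of an n_samples × n_features matrix.
function encode_columns_sketch(binfun, X::AbstractMatrix{<:Real})
    nfeat = size(X, 2)
    X_bin = Vector{Vector{UInt8}}(undef, nfeat)
    edges = Vector{Vector{Float64}}(undef, nfeat)
    Threads.@threads for j in 1:nfeat
        # Each iteration writes only its own slot j, so no lock is needed.
        X_bin[j], edges[j] = binfun(view(X, :, j))
    end
    return X_bin, edges
end

X = [1.0 10.0; 2.0 20.0; 3.0 30.0; 4.0 40.0]
X_bin, edges = encode_columns_sketch(median_split, X)
# X_bin == [UInt8[1, 1, 2, 2], UInt8[1, 1, 2, 2]]; edges == [[2.5], [25.0]]
```

Because columns are independent, preallocating the output vectors and writing by index is enough to make the loop thread-safe.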

SignalEncodings.encode (Method)
encode(config, X::Matrix{<:AbstractArray{T}})

Binning for datasets where each cell is a multidimensional item (e.g., time series vectors, images, or tensors).

For each column/feature:

  1. all per-row items are flattened and concatenated,
  2. bin edges are learned once on the flattened values,
  3. binned values are reshaped back to each original item shape.

Returns:

  • binned data with original (nrows, ncols) structure and per-item shapes preserved.
  • edges::Vector{Vector} with one edge vector per column.
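The three steps can be sketched for a single column of cells as follows. encode_cells_sketch is hypothetical, and uniform edges stand in for whatever edge learning the chosen config performs.

```julia
# Hypothetical sketch of the flatten / learn-once / reshape steps for ONE column of cells.
# Uniform edges are a stand-in for any config's edge learning.
function encode_cells_sketch(col::AbstractVector{<:AbstractArray{<:Real}}, nbins::Integer)
    # 1. Flatten every cell and concatenate into a single vector.
    flat = reduce(vcat, [vec(cell) for cell in col])
    # 2. Learn internal edges once on the pooled values.
    lo, hi = extrema(flat)
    inner = collect(LinRange(lo, hi, nbins + 1))[2:nbins]
    flat_bin = UInt8[searchsortedlast(inner, v) + 1 for v in flat]
    # 3. Cut the binned vector back into pieces and restore each cell's shape.
    binned = Vector{Array{UInt8}}(undef, length(col))
    offset = 0
    for (i, cell) in enumerate(col)
        n = length(cell)
        binned[i] = reshape(flat_bin[offset+1:offset+n], size(cell))
        offset += n
    end
    return binned, inner
end

col = [[0.0, 1.5, 4.0], [0.5 2.5; 3.5 1.0]]   # a series and a small image in one column
binned, inner = encode_cells_sketch(col, 2)
# inner == [2.0]; binned[1] == UInt8[1, 1, 2]; binned[2] == UInt8[1 2; 2 1]
```

Learning edges on the pooled values means every item in a column shares the same bins, so bin k denotes the same value range in every series or image.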

SignalEncodings.get_idxs (Method)
get_idxs(x, max_nobs, nbins, rng)

Return observation indices used to estimate bin edges.

If length(x) > max_nobs * nbins, a reproducible, ordered sample without replacement is drawn using rng. Otherwise, all indices are returned.

This limits edge-estimation cost on large datasets while preserving input-order indexing.
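A minimal sketch of the documented behaviour, using Random.randperm to draw distinct indices and sorting them to keep input order. get_idxs_sketch is a hypothetical name; the package may draw the sample differently.

```julia
using Random

# Hypothetical re-implementation of the behaviour described above (not the package's code).
function get_idxs_sketch(x::AbstractVector, max_nobs::Integer, nbins::Integer, rng::AbstractRNG)
    n = length(x)
    cap = max_nobs * nbins
    n <= cap && return collect(1:n)        # small input: keep every observation
    # Take `cap` distinct indices (without replacement), then sort to keep input order.
    return sort!(randperm(rng, n)[1:cap])
end

idx = get_idxs_sketch(1:10_000, 10, 16, Xoshiro(42))
# length(idx) == 160, issorted(idx), and the same seed always yields the same idx
```

randperm allocates a permutation of all n indices; for very large inputs, StatsBase's sample(rng, 1:n, k; replace=false, ordered=true) draws an ordered sample without building the full permutation.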

SignalEncodings.lin_deviation (Method)
lin_deviation(x::AbstractVector{T}) where {T<:Real}

Return the sum of absolute deviations from the mean of x:

∑ᵢ |xᵢ - μ|, where μ = mean(x).

Useful as an L1 spread measure (less sensitive to outliers than squared deviation).
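A direct transcription of the formula, with a small example showing how much of the total a single outlier contributes. lin_deviation_sketch is a hypothetical name, not the package's function.

```julia
using Statistics

# Sketch of the documented formula: ∑ᵢ |xᵢ - μ| with μ = mean(x).
function lin_deviation_sketch(x::AbstractVector{<:Real})
    μ = mean(x)
    return sum(v -> abs(v - μ), x)
end

x = [1.0, 2.0, 3.0, 10.0]       # mean is 4.0
lin_deviation_sketch(x)         # 3 + 2 + 1 + 6 = 12.0; the outlier 10.0 contributes 6 of 12
```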

SignalEncodings.sq_deviation (Method)
sq_deviation(x::AbstractVector{T}) where {T<:Real}

Return the sum of squared deviations from the mean of x:

∑ᵢ (xᵢ - μ)^2, where μ = mean(x).

Equivalent to length(x) * var(x; corrected=false), i.e. n times the population variance.
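A direct transcription of the formula that also checks the variance identity on the same data used for the L1 measure. sq_deviation_sketch is a hypothetical name, not the package's function.

```julia
using Statistics

# Sketch of the documented formula: ∑ᵢ (xᵢ - μ)^2 with μ = mean(x).
function sq_deviation_sketch(x::AbstractVector{<:Real})
    μ = mean(x)
    return sum(v -> (v - μ)^2, x)
end

x = [1.0, 2.0, 3.0, 10.0]
sq_deviation_sketch(x)                  # 9 + 4 + 1 + 36 = 50.0; the outlier contributes 36 of 50
length(x) * var(x; corrected=false)     # 50.0, population variance times n
```

On this data the outlier accounts for 72% of the squared total but only 50% of the absolute total, which is why :l2 pulls Jenks breaks toward extreme values more strongly than :l1.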


License

MIT License

About

Developed by the ACLAI Lab at the University of Ferrara.