SignalEncodings
SignalEncodings.jl is a Julia package for discretizing numeric signals into bins (a.k.a. quantization).
It provides a common interface for converting continuous values into integer bin indices using three different strategies:
- Uniform: equally spaced bins between the minimum and maximum values
- Quantile: bins based on empirical quantiles
- Jenks: iterative natural-breaks binning that minimizes within-bin deviation
The package is designed to work with several input layouts, including:
- scalar vectors
- tabular data (`n_samples × n_features` matrices)
- time series stored as cells of vectors
- images stored as cells of matrices
- arbitrary tensors stored as cells of N-D arrays
Quick start
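```julia
using SignalEncodings

X = rand(Float32, 100, 4)   # 100 samples × 4 features

# Equal-width bins between each feature's minimum and maximum
config = Uniform(; nbins=16)
X_bin, edges = encode(config, X)

# Bins at empirical quantiles, with linear interpolation
config = Quantile(; nbins=16, type=:linear)
X_bin, edges = encode(config, X)

# Jenks-style natural breaks using the L1 deviation
config = Jenks(; nbins=16, errornorm=:l1)
X_bin, edges = encode(config, X)
```

Available algorithms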
| Config | Strategy | Main parameters |
|---|---|---|
| `Uniform` | Linearly spaced edges between min and max | `nbins` |
| `Quantile` | Edges at empirical quantiles | `nbins`, `type` |
| `Jenks` | Iterative optimization of within-bin deviation | `nbins`, `maxiter`, `flux`, `errornorm` |
All configurations share the common keywords `nbins`, `max_nobs`, and `rng`.
Output format
`encode(config, X)` returns a tuple `(X_bin, edges)`:
- `X_bin`: bin indices stored as `UInt8`
- `edges`: one edge vector per feature
The original shape is preserved for multidimensional inputs.
Documentation
- See Algorithms for method details.
- See the API documentation below for full type and function references.
- `SignalEncodings.AlphaBetaQuantiles`
- `SignalEncodings.JenksErrNorm`
- `SignalEncodings.BinningConfig`
- `SignalEncodings.Jenks`
- `SignalEncodings.Quantile`
- `SignalEncodings.Uniform`
- `SignalEncodings.check_errnorm`
- `SignalEncodings.check_parameters`
- `SignalEncodings.check_quantiles`
- `SignalEncodings.encode` (5 methods)
- `SignalEncodings.get_alpha`
- `SignalEncodings.get_beta`
- `SignalEncodings.get_deviation`
- `SignalEncodings.get_errornorm`
- `SignalEncodings.get_flux`
- `SignalEncodings.get_fluxadjust`
- `SignalEncodings.get_fluxadjust_bothways`
- `SignalEncodings.get_idxs`
- `SignalEncodings.get_initmode`
- `SignalEncodings.get_max_nobs`
- `SignalEncodings.get_maxiter`
- `SignalEncodings.get_nbins`
- `SignalEncodings.get_rng`
- `SignalEncodings.lin_deviation`
- `SignalEncodings.sq_deviation`
SignalEncodings.AlphaBetaQuantiles — Constant
Mapping of quantile interpolation type to (alpha, beta) used by Statistics.quantile.
Supported types:
- `:linear => (1.0, 1.0)` (default)
- `:inverted => (0.0, 0.0)`
- `:average => (0.0, 1.0)`
- `:median => (1//3, 1//3)`
- `:normal => (3//8, 3//8)`
- `:matlab => (0.5, 0.5)`
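To illustrate how the pair changes the result, the sketch below passes each (alpha, beta) pair to the `alpha` and `beta` keywords of `Statistics.quantile` and computes the first quartile of a small vector. `QUANTILE_TYPES` and `first_quartile` are hypothetical names for this example only; they are not part of the package.

```julia
using Statistics

# Hypothetical local copy of the (alpha, beta) table above.
const QUANTILE_TYPES = Dict(
    :linear   => (1.0, 1.0),
    :inverted => (0.0, 0.0),
    :average  => (0.0, 1.0),
    :median   => (1 / 3, 1 / 3),
    :normal   => (3 / 8, 3 / 8),
    :matlab   => (0.5, 0.5),
)

# First quartile of [1, 2, 3, 4] under a given interpolation type.
function first_quartile(qtype::Symbol)
    alpha, beta = QUANTILE_TYPES[qtype]
    return quantile([1.0, 2.0, 3.0, 4.0], 0.25; alpha=alpha, beta=beta)
end

first_quartile(:linear)    # 1.75
first_quartile(:inverted)  # 1.25
first_quartile(:average)   # 1.0
```

The same data yields three different quartiles, which is why bin edges from `Quantile` can shift slightly depending on `type`.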
SignalEncodings.BinningConfig — Type
Union of all supported binning configuration types.
SignalEncodings.JenksErrNorm — Constant
Mapping of Jenks error norms to deviation functions.
- `:l1` uses `lin_deviation` (sum of absolute deviations)
- `:l2` uses `sq_deviation` (sum of squared deviations)
SignalEncodings.Jenks — Type
```julia
Jenks(; nbins=64, maxiter=200, flux=0.1, fluxadjust=1.03,
      fluxadjust_bothways=true, errornorm=:l1, max_nobs=1000,
      rng=Xoshiro(42))
```

`Jenks <: AbstractBinningConfig`

Jenks-style iterative discretization config.
Fields:
- `nbins::UInt8`: number of bins (2 ≤ nbins ≤ 255).
- `maxiter::Int`: maximum number of optimization iterations.
- `flux::Real`: initial boundary-shift ratio.
- `fluxadjust::Real`: multiplicative flux adaptation factor.
- `fluxadjust_bothways::Bool`: whether flux may both increase and decrease.
- `errornorm::Base.Callable`: deviation function (`lin_deviation` or `sq_deviation`).
- `max_nobs::Int`: shared-interface sampling parameter.
- `rng::AbstractRNG`: shared-interface RNG parameter.
Constructor keywords:
- `nbins`, `maxiter`, `flux`, `fluxadjust`, `fluxadjust_bothways`
- `errornorm`: `:l1` or `:l2`
- `max_nobs`, `rng`
SignalEncodings.Quantile — Type
`Quantile(; type=:linear, nbins=64, max_nobs=1000, rng=Xoshiro(42))`

`Quantile <: AbstractBinningConfig`

Quantile-based discretization config.
Fields:
- `nbins::UInt8`: number of bins (2 ≤ nbins ≤ 255).
- `alpha::Float16`, `beta::Float16`: quantile interpolation parameters.
- `max_nobs::Int`: sampling budget per bin for quantile estimation.
- `rng::AbstractRNG`: RNG for reproducible subsampling.
Constructor keywords:
- `type`: interpolation mode key in `AlphaBetaQuantiles`.
- `nbins`: number of bins.
- `max_nobs`: observations-per-bin cap factor.
- `rng`: random generator used when sampling is required.
SignalEncodings.Uniform — Type
`Uniform(; nbins=64, max_nobs=1000, rng=Xoshiro(42))`

`Uniform <: AbstractBinningConfig`

Uniform-width discretization config.
Fields:
- `nbins::UInt8`: number of bins (2 ≤ nbins ≤ 255).
- `max_nobs::Int`: sampling budget per bin used by edge estimation (effective cap: `max_nobs * nbins`).
- `rng::AbstractRNG`: RNG for reproducible subsampling.
Constructor keywords:
- `nbins`: number of bins.
- `max_nobs`: observations-per-bin cap factor.
- `rng`: random generator used when sampling is required.
SignalEncodings.check_errnorm — Method
`check_errnorm(errornorm)`

Validate the Jenks error norm key (`:l1` or `:l2`).
SignalEncodings.check_parameters — Method
`check_parameters(nbins, max_nobs)`

Validate generic binning parameters:
- `nbins` must be in [2, 255] (stored as `UInt8`)
- `max_nobs` must be ≥ 1
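A minimal sketch of these two checks, assuming invalid values raise an `ArgumentError` (the exception type the package actually throws is not documented here). `check_parameters_sketch` is a hypothetical name, not the package function.

```julia
# Hypothetical re-implementation of the rules listed above.
function check_parameters_sketch(nbins::Integer, max_nobs::Integer)
    2 <= nbins <= 255 || throw(ArgumentError("nbins must be in [2, 255], got $nbins"))
    max_nobs >= 1 || throw(ArgumentError("max_nobs must be ≥ 1, got $max_nobs"))
    # After the range check, nbins is guaranteed to fit in a UInt8.
    return UInt8(nbins), Int(max_nobs)
end

check_parameters_sketch(16, 1000)   # (0x10, 1000)
```

The upper limit of 255 follows from storing `nbins` as `UInt8`, which also lets bin indices be stored as `UInt8`.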
SignalEncodings.check_quantiles — Method
`check_quantiles(type)`

Validate the quantile interpolation type key for `Quantile`.
SignalEncodings.encode — Method
`encode(config::Jenks, x)`

Discretize a numeric vector `x` with an iterative Jenks-style optimization.

The algorithm adjusts class breaks to reduce within-bin deviation, using the configured deviation function and flux parameters. Returns `(x_bin, edges)`, where `x_bin` holds 1-based bin indices and `edges` holds the learned break values.
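The exact update rule is not spelled out above, so the following is only a deliberately simplified, hypothetical Jenks-style optimizer, not the package's algorithm. Its assumptions: classes are contiguous runs of the sorted data, starting from equal-count classes; `flux` is read as the fraction of the smaller neighboring class moved across a shared boundary in one step; a move is kept only if it lowers the summed deviation of the two affected classes; and `flux` is multiplied by `fluxadjust` after a productive pass and divided by it otherwise. `jenks_sketch` and `l1_dev` are hypothetical names.

```julia
using Statistics

# Sum of absolute deviations from the mean (the :l1 measure described above).
l1_dev(x) = sum(abs, x .- mean(x))

# Hypothetical, simplified Jenks-style optimizer (not the package's algorithm).
# Returns (breaks, cuts): class k is sort(x)[cuts[k]+1:cuts[k+1]], and
# breaks[k] is the smallest value of class k + 1.
function jenks_sketch(x::AbstractVector{<:Real}, nbins::Integer;
                      maxiter::Integer=200, flux::Real=0.1,
                      fluxadjust::Real=1.03, dev=l1_dev)
    s = sort(float.(x))
    n = length(s)
    2 <= nbins <= n || throw(ArgumentError("need 2 <= nbins <= length(x)"))
    # Start from equal-count classes; each class holds at least one value.
    cuts = [div(k * n, nbins) for k in 0:nbins]
    classdev(k) = dev(view(s, cuts[k]+1:cuts[k+1]))
    for _ in 1:maxiter
        improved = false
        for k in 1:nbins-1
            dl, dr = classdev(k), classdev(k + 1)
            nl = cuts[k+1] - cuts[k]          # size of the left class
            nr = cuts[k+2] - cuts[k+1]        # size of the right class
            step = max(1, round(Int, flux * min(nl, nr)))
            cur = cuts[k+1]
            # Shift the shared boundary so the higher-deviation class shrinks.
            cand = dl > dr ? cur - step : cur + step
            cuts[k] < cand < cuts[k+2] || continue  # keep both classes non-empty
            cuts[k+1] = cand
            if classdev(k) + classdev(k + 1) < dl + dr
                improved = true
            else
                cuts[k+1] = cur               # undo a move that did not help
            end
        end
        if improved
            flux *= fluxadjust                # productive pass: larger steps
        else
            flux * n < 1 && break             # this pass already used one-element steps
            flux /= fluxadjust                # unproductive pass: smaller steps
        end
    end
    return [s[cuts[k]+1] for k in 2:nbins], cuts
end

# Three well-separated groups are kept intact.
x = vcat(fill(1.0, 10), fill(5.0, 10), fill(9.0, 10))
breaks, cuts = jenks_sketch(x, 3)   # breaks == [5.0, 9.0]
```

Because only improving moves are kept, the total deviation never increases from one pass to the next; `maxiter` bounds the run time regardless of how `flux` evolves.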
SignalEncodings.encode — Method
`encode(config::Quantile, x)`

Discretize a numeric vector `x` using quantile-based bins.

Internal edges are computed from quantiles of the sampled observations (see `get_idxs`), with interpolation controlled by `alpha` and `beta` from `config`. Returns `(x_bin, edges)`, where `x_bin` contains 1-based bin indices.
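A simplified sketch of quantile binning, assuming the interior edges sit at probabilities `k/nbins` for `k = 1, ..., nbins-1` and that a value equal to an edge falls into the upper bin (the package's tie rule is not stated here). It skips the `get_idxs` subsampling step, and `quantile_bins` is a hypothetical name.

```julia
using Statistics

# Hypothetical sketch: interior quantile edges, then 1-based bin lookup.
function quantile_bins(x::AbstractVector{<:Real}, nbins::Integer; alpha=1.0, beta=1.0)
    probs = (1:nbins-1) ./ nbins
    edges = quantile(x, probs; alpha=alpha, beta=beta)
    # searchsortedlast counts the edges ≤ v, so adding 1 gives a bin in 1:nbins.
    x_bin = UInt8[searchsortedlast(edges, v) + 1 for v in x]
    return x_bin, edges
end

x_bin, edges = quantile_bins(collect(1.0:8.0), 4)
# edges == [2.75, 4.5, 6.25]; x_bin == [1, 1, 2, 2, 3, 3, 4, 4]
```

Each bin receives roughly the same number of observations, which is the defining property of quantile binning.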
SignalEncodings.encode — Method
`encode(config::Uniform, x)`

Discretize a numeric vector `x` into uniformly spaced bins.

Edges are linearly spaced between `minimum(x)` and `maximum(x)`. Returns:
- `x_bin::Vector{UInt8}`: 1-based bin index for each value in `x`
- `edges::Vector`: bin edge values used for discretization
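A sketch of equal-width binning under one assumed edge layout: `nbins + 1` edges including both endpoints (the package may store a different number of edges). `uniform_bins` is a hypothetical name.

```julia
# Hypothetical sketch: equal-width edges between the minimum and maximum.
function uniform_bins(x::AbstractVector{<:Real}, nbins::Integer)
    lo, hi = extrema(x)
    edges = collect(range(lo, hi; length=nbins + 1))
    # Search only the interior edges, so the minimum maps to bin 1 and the
    # maximum to bin nbins instead of falling outside the range.
    inner = view(edges, 2:nbins)
    x_bin = UInt8[searchsortedlast(inner, v) + 1 for v in x]
    return x_bin, edges
end

x_bin, edges = uniform_bins([0.0, 1.0, 2.5, 7.5, 10.0], 4)
# edges == [0.0, 2.5, 5.0, 7.5, 10.0]; x_bin == [1, 1, 2, 4, 4]
```

Unlike quantile binning, the bin counts here are uneven: bin 3 is empty because no value lies in [5.0, 7.5).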
SignalEncodings.encode — Method
`encode(config, X::AbstractArray{T})`

Feature-wise binning for tabular numeric data (`n_samples × n_features`).
Each column is binned independently (threaded), returning:
- `X_bin::Vector{Vector{UInt8}}`: one binned vector per feature
- `edges::Vector{Vector}`: one edge vector per feature
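The column-wise pattern can be sketched as follows, assuming a per-vector binner that returns `(bin indices, edges)`. `bin_columns` and the two-bin `mean_split` stand-in are hypothetical names; the sketch only shows the threading and result layout, not the package's binning strategies.

```julia
using Statistics

# Hypothetical sketch: bin every column of X independently with `bin_vector`.
function bin_columns(bin_vector, X::AbstractMatrix{<:Real})
    ncols = size(X, 2)
    X_bin = Vector{Vector{UInt8}}(undef, ncols)
    edges = Vector{Vector{Float64}}(undef, ncols)
    # Columns are independent, so each iteration writes only its own slot.
    Threads.@threads for j in 1:ncols
        X_bin[j], edges[j] = bin_vector(view(X, :, j))
    end
    return X_bin, edges
end

# Stand-in per-column binner: two bins split at the column mean.
function mean_split(x)
    m = mean(x)
    return UInt8[v > m ? 2 : 1 for v in x], [m]
end

X = [0.0 10.0; 1.0 20.0; 2.0 30.0; 3.0 40.0]
X_bin, edges = bin_columns(mean_split, X)
# X_bin == [[1, 1, 2, 2], [1, 1, 2, 2]]; edges == [[1.5], [25.0]]
```

Preallocating the output vectors avoids any shared mutable state between threads.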
SignalEncodings.encode — Method
`encode(config, X::Matrix{<:AbstractArray{T}})`

Binning for datasets where each cell is a multidimensional item (e.g., time series vectors, images, or tensors).
For each column/feature:
- all per-row items are flattened and concatenated,
- bin edges are learned once on the flattened values,
- binned values are reshaped back to each original item shape.
Returns:
- binned data with the original `(nrows, ncols)` structure and per-item shapes preserved,
- `edges::Vector{Vector}` with one edge vector per column.
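The flatten, learn, and reshape steps above can be sketched as below. `bin_nested` and the `median_split` stand-in binner are hypothetical names; the sketch mirrors the described steps but is not the package source.

```julia
using Statistics

# Hypothetical sketch: one set of edges per column, learned on pooled values.
function bin_nested(bin_vector, X::AbstractMatrix{<:AbstractArray{<:Real}})
    nrows, ncols = size(X)
    X_bin = Matrix{Array{UInt8}}(undef, nrows, ncols)
    edges = Vector{Vector{Float64}}(undef, ncols)
    for j in 1:ncols
        # Flatten every cell of column j and concatenate them.
        pooled = reduce(vcat, [vec(X[i, j]) for i in 1:nrows])
        flat_bin, edges[j] = bin_vector(pooled)
        # Cut the flat result back into pieces and restore each cell's shape.
        offset = 0
        for i in 1:nrows
            len = length(X[i, j])
            X_bin[i, j] = reshape(flat_bin[offset+1:offset+len], size(X[i, j]))
            offset += len
        end
    end
    return X_bin, edges
end

# Stand-in binner: two bins split at the pooled median.
function median_split(x)
    m = median(x)
    return UInt8[v > m ? 2 : 1 for v in x], [m]
end

# Two time series of different lengths in a single feature column.
X = reshape([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0, 40.0]], 2, 1)
X_bin, edges = bin_nested(median_split, X)
# X_bin[1, 1] == [1, 1, 1]; X_bin[2, 1] == [1, 2, 2, 2]; edges == [[10.0]]
```

Pooling means both series share one threshold, so the short, low-valued series lands entirely in bin 1.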
SignalEncodings.get_alpha — Method
Return quantile interpolation alpha parameter.
SignalEncodings.get_beta — Method
Return quantile interpolation beta parameter.
SignalEncodings.get_deviation — Method
Return the deviation callable used by Jenks optimization.
SignalEncodings.get_errornorm — Method
Return Jenks error function.
SignalEncodings.get_flux — Method
Return current Jenks flux value.
SignalEncodings.get_fluxadjust — Method
Return Jenks flux adaptation factor.
SignalEncodings.get_fluxadjust_bothways — Method
Return whether Jenks flux adapts in both directions.
SignalEncodings.get_idxs — Method
`get_idxs(x, max_nobs, nbins, rng)`

Return the observation indices used to estimate bin edges.
If length(x) > max_nobs * nbins, a reproducible, ordered sample without replacement is drawn using rng. Otherwise, all indices are returned.
This limits edge-estimation cost on large datasets while preserving input-order indexing.
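A sketch of this sampling rule using only the `Random` standard library. It assumes the sample size is exactly `max_nobs * nbins`, which the description above does not state; `sample_idxs` is a hypothetical name that takes the length of `x` rather than `x` itself.

```julia
using Random

# Hypothetical sketch of the subsampling rule (sample size is an assumption).
function sample_idxs(n::Integer, max_nobs::Integer, nbins::Integer, rng::AbstractRNG)
    cap = max_nobs * nbins
    n <= cap && return collect(1:n)          # small input: keep every index
    return sort!(randperm(rng, n)[1:cap])    # distinct indices, in input order
end

idxs = sample_idxs(100_000, 10, 16, Xoshiro(42))
length(idxs)   # 160
```

Seeding the generator (here `Xoshiro(42)`, the package's default) makes the selected indices, and therefore the learned edges, reproducible across runs.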
SignalEncodings.get_initmode — Method
Return Jenks initialization mode.
SignalEncodings.get_max_nobs — Method
Return max_nobs sampling budget factor.
SignalEncodings.get_maxiter — Method
Return maximum number of Jenks iterations.
SignalEncodings.get_nbins — Method
Return number of bins for any binning config.
SignalEncodings.get_rng — Method
Return RNG associated with config.
SignalEncodings.lin_deviation — Method
`lin_deviation(x::AbstractVector{T}) where {T<:Real}`

Return the sum of absolute deviations from the mean of `x`:
∑ᵢ |xᵢ - μ|, where μ = mean(x).
Useful as an L1 spread measure (less sensitive to outliers than squared deviation).
SignalEncodings.sq_deviation — Method
`sq_deviation(x::AbstractVector{T}) where {T<:Real}`

Return the sum of squared deviations from the mean of `x`:
∑ᵢ (xᵢ - μ)^2, where μ = mean(x).
Equivalent to `length(x) * var(x; corrected=false)`, i.e. `length(x)` times the population (uncorrected) variance.
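Both measures fit in one line each. The sketch below uses hypothetical names `lin_dev` and `sq_dev` rather than the package functions, and shows on a small vector with one outlier why the L1 measure is less sensitive to it, plus the variance identity stated above.

```julia
using Statistics

# Hypothetical re-implementations of the two deviation measures.
lin_dev(x::AbstractVector{<:Real}) = sum(abs, x .- mean(x))    # ∑ |xᵢ - μ|
sq_dev(x::AbstractVector{<:Real})  = sum(abs2, x .- mean(x))   # ∑ (xᵢ - μ)^2

x = [1.0, 2.0, 3.0, 10.0]   # mean(x) == 4.0
lin_dev(x)                  # 12.0: the value 10.0 contributes 6 of 12
sq_dev(x)                   # 50.0: the value 10.0 contributes 36 of 50
sq_dev(x) ≈ length(x) * var(x; corrected=false)   # true
```

Under `:l2` the outlier accounts for 72% of the spread versus 50% under `:l1`, so Jenks breaks chosen with `:l2` are pulled more strongly toward isolating extreme values.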
License
MIT License
About
Developed by the ACLAI Lab at the University of Ferrara.