MadamOpt.jl

Summary

This is a testing ground for some extensions related to Adam (Adaptive Moment Estimation). For a summary description, see the readme.

API reference

MadamOpt.Madam - Type
Madam{T}(
    theta
    ; alpha     = 0.01
    , beta1     = 0.9
    , beta2     = 0.999
    , beta3     = 0.9
    , eps       = 1e-8
    , max_temp  = 0.0
    , max_steps = nothing
    , dx        = 1e-8
)

Arguments

  • theta::T: the initial parameters (i.e. starting point).
  • alpha::Float64: learning-rate / step-size.
  • beta1::Float64: controls the exponential decay rate for the 1st moment.
  • beta2::Float64: controls the exponential decay rate for the 2nd moment.
  • beta3::Float64: in gradient-free usage, determines the period over which a gradient approximation that was not recently sampled decays.
  • eps::Float64: small constant for numerical stability.
  • max_temp::Float64: starting temperature, i.e. the size of the search-space perturbations used to estimate the gradient. The perturbations approach zero as max_steps is reached. Setting this parameter can help with non-convex problems.
  • max_steps::Union{Integer, Nothing}: if an Integer is set, the algorithm estimates the gradient using a larger region around the current estimate, which allows optimizing non-convex functions.
  • dx::Float64: the steady-state perturbation used to estimate the gradient after max_steps steps have been taken.

Construct the Madam optimizer.

MadamOpt.step! - Function
step!(
    adam
    , grad
    ; l1_penalty    = nothing
    , l2_penalty    = nothing
    , weight_decay  = nothing
)

Advances the algorithm by one step based on the current gradient, but does not update theta estimates. This method is included for interfaces such as those of FluxML which require that only the step size is returned.

Arguments

  • adam::Madam{T}
  • grad::T: the gradient for the next set of observations (i.e. mini-batch).
  • l1_penalty::Union{Nothing, Float64}: penalty for L1 regularization (implemented via the proximal operator)
  • l2_penalty::Union{Nothing, Float64}: penalty for L2 regularization
  • weight_decay::Union{Nothing, Float64}: weight decay regularization
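
To make the "returns only the step" behaviour concrete, here is a minimal sketch of a standard Adam update in plain Julia. It is not MadamOpt's implementation: AdamState, adam_step! and soft_threshold are hypothetical names, and the way the L1 threshold is applied after the step is an assumption made for illustration.

```julia
# Standard Adam moment updates; a sketch, NOT MadamOpt's actual code.
mutable struct AdamState
    m::Vector{Float64}   # first-moment (mean) estimate
    v::Vector{Float64}   # second-moment (uncentered variance) estimate
    t::Int               # number of steps taken so far
end

AdamState(n::Integer) = AdamState(zeros(n), zeros(n), 0)

# Updates the moments and returns the step to subtract from theta.
# theta itself is never touched, mirroring the behaviour described above.
function adam_step!(s::AdamState, grad::Vector{Float64};
                    alpha = 0.01, beta1 = 0.9, beta2 = 0.999, eps = 1e-8)
    s.t += 1
    s.m .= beta1 .* s.m .+ (1 - beta1) .* grad
    s.v .= beta2 .* s.v .+ (1 - beta2) .* grad .^ 2
    m_hat = s.m ./ (1 - beta1^s.t)   # bias-corrected first moment
    v_hat = s.v ./ (1 - beta2^s.t)   # bias-corrected second moment
    return alpha .* m_hat ./ (sqrt.(v_hat) .+ eps)
end

# Soft-thresholding: the proximal operator of lambda * ||x||_1.
soft_threshold(x, lambda) = sign.(x) .* max.(abs.(x) .- lambda, 0.0)

state = AdamState(2)
theta = [1.0, -2.0]
step = adam_step!(state, [0.5, -0.5])              # caller decides how to apply it
theta_new = soft_threshold(theta .- step, 0.001)   # assumed L1 handling
```

On the first step the bias correction cancels the moment scaling, so each component of the step has magnitude close to alpha regardless of the gradient's size.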
step!(adam, loss; grad_samples=max(1,length(adam.theta) >> 4), kwargs...)

Advances the algorithm by one step based on a loss function.

Arguments

  • adam::Madam{T}
  • loss::Function: objective or loss function (e.g. mini-batch loss)
  • grad_samples::Integer: The number of samples taken from the loss function to estimate the gradient. This value must be in the range 0 <= grad_samples <= length(parameters).
  • kwargs...: the l1_penalty, l2_penalty, and weight_decay penalties described above.
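
The default grad_samples in the signature works out to roughly one sixteenth of the parameter count, with a floor of one. The sketch below shows that arithmetic and one plausible way to estimate a gradient from a loss by probing a random subset of coordinates with forward differences. default_grad_samples and sampled_gradient are hypothetical helpers, not part of MadamOpt, and the package's own sampling scheme may differ (for example, this sketch leaves unsampled entries at zero rather than decaying older estimates).

```julia
using Random

# Default from the step! signature: max(1, length(theta) >> 4).
default_grad_samples(n::Integer) = max(1, n >> 4)

# Forward-difference gradient estimate on k randomly chosen coordinates.
# Coordinates that are not sampled keep a zero entry in this sketch.
function sampled_gradient(loss, theta::Vector{Float64}, k::Integer;
                          dx = 1e-8, rng = Random.default_rng())
    grad = zeros(length(theta))
    base = loss(theta)
    for i in randperm(rng, length(theta))[1:k]
        probe = copy(theta)
        probe[i] += dx                       # perturb one coordinate
        grad[i] = (loss(probe) - base) / dx  # slope along that coordinate
    end
    return grad
end

loss(x) = sum(abs2, x)                       # toy quadratic; true gradient is 2x
theta = collect(1.0:32.0)
k = default_grad_samples(length(theta))      # 32 >> 4 == 2
g = sampled_gradient(loss, theta, k; dx = 1e-6)
```

Each call costs k + 1 loss evaluations, which is why sampling a subset of coordinates is attractive when the parameter vector is large.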
MadamOpt.step_ols! - Function
step_ols!(adam, A, b; kwargs...)

For linear models of the form Ax = b, this is a convenience method that calculates the gradient given A and b.

Arguments

  • adam::Madam
  • A::AbstractArray{Float64, Union{1,2}}: Design / independent variables.
  • b::Union{AbstractArray{Float64, 1}, Float64}: Response / dependent variables.
  • kwargs...: same keyword arguments as for step!.
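
For reference, the gradient of an ordinary least-squares objective can be written in one line. The sketch below assumes the objective 0.5 * ||A*x - b||^2; the scaling MadamOpt actually uses (for example, averaging over rows) is not stated here, and ols_gradient is a hypothetical helper rather than a package function.

```julia
using LinearAlgebra

# Gradient of 0.5 * ||A*x - b||^2 with respect to x (assumed objective).
ols_gradient(A::AbstractMatrix, x::AbstractVector, b::AbstractVector) =
    A' * (A * x .- b)

A = [1.0 0.0; 0.0 2.0; 1.0 1.0]   # design matrix: 3 observations, 2 parameters
b = [1.0, 2.0, 3.0]               # responses
x = zeros(2)                      # current parameter estimate

g = ols_gradient(A, x, b)         # at x = 0 this equals -A' * b
x_ls = A \ b                      # least-squares solution, for comparison
```

At the least-squares solution the residual is orthogonal to the columns of A, so the gradient vanishes there.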
MadamOpt.current - Function
current(adam::Madam)

Retrieves a reference to the current parameter estimates (i.e. theta in constructor Madam).
