MadamOpt.jl
Summary
This is a testing ground for some extensions to Adam (Adaptive Moment Estimation). For a summary description, see the README.
API reference
MadamOpt.Madam — Type

Madam{T}(
theta
; alpha = 0.01
, beta1 = 0.9
, beta2 = 0.999
, beta3 = 0.9
, eps = 1e-8
, max_temp = 0.0
, max_steps = nothing
, dx = 1e-8
)
Arguments

theta::T: the initial parameters (i.e. starting point).
alpha::Float64: learning rate / step size.
beta1::Float64: controls the exponential decay rate for the 1st moment.
beta2::Float64: controls the exponential decay rate for the 2nd moment.
beta3::Float64: in gradient-free usage, beta3 determines the period over which a gradient approximation that was not recently sampled decays.
eps::Float64: small constant for numerical stability.
max_temp::Float64: starting temperature for the search-space perturbations used to estimate the gradient; the perturbations approach zero as max_steps is reached. Setting this parameter can help with non-convex problems.
max_steps::Union{Integer, Nothing}: if an Integer is given, the algorithm estimates the gradient using a larger region around the current estimate, allowing it to optimize non-convex functions.
dx::Float64: the steady-state perturbation used to estimate the gradient after max_steps have been taken.

Construct the Madam optimizer.
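A minimal construction sketch, assuming MadamOpt exports Madam and that the type parameter is inferred from theta (the starting point and keyword values here are illustrative, not defaults you must set):

```julia
using MadamOpt

# Start from a random 10-dimensional parameter vector; all keyword
# arguments are optional and default to the values in the signature above.
theta0 = randn(10)

# Enable annealing-style perturbations for a non-convex problem: the
# temperature decays from max_temp toward dx over max_steps steps.
opt = Madam(theta0; alpha = 0.01, max_temp = 1.0, max_steps = 10_000)
```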
MadamOpt.step! — Function

step!(
adam
, grad
; l1_penalty = nothing
, l2_penalty = nothing
, weight_decay = nothing
)
Advances the algorithm by one step based on the current gradient, but does not update the theta estimates. This method is included for interfaces such as FluxML's, which require that only the step be returned.
Arguments

adam::Madam{T}
grad::T: the gradient for the next set of observations (i.e. mini-batch).
l1_penalty::Union{Nothing, Float64}: penalty for L1 regularization (implemented via the proximal operator).
l2_penalty::Union{Nothing, Float64}: penalty for L2 regularization.
weight_decay::Union{Nothing, Float64}: weight-decay regularization.
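A sketch of the gradient-based form, assuming a simple quadratic loss with a hand-computed gradient; the returned value is treated as the step to apply, per the FluxML-style contract, and the L2 penalty is purely illustrative:

```julia
using MadamOpt

opt = Madam(zeros(3))

# Gradient of f(x) = 0.5 * ||x - [1, 2, 3]||^2 at the current estimate.
grad = current(opt) .- [1.0, 2.0, 3.0]

# Advances the optimizer state and returns the step without
# mutating the parameter estimates themselves.
delta = step!(opt, grad; l2_penalty = 1e-4)
```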
step!(adam, loss; grad_samples=max(1,length(adam.theta) >> 4), kwargs...)
Advances the algorithm by one step based on a loss function.
Arguments

adam::Madam{T}
loss::Function: objective or loss function (e.g. mini-batch loss).
grad_samples::Integer: the number of samples taken from the loss function to estimate the gradient. This value must be in the range 0 <= grad_samples <= length(parameters).
kwargs...: the l1_penalty, l2_penalty, and weight_decay penalties described above.
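A gradient-free sketch, assuming the loss-function form of step! shown above; the Rosenbrock-style objective and sample count are illustrative:

```julia
using MadamOpt

opt = Madam(zeros(2); max_temp = 1.0, max_steps = 5_000)

# The loss is evaluated directly; no analytic gradient is required.
loss(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Estimate the gradient from a single perturbation sample per step.
step!(opt, loss; grad_samples = 1)
```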
MadamOpt.update! — Function

update!(adam, args...; kwargs...)

Calls step! and then also updates the parameter estimates theta.
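A sketch of a full optimization loop using update!, which forwards its arguments to step! and then applies the step to theta in place (the convex objective and iteration count are illustrative):

```julia
using MadamOpt

opt = Madam(zeros(2))
loss(x) = (x[1] - 3.0)^2 + (x[2] + 1.0)^2

# Each call estimates the gradient from the loss and mutates theta.
for _ in 1:2_000
    update!(opt, loss)
end
```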
MadamOpt.step_ols! — Function

step_ols!(adam, A, b; kwargs...)

For linear models of the form Ax = b, this is a convenience method that calculates the gradient given A and b.

Arguments

adam::Madam
A::AbstractArray{Float64, Union{1,2}}: design / independent variables.
b::Union{AbstractArray{Float64, 1}, Float64}: response / dependent variables.
kwargs...: same keyword arguments as for step!.
MadamOpt.update_ols! — Function

update_ols!(adam, args...; kwargs...)

Calls step_ols! and then also updates the parameter estimates theta.
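A sketch of streaming least squares with update_ols!, assuming it mutates theta as described; the synthetic mini-batches and true coefficients are illustrative:

```julia
using MadamOpt, Random

Random.seed!(42)
x_true = [2.0, -1.0, 0.5]
opt = Madam(zeros(3))

# Feed mini-batches of design rows A with their responses b;
# the gradient of the squared error is computed internally.
for _ in 1:5_000
    A = randn(8, 3)
    b = A * x_true
    update_ols!(opt, A, b)
end
```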
MadamOpt.current — Function

current(adam::Madam)

Retrieves a reference to the current parameter estimates (i.e. theta in the Madam constructor).
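Because current returns a reference rather than a copy, mutating the returned array would mutate the optimizer's internal state; copy it first if you need a stable snapshot (sketch):

```julia
using MadamOpt

opt = Madam(zeros(4))

theta = current(opt)    # reference to the optimizer's internal estimate
snapshot = copy(theta)  # detach a copy before further update! calls
```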