MadamOpt.jl
Summary
This is a testing ground for some extensions to Adam (Adaptive Moment Estimation). For a summary description, see the README.
API reference
MadamOpt.Madam — Type

Madam{T}(
theta
; alpha = 0.01
, beta1 = 0.9
, beta2 = 0.999
, beta3 = 0.9
, eps = 1e-8
, max_temp = 0.0
, max_steps = nothing
, dx = 1e-8
)
Arguments

theta::T: the initial parameters (i.e. starting point).
alpha::Float64: learning rate / step size.
beta1::Float64: controls the exponential decay rate for the 1st moment.
beta2::Float64: controls the exponential decay rate for the 2nd moment.
beta3::Float64: in gradient-free usage, beta3 determines the period over which a gradient approximation that was not recently sampled decays.
eps::Float64: small constant for numerical stability.
max_temp::Float64: starting temperature for the search-space perturbations used to estimate the gradient; the perturbations approach zero as max_steps is reached. Setting this parameter can help with non-convex problems.
max_steps::Union{Integer, Nothing}: if an Integer is given, the algorithm estimates the gradient using a larger region around the current estimate, allowing it to optimize non-convex functions.
dx::Float64: the steady-state perturbation used to estimate the gradient after max_steps have been taken.

Construct the Madam optimizer.
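A minimal construction sketch, assuming MadamOpt exports Madam and that the type parameter is inferred from theta (the starting point and keyword values here are illustrative, not defaults you must set):

```julia
using MadamOpt

# Start from a random 10-dimensional parameter vector; all keyword
# arguments are optional and default to the values in the signature above.
theta0 = randn(10)

# Enable annealing-style perturbations for a non-convex problem: the
# temperature decays from max_temp toward dx over max_steps steps.
opt = Madam(theta0; alpha = 0.01, max_temp = 1.0, max_steps = 10_000)
```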
MadamOpt.step! — Function

step!(
adam
, grad
; l1_penalty = nothing
, l2_penalty = nothing
, weight_decay = nothing
)
Advances the algorithm by one step based on the current gradient, but does not update the theta estimates. This method is included for interfaces such as FluxML's, which require that only the step be returned.
Arguments

adam::Madam{T}
grad::T: the gradient for the next set of observations (i.e. mini-batch).
l1_penalty::Union{Nothing, Float64}: penalty for L1 regularization (implemented via the proximal operator).
l2_penalty::Union{Nothing, Float64}: penalty for L2 regularization.
weight_decay::Union{Nothing, Float64}: weight-decay regularization.
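A sketch of the gradient-based form, assuming a simple quadratic loss with a hand-computed gradient; the returned value is treated as the step to apply, per the FluxML-style contract, and the L2 penalty is purely illustrative:

```julia
using MadamOpt

opt = Madam(zeros(3))

# Gradient of f(x) = 0.5 * ||x - [1, 2, 3]||^2 at the current estimate.
grad = current(opt) .- [1.0, 2.0, 3.0]

# Advances the optimizer state and returns the step without
# mutating the parameter estimates themselves.
delta = step!(opt, grad; l2_penalty = 1e-4)
```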
step!(adam, loss; grad_samples=max(1,length(adam.theta) >> 4), kwargs...)
Advances the algorithm by one step based on a loss function.
Arguments

adam::Madam{T}
loss::Function: objective or loss function (e.g. mini-batch loss).
grad_samples::Integer: the number of samples taken from the loss function to estimate the gradient. This value must be in the range 0 <= grad_samples <= length(parameters).
kwargs...: the l1_penalty, l2_penalty, and weight_decay penalties described above.
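A gradient-free sketch, assuming the loss-function form of step! shown above; the Rosenbrock-style objective and sample count are illustrative:

```julia
using MadamOpt

opt = Madam(zeros(2); max_temp = 1.0, max_steps = 5_000)

# The loss is evaluated directly; no analytic gradient is required.
loss(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Estimate the gradient from a single perturbation sample per step.
step!(opt, loss; grad_samples = 1)
```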
MadamOpt.update! — Function

update!(adam, args...; kwargs...)

Calls step! and then also updates the parameter estimates theta.
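A sketch of a full optimization loop using update!, which forwards its arguments to step! and then applies the step to theta in place (the convex objective and iteration count are illustrative):

```julia
using MadamOpt

opt = Madam(zeros(2))
loss(x) = (x[1] - 3.0)^2 + (x[2] + 1.0)^2

# Each call estimates the gradient from the loss and mutates theta.
for _ in 1:2_000
    update!(opt, loss)
end
```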
MadamOpt.step_ols! — Function

step_ols!(adam, A, b; kwargs...)

For linear models of the form Ax = b, this is a convenience method that calculates the gradient given A and b.

Arguments

adam::Madam
A::AbstractArray{Float64, Union{1,2}}: design / independent variables.
b::Union{AbstractArray{Float64, 1}, Float64}: response / dependent variables.
kwargs...: same keyword arguments as for step!.
MadamOpt.update_ols! — Function

update_ols!(adam, args...; kwargs...)

Calls step_ols! and then also updates the parameter estimates theta.
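A sketch of streaming least squares with update_ols!, assuming it mutates theta as described; the synthetic mini-batches and true coefficients are illustrative:

```julia
using MadamOpt, Random

Random.seed!(42)
x_true = [2.0, -1.0, 0.5]
opt = Madam(zeros(3))

# Feed mini-batches of design rows A with their responses b;
# the gradient of the squared error is computed internally.
for _ in 1:5_000
    A = randn(8, 3)
    b = A * x_true
    update_ols!(opt, A, b)
end
```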
MadamOpt.current — Function

current(adam::Madam)

Retrieves a reference to the current parameter estimates (i.e. theta in the Madam constructor).
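Because current returns a reference rather than a copy, mutating the returned array would mutate the optimizer's internal state; copy it first if you need a stable snapshot (sketch):

```julia
using MadamOpt

opt = Madam(zeros(4))

theta = current(opt)    # reference to the optimizer's internal estimate
snapshot = copy(theta)  # detach a copy before further update! calls
```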