Previously I used a damping factor δ for the nonlinear fitting of the modified Veloclinic power-duration model. This resulted in me fitting 28 out of 29 test cases when I set δ = 0.25.
But δ = 0.25 is too small: it means at best 25% of the progress toward the solution is covered with each iteration. That's inefficient when homing in on the result.
The real issue arises when the solver isn't homing in on the result: when the solution is still substantially off and the solver is attempting to make a big leap toward it. The big leap could easily overshoot the desired solution, or be in a slightly wrong direction in the hyperdimensional parameter space. In such instances, it's better to take smaller steps toward the solution, to home in on the "zone of quadratic convergence" where the targeting becomes easier.
For that I introduced a dynamic weighting scheme multiplying the primary damping factor by a factor dependent on the step size of the parameters. Since I am solving for the logarithm of the parameters rather than the parameters themselves, what constitutes a "large step" is the same for each one: magnitude 1 is a large step, much less than magnitude 1 is a small step, and much larger than 1 is huge.
So what I decided to do is to use a damping scheme which restricted the maximum step size to 1:
δ = δ0 / (1 + |net undamped step size|)
where δ0 is the damping factor applied to small steps and the step size is the square root of the sum of the squares of the undamped changes in the natural logarithms of each parameter.
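A minimal sketch of this scheme (the function name and NumPy usage are my own illustration, not the original solver code):

```python
import numpy as np

def damped_step(log_params, undamped_step, delta0=1.0):
    """Apply dynamic damping to an undamped solver step in log-parameter space.

    log_params:    current natural logarithms of the model parameters
    undamped_step: proposed undamped change in those logarithms
    delta0:        damping factor applied in the small-step limit

    The effective damping is delta0 / (1 + |step|), where |step| is the
    root-sum-square of the undamped log changes, so the damped step
    length approaches but never exceeds delta0 for large proposed steps.
    """
    step_size = np.sqrt(np.sum(np.asarray(undamped_step) ** 2))
    delta = delta0 / (1.0 + step_size)
    return np.asarray(log_params) + delta * np.asarray(undamped_step)
```

For example, a proposed step of magnitude 5 gets damping 1/6, yielding a damped step of magnitude 5/6, just under the cap of δ0 = 1; a tiny proposed step is multiplied by nearly the full δ0.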
With the damping set to 0.25, I'd fit 28 of my 29 datasets. With the dynamic scheme and δ0 = 1, I fit all 29, and in general the solution was much quicker.
Another type of damping is to reduce the coefficient (more damping) if the number of iterations exceeds a certain value. For example, after 32 iterations without convergence to a solution, I reduce the coefficient using the following formula:
δ → δ^η δf^(1−η)
This causes δ to transition gradually from its original value (for example, 1) toward a final value (for example, δf = 0.25) once the iteration count exceeds 32. I used η = 0.95, which limits the rate at which δ is reduced. This may only be needed when the data are relatively poor and don't fit the model well. The problem can be reduced by reducing the weighting factor.
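Sketched in code (the function name, argument names, and default values are assumptions for illustration; the defaults match the example values above):

```python
def decay_damping(delta, iteration, delta_f=0.25, eta=0.95, max_iter=32):
    """Geometrically relax the damping factor toward delta_f once the
    iteration count exceeds max_iter, via delta -> delta**eta * delta_f**(1-eta).

    In log space this is a weighted average: log(delta) moves a fraction
    (1 - eta) of the way toward log(delta_f) on each call, so eta = 0.95
    limits the rate of reduction.
    """
    if iteration > max_iter:
        delta = delta ** eta * delta_f ** (1.0 - eta)
    return delta
```

Applied once per iteration past the threshold, δ decays smoothly from 1 toward 0.25 rather than jumping there in a single step.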