oval best fit

What oval goes through two points?  Or, better, the best-fit oval:
X²/a² + Y²/b² = 1
ε = X²/a² + Y²/b² - 1
ε² = X⁴/a⁴ + Y⁴/b⁴ + 1 + 2X²Y²/a²b² - 2X²/a² - 2Y²/b²
∑ε² = ∑X⁴/a⁴ + ∑Y⁴/b⁴ + ∑1 + 2∑X²Y²/a²b² - 2∑X²/a² - 2∑Y²/b²
d∑/da = -4∑X⁴/a⁵ + 0 + 0 + -4∑X²Y²/a³b² + 4∑X²/a³ + 0
d∑/db = 0 + -4∑Y⁴/b⁵ + 0 + -4∑X²Y²/a²b³ + 0 + 4∑Y²/b³

(d∑/da = 0) and (d∑/db = 0):

-4∑X⁴/a⁵ -4∑X²Y²/a³b² + 4∑X²/a³ = 0
-4∑Y⁴/b⁵ -4∑X²Y²/a²b³ + 4∑Y²/b³ = 0

divide by -4 and move the negative term to the other side:
∑X⁴/a⁵ + ∑X²Y²/a³b² = ∑X²/a³
∑Y⁴/b⁵ + ∑X²Y²/a²b³ = ∑Y²/b³

multiply the first by a³ and the second by b³, which leaves the right side without any parameter:
∑X⁴/a² + ∑X²Y²/b² = ∑X²
∑Y⁴/b² + ∑X²Y²/a² = ∑Y²

Reorder the second to get it closer to matrix form:
(∑X⁴  )/a² + (∑X²Y²)/b² = ∑X²
(∑X²Y²)/a² + (∑Y⁴  )/b² = ∑Y²

Matrix form:
[ ∑X⁴     ∑X²Y² ] [1/a²] = [ ∑X² ]
[ ∑X²Y²   ∑Y⁴   ] [1/b²] = [ ∑Y² ]

Also tried a derivation (not shown) where A=1/a² and B=1/b², and it comes to the same answer with
the column vector (A,B).  The p⁻² in the original equation generates powers of
-2, -3, -4, and -5 in the various stages, but they all cancel out except for the final p⁻², so the
two derivations are equivalent (except the p⁻² form emphasizes that a and b cannot be zero and
still give a valid ellipse).
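
A quick numpy sketch of that 2x2 solve, as a sanity check outside Excel (the sample data is made up):

    import numpy as np

    # points on the ellipse with a=3, b=2 (hypothetical data)
    t = np.linspace(0, 2*np.pi, 50)
    X, Y = 3*np.cos(t), 2*np.sin(t)

    M = np.array([[np.sum(X**4),        np.sum(X**2 * Y**2)],
                  [np.sum(X**2 * Y**2), np.sum(Y**4)]])
    q = np.array([np.sum(X**2), np.sum(Y**2)])
    A, B = np.linalg.solve(M, q)         # the column vector (1/a², 1/b²)
    a, b = 1/np.sqrt(A), 1/np.sqrt(B)    # expect roughly (3, 2)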

Hyperbolas and other conics could be best-fit similarly.  Actually, the same method can be
attempted on any implicit equation f(𝕩 ∈ 𝕏|𝕡 ∈ ℙ)=k, where 𝕩 ∈ 𝕏 are the measured variables and
𝕡 ∈ ℙ are all the parameters, provided you can isolate each 𝕡 ∈ ℙ after you've set
∂(∑ε²)/∂𝕡=0

generic conic best fit

Wiki lists the arbitrary conic section as
    Ax² + Bxy + Cy² + Dx + Ey + F = 0

When I tried the process above and generated a matrix of sums, there weren't any "lonely" terms
free of 𝕡 ∈ {A,B,C,D,E,F}, so when I set up M*𝕏=Q and solved for 𝕏, Q was the zero column,
which meant that 𝕏 was the zero column as well.  That's not useful.

Using [Geogebra CAS](https://www.geogebra.org/calculator "in CAS mode"), I compared the partial
derivatives from my original working oval to the ones here, and found there were no negative
terms; I realized that was because there was no -1 term coming from the =1 in the original
oval equation.

However, if you divide everything in the generic equation by -F (relabeling the coefficients), that is equivalent to
    Ax² + Bxy + Cy² + Dx + Ey = 1

... the F wasn't contributing anything meaningful.  Re-solve the partial derivatives for the
5-parameter version, and now everything looks much more reasonable:

__GEOGEBRA__
err2(x, y)=Expand((A x^(2)+B x y+C y^(2)+D x+E y-1)^(2))
ddA(x,y)=Derivative(Expand(Simplify(err2(x,y)/2))=0,A)
ddB(x,y)=Derivative(Expand(Simplify(err2(x,y)/2))=0,B)
ddC(x,y)=Derivative(Expand(Simplify(err2(x,y)/2))=0,C)
ddD(x,y)=Derivative(Expand(Simplify(err2(x,y)/2))=0,D)
ddE(x,y)=Derivative(Expand(Simplify(err2(x,y)/2))=0,E)
# divide each by 2 because that is a common factor throughout
__END__

I then manipulated the results into clean matrix form in Notepad++:

    [ ∑X⁴     ∑X³Y    ∑X²Y²   ∑X³     ∑X²Y    ]   [ A ]   [ ∑X² ]
    [ ∑X³Y    ∑X²Y²   ∑XY³    ∑X²Y    ∑XY²    ]   [ B ]   [ ∑XY ]
    [ ∑X²Y²   ∑XY³    ∑Y⁴     ∑XY²    ∑Y³     ] * [ C ] = [ ∑Y² ]
    [ ∑X³     ∑X²Y    ∑XY²    ∑X²     ∑XY     ]   [ D ]   [ ∑X  ]
    [ ∑X²Y    ∑XY²    ∑Y³     ∑XY     ∑Y²     ]   [ E ]   [ ∑Y  ]

Note that the 5x5 matrix M is symmetric (M{i,j}==M{j,i}), so that's computationally easy.

To do the best fit in Excel, list the measured X and Y, and calculate the sums, then use
    MMULT(MINVERSE(A1:E5),G1:G5) => column vector of the ℙaram space
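
For checking outside Excel, the same solve is a few lines of numpy (a sketch; the sample points are made up):

    import numpy as np

    # hypothetical samples on the ellipse x²/9 + y²/4 = 1
    t = np.linspace(0, 2*np.pi, 100)
    x, y = 3*np.cos(t), 2*np.sin(t)

    cols = np.column_stack([x*x, x*y, y*y, x, y])  # the A,B,C,D,E term columns
    M = cols.T @ cols          # reproduces the 5x5 matrix of sums above
    q = cols.sum(axis=0)       # the "lonely" side: ∑x², ∑xy, ∑y², ∑x, ∑y
    A, B, C, D, E = np.linalg.solve(M, q)
    # expect roughly A=1/9, B=0, C=1/4, D=0, E=0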

For doing a gradient descent from an arbitrary (x,y) to some point P on the conic section,
ddX(x,y) and ddY(x,y) (the partial derivatives of err2 with respect to the coordinates) give the
gradient, and err2/-ddX or err2/-ddY gives the coordinate adjustment needed to get closer to the
local minimum of err2, which should be on the conic section.  This gives a computationally
expensive way to graph an implicit conic section.  (My Excel experiments showed that, with deltaX
and deltaY clamped to +/-1, it usually converged to the point within 5-10 gradient-descent
steps.)  You graph a point (x,y) if its err2 is within some threshold of 0.
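
A numpy sketch of that descent (my reading of the err2/-ddX step; the clamping mirrors the Excel experiment):

    import numpy as np

    def project_to_conic(x, y, A, B, C, D, E, steps=20, clamp=1.0, tol=1e-9):
        # descend an (x,y) point toward the conic Ax² + Bxy + Cy² + Dx + Ey = 1
        for _ in range(steps):
            err = A*x*x + B*x*y + C*y*y + D*x + E*y - 1
            err2 = err*err
            if err2 < tol:
                break
            ddX = 2*err*(2*A*x + B*y + D)   # ∂err2/∂x
            ddY = 2*err*(B*x + 2*C*y + E)   # ∂err2/∂y
            # step each coordinate by err2/-dd, clamped to +/-clamp
            if ddX != 0:
                x += np.clip(err2 / -ddX, -clamp, clamp)
            if ddY != 0:
                y += np.clip(err2 / -ddY, -clamp, clamp)
        return x, y

    # example: pull (4, 1) onto the ellipse x²/9 + y²/4 = 1
    print(project_to_conic(4.0, 1.0, 1/9, 0, 1/4, 0, 0))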

But with this, I should be able to best-fit any conic section I run across.

_______________


update: I realized that maybe the reason it didn't have "lonely" terms was that I had tried
    Ax² + Bxy + Cy² + Dx + Ey + F = 0

But if I instead use
    Ax² + Bxy + Cy² + Dx + Ey + F = 1

Then the -1 should give me "lonely" terms again.  Use the same GEOGEBRA, except with +F and add ddF(x,y):

__GEOGEBRA__
err2(x, y)=Expand((A x^(2)+B x y+C y^(2)+D x+E y + F - 1)^(2))
...
ddF(x,y)=Derivative(Expand(Simplify(err2(x,y)/2))=0,F)
__END__

Convert their output to a matrix:

    [ ∑X⁴     ∑X³Y    ∑X²Y²   ∑X³     ∑X²Y    ∑X²     ]   [ A ]   [ ∑X² ]
    [ ∑X³Y    ∑X²Y²   ∑XY³    ∑X²Y    ∑XY²    ∑XY     ]   [ B ]   [ ∑XY ]
    [ ∑X²Y²   ∑XY³    ∑Y⁴     ∑XY²    ∑Y³     ∑Y²     ] * [ C ] = [ ∑Y² ]
    [ ∑X³     ∑X²Y    ∑XY²    ∑X²     ∑XY     ∑X      ]   [ D ]   [ ∑X  ]
    [ ∑X²Y    ∑XY²    ∑Y³     ∑XY     ∑Y²     ∑Y      ]   [ E ]   [ ∑Y  ]
    [ ∑X²     ∑XY     ∑Y²     ∑X      ∑Y      ∑1      ]   [ F ]   [ ∑1  ]

It's still M{i,j}==M{j,i}, so still easy to compute.
The Excel best-fit will just be 6x6 instead of 5x5.

This should be able to show me F as well, now, so I don't always have the downscaled version.
And if I really want
    Ax² + Bxy + Cy² + Dx + Ey + F = 0
then I can just subtract 1 from the F I calculated above.

image translate/rotate/skew best fit

    OUT = [ a b c ]   [ x1 x2 x3 ... xN ] = [ v1 v2 v3 ... vN ]
          [ d e f ] * [ y1 y2 y3 ... yN ]   [ w1 w2 w3 ... wN ]
          [ 0 0 1 ]   [ 1  1  1  ... 1  ]   [ 1  1  1  ... 1  ]

So if you want to figure out the expanded M, you want to best fit on OUT = M * IN (with the extra rows)

To figure out the best fit, you basically have the equations

    Vi = a*Xi + b*Yi + c*1
    Wi = d*Xi + e*Yi + f*1

Best fit is least-sum-squared-error, so
    err_Vi = a*Xi + b*Yi + c*1 - Vi
    err_Wi = d*Xi + e*Yi + f*1 - Wi

And the sq error
    esq_Vi = a²*Xi² + b²*Yi² + c²*1 + Vi² + 2ab*Xi*Yi + 2ac*Xi - 2a*Xi*Vi + 2bc*Yi - 2b*Yi*Vi - 2c*Vi
    esq_Wi = exact same form (I will just show derivation on the Vi terms for now)

Sum up the total error:
    SSEv = ∑(a²*Xi² + b²*Yi² + c²*1 + Vi² + 2ab*Xi*Yi + 2ac*Xi - 2a*Xi*Vi + 2bc*Yi - 2b*Yi*Vi - 2c*Vi)
         = ∑(a²*Xi²) + ∑(b²*Yi²) + ∑(c²*1) + ∑(Vi²) + ∑(2ab*Xi*Yi) + ∑(2ac*Xi) - ∑(2a*Xi*Vi) + ∑(2bc*Yi) - ∑(2b*Yi*Vi) - ∑(2c*Vi)
         = a²*∑(Xi²) + b²*∑(Yi²) + c²*∑(1) + ∑(Vi²) + 2ab*∑(Xi*Yi) + 2ac*∑(Xi) - 2a*∑(Xi*Vi) + 2bc*∑(Yi) - 2b*∑(Yi*Vi) - 2c*∑(Vi)

Need to find the minimum with respect to each of a, b, c (and equivalently d,e,f) by setting each partial derivative to 0
    dSSEv/da = 2a*∑(Xi²) + 0 + 0 + 0 + 2b*∑(Xi*Yi) + 2c*∑(Xi) - 2*∑(Xi*Vi) + 0 - 0 - 0 = 0
    dSSEv/db = 0 + 2b*∑(Yi²) + 0 + 0 + 2a*∑(Xi*Yi) + 0 - 0 + 2c*∑(Yi) - 2*∑(Yi*Vi) - 0 = 0
    dSSEv/dc = 0 + 0 + 2c*∑(1) + 0 + 0 + 2a*∑(Xi) - 0 + 2b*∑(Yi) - 0 - 2*∑(Vi) = 0

Divide by two and rearrange those three, grouping the a,b,c terms on one side and the parameter-free sums on the other
    dSSEv/da: a*∑(Xi²)   + b*∑(Xi*Yi) + c*∑(Xi) = ∑(Xi*Vi)
    dSSEv/db: a*∑(Xi*Yi) + b*∑(Yi²)   + c*∑(Yi) = ∑(Yi*Vi)
    dSSEv/dc: a*∑(Xi)    + b*∑(Yi)    + c*∑(1)  = ∑(Vi)

But that is a matrix multiplication:
    [ ∑(Xi²)        ∑(Xi*Yi)        ∑(Xi) ] [ a ] = [ ∑(Xi*Vi) ]
    [ ∑(Xi*Yi)      ∑(Yi²)          ∑(Yi) ] [ b ] = [ ∑(Yi*Vi) ]
    [ ∑(Xi)         ∑(Yi)           ∑(1)  ] [ c ] = [ ∑(Vi)    ]
And similarly from the esq_Wi, you get:
    [ ∑(Xi²)        ∑(Xi*Yi)        ∑(Xi) ] [ d ] = [ ∑(Xi*Wi) ]
    [ ∑(Xi*Yi)      ∑(Yi²)          ∑(Yi) ] [ e ] = [ ∑(Yi*Wi) ]
    [ ∑(Xi)         ∑(Yi)           ∑(1)  ] [ f ] = [ ∑(Wi)    ]
Note that the matrix on the left is the same for both.  So we can merge those two into a single matrix multiplication
    [ ∑(Xi²)        ∑(Xi*Yi)        ∑(Xi) ] [ a d ] = [ ∑(Xi*Vi)    ∑(Xi*Wi) ]
    [ ∑(Xi*Yi)      ∑(Yi²)          ∑(Yi) ] [ b e ] = [ ∑(Yi*Vi)    ∑(Yi*Wi) ]
    [ ∑(Xi)         ∑(Yi)           ∑(1)  ] [ c f ] = [ ∑(Vi)       ∑(Wi)    ]

If we call those G x P = Q, then we can solve for the parameter matrix P by P = inv(G) x Q

Then you can get the original abcdef001 matrix by transposing P and adding a row of (0 0 1)

Then to find the blue outputs, just do

    BLUE_OUT = abcdef001 * BLUE_IN
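
A numpy sketch of the whole pipeline, with G, P, Q named as above (synthetic points and a made-up transform):

    import numpy as np

    # synthetic test: a known transform applied to random points
    true_M = np.array([[1.2, -0.3,  5.0],
                       [0.4,  0.9, -2.0],
                       [0.0,  0.0,  1.0]])
    pts = np.random.rand(50, 2) * 10
    IN  = np.column_stack([pts, np.ones(len(pts))])  # rows of (Xi, Yi, 1)
    OUT = IN @ true_M.T                              # rows of (Vi, Wi, 1)

    G = IN.T @ IN                # the 3x3 matrix of sums
    Q = IN.T @ OUT[:, :2]        # columns of ∑(Xi*Vi)... and ∑(Xi*Wi)...
    P = np.linalg.solve(G, Q)    # P = inv(G) x Q; columns (a,b,c) and (d,e,f)

    abcdef001 = np.vstack([P.T, [0, 0, 1]])  # transpose P, add the (0 0 1) row
    # abcdef001 recovers true_M (exactly here, since there's no noise)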

camera matrix best-fit

Y = C x X, where Y is the [3x1] extended 2d output, and X is the [4x1] extended 3d input;
but using X and Y is confusing when dealing with real space.

Translate that to XYZ for input and RU for output (right and up, in screen coordinates)
[ R ]   [a b c d]   [ X ]
[ U ] = [e f g h] . [ Y ]
[ 1 ]   [0 0 0 1]   [ Z ]
                    [ 1 ]

R = a*X + b*Y + c*Z + d
U = e*X + f*Y + g*Z + h

errx = a*X + b*Y + c*Z + d - R
erry = e*X + f*Y + g*Z + h - U

esqx = a²*X² + b²*Y² + c²*Z² + d² + R² + 2ab*XY + 2ac*XZ + 2ad*X - 2a*XR + 2bc*YZ + 2bd*Y - 2b*YR + 2cd*Z - 2c*ZR - 2d*R
similar for esqy

technically all the terms with X,Y,Z,R,U have ∑ in front of them, but to save typing, will just apply it at the end

desqx/da = 2a*X² + 0 + 0 + 0 + 0 + 2b*XY + 2c*XZ + 2d*X - 2*XR + 0 + 0 + 0 + 0 + 0 - 0 = 0
desqx/db = 0 + 2b*Y² + 0 + 0 + 0 + 2a*XY + 0 + 0 - 0 + 2c*YZ + 2d*Y - 2*YR + 0 + 0 - 0 = 0
desqx/dc = 0 + 0 + 2c*Z² + 0 + 0 + 0 + 2a*XZ + 0 - 0 + 2b*YZ + 0 + 0 + 2d*Z - 2*ZR - 0 = 0
desqx/dd = 0 + 0 + 0 + 2d + 0 + 0 + 0 + 2a*X - 0 + 0 + 2b*Y + 0 + 2c*Z + 0 - 2*R = 0

desqx/da = 2a*X² + 2b*XY + 2c*XZ + 2d*X = 2*XR
desqx/db = 2b*Y² + 2a*XY + 2c*YZ + 2d*Y = 2*YR
desqx/dc = 2c*Z² + 2a*XZ + 2b*YZ + 2d*Z = 2*ZR
desqx/dd = 2d + 2a*X + 2b*Y + 2c*Z = 2*R

desqx/da: a*X² + b*XY + c*XZ + d*X = XR
desqx/db: a*XY + b*Y² + c*YZ + d*Y = YR
desqx/dc: a*XZ + b*YZ + c*Z² + d*Z = ZR
desqx/dd: a*X  + b*Y  + c*Z  + d*1 = R

[ X²   XY   XZ   X ] [a] = [ XR ]
[ XY   Y²   YZ   Y ] [b] = [ YR ]
[ XZ   YZ   Z²   Z ] [c] = [ ZR ]
[ X    Y    Z    1 ] [d] = [  R ]

Extend for the equivalent on efgh, and mark the sums correctly

[ ∑X²   ∑XY   ∑XZ   ∑X ] [ a e ] = [ ∑XR ∑XU ]
[ ∑XY   ∑Y²   ∑YZ   ∑Y ] [ b f ] = [ ∑YR ∑YU ]
[ ∑XZ   ∑YZ   ∑Z²   ∑Z ] [ c g ] = [ ∑ZR ∑ZU ]
[ ∑X    ∑Y    ∑Z    ∑1 ] [ d h ] = [ ∑R  ∑U  ]

M x P = S

Do the various X Y Z R U sums for M and S, then solve with P = inv(M) x S

Then C = transpose(P) with the row (0 0 0 1) appended
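
The same shape of sketch in numpy for the camera case (synthetic 3D points and a made-up C):

    import numpy as np

    # synthetic test: a known affine camera applied to random 3D points
    true_C = np.array([[0.8, -0.1,  0.2,  3.0],
                       [0.1,  0.9, -0.3, -1.0],
                       [0.0,  0.0,  0.0,  1.0]])
    xyz = np.random.rand(100, 3) * 10
    IN  = np.column_stack([xyz, np.ones(len(xyz))])  # rows of (X, Y, Z, 1)
    RU  = (IN @ true_C.T)[:, :2]                     # rows of (R, U)

    M = IN.T @ IN                # the 4x4 matrix of sums
    S = IN.T @ RU                # columns (∑XR, ∑YR, ∑ZR, ∑R) and the U versions
    P = np.linalg.solve(M, S)    # P = inv(M) x S; columns (a,b,c,d) and (e,f,g,h)
    C = np.vstack([P.T, [0, 0, 0, 1]])  # transpose(P) plus the (0 0 0 1) row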

exponential best fit

Option Explicit
' excel: LINEST() = array formula for best-fit line for
'       fn = m_[n]*x_[n] + m_[n-1]*x_[n-1] + ... m_1*x_1 + b
'   POLYNOMIAL: to get best-fit polynomial, LINEST(ydata, xdata^{1,2,3,...n}), where xdata must be single-column
' excel: LOGEST(): fn = m_[n]^(x_[n]) * ... * m_[1]^(x_[1]) * b
'
' for fitting fn = a * exp(b*x), most will recommend
'       fnl     = log(fn)
'               = log( a * exp(b*x) )
'               = log( a ) + log( exp(b*x) )
'               = log( a ) + b*x
'               = A' + b*x
'   then you can just fit LINEST( log(y), xdata ), where b = slope(), A' = offset(), and a = exp(A')
'       => note, this is equivalent to LOGEST, because b*m^x = b*exp(log(m)*x), but LOGEST brings the a back to the right value
'
' However, neither LOGEST() nor LINEST() methods work right for an exponential with dc offset:
'       fn = a + b * exp(cx)    (ie, an exponential with dc offset)
'           or
'       fn = a + b * base^x     (where base = exp(c))
'
' Hence, my EXPEST(), which (after some googling) does one linear approximation to get the exponential constant (or, equivalently, base)
'   and then using that, a second linear approximation to get the amplitude (b) and offset (a)
'

Function expest(yData As Range, xData As Range) As Variant
    ' [Array Function]: Estimate y = a*exp(m*x) + b using "best fit" methodology
    '
    '   row#:   data columns
    '   1:      ampl:a , slpe:m , offs:b    best-fit parameter results
    '   2:      sea, sem, seb   [FUTURE]    Standard errors of the input parameters
    '   3:      r^2, sey        [FUTURE]    correlation factor, std err for y-estimator
    '   4:      F, df           [FUTURE]    f-statistic, degrees of freedom
    '   5:      ssreg, ssresid  [FUTURE]    regression sum-squares, residual sum-squares
    '                                           ssresid = sum( (y_est_i - y_data_i)^2 )
    '                                           sstotal = sum( (y_data_i - avg_y)^2 )
    '                                           ssreg   = sum( (y_est_i - avg_y)^2 ) = sstotal - ssresid
    '                                           r^2     = ssreg / sstotal
    '
    ' _similar_ to LINEST ( y = m1*x1 + b ) or LOGEST ( y = (m1^x1)*b )
    ' but using y = a + b*exp(c*x)
'       for now, NOT going to handle multiple columns of X data
    ' REFERENCES:
    ' * Return Array: SEE http://www.cpearson.com/excel/returningarraysfromvba.aspx
    ' * somewhat-best-fit for a + b*e^(cx):
    '           => NOTE that this function is using nomenclature b + a*exp(m*x)
    '       https://stackoverflow.com/questions/3938042/fitting-exponential-decay-with-no-initial-guessing
    '       https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
    ' * Residuals and Diagnostics: http://www.informit.com/articles/article.aspx?p=2019171
    '   -- gives pure-excel formulas to mimic each of the elements of LINEST(), so is helping me make equivalent
    '   -- taught me DEVSQ():
    '       akin to SUMSQ(range) == SUM( range^2 ),
    '       there is DEVSQ(range) == SUMSQ( range - average(range) ) == SUM( (range - average(range))^2 )
    '
    ' I did my best to keep things parallel to the LINEST(), but I make no guarantees as to the statistical validity of anything
    '   * HIGHEST CONFIDENCE: ampl, slpe, offs          => these are as the fr.scribd document derived, so quite confident
    '   * HIGH CONFIDENCE: r^2, sey, ssreg, ssresid     => I understood these derivations better, and cannot see a reason they wouldn't be analogous
'   * MEDIUM CONFIDENCE: F, df                      => I see no reason the informit derivation wouldn't carry over analogously
    '   * LOWEST CONFIDENCE: se_ampl, se_slpe, se_offs  => these require the SSCP, and I don't know how they might influence each other...
    '       for the example dataset I derived from, se_slpe was bigger than I expected... but maybe it's right, because of the assumptions of the pseudo-integral

    Dim output() As Variant, x As Variant, y As Variant

    If xData.Columns.Count > 1 Then
        expest = "#ERR! Only one column of X (for now)"
        Exit Function
    End If
    Dim orows, ocols
    orows = 5
    ocols = 3
    ReDim output(1 To orows, 1 To ocols)

    Dim ny, nx, k
    ny = yData.Rows.Count
    nx = xData.Rows.Count
    If ny <> nx Then
        expest = "#ERR! X and Y are not the same size"
        Exit Function
    End If

    ' the integral only works if data is sorted by x...
    '   S.O. has a single-function QuickSort at https://stackoverflow.com/a/152325/5508606
    '       I will adapt this for sorting both the X and Y
    x = xData.value
    y = yData.value
    Call QuickSort2d(x, y)
    Dim colIdx
    colIdx = LBound(x, 2)

    ' do the first fit to find the slope 'slpe' (the constant power of the exponential)
    Dim integral, sum_dx, sum_dy, sum_dx_s, sum_dy_s, sum_dx_dx, sum_dy_dx, sum_s_s
    integral = 0
    sum_dx = 0
    sum_dy = 0
    sum_dx_s = 0
    sum_dy_s = 0
    sum_dx_dx = 0
    sum_dy_dx = 0
    sum_s_s = 0
    For k = 1 To nx
        Dim dx, dy
        If k = 1 Then
            integral = 0
        Else
            integral = integral + 0.5 * (y(k, colIdx) + y(k - 1, colIdx)) * (x(k, colIdx) - x(k - 1, colIdx)) ' += avg(y)*delta(x)
        End If

        dx = x(k, colIdx) - x(1, colIdx)
        dy = y(k, colIdx) - y(1, colIdx)

        sum_s_s = sum_s_s + integral * integral

        sum_dx = sum_dx + dx
        sum_dx_s = sum_dx_s + dx * integral
        sum_dx_dx = sum_dx_dx + dx * dx

        sum_dy = sum_dy + dy
        sum_dy_s = sum_dy_s + dy * integral
        sum_dy_dx = sum_dy_dx + dy * dx

    Next k

    ' Using ANS = MMult( MINVERSE(MAT), VEC) from VBA
    '   see https://stackoverflow.com/a/7307992/5508606
    Dim MAT(1 To 2, 1 To 2), VEC(1 To 2, 1 To 1), ANS As Variant
    Dim I1 As Variant, I2 As Variant
    MAT(1, 1) = sum_dx_dx
    MAT(1, 2) = sum_dx_s
    MAT(2, 1) = sum_dx_s
    MAT(2, 2) = sum_s_s
    VEC(1, 1) = sum_dy_dx
    VEC(2, 1) = sum_dy_s
    I1 = WorksheetFunction.MInverse(MAT)
    ANS = WorksheetFunction.MMult(I1, VEC)

    ' offs + ampl*exp(slpe*x)
    ' S.O.: d = IM11 * V1 + IM12 * V2   ' ignored
    ' S.O.: c = IM21 * V1 + IM22 * V2   ' SO:c => 'slpe'

    Dim offs, ampl, slpe
    slpe = ANS(2, 1)

    ' second matrix
    Dim sum_ec1x, sum_ec2x, sum_y, sum_y_ec1x
    sum_ec1x = 0: sum_ec2x = 0: sum_y = 0: sum_y_ec1x = 0   ' explicitly zero the accumulators
    For k = 1 To nx
        Dim ec1x, ec2x
        ec1x = Exp(slpe * 1 * x(k, colIdx))
        ec2x = Exp(slpe * 2 * x(k, colIdx))

        sum_ec1x = sum_ec1x + ec1x
        sum_ec2x = sum_ec2x + ec2x
        sum_y = sum_y + y(k, colIdx)
        sum_y_ec1x = sum_y_ec1x + y(k, colIdx) * ec1x
    Next k
    MAT(1, 1) = nx
    MAT(1, 2) = sum_ec1x
    MAT(2, 1) = sum_ec1x
    MAT(2, 2) = sum_ec2x
    VEC(1, 1) = sum_y
    VEC(2, 1) = sum_y_ec1x
    I2 = WorksheetFunction.MInverse(MAT)
    ANS = WorksheetFunction.MMult(I2, VEC)

    ' S.O.: a = IM11 * V1 + IM12 * V2   ' SO:a => 'offs'
    ' S.O.: b = IM21 * V1 + IM22 * V2   ' SO:b => 'ampl'
    offs = ANS(1, 1)
    ampl = ANS(2, 1)

    ' DF = #datapoints - (#variables+1)                         ' LINEST().df           ' degrees of freedom of the RESIDUAL
    '   in LINEST(), the degrees of freedom (df) is the number of data points,
'   minus the TOTAL number of variables, which is really the number of obvious x columns,
    '       plus the implied 1s column for the DC offset.
    '   Thus, for EXPEST(), since the X is limited to 1 column, it should be
    '       nx - (1+1) = nx-2
    Dim DF As Long
    DF = nx - 2         ' TODO = if I ever allow multiple X columns, edit this formula

    ' In the second-matrix calcs, need to get
    '   my nomenclature                                         ' LINEST() nomenclature
    '   SSE = Sum((y_i - fn(x_i)) ^ 2)                          ' ssresid               ' sum of squares residual
'   SST = SUM( (y_i - avg(y_*))^2 )                         ' sstotal               ' sum of squares total
    '   SSR = SST - SSE                                         ' ssreg                 ' sum of squares regression
    Dim avg_y, SSE, SSR, SST, yfn
    avg_y = sum_y / ny
    SSE = 0     ' sum( (y_data-yfn)^2  ) = EXCEL.SUMSQ(data - predicted)
    SSR = 0     ' sum( (yfn - avg_y)^2 ) = EXCEL.DEVSQ(predicted)
    SST = 0     ' sum( (y_data-avg_y)^2) = EXCEL.DEVSQ(data)
    For k = 1 To ny
        yfn = ampl * Exp(slpe * x(k, colIdx)) + offs
        SSE = SSE + (y(k, colIdx) - yfn) ^ 2
        'SSR = SSR + (yfn - avg_y) ^ 2
        SST = SST + (y(k, colIdx) - avg_y) ^ 2
    Next k
    SSR = SST - SSE

    '   my nomenclature                                         ' LINEST() nomenclature
    '   RSQ = 1 - SSE / SST = SSR / SST                         '                       ' correlation factor
    '   SE_Y = sqr( SSE / DF )                                  ' sqrt(ssresid/df)      ' standard error of the estimate
    Dim RSQ, SE_Y
    RSQ = 1 - SSE / SST         ' == SSR / SST
    SE_Y = Sqr(SSE / DF)

    ' for figuring out the F ratio, need a different degrees of freedom:
    '   total number of variables (again, number of X columns, including the 1s DC-offset column) minus 1
    '   so here, (1+1)-1 = 1 = dfx
    ' There are two ways of calculating the F_RATIO: one involves the RSQ, the other the sum-squares (SSR and SSE), but they are mathematically equivalent
    '       F_RATIO_RSQ = (RSQ/dfx) / ( (1-RSQ)/DF )
    '                   = ((SSR/SST)/dfx) / ( (1-(1-SSE/SST)) / DF ) ' subst SSR/SST on left, and 1-SSE/SST on right
    '                   = ((SSR/SST)/dfx) / ( (SSE/SST) / DF )       ' simplify the right side
    '                   = ((SSR/---)/dfx) / ( (SSE/---) / DF )       ' cancel the common SST
    '       F_RATIO_SS  = (SSR/dfx) / (SSE/DF)
    Dim dfx, F_RATIO
    dfx = 1
    F_RATIO = (SSR / dfx) / (SSE / DF)

    ' the last (the "Standard errors" for row 2 of the output)
    ' are the most complicated... still trying to understand the inform.it description...
    ' However, it starts with the MSResid (Mean Squared Residual), which is what I'd call the MSE
    Dim MSE
    MSE = SSE / DF                                              ' msresid = ssresid / df
    ' For linear regression, it's the GRID=MInverse( MMult( TRANSPOSE(X), X ) ) * MSE,
    '   then the se_param are the sqr(GRID(1,1)), sqr(GRID(2,2)), ... sqr(GRID(n,n))
    ' The problem is, for the non-linear version, I just don't see what the equivalent would be,
    '   because I don't know what the 'X' would be for each param...
    ' The "second matrix" somewhat implies that for the offset and amplitude,
    '   the X are 1s (offset) and exp(slpe*x)
    ' Compare the form of the first matrix to the second
    '   second              first
    '   M22 = sum(exp^2)    M22 = sum(integral^2)
    '   M11 = sum(1^2)      M11 = sum((x-x1)^2)
    '   M12 = sum(1*exp)    M12 = sum((x-x1)*integral)
    ' Note that in the second, the Xi data is the element that's squared in Mii: so Xs1 = 1, Xs2 = exp()...
    ' Thus, I would say in the first, Xf1 = (xj-x1) and Xf2 = (integral)
    ' so, I need to build XMERGE, where
    '   XMERGE(col:1) = x_r - x_1
    '   XMERGE(col:2) = integral
    '   XMERGE(col:3) = 1
    '   XMERGE(col:4) = exp(slpe*x)
    ' Then
    '   SSCP = MMULT( TRANSPOSE( XMERGE ), XMERGE )
    '   ISCP = INVERSE(SSCP)
    '   MSCP = ISCP * MSE
    '   se_ignored = sqr(MSCP(1,1))
    '   se_slpe    = sqr(MSCP(2,2))
    '   se_offs    = sqr(MSCP(3,3))
    '   se_ampl    = sqr(MSCP(4,4))
    ' TODO = when mimicking the inform.it figure 7, see if the order of columns matters for XMERGE ... it shouldn't
    ' TODO = if I've understood things correctly, then see if it makes sense when I implement it for this one
    '   also, it might be good to compare the results to splitting it into two SSCP, just like
    '   the original matrixes were...
    '   Oh, wait... that might be even better, because the ISCP is what the MINVERSE(MAT) is in both the 2x2 matrixes...
    '   yeah, so if I saved a copy of I1 = INVERSE(MAT#1) and I2 = INVERSE(MAT#2), then I could get MSCP = (I1,I2)*MSE
    ' I think that's how I'll do it, without the XMERGE version
    '       sqr(MSE*I1(1,1)) = se_ignored      sqr(MSE*I1(2,2)) = se_slpe
    '       sqr(MSE*I2(1,1)) = se_offs         sqr(MSE*I2(2,2)) = se_ampl
    ' => that does _something_, but it shows that the slpe error is pretty big (se=0.49 vs slpe=-3.88 = 13% in the S.O. data)

    ' populate the output structure
    output(1, 1) = ampl
    output(1, 2) = slpe
    output(1, 3) = offs

    output(2, 1) = Sqr(MSE * I2(2, 2))                          ' se_ampl
    output(2, 2) = Sqr(MSE * I1(2, 2))                          ' se_slpe
    output(2, 3) = Sqr(MSE * I2(1, 1))                          ' se_offs

    output(3, 1) = RSQ
    output(3, 2) = SE_Y
    output(3, 3) = " - "

    output(4, 1) = F_RATIO
    output(4, 2) = DF
    output(4, 3) = " - "

    output(5, 1) = SSR                                          ' ssreg = SSR = SST - SSE
    output(5, 2) = SSE                                          ' ssresid = SSE
    output(5, 3) = " - "

    expest = output
End Function
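
For cross-checking the VBA outside Excel, here's a numpy sketch of the same two-stage method (same parameter names as expest; no standard-error outputs):

    import numpy as np

    def expest_np(x, y):
        # fit y = ampl*exp(slpe*x) + offs, via the integral-equation trick
        order = np.argsort(x)
        x, y = np.asarray(x, float)[order], np.asarray(y, float)[order]

        # cumulative trapezoid integral of y dx (the 'integral' accumulator)
        S = np.concatenate([[0.0], np.cumsum(0.5*(y[1:] + y[:-1])*np.diff(x))])
        dx, dy = x - x[0], y - y[0]

        # first 2x2 solve: the second unknown is slpe
        M1 = np.array([[dx @ dx, dx @ S], [dx @ S, S @ S]])
        slpe = np.linalg.solve(M1, [dy @ dx, dy @ S])[1]

        # second 2x2 solve: offs and ampl, given slpe
        e = np.exp(slpe * x)
        M2 = np.array([[x.size, e.sum()], [e.sum(), (e*e).sum()]])
        offs, ampl = np.linalg.solve(M2, [y.sum(), (y*e).sum()])
        return ampl, slpe, offs

    # quick check on synthetic data: expect roughly (2.0, -0.5, 1.0)
    xs = np.linspace(0, 10, 200)
    ys = 2.0*np.exp(-0.5*xs) + 1.0
    print(expest_np(xs, ys))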

bézier curve best fit

Cubic Bézier Curve: (1-t)³⋅P0 + 3⋅(1-t)²⋅t¹⋅P1 + 3⋅(1-t)¹⋅t²⋅P2 + t³⋅P3
Tᵢ are the inputs
Xᵢ are the outputs
Pₙ are the control points that I'm trying to solve for (the "parameters")
Mᵢₙ are the multipliers, which are a function of Tᵢ
    Mᵢ₀=1⋅(1-Tᵢ)³
    Mᵢ₁=3⋅(1-Tᵢ)²⋅Tᵢ¹
    Mᵢ₂=3⋅(1-Tᵢ)¹⋅Tᵢ²
    Mᵢ₃=1        ⋅Tᵢ³

(Skipping summation notation for now)

Eᵢ = Xᵢ - (1-Tᵢ)³⋅P₀ - 3⋅(1-Tᵢ)²⋅Tᵢ¹⋅P₁ - 3⋅(1-Tᵢ)¹⋅Tᵢ²⋅P₂ - Tᵢ³⋅P₃
Eᵢ = Xᵢ - Mᵢ₀⋅P₀ - Mᵢ₁⋅P₁ - Mᵢ₂⋅P₂ - Mᵢ₃⋅P₃
Eᵢ² = Xᵢ² + Mᵢ₀²⋅P₀² + Mᵢ₁²⋅P₁² + Mᵢ₂²⋅P₂² + Mᵢ₃²⋅P₃²
          - 2⋅Xᵢ⋅(Mᵢ₀⋅P₀ + Mᵢ₁⋅P₁ + Mᵢ₂⋅P₂ + Mᵢ₃⋅P₃)
          + 2⋅(Mᵢ₀⋅P₀⋅Mᵢ₁⋅P₁ + Mᵢ₀⋅P₀⋅Mᵢ₂⋅P₂ + Mᵢ₀⋅P₀⋅Mᵢ₃⋅P₃ + Mᵢ₁⋅P₁⋅Mᵢ₂⋅P₂ + Mᵢ₁⋅P₁⋅Mᵢ₃⋅P₃ + Mᵢ₂⋅P₂⋅Mᵢ₃⋅P₃)
S = ∑Eᵢ²
dS/dP₀ = 0 = 2⋅(Mᵢ₀²⋅P₀ - Xᵢ⋅Mᵢ₀              + Mᵢ₀⋅Mᵢ₁⋅P₁ + Mᵢ₀⋅Mᵢ₂⋅P₂ + Mᵢ₀⋅Mᵢ₃⋅P₃)
dS/dP₁ = 0 = 2⋅(Mᵢ₁²⋅P₁ - Xᵢ⋅Mᵢ₁ + Mᵢ₀⋅P₀⋅Mᵢ₁              + Mᵢ₁⋅Mᵢ₂⋅P₂ + Mᵢ₁⋅Mᵢ₃⋅P₃)
dS/dP₂ = 0 = 2⋅(Mᵢ₂²⋅P₂ - Xᵢ⋅Mᵢ₂ + Mᵢ₀⋅P₀⋅Mᵢ₂ + Mᵢ₁⋅P₁⋅Mᵢ₂              + Mᵢ₂⋅Mᵢ₃⋅P₃)
dS/dP₃ = 0 = 2⋅(Mᵢ₃²⋅P₃ - Xᵢ⋅Mᵢ₃ + Mᵢ₀⋅P₀⋅Mᵢ₃ + Mᵢ₁⋅P₁⋅Mᵢ₃ + Mᵢ₂⋅P₂⋅Mᵢ₃             )

Clean up into matrix form and add the summation symbols
    [ ∑Mᵢ₀²         ∑Mᵢ₀⋅Mᵢ₁        ∑Mᵢ₀⋅Mᵢ₂        ∑Mᵢ₀⋅Mᵢ₃ ] ⋅ [ P₀ ] = [ ∑Xᵢ⋅Mᵢ₀ ]
    [ ∑Mᵢ₀⋅Mᵢ₁      ∑Mᵢ₁²           ∑Mᵢ₁⋅Mᵢ₂        ∑Mᵢ₁⋅Mᵢ₃ ]   [ P₁ ]   [ ∑Xᵢ⋅Mᵢ₁ ]
    [ ∑Mᵢ₀⋅Mᵢ₂      ∑Mᵢ₁⋅Mᵢ₂        ∑Mᵢ₂²           ∑Mᵢ₂⋅Mᵢ₃ ]   [ P₂ ]   [ ∑Xᵢ⋅Mᵢ₂ ]
    [ ∑Mᵢ₀⋅Mᵢ₃      ∑Mᵢ₁⋅Mᵢ₃        ∑Mᵢ₂⋅Mᵢ₃        ∑Mᵢ₃²    ]   [ P₃ ]   [ ∑Xᵢ⋅Mᵢ₃ ]

Since the matrix A on the left is square, and given column vectors p for the parameters and q for the RHS of the matrix equation:
    A⋅p = q
Then solve for p using:
    p = A⁻¹⋅q

(And note, to solve for the x and y coordinates of Pₙ simultaneously, just change the p and q to two-column vectors with the x in the first column, y in the second.  Matrix algebra is great!)
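
A numpy sketch of the cubic solve, with the Tᵢ assumed known (two-column p and q, as noted):

    import numpy as np

    # known control points for a round-trip test; rows are (x, y)
    P_true = np.array([[0, 0], [1, 3], [4, 3], [5, 0]], float)
    T = np.linspace(0, 1, 25)

    # the multipliers Mᵢₙ as columns (cubic Bernstein basis)
    M = np.column_stack([(1-T)**3, 3*(1-T)**2*T, 3*(1-T)*T**2, T**3])
    XY = M @ P_true            # sample points on the curve (the "outputs")

    A = M.T @ M                # the 4x4 matrix of ∑Mᵢₙ⋅Mᵢₘ sums
    q = M.T @ XY               # two-column RHS: ∑Xᵢ⋅Mᵢₙ and ∑Yᵢ⋅Mᵢₙ
    p = np.linalg.solve(A, q)  # recovers P_true (to rounding)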

If you want the quadratic Bézier curve instead, just remove the Mᵢ₃ column and row, and set the coefficients as follows:
    Mᵢ₀=1⋅(1-Tᵢ)²
    Mᵢ₁=2⋅(1-Tᵢ)¹⋅Tᵢ¹
    Mᵢ₂=1        ⋅Tᵢ²

That's all great... except if trying to do it from an image, you don't have the Tᵢ for a given point.

When I tried to estimate Tᵢ based on the proportional distance along the curve (summing the √(ΔXᵢ²+ΔYᵢ²), and doing partial-sum divided by total-sum), the Tᵢ estimators were really bad, and it didn't get very close to the right p solution.

For quadratic, the tangent line to each point (Xᵢ,Yᵢ) should hit the baseline at P₀ + t⋅(P₁-P₀), so I should be able to solve:

    Xᵢ + s⋅dXᵢ = Px₀ + t⋅(Px₁-Px₀)
    Yᵢ + s⋅dYᵢ = Py₀ + t⋅(Py₁-Py₀)

    [ dXᵢ   -(Px₁-Px₀) ] ⋅ [ s ] = [ Px₀-Xᵢ ]
    [ dYᵢ   -(Py₁-Py₀) ]   [ t ] = [ Py₀-Yᵢ ]

So if I solve for s and t, then plug the expression for t into all the above formulas, I could (in theory) translate the t into something dependent only on the control points and the input variables. But since the Pₙ have to stay variables, it would need to be an algebraic solution rather than st = []⁻¹⋅eq, which would mean a lot of if statements in the solver. I'm not sure I could make that work in the best-fit solution.

And the cubic adds an extra layer of complexity, because the tangent lines actually hit an implied control line, and then those hit the real control line -- and I don't think I could solve that one.


Studying the question and answers in this SO discussion, I have thought of an iterative approach:

- Make a guess for bezier G(t)
- Loop
  - Set ΔPₖₓ=0, ΔPₖᵧ=0
  - For each ᵢ of the N samples
    - use the iterative ClosestPoint algorithm to find the {tᵢ ⇒ [Gₓ(tᵢ), Gᵧ(tᵢ)]} such that the point is closest to [Sₓᵢ,Sᵧᵢ], and determine the squared-distance (Dᵢ²)
        αᵢ = (1-tᵢ)³⋅P₀ₓ + 3⋅(1-tᵢ)²⋅tᵢ⋅P₁ₓ + 3⋅(1-tᵢ)⋅tᵢ²⋅P₂ₓ + tᵢ³⋅P₃ₓ - Sₓᵢ
        βᵢ = (1-tᵢ)³⋅P₀ᵧ + 3⋅(1-tᵢ)²⋅tᵢ⋅P₁ᵧ + 3⋅(1-tᵢ)⋅tᵢ²⋅P₂ᵧ + tᵢ³⋅P₃ᵧ - Sᵧᵢ
        Dᵢ² = αᵢ² + βᵢ²
    - Update each ΔPₖ based on the error in x or y divided by the partial derivative of Dᵢ² w/r/t Pₖ
        ΔP₀ₓ += (Sₓᵢ-Gₓ(tᵢ)) / (2⋅αᵢ⋅(1-3tᵢ+3tᵢ²-tᵢ³))
        ΔP₀ᵧ += (Sᵧᵢ-Gᵧ(tᵢ)) / (2⋅βᵢ⋅(1-3tᵢ+3tᵢ²-tᵢ³))
        ΔP₁ₓ += (Sₓᵢ-Gₓ(tᵢ)) / (6⋅αᵢ⋅(tᵢ-2tᵢ²+tᵢ³))
        ΔP₁ᵧ += (Sᵧᵢ-Gᵧ(tᵢ)) / (6⋅βᵢ⋅(tᵢ-2tᵢ²+tᵢ³))
        ΔP₂ₓ += (Sₓᵢ-Gₓ(tᵢ)) / (6⋅αᵢ⋅(tᵢ²-tᵢ³))
        ΔP₂ᵧ += (Sᵧᵢ-Gᵧ(tᵢ)) / (6⋅βᵢ⋅(tᵢ²-tᵢ³))
        ΔP₃ₓ += (Sₓᵢ-Gₓ(tᵢ)) / (2⋅αᵢ⋅(tᵢ³))
        ΔP₃ᵧ += (Sᵧᵢ-Gᵧ(tᵢ)) / (2⋅βᵢ⋅(tᵢ³))
  - Update Pₖ += [ΔPₖₓ,ΔPₖᵧ]
  - if ∑Dᵢ²<ε, exit loop

But that has divide-by-0 errors at tᵢ=0 (and matching problems at tᵢ=1). I remember from ANN that the backpropagation of errors has the activation slope in the numerator instead of the denominator, which always confuses me. Looking at the Wiki:Backprop discussion, I think I've figured it out. Translating it into nomenclature similar to what I'm using here (and dropping the x-or-y and i subscripts):

Q(t) = ∑{mₖ(t)⋅Pₖ}                                  EQ1
    Q is essentially the output of the neuron (wiki called it o_j, as the output of the jth layer)
    mₖ is acting as the "input" for the kth weight (wiki called it oᵢ, as the ith output of the previous layer, or xᵢ for the first layer)
    where the mₖ(t) are the Bernstein multipliers (including their 3⋅ coefficients):
        m₀(t) = 1-3t+3t²-t³
        m₁(t) = 3⋅(t-2t²+t³)
        m₂(t) = 3⋅(t²-t³)
        m₃(t) = t³
D(t) = Q(t) - Sᵢ                                    EQ2
    where Sᵢ is the ith sample, which is the target (expected) value of the network
L = 0.5⋅D²(t)                                       EQ3
    where L is the loss function; the 0.5 multiplier is useful for the eventual partial derivatives

What we want to find is the change in the loss with respect to the weight (parameter): ∂L/∂Pₖ

Using the chain rule,
    ∂L/∂Pₖ = [∂L/∂Q]⋅[∂Q/∂Pₖ]                       EQ4

Starting with the right partial, in ∂Q/∂Pₖ notation:
    ∂Q/∂Pₖ == {∂/∂Pₖ}∘Q                             the "functional" form of the derivative
    ∂Q(t)/∂Pₖ = {∂/∂Pₖ}∘∑ₗ{mₗ(t)⋅Pₗ}                using l as the summation index
              = ∑{{∂/∂Pₖ}∘mₗ(t)⋅Pₗ}                 the partial can move inside the sum
              = ∑{mₗ(t)⋅{∂/∂Pₖ}∘Pₗ}                 mₗ(t) doesn't change with Pₖ, so move it out of the partial
    note: ∂Pₗ/∂Pₖ = 0 for l<>k and 1 for l==k, so only the l==k term survives instead of a sum:
    ∂Q(t)/∂Pₖ = mₖ(t)                               simplified per note (above)

Substituting that into EQ4:
    ∂L/∂Pₖ = [∂L/∂Q]⋅mₖ(t)                          EQ5

For the left partial ∂L/∂Q:
    ∂L/∂Q = {∂/∂Q}∘L
          = {∂/∂Q}∘(0.5⋅D²(t))                      substitute from EQ3
          = 0.5⋅{∂/∂Q}∘(D²(t))                      linearity
          = 0.5⋅{2⋅D(t)⋅∂D/∂Q}                      power rule
          = D(t)⋅{∂/∂Q}∘D                           this is why we set the 0.5⋅ in EQ3
          = D(t)⋅{∂/∂Q}∘(Q-Sᵢ)                      substitute EQ2
          = D(t)⋅[∂Q/∂Q - ∂Sᵢ/∂Q]                   separate
          = D(t)⋅[1-0]                              simplify
          = D(t)                                    simplify
          = Q - Sᵢ                                  substitute EQ2 again

Substituting that into EQ5:
    ∂L/∂Pₖ = (Q-Sᵢ)⋅mₖ(t)

Finally, we want the change in weight ΔPₖ to _decrease_ the loss function L, so the adjustment needs to reverse the sign of the derivative, scaled by a learning rate η:
    ΔPₖ = -η⋅∂L/∂Pₖ = -η⋅[(Q-Sᵢ)⋅mₖ(t)]
(The learning rate is needed to keep the weights from over-adjusting and escaping to infinity.)
(That last step is still handwavy to me, because the "units" aren't right... but if η is chosen small enough, then it's taking us in the right direction and the bad magnitude is irrelevant.)
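
A numpy sketch of that corrected loop (the ClosestPoint step is faked by brute-force sampling of t, and η and the iteration count are hand-tuned guesses):

    import numpy as np

    def basis(t):
        # cubic Bernstein multipliers mₖ(t), shape (len(t), 4)
        t = np.asarray(t)
        return np.stack([(1-t)**3, 3*(1-t)**2*t, 3*(1-t)*t**2, t**3], axis=-1)

    def fit_bezier(S, iters=2000, eta=1.0):
        # gradient-descent fit of cubic control points P to samples S (Nx2)
        P = np.array([S[0], S[0] + (S[-1]-S[0])/3,
                      S[0] + 2*(S[-1]-S[0])/3, S[-1]], float)
        tgrid = np.linspace(0, 1, 256)
        for _ in range(iters):
            curve = basis(tgrid) @ P    # points G(t) along the current guess
            # crude ClosestPoint: nearest sampled curve point to each Sᵢ
            d2 = ((S[:, None, :] - curve[None, :, :])**2).sum(-1)
            t = tgrid[d2.argmin(axis=1)]
            m = basis(t)                # mₖ(tᵢ) for each sample
            Q = m @ P                   # G(tᵢ)
            # ΔPₖ = -η⋅∑ᵢ(Q-Sᵢ)⋅mₖ(tᵢ), averaged so η doesn't depend on N
            P -= eta * (m.T @ (Q - S)) / len(S)
        return P

    # round-trip test: samples from a known curve; result approaches P_true
    P_true = np.array([[0, 0], [1, 3], [4, 3], [5, 0]], float)
    S = basis(np.linspace(0, 1, 40)) @ P_true
    print(fit_bezier(S))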

excel best fit

In Excel, you could of course use their best-fit routines, but where's the fun in that? (Besides, it's limited to linear and a few others.)

For the square matrix A, with column vectors b (parameters) and y (results):

    A⋅b = y

Then solve for b using

    b = A⁻¹⋅y

Unfortunately, for best-fit purposes, your input data (or input data augmented by the column of 1s for the offset term) is rarely square (and when it is, you solve not for the best fit but for the exact solution, using the technique above).

However, https://people.revoledu.com/kardi/tutorial/Regression/GeneralizedInverse.html reminded me of the workaround (which I know I've found before, but have forgotten, because it's not here):

At⋅A is a square matrix. And multiplying both sides of an equation by the same thing doesn't change the equality, so:

    At⋅A⋅b = At⋅y
    (At⋅A)⋅b = At⋅y

To solve for b, multiply both sides by the inverse of that parenthetical term:

    (At⋅A)⁻¹⋅(At⋅A)⋅b = (At⋅A)⁻¹⋅At⋅y

The left includes multiplying something by its inverse, which is the identity, so:

    b = (At⋅A)⁻¹⋅At⋅y

The article goes on to show that At⋅A and At⋅y are just the normal ∑x-style terms from the standard best-fit derivations above, so this reduces to the usual case.

So, in Excel, if you want to best-fit some polynomial, y = ∑ⱼ(bⱼ⋅xʲ), just make columns for xⁿ down through x¹ and 1 for the A matrix:

    xⁿ  ... x²  x¹  1   |   y
    _       _   _   _   |   _

Then use matrix formulas of the right dimension for

    L = MMULT(MINVERSE(MMULT(TRANSPOSE(A),A)),TRANSPOSE(A))
    solve for b = MMULT(L, y)
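
The same thing in numpy, as a sanity check on the pseudo-inverse route (cubic example, made-up data):

    import numpy as np

    # noisy samples of y = 2x³ - x + 3
    x = np.linspace(-2, 2, 40)
    y = 2*x**3 - x + 3 + np.random.normal(0, 0.1, x.size)

    # A has columns x³, x², x¹, 1 (highest power first, like the Excel layout)
    A = np.column_stack([x**3, x**2, x, np.ones_like(x)])
    b = np.linalg.solve(A.T @ A, A.T @ y)  # b = (At⋅A)⁻¹⋅At⋅y
    # expect roughly (2, 0, -1, 3)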