naru.jpn.com

正規方程式の連立偏微分方程式からの導出

最近 Andrew Ng の機械学習の講座を見ています. 目的関数と最急降下法の説明は丁寧でしたが, 正規方程式(Normal equation)の紹介は唐突感がありました. 正規方程式の表式を, 最急降下法で使われていた連立偏微分方程式から導いてみます.

目的

表式

\[
\theta = (X^{\mathrm{T}} X )^{-1} X^{\mathrm{T}} y
\]

を導く.

前提

目的関数として,

\[
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left( h_{\theta}(x^{(i)}) – y^{(i)} \right)^{2}, ~~~h_{\theta}(x^{(i)}) = \sum_{k=0}^{n}\theta_{k}x_{k}^{(i)}
\]

を考える. \(J(\theta)\) を小さくする \(\theta\) の組を見つけたい. 最急降下法では, \(J(\theta)\) が極小値をとるようなパラメータを探していくのだった. 例えば, \(\theta_{j}\) については,

\[
\frac{\partial J(\theta)}{\partial\theta_{j}} = 0
\]

となるような \(\theta_{j}\) を計算する. 全てのパラメータについて, 同時に極小値をとるような \(\theta\) を見つけたい.

\[
\begin{eqnarray}
\left\{
\begin{array}{l}
\displaystyle \frac{\partial J(\theta)}{\partial\theta_{0}} = 0 \\
\displaystyle \frac{\partial J(\theta)}{\partial\theta_{1}} = 0　\tag{1} \\
\displaystyle \vdots \\
\displaystyle \frac{\partial J(\theta)}{\partial\theta_{n}} = 0
\end{array}
\right.
\end{eqnarray}
\]

偏微分方程式の展開

(1)式で\(\theta_{j}\)について考え, 式を展開していく.

\[
\begin{array}{rcl}
\displaystyle \frac{\partial J(\theta)}{\partial\theta_{j}} &=& \displaystyle \frac{1}{2m}\frac{\partial}{\partial\theta_{j}}\sum_{i=1}^{m}\left( h_{\theta}(x^{(i)}) – y^{(i)} \right)^{2} \\
&=& \displaystyle \frac{1}{2m}\sum_{i=1}^{m} \frac{\partial}{\partial\theta_{j}} \left( h_{\theta}(x^{(i)}) – y^{(i)} \right)^{2} \\
&=& \displaystyle \frac{1}{2m}\sum_{i=1}^{m} 2 \left( h_{\theta}(x^{(i)}) – y^{(i)} \right) x_{j}^{(i)} \\
&=& \displaystyle \frac{1}{m}\sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) – y^{(i)} \right) x_{j}^{(i)} \\
&=& \displaystyle \frac{1}{m} \left( \underline{\sum_{i=1}^{m} h_{\theta}(x^{(i)}) x_{j}^{(i)}}_{(\beta)} – \underline{\sum_{i=1}^{m} y^{(i)} x_{j}^{(i)}}_{(\alpha)} \right) \tag{2} \\
\end{array}
\]

ただし,

\[
\frac{\partial}{\partial\theta_{j}} h_{\theta}(x^{(i)}) = \frac{\partial}{\partial\theta_{j}} \left( \theta_{0}x_{0} + \theta_{1}x_{1} + \cdots + \theta_{n}x_{n} \right) = x_{j}^{(i)}
\]

なので

\[
\frac{\partial}{\partial\theta_{j}} \left( h_{\theta}(x^{(i)}) – y^{(i)} \right)^{2} = 2 \left( h_{\theta}(x^{(i)}) – y^{(i)} \right) x_{j}^{(i)}
\]

となることを用いている.

(1)式より \(\frac{\partial J(\theta)}{\partial\theta_{j}} = 0\) であるから, \(\theta_{j}\)について

\[
\sum_{i=1}^{m} h_{\theta}(x^{(i)}) x_{j}^{(i)} – \sum_{i=1}^{m} y^{(i)} x_{j}^{(i)} = 0 \tag{3}
\]

が成り立つ.

和の部分の行列表現

(2)式中の下線部 \((\alpha)\) について考える. \(i\) についての和をあらわに書くと.

\[
(\alpha) = \sum_{i=1}^{m} y^{(i)} x_{j}^{(i)} = x_{j}^{1}y^{1} + x_{j}^{2}y^{2} + \cdots + x_{j}^{m}y^{m}
\]

となり, これは行列で

\[
\begin{pmatrix} x_{j}^{1} & x_{j}^{2} & \cdots & x_{j}^{m} \end{pmatrix} \begin{pmatrix} y^{1} \\ y^{2} \\ \vdots \\ y^{m} \end{pmatrix} \tag{4}
\]

と書ける. 下線部 \((\beta)\) についても, 添え字に注意して同様に和を展開していくと,

\[
\begin{array}{rcl}
(\beta) &=& \displaystyle \sum_{i=1}^{m} h_{\theta}(x^{(i)}) x_{j}^{(i)} \\
&=& \displaystyle \sum_{i=1}^{m} \left( \sum_{k=0}^{n}\theta_{k}x_{k}^{(i)} \right) x_{j}^{(i)} \\
&=& \displaystyle \sum_{i=1}^{m} \left( \theta_{0}x_{0}^{(i)} + \theta_{1}x_{1}^{(i)} + \cdots + \theta_{n}x_{n}^{(i)} \right) x_{j}^{(i)} \\
&=& \displaystyle \left( \theta_{0}x_{0}^{(1)} + \theta_{1}x_{1}^{(1)} + \cdots + \theta_{n}x_{n}^{(1)} \right) x_{j}^{(1)} + \left( \theta_{0}x_{0}^{(2)} + \theta_{1}x_{1}^{(2)} + \cdots + \theta_{n}x_{n}^{(2)} \right) x_{j}^{(2)} + \left( \theta_{0}x_{0}^{(m)} + \theta_{1}x_{1}^{(m)} + \cdots + \theta_{n}x_{n}^{(m)} \right) x_{j}^{(m)}
\end{array}
\]

となり, これは行列で

\[
\begin{pmatrix} x_{j}^{(1)} & x_{j}^{(2)} & \cdots & x_{j}^{(m)} \end{pmatrix} \begin{pmatrix} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)} \\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)} \\ \vdots & \ddots & \vdots & \vdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{pmatrix} \begin{pmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{pmatrix} \tag{5}
\]

と書ける.

(3),(4),(5)式から, \(\theta_{j}\)について,

となる.

再び, 連立

\(\theta_{0}\)について考えると, (6)式を参考にして

\[
\begin{pmatrix} x_{0}^{(1)} & x_{0}^{(2)} & \cdots & x_{0}^{(m)} \end{pmatrix} \begin{pmatrix} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)} \\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)} \\ \vdots & \ddots & \vdots & \vdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{pmatrix} \begin{pmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{pmatrix} ~-~
\begin{pmatrix} x_{j}^{1} & x_{j}^{2} & \cdots & x_{j}^{m} \end{pmatrix} \begin{pmatrix} y^{1} \\ y^{2} \\ \vdots \\ y^{m} \end{pmatrix} = 0
\]

であるが, ここで各項の左端の行列の形式を変える.

\[
\begin{pmatrix} x_{0}^{(1)} & x_{0}^{(2)} & \cdots & x_{0}^{(m)} \\ 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)} \\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)} \\ \vdots & \ddots & \vdots & \vdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{pmatrix} \begin{pmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{pmatrix} ~-~
\begin{pmatrix} x_{0}^{(1)} & x_{0}^{(2)} & \cdots & x_{0}^{(m)} \\ 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} y^{1} \\ y^{2} \\ \vdots \\ y^{m} \end{pmatrix} = 0
\]

\(\theta_{1}\), \(\theta_{n}\) についてはそれぞれ以下のようにする.

\[
\begin{pmatrix} 0 & 0 & \cdots & 0 \\ x_{1}^{(1)} & x_{1}^{(2)} & \cdots & x_{1}^{(m)} \\ \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)} \\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)} \\ \vdots & \ddots & \vdots & \vdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{pmatrix} \begin{pmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{pmatrix} ~-~
\begin{pmatrix} 0 & 0 & \cdots & 0 \\ x_{1}^{(1)} & x_{1}^{(2)} & \cdots & x_{1}^{(m)} \\ \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} y^{1} \\ y^{2} \\ \vdots \\ y^{m} \end{pmatrix} = 0
\]

\[
\begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots \\ x_{n}^{(1)} & x_{n}^{(2)} & \cdots & x_{n}^{(m)} \end{pmatrix} \begin{pmatrix} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)} \\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)} \\ \vdots & \ddots & \vdots & \vdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{pmatrix} \begin{pmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{pmatrix} ~-~
\begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots \\ x_{n}^{(1)} & x_{n}^{(2)} & \cdots & x_{n}^{(m)} \end{pmatrix} \begin{pmatrix} y^{1} \\ y^{2} \\ \vdots \\ y^{m} \end{pmatrix} = 0
\]

これらを連立させて足し合わせることで, 以下の式を得る.

\[
\begin{pmatrix} x_{0}^{(1)} & x_{0}^{(2)} & \cdots & x_{0}^{(m)} \\ x_{1}^{(1)} & x_{1}^{(2)} & \cdots & x_{1}^{(m)} \\ \vdots & \ddots & \vdots & \vdots \\ x_{n}^{(1)} & x_{n}^{(2)} & \cdots & x_{n}^{(m)} \end{pmatrix} \begin{pmatrix} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)} \\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)} \\ \vdots & \ddots & \vdots & \vdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{pmatrix} \begin{pmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{pmatrix} ~-~
\begin{pmatrix} x_{0}^{(1)} & x_{0}^{(2)} & \cdots & x_{0}^{(m)} \\ x_{1}^{(1)} & x_{1}^{(2)} & \cdots & x_{1}^{(m)} \\ \vdots & \ddots & \vdots & \vdots \\ x_{n}^{(1)} & x_{n}^{(2)} & \cdots & x_{n}^{(m)} \end{pmatrix} \begin{pmatrix} y^{1} \\ y^{2} \\ \vdots \\ y^{m} \end{pmatrix} = 0 \tag{7}
\]

正規方程式の表式

\(X\), \(\theta\), \(y\) を定義する.

\[
X = \begin{pmatrix} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)} \\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)} \\ \vdots & \ddots & \vdots & \vdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{pmatrix}
\]

\[
\theta = \begin{pmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{pmatrix}
\]

\[
y = \begin{pmatrix} y^{1} \\ y^{2} \\ \vdots \\ y^{m} \end{pmatrix}
\]

(7)式は以下のように書くことができる.

\[
X^{\mathrm{T}} X \theta – X^{\mathrm{T}} y = 0
\]

\(X^{\mathrm{T}} X\) の(一般)逆行列を左から掛ける.

\[
\theta = (X^{\mathrm{T}} X )^{-1} X^{\mathrm{T}} y
\]

目的の表式を得た.

2017年7月21日数学, 機械学習