线性回归
线性回归是线性模型的一种典型方法。产品销量预测、岗位薪资预测,都可以用先线性回归来拟合模型。从某种程度上来说,回归分析不再局限于线性回归这一具体模型和算法,更包含了广泛的由自变量到因变量的机器学习建模思想。
原理
给定一组由输入 x x x 和输出 y y y 构成的数据集 D = ( x 1 , y 1 ) , … , ( x k , y k ) D={(x_1,y_1),\dots,(x_k,y_k)} D=(x1,y1),…,(xk,yk),其中 x i x_i xi 为参数集合。线性回归就是通过不断训练从而得到一个线性模型去尽可能的根据输入 x x x 拟合出 y y y 值。
以 x x x 为影响因素, y y y 为输出结果,构建回归式: y = w x i + b y=wx_i+b y=wxi+b,其中 w w w 为模型矩阵。
线性回归模型的关键问题是确定
w
w
w 和
b
b
b 的值,使得拟合输出
y
y
y 与真实值
y
i
y_i
yi 尽可能接近。在回归任务中,我们通常使用均方误差来度量预测与标签值之间的损失,所以回归任务的优化目标就是使得拟合输出和真实输出之间的均方误差最小化。
f
(
w
∗
,
b
∗
)
=
a
r
g
m
i
n
∑
i
=
1
k
(
y
−
y
i
)
2
=
a
r
g
m
i
n
∑
i
=
1
k
(
w
x
i
+
b
−
y
i
)
2
(1)
\begin{aligned} f(w^*,b^*)&=argmin\sum_{i=1}^{k}(y-y_i)^2 \\ &=argmin\sum_{i=1}^{k}(wx_i+b-y_i)^2 \end{aligned} \tag{1}
f(w∗,b∗)=argmini=1∑k(y−yi)2=argmini=1∑k(wxi+b−yi)2(1)
为求的
w
w
w 和
b
b
b 的最小化参数
w
∗
w^*
w∗ 和
b
∗
b^*
b∗,可从式
(
1
)
(1)
(1),分别对
w
w
w 和
b
b
b 求一阶导为零。
对
w
w
w 求导
∂
f
(
w
,
b
)
∂
w
=
∂
∂
w
[
∑
i
=
1
k
(
w
x
i
+
b
−
y
i
)
2
]
=
∑
i
=
1
k
∂
∂
w
[
(
y
i
−
w
x
i
−
b
)
2
]
=
∑
i
=
1
k
[
2
⋅
(
y
i
−
w
x
i
−
b
)
⋅
(
−
x
i
)
]
=
∑
i
=
1
k
[
2
⋅
(
w
x
i
2
−
y
i
x
i
+
b
x
i
)
]
=
2
⋅
(
w
∑
i
=
1
k
x
i
2
−
∑
i
=
1
k
y
i
x
i
+
∑
i
=
1
k
b
x
i
)
(2)
\begin{aligned} \frac{\partial f(w,b)}{\partial w}&=\frac{\partial}{\partial w}\Bigg[\sum_{i=1}^{k}(wx_i+b-y_i)^2\Bigg] \\ &=\sum_{i=1}^{k}\frac{\partial}{\partial w}\Big[(y_i-wx_i-b)^2\Big] \\ &=\sum_{i=1}^{k}\big[2\cdot (y_i-wx_i-b)\cdot(-x_i)\big] \\ &=\sum_{i=1}^{k}\Big[2\cdot (wx_i^2-y_ix_i+bx_i)\Big] \\ &=2\cdot\Bigg(w\sum_{i=1}^{k}x_i^2-\sum_{i=1}^{k}y_ix_i+\sum_{i=1}^{k}bx_i\Bigg) \end{aligned} \tag{2}
∂w∂f(w,b)=∂w∂[i=1∑k(wxi+b−yi)2]=i=1∑k∂w∂[(yi−wxi−b)2]=i=1∑k[2⋅(yi−wxi−b)⋅(−xi)]=i=1∑k[2⋅(wxi2−yixi+bxi)]=2⋅(wi=1∑kxi2−i=1∑kyixi+i=1∑kbxi)(2)
对
b
b
b 求导
∂
f
(
w
,
b
)
∂
b
=
∂
∂
b
[
∑
i
=
1
k
(
w
x
i
+
b
−
y
i
)
2
]
=
∑
i
=
1
k
∂
∂
b
[
(
y
i
−
w
x
i
−
b
)
2
]
=
∑
i
=
1
k
[
2
⋅
(
y
i
−
w
x
i
−
b
)
⋅
(
−
1
)
]
=
∑
i
=
1
k
[
2
⋅
(
b
−
y
i
+
w
x
i
)
]
=
2
⋅
(
∑
i
=
1
k
b
−
∑
i
=
1
k
y
i
+
∑
i
=
1
k
w
x
i
)
=
2
⋅
(
k
b
−
∑
i
=
1
k
(
y
i
−
w
x
i
)
)
(3)
\begin{aligned} \frac{\partial f(w,b)}{\partial b}&=\frac{\partial}{\partial b}\Bigg[\sum_{i=1}^{k}(wx_i+b-y_i)^2\Bigg] \\ &=\sum_{i=1}^{k}\frac{\partial}{\partial b}\Big[(y_i-wx_i-b)^2\Big] \\ &=\sum_{i=1}^{k}\big[2\cdot (y_i-wx_i-b)\cdot(-1)\big] \\ &=\sum_{i=1}^{k}\Big[2\cdot (b-y_i+wx_i)\Big] \\ &=2\cdot\Bigg(\sum_{i=1}^{k}b-\sum_{i=1}^{k}y_i+\sum_{i=1}^{k}wx_i\Bigg) \\ &=2\cdot\Bigg(kb-\sum_{i=1}^{k}(y_i-wx_i)\Bigg) \end{aligned} \tag{3}
∂b∂f(w,b)=∂b∂[i=1∑k(wxi+b−yi)2]=i=1∑k∂b∂[(yi−wxi−b)2]=i=1∑k[2⋅(yi−wxi−b)⋅(−1)]=i=1∑k[2⋅(b−yi+wxi)]=2⋅(i=1∑kb−i=1∑kyi+i=1∑kwxi)=2⋅(kb−i=1∑k(yi−wxi))(3)
令式
(
2
)
(2)
(2) 为零
∂
f
(
w
,
b
)
∂
w
=
2
⋅
(
w
∑
i
=
1
k
x
i
2
−
∑
i
=
1
k
y
i
x
i
+
∑
i
=
1
k
b
x
i
)
=
0
⇔
w
∑
i
=
1
k
x
i
2
=
∑
i
=
1
k
y
i
x
i
−
∑
i
=
1
k
b
x
i
(4)
\begin{aligned} \frac{\partial f(w,b)}{\partial w}& =2\cdot\Bigg(w\sum_{i=1}^{k}x_i^2-\sum_{i=1}^{k}y_ix_i+\sum_{i=1}^{k}bx_i\Bigg)=0 \\ &\Leftrightarrow w\sum_{i=1}^{k}x_i^2=\sum_{i=1}^{k}y_ix_i-\sum_{i=1}^{k}bx_i \end{aligned} \tag{4}
∂w∂f(w,b)=2⋅(wi=1∑kxi2−i=1∑kyixi+i=1∑kbxi)=0⇔wi=1∑kxi2=i=1∑kyixi−i=1∑kbxi(4)
令式
(
3
)
(3)
(3) 为零
∂
f
(
w
,
b
)
∂
b
=
2
⋅
(
k
b
−
∑
i
=
1
k
(
y
i
−
w
x
i
)
)
=
0
⇔
b
=
1
k
∑
i
=
1
k
(
y
i
−
w
x
i
)
(5)
\begin{aligned} \frac{\partial f(w,b)}{\partial b}&=2\cdot\Bigg(kb-\sum_{i=1}^{k}(y_i-wx_i)\Bigg)=0 \\ &\Leftrightarrow b=\frac{1}{k}\sum_{i=1}^{k}(y_i-wx_i) \end{aligned} \tag{5}
∂b∂f(w,b)=2⋅(kb−i=1∑k(yi−wxi))=0⇔b=k1i=1∑k(yi−wxi)(5)
又因为:
1
k
∑
i
=
1
k
y
i
=
y
ˉ
1
k
∑
i
=
1
k
x
i
=
x
ˉ
(6)
\frac{1}{k}\sum_{i=1}^{k}y_i=\bar{y} \\\frac{1}{k}\sum_{i=1}^{k}x_i=\bar{x} \tag{6}
k1i=1∑kyi=yˉk1i=1∑kxi=xˉ(6)
所以:
b
=
y
ˉ
−
w
x
ˉ
(7)
b=\bar{y}-w\bar{x} \tag{7}
b=yˉ−wxˉ(7)
将
(
7
)
(7)
(7) 代入
(
4
)
(4)
(4) 得
w
∑
i
=
1
k
x
i
2
=
∑
i
=
1
k
y
i
x
i
−
∑
i
=
1
k
b
x
i
w
∑
i
=
1
k
x
i
2
=
∑
i
=
1
k
y
i
x
i
−
∑
i
=
1
k
(
y
ˉ
−
w
x
ˉ
)
x
i
w
∑
i
=
1
k
x
i
2
=
∑
i
=
1
k
y
i
x
i
−
y
ˉ
∑
i
=
1
k
x
i
+
w
x
ˉ
∑
i
=
1
k
x
i
w
(
∑
i
=
1
k
x
i
2
−
x
ˉ
∑
i
=
1
k
x
i
)
=
∑
i
=
1
k
y
i
x
i
−
y
ˉ
∑
i
=
1
k
x
i
w
=
∑
i
=
1
k
y
i
x
i
−
y
ˉ
∑
i
=
1
k
x
i
∑
i
=
1
k
x
i
2
−
x
ˉ
∑
i
=
1
k
x
i
(8)
\begin{aligned} &w\sum_{i=1}^{k}x_i^2=\sum_{i=1}^{k}y_ix_i-\sum_{i=1}^{k}bx_i \\ &w\sum_{i=1}^{k}x_i^2=\sum_{i=1}^{k}y_ix_i-\sum_{i=1}^{k}(\bar{y}-w\bar{x})x_i \\ &w\sum_{i=1}^{k}x_i^2=\sum_{i=1}^{k}y_ix_i-\bar{y}\sum_{i=1}^{k}x_i+w\bar{x}\sum_{i=1}^{k}x_i \\ &w\Bigg(\sum_{i=1}^{k}x_i^2-\bar{x}\sum_{i=1}^{k}x_i\Bigg)=\sum_{i=1}^{k}y_ix_i-\bar{y}\sum_{i=1}^{k}x_i \\ &w=\frac{\sum_{i=1}^{k}y_ix_i-\bar{y}\sum_{i=1}^{k}x_i}{\sum_{i=1}^{k}x_i^2-\bar{x}\sum_{i=1}^{k}x_i} \end{aligned} \tag{8}
wi=1∑kxi2=i=1∑kyixi−i=1∑kbxiwi=1∑kxi2=i=1∑kyixi−i=1∑k(yˉ−wxˉ)xiwi=1∑kxi2=i=1∑kyixi−yˉi=1∑kxi+wxˉi=1∑kxiw(i=1∑kxi2−xˉi=1∑kxi)=i=1∑kyixi−yˉi=1∑kxiw=∑i=1kxi2−xˉ∑i=1kxi∑i=1kyixi−yˉ∑i=1kxi(8)
又因为:
y
ˉ
∑
i
=
1
k
x
i
=
1
m
∑
i
=
1
k
y
i
∑
i
=
1
k
x
i
=
x
ˉ
∑
i
=
1
k
y
i
x
ˉ
∑
i
=
1
k
x
i
=
1
m
∑
i
=
1
k
x
i
∑
i
=
1
k
x
i
=
1
m
(
∑
i
=
1
k
y
i
)
2
(9)
\bar{y}\sum_{i=1}^{k}x_i=\frac{1}{m}\sum_{i=1}^{k}y_i\sum_{i=1}^{k}x_i=\bar{x}\sum_{i=1}^{k}y_i \\ \bar{x}\sum_{i=1}^{k}x_i=\frac{1}{m}\sum_{i=1}^{k}x_i\sum_{i=1}^{k}x_i=\frac{1}{m}\Bigg(\sum_{i=1}^{k}y_i\Bigg)^2 \tag{9}
yˉi=1∑kxi=m1i=1∑kyii=1∑kxi=xˉi=1∑kyixˉi=1∑kxi=m1i=1∑kxii=1∑kxi=m1(i=1∑kyi)2(9)
所以:
w
=
∑
i
=
1
k
y
i
(
x
i
−
x
ˉ
)
∑
i
=
1
k
x
i
2
−
1
m
(
∑
i
=
1
k
y
i
)
2
(10)
w=\frac{\sum_{i=1}^{k}y_i(x_i-\bar{x})}{\sum_{i=1}^{k}x_i^2-\frac{1}{m}\Big(\sum_{i=1}^{k}y_i\Big)^2} \tag{10}
w=∑i=1kxi2−m1(∑i=1kyi)2∑i=1kyi(xi−xˉ)(10)
- 将 ( 10 ) (10) (10) 进行向量化
如果想要用 Python 来实现的话, ( 10 ) (10) (10) 中的求和运算只能用循环来实现。但是如果能将上式向量化,也就是转换成矩阵运算的话,就可以用 N u m p y Numpy Numpy 这种专门加速矩阵运算的类库来进行编写。
将
(
9
)
(9)
(9) 代回
(
10
)
(10)
(10)
w
=
∑
i
=
1
k
y
i
(
x
i
−
x
ˉ
)
∑
i
=
1
k
x
i
2
−
x
ˉ
∑
i
=
1
k
x
i
=
∑
i
=
1
k
(
y
i
x
i
−
y
i
x
ˉ
)
∑
i
=
1
k
(
x
i
2
−
x
i
x
ˉ
)
(11)
\begin{aligned} w&=\frac{\sum_{i=1}^{k}y_i(x_i-\bar{x})}{\sum_{i=1}^{k}x_i^2-\bar{x}\sum_{i=1}^{k}x_i} \\ &=\frac{\sum_{i=1}^{k}(y_ix_i-y_i\bar{x})}{\sum_{i=1}^{k}(x_i^2-x_i\bar{x})} \end{aligned} \tag{11}
w=∑i=1kxi2−xˉ∑i=1kxi∑i=1kyi(xi−xˉ)=∑i=1k(xi2−xixˉ)∑i=1k(yixi−yixˉ)(11)
又因为:
y
ˉ
∑
i
=
1
k
x
i
=
x
ˉ
∑
i
=
1
k
y
i
=
∑
i
=
1
k
y
ˉ
x
i
=
∑
i
=
1
k
x
ˉ
y
i
=
k
x
ˉ
y
ˉ
=
∑
i
=
1
k
x
ˉ
y
ˉ
x
ˉ
∑
i
=
1
k
x
i
=
∑
i
=
1
k
x
i
x
ˉ
=
x
ˉ
m
1
k
∑
i
=
1
k
x
i
=
m
x
ˉ
2
=
∑
i
=
1
k
x
ˉ
2
(12)
\bar{y}\sum_{i=1}^{k}x_i=\bar{x}\sum_{i=1}^{k}y_i=\sum_{i=1}^{k}\bar{y}x_i=\sum_{i=1}^{k}\bar{x}y_i=k\bar{x}\bar{y}=\sum_{i=1}^{k}\bar{x}\bar{y} \\ \bar{x}\sum_{i=1}^{k}x_i=\sum_{i=1}^{k}x_i\bar{x}=\bar{x}m\frac{1}{k}\sum_{i=1}^{k}x_i=m\bar{x}^2=\sum_{i=1}^{k}\bar{x}^2 \tag{12}
yˉi=1∑kxi=xˉi=1∑kyi=i=1∑kyˉxi=i=1∑kxˉyi=kxˉyˉ=i=1∑kxˉyˉxˉi=1∑kxi=i=1∑kxixˉ=xˉmk1i=1∑kxi=mxˉ2=i=1∑kxˉ2(12)
所以:
w
=
∑
i
=
1
k
(
y
i
x
i
−
y
i
x
ˉ
−
x
i
y
ˉ
+
x
ˉ
y
ˉ
)
∑
i
=
1
k
(
x
i
2
−
x
i
x
ˉ
−
x
i
x
ˉ
+
x
ˉ
2
)
=
∑
i
=
1
k
(
x
i
−
x
ˉ
)
(
y
i
−
y
ˉ
)
∑
i
=
1
k
(
x
i
−
x
ˉ
)
2
(13)
\begin{aligned} w&=\frac{\sum_{i=1}^{k}(y_ix_i-y_i\bar{x}-x_i\bar{y}+\bar{x}\bar{y})}{\sum_{i=1}^{k}(x_i^2-x_i\bar{x}-x_i\bar{x}+\bar{x}^2)} \\ &=\frac{\sum_{i=1}^{k}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{k}(x_i-\bar{x})^2} \end{aligned} \tag{13}
w=∑i=1k(xi2−xixˉ−xixˉ+xˉ2)∑i=1k(yixi−yixˉ−xiyˉ+xˉyˉ)=∑i=1k(xi−xˉ)2∑i=1k(xi−xˉ)(yi−yˉ)(13)