No Free Lunch定理
定理(No Free Lunch): 假定 A \mathcal{A} A是一個在域 X \mathcal{X} X的二分類任務中任意一個機器學習演算法,其損失函式為 0 - 1 0\text{-}1 0-1損失,令 n n n是一個大小為 ∣ X ∣ / 2 |\mathcal{X}|/2 ∣X∣/2的訓練集,存在域 X \mathcal{X} X中的分布 D \mathcal{D} D,則有
(1)存在一個函式 f : X → { 0 , 1 } f:\mathcal{X}\rightarrow \{0,1\} f:X→{0,1},且有 L D ( f ) = 0 L_\mathcal{D}(f)=0 LD?(f)=0,
(2)對于子列 S ~ D n \mathcal{S\sim D}^n S~Dn,則概率不等式 P ( L D ( A ( S ) ) ≥ 1 / 8 : S ~ D n ) ≥ 1 / 7 P(L_\mathcal{D}(\mathcal{A}(\mathcal{S}))\ge 1/8: \mathcal{S}\sim\mathcal{D}^n)\ge 1/7 P(LD?(A(S))≥1/8:S~Dn)≥1/7成立,
證明:
(1) 令
C
\mathcal{C}
C表示域
X
\mathcal{X}
X中大小為
2
n
2n
2n的一個子集,主要的證明思路是只利用資料集
C
\mathcal{C}
C一半的資料樣本點并不能給出剩下一半資料點的資訊, 假定
H
\mathcal{H}
H表示資料集
C
\mathcal{C}
C到標簽集合
{
0
,
1
}
\{0,1\}
{0,1}所有可能的函式集合,且
T
T
T表示的是函式集合的基數,其中
H
=
{
f
1
,
?
?
,
f
T
}
\mathcal{H}=\{f_1,\cdots,f_T\}
H={f1?,?,fT?},
T
=
2
2
n
T=2^{2n}
T=22n,對于
H
\mathcal{H}
H中每一個函式假設,令
D
i
\mathcal{D}_i
Di?是
C
×
{
0
,
1
}
\mathcal{C}\times\{0,1\}
C×{0,1}中的分布
D
i
(
{
(
x
,
y
)
}
)
=
{
1
/
2
m
i
f
y
=
f
i
(
x
)
0
o
t
h
e
r
w
i
s
e
\mathcal{D}_i(\{(x,y)\})=\left\{\begin{array}{ll}1/2m & \mathrm{if}\text{ } y=f_i(x)\\0& \mathrm{otherwise}\end{array}\right.
Di?({(x,y)})={1/2m0?if y=fi?(x)otherwise?進而可知存在函式
f
i
f_i
fi?,在資料分布
D
i
\mathcal{D}_i
Di?上則有
L
D
i
(
f
i
)
=
0
L_{\mathcal{D}_i}(f_i)=0
LDi??(fi?)=0,
(2)主要證明的關鍵在于即對任意的學習演算法
A
\mathcal{A}
A有
max
?
i
∈
[
T
]
E
S
~
D
i
n
[
L
D
i
(
A
(
S
)
)
]
≥
1
/
4
\max\limits_{i \in [T]}\mathbb{E}_{\mathcal{S}\sim \mathcal{D}_i^n}[L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}))]\ge 1 / 4
i∈[T]max?ES~Din??[LDi??(A(S))]≥1/4首先從
C
×
{
0
,
1
}
\mathcal{C}\times \{0,1\}
C×{0,1}中采樣出
n
n
n個樣本構造一個訓練集,其中采樣出的樣本可以重復,進而可知有
k
=
(
2
n
)
n
k=(2n)^n
k=(2n)n中可能的樣本序列,令這些樣本序列分別表示為
S
1
,
S
2
,
?
?
,
S
k
\mathcal{S}_1,\mathcal{S}_2,\cdots,\mathcal{S}_k
S1?,S2?,?,Sk?,
S
j
i
=
(
(
x
1
,
f
i
(
x
1
)
)
,
?
?
,
(
x
n
,
f
i
(
x
n
)
)
)
\mathcal{S}_j^i=((x_1,f_i(x_1)),\cdots,(x_n,f_i(x_n)))
Sji?=((x1?,fi?(x1?)),?,(xn?,fi?(xn?)))表示的是函式
f
j
f_{j}
fj?在樣本序列
S
j
S_j
Sj?中的資料集合,則有
E
S
~
D
i
n
[
L
D
i
(
A
(
S
)
)
]
=
1
k
∑
j
=
1
k
L
D
i
(
A
(
S
j
i
)
)
\mathbb{E}_{\mathcal{S}\sim \mathcal{D}_i^n}[L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}))]=\frac{1}{k}\sum\limits_{j=1}^k L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}^i_j))
ES~Din??[LDi??(A(S))]=k1?j=1∑k?LDi??(A(Sji?))又因為
"
m
a
x
i
m
u
m
"
≥
"
a
v
e
r
a
g
e
"
≥
"
m
i
n
i
m
u
m
"
\mathrm{"maximum"}\ge \mathrm{"average"}\ge \mathrm{"minimum"}
"maximum"≥"average"≥"minimum",所以則有
max
?
i
∈
[
T
]
1
T
∑
j
=
1
k
L
D
i
(
A
(
S
j
i
)
)
≥
1
T
∑
i
=
1
T
1
k
∑
j
=
1
k
L
D
i
(
A
(
S
j
i
)
)
=
1
k
∑
j
=
1
k
1
T
∑
i
=
1
T
L
D
i
(
A
(
S
j
i
)
)
≥
min
?
j
∈
[
k
]
1
T
∑
i
=
1
T
L
D
i
(
A
(
S
j
i
)
)
\begin{aligned}\max\limits_{i\in [T]}\frac{1}{T}\sum\limits_{j=1}^k L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}_j^i))&\ge \frac{1}{T}\sum\limits_{i=1}^T\frac{1}{k}\sum\limits_{j=1}^kL_{\mathcal{D}_i}(\mathcal{A(\mathcal{S}_j^i)})\\&=\frac{1}{k}\sum\limits_{j=1}^k\frac{1}{T}\sum\limits_{i=1}^T L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}^i_j))\\& \ge \min\limits_{j \in [k]}\frac{1}{T}\sum\limits_{i=1}^T L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}_j^i))\end{aligned}
i∈[T]max?T1?j=1∑k?LDi??(A(Sji?))?≥T1?i=1∑T?k1?j=1∑k?LDi??(A(Sji?))=k1?j=1∑k?T1?i=1∑T?LDi??(A(Sji?))≥j∈[k]min?T1?i=1∑T?LDi??(A(Sji?))?現固定
j
∈
[
k
]
j \in [k]
j∈[k],令
S
j
=
{
x
1
,
?
?
,
x
n
}
\mathcal{S}_j=\{x_1,\cdots,x_n\}
Sj?={x1?,?,xn?},
C
\
S
j
=
{
v
1
,
?
?
,
v
p
}
C \backslash \mathcal{S}_j=\{v_1,\cdots,v_p\}
C\Sj?={v1?,?,vp?},其中
p
≥
n
p\ge n
p≥n是剩余沒有采樣的樣本數,對于每一個函式
h
:
C
→
{
0
,
1
}
h:C \rightarrow\{0,1\}
h:C→{0,1},
i
∈
[
T
]
i\in[T]
i∈[T]有
L
D
i
(
h
)
=
1
2
n
∑
x
∈
C
I
(
h
(
x
)
≠
f
i
(
x
)
)
=
1
2
n
(
∑
l
=
1
n
I
(
h
(
x
l
)
≠
f
i
(
x
l
)
)
+
∑
r
=
1
p
I
(
h
(
v
r
)
≠
f
i
(
v
r
)
)
)
≥
1
2
n
∑
r
=
1
p
I
(
h
(
v
r
)
≠
f
i
(
v
r
)
)
≥
1
2
p
∑
r
=
1
p
I
(
h
(
v
r
)
≠
f
i
(
v
r
)
)
\begin{aligned}L_{\mathcal{D}_i}(h)&=\frac{1}{2n}\sum\limits_{x \in C}\mathbb{I}(h(x)\ne f_i(x))\\&=\frac{1}{2n}\left(\sum\limits_{l=1}^n \mathbb{I}(h(x_l)\ne f_i(x_l))+\sum\limits_{r=1}^p \mathbb{I}(h(v_r)\ne f_i(v_r))\right)\\&\ge \frac{1}{2n}\sum\limits_{r=1}^p \mathbb{I}(h(v_r)\ne f_i(v_r))\\&\ge \frac{1}{2p}\sum\limits_{r=1}^{p}\mathbb{I}(h(v_r)\ne f_i(v_r))\end{aligned}
LDi??(h)?=2n1?x∈C∑?I(h(x)?=fi?(x))=2n1?(l=1∑n?I(h(xl?)?=fi?(xl?))+r=1∑p?I(h(vr?)?=fi?(vr?)))≥2n1?r=1∑p?I(h(vr?)?=fi?(vr?))≥2p1?r=1∑p?I(h(vr?)?=fi?(vr?))?所以
1
T
∑
i
=
1
T
L
D
i
(
A
(
S
j
i
)
)
≥
1
T
∑
i
=
1
T
1
2
p
∑
r
=
1
p
I
(
A
(
S
j
i
)
(
v
r
)
≠
f
i
(
v
r
)
)
=
1
2
p
∑
r
=
1
p
1
T
∑
i
=
1
T
I
(
A
(
S
j
i
)
(
v
r
)
≠
f
i
(
v
r
)
)
≥
1
2
min
?
r
∈
[
p
]
1
T
∑
i
=
1
T
I
(
A
(
S
j
i
)
(
v
r
)
≠
f
i
(
v
r
)
)
\begin{aligned}\frac{1}{T}\sum\limits_{i=1}^T L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}_j^i))& \ge \frac{1}{T}\sum\limits_{i=1}^T\frac{1}{2p}\sum\limits_{r=1}^p \mathbb{I}(\mathcal{A(\mathcal{S}^i_j)(v_r)\ne f_i(v_r)})\\&=\frac{1}{2p}\sum\limits_{r=1}^p\frac{1}{T}\sum\limits_{i=1}^T \mathbb{I}(\mathcal{A}(\mathcal{S}^i_j)(v_r)\ne f_i(v_r))\\&\ge \frac{1}{2}\min\limits_{r \in [p]}\frac{1}{T}\sum\limits_{i=1}^T\mathbb{I}(\mathcal{A}(\mathcal{S}_j^i)(v_r)\ne f_i(v_r))\end{aligned}
T1?i=1∑T?LDi??(A(Sji?))?≥T1?i=1∑T?2p1?r=1∑p?I(A(Sji?)(vr?)?=fi?(vr?))=2p1?r=1∑p?T1?i=1∑T?I(A(Sji?)(vr?)?=fi?(vr?))≥21?r∈[p]min?T1?i=1∑T?I(A(Sji?)(vr?)?=fi?(vr?))?對于給定的
r
∈
[
p
]
r\in [p]
r∈[p],因為
T
T
T是所有可能函式映射的基數,所以總有成對存在的
a
,
b
∈
[
T
]
a,b\in [T]
a,b∈[T]有
I
(
A
(
S
j
i
)
(
v
r
)
≠
f
a
(
v
r
)
)
+
I
(
A
(
S
j
i
)
(
v
r
)
≠
f
b
(
v
r
)
)
=
1
\mathbb{I}(\mathcal{A}(\mathcal{S}^i_j)(v_r)\ne f_a(v_r))+\mathbb{I}(\mathcal{A}(\mathcal{S}^i_j)(v_r)\ne f_b(v_r))=1
I(A(Sji?)(vr?)?=fa?(vr?))+I(A(Sji?)(vr?)?=fb?(vr?))=1進而則有
E
S
~
D
i
n
[
L
D
i
(
A
(
S
)
)
]
=
e
1
T
∑
i
=
1
T
L
D
i
(
A
(
S
j
i
)
)
≥
1
2
min
?
r
∈
[
p
]
1
T
∑
i
=
1
T
I
(
A
(
S
j
i
)
(
v
r
)
≠
f
i
(
v
r
)
)
=
1
2
?
1
T
?
T
2
=
1
4
\begin{aligned}\mathbb{E}_{\mathcal{S}\sim \mathcal{D}_i^n}[L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}))] &= e \frac{1}{T}\sum\limits_{i=1}^T L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}^i_j))\\&\ge \frac{1}{2}\min\limits_{r \in [p]}\frac{1}{T}\sum\limits_{i=1}^T \mathbb{I}(\mathcal{A}(\mathcal{S}^i_j)(v_r)\ne f_i(v_r))\\&=\frac{1}{2}\cdot \frac{1}{T}\cdot \frac{T}{2}=\frac{1}{4}\end{aligned}
ES~Din??[LDi??(A(S))]?=eT1?i=1∑T?LDi??(A(Sji?))≥21?r∈[p]min?T1?i=1∑T?I(A(Sji?)(vr?)?=fi?(vr?))=21??T1??2T?=41??根據馬爾可夫不等式的修改版可知,給定一個隨機變數
X
∈
[
0
,
1
]
X \in[0,1]
X∈[0,1],給定一個常數
a
∈
[
0
,
1
]
a\in [0,1]
a∈[0,1],進而則有
E
[
X
]
=
∫
0
1
x
p
(
x
)
d
x
=
∫
0
a
x
p
(
x
)
d
x
+
∫
a
1
x
p
(
x
)
d
x
≤
a
∫
0
a
p
(
x
)
d
x
+
∫
a
1
p
(
x
)
d
x
=
a
(
1
?
p
{
X
≥
a
}
)
+
p
{
X
≥
a
}
=
a
+
(
1
?
a
)
P
{
X
≥
a
}
\begin{aligned}\mathbb{E}[X]&=\int_0^1 x p(x)dx\\&= \int_0^a x p(x)dx + \int_a^1xp(x)dx\\&\le a \int_0^a p(x)dx + \int^1_a p(x)dx\\&=a(1-p\{X\ge a\})+p\{X\ge a\}\\&=a+(1-a)P\{X\ge a\}\end{aligned}
E[X]?=∫01?xp(x)dx=∫0a?xp(x)dx+∫a1?xp(x)dx≤a∫0a?p(x)dx+∫a1?p(x)dx=a(1?p{X≥a})+p{X≥a}=a+(1?a)P{X≥a}?馬爾可夫不等式為
P
{
X
≥
a
}
≥
E
[
X
]
?
a
1
?
a
P\{X\ge a\}\ge \frac{\mathbb{E}[X]-a}{1-a}
P{X≥a}≥1?aE[X]?a?利用馬爾可夫不等可知
P
(
L
D
(
A
(
S
)
)
≥
1
/
8
:
S
~
D
n
)
=
E
S
~
D
i
n
[
L
D
i
(
A
(
S
)
)
]
?
1
/
8
1
?
1
/
8
≥
1
/
4
?
1
/
8
1
?
1
/
8
=
1
7
\begin{aligned}P(L_\mathcal{D}(\mathcal{A}(\mathcal{S}))\ge 1/8: \mathcal{S}\sim\mathcal{D}^n)&=\frac{\mathbb{E}_{\mathcal{S}\sim \mathcal{D}_i^n}[L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}))]-1/8}{1-1/8}\\& \ge \frac{1/4-1/8}{1-1/8}=\frac{1}{7}\end{aligned}
P(LD?(A(S))≥1/8:S~Dn)?=1?1/8ES~Din??[LDi??(A(S))]?1/8?≥1?1/81/4?1/8?=71??
轉載請註明出處,本文鏈接:https://www.uj5u.com/qita/431472.html
標籤:AI
上一篇:數字視頻及應用
