贈券收集問題(Coupon collector's problem) 是概率論中的著名題目,其目的在解答以下問題:
- 假設有n種贈券,每種贈券獲取概率相同,而且贈券亦無限供應。若取贈券t張,能集齊n種贈券的概率多少?
計算得出,平均需要
次才能集齊n種贈券——這就是贈券收集問題的時間複雜度。例如n = 50時大約要取
次才能集齊50種贈券。
問題內容[編輯]
贈券收集問題的特徵是開始收集時,可以在短時間內收集多種不同的贈券,但最後數種則要花很長時間才能集齊。例如有50種贈券,在集齊49種以後要約多50次收集才能找到最後一張,所以贈券收集問題的答案t的期望值要比50要大得多。
計算期望值[編輯]
假設T是收集所有n種贈券的次數,
是在收集了第i-1種贈券以後,到收集到第i種贈券所花的次數,那麼T和
都是隨機變量。在收集到i-1種贈券後能再找到「新」一種贈券的概率是
,所以
是一種幾何分佈,並有期望值
。根據期望值的線性性質,
![{\displaystyle {\begin{aligned}\operatorname {E} (T)&=\operatorname {E} (t_{1})+\operatorname {E} (t_{2})+\cdots +\operatorname {E} (t_{n})={\frac {1}{p_{1}}}+{\frac {1}{p_{2}}}+\cdots +{\frac {1}{p_{n}}}\\&={\frac {n}{n}}+{\frac {n}{n-1}}+\cdots +{\frac {n}{1}}=n\cdot \left({\frac {1}{1}}+{\frac {1}{2}}+\cdots +{\frac {1}{n}}\right)\,=\,n\cdot H_{n}.\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/1471cc4d6faa603ae4897328d2fcd81a357267a5)
其中
是調和數,根據其近似值,可化約為:
![{\displaystyle \operatorname {E} (T)=n\cdot H_{n}=n\ln n+\gamma n+{\frac {1}{2}}+o(1),\ \ {\text{as}}\ n\to \infty ,}](https://wikimedia.org/api/rest_v1/media/math/render/svg/8ea10c71563d1cf9b3c84a84ba4abd64ecc1b94f)
其中
是歐拉-馬歇羅尼常數.
那麼,可用馬爾可夫不等式求取概率的上限:
![{\displaystyle \operatorname {P} (T\geq c\,nH_{n})\leq {\frac {1}{c}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33284bbd7a6b9c72d6701d1ca80d925e1915fe46)
基於
相互獨立的特性,則有:
![{\displaystyle {\begin{aligned}\operatorname {Var} (T)&=\operatorname {Var} (t_{1})+\operatorname {Var} (t_{2})+\cdots +\operatorname {Var} (t_{n})\\&={\frac {1-p_{1}}{p_{1}^{2}}}+{\frac {1-p_{2}}{p_{2}^{2}}}+\cdots +{\frac {1-p_{n}}{p_{n}^{2}}}\\&\leq {\frac {n^{2}}{n^{2}}}+{\frac {n^{2}}{(n-1)^{2}}}+\cdots +{\frac {n^{2}}{1^{2}}}\\&\leq n^{2}\cdot \left({\frac {1}{1^{2}}}+{\frac {1}{2^{2}}}+\cdots \right)={\frac {\pi ^{2}}{6}}n^{2}\leq 2n^{2},\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7ea28a784d7819d40843e6f55ca4054d5291095a)
最末一行的等式來自黎曼ζ函數的巴塞爾問題。此式繼而可用柴比雪夫不等式求取概率上限:
![{\displaystyle \operatorname {P} \left(|T-nH_{n}|\geq c\,n\right)\leq {\frac {2}{c^{2}}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ea0e1e5c444161f05b82274877902503f55b566c)
尾部估算[編輯]
我們亦可用以下方法求另一個的上限:假設
表示在首r次收集中未有見到第i種贈券,則
![{\displaystyle {\begin{aligned}P\left[{Z}_{i}^{r}\right]=\left(1-{\frac {1}{n}}\right)^{r}\leq e^{-r/n}\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/67a53982df76dc7d07840b856926f1e1c66f71d8)
所以,若
,則有
.
![{\displaystyle {\begin{aligned}P\left[T>\beta n\log n\right]\leq P\left[\bigcup _{i}{Z}_{i}^{\beta n\log n}\right]\leq n\cdot P[{Z}_{1}]\leq n^{-\beta +1}\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/fb8eeb1aa9b184794402e9a32f0b74d40135734a)
用生成函數的解法[編輯]
另一種解決贈券收集問題的方法是用生成函數。
觀察得出,贈券收集的過程必然如下:
- 收集第一張贈券,其出現的概率是
![{\displaystyle n/n=1}](https://wikimedia.org/api/rest_v1/media/math/render/svg/bfea45fe339fef2808313317eb0fb806da093fab)
- 收集了若干張第一種贈券
- 收集到一張第二種贈券,其出現的概率是
![{\displaystyle (n-1)/n}](https://wikimedia.org/api/rest_v1/media/math/render/svg/5a06a444113d265460eea8a62a61c3dc047edd8b)
- 收集了若干張第一種或第二種贈券
- 收集到一張第三種贈券,其出現的概率是
![{\displaystyle (n-2)/n}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ac88aef34401dca9799a4314e65f5350ed081ea6)
- 收集了若干張第一種、第二種或第三種贈券
- 收集到一張第四種贈券,其出現的概率是
![{\displaystyle (n-3)/n}](https://wikimedia.org/api/rest_v1/media/math/render/svg/113122e8a6b39273df112a4dc5605808d1e44bbe)
![{\displaystyle \ldots }](https://wikimedia.org/api/rest_v1/media/math/render/svg/3b8619532e44ee1ccae3ab03405a6885260d09ed)
- 收集到一張最後一種贈券,其出現的概率是
![{\displaystyle 1/n}](https://wikimedia.org/api/rest_v1/media/math/render/svg/f0e10667bad240500f5044257143510127e03d69)
若某一刻已若干種贈券,再收集到一張已重覆的贈券的概率是p,那麼,再收集到m張已重覆的贈券的概率就是
。則就此部分而言,有關m的概率母函數(PGF)是
![{\displaystyle G(z)=\sum _{m=0}^{\infty }p^{m}z^{m}=1+pz+p^{2}z^{2}+p^{3}z^{3}+\cdots ={\frac {1}{1-pz}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/e2b5853e15cb2b1469d3c3e875e7b9f9a66ec142)
若將上述收集過程分割為多個階段,則整個收集過程所花的時間的概率母函數為各部分的乘積,亦即
![{\displaystyle G(z)={\frac {n}{n}}z\cdot {\frac {1}{1-{\frac {1}{n}}z}}\cdot {\frac {n-1}{n}}z\cdot {\frac {1}{1-{\frac {2}{n}}z}}\cdot {\frac {n-2}{n}}z\cdot {\frac {1}{1-{\frac {3}{n}}z}}\cdot {\frac {n-3}{n}}z\cdots {\frac {1}{1-{\frac {n-1}{n}}z}}\cdot {\frac {n-(n-1)}{n}}z.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/1e9902729895dff16e1eb0295451beb88d69a37c)
那麼,根據概率生成函數的特性,總收集次數T的期望值是
![{\displaystyle \operatorname {E} (T)=\left.{\frac {\mathrm {d} }{\mathrm {d} z}}G(z)\right|_{z=1}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/b45cab1253cb550b9d3c74198a588bd7402f40fb)
而某一T的概率則是
![{\displaystyle \Pr(T=k)=\left.{\frac {1}{k!}}{\frac {\mathrm {d} ^{k}G(z)}{\mathrm {d} z^{k}}}\right|_{z=0}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/5283dd751af4abbb194071bb70919bc223eb7c90)
計算E(T)可先化簡
為
![{\displaystyle G(z)=z^{n}{\frac {n-1}{n-z}}{\frac {n-2}{n-2z}}{\frac {n-3}{n-3z}}\cdots {\frac {n-(n-1)}{n-(n-1)z}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/21e233f032721a95602e1d86f8a97bcff3d603b3)
因為
![{\displaystyle {\frac {\mathrm {d} }{\mathrm {d} z}}{\frac {n-k}{n-kz}}={\frac {k(n-k)}{(n-kz)^{2}}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a3db58788fb8ad16ba4343c3e9b6bb24f5eff112)
所以
![{\displaystyle {\frac {\mathrm {d} }{\mathrm {d} z}}G(z)=G(z)\left({\frac {n}{z}}+{\frac {1}{n-z}}+{\frac {2}{n-2z}}+{\frac {3}{n-3z}}\cdots +{\frac {n-1}{n-(n-1)z}}\right)}](https://wikimedia.org/api/rest_v1/media/math/render/svg/192c659411d846603922614047fba5a3c3a39fb4)
故此可得出
![{\displaystyle {\begin{aligned}\operatorname {E} (T)&=\left.{\frac {\mathrm {d} }{\mathrm {d} z}}G(z)\right|_{z=1}\\&=G(1)\left(n+{\frac {1}{n-1}}+{\frac {2}{n-2}}+{\frac {3}{n-3}}\cdots +{\frac {n-1}{n-(n-1)}}\right)\\&=n+\sum _{k=1}^{n-1}{\frac {k}{n-k}}\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/8b1a27b649f6ed1ca55d65ff77e254f0df44801f)
其中的連加部分可化簡:
![{\displaystyle \sum _{k=1}^{n-1}{\frac {k}{n-k}}=\sum _{k=1}^{n-1}\left({\frac {k}{n-k}}-{\frac {n}{n-k}}\right)+nH_{n-1}=nH_{n-1}-(n-1)}](https://wikimedia.org/api/rest_v1/media/math/render/svg/1ad609ff5bab0dc13a083f7df95011f4dd5072aa)
所以得出:
用概率生成函數可同時求取變異量。變異量可寫作
![{\displaystyle \operatorname {Var} (T)=\operatorname {E} (T(T-1))+\operatorname {E} (T)-\operatorname {E} (T)^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/b77a78adb185623162744c3be964aae5fcc83fa9)
其中
![{\displaystyle {\begin{aligned}\operatorname {E} (T(T-1))=&\left.{\frac {\mathrm {d} ^{2}}{\mathrm {d} z^{2}}}G(z)\right|_{z=1}\\=&\left[G(z)\left({\frac {n}{z}}+{\frac {1}{n-z}}+{\frac {2}{n-2z}}+{\frac {3}{n-3z}}\cdots +{\frac {n-1}{n-(n-1)z}}\right)^{2}\right.\\&\;\left.\left.+G(z)\left(-{\frac {n}{z^{2}}}+{\frac {1^{2}}{(n-z)^{2}}}+{\frac {2^{2}}{(n-2z)^{2}}}+{\frac {3^{2}}{(n-3z)^{2}}}\cdots +{\frac {(n-1)^{2}}{(n-(n-1)z)^{2}}}\right)\right]\right|_{z=1}\\=&n^{2}H_{n}^{2}-n+\sum _{k=1}^{n-1}{\frac {k^{2}}{(n-k)^{2}}}\\=&n^{2}H_{n}^{2}-n+\sum _{k=1}^{n-1}{\frac {(n-k)^{2}}{k^{2}}}\\=&n^{2}H_{n}^{2}-n+n^{2}H_{n-1}^{(2)}-2nH_{n-1}+(n-1).\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a5676a52167ca8fdc95c974d97eef3f8aaea8c2d)
故得出:
![{\displaystyle {\begin{aligned}\operatorname {Var} (T)&=\;n^{2}H_{n}^{2}-1+n^{2}H_{n-1}^{(2)}-2nH_{n-1}+nH_{n-1}+1-n^{2}H_{n}^{2}\\&=\;n^{2}H_{n-1}^{(2)}-nH_{n-1}<{\frac {\pi ^{2}}{6}}n^{2}\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/01876375357292d779038e948858edb386545cf6)
參考文獻[編輯]
- Paul Erdős and Alfréd Rényi, On a classical problem of probability theory, Magyar Tud. Akad. Mat. Kutato Int. Kozl, 1961.
- William Feller, An introduction to Probability Theory and its Applications, 1957.
- Michael Mitzenmacher and Eli Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, 2005
- Donald J. Newman and Lawrence Shepp, The Double Dixie Cup Problem, American Mathematical Monthly, Vol. 67, No. 1 (Jan., 1960), pp. 58–61.
- Philippe Flajolet, Danièle Gardy, Loÿs Thimonier Birthday paradox, coupon collectors, caching algorithms and self-organizing search. (頁面存檔備份,存於互聯網檔案館), Discrete Applied Mathematics, Vol. 39, (1992), pp. 207–229
外部連結[編輯]