.. _sec_distributions:

Distributions
=============

Now that we have learned how to work with probability in both the discrete and the continuous setting, let us get to know some of the common distributions encountered. Depending on the area of machine learning, we may need to be familiar with vastly more of these, or, for some areas of deep learning, possibly none at all. This is, however, a good basic list to be familiar with. Let us first import some common libraries.

.. code:: python

    %matplotlib inline
    from math import erf, factorial
    import numpy as np
    from IPython import display
    from d2l import mxnet as d2l

.. code:: python

    %matplotlib inline
    from math import erf, factorial
    import torch
    from IPython import display
    from d2l import torch as d2l

    torch.pi = torch.acos(torch.zeros(1)) * 2  # Define pi in torch

.. code:: python

    %matplotlib inline
    from math import erf, factorial
    import tensorflow as tf
    import tensorflow_probability as tfp
    from IPython import display
    from d2l import tensorflow as d2l

    tf.pi = tf.acos(tf.zeros(1)) * 2  # Define pi in TensorFlow

Bernoulli
---------

This is the simplest random variable usually encountered. This random variable encodes a coin flip which comes up :math:`1` with probability :math:`p` and :math:`0` with probability :math:`1-p`. If we have a random variable :math:`X` with this distribution, we will write

.. math::

    X \sim \mathrm{Bernoulli}(p).

The cumulative distribution function is

.. math::
    :label: eq_bernoulli_cdf

    F(x) = \begin{cases} 0 & x < 0, \\ 1-p & 0 \le x < 1, \\ 1 & x \ge 1 . \end{cases}

The probability mass function is plotted below.

.. code:: python

    p = 0.3

    d2l.set_figsize()
    d2l.plt.stem([0, 1], [1 - p, p], use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()

.. figure:: output_distributions_c7d568_15_0.svg

Now, let us plot the cumulative distribution function :eq:`eq_bernoulli_cdf`.

.. code:: python

    x = np.arange(-1, 2, 0.01)

    def F(x):
        # Piecewise c.d.f.: 0 below 0, 1 - p on [0, 1), and 1 from x = 1 on
        return 0 if x < 0 else 1 if x >= 1 else 1 - p

    d2l.plot(x, np.array([F(y) for y in x]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_27_0.svg

.. code:: python

    x = torch.arange(-1, 2, 0.01)

    def F(x):
        # Piecewise c.d.f.: 0 below 0, 1 - p on [0, 1), and 1 from x = 1 on
        return 0 if x < 0 else 1 if x >= 1 else 1 - p

    d2l.plot(x, torch.tensor([F(y) for y in x]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_30_0.svg

.. code:: python

    x = tf.range(-1, 2, 0.01)

    def F(x):
        # Piecewise c.d.f.: 0 below 0, 1 - p on [0, 1), and 1 from x = 1 on
        return 0 if x < 0 else 1 if x >= 1 else 1 - p

    d2l.plot(x, tf.constant([F(y) for y in x]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_33_0.svg

If :math:`X \sim \mathrm{Bernoulli}(p)`, then:

-  :math:`\mu_X = p`,
-  :math:`\sigma_X^2 = p(1-p)`.

We can sample an array of arbitrary shape from a Bernoulli random variable as follows.

.. code:: python

    1*(np.random.rand(10, 10) < p)

.. parsed-literal::
    :class: output

    array([[0, 0, 1, 1, 0, 1, 0, 1, 0, 1],
           [0, 0, 0, 1, 1, 0, 1, 0, 0, 0],
           [0, 0, 0, 1, 1, 0, 0, 0, 1, 0],
           [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
           [0, 1, 0, 0, 1, 0, 1, 1, 1, 1],
           [1, 0, 1, 0, 0, 1, 1, 0, 0, 0],
           [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
           [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
           [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]])

.. code:: python

    1*(torch.rand(10, 10) < p)

.. parsed-literal::
    :class: output

    tensor([[0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
            [0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
            [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
            [0, 1, 1, 0, 1, 1, 0, 0, 1, 1],
            [1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 1, 1, 0, 1, 0],
            [0, 0, 0, 0, 1, 0, 0, 1, 0, 1],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
            [0, 0, 1, 0, 0, 1, 0, 1, 0, 0]])

.. code:: python

    tf.cast(tf.random.uniform((10, 10)) < p, dtype=tf.float32)

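As a quick numerical sanity check of :math:`\mu_X = p` and :math:`\sigma_X^2 = p(1-p)`, the sketch below draws many Bernoulli samples and compares their empirical mean and variance with the closed forms. It deliberately uses only Python's standard library (``random`` and ``statistics``) so that it runs without any of the frameworks above; the helper ``sample_bernoulli`` is a hypothetical name introduced for illustration.

.. code:: python

    import random
    import statistics

    def sample_bernoulli(p, size, rng):
        """Hypothetical helper: draw `size` Bernoulli(p) samples as 0/1 integers."""
        # A U(0, 1) draw that falls below p counts as a success (a 1).
        return [1 if rng.random() < p else 0 for _ in range(size)]

    p = 0.3
    rng = random.Random(0)  # fixed seed so the run is reproducible
    samples = sample_bernoulli(p, 100_000, rng)

    emp_mean = statistics.fmean(samples)
    emp_var = statistics.pvariance(samples)
    print(f'mean:     {emp_mean:.4f} vs p = {p}')
    print(f'variance: {emp_var:.4f} vs p(1-p) = {p * (1 - p):.4f}')

With 100,000 draws both estimates typically land well within 0.01 of the exact values; the fixed seed only makes the run repeatable.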

Discrete Uniform
----------------

The next commonly encountered random variable is a discrete uniform. For our discussion here, we will assume that it is supported on the integers :math:`\{1, 2, \ldots, n\}`; however, any other set of values can be freely chosen. The meaning of the word *uniform* in this context is that every possible value is equally likely. The probability for each value :math:`i \in \{1, 2, 3, \ldots, n\}` is :math:`p_i = \frac{1}{n}`. We will denote a random variable :math:`X` with this distribution as

.. math::

    X \sim U(n).

The cumulative distribution function is

.. math::
    :label: eq_discrete_uniform_cdf

    F(x) = \begin{cases} 0 & x < 1, \\ \frac{k}{n} & k \le x < k+1 \text{ with } 1 \le k < n, \\ 1 & x \ge n . \end{cases}

Let us first plot the probability mass function.

.. code:: python

    n = 5

    d2l.plt.stem([i+1 for i in range(n)], n*[1 / n], use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()

.. figure:: output_distributions_c7d568_51_0.svg

Now, let us plot the cumulative distribution function :eq:`eq_discrete_uniform_cdf`.

.. code:: python

    x = np.arange(-1, 6, 0.01)

    def F(x):
        return 0 if x < 1 else 1 if x > n else np.floor(x) / n

    d2l.plot(x, np.array([F(y) for y in x]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_63_0.svg

.. code:: python

    x = torch.arange(-1, 6, 0.01)

    def F(x):
        return 0 if x < 1 else 1 if x > n else torch.floor(x) / n

    d2l.plot(x, torch.tensor([F(y) for y in x]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_66_0.svg

.. code:: python

    x = tf.range(-1, 6, 0.01)

    def F(x):
        return 0 if x < 1 else 1 if x > n else tf.floor(x) / n

    d2l.plot(x, [F(y) for y in x], 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_69_0.svg

If :math:`X \sim U(n)`, then:

-  :math:`\mu_X = \frac{1+n}{2}`,
-  :math:`\sigma_X^2 = \frac{n^2-1}{12}`.

We can sample an array of arbitrary shape from a discrete uniform random variable as follows.

.. code:: python

    # The upper bound of randint is exclusive, so pass n + 1 to include n
    np.random.randint(1, n + 1, size=(10, 10))

.. code:: python

    # The upper bound of randint is exclusive, so pass n + 1 to include n
    torch.randint(1, n + 1, size=(10, 10))

.. code:: python

    # For integer dtypes maxval is exclusive, so pass n + 1 to include n
    tf.random.uniform((10, 10), 1, n + 1, dtype=tf.int32)

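The formulas :math:`\mu_X = \frac{1+n}{2}` and :math:`\sigma_X^2 = \frac{n^2-1}{12}` can be checked exactly by summing over the probability mass function. The sketch below uses Python's ``fractions.Fraction`` to avoid any rounding; ``discrete_uniform_moments`` is a hypothetical helper. It also draws samples with the standard library's ``random.randint``, which, unlike the NumPy, PyTorch and TensorFlow integer samplers, includes *both* endpoints.

.. code:: python

    import random
    from fractions import Fraction

    def discrete_uniform_moments(n):
        """Hypothetical helper: exact mean and variance of U(n) on {1, ..., n}."""
        p_i = Fraction(1, n)  # every value has the same probability 1/n
        mean = sum(p_i * i for i in range(1, n + 1))
        var = sum(p_i * (i - mean) ** 2 for i in range(1, n + 1))
        return mean, var

    n = 5
    mean, var = discrete_uniform_moments(n)
    print(mean, Fraction(1 + n, 2))       # prints: 3 3
    print(var, Fraction(n ** 2 - 1, 12))  # prints: 2 2

    # random.randint(1, n) includes BOTH endpoints, so it samples from {1, ..., n}.
    rng = random.Random(0)  # fixed seed so the run is reproducible
    draws = [rng.randint(1, n) for _ in range(10_000)]
    print(sorted(set(draws)))  # prints: [1, 2, 3, 4, 5]

Exact rational arithmetic makes the comparison an equality test rather than a tolerance check, which is convenient for small :math:`n`.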

Continuous Uniform
------------------

Next, let us discuss the continuous uniform distribution. The idea behind this random variable is that if we increase :math:`n` in the discrete uniform distribution, and then scale it to fit within the interval :math:`[a, b]`, we will approach a continuous random variable that just picks an arbitrary value in :math:`[a, b]`, with all values equally likely. We will denote this distribution as

.. math::

    X \sim U(a, b).

The probability density function is

.. math::
    :label: eq_cont_uniform_pdf

    p(x) = \begin{cases} \frac{1}{b-a} & x \in [a, b], \\ 0 & x \not\in [a, b].\end{cases}

The cumulative distribution function is

.. math::
    :label: eq_cont_uniform_cdf

    F(x) = \begin{cases} 0 & x < a, \\ \frac{x-a}{b-a} & x \in [a, b], \\ 1 & x \ge b . \end{cases}

Let us first plot the probability density function :eq:`eq_cont_uniform_pdf`.

.. code:: python

    a, b = 1, 3

    x = np.arange(0, 4, 0.01)
    p = (x > a)*(x < b)/(b - a)

    d2l.plot(x, p, 'x', 'p.d.f.')

.. figure:: output_distributions_c7d568_87_0.svg

.. code:: python

    a, b = 1, 3

    x = torch.arange(0, 4, 0.01)
    p = (x > a).type(torch.float32)*(x < b).type(torch.float32)/(b-a)

    d2l.plot(x, p, 'x', 'p.d.f.')

.. figure:: output_distributions_c7d568_90_0.svg

.. code:: python

    a, b = 1, 3

    x = tf.range(0, 4, 0.01)
    p = tf.cast(x > a, tf.float32) * tf.cast(x < b, tf.float32) / (b - a)

    d2l.plot(x, p, 'x', 'p.d.f.')

.. figure:: output_distributions_c7d568_93_0.svg

Now, let us plot the cumulative distribution function :eq:`eq_cont_uniform_cdf`.

.. code:: python

    def F(x):
        return 0 if x < a else 1 if x > b else (x - a) / (b - a)

    d2l.plot(x, np.array([F(y) for y in x]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_99_0.svg

.. code:: python

    def F(x):
        return 0 if x < a else 1 if x > b else (x - a) / (b - a)

    d2l.plot(x, torch.tensor([F(y) for y in x]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_102_0.svg

.. code:: python

    def F(x):
        return 0 if x < a else 1 if x > b else (x - a) / (b - a)

    d2l.plot(x, [F(y) for y in x], 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_105_0.svg

If :math:`X \sim U(a, b)`, then:

-  :math:`\mu_X = \frac{a+b}{2}`,
-  :math:`\sigma_X^2 = \frac{(b-a)^2}{12}`.

We can sample an array of arbitrary shape from a uniform random variable as follows. Note that these samplers draw from :math:`U(0,1)` by default, so if we want a different range we need to scale and shift the result.

.. code:: python

    (b - a) * np.random.rand(10, 10) + a

.. parsed-literal::
    :class: output

    array([[2.74743298, 2.03159855, 1.07516107, 1.0899085 , 1.56670805,
            2.83306649, 2.85052916, 1.20927866, 1.78954319, 1.42051624],
           [1.42525046, 2.40676598, 1.97061896, 2.58075441, 1.79179163,
            1.2028032 , 2.57083447, 2.16239145, 2.22592097, 2.2660454 ],
           [2.51061663, 2.82806143, 1.32962985, 1.94371877, 2.81431313,
            2.35960726, 2.91475722, 1.36934338, 1.85225797, 1.06361245],
           [2.89271126, 1.65814835, 1.59879384, 2.53255822, 1.15259466,
            1.17028711, 2.5488143 , 2.55240722, 1.61276803, 1.44948278],
           [2.97974614, 2.91643107, 1.68463885, 2.70473872, 2.19648016,
            1.23668542, 2.51995671, 1.48999533, 2.38393232, 1.99265306],
           [2.41501795, 2.04038989, 2.27832101, 1.6753532 , 1.42217372,
            1.91593201, 2.46707199, 2.55517501, 2.03477175, 2.98953302],
           [2.24644313, 1.95294047, 2.4238319 , 2.88006702, 2.46055798,
            1.48457728, 1.59907046, 2.95518618, 2.97693772, 1.66927585],
           [1.1555122 , 1.63482392, 2.00827154, 1.09354961, 1.1059348 ,
            2.10055009, 1.27081608, 2.37430206, 2.74896879, 1.69641912],
           [1.88719054, 1.23264616, 1.55329472, 2.18297288, 1.00754217,
            1.07738796, 1.74940998, 2.70668132, 1.65448103, 2.94373046],
           [1.08165866, 1.97111515, 2.90011096, 1.41029456, 1.54814817,
            2.56687118, 1.32947441, 2.85149868, 2.99477963, 1.60930713]])

.. code:: python

    (b - a) * torch.rand(10, 10) + a

.. parsed-literal::
    :class: output

    tensor([[2.8223, 1.5381, 2.7166, 1.7558, 1.2417, 1.2230, 2.2913, 1.5042, 1.6630, 2.5570],
            [1.5648, 1.4812, 1.9697, 1.5672, 1.9874, 1.5999, 1.4540, 1.8198, 2.4150, 1.5929],
            [1.1754, 2.5958, 1.7233, 1.3277, 2.0125, 2.1797, 1.5919, 1.0497, 2.1812, 1.3366],
            [2.4354, 1.6165, 1.1859, 2.9547, 2.0595, 1.8269, 1.1119, 2.3576, 1.9823, 2.4681],
            [2.7830, 2.8629, 2.5027, 1.6039, 1.5405, 1.1954, 2.5066, 2.0107, 2.6782, 1.3883],
            [1.2284, 2.3149, 2.1947, 2.4759, 1.0598, 1.1362, 2.7960, 2.9847, 2.3273, 1.5196],
            [2.0935, 1.1542, 1.8042, 1.0448, 1.0768, 1.8240, 1.2579, 2.1343, 1.5140, 1.9848],
            [1.5720, 2.2829, 1.2114, 2.2404, 1.8355, 1.2281, 1.1693, 2.3944, 2.2222, 2.2738],
            [1.9178, 1.3517, 1.0908, 1.3381, 1.0433, 1.1117, 1.6149, 1.4846, 1.2166, 2.9854],
            [2.3129, 1.2611, 2.3859, 2.4383, 1.8703, 1.3239, 1.5865, 2.9044, 2.6024, 1.2154]])

.. code:: python

    (b - a) * tf.random.uniform((10, 10)) + a

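The rescaling above is just the affine map :math:`u \mapsto a + (b-a)u` applied to :math:`U(0,1)` draws. The sketch below applies that map with Python's standard ``random`` module (an assumption made to keep it framework-free) and compares the empirical mean and variance with :math:`\frac{a+b}{2}` and :math:`\frac{(b-a)^2}{12}`.

.. code:: python

    import random
    import statistics

    a, b = 1, 3
    rng = random.Random(0)  # fixed seed so the run is reproducible

    # rng.random() draws from U(0, 1); the affine map a + (b - a) * u gives U(a, b).
    samples = [a + (b - a) * rng.random() for _ in range(100_000)]

    print(f'range:    [{min(samples):.4f}, {max(samples):.4f}] within [{a}, {b}]')
    print(f'mean:     {statistics.fmean(samples):.4f} vs (a+b)/2 = {(a + b) / 2}')
    print(f'variance: {statistics.pvariance(samples):.4f} '
          f'vs (b-a)^2/12 = {(b - a) ** 2 / 12:.4f}')

The standard library also offers ``random.uniform(a, b)``, which performs essentially the same scaling internally.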

Binomial
--------

Let us make things a bit more complex and examine the *binomial* random variable. This random variable originates from performing a sequence of :math:`n` independent experiments, each of which has probability :math:`p` of succeeding, and asking how many successes we expect to see.

Let us express this mathematically. Each experiment is an independent random variable :math:`X_i` where we will use :math:`1` to encode success and :math:`0` to encode failure. Since each is an independent coin flip which is successful with probability :math:`p`, we can say that :math:`X_i \sim \mathrm{Bernoulli}(p)`. Then, the binomial random variable is

.. math::

    X = \sum_{i=1}^n X_i.

In this case, we will write

.. math::

    X \sim \mathrm{Binomial}(n, p).

To get the cumulative distribution function, we need to notice that getting exactly :math:`k` successes can occur in :math:`\binom{n}{k} = \frac{n!}{k!(n-k)!}` ways, each of which has a probability of :math:`p^k(1-p)^{n-k}` of occurring. Thus the cumulative distribution function is

.. math::
    :label: eq_binomial_cdf

    F(x) = \begin{cases} 0 & x < 0, \\ \sum_{m \le k} \binom{n}{m} p^m(1-p)^{n-m} & k \le x < k+1 \text{ with } 0 \le k < n, \\ 1 & x \ge n . \end{cases}

Let us first plot the probability mass function.

.. code:: python

    n, p = 10, 0.2

    # Compute binomial coefficient
    def binom(n, k):
        comb = 1
        for i in range(min(k, n - k)):
            comb = comb * (n - i) // (i + 1)
        return comb

    pmf = np.array([p**i * (1-p)**(n - i) * binom(n, i) for i in range(n + 1)])

    d2l.plt.stem([i for i in range(n + 1)], pmf, use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()

.. figure:: output_distributions_c7d568_123_0.svg

.. code:: python

    n, p = 10, 0.2

    # Compute binomial coefficient
    def binom(n, k):
        comb = 1
        for i in range(min(k, n - k)):
            comb = comb * (n - i) // (i + 1)
        return comb

    pmf = torch.tensor([p**i * (1-p)**(n - i) * binom(n, i)
                        for i in range(n + 1)])

    d2l.plt.stem([i for i in range(n + 1)], pmf, use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()

.. figure:: output_distributions_c7d568_126_0.svg

.. code:: python

    n, p = 10, 0.2

    # Compute binomial coefficient
    def binom(n, k):
        comb = 1
        for i in range(min(k, n - k)):
            comb = comb * (n - i) // (i + 1)
        return comb

    pmf = tf.constant([p**i * (1-p)**(n - i) * binom(n, i)
                       for i in range(n + 1)])

    d2l.plt.stem([i for i in range(n + 1)], pmf, use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()

.. figure:: output_distributions_c7d568_129_0.svg

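The ``binom`` helper above builds :math:`\binom{n}{k}` with the multiplicative formula rather than from factorials, which keeps every intermediate value an exact integer. Since Python 3.8 the standard library also provides ``math.comb``, so the sketch below (which restates the helper so that it runs on its own) checks that the two agree and that the resulting p.m.f. sums to one, with mean :math:`np` and variance :math:`np(1-p)`.

.. code:: python

    import math

    def binom(n, k):
        # Same multiplicative formula as the helper above. After i steps comb equals
        # C(n, i), and C(n, i) * (n - i) = C(n, i + 1) * (i + 1), so // stays exact.
        comb = 1
        for i in range(min(k, n - k)):
            comb = comb * (n - i) // (i + 1)
        return comb

    n, p = 10, 0.2
    assert all(binom(n, k) == math.comb(n, k) for k in range(n + 1))

    pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    total = sum(pmf)
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    print(f'{total:.6f} {mean:.6f} {var:.6f}')  # prints: 1.000000 2.000000 1.600000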

Now, let us plot the cumulative distribution function :eq:`eq_binomial_cdf`.

.. code:: python

    x = np.arange(-1, 11, 0.01)
    cmf = np.cumsum(pmf)

    def F(x):
        return 0 if x < 0 else 1 if x > n else cmf[int(x)]

    d2l.plot(x, np.array([F(y) for y in x.tolist()]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_135_0.svg

.. code:: python

    x = torch.arange(-1, 11, 0.01)
    cmf = torch.cumsum(pmf, dim=0)

    def F(x):
        return 0 if x < 0 else 1 if x > n else cmf[int(x)]

    d2l.plot(x, torch.tensor([F(y) for y in x.tolist()]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_138_0.svg

.. code:: python

    x = tf.range(-1, 11, 0.01)
    cmf = tf.cumsum(pmf)

    def F(x):
        return 0 if x < 0 else 1 if x > n else cmf[int(x)]

    d2l.plot(x, [F(y) for y in x.numpy().tolist()], 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_141_0.svg

If :math:`X \sim \mathrm{Binomial}(n, p)`, then:

-  :math:`\mu_X = np`,
-  :math:`\sigma_X^2 = np(1-p)`.

This follows from the linearity of expected value over the sum of :math:`n` Bernoulli random variables, :math:`\mu_X = \sum_{i=1}^n \mu_{X_i} = np`, and the fact that the variance of a sum of independent random variables is the sum of the variances, :math:`\sigma_X^2 = \sum_{i=1}^n \sigma_{X_i}^2 = np(1-p)`. This can be sampled as follows.

.. code:: python

    np.random.binomial(n, p, size=(10, 10))

.. parsed-literal::
    :class: output

    array([[1, 1, 3, 1, 3, 4, 0, 2, 1, 0],
           [1, 1, 2, 1, 1, 1, 2, 2, 1, 2],
           [3, 4, 2, 2, 4, 2, 3, 2, 2, 3],
           [4, 1, 1, 3, 1, 2, 3, 3, 2, 3],
           [3, 3, 2, 3, 0, 1, 2, 3, 2, 3],
           [3, 2, 1, 2, 3, 2, 1, 0, 1, 0],
           [2, 0, 2, 4, 2, 4, 3, 2, 1, 3],
           [2, 3, 2, 1, 3, 2, 2, 4, 0, 3],
           [4, 1, 3, 3, 0, 1, 2, 1, 2, 2],
           [0, 1, 3, 0, 1, 0, 5, 2, 4, 4]])

.. code:: python

    m = torch.distributions.binomial.Binomial(n, p)
    m.sample(sample_shape=(10, 10))

.. parsed-literal::
    :class: output

    tensor([[3., 3., 3., 2., 1., 4., 3., 1., 2., 2.],
            [1., 3., 1., 2., 1., 1., 3., 5., 2., 1.],
            [1., 2., 1., 0., 3., 4., 2., 1., 3., 1.],
            [2., 3., 0., 0., 3., 3., 3., 1., 3., 3.],
            [2., 3., 3., 1., 4., 1., 2., 2., 3., 2.],
            [1., 3., 2., 4., 1., 1., 1., 2., 3., 3.],
            [2., 4., 2., 1., 2., 4., 4., 1., 1., 3.],
            [2., 0., 3., 3., 0., 2., 2., 0., 3., 4.],
            [2., 1., 2., 3., 1., 1., 0., 3., 0., 2.],
            [3., 1., 2., 1., 1., 0., 0., 0., 1., 2.]])

.. code:: python

    # The second positional argument of tfp's Binomial is `logits`,
    # so the success probability must be passed as `probs=`
    m = tfp.distributions.Binomial(n, probs=p)
    m.sample(sample_shape=(10, 10))

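The library samplers above hide the construction :math:`X = \sum_{i=1}^n X_i`. The sketch below builds each binomial draw literally as a sum of :math:`n` Bernoulli flips, using only Python's standard library, and compares the empirical moments with :math:`np` and :math:`np(1-p)`. The helper ``sample_binomial`` is hypothetical and far slower than the library samplers; it is only meant to illustrate the definition.

.. code:: python

    import random
    import statistics

    def sample_binomial(n, p, rng):
        """Hypothetical helper: one Binomial(n, p) draw as a sum of n Bernoulli(p) flips."""
        return sum(1 if rng.random() < p else 0 for _ in range(n))

    n, p = 10, 0.2
    rng = random.Random(0)  # fixed seed so the run is reproducible
    samples = [sample_binomial(n, p, rng) for _ in range(50_000)]

    print(f'mean:     {statistics.fmean(samples):.3f} vs np = {n * p}')
    print(f'variance: {statistics.pvariance(samples):.3f} '
          f'vs np(1-p) = {n * p * (1 - p):.3f}')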

Poisson
-------

Let us now perform a thought experiment. We are standing at a bus stop and we want to know how many buses will arrive in the next minute. Let us start by considering :math:`X^{(1)} \sim \mathrm{Bernoulli}(p)`, which simply indicates whether a bus arrives in the one-minute window (this happens with probability :math:`p`). For bus stops far from an urban center, this might be a pretty good approximation: we may never see more than one bus in a minute. However, if we are in a busy area, it is possible or even likely that two buses will arrive. We can model this by splitting our random variable into two parts, one for the first 30 seconds and one for the second 30 seconds. In this case we can write

.. math::

    X^{(2)} = X^{(2)}_1 + X^{(2)}_2,

where :math:`X^{(2)}` is the total sum, and :math:`X^{(2)}_i \sim \mathrm{Bernoulli}(p/2)`. The total distribution is then :math:`X^{(2)} \sim \mathrm{Binomial}(2, p/2)`.

Why stop here? Let us continue to split that minute into :math:`n` parts. By the same reasoning as above, we see that

.. math::
    :label: eq_eq_poisson_approx

    X^{(n)} \sim \mathrm{Binomial}(n, p/n).

Consider these random variables. By the previous section, we know that :eq:`eq_eq_poisson_approx` has mean :math:`\mu_{X^{(n)}} = n(p/n) = p`, and variance :math:`\sigma_{X^{(n)}}^2 = n(p/n)(1-(p/n)) = p(1-p/n)`. If we take :math:`n \rightarrow \infty`, we can see that these numbers stabilize to :math:`\mu_{X^{(\infty)}} = p`, and variance :math:`\sigma_{X^{(\infty)}}^2 = p`. This indicates that there *could be* some random variable we can define in this infinite subdivision limit.

This should not come as too much of a surprise, since in the real world we can just count the number of bus arrivals; however, it is nice to see that our mathematical model is well defined. This discussion can be made formal as the *law of rare events*.

Following through this reasoning carefully, we can arrive at the following model.
We will say that :math:`X \sim \mathrm{Poisson}(\lambda)` if it is a random variable which takes the values :math:`\{0,1,2, \ldots\}` with probability

.. math::
    :label: eq_poisson_mass

    p_k = \frac{\lambda^k e^{-\lambda}}{k!}.

The value :math:`\lambda > 0` is known as the *rate* (or the *shape* parameter), and denotes the average number of arrivals we expect in one unit of time.

We may sum this probability mass function to get the cumulative distribution function.

.. math::
    :label: eq_poisson_cdf

    F(x) = \begin{cases} 0 & x < 0, \\ e^{-\lambda}\sum_{m = 0}^k \frac{\lambda^m}{m!} & k \le x < k+1 \text{ with } 0 \le k. \end{cases}

Let us first plot the probability mass function :eq:`eq_poisson_mass`.

.. code:: python

    lam = 5.0

    xs = [i for i in range(20)]
    pmf = np.array([np.exp(-lam) * lam**k / factorial(k) for k in xs])

    d2l.plt.stem(xs, pmf, use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()

.. figure:: output_distributions_c7d568_159_0.svg

.. code:: python

    lam = 5.0

    xs = [i for i in range(20)]
    pmf = torch.tensor([torch.exp(torch.tensor(-lam)) * lam**k
                        / factorial(k) for k in xs])

    d2l.plt.stem(xs, pmf, use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()

.. figure:: output_distributions_c7d568_162_0.svg

.. code:: python

    lam = 5.0

    xs = [i for i in range(20)]
    pmf = tf.constant([tf.exp(tf.constant(-lam)).numpy() * lam**k
                       / factorial(k) for k in xs])

    d2l.plt.stem(xs, pmf, use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()

.. figure:: output_distributions_c7d568_165_0.svg

Now, let us plot the cumulative distribution function :eq:`eq_poisson_cdf`.

.. code:: python

    x = np.arange(-1, 21, 0.01)
    cmf = np.cumsum(pmf)

    def F(x):
        # `pmf` only covers k = 0, ..., 19, so clamp the index to its last entry
        # (the mass beyond k = 19 is negligible for lam = 5)
        return 0 if x < 0 else cmf[min(int(x), cmf.shape[0] - 1)]

    d2l.plot(x, np.array([F(y) for y in x.tolist()]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_171_0.svg

.. code:: python

    x = torch.arange(-1, 21, 0.01)
    cmf = torch.cumsum(pmf, dim=0)

    def F(x):
        # `pmf` only covers k = 0, ..., 19, so clamp the index to its last entry
        # (the mass beyond k = 19 is negligible for lam = 5)
        return 0 if x < 0 else cmf[min(int(x), cmf.shape[0] - 1)]

    d2l.plot(x, torch.tensor([F(y) for y in x.tolist()]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_174_0.svg

.. code:: python

    x = tf.range(-1, 21, 0.01)
    cmf = tf.cumsum(pmf)

    def F(x):
        # `pmf` only covers k = 0, ..., 19, so clamp the index to its last entry
        # (the mass beyond k = 19 is negligible for lam = 5)
        return 0 if x < 0 else cmf[min(int(x), cmf.shape[0] - 1)]

    d2l.plot(x, [F(y) for y in x.numpy().tolist()], 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_177_0.svg

As we saw above, the means and variances are particularly concise. If :math:`X \sim \mathrm{Poisson}(\lambda)`, then:

-  :math:`\mu_X = \lambda`,
-  :math:`\sigma_X^2 = \lambda`.

This can be sampled as follows.

.. code:: python

    np.random.poisson(lam, size=(10, 10))

.. parsed-literal::
    :class: output

    array([[3, 6, 4, 5, 4, 6, 5, 1, 3, 3],
           [2, 0, 9, 6, 9, 1, 5, 5, 9, 1],
           [6, 4, 3, 6, 7, 2, 6, 6, 3, 4],
           [5, 3, 5, 2, 4, 3, 7, 5, 4, 4],
           [5, 8, 9, 4, 7, 3, 2, 8, 6, 3],
           [7, 3, 5, 3, 2, 7, 9, 6, 4, 7],
           [7, 4, 4, 6, 3, 9, 8, 5, 7, 2],
           [3, 5, 2, 4, 6, 3, 7, 6, 5, 5],
           [5, 2, 4, 8, 3, 3, 6, 4, 6, 6],
           [8, 4, 3, 3, 4, 6, 5, 7, 5, 5]])

.. code:: python

    m = torch.distributions.poisson.Poisson(lam)
    m.sample((10, 10))

.. parsed-literal::
    :class: output

    tensor([[ 4.,  7.,  5.,  6.,  4.,  7.,  4.,  6.,  3.,  3.],
            [ 5.,  6.,  7.,  9.,  6.,  5.,  3.,  7.,  3.,  6.],
            [10.,  5.,  2.,  2.,  6.,  4.,  5.,  6.,  3.,  4.],
            [ 3.,  8.,  7.,  8.,  3.,  1.,  7.,  3.,  5.,  6.],
            [ 6.,  4.,  6.,  9.,  8.,  7.,  4.,  7.,  3.,  7.],
            [ 5.,  9.,  3.,  3.,  3.,  3.,  2.,  5.,  4.,  8.],
            [ 4.,  2.,  5.,  2.,  4.,  8.,  2.,  9.,  9.,  3.],
            [ 2.,  2.,  5.,  6.,  1.,  5.,  4.,  6.,  6.,  2.],
            [ 8.,  5.,  6.,  4.,  5.,  7.,  9.,  9.,  3.,  6.],
            [ 3.,  2.,  4.,  7.,  3.,  6.,  5.,  2.,  4.,  4.]])

.. code:: python

    m = tfp.distributions.Poisson(lam)
    m.sample((10, 10))

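The bus-stop argument says that :math:`\mathrm{Binomial}(n, \lambda/n)` should approach :math:`\mathrm{Poisson}(\lambda)` as the time unit is split into more and more slots. The sketch below, written with only Python's ``math`` module, measures the largest gap between the two probability mass functions over :math:`k = 0, \ldots, 19` for growing :math:`n`, and also checks numerically that the Poisson mean and variance both equal :math:`\lambda`. The helper names ``binomial_pmf`` and ``poisson_pmf`` are illustrative, not part of any library.

.. code:: python

    import math

    def binomial_pmf(k, n, p):
        # math.comb(n, k) is 0 when k > n, so such terms vanish
        return math.comb(n, k) * p**k * (1 - p) ** (n - k)

    def poisson_pmf(k, lam):
        return lam**k * math.exp(-lam) / math.factorial(k)

    lam = 5.0
    gaps = []
    for n in [10, 100, 1000, 10_000]:
        # Split the time unit into n slots, each holding an arrival with probability lam / n.
        gap = max(abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
                  for k in range(20))
        gaps.append(gap)
        print(f'n = {n:>6}: max |Binomial - Poisson| = {gap:.2e}')

    # Moments of the Poisson p.m.f. truncated at k = 99; the remaining tail is
    # astronomically small for lam = 5, so both values should match lam.
    pmf = [poisson_pmf(k, lam) for k in range(100)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    print(f'mean = {mean:.6f}, variance = {var:.6f}')

The gap shrinks as :math:`n` grows (roughly like :math:`1/n` for large :math:`n`), which is the law of rare events seen numerically.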

Gaussian
--------

Now let us try a different, but related experiment. Let us say we again are performing :math:`n` independent :math:`\mathrm{Bernoulli}(p)` measurements :math:`X_i`. The distribution of the sum of these is :math:`X^{(n)} \sim \mathrm{Binomial}(n, p)`. Rather than taking a limit as :math:`n` increases and :math:`p` decreases, let us fix :math:`p`, and then send :math:`n \rightarrow \infty`. In this case :math:`\mu_{X^{(n)}} = np \rightarrow \infty` and :math:`\sigma_{X^{(n)}}^2 = np(1-p) \rightarrow \infty`, so there is no reason to think this limit should be well defined.

However, not all hope is lost! Let us just make the mean and variance be well behaved by defining

.. math::

    Y^{(n)} = \frac{X^{(n)} - \mu_{X^{(n)}}}{\sigma_{X^{(n)}}}.

This can be seen to have mean zero and variance one, and so it is plausible to believe that it will converge to some limiting distribution. If we plot what these distributions look like, we will become even more convinced that it will work.

.. code:: python

    p = 0.2
    ns = [1, 10, 100, 1000]
    d2l.plt.figure(figsize=(10, 3))
    for i in range(4):
        n = ns[i]
        pmf = np.array([p**k * (1-p)**(n-k) * binom(n, k)
                        for k in range(n + 1)])
        d2l.plt.subplot(1, 4, i + 1)
        d2l.plt.stem([(k - n*p)/np.sqrt(n*p*(1 - p)) for k in range(n + 1)], pmf,
                     use_line_collection=True)
        d2l.plt.xlim([-4, 4])
        d2l.plt.xlabel('x')
        d2l.plt.ylabel('p.m.f.')
        d2l.plt.title("n = {}".format(n))
    d2l.plt.show()

.. figure:: output_distributions_c7d568_195_0.svg

.. code:: python

    p = 0.2
    ns = [1, 10, 100, 1000]
    d2l.plt.figure(figsize=(10, 3))
    for i in range(4):
        n = ns[i]
        pmf = torch.tensor([p**k * (1-p)**(n-k) * binom(n, k)
                            for k in range(n + 1)])
        d2l.plt.subplot(1, 4, i + 1)
        d2l.plt.stem([(k - n*p)/torch.sqrt(torch.tensor(n*p*(1 - p)))
                      for k in range(n + 1)], pmf,
                     use_line_collection=True)
        d2l.plt.xlim([-4, 4])
        d2l.plt.xlabel('x')
        d2l.plt.ylabel('p.m.f.')
        d2l.plt.title("n = {}".format(n))
    d2l.plt.show()

.. figure:: output_distributions_c7d568_198_0.svg

.. code:: python

    p = 0.2
    ns = [1, 10, 100, 1000]
    d2l.plt.figure(figsize=(10, 3))
    for i in range(4):
        n = ns[i]
        pmf = tf.constant([p**k * (1-p)**(n-k) * binom(n, k)
                           for k in range(n + 1)])
        d2l.plt.subplot(1, 4, i + 1)
        d2l.plt.stem([(k - n*p)/tf.sqrt(tf.constant(n*p*(1 - p)))
                      for k in range(n + 1)], pmf,
                     use_line_collection=True)
        d2l.plt.xlim([-4, 4])
        d2l.plt.xlabel('x')
        d2l.plt.ylabel('p.m.f.')
        d2l.plt.title("n = {}".format(n))
    d2l.plt.show()

.. figure:: output_distributions_c7d568_201_0.svg

One thing to note: compared to the Poisson case, we are now dividing by the standard deviation, which means that we are squeezing the possible outcomes into smaller and smaller areas. This is an indication that our limit will no longer be discrete, but rather continuous.

A derivation of what occurs is beyond the scope of this document, but the *central limit theorem* states that as :math:`n \rightarrow \infty`, this will yield the Gaussian distribution (sometimes called the normal distribution). More explicitly, for any :math:`a, b`:

.. math::

    \lim_{n \rightarrow \infty} P(Y^{(n)} \in [a, b]) = P(\mathcal{N}(0,1) \in [a, b]),

where we say a random variable is normally distributed with given mean :math:`\mu` and variance :math:`\sigma^2`, written :math:`X \sim \mathcal{N}(\mu, \sigma^2)`, if :math:`X` has density

.. math::
    :label: eq_gaussian_pdf

    p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}.

Let us first plot the probability density function :eq:`eq_gaussian_pdf`.

.. code:: python

    mu, sigma = 0, 1

    x = np.arange(-3, 3, 0.01)
    p = 1 / np.sqrt(2 * np.pi * sigma**2) * np.exp(-(x - mu)**2 / (2 * sigma**2))

    d2l.plot(x, p, 'x', 'p.d.f.')

.. figure:: output_distributions_c7d568_207_0.svg

.. code:: python

    mu, sigma = 0, 1

    x = torch.arange(-3, 3, 0.01)
    p = 1 / torch.sqrt(2 * torch.pi * sigma**2) * torch.exp(
        -(x - mu)**2 / (2 * sigma**2))

    d2l.plot(x, p, 'x', 'p.d.f.')

.. figure:: output_distributions_c7d568_210_0.svg

.. code:: python

    mu, sigma = 0, 1

    x = tf.range(-3, 3, 0.01)
    p = 1 / tf.sqrt(2 * tf.pi * sigma**2) * tf.exp(
        -(x - mu)**2 / (2 * sigma**2))

    d2l.plot(x, p, 'x', 'p.d.f.')

.. figure:: output_distributions_c7d568_213_0.svg

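As a numerical sanity check of :eq:`eq_gaussian_pdf`, the sketch below integrates the density with a simple trapezoidal rule and confirms that the total area is one. It also compares the numerical integral up to :math:`x = 1` with the closed form :math:`\frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)`, which is what the ``erf``-based c.d.f. code in this section computes. Only Python's ``math`` module is used; ``gaussian_pdf``, ``gaussian_cdf`` and ``trapezoid`` are hypothetical helpers, and truncating the integral at ten standard deviations is an assumption that discards a negligible amount of tail mass.

.. code:: python

    import math

    def gaussian_pdf(x, mu=0.0, sigma=1.0):
        return math.exp(-(x - mu) ** 2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

    def gaussian_cdf(x, mu=0.0, sigma=1.0):
        # Closed form of the Gaussian c.d.f. in terms of the error function
        return (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2)))) / 2.0

    def trapezoid(f, lo, hi, steps=100_000):
        """Composite trapezoidal rule approximating the integral of f over [lo, hi]."""
        h = (hi - lo) / steps
        inner = sum(f(lo + i * h) for i in range(1, steps))
        return h * (0.5 * f(lo) + inner + 0.5 * f(hi))

    mu, sigma = 0.0, 1.0

    def pdf(x):
        return gaussian_pdf(x, mu, sigma)

    lo = mu - 10 * sigma  # assumption: mass below 10 standard deviations is negligible
    area = trapezoid(pdf, lo, mu + 10 * sigma)
    numeric_cdf = trapezoid(pdf, lo, 1.0)
    print(f'total area: {area:.8f}')
    print(f'P(X <= 1):  numeric {numeric_cdf:.8f} vs erf {gaussian_cdf(1.0, mu, sigma):.8f}')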

Now, let us plot the cumulative distribution function. It is beyond the scope of this appendix, but the Gaussian c.d.f. does not have a closed-form formula in terms of more elementary functions. We will use ``erf``, the error function, which provides a way to compute this integral numerically; in terms of it, the c.d.f. is :math:`F(x) = \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)`.

.. code:: python

    def phi(x):
        return (1.0 + erf((x - mu) / (sigma * np.sqrt(2)))) / 2.0

    d2l.plot(x, np.array([phi(y) for y in x.tolist()]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_219_0.svg

.. code:: python

    def phi(x):
        return (1.0 + erf((x - mu) / (sigma * torch.sqrt(torch.tensor(2.))))) / 2.0

    d2l.plot(x, torch.tensor([phi(y) for y in x.tolist()]), 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_222_0.svg

.. code:: python

    def phi(x):
        return (1.0 + erf((x - mu) / (sigma * tf.sqrt(tf.constant(2.))))) / 2.0

    d2l.plot(x, [phi(y) for y in x.numpy().tolist()], 'x', 'c.d.f.')

.. figure:: output_distributions_c7d568_225_0.svg

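The central limit theorem statement :math:`\lim_{n \rightarrow \infty} P(Y^{(n)} \in [a, b]) = P(\mathcal{N}(0,1) \in [a, b])` can be probed numerically. The sketch below computes the left-hand side exactly from the binomial p.m.f. (working in log space with ``math.lgamma`` so that large :math:`n` does not overflow) and compares it with the right-hand side obtained from ``math.erf``. The helper names are hypothetical, and the choice :math:`p = 0.2`, :math:`[a, b] = [-1, 1]` simply mirrors the standardized binomial plots earlier in this section.

.. code:: python

    import math

    def std_normal_cdf(z):
        return (1.0 + math.erf(z / math.sqrt(2))) / 2.0

    def standardized_binomial_prob(n, p, a, b):
        """Exact P(a <= Y <= b) for Y = (X - np) / sqrt(np(1-p)) with X ~ Binomial(n, p)."""
        mu, sd = n * p, math.sqrt(n * p * (1 - p))
        total = 0.0
        for k in range(n + 1):
            if a <= (k - mu) / sd <= b:
                # Log of the binomial p.m.f.; log space keeps large n from overflowing
                log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                           + k * math.log(p) + (n - k) * math.log(1 - p))
                total += math.exp(log_pmf)
        return total

    p, a, b = 0.2, -1.0, 1.0
    target = std_normal_cdf(b) - std_normal_cdf(a)  # P(N(0, 1) in [a, b])
    errors = []
    for n in [10, 100, 1000, 10_000]:
        prob = standardized_binomial_prob(n, p, a, b)
        errors.append(abs(prob - target))
        print(f'n = {n:>6}: P(Y in [a, b]) = {prob:.4f} vs normal {target:.4f}')

The gap shrinks as :math:`n` grows, but not necessarily monotonically: :math:`Y^{(n)}` only takes values on a grid with spacing :math:`1/\sigma_{X^{(n)}}`, so whether a grid point lands exactly on :math:`a` or :math:`b` causes small jumps.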

Attentive readers will recognize the integral behind the Gaussian c.d.f. Indeed, we encountered this integral in :numref:`sec_integral_calculus`, and we need exactly that computation to see that this :math:`p_X(x)` has total area one and is thus a valid density.

Our choice of working with coin flips made computations shorter, but nothing about that choice was fundamental. Indeed, if we take any collection of independent identically distributed random variables :math:`X_i`, and form

.. math::

    X^{(N)} = \sum_{i=1}^N X_i,

then

.. math::

    \frac{X^{(N)} - \mu_{X^{(N)}}}{\sigma_{X^{(N)}}}

will be approximately Gaussian. There are additional requirements needed to make it work, most importantly that the variance of the :math:`X_i` be finite and nonzero, :math:`0 < \sigma_{X_i}^2 < \infty`, but the philosophy is clear.

The central limit theorem is the reason why the Gaussian is fundamental to probability, statistics, and machine learning. Whenever we can say that something we measured is a sum of many small independent contributions, we can expect that the thing being measured will be close to Gaussian.

There are many more fascinating properties of Gaussians, and we would like to discuss one more here. The Gaussian is what is known as a *maximum entropy distribution*. We will get into entropy more deeply in :numref:`sec_information_theory`; all we need to know at this point is that it is a measure of randomness. In a rigorous mathematical sense, we can think of the Gaussian as the *most* random choice of random variable with fixed mean and variance. Thus, if we know that our random variable has some mean and variance, the Gaussian is in a sense the most conservative choice of distribution we can make.

To close the section, let us recall that if :math:`X \sim \mathcal{N}(\mu, \sigma^2)`, then:

-  :math:`\mu_X = \mu`,
-  :math:`\sigma_X^2 = \sigma^2`.

We can sample from the Gaussian (here the standard normal, since :math:`\mu = 0` and :math:`\sigma = 1`) as shown below.


.. raw:: html

.. code:: python

    np.random.normal(mu, sigma, size=(10, 10))

.. parsed-literal::
    :class: output

    array([[ 1.49917742, -1.89738727, -0.15765172, -1.30406481,  1.39513244,
             0.73954538,  1.15326125,  1.27176545, -1.5794067 , -0.99068548],
           [-1.15778637,  1.6457573 ,  0.30449236,  0.74880239, -1.09216768,
            -0.572821  , -0.51135549,  1.19231727,  0.45957271, -0.02058818],
           [-0.05529317,  0.29419677,  0.18805374,  0.76098036, -0.97031427,
             1.96972514,  1.88326016,  0.6229381 ,  0.54420812, -0.7531442 ],
           [ 2.27807991, -0.21409296,  0.45279401, -0.89506324,  0.28100847,
            -1.86260982,  0.07614099, -0.24923532,  0.17389077, -0.7427528 ],
           [-0.36021216, -1.15600105, -0.25003882, -0.83905388,  0.01927618,
            -1.56239161,  1.14385223, -0.24816723,  0.50232904,  0.18643067],
           [-1.31329087,  0.05678681,  0.74464929,  0.61052243, -1.06151488,
            -0.40652285,  0.72652966,  1.12828664, -0.77463378, -0.90476107],
           [ 0.20437941, -1.15966042,  0.14822335,  1.34530682, -0.43604435,
            -0.48720988, -0.22021717, -0.03575601,  0.03048036, -1.27704417],
           [-1.23587311,  1.01658176, -0.30615902,  0.79378662,  0.62172223,
             2.20654089, -0.18224328, -1.90468774, -0.38497634, -0.40938212],
           [-1.69134761,  1.54025185, -0.33894673, -0.55753099, -0.19299495,
            -0.34038202, -0.26246888,  0.67041897, -1.5196233 ,  0.11619327],
           [ 0.23278424,  1.82530067,  1.8425147 ,  1.38355034,  2.09754797,
             0.68892611, -0.16139531, -1.61466365, -0.35005664,  0.92306538]])

.. raw:: html

.. raw:: html

.. code:: python

    torch.normal(mu, sigma, size=(10, 10))

.. parsed-literal::
    :class: output

    tensor([[ 0.6520,  0.3000, -0.4574, -0.2548,  0.0422, -1.8848, -0.0027,  0.1869,
             -0.8954,  0.4869],
            [-0.2392, -1.8471, -1.2784,  0.1939, -0.3280,  2.1257, -0.5251, -0.2571,
              1.5258,  0.1374],
            [-0.1976,  1.2392, -1.3503, -1.1267,  1.2744,  0.2877, -0.8623,  0.6505,
             -0.0413, -0.6328],
            [ 0.2136, -0.6770,  0.5709,  0.2465, -0.4408,  1.0694,  0.9333, -0.8263,
             -0.3953, -1.5467],
            [-0.0639,  1.3519,  1.1904,  0.4413, -0.2036, -0.8335, -0.6494,  1.0609,
              0.3457, -0.0494],
            [-0.1203, -0.0650,  1.6500, -0.0922, -0.7590, -0.8886, -0.6484,  0.1333,
              0.9615,  0.8912],
            [ 0.9083, -0.0155,  0.3283,  1.1933, -0.5716, -0.0458, -0.0481, -1.4992,
              0.4722, -0.3956],
            [-0.6736, -1.6859,  0.1728, -1.0743,  0.8750,  0.1384, -0.1763,  0.7425,
              0.7856,  2.4783],
            [-1.5720,  0.0452, -0.0454,  0.1253, -0.2715,  0.4259, -1.0297, -0.9537,
              1.1536,  1.6208],
            [-0.7449,  1.9093, -0.1521,  1.6758,  0.7218,  0.7647, -2.6015,  0.0139,
              0.5314, -0.0911]])

.. raw:: html

.. raw:: html

.. code:: python

    tf.random.normal((10, 10), mu, sigma)

.. raw:: html

.. raw:: html
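The central limit theorem discussed earlier in this section can be checked empirically. The following sketch uses plain NumPy rather than the framework-specific code above, and it rests on assumptions of our own: the summands are :math:`\mathrm{Uniform}(0, 1)` (mean :math:`1/2`, variance :math:`1/12`), :math:`N = 100` terms are summed, 20,000 trials are drawn, and the helper names ``standardized_sums`` and ``std_normal_cdf`` are hypothetical. Each sum is standardized with :math:`\mu_{X^{(N)}} = N/2` and :math:`\sigma_{X^{(N)}} = \sqrt{N/12}`, and the empirical fraction of standardized sums below a threshold :math:`t` is compared with the standard normal c.d.f. :math:`\Phi(t) = \frac{1}{2}\left(1 + \mathrm{erf}(t/\sqrt{2})\right)`.

.. code:: python

    import math

    import numpy as np


    def standardized_sums(n_terms, n_trials, seed=0):
        """Hypothetical helper: standardized sums of i.i.d. Uniform(0, 1) draws."""
        rng = np.random.default_rng(seed)
        samples = rng.uniform(0.0, 1.0, size=(n_trials, n_terms))
        sums = samples.sum(axis=1)           # one value of X^{(N)} per trial
        mu_sum = n_terms * 0.5               # E[X^{(N)}] = N * E[X_i] = N / 2
        sigma_sum = math.sqrt(n_terms / 12)  # Var[X^{(N)}] = N * Var[X_i] = N / 12
        return (sums - mu_sum) / sigma_sum


    def std_normal_cdf(t):
        """Hypothetical helper: c.d.f. of N(0, 1) via the error function."""
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


    z = standardized_sums(n_terms=100, n_trials=20_000)
    print(f'mean = {z.mean():.3f}, std = {z.std():.3f}')
    for t in [-2.0, -1.0, 0.0, 1.0, 2.0]:
        # Fraction of standardized sums at or below t vs. the Gaussian c.d.f.
        print(f'P(Z <= {t:+.0f}): empirical {np.mean(z <= t):.4f}, '
              f'Gaussian {std_normal_cdf(t):.4f}')

The empirical mean and standard deviation should come out close to :math:`0` and :math:`1`, and the two columns of probabilities should roughly agree. Other summand distributions with finite variance can be swapped in, for example ``rng.exponential(1.0, size=...)``, provided ``mu_sum`` and ``sigma_sum`` are updated to match (a unit-rate exponential has mean and variance both equal to :math:`1`).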

.. _subsec_exponential_family:

Exponential Family
------------------

With the exception of the uniform distributions, whose support depends on their parameters, all of the distributions listed above belong to what is known as the *exponential family*. The exponential family is a set of distributions whose density can be expressed in the following form:

.. math:: p(\mathbf{x} | \boldsymbol{\eta}) = h(\mathbf{x}) \cdot \mathrm{exp} \left( \boldsymbol{\eta}^{\top} \cdot T(\mathbf{x}) - A(\boldsymbol{\eta}) \right)
   :label: eq_exp_pdf

As this definition can be a little subtle, let us examine it closely.

First, :math:`h(\mathbf{x})` is known as the *underlying measure* or the *base measure*. This can be viewed as an original choice of measure that we are modifying with our exponential weight.

Second, we have the vector :math:`\boldsymbol{\eta} = (\eta_1, \eta_2, ..., \eta_l) \in \mathbb{R}^l` called the *natural parameters* or *canonical parameters*. These define how the base measure will be modified. The natural parameters enter into the new measure by taking the dot product of these parameters against some function :math:`T(\cdot)` of :math:`\mathbf{x} = (x_1, x_2, ..., x_n) \in \mathbb{R}^n` and exponentiating. The vector :math:`T(\mathbf{x}) = (T_1(\mathbf{x}), T_2(\mathbf{x}), ..., T_l(\mathbf{x}))` is called the *sufficient statistics* for :math:`\boldsymbol{\eta}`. This name is used because the sample :math:`\mathbf{x}` interacts with :math:`\boldsymbol{\eta}` only through :math:`T(\mathbf{x})`, so :math:`T(\mathbf{x})` carries all the information the sample provides about :math:`\boldsymbol{\eta}`, and no other information from :math:`\mathbf{x}` is required.

Third, we have :math:`A(\boldsymbol{\eta})`, which is referred to as the *cumulant function* (or *log-partition function*) and ensures that the above distribution :eq:`eq_exp_pdf` integrates to one, i.e.,

.. math:: A(\boldsymbol{\eta}) = \log \left[\int h(\mathbf{x}) \cdot \mathrm{exp} \left(\boldsymbol{\eta}^{\top} \cdot T(\mathbf{x}) \right) d\mathbf{x} \right].

To be concrete, let us consider the Gaussian.
Assuming that :math:`x` is a univariate variable, we saw that it had a density of

.. math::

   \begin{aligned}
   p(x | \mu, \sigma) &= \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \mathrm{exp} \left\{ \frac{-(x-\mu)^2}{2 \sigma^2} \right\} \\
   &= \frac{1}{\sqrt{2 \pi}} \cdot \mathrm{exp} \left\{ \frac{\mu}{\sigma^2}x -\frac{1}{2 \sigma^2} x^2 - \left( \frac{1}{2 \sigma^2} \mu^2 +\log(\sigma) \right) \right\}.
   \end{aligned}

This matches the definition of the exponential family with:

- *underlying measure*: :math:`h(x) = \frac{1}{\sqrt{2 \pi}}`,
- *natural parameters*: :math:`\boldsymbol{\eta} = \begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix} = \begin{bmatrix} \frac{\mu}{\sigma^2} \\ \frac{1}{2 \sigma^2} \end{bmatrix}`,
- *sufficient statistics*: :math:`T(x) = \begin{bmatrix}x\\-x^2\end{bmatrix}`, and
- *cumulant function*: :math:`A({\boldsymbol\eta}) = \frac{1}{2 \sigma^2} \mu^2 + \log(\sigma) = \frac{\eta_1^2}{4 \eta_2} - \frac{1}{2}\log(2 \eta_2)`.

It is worth noting that the exact choice of each of the above terms is somewhat arbitrary. Indeed, the important feature is that the distribution can be expressed in this form, not the exact form itself.

As we alluded to in :numref:`subsec_softmax_and_derivatives`, a widely used technique is to assume that the final output :math:`\mathbf{y}` follows an exponential family distribution. The exponential family is a common and powerful family of distributions encountered frequently in machine learning.

Summary
-------

- Bernoulli random variables can be used to model events with a yes/no outcome.
- Discrete uniform distributions model selections from a finite set of possibilities.
- Continuous uniform distributions select from an interval.
- Binomial distributions model a series of independent Bernoulli random variables and count the number of successes.
- Poisson random variables model the arrival of rare events.
- Gaussian random variables model the result of adding together a large number of independent random variables.
- With the exception of the uniform distributions, whose support depends on their parameters, all of the above distributions belong to the exponential family.

Exercises
---------

1. What is the standard deviation of a random variable that is the difference :math:`X-Y` of two independent binomial random variables :math:`X, Y \sim \mathrm{Binomial}(16, 1/2)`?
2. If we take a Poisson random variable :math:`X \sim \mathrm{Poisson}(\lambda)` and consider :math:`(X - \lambda)/\sqrt{\lambda}` as :math:`\lambda \rightarrow \infty`, we can show that this becomes approximately Gaussian. Why does this make sense?
3. What is the probability mass function for a sum of two discrete uniform random variables on :math:`n` elements?

.. raw:: html


.. raw:: html

`Discussions `__

.. raw:: html

.. raw:: html

`Discussions `__

.. raw:: html

.. raw:: html

`Discussions `__

.. raw:: html

.. raw:: html

.. raw:: html