$Maximum Likelihood Estimation$

Question

A little disclaimer:
I have started learning ML and am currently reading Ian Goodfellow's "Deep Learning". These are my first attempts to apply my knowledge of statistics and probability theory, so some concepts are not immediately clear to me.

Everything was more or less clear prior to Maximum Likelihood Estimation.
In the passage below, the author explains how it works.

"Consider a set of m examples ... drawn independently from the true but unknown data generating distribution pdata(x)"
What is meant here? That x is a random variable and pdata is its unknown probability density function?

Then "Let pmodel(x;θ) be a parametric family of probability distributions over the same space indexed by θ"
So "parametric family of probability distributions" means any PDF with some parameters, right? For example, the normal distribution with its mean and variance? But what does "over the same space indexed by θ" mean? What is θ in the current context? Is it a parameter (or set of parameters) of the PDF?

Probability Statistics

Babaduras


Answer

The answer is accepted.

Answer 1

OK, the set of examples X = {x(1), ..., x(m)} refers to a collection of data points, where each x(i) is an individual example. These are drawn independently from the true but unknown data-generating distribution pdata(x). Put simply, you have some real-world data points, and pdata(x) represents the underlying probability distribution that generated them. This distribution is unknown to us.

Now, regarding the "parametric family of probability distributions": it refers to a group of probability distributions that share the same mathematical form but differ in their parameters. For example, the normal distributions form a parametric family in which the mean (μ) and variance (σ²) are the parameters that can vary. Other examples are the exponential and Poisson families.

Regarding your other question, θ represents the parameters that define a particular probability distribution within the parametric family. For the normal distribution in the example above, θ would include both the mean and the variance: θ = {μ, σ²}. "Over the same space" means that every distribution in the family is defined over the same set of possible values of x as pdata(x). "Indexed by θ" means that each value of θ (each set of parameters) picks out one specific distribution within the family. So, for a normal distribution, different values of θ give you different normal distributions, each with its own mean and variance.
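To make "indexed by θ" concrete, here is a minimal Python sketch, assuming the normal family from the example above. The function name `normal_pdf` and the two θ values are illustrative choices, not anything taken from the book.

```python
import math

def normal_pdf(x, theta):
    """Density of the normal family member selected by theta = (mu, sigma2)."""
    mu, sigma2 = theta
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Two different values of theta pick out two different distributions
# from the same family (illustrative values, chosen arbitrarily).
theta_a = (0.0, 1.0)   # mean 0, variance 1
theta_b = (3.0, 4.0)   # mean 3, variance 4

for x in (0.0, 3.0):
    print(x, normal_pdf(x, theta_a), normal_pdf(x, theta_b))
```

Both calls use the same formula over the same values of x; only θ changes, and with it the density assigned to each x.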

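The MLE principle aims to find the values of the parameters θ that make the observed data most probable under the model pmodel(x;θ). Because the examples are independent, the likelihood of the whole data set is the product of pmodel(x(i);θ) over all m examples. In other words, MLE seeks the parameter values that maximize the likelihood of the observed data given the model.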
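As a rough numerical sketch of this idea, assume a normal model and simulated data. The simulated "true" distribution, the seed, and the helper name `log_likelihood` are all assumptions made for illustration. For the normal family the maximizer has a closed form (the sample mean and the biased sample variance), so we can check that it scores a higher log-likelihood than some other θ.

```python
import math
import random

def log_likelihood(data, mu, sigma2):
    """Sum of log pmodel(x; theta) over the data, for a normal model."""
    n = len(data)
    sq = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sq / (2 * sigma2)

# Simulated stand-in for the unknown pdata (assumption: normal, mean 2, std 1.5).
random.seed(0)
data = [random.gauss(2.0, 1.5) for _ in range(1000)]

# Closed-form MLE for the normal family: sample mean and biased sample variance.
mu_hat = sum(data) / len(data)
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)

print("MLE:", mu_hat, sigma2_hat)
print("log-likelihood at MLE:   ", log_likelihood(data, mu_hat, sigma2_hat))
print("log-likelihood at (0, 1):", log_likelihood(data, 0.0, 1.0))
```

The code works with the log of the likelihood, which turns the product over examples into a sum and has the same maximizer.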

Hope this helps.

Kav10

2.1K

Babaduras

0

@Kav10 so pdata is some random value from pmodel(x;θ)?
- Kav10
  
  0
  
  No, pdata is not a random value. It is the true but unknown probability distribution that generated the actual data points in your dataset. It's not a single random value; rather, it's a function that describes how likely each possible value x is to occur in the real data distribution.
- Kav10
  
  0
  
  pmodel is a function within a parametric family of probability distributions. It's a model you create to approximate the true data distribution pdata(x).
- Kav10
  
  0
  
  So, when you perform MLE, you're trying to find the values of θ that make the model's probability distribution pmodel(x;θ) align as closely as possible with the true data distribution pdata(x). It's not about treating pdata(x) as a random value, but about matching the characteristics of the model's distribution to the characteristics of the real data distribution.
- Babaduras
  
  0
  
  What do you mean by "make the model's probability distribution pmodel(x;θ) align as closely as possible with the true data distribution pdata(x)"? pmodel is a family, but pdata is a single probability distribution.
Babaduras

0

@Kav10 I mean that pdata is some pdf from pmodel?
- Kav10
  
  0
  
  No, they're not the same thing. See above comments.
- Babaduras
  
  0
  
  I didn't say that they are the same. I said that pdata is one particular case from the pmodel family with some particular parameters, isn't it?
- Kav10
  
  0
  
  pdata(x) is not defined as a specific case of pmodel(x;θ); it is the actual, true distribution that generated the observed data. It does not have to be a member of the parametric family, and we generally cannot assume that it is.
- Kav10
  
  0
  
  pdata(x) is the distribution that generated your real-world data; pmodel(x;θ) is a distribution you've chosen (from a family of distributions) in an attempt to approximate pdata(x). The goal of MLE is to find the θ that makes pmodel(x;θ) most closely resemble pdata(x), so that data generated by your model is as similar as possible to the real data.
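
A small Python sketch of this point, assuming for illustration that the true pdata is exponential while the model family is normal (both choices, the seed, and the variable names are hypothetical):

```python
import math
import random

# Assumption for illustration: the "true" pdata is exponential with rate 1,
# which is NOT a member of the normal family used as the model.
rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(5000)]

# MLE within the normal family: sample mean and biased sample variance.
mu_hat = sum(data) / len(data)
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)

# Probability the fitted normal assigns to x < 0 (normal CDF evaluated at 0),
# although the real data can never be negative.
p_negative = 0.5 * (1 + math.erf((0 - mu_hat) / math.sqrt(2 * sigma2_hat)))
print(mu_hat, sigma2_hat, p_negative)
```

Here MLE still returns the normal distribution with the highest likelihood on the data, but no value of θ makes the model equal to pdata: the fitted normal keeps assigning noticeable probability to negative values that the data never takes.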
