Benford's law describes the frequency of the initial digits in datasets where the numbers span several orders of magnitude. Under the law, a leading digit d occurs with probability log10(1 + 1/d), so 1 leads about 30.1% of the time and 9 only about 4.6%. It should be visible if you plot country populations or land areas, and it can also show up in accounting data. In the past, Greece's official economic data has shown the greatest divergence from what Benford's law predicts (see here), at least within the EU.
I tried this myself, creating a dataset of country populations from 2017, then used the following code to plot the data:
plot value counts of initial digits for country populations
The digit 1 accounts for around 30% of the values, as expected, but some digits are out of sequence: 7 comes last where 9 should be, and 3 and 4 are also out of order. The overall shape is roughly as expected, though.
You can get population data from places like the U.N.
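A minimal sketch of the tally behind this kind of plot, using only the Python standard library. The country population dataset is not reproduced here, so the sketch assumes a stand-in list (the powers of 2, a sequence known to follow Benford's law closely); the function names are illustrative, and a text bar takes the place of a real chart.

```python
import math
from collections import Counter

def leading_digit(n):
    """Return the first digit of a positive integer."""
    return int(str(n)[0])

def benford_expected(d):
    """Probability that Benford's law assigns to leading digit d (1-9)."""
    return math.log10(1 + 1 / d)

# Stand-in data (an assumption, not the population dataset): 2^1 .. 2^200.
values = [2 ** k for k in range(1, 201)]

counts = Counter(leading_digit(v) for v in values)
total = len(values)
observed = {d: counts[d] / total for d in range(1, 10)}

# Print observed vs expected share for each digit, with a crude text bar.
for d in range(1, 10):
    bar = "#" * round(observed[d] * 50)
    print(f"{d}  observed {observed[d]:.3f}  expected {benford_expected(d):.3f}  {bar}")
```

Swapping `values` for a list of real populations should give the same kind of comparison, with each digit's observed share set against the share Benford's law predicts.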
According to the font of all truthful and accurate knowledge - Wikipedia - "Bayesian probability is an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief."
So what does that mean? The YouTube video below gives a good visual explanation.
Here is another explanation:
According to Wikipedia dizygotic (fraternal) twins usually occur when two fertilized eggs are implanted in the uterus wall at the same time while monozygotic (identical) twins occur when a single egg is fertilized to form one zygote (hence, "monozygotic") which then divides into two separate embryos.
Fraternal twins can be mm, mf, fm or ff (where m = male and f = female); identical twins can only be mm or ff.
For the sake of this example, let's say the probability of each option is equal, so P(mm) = P(mf) = P(fm) = P(ff) = 0.25 for fraternal twins and P(mm) = P(ff) = 0.5 for identical twins. The probability that twins are identical is P(I) = 0.1, so the probability that they are fraternal is P(F) = 0.9, assuming twins must be either identical or fraternal (not strictly true, but let's not make things too complicated).
If we have two brothers who are twins what is the probability that they are identical twins?
The non-Bayesian answer might be 0.1 or 10%, because I said above that 10% of twins are identical. However, the Bayesian approach gives a different answer:
The probability of identical twins given that both twins are brothers is written P(I|B), and Bayes' theorem gives: P(I|B) = P(B|I)P(I)/P(B)
and since we are assuming twins must be either identical or fraternal then: P(B) = P(B|I)P(I) + P(B|F)P(F)
substituting this into the above gives: P(I|B) = P(B|I)P(I)/(P(B|I)P(I) + P(B|F)P(F))
then putting in the numbers, with P(B|I) = 0.5 and P(B|F) = 0.25, gives (0.5 x 0.1)/((0.5 x 0.1) + (0.25 x 0.9)) = 0.05/0.275 = 2/11 (about 18.2%), so the knowledge that both twins are male makes the probability that they are identical higher.
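The arithmetic above can be checked with a short sketch. The function and variable names here are illustrative, and exact fractions are used so the result comes out as 2/11 rather than a rounded decimal.

```python
from fractions import Fraction

def posterior(prior, likelihood, likelihood_alt):
    """Bayes' theorem for two exclusive hypotheses.

    prior          -- P(H), e.g. P(I)
    likelihood     -- P(E|H), e.g. P(B|I)
    likelihood_alt -- P(E|not H), e.g. P(B|F)
    """
    evidence = likelihood * prior + likelihood_alt * (1 - prior)  # P(E)
    return likelihood * prior / evidence

p_identical = Fraction(1, 10)                 # P(I) = 0.1
p_brothers_given_identical = Fraction(1, 2)   # P(B|I) = 0.5
p_brothers_given_fraternal = Fraction(1, 4)   # P(B|F) = 0.25

p_identical_given_brothers = posterior(
    p_identical, p_brothers_given_identical, p_brothers_given_fraternal
)
print(p_identical_given_brothers)                    # 2/11
print(round(float(p_identical_given_brothers), 3))   # 0.182
```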
The video below makes a good point about the advantages and at least one disadvantage of a Bayesian approach.
The term regression originated with a 19th century English guy. His name was Galton and he loved to measure stuff. For example, he measured the height of people who had tall parents and found that their average height was less than the parents' average height. He called this regression to the mean. The name 'regression' stuck.
In machine learning you'll come across two common algorithms that include the term regression in their title.
Linear Regression Example
Linear regression is all about predicting numerical values, for example the number of customers in a restaurant on a given day, the price of some commodity or, in the example below, the maximum temperature for a given minimum temperature. Using a dataset of weather observations recorded during the Second World War, we can use linear regression to build a predictive model. The dataset contains min and max temperatures for each day, which we can plot:
There is some scatter but the plot is quite linear. So this seems to be a good case for using linear regression. Linear regression has the following relationship between the input x and the output y:
y = mx + b, where m is the gradient of the line and b is the intercept
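A minimal sketch of fitting m and b by ordinary least squares, written with the standard library only. The weather dataset is not reproduced here, so the temperatures below are made-up illustrative values (an assumption) generated from a known line plus a small wobble; `fit_line` is a hypothetical helper name. Libraries such as scikit-learn provide the same fit through their own linear regression classes.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Gradient = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Illustrative data, NOT the real observations: max = 0.9 * min + 10,
# with an alternating +/-0.5 degree wobble standing in for scatter.
min_temps = list(range(20))
max_temps = [0.9 * t + 10 + (0.5 if t % 2 == 0 else -0.5) for t in min_temps]

m, b = fit_line(min_temps, max_temps)
print(f"gradient m = {m:.3f}, intercept b = {b:.3f}")

# Predict the maximum temperature for a minimum of 25 degrees.
print(f"predicted max for min 25: {m * 25 + b:.1f}")
```

The recovered gradient and intercept land close to the 0.9 and 10 used to generate the data, which is the behaviour you would hope for when the scatter is small.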
Linear regression is all about predicting a numerical value. Logistic regression, however, is about predicting which class something belongs to. In the example below I use a list of Titanic passengers to classify which passengers survived and which died. The code uses two-thirds of the rows as training data, then attempts to predict the Survived column value for the remaining one-third.
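A minimal sketch of the same workflow: train on two-thirds of the rows, then predict the class of the remaining third. The Titanic data is not reproduced here, so this assumes synthetic two-feature rows drawn from two clusters, and it fits the model with plain gradient descent rather than a library; all names are illustrative.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, features):
    """Linear combination fed into the sigmoid."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def fit(rows, labels, learning_rate=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = sigmoid(score(weights, bias, features)) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def predict(weights, bias, features):
    """Return class 1 if the predicted probability is at least 0.5."""
    return 1 if sigmoid(score(weights, bias, features)) >= 0.5 else 0

# Synthetic stand-in data (an assumption): class 1 clusters around (2, 2),
# class 0 around (-2, -2).
random.seed(0)
rows, labels = [], []
for _ in range(150):
    label = random.randint(0, 1)
    centre = 2.0 if label == 1 else -2.0
    rows.append([random.gauss(centre, 1.0), random.gauss(centre, 1.0)])
    labels.append(label)

# First two-thirds for training, remaining third held out for testing.
split = len(rows) * 2 // 3
train_rows, train_labels = rows[:split], labels[:split]
test_rows, test_labels = rows[split:], labels[split:]

weights, bias = fit(train_rows, train_labels)
correct = sum(predict(weights, bias, r) == y for r, y in zip(test_rows, test_labels))
accuracy = correct / len(test_rows)
print(f"accuracy on held-out third: {accuracy:.2f}")
```

With real passenger data the features would be columns such as age or ticket class, and the label would be the Survived column; the split and the accuracy check stay the same.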
Linear regression is used to predict numerical values, and it can be extended to include non-linear regression (for example, see here). Logistic regression, by contrast, is used in classification problems; real world examples could include classifying customers into categories, classifying network activity into benign or suspicious activity ...