Two population-sample measurements can have a relation {correlation, statistics}| {regression coefficient}. Sample members are measurement pairs. The two measurements can increase together {positive correlation}, one can increase while the other decreases {negative correlation}, or they can show no relation {no correlation}.
For paired measurements, such as individual weight and height, the sum of products of z scores, divided by the number of individuals, gives a number {correlation coefficient}| between -1 and +1. The correlation coefficient R measures the relation between two variables, and its square measures the fraction of variation explained: R^2 = SSR/SST = 1 - SSE/SST, where SSR is the regression sum of squares, SST is the total sum of squares, and SSE is the error (residual) sum of squares.
z scores
r = (sum from i = 1 to i = N of z1(i) * z2(i)) / N, where z1 and z2 are the z scores of the two measurements and N is population size.
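As a check on the z-score form, a minimal Python sketch; the function name and the height/weight sample data are illustrative assumptions, not part of the source.

import math

def correlation_from_z_scores(xs, ys):
    # Pearson correlation as the mean product of z scores,
    # using population standard deviations (divide by N).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    zx = [(x - mean_x) / sd_x for x in xs]
    zy = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / n

# Illustrative paired measurements (heights in cm, weights in kg).
heights = [150, 160, 165, 170, 180]
weights = [55, 60, 63, 68, 75]
print(correlation_from_z_scores(heights, weights))  # about 0.99, near +1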
means
r = ((sum from i = 1 to i = N of n1(i) * n2(i)) / N - x1 * x2) / (s1 * s2), where n is value, x is mean, s is standard deviation, and N is population size.
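The means-and-standard-deviations form can be computed the same way; a short sketch under the same illustrative height/weight data.

import math

def correlation_from_means(xs, ys):
    # r = ((1/N) * sum(x*y) - mean_x * mean_y) / (sd_x * sd_y),
    # with population standard deviations (divide by N).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

print(correlation_from_means([150, 160, 165, 170, 180], [55, 60, 63, 68, 75]))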
values
r = (N * (sum from i = 1 to i = N of n1(i) * n2(i)) - (sum from i = 1 to i = N of n1(i)) * (sum from i = 1 to i = N of n2(i))) / ((N * (sum from i = 1 to i = N of n1(i)^2) - (sum from i = 1 to i = N of n1(i))^2)^0.5 * (N * (sum from i = 1 to i = N of n2(i)^2) - (sum from i = 1 to i = N of n2(i))^2)^0.5).
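The raw-value form needs only running totals of the values, their products, and their squares, so it suits a single pass over the data; a sketch, again with hypothetical data.

import math

def correlation_from_sums(xs, ys):
    # One-pass computational formula: accumulate sums of x, y, x*y, x^2, y^2.
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

print(correlation_from_sums([150, 160, 165, 170, 180], [55, 60, 63, 68, 75]))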
If the population distribution is approximately normal, testing the correlation coefficient {correlation coefficient test} can show whether the two measurement sets relate. Hypothesize that the population correlation coefficient is zero. Choose a significance level. Degrees of freedom are sample size minus two, because two parameters are estimated. Convert the correlation coefficient to a t value: t = r * (N - 2)^0.5 / (1 - r^2)^0.5, where r is the correlation coefficient and N is the number of individuals. If the absolute t value is less than the t-distribution critical value at that significance level and degrees of freedom, do not reject the hypothesis.
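A sketch of this test, assuming SciPy is available for the two-tailed critical t value; the sample numbers r = 0.75 and N = 12 are illustrative.

import math
from scipy.stats import t as t_dist  # for the critical t value

def correlation_t_test(r, n, alpha=0.05):
    # Test the hypothesis that the population correlation is zero (two-sided).
    df = n - 2                                   # degrees of freedom
    t_value = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    t_critical = t_dist.ppf(1 - alpha / 2, df)   # two-tailed critical value
    reject = abs(t_value) > t_critical
    return t_value, t_critical, reject

print(correlation_t_test(0.75, 12))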
A line {regression line}| or curve {regression curve} that lies closest to all the points can be drawn through the scatter plot of the paired measurements.
The regression curve can be a straight line {linear regression}|.
Regression lines pass as close as possible {best fit}| to all points. The line passes closest to the points when the sum of squares of the vertical differences between the points and the line is a minimum. The best-fit line passes through the point of means: x2 = m * x1 + b, where x1 and x2 are the two property means, m is slope, and b is the intercept. m = (N * (sum from i = 1 to i = N of n1(i) * n2(i)) - (sum from i = 1 to i = N of n1(i)) * (sum from i = 1 to i = N of n2(i))) / (N * (sum from i = 1 to i = N of n1(i)^2) - (sum from i = 1 to i = N of n1(i))^2). b = x2 - m * x1.
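A sketch of the least-squares slope and intercept formulas, using the same hypothetical height/weight pairs as above.

def best_fit_line(xs, ys):
    # Least-squares slope and intercept:
    # m = (N*sum(x*y) - sum(x)*sum(y)) / (N*sum(x^2) - (sum(x))^2), b = mean_y - m*mean_x.
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * (sum_x / n)   # the line passes through the point of means
    return m, b

print(best_fit_line([150, 160, 165, 170, 180], [55, 60, 63, 68, 75]))  # about (0.68, -48.0)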
Regression curves can predict {prediction} the second-variable value from the first-variable value. For a regression line, the predicted second-variable value y equals the slope m times the first-variable value x plus the intercept b: y = m * x + b.
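A minimal prediction sketch; the slope and intercept values are assumed, for illustration, to come from a fit such as the one sketched above.

def predict(x, m, b):
    # Predicted second-variable value on the regression line y = m*x + b.
    return m * x + b

# Illustrative slope and intercept from the earlier fitting sketch.
m, b = 0.68, -48.0
print(predict(172, m, b))  # predicted weight, about 69 kg, for a 172 cm height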