Two population-sample measurements can have a relation {correlation, statistics}| {regression coefficient}. Sample members are measurement pairs. The two measurements can increase together {positive correlation}, one can increase while the other decreases {negative correlation}, or they can show no relation {no correlation}.
For paired measurements, such as individual weight and height, the sum of products of z scores, divided by the number of individuals, gives a number {correlation coefficient}| between -1 and +1. The correlation coefficient R measures the relation between two variables, and its square measures the fraction of variation explained: R^2 = SSR/SST = 1 - SSE/SST, where SSR is the regression sum of squares, SST is the total sum of squares, and SSE is the error (residual) sum of squares.
z scores
r = (sum from i = 1 to i = N of z1(i) * z2(i)) / N, where z1 and z2 are the z scores of the two measurements and N is population size.
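As a check on the z-score form, a minimal Python sketch; the function name and the height/weight sample data are illustrative assumptions, not part of the source.

import math

def correlation_from_z_scores(xs, ys):
    # Pearson correlation as the mean product of z scores,
    # using population standard deviations (divide by N).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    zx = [(x - mean_x) / sd_x for x in xs]
    zy = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / n

# Illustrative paired measurements (heights in cm, weights in kg).
heights = [150, 160, 165, 170, 180]
weights = [55, 60, 63, 68, 75]
print(correlation_from_z_scores(heights, weights))  # about 0.99, near +1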
means
r = ((sum from i = 1 to i = N of n1(i) * n2(i)) / N - x1 * x2) / (s1 * s2), where n is value, x is mean, s is standard deviation, and N is population size.
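The means-and-standard-deviations form can be computed the same way; a short sketch under the same illustrative height/weight data.

import math

def correlation_from_means(xs, ys):
    # r = ((1/N) * sum(x*y) - mean_x * mean_y) / (sd_x * sd_y),
    # with population standard deviations (divide by N).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

print(correlation_from_means([150, 160, 165, 170, 180], [55, 60, 63, 68, 75]))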
values
r = (N * (sum from i = 1 to i = N of n1(i) * n2(i)) - (sum from i = 1 to i = N of n1(i)) * (sum from i = 1 to i = N of n2(i))) / ((N * (sum from i = 1 to i = N of n1(i)^2) - (sum from i = 1 to i = N of n1(i))^2)^0.5 * (N * (sum from i = 1 to i = N of n2(i)^2) - (sum from i = 1 to i = N of n2(i))^2)^0.5).
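The raw-value form needs only running totals of the values, their products, and their squares, so it suits a single pass over the data; a sketch, again with hypothetical data.

import math

def correlation_from_sums(xs, ys):
    # One-pass computational formula: accumulate sums of x, y, x*y, x^2, y^2.
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

print(correlation_from_sums([150, 160, 165, 170, 180], [55, 60, 63, 68, 75]))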
If the population distribution is approximately normal, testing the correlation coefficient {correlation coefficient test} can show whether the two measurement sets relate. Hypothesize that the population correlation coefficient is zero. Choose a significance level. Degrees of freedom are sample size minus two, because two parameters are estimated. Convert the correlation coefficient to a t value: t = r * (N - 2)^0.5 / (1 - r^2)^0.5, where r is the correlation coefficient and N is the number of individuals. If the absolute t value is less than the t-distribution critical value at that significance level and degrees of freedom, do not reject the hypothesis.
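A sketch of this test, assuming SciPy is available for the two-tailed critical t value; the sample numbers r = 0.75 and N = 12 are illustrative.

import math
from scipy.stats import t as t_dist  # for the critical t value

def correlation_t_test(r, n, alpha=0.05):
    # Test the hypothesis that the population correlation is zero (two-sided).
    df = n - 2                                   # degrees of freedom
    t_value = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    t_critical = t_dist.ppf(1 - alpha / 2, df)   # two-tailed critical value
    reject = abs(t_value) > t_critical
    return t_value, t_critical, reject

print(correlation_t_test(0.75, 12))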
A line {regression line}| or curve {regression curve} that lies closest to all the points can be drawn through the scatter plot of the paired measurements.
The regression curve can be a straight line {linear regression}|.
Regression lines pass as close as possible {best fit}| to all points. The line passes closest to the points when the sum of squares of the vertical differences between the points and the line is a minimum. The best-fit line passes through the point of means: x2 = m * x1 + b, where x1 and x2 are the two property means, m is slope, and b is the intercept. m = (N * (sum from i = 1 to i = N of n1(i) * n2(i)) - (sum from i = 1 to i = N of n1(i)) * (sum from i = 1 to i = N of n2(i))) / (N * (sum from i = 1 to i = N of n1(i)^2) - (sum from i = 1 to i = N of n1(i))^2). b = x2 - m * x1.
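A sketch of the least-squares slope and intercept formulas, using the same hypothetical height/weight pairs as above.

def best_fit_line(xs, ys):
    # Least-squares slope and intercept:
    # m = (N*sum(x*y) - sum(x)*sum(y)) / (N*sum(x^2) - (sum(x))^2), b = mean_y - m*mean_x.
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * (sum_x / n)   # the line passes through the point of means
    return m, b

print(best_fit_line([150, 160, 165, 170, 180], [55, 60, 63, 68, 75]))  # about (0.68, -48.0)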
Regression curves can predict {prediction} the second-variable value from the first-variable value. For a regression line, the predicted second-variable value y equals the slope m times the first-variable value x plus the intercept b: y = m * x + b.
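A minimal prediction sketch; the slope and intercept values are assumed, for illustration, to come from a fit such as the one sketched above.

def predict(x, m, b):
    # Predicted second-variable value on the regression line y = m*x + b.
    return m * x + b

# Illustrative slope and intercept from the earlier fitting sketch.
m, b = 0.68, -48.0
print(predict(172, m, b))  # predicted weight, about 69 kg, for a 172 cm height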