# Statistics & Regression

Statistical analysis of observations arises in almost every case where a dataset is investigated, in scientific research as well as in industrial projects. Many procedures, techniques, mathematical models, and computer codes have been proposed, but they can be grouped into four simple steps: data preprocessing, variable distributions, pairwise correlations, and multivariate modelling. Data preprocessing (or cleaning), although critical to every subsequent step, is often skipped, which affects all later calculations and conclusions. Examining the variable distributions aims to identify how the data are dispersed among intervals, the type of each variable (categorical, binary, continuous), the minimum and maximum values, and the existence and handling of outliers. Computing pairwise correlations and measures of effect size is also an important step, as it can reveal strong patterns of association among the variables involved.

Regression analysis is a statistical modelling technique used to investigate the relationship between a dependent variable and one or more independent variables. It shows the effect of a change in each independent variable x_i on the dependent variable Y while the other independent variables are held constant. Linear regression assumes that the dependent variable Y is a linear combination of the independent variables. This is not always true, so more complex models, such as nonlinear regression or artificial neural networks, can be investigated, depending on the assumed underlying mathematical model. In these cases, however, the problem of overfitting arises, so special care should be given to validating the model on a test set and to examining the regression residuals.
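As a minimal sketch of the regression workflow described above, the following fits a linear model by ordinary least squares, holds out a test set, and inspects the residuals. The data are synthetic, and the coefficients, sample sizes, and helper names (`fit_ols`, `predict`) are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): Y is linear in two predictors plus noise.
n = 200
X = rng.normal(size=(n, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

# Hold out the last 50 observations as a test set.
train = np.arange(n) < 150
test = ~train


def fit_ols(X, y):
    """Return [intercept, b1, b2, ...] from an ordinary least-squares fit."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    """Evaluate the fitted linear model on new rows of X."""
    return np.column_stack([np.ones(len(X)), X]) @ beta


beta = fit_ols(X[train], y[train])

# Residuals on both sets: a much larger test error would suggest overfitting.
residuals_train = y[train] - predict(beta, X[train])
residuals_test = y[test] - predict(beta, X[test])
rmse_train = np.sqrt(np.mean(residuals_train**2))
rmse_test = np.sqrt(np.mean(residuals_test**2))

print("coefficients:", beta)
print("RMSE train / test:", rmse_train, rmse_test)
```

Each entry of `beta` after the intercept estimates the change in Y for a unit change in the corresponding predictor with the other predictor held constant. Comparing `rmse_train` with `rmse_test` is one simple way to check whether the model generalises.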

## Descriptive Statistics

• min, max, median, percentiles, variance
• Distributions, fitting and outliers
• Correlation coefficients, covariance, chi-squared test
• Analysis of variance & Effect size
• Time series, smoothing, moving statistics, predictions

## Regression Analysis

• Data preparation, Normalization, Outliers
• Coefficients, p-value, residuals, heteroscedasticity, bias
• Importance vs Significance
• Test sets, Ensemble models & Stepwise regression
• Logistic & Nonlinear Regression
• Conceptual Interpretation
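The distinction between importance and significance in the list above can be sketched numerically. In the example below, which uses synthetic data with assumed coefficients, one predictor has a tiny effect that is still statistically significant because the sample is large. The p-values use a normal approximation to the t distribution, an assumption that is reasonable only for large samples.

```python
import math

import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumption): x1 has a large effect, x2 a very small one.
n = 5000
X = rng.normal(size=(n, 2))
y = 2.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Standard errors from the residual variance and (A^T A)^-1.
residuals = y - A @ beta
dof = n - A.shape[1]
sigma2 = residuals @ residuals / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))

# Two-sided p-values, using the normal approximation for large n.
t_stat = beta / se
p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stat])

# Standardized coefficients as a rough measure of importance (effect size).
std_beta = beta[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)

for name, b, p in zip(["intercept", "x1", "x2"], beta, p_values):
    print(f"{name}: coef={b:.3f}, p={p:.3g}")
print("standardized coefficients (x1, x2):", std_beta)
```

Here both predictors are significant, yet the standardized coefficient of `x2` is a small fraction of that of `x1`, so a low p-value alone says little about how much a variable matters in practice.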