Descriptive statistics aims to present and organize empirical data by means of tables, graphs, and summary measures (also called key figures or parameters). This is especially useful for large data sets, whose raw values are hard to take in at a glance. Univariate analysis describes the distribution of a single variable, including its central tendency (mean, median, mode) and its dispersion. In bivariate and multivariate analysis, where a sample consists of more than one variable, descriptive statistics can also describe the relationship between pairs of variables.
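As a minimal sketch of a univariate description, the snippet below uses Python's standard `statistics` module to compute central tendency (mean, median, mode) and dispersion (range, sample variance, sample standard deviation) for one variable. The data and the helper name `describe` are hypothetical, chosen only for illustration.

```python
import statistics

def describe(values):
    """Return common univariate descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
        "stdev": statistics.stdev(values),        # sample standard deviation
    }

# Hypothetical sample: number of books read per year by eight people
books = [2, 4, 4, 5, 7, 9, 4, 13]
summary = describe(books)
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

Here the mean (6) lies above the median (4.5), a first hint that the single large value 13 pulls the distribution to the right.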
In addition to descriptive statistics, statistics also comprises exploratory data analysis (exploratory statistics) and mathematical statistics (inferential statistics). Exploratory statistics aims to find previously unknown structures and correlations in the data and thereby generate new hypotheses. These hypotheses, based on sample data, can then be tested for general validity with methods from probability theory. Descriptive statistics differs from inductive or inferential statistics in that it makes no statements about a population beyond the cases studied and does not allow hypotheses to be tested. Because descriptive statistics uses no stochastic models, its statements are not backed by error probabilities.
The methods of descriptive statistics can therefore be applied to any kind of sample, whereas the methods of inductive statistics require several conditions to be met, including conditions on how the sample was drawn. The methods of exploratory statistics are usually the same as those of descriptive statistics; what distinguishes the two sub-areas is the aim of the analysis.
Any description of a phenomenon requires observing or knowing certain things about it. The available observations always consist of synchronous observations, for example a temperature, pressure, and density measurement taken at a given moment in a specific tank. These three synchronous variables can be observed several times (on several dates) and in several locations (in several tanks). The available knowledge consists of formulas that link certain variables, for example the ideal gas law, PV = nRT. The statistical view of describing a phenomenon treats the available observations as different manifestations of the same abstract phenomenon. To stay with the example of temperature, pressure, and density measured at several moments, we consider that each time we take these three measurements, we observe the same phenomenon. The measurements will differ; it is the distribution of these measurements that we describe statistically. In the medical field, for example, weight can be measured before and after taking a drug for several people. This yields a collection of data pairs (weight before and after) indexed by the person's name. In sociology or marketing, we can measure the number of books read per year by many people whose age and level of education are also known. Here too we obtain a collection of data triplets, indexed by the reader's name.
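The before/after weight example can be sketched as a mapping from a person's name to a data pair; describing the phenomenon then means describing the distribution of the observed changes. The names and weights below are hypothetical, and summarizing the per-person difference is one possible choice, not the only way to describe paired data.

```python
import statistics

# Hypothetical weights in kg before and after taking a drug, indexed by name.
weights = {
    "Alice": (72.0, 70.5),
    "Bob": (85.0, 84.0),
    "Chen": (64.5, 64.5),
    "Dana": (90.0, 87.0),
}

# Each observation is a (before, after) pair; compute the change per person.
changes = {name: after - before for name, (before, after) in weights.items()}

# Describe the distribution of the changes, not any single measurement.
mean_change = statistics.mean(list(changes.values()))
median_change = statistics.median(list(changes.values()))
print(changes)
print("mean change:", mean_change)
print("median change:", median_change)
```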
Methods of Descriptive Statistics
There are three main methods to present the data:
Tables display data in a matrix of rows and columns if the data structure allows. Typically, one row corresponds to an observation and one column to a variable. The disadvantage of a table is that even for small data sets, the structure of the data is difficult to grasp. Sometimes reordering columns or rows can help.
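A minimal sketch of such a table, with one row per observation and one column per variable, follows. The tank measurements and the helper `format_table` are hypothetical; sorting the rows by one variable shows how reordering can make a pattern easier to spot.

```python
# Hypothetical observations: one row per tank measurement, one column per variable.
columns = ["tank", "temperature_K", "pressure_kPa", "density_kg_m3"]
rows = [
    ("B", 300.0, 101.3, 1.18),
    ("A", 310.0, 98.7, 1.11),
    ("C", 295.0, 103.0, 1.22),
]

def format_table(columns, rows):
    """Return rows as a fixed-width text table, one line per observation."""
    cells = [list(columns)] + [[str(v) for v in row] for row in rows]
    widths = [max(len(line[i]) for line in cells) for i in range(len(columns))]
    return "\n".join(
        "  ".join(c.ljust(w) for c, w in zip(line, widths)).rstrip() for line in cells
    )

print(format_table(columns, rows))
print()
# Reordering the rows by temperature makes it easier to see whether
# pressure and density change together with temperature.
by_temperature = sorted(rows, key=lambda r: r[1])
print(format_table(columns, by_temperature))
```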
In diagrams and graphs, the data or certain aspects of them are displayed graphically. However, this usually requires summarizing the data, so some information is lost. For example, a scatter plot of two variables makes their relationship easy to see, but the number of observations that share the same values is lost, because those points are drawn on top of each other.
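The loss of multiplicity in a scatter plot can be illustrated without drawing anything: a plot shows one marker per distinct coordinate, so counting distinct points shows what the reader actually sees. The points below are hypothetical; in practice, plotting techniques such as jittering or sizing markers by count are common ways to recover this information.

```python
from collections import Counter

# Hypothetical paired observations (x, y), e.g. integer scores on two tests.
points = [(1, 2), (2, 3), (2, 3), (3, 5), (2, 3), (3, 5)]

# A scatter plot draws one marker per coordinate, so coinciding points overlap.
distinct = set(points)
print(f"{len(points)} observations, but only {len(distinct)} visible markers")

# Counting multiplicities recovers the information the plot hides.
multiplicity = Counter(points)
for (x, y), count in sorted(multiplicity.items()):
    print(f"({x}, {y}) occurs {count} time(s)")
```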
With parameters (also called measures or key figures), one aspect of the data is reduced (aggregated) to a single number. To describe the data, several different parameters are therefore calculated, which compensates for the loss of information caused by such a strong summary.
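Why a single parameter is not enough can be seen from two hypothetical data sets that agree on their measures of location but differ strongly in spread. This minimal sketch uses Python's `statistics` module; only the standard deviation tells the two apart.

```python
import statistics

# Two hypothetical data sets with the same mean and median but different spread.
a = [48, 49, 50, 51, 52]
b = [30, 40, 50, 60, 70]

for name, data in [("a", a), ("b", b)]:
    print(
        name,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "stdev:", round(statistics.stdev(data), 2),  # sample standard deviation
    )
```

Both sets have mean and median 50, but the standard deviation of `b` is ten times that of `a`.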
Three types of parameters are of particular interest:
- Measures of location: describe the central tendency of a frequency distribution. From the relative position of the different measures of central tendency (for example mean, median, and mode), the skewness and excess (kurtosis) of a frequency distribution can be assessed.
- Measures of dispersion: describe the variability (spread) of a frequency distribution.
- Measures of association: describe the correlation between two variables.
The choice of suitable parameters depends on the scale (level of measurement) of the data and on how robust the parameter needs to be.
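The sketch below touches on two of these points with hypothetical data: comparing mean and median hints at skewness and shows why the median is the more robust measure of location, and a hand-written Pearson correlation coefficient serves as a measure of association. The helper name `pearson` is illustrative; Python 3.10 and later also provide `statistics.correlation`. Pearson's coefficient assumes interval or ratio data; for ordinal data, a rank correlation is usually the better fit.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical incomes (in thousands) with one extreme value.
incomes = [28, 30, 31, 33, 35, 36, 250]
mean, median = statistics.mean(incomes), statistics.median(incomes)
print("mean:", round(mean, 1), "median:", median)
# A mean far above the median suggests a right-skewed distribution; the median
# barely reacts to the outlier, which makes it the more robust location measure.

# Hypothetical age and books read per year for the same five people.
ages = [20, 30, 40, 50, 60]
books = [3, 5, 6, 8, 9]
print("Pearson r:", round(pearson(ages, books), 3))
```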