Bulletin of Abai KazNPU, «Physical and Mathematical Sciences» series, No. 3 (79), 2022

Preparation and Development of a System Indicators


3 Preparation and Development of a System Indicators
The data preparation process begins with data collection, commonly referred to as an ETL
(extract-transform-load) process. Data integration brings the various data sources together by joining and
grouping data. As a rule, this requires manipulating relational tables while enforcing several integrity rules,
such as entity integrity, referential integrity, and domain integrity [9]. Using one-to-one, one-to-many, or
many-to-many relationships, the data are aggregated to the desired level of analysis, resulting in a unique
customer signature. The process of preparing data for filling in the scorecard is shown in Figure 2.

Figure 2. Data preparation process
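
The joining and aggregation step can be illustrated with a minimal sketch. It is not the authors' implementation: the two tables, their field names (customer_id, balance, and so on) and the chosen aggregates are hypothetical, and only the Python standard library is used. The sketch checks entity and referential integrity and then collapses a one-to-many relationship into one row per customer.

```python
from collections import defaultdict

# Hypothetical source tables; all field names are illustrative assumptions.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
]
accounts = [  # one-to-many: a customer may hold several accounts
    {"account_id": 10, "customer_id": 1, "balance": 1200.0},
    {"account_id": 11, "customer_id": 1, "balance": 300.0},
    {"account_id": 12, "customer_id": 2, "balance": 5000.0},
]


def build_signatures(customers, accounts):
    """Join accounts to customers and aggregate to one row per customer."""
    known_ids = {c["customer_id"] for c in customers}
    # Entity integrity: the primary key must be unique.
    if len(known_ids) != len(customers):
        raise ValueError("duplicate customer_id in customers")

    balances_by_customer = defaultdict(list)
    for acc in accounts:
        # Referential integrity: every account must point at a known customer.
        if acc["customer_id"] not in known_ids:
            raise ValueError(f"orphan account {acc['account_id']}")
        balances_by_customer[acc["customer_id"]].append(acc["balance"])

    signatures = []
    for c in customers:
        balances = balances_by_customer.get(c["customer_id"], [])
        signatures.append({
            "customer_id": c["customer_id"],
            "age": c["age"],
            "n_accounts": len(balances),
            "total_balance": sum(balances),
            "max_balance": max(balances, default=0.0),
        })
    return signatures


signatures = build_signatures(customers, accounts)
```

Each element of signatures is one customer-level record; in practice the aggregates would be chosen to match the analysis level required by the scorecard.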

Before deciding how to treat missing values, we need to understand why the data are missing and how the
missing values are distributed, so that we can classify them as:
- Missing completely at random (MCAR);
- Missing at random (MAR);
- Missing not at random (MNAR).
Missing-data treatments usually assume MCAR or MAR, whereas MNAR is more difficult to work with.
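
As an illustration only, the sketch below profiles the share of missing values in a column and then applies median imputation together with a missing-value flag. Median imputation is an assumed example of a simple treatment, not a method prescribed in this paper, and it is usually defensible only when the missingness is close to MCAR or MAR. The records and field names are hypothetical.

```python
from statistics import median

# Hypothetical applicant records; None marks a missing value.
records = [
    {"income": 4200.0, "months_employed": 36},
    {"income": None, "months_employed": 12},
    {"income": 3100.0, "months_employed": None},
    {"income": 5800.0, "months_employed": 60},
]


def missing_share(rows, column):
    """Fraction of rows in which `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)


def impute_median(rows, column):
    """Return copies of the rows with gaps filled by the column median,
    plus a 0/1 flag recording which values were originally missing."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    out = []
    for r in rows:
        new = dict(r)  # copy, so the raw records stay untouched
        new[column + "_missing"] = int(r[column] is None)
        if r[column] is None:
            new[column] = fill
        out.append(new)
    return out


income_gap = missing_share(records, "income")
cleaned = impute_median(records, "income")
```

Keeping the flag column preserves the fact that a value was absent, which may itself carry information if the data turn out to be MNAR.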

The presence of outliers can violate the statistical assumptions on which we intend to build the model.
It is therefore essential to understand the cause of the outliers before applying any kind of treatment.
For example, outliers can be a valuable source of information in fraud detection; in that case, it would
be a bad idea to replace them with the mean or median value.
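
One way to respect this is to flag outliers for review instead of overwriting them. The sketch below uses the interquartile-range rule with the conventional multiplier 1.5; this rule, the multiplier, and the sample amounts are assumptions chosen for illustration and are not taken from the paper.

```python
from statistics import quantiles

# Hypothetical transaction amounts; the last value is deliberately extreme.
amounts = [20.0, 22.0, 25.0, 27.0, 30.0, 31.0, 33.0, 35.0, 500.0]


def iqr_fences(values, k=1.5):
    """Lower and upper fences: k interquartile ranges beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


low, high = iqr_fences(amounts)

# Flag, do not replace: the original values stay available for inspection.
is_outlier = [not (low <= v <= high) for v in amounts]
flagged = [v for v, flag in zip(amounts, is_outlier) if flag]
```

The flagged values can then be examined in their business context before deciding whether they are errors or, as in fraud detection, the cases of greatest interest.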
Data mining and data cleansing are mutually iterative steps. Data mining includes both univariate and
bivariate analysis and ranges from univariate statistics and frequency distributions to correlations,
cross-tabulations, and data analysis. A univariate exploratory data analysis is shown in Figure 3.

Figure 3. EDA (univariate view)
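
The univariate and bivariate views mentioned above can be sketched in a few lines. The sample rows and field names are hypothetical, and the Pearson coefficient is written out by hand so that the example depends only on the Python standard library.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical sample; field names are illustrative only.
rows = [
    {"age": 25, "income": 2000.0, "housing": "rent", "default": 1},
    {"age": 35, "income": 3000.0, "housing": "own", "default": 0},
    {"age": 45, "income": 4000.0, "housing": "own", "default": 0},
    {"age": 55, "income": 5000.0, "housing": "rent", "default": 1},
]


def summarize(values):
    """Univariate statistics for one numeric variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


ages = [r["age"] for r in rows]
incomes = [r["income"] for r in rows]

# Univariate: summary statistics and a frequency distribution.
age_summary = summarize(ages)
housing_freq = Counter(r["housing"] for r in rows)

# Bivariate: correlation and a cross-tabulation of housing against default.
age_income_corr = pearson(ages, incomes)
crosstab = Counter((r["housing"], r["default"]) for r in rows)
```

Here crosstab maps each (housing, default) pair to its count, which is the information a cross-tabulation table would display.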
After exploratory data analysis (EDA), the data are processed to improve their quality [10]. Data cleansing
requires good business and data understanding so that the data can be interpreted correctly. It is an iterative
process designed to detect irregularities and to replace, modify, or remove them as needed. The two main
problems with dirty data are missing values and outliers; both can strongly affect the accuracy of the model,
which is why careful intervention is needed.
