# Chapter 9 Risk Probability Occurrence Model – Artificial Intelligence for Risk Management

This model determines the probability of risk occurrence from expert knowledge and from historical occurrences.

Define Goal

Define the goal of the model: to predict the probability that an identified risk occurs.

Evaluation Steps

Evaluate Measure or Key Performance Indicator

From the given measure:

Input:

• Output of the risk or threat identification step.
• Output of the risk category identification step.
• Output of the risk-impact score prediction step.

Output:

• Risk occurrence probability (0.0 to 1.0).

Input:

• The list of risk measures identified in the risk or threat identification step.

Output:

• List of the respective risk occurrence probabilities.

Identify the list of measures > Identify the risks list > Identify the list of risk occurrence probabilities

Evaluate the Given Dataset

From the given data (for example, retail industry point-of-sale (POS) data):

Input:

• The list of risk measures identified in the risk or threat identification step.

Output:

• List of the respective risk occurrence probabilities.

Identify the business process > Identify the list of measures > Identify the risks list > Identify the list of risk occurrence probabilities

Evaluate Project-Related Documents

From the given project-related documents:

Collect risk probability data from historical occurrences and from an expert.

Capture the expert knowledge in the system through the annotation tool, and combine it with the historical dataset.

Input:

• Management plan
• Risk supporting documents
• Risk policy and standards
• Organization rules, regulations, and policy

Design Algorithm

Design an algorithm to predict risk probability occurrence.

Identify Features

Identify features from the previous inputs

From the POS transaction dataset:

• Sales transaction amounts, store name, store location, transaction dates (measures and attributes in the POS transaction dataset).

From unstructured documents:

• Extract features using natural language processing (NLP).

From retail industry statistics:

• Retail industry average sales transaction amount by category, store location, and month.

From the organization rules document:

• Minimum monthly sales decrease percentage at which a decrease is flagged as a risk.

From the risk standards document:

• Three-month revenue decrease
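As a minimal sketch of how these two threshold features might be derived from monthly sales totals, the example below computes the latest month-over-month change and the change across three months, then compares each against its threshold. The function names, the monthly totals, and the reading of "three-month revenue decrease" as the change between the latest month and the month three periods earlier are assumptions made for illustration; the thresholds 0.15 and 0.2 are the sample values from Table 9.2.

```python
def pct_change(previous, current):
    """Fractional change from previous to current (negative means a decrease)."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous


def sales_decrease_features(monthly_sales, min_monthly_decrease, three_month_decrease):
    """Build threshold features from a chronological list of monthly sales totals.

    Hypothetical helper: requires at least four months of data.
    """
    if len(monthly_sales) < 4:
        raise ValueError("need at least four months of sales")
    monthly_change = pct_change(monthly_sales[-2], monthly_sales[-1])
    three_month_change = pct_change(monthly_sales[-4], monthly_sales[-1])
    return {
        "monthly_change": monthly_change,
        "three_month_change": three_month_change,
        # A flag is raised when the decrease is at least as large as the threshold.
        "monthly_decrease_flag": monthly_change <= -min_monthly_decrease,
        "three_month_decrease_flag": three_month_change <= -three_month_decrease,
    }


# Hypothetical monthly totals (in millions), oldest first.
features = sales_decrease_features([40.0, 30.0, 12.0, 10.0], 0.15, 0.2)
print(features)
```

Both flags can then be fed to the model as binary features alongside the raw change values.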

From the risk supporting document:

• Past sales transaction amounts, risk flags, impact, and action taken.

From previous steps:

• Output from the risk or threat identification step.
• Output from the risk category identification step.
• Output from the risk-impact score prediction step.

Output:

• What is the probability of risk occurrence?
• Risk occurrence probability. The probability of occurrence is a number in the range 0.0 to 1.0, with 0.0 indicating that the event is not expected to occur and 1.0 indicating that it is predicted to occur with 100 percent certainty. The model is therefore a probabilistic prediction model.

Machine learning (ML)/artificial intelligence (AI) use case:

The ML/AI use case is a regression problem solved with supervised training. The following sections walk through the regression steps.

Identify the List of Regression Models

Same as in the risk-impact score prediction model.

Data Preparation

See Table 9.1 for sample risk occurrence probability data.

Table 9.1 Sample data risk occurrence probability

| Measure Risk Name | Risk Category | Change Percentage | Risk-Impact Score | Risk Occurrence Probability |
|---|---|---|---|---|
| Sales amount decrease | Financial/competitive risk | −10% | 10 | 0.20 |
| Sales amount increase | Inventory risk | 30% | 5 | 0.46 |

See Table 9.2 for sample data with all the features.

Table 9.2 Sample data with all features

| Feature | Sample value |
|---|---|
| Risk name (from step 1) | Sales amount decrease |
| Risk category | Financial/competitive risk |
| Risk-impact score | 10 |
| Sales | 10M |
| Store | store1 |
| Location | Atlanta |
| Date | 43103 |
| Industry average sales prev month | 30M |
| Industry average sales prev month – 1 | 40M |
| Minimum monthly sales decrease percent | 0.15 |
| Three months revenue decrease | 0.2 |
| Past sales transaction amount | 5M |
| Past risk flag | Y |
| Impact | High |
| Action taken | Store Closed |
| Risk occurrence probability | 0.2 |
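Several values in Table 9.2 are not yet numeric, so they need to be converted before training. The sketch below shows one possible conversion. The helper names, the ordinal mapping for Impact, and the reading of the Date value 43103 as a spreadsheet serial day number counted from 1899-12-30 are all assumptions; the source does not state how these fields are encoded.

```python
from datetime import date, timedelta


def parse_amount(text):
    """Convert strings such as '10M' or '5M' into a plain number."""
    multipliers = {"K": 1e3, "M": 1e6, "B": 1e9}
    text = text.strip().upper()
    if text and text[-1] in multipliers:
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)


def serial_to_date(serial):
    """Assumption: the value is a spreadsheet serial day number (day 0 = 1899-12-30)."""
    return date(1899, 12, 30) + timedelta(days=serial)


# Assumed ordinal encoding; the source only shows the value "High".
IMPACT_LEVELS = {"Low": 1, "Medium": 2, "High": 3}

raw = {"sales": "10M", "date": 43103, "past_risk_flag": "Y", "impact": "High"}

encoded = {
    "sales": parse_amount(raw["sales"]),
    "month": serial_to_date(raw["date"]).month,
    "past_risk_flag": 1 if raw["past_risk_flag"] == "Y" else 0,
    "impact": IMPACT_LEVELS[raw["impact"]],
}
print(encoded)
```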

Data Preprocessing

• When a categorical variable contains many categories, grouping them into a smaller number of categories helps achieve better training and accuracy.
• Balance imbalanced classes because most ML algorithms work best when the number of samples in each class is about equal.
• When there are many features, keep only the features that are good predictors of the output.
• Remove outliers, if possible, because some algorithms are very sensitive to outliers.
• Use oversampling techniques such as the synthetic minority oversampling technique (SMOTE) to increase the number of training samples in the minority class.
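Two of the preprocessing steps above can be sketched with the standard library alone. The example groups infrequent categories under a single label and balances the classes by plain random oversampling, which duplicates existing minority rows; it is a simpler stand-in for SMOTE, which synthesizes new samples instead. The function names and the sample data are hypothetical.

```python
import random
from collections import Counter


def group_rare_categories(values, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical store locations and risk labels.
locations = ["Atlanta", "Atlanta", "Atlanta", "Boston", "Macon"]
grouped = group_rare_categories(locations, min_count=2)

rows = [[-0.10], [0.05], [0.30], [0.02], [-0.25]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(grouped, Counter(balanced_labels))
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the test set.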

Predict variables (desired target):

• Has the risk occurred?
• Value 1 or 0 (1 means “Yes,” 0 means “No”).
• Derive this from the risk occurrence probability: if the probability is above 0.80, the value is 1; otherwise it is 0.
• This process is based on the company’s policy and the application of human intelligence.
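The derivation of the target variable can be sketched as a simple threshold function. The function name is hypothetical, the default of 0.80 is the cutoff stated above, and the input 0.85 is an invented value added to show a positive case next to the two probabilities from Table 9.1.

```python
def occurrence_label(probability, threshold=0.80):
    """Return 1 if the risk is treated as having occurred, else 0."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    # "Above 0.80" is read as strictly greater than the threshold.
    return 1 if probability > threshold else 0


# 0.20 and 0.46 come from Table 9.1; 0.85 is a hypothetical value.
labels = [occurrence_label(p) for p in (0.20, 0.46, 0.85)]
print(labels)
```

Because the cutoff reflects company policy, keeping it as a parameter makes it easy to adjust without touching the rest of the pipeline.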

Train the Model

Train the model on the identified datasets, using the annotated dataset as the source of labels.
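To make the training step concrete, the following is a minimal sketch of logistic regression trained with batch gradient descent, written with the standard library only. It is an illustration under stated assumptions, not the implementation behind the results reported in this chapter: the two features (sales change as a fraction, and risk-impact score scaled to 0–1), the six training rows, the learning rate, and the epoch count are all invented for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(weights, bias, features):
    """Predicted probability that the risk occurs."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_logistic_regression(X, y, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by minimizing log loss with batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, target in zip(X, y):
            error = predict_probability(weights, bias, features) - target
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical rows: [sales change fraction, risk-impact score / 10].
X = [[-0.30, 1.0], [-0.25, 0.9], [-0.20, 0.8], [0.05, 0.3], [0.10, 0.2], [0.30, 0.5]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(X, y)
p_high = predict_probability(weights, bias, X[0])
p_low = predict_probability(weights, bias, X[4])
print(round(p_high, 3), round(p_low, 3))
```

In practice a library implementation with regularization and a held-out test split would normally replace this hand-written loop.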

Test the Model

Predict the test set results and calculate accuracy.

Accuracy of the logistic regression classifier on the test set: 0.83.

Test the model using the test dataset and expert knowledge.

Evaluate the Model

Evaluate the model using accuracy and mean square error and determine the learning rate.
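The two metrics named above can be computed as in the following sketch. Accuracy compares the predicted class labels with the true labels, while mean square error compares the predicted probabilities with the true 0/1 outcomes. The labels, the predicted probabilities, and the 0.5 decision cutoff are hypothetical values chosen for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_prob):
    """Average squared difference between true outcomes and predicted probabilities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical test labels and model outputs.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]
y_pred = [1 if p > 0.5 else 0 for p in y_prob]

acc = accuracy(y_true, y_pred)
mse = mean_squared_error(y_true, y_prob)
print(acc, round(mse, 3))
```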

Confusion Matrix

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.71 | 0.80 | 0.75 | 7666 |
| 1 | 0.77 | 0.67 | 0.72 | 7675 |
| Avg/total | 0.74 | 0.74 | 0.74 | 15341 |
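The F1-score column follows from the precision and recall columns, since F1 is their harmonic mean. The short check below recomputes it from the reported values; the function name is hypothetical, and the inputs are the rounded figures from the report above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, as reported above.
report = {0: (0.71, 0.80), 1: (0.77, 0.67)}
f1 = {label: round(f1_score(p, r), 2) for label, (p, r) in report.items()}
print(f1)
```

The recomputed values match the reported F1-scores of 0.75 and 0.72.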

Across the entire test set, the classifier's prediction of whether the risk occurred was correct 83 percent of the time.

Model Conclusion

The logistic regression classifier performed better than other algorithms for the risk probability occurrence model.

Publish/Produce the Model

Same as the predicting risk-impact score model.

Conclusion

Same as the predicting risk-impact score model.