USE OF CONFUSION MATRIX IN CYBERCRIME

Cybercrime is a criminal activity that either targets or uses a computer, a computer network, or a networked device.

Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations. Some cybercriminals are organized, use advanced techniques, and are highly technically skilled. Others are novice hackers.

Rarely, cybercrime aims to damage computers for reasons other than profit. These could be political or personal.

There are a lot of cyberattacks that we usually see or hear about. Some of them are:

  • Email and internet fraud.
  • Identity fraud (where personal information is stolen and used).
  • Theft of financial or card payment data.
  • Theft and sale of corporate data.
  • Cyberextortion (demanding money to prevent a threatened attack).
  • Ransomware attacks (a type of cyber extortion).
  • Cryptojacking (where hackers mine cryptocurrency using resources they do not own).
  • Cyberespionage (where hackers access government or company data).
  • Malware. Malware is a term used to describe malicious software, including spyware, ransomware, viruses, and worms. …
  • Phishing. …
  • Man-in-the-middle attack. …
  • Denial-of-service attack. …
  • SQL injection. …
  • Zero-day exploit. …
  • DNS Tunneling.
  • Criminal activity that targets computers.
  • Criminal activity that uses computers to commit other crimes.

In this blog, we will focus on attacks such as phishing or DoS (denial-of-service) attacks against a server. In such attacks, a hacker tries to crash the server, gain access to data, or achieve some other goal.

To provide extra security for servers, organizations generally have a security team. But in a world that values working intelligently, why not use AI (Artificial Intelligence) for the same task? That is now possible, and many organizations are taking this approach.

To protect servers, we use an IDS (intrusion detection system). Many modern IDSs use machine learning (ML) to check the requests coming to the server. The IDS detects malicious requests and informs the organization. But we can’t rely 100% on such a system: it helps us improve security, but it is still a machine learning model, and no model predicts with 100% accuracy.

The model may predict wrongly, and its predictions (right or wrong) fall into different types. To understand this point, we need to know about the confusion matrix, so let’s understand it first.

What is a Confusion Matrix and why do we need it?

Once we have the data, after cleaning, pre-processing, and wrangling it, the first step is to feed it to a model and get outputs, often as probabilities. But hold on! How can we measure the effectiveness of our model? The better the effectiveness, the better the performance, and that is exactly what we want. This is where the confusion matrix comes into the limelight. A confusion matrix is a performance measurement for machine learning classification.

The confusion matrix summarizes the performance of a classifier by comparing the actual and predicted classes. The binary confusion matrix is composed of four cells:

True Positive (TP): Intrusions that are successfully detected by the IDS.

False Positive (FP): Normal/non-intrusive behavior that is wrongly classified as intrusive by the IDS. Also known as a Type I error.

True Negative (TN): Normal/non-intrusive behavior that is successfully labeled as normal/non-intrusive by the IDS.

False Negative (FN): Intrusions that are missed by the IDS and classified as normal/non-intrusive. Also known as a Type II error.
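
As a minimal sketch of how these four cells are counted, the Python snippet below compares actual and predicted labels for a handful of requests. The function name `confusion_counts`, the labels "attack" and "normal", and the sample data are all made up for illustration; they do not come from any real IDS.

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Count TP, FP, TN, and FN for a binary classifier.

    `positive` is the label treated as the positive class (here: an intrusion).
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # intrusion correctly flagged
        elif p == positive and a != positive:
            fp += 1  # normal traffic wrongly flagged (Type I error)
        elif p != positive and a != positive:
            tn += 1  # normal traffic correctly passed
        else:
            fn += 1  # intrusion missed (Type II error)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical ground truth and IDS verdicts for eight requests.
actual = ["attack", "normal", "normal", "attack",
          "normal", "attack", "normal", "normal"]
predicted = ["attack", "normal", "attack", "normal",
             "normal", "attack", "normal", "normal"]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'FP': 1, 'TN': 4, 'FN': 1}
```

In this toy data, the IDS raises one false alarm (FP) and misses one real attack (FN).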

Need for Confusion Matrix in Machine learning:
1. It evaluates the performance of a classification model on test data and tells us how good the model is.
2. It shows not only how many errors the classifier makes but also their type: Type I or Type II.

For the security team in an organization, the Type II error (False Negative) is the most dangerous: the IDS wrongly reports that there is no attack when one is actually under way. The security team therefore has to pay particular attention to this kind of IDS error. Reviewing the errors this way tells the security team when and where to take action, and the confusion matrix of the IDS helps them take corrective action in time.

You can compute the accuracy from the confusion matrix:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
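
The accuracy calculation can be sketched in a few lines of Python. The function name `accuracy` and the counts passed to it are hypothetical values chosen only to show the arithmetic.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical IDS results over 1,000 requests.
acc = accuracy(tp=45, tn=930, fp=20, fn=5)
print(f"{acc:.3f}")  # 0.975
```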

The most dangerous error is the False Negative (FN): the machine predicted negative, but the actual value was positive. For example, treating a pass as the positive class, the machine predicted that a student would fail, but the student actually passed.
This error causes problems in the cybersecurity world, where many tools are based on machine learning or AI: a False Negative means a real attack goes unnoticed, which can have dangerous consequences.
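
To see why missed attacks deserve separate attention, here is a small Python sketch with made-up numbers. It assumes attacks are rare (10 out of 1,000 requests) and that a hypothetical IDS misses 8 of them; the helper name `false_negative_rate` is ours.

```python
def false_negative_rate(tp, fn):
    """Fraction of real attacks that the IDS missed."""
    attacks = tp + fn
    if attacks == 0:
        raise ValueError("no actual attacks in the data")
    return fn / attacks


# Hypothetical counts: 10 real attacks among 1,000 requests, 8 of them missed.
tp, fn, fp, tn = 2, 8, 0, 990

overall_accuracy = (tp + tn) / (tp + tn + fp + fn)
missed = false_negative_rate(tp, fn)

print(f"accuracy: {overall_accuracy:.3f}")    # accuracy: 0.992
print(f"false negative rate: {missed:.2f}")   # false negative rate: 0.80
```

Even with 99.2% accuracy, this imaginary IDS lets 80% of the attacks through, which is why the security team should look at the individual cells and not just the headline accuracy.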

Example of Confusion Matrix:

The confusion matrix is a useful machine learning tool that lets you measure Recall, Precision, and Accuracy. It is also the basis of the AUC-ROC curve, which is built from the confusion matrices obtained at different decision thresholds. Below is an example to explain the terms True Positive, True Negative, False Positive, and False Negative.
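
As a rough sketch, precision and recall (plus the F1 score that combines them) can be derived from the confusion matrix counts as follows. The function name and the counts are hypothetical. AUC-ROC is left out because it needs the model's scores across many thresholds, not a single confusion matrix.

```python
def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall, and F1 from confusion matrix counts."""
    # Precision: of everything flagged as an attack, how much really was one?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real attacks, how many were caught?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical IDS counts: 45 attacks caught, 20 false alarms, 5 attacks missed.
precision, recall, f1 = precision_recall_f1(tp=45, fp=20, fn=5)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.692 recall=0.900 f1=0.783
```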

True Positive: You predicted positive, and it turned out to be true. For example, you had predicted that France would win the World Cup, and it won.

True Negative: You predicted negative, and it turned out to be true. You had predicted that England would not win, and it lost.

False Positive: You predicted positive, and it turned out to be false. You had predicted that England would win, but it lost.

False Negative: You predicted negative, and it turned out to be false. You had predicted that France would not win, but it won.

Remember that Positive and Negative describe the predicted value, while True and False describe whether that prediction matched the actual outcome.

Two Types of Errors in Confusion Matrix

There are two ways to remember which error is which.

The first way is to rewrite False Positive and False Negative. False Positive is a Type I error because False Positive = False True, which has only one F. False Negative is a Type II error because False Negative = False False, which has two F’s.

The second way is to consider the meanings of the words. False Positive contains one negative word (False), so it’s a Type I error. False Negative contains two negative words (False + Negative), so it’s a Type II error.

This research presents a new cyber-attack detection and classification system. In it, we improved the performance of an IDS using a parallel support vector machine (PSVM) for distributed cyber-attack detection and classification. The new PSVM is shown to be more efficient at detecting and classifying different types of cyber-attacks than SDF. The experimental results on the KDD99 benchmark dataset show that the proposed algorithm achieved a high detection rate on different types of network attacks.

Thank you for reading!!
