Importance of Confusion Matrix in Cyber Crime Cases

Let’s talk about the confusion matrix and how it helps in the cybersecurity world, where machine learning is widely used.
First, let us understand what a confusion matrix is.
Confusion Matrix
It is a table that summarizes the performance of a machine learning classification model whose output can be two or more classes. For a two-class problem, it is a table with 4 different combinations of predicted and actual values.
The confusion matrix comes into play when we want to measure the effectiveness and accuracy of a trained model.

Let us understand the terms TP, FP, FN and TN with an example.
True positive-
These are the cases in which the model has predicted ‘yes’ and it is true in reality too.
Eg. The model predicted that a woman is pregnant and she really is pregnant.
True negative-
The model has predicted ‘no’ and in reality the answer is ‘no’ too.
Eg. The model predicted that a man is not pregnant and in reality he is not.
False positive-
It is also called a Type 1 error. Here the model has predicted ‘yes’ but in reality the answer is ‘no’.
Eg. The ML model predicted that a man is pregnant when in reality he isn’t.
False negative-
It is also called a Type 2 error. Here the model has predicted ‘no’ but in reality the answer is ‘yes’.
Eg. The ML model predicted that a woman is not pregnant, but that’s not true because she is pregnant.
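To make the four terms concrete, here is a minimal Python sketch that counts them from two lists of labels. The function name `confusion_counts` and the sample labels are made up for illustration, and ‘yes’ is treated as the positive class, as in the pregnancy example.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Count TP, TN, FP and FN for a two-class problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted yes, really yes
        elif p != positive and a != positive:
            tn += 1  # predicted no, really no
        elif p == positive and a != positive:
            fp += 1  # predicted yes, really no (Type 1 error)
        else:
            fn += 1  # predicted no, really yes (Type 2 error)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels for six people (illustrative data only)
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```

The four counts always add up to the number of predictions, which is a quick sanity check on any confusion matrix.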
The confusion matrix is extremely useful for measuring recall, precision, specificity and accuracy.
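These four measures follow directly from the four cells of the matrix. The sketch below uses the standard definitions; the helper name `metrics` and the counts passed to it are made up for illustration, and it assumes none of the denominators is zero.

```python
def metrics(tp, tn, fp, fn):
    """Standard metrics derived from the four confusion matrix cells."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that were right
        "precision": tp / (tp + fp),     # of the predicted 'yes', how many were really 'yes'
        "recall": tp / (tp + fn),        # of the real 'yes', how many were found
        "specificity": tn / (tn + fp),   # of the real 'no', how many were found
    }


# Made-up counts for 100 predictions
m = metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.85, recall is 0.8 and specificity is 0.9, which shows how a single accuracy figure can hide different error rates on the two classes.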
A Type 1 error, i.e. a false positive, is a very dangerous error in many circumstances because it leads us to conclude that a finding exists when in fact it does not. In those circumstances, concluding that we found an effect that does not exist is worse than missing an effect that does exist.
Cybersecurity
Cybersecurity is the set of technologies and processes that systems and networks use to protect against digital attacks. Cyber criminals often target sensitive information and data, and when they gain access to it, they may take down an entire site or extort money from users.
Cyber crime can take forms such as-
- Stealing organizational data
- Stealing bank card details
- Hacking emails to gain information
- Stealing personal data
Hence cybersecurity is crucial to any business, and protecting data against cyber attacks is very important but also challenging.
Examples where the confusion matrix plays a role in cyber crime threats-
Suppose a company has a server or website that holds important data, and we use our ML model to predict whether a hack on the server will happen or not.
Here, an attack happening is a negative reaction and an attack not happening is a positive reaction. This labeling is the reverse of the one many intrusion detection tools use, where an attack is the positive class, so "false positive" below follows this article's labeling.
While testing, the model predicts a negative reaction 100 times and a positive reaction the other 200 times.
- When it predicts 100 times that the hack will happen, i.e. a negative reaction, the actual data shows that 20 of those predictions are wrong (false negatives).
- When it predicts 200 times that the hack will not happen, i.e. a positive reaction, the actual data shows that it was correct only 150 times; the other 50 times the model gave a wrong prediction (false positives).
Here, human nature being what it is, we tend to trust the model when it says no hack will happen, so the times that prediction is false go unnoticed, and the result is a hack on the system that we were not aware of in the first place. Hence a Type 1 error, i.e. a false positive, is very dangerous in cybersecurity, and the confusion matrix helps us see and address it.
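The numbers in the server example can be put into a confusion matrix and worked through in a few lines of Python. This is only a sketch that follows the article's labeling (positive means "no attack", negative means "attack"); the variable names are chosen for illustration.

```python
# Article's labeling: positive = "no attack", negative = "attack"
tp, fp = 150, 50   # 200 "no attack" predictions: 150 right, 50 wrong
tn, fn = 80, 20    # 100 "attack" predictions: 80 right, 20 wrong

total = tp + fp + tn + fn
accuracy = (tp + tn) / total

# Real attacks are the ones caught (TN) plus the ones missed (FP)
actual_attacks = tn + fp
missed_attack_rate = fp / actual_attacks

print(f"total predictions: {total}")                    # total predictions: 300
print(f"accuracy: {accuracy:.3f}")                      # accuracy: 0.767
print(f"actual attacks: {actual_attacks}")              # actual attacks: 130
print(f"missed attack rate: {missed_attack_rate:.3f}")  # missed attack rate: 0.385
```

An accuracy of about 77 percent looks acceptable on its own, but the matrix shows that 50 of the 130 real attacks were predicted as "no attack", which is exactly the error the paragraph above warns about.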
Another example where the confusion matrix comes into play for cybersecurity is antivirus software.
Here, suspecting a virus is a negative reaction and not having a virus is a positive reaction.
Let’s say an antivirus company comes out with an AI-based antivirus that detects suspicious files. The model is giving 97 percent accuracy. Let’s say it is running on your PC while you are working on the next big thing. You have just created an executable script that is crucial to you, but the antivirus, being an AI model, gave a false negative: it said your file is a virus.
On the other hand, let’s say you downloaded a few music videos that might have contained a malicious package, but the model was unable to detect it and gave a false positive.
The tolerance for false positives and false negatives varies from use case to use case.
Another area of cybersecurity is malware analysis.
Malware is a type of software that gains access to a computer and damages it. It contains a piece of code that enters a computer and attacks a device or a server. Machine learning is a widely used mechanism for malware detection, in which features are selected to build the data for analysis.
The algorithms used here include Random Forest, Naive Bayes and support vector machines, which report an accuracy, and there are metrics that measure their effectiveness. The confusion matrix can be used to calculate a weighted mean, in which each of its elements is assigned a weight according to how much importance it carries for the final result.
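One possible reading of this weighting idea is a weighted mean cost per prediction, where each cell of the matrix gets a weight that reflects how costly it is. The sketch below is an assumption about what such a calculation could look like, not a standard formula: the function name `weighted_cost`, the counts and the weights are all hypothetical. It follows this article's labeling, in which a false positive is a missed threat.

```python
def weighted_cost(counts, weights):
    """Weighted mean cost per prediction: sum(weight * count) / total predictions."""
    total = sum(counts.values())
    return sum(weights[cell] * counts[cell] for cell in counts) / total


# Hypothetical confusion matrix counts
counts = {"TP": 150, "TN": 80, "FP": 50, "FN": 20}

# Hypothetical weights: correct predictions cost nothing.
# Use case A: a missed threat (FP here) is ten times worse than a false alarm (FN here).
weights_a = {"TP": 0.0, "TN": 0.0, "FP": 10.0, "FN": 1.0}
# Use case B: false alarms are considered the more costly error.
weights_b = {"TP": 0.0, "TN": 0.0, "FP": 1.0, "FN": 10.0}

cost_a = weighted_cost(counts, weights_a)
cost_b = weighted_cost(counts, weights_b)
print(f"use case A: {cost_a:.3f}")
print(f"use case B: {cost_b:.3f}")
```

The same matrix gives a different score under each set of weights, so the choice of weights is where the tolerance of a particular use case enters the calculation.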
Thank you for reading. :)