Cybersecurity is still largely a battle among groups of humans, fought with firewalls, access controls, dynamic passwords, and similar mechanisms. Unfortunately, these methods are becoming inadequate as the battlefield shifts towards human versus machine. With the increasing number of devices, security breaches are becoming more frequent. Smartphones, the cloud, IoT, and BYOD are all newer endpoints that can themselves become attack vectors. Current technical progress and the newer trends in attacks are alarming when we consider that future attacks will be machine engineered. Such malware will inherit the capabilities of well-designed data science software. It will be easier for it to infiltrate, collect data, transmit data, and still remain undetected for a longer time. Our current approaches to analyzing data breaches and phishing activity are too slow to prevent smarter cyber terrorism, and our current threat-detection systems take time to react to unanticipated or new types of threats.
AI and Machine Learning in the Field of Cybersecurity
Cybersecurity is a set of technologies. Cybersecurity systems are composed of network security systems and computer security systems. Each of these has, at a minimum, a firewall, antivirus software, and an intrusion detection system (IDS). Intrusion detection systems help to discover, determine, and identify unauthorized use, duplication, alteration, and destruction of information systems. There are three main types of cyber analytics in IDSs: misuse-based (matching known attack signatures), anomaly-based (flagging deviations from normal behaviour), and hybrid (combining both).
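To make the three IDS analytics styles concrete, here is a minimal Python sketch. The signature strings, the requests-per-minute baseline, and the three-standard-deviation threshold are all illustrative assumptions, not values taken from any real IDS product.

```python
from statistics import mean, pstdev
from typing import Sequence

# Hypothetical signatures of known attacks (assumed for illustration only).
KNOWN_BAD_PATTERNS = ("' OR 1=1", "../../etc/passwd", "<script>")


def misuse_alert(request_line: str) -> bool:
    """Misuse-based: flag a request that contains a known attack signature."""
    lowered = request_line.lower()
    return any(pattern.lower() in lowered for pattern in KNOWN_BAD_PATTERNS)


def anomaly_alert(value: float, baseline: Sequence[float], threshold: float = 3.0) -> bool:
    """Anomaly-based: flag a value far from the baseline mean, measured in standard deviations."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


def hybrid_alert(request_line: str, requests_per_minute: float, baseline: Sequence[float]) -> bool:
    """Hybrid: raise an alert if either detector fires."""
    return misuse_alert(request_line) or anomaly_alert(requests_per_minute, baseline)


# Assumed "normal" traffic, in requests per minute, observed for one client.
baseline_rpm = [48, 52, 50, 47, 53, 49, 51]
```

The misuse detector only catches what is already in its signature list, while the anomaly detector can catch new behaviour but may also flag harmless spikes; the hybrid form accepts both trade-offs at once.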
It is normal that newer approaches will emerge over time; they are not always foolproof, but they give birth to more ideas. We need to continuously monitor a large number of parameters that may point towards abnormal activity. That need leads to applying self-learning techniques to detect malicious activities. AI and machine learning algorithms can essentially recognize security breaches or attacks by automating the basic monitoring mechanism described above.
Such a system can reboot machines, restart services, or shut down systems under a perceived threat, keeping operations stable and reducing risk to the entire business. Unfortunately, machine learning does not have a human’s knowledge for identifying real threats, which gives birth to false positives and undesired shutdowns of systems. For these reasons, with today’s technical progress, we think of a hybrid human-machine collaborative approach to counter the upcoming machine-made cybersecurity threats. Of course, neural networks can learn from data and from texts written by cybersecurity experts. This kind of approach may make the machines smart enough to take corrective actions autonomously, even in the absence of a human.
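One way to picture the hybrid human-machine approach is a triage rule that lets the machine act alone only when it is very confident and sends uncertain cases to an analyst. The sketch below is only an assumption about how such a rule could look; the `Alert` class, the threshold values, and the action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    source: str   # host or service that raised the alert
    score: float  # model's estimated probability that the event is malicious


def triage(alert: Alert, auto_threshold: float = 0.95, review_threshold: float = 0.5) -> str:
    """Route an alert based on model confidence.

    Very confident detections trigger an automatic response, uncertain ones
    are queued for a human analyst, and the rest are only logged.
    """
    if alert.score >= auto_threshold:
        return "auto_contain"
    if alert.score >= review_threshold:
        return "human_review"
    return "log_only"
```

Raising `auto_threshold` reduces undesired automatic shutdowns caused by false positives, at the cost of more work for the human reviewers.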
Under the Hood
Machine learning and data mining are difficult to use in the cyber domain. The difficulties relate to how often the model needs to be retrained and to the availability of labelled data.
In most other applications of machine learning and data mining, a model is trained once and then used for a long time. In those use cases, the processes are quasi-stationary and the model is rarely retrained. Cyber intrusion detection is different: models may be retrained daily, whenever the analyst requires, or each time a new intrusion is identified. When models need to be trained that often, training time becomes important, and further research into fast incremental learning is becoming mandatory. These factors contribute to the higher cost of applying AI and ML in cybersecurity.
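As a small illustration of what incremental learning means, the following sketch keeps a running mean and variance that are updated one observation at a time (Welford's algorithm), so the baseline never has to be recomputed from the full history. The class name, the sample values, and the threshold are assumptions made for this example; a real intrusion detection model would be far richer.

```python
import math


class RunningBaseline:
    """Mean and variance maintained incrementally with Welford's algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new observation into the baseline in constant time."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def std(self) -> float:
        """Sample standard deviation of everything seen so far."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

    def is_anomalous(self, value: float, threshold: float = 3.0) -> bool:
        """Flag values more than `threshold` standard deviations from the mean."""
        sigma = self.std()
        if sigma == 0.0:
            return False
        return abs(value - self.mean) / sigma > threshold


# Hypothetical stream of failed-login counts per hour.
baseline = RunningBaseline()
for observation in [10, 12, 11, 9, 13, 10, 12]:
    baseline.update(observation)
```

Each update costs the same small amount of work no matter how much data has already been seen, which is the property that matters when a model must be refreshed daily.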
Machine learning can compensate for human oversights when analyzing large data streams. Our intention is to detect anomalies and to find malicious behaviour or entities, and machine learning may bring capabilities that revolutionize cybersecurity.
In terms of the methodology of application, we can:
- Apply supervised learning to historical data to improve prediction capabilities
- Apply unsupervised learning to get meaning out of data
It is common to confuse the terms machine learning, deep learning, data mining, and knowledge discovery in databases (KDD). Knowledge discovery in databases is the process of extracting useful, previously unknown information from data. A machine learning approach consists of two phases, training and testing, and the following steps are usually performed:
- Identify class attributes and classes from training data
- Identify a subset of the attributes necessary for classification
- Learn the model using training data
- Use the trained model to classify the unknown data
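The four steps above can be sketched with a tiny nearest-centroid classifier. The connection records, the attribute names, and the choice of nearest-centroid as the learning method are all assumptions made for illustration; they are not prescribed by the steps themselves.

```python
from collections import defaultdict

# Step 1: hypothetical labelled records; "label" is the class attribute
# and its values ("normal", "attack") are the classes.
training_data = [
    {"duration": 1.0, "bytes_sent": 200.0, "failed_logins": 0.0, "label": "normal"},
    {"duration": 2.0, "bytes_sent": 300.0, "failed_logins": 0.0, "label": "normal"},
    {"duration": 1.0, "bytes_sent": 250.0, "failed_logins": 9.0, "label": "attack"},
    {"duration": 3.0, "bytes_sent": 150.0, "failed_logins": 11.0, "label": "attack"},
]

# Step 2: the subset of attributes used for classification (assumed choice).
FEATURES = ("duration", "failed_logins")


def train(records, features, class_attribute="label"):
    """Step 3: learn one centroid (mean feature vector) per class."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[class_attribute]].append([record[f] for f in features])
    return {
        label: [sum(column) / len(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def classify(model, record, features):
    """Step 4: assign the class whose centroid is nearest (squared Euclidean distance)."""
    point = [record[f] for f in features]

    def distance(centroid):
        return sum((p - c) ** 2 for p, c in zip(point, centroid))

    return min(model, key=lambda label: distance(model[label]))


model = train(training_data, FEATURES)
unknown = {"duration": 2.0, "bytes_sent": 220.0, "failed_logins": 8.0}
prediction = classify(model, unknown, FEATURES)  # "attack" for this record
```

In practice the testing phase would also hold back part of the labelled data to measure how often the trained model classifies it correctly.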
Data mining projects often follow the CRISP-DM (Cross-Industry Standard Process for Data Mining) model, which is composed of the following six phases:
- Business understanding: Defining the DM problem.
- Data understanding: Data collection and testing.
- Data preparation: Preparing the data to reach the final data set.
- Modelling: Applying data mining and machine learning methods to fit the best model.
- Evaluation: Verifying that the model meets the business goals.
- Deployment: Putting the model to use.
Different machine learning and data mining methods are used in the field of cybersecurity, such as artificial neural networks, Bayesian networks, clustering, decision trees, and Hidden Markov Models.
How AI and Machine Learning Work in the Field of Cybersecurity
There are some existing examples of such human-machine platforms. They differ in technology, and not all have gained enough maturity to be deployed beyond test environments. AI2 is a platform from MIT’s Computer Science and Artificial Intelligence Lab. It is a security platform that combines machine learning with input from human experts. On the machine learning side, algorithms are trained to predict potential threats, while human experts handle the judgmental tasks: validating and classifying threats and adding severity tags.
IBM’s Watson has the NLP capabilities to analyze an almost unlimited number of documents and server logs in order to identify, classify, and present a complete picture of a possible threat. Today’s cybersecurity attacks can conceal their presence within systems, so it is difficult for analysts to pinpoint attacks correctly within a limited time. Watson generates real-time reports of threats to speed up issue identification and subsequent resolution.
Many experts worry that vendors making promises are not paying enough attention to the risks. Risks are obvious with the use of heavyweight technologies. The amount of data that must be made locally available is enormous. Also, many systems such as MySQL minimize logging to increase performance, which leaves less data for analysis.
The truth is, both the vendors and the experts are correct. Cyberterrorists design exploits from angles that would bring down most systems. AI and ML are working well for cybersecurity, but at present they are unfortunately too costly to implement in systems where the general public is the consumer.
IBM recognized the hurdles. Watson for Cyber Security and IBM Security Connect are specialized cybersecurity products that implement open standards for better compatibility with other products. Their product line for the general audience has various official use cases, which are less intensive:
- Security intelligence: IBM QRadar Advisor with Watson
- Intelligent orchestration: IBM Resilient Incident Response Platform
- Unified Mobile and Endpoint Management: IBM MaaS360 Advisor with Watson
- Application security: Application Security on the Cloud with Watson
The IBM Resilient Incident Response Platform is aimed purely at IT-related industries. With declining pricing, it may become quite affordable for developers. For obvious reasons, the case studies on the official website revolve around larger enterprises. However, some of the services listed above are affordable even for an individual: IBM MaaS360 Advisor with Watson costs $10/month/device for the top-end plan.
Example Approach for Implementation
The IBM Resilient Incident Response Platform and Application Security on Cloud are two commonly required tools.
IBM itself provides snippets, sample apps, and plugins to support integrating IBM Application Security on Cloud into a DevOps automation landscape. Jenkins, GitHub, and Docker are commonly used tools, and the example plugins and documentation are provided on this GitHub repo. There is also a ready-to-use solution for Travis CI. AppScan is not really a new product. AppScan and IBM Application Security on Cloud essentially scan for security vulnerabilities and help fix them.
The AppScan Issue Management Gateway service synchronizes issues between IBM Application Security on Cloud and other issue management systems such as JIRA. This system can be part of an automated scanning workflow. This is the GitHub repo of the AppScan Issue Management Gateway Service. Here is a generic Python Resilient Provider. The Resilient Python SDK has two easy-to-install library modules: resilient and resilient-circuits. They can be installed with the usual Python pip command:
pip install resilient-circuits
pip install resilient
Here is detailed documentation of the above.
AppScan is a handy tool for webmasters too: it can scan running websites and web applications. AppScan has historical security vulnerability warnings for WordPress. The biggest advantage is that it has a ready-to-use bash script for use on the server. For AppScan, only an IBM ID (preferably a confirmed account with a credit card) is required to sign up from their portal for a 30-day trial.
The rest is reading this general-purpose documentation for the required bash script and this documentation for web URL scans. AppScan supports various languages including PHP 7.0. A list can be found here.
Developmental Challenges and To-Dos
Most of the solutions from IBM have enough tools and documentation for the development and testing phases. However, despite PHP support and the wide usage of WordPress and MySQL, there is a lack of an easy-to-use WordPress plugin. Regular command-line users can easily use bash scripts or Python tools, but it would be practical for the community to develop a WordPress plugin that integrates the vulnerability-checking services into a popular application platform like WordPress.
AI does not mean the end of the human role in cybersecurity but a coexistence. Laymen may misinterpret AI as complete dependence on automation, which invites disaster. It is indeed very important to teach managers and CIOs the real role of these tools in their enterprise.
Anyone will agree that, at the moment, AI in cybersecurity is a specialized topic. The problem with progress in data science and AI is that attackers probably already have all the tools to power their attacks, and if we refrain from deploying AI-powered defences we simply cannot grow this segment from real-life feedback. Unless the larger brands, governmental agencies, and international events test, deploy, and provide feedback, it will be difficult for technical development to make these defences cheaper for the general public. Until the mainstream media understand the reality, it is difficult for engineers to convince governmental agencies. It is a fact that there are many research papers and many discussions on technical sites, but the mainstream media play a big role in reaching mass users. Most current mass-media journalists, owing to a gap in technical knowledge, fail to realize that terrorists using AI tools are our global enemies. Everything from power stations and banking to public air transport is completely dependent on computers and the networks in between. The hacking of an autopiloted aircraft is no longer just a sci-fi movie plot.
Artificial Intelligence and Cybersecurity for Dummies (PDF distributed by IBM)