The input to the algorithm is the network characteristic data of a given host, such as traffic volume, the number of SMB read requests, and the number of DNS requests. The host’s behavior data consists of two parts:
- Original network characteristic data;
- Data derived from it, such as square roots, logarithms, support vector machine (SVM) outputs, and deviation factors.
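A minimal Python sketch of how such a behavior vector might be assembled from raw per-host counters. The feature names, the `behavior_vector` helper and the use of `log1p` (to avoid `log(0)` for zero counts) are hypothetical choices, and the SVM output and deviation factor are omitted because the source does not specify how they are computed.

```python
import math

def behavior_vector(raw: dict[str, float]) -> dict[str, float]:
    """Extend raw per-host counters with simple derived features.

    Hypothetical sketch: only square-root and log transforms are shown.
    """
    features = dict(raw)  # original network characteristic data
    for name, value in raw.items():
        features[f"sqrt_{name}"] = math.sqrt(value)
        # log1p(x) = log(1 + x), so a zero count maps to 0.0 instead of failing
        features[f"log_{name}"] = math.log1p(value)
    return features

raw = {"bytes_out": 1_000_000.0, "smb_reads": 0.0, "dns_requests": 99.0}
vec = behavior_vector(raw)
print(vec["sqrt_bytes_out"], vec["log_smb_reads"])
```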
Both historical and real-time monitoring data are processed by three main types of algorithm:
- Normal behavior modeling: uses statistical probability distributions and extreme value distribution models.
- Anomaly detection on new feature data: uses Bayes’ theorem.
- Degree of abnormality: measured with the Bayes factor.
Modeling Normal Behavior
The behavior data is a vector. Let M denote the value observed at a single data collection point, m any possible collected value, and m(1), m(2), m(3), …, m(N) the N historical samples collected at that point, sorted in ascending order. The normal behavior model then consists of two parts, split at a threshold u:
- For m < u, an empirical curve is fitted to the data;
- For m > u, an extreme value distribution is fitted.
The threshold u is found by trying candidate values one by one until the integrals of the two fitted curves sum to 1, so that together they form a valid probability distribution.
With this normal behavior probability model P(m), the one-sided tail probability P(M > m) of any collected value m has a direct interpretation:
- P(M > m) measures how anomalous m is: the smaller the value, the more likely m is an anomaly.
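The two-part model and its tail probability can be sketched in Python as a peaks-over-threshold fit. This is a minimal sketch under stated assumptions, not Darktrace’s implementation: the threshold u is fixed at an empirical quantile (the hypothetical `quantile` parameter) rather than searched for, and the tail is a generalized Pareto distribution fitted by the method of moments. Weighting the tail by the empirical fraction of samples above u makes the two pieces integrate to 1 by construction. `fit_tail_model` and the synthetic data are illustrative.

```python
import bisect
import math
import random

def fit_tail_model(history, quantile=0.9):
    """Return a function m -> P(M > m) fitted to historical samples.

    Hypothetical sketch: below the threshold u the empirical distribution is
    used; the excesses above u are fitted with a generalized Pareto
    distribution (GPD) by the method of moments. Needs at least two
    distinct excesses above u.
    """
    data = sorted(history)                   # m(1) <= m(2) <= ... <= m(N)
    n = len(data)
    u = data[int(quantile * (n - 1))]        # assumed rule: u = empirical quantile
    excesses = [x - u for x in data if x > u]
    tail_weight = len(excesses) / n          # empirical P(M > u)

    # Method-of-moments estimates of the GPD shape (xi) and scale (sigma)
    mean = sum(excesses) / len(excesses)
    var = sum((y - mean) ** 2 for y in excesses) / (len(excesses) - 1)
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean * (ratio + 1.0)

    def gpd_survival(y):
        if abs(xi) < 1e-12:                  # xi -> 0 limit is exponential
            return math.exp(-y / sigma)
        base = 1.0 + xi * y / sigma
        return 0.0 if base <= 0.0 else base ** (-1.0 / xi)

    def tail_probability(m):
        if m < u:
            # empirical survival: fraction of history strictly greater than m
            return (n - bisect.bisect_right(data, m)) / n
        return tail_weight * gpd_survival(m - u)

    return tail_probability

random.seed(0)
history = [random.expovariate(1 / 50) for _ in range(5000)]  # synthetic counts
p = fit_tail_model(history)
print(p(10), p(500))
```

A value such as 500, far into the fitted tail, receives a much smaller tail probability than an ordinary value such as 10, and is therefore treated as more anomalous.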
Abnormal behavior detection
Abnormal behavior detection works at two levels:
- Anomalies at a single collection point;
- Anomalies in the entire data vector.
Anomaly detection for a single data point
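By Bayes’ theorem, the probabilities that a single data point m is normal (N) or abnormal (A) are:
$$P(N \mid m) = \frac{\pi(N)\,P(m \mid N)}{\pi(N)\,P(m \mid N) + \pi(A)\,P(m \mid A)}, \qquad P(A \mid m) = 1 - P(N \mid m)$$
where P(m | N) and P(m | A) are the likelihoods of observing m in the normal and abnormal states.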
Here, π(N) and π(A) are the prior probabilities of the normal and abnormal states in the network where the host is located. Security personnel can set these values directly and then tune them based on experience.
The anomaly probability of a single data point cannot be used directly to compute the anomaly probability of the entire data vector, because an anomaly of the whole host is a combination of several single-point anomalies. Instead, the probability that each single data point is normal is used to compute the anomaly probability of the entire host.
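A minimal sketch of the single-point posterior in Python. The likelihoods are assumptions, not taken from the source: P(m | N) is approximated by the tail probability P(M > m) from the normal behavior model, and P(m | A) by the constant 1 (no model of attack behavior). `posterior_normal` is a hypothetical helper name.

```python
def posterior_normal(tail_prob, prior_normal, prior_abnormal):
    """P(N | m) for one data point, under hypothetical likelihood choices.

    Assumptions: P(m | N) is taken to be the tail probability P(M > m), and
    P(m | A) is taken to be the constant 1 (no model of attacks).
    """
    weighted_normal = prior_normal * tail_prob
    return weighted_normal / (weighted_normal + prior_abnormal * 1.0)

print(posterior_normal(0.5, 0.99, 0.01))    # ordinary value
print(posterior_normal(0.001, 0.99, 0.01))  # rare value
```

With π(N) = 0.99 and π(A) = 0.01, an ordinary value (tail probability 0.5) stays about 98% likely to be normal, while a rare value (tail probability 0.001) drops to about 9%.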
Anomaly detection of the entire data vector
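Calculation idea: the host is normal only if all of its data points are normal; every other case counts as abnormal. Therefore, writing m_i for the value at collection point i, the abnormal probability of the entire host is:
$$P(A) = 1 - \prod_{i} P(N \mid m_i)$$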
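The whole-host combination is a one-liner in Python; a minimal sketch, assuming the per-point normal probabilities P(N | m_i) have already been computed and the collection points are independent. `host_anomaly_probability` is a hypothetical name.

```python
import math

def host_anomaly_probability(normal_posteriors):
    """P(A) for a whole data vector: 1 - product of per-point P(N | m_i).

    Valid only if the collection points are independent.
    """
    return 1.0 - math.prod(normal_posteriors)

print(host_anomaly_probability([0.98, 0.95, 0.09]))
```

Here one clearly unusual point (P(N | m) = 0.09) is enough to push the host’s anomaly probability to about 0.92, even though the other two points look normal.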
The whole-host calculation assumes that the collection points are independent of one another. In practice, data from many collection points are correlated, so the correlation matrix of the historical data has to be computed. This process is fairly involved; the general idea for removing the correlation is:
- Compute the probability density function of each data collection point;
- Compute the transition matrix of the historical data from these probability density functions;
- Compute the correlation matrix from the transition matrix;
- Derive a probability threshold from the correlation matrix;
- For each data vector, discard the data points whose probability exceeds this threshold, leaving a reduced vector of n′ data points, with n′ < N;
- From the probability density functions, construct a transition vector of the n′ data points and a corresponding new probability distribution function.
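The transition-matrix procedure above is only outlined, so the Python sketch below swaps in a simpler, different technique to illustrate the same goal of reducing N correlated collection points to n′ less correlated ones: greedy pruning by Pearson correlation of the historical series. The 0.9 threshold, the helper names and the sample data are all hypothetical, and each series must be non-constant (zero variance would divide by zero).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def decorrelate(history, threshold=0.9):
    """Return indices of the collection points to keep (n' <= N).

    history[k] is the list of historical values for collection point k.
    A point is dropped if |r| > threshold with any point already kept.
    """
    kept = []
    for k, series in enumerate(history):
        if all(abs(pearson(series, history[j])) <= threshold for j in kept):
            kept.append(k)
    return kept

history = [
    [10, 20, 30, 40, 50],   # bytes out
    [1, 2, 3, 4, 5.5],      # packets out (closely tracks bytes out)
    [5, 3, 6, 2, 4],        # DNS requests
]
print(decorrelate(history))
```

Packets out is dropped because it is almost perfectly correlated with bytes out, so three collection points are reduced to two. Unlike the procedure in the text, this greedy rule works on whole collection points rather than per-vector probabilities.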
In short, after the correlation is removed, the N input data points become n′ transition-vector data points with their own probability distribution, which reduces the dimensionality of the data. The anomaly probability after removing the correlation is then:
After Darktrace computes the anomaly probability, it must present an understandable alert to the administrator. Beyond the size of the probability, the administrator also needs to understand how severe the event is.
By Bayes’ theorem, the Bayes factor in the formula for the probability of an abnormal event can serve as a measure of severity. The result, however, also depends on the prior probabilities π(N) and π(A) of normal and abnormal events in the network environment. To remove the dependence on these two priors, one can pick a reference event, compute the anomaly severity of both events, and subtract them to obtain a relative severity value:
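Why subtracting a reference event removes the priors can be checked numerically. The Python sketch below assumes severity is measured as the log posterior odds of abnormality, which equals the log prior odds plus the log Bayes factor; for two events under the same priors, the prior term cancels in the difference. The likelihood choices (P(m | N) = P(M > m), P(m | A) = 1) and all function names are hypothetical.

```python
import math

def posterior_abnormal(tail_prob, prior_abnormal):
    # Hypothetical likelihoods: P(m | N) = P(M > m), P(m | A) = 1
    prior_normal = 1.0 - prior_abnormal
    return prior_abnormal / (prior_abnormal + prior_normal * tail_prob)

def log_posterior_odds(p_abnormal):
    return math.log(p_abnormal / (1.0 - p_abnormal))

def relative_severity(p_event, p_reference):
    """Difference of log posterior odds between an event and a reference.

    log posterior odds = log prior odds + log Bayes factor, so when both
    events share the same priors the prior term cancels and only the
    difference of log Bayes factors remains.
    """
    return log_posterior_odds(p_event) - log_posterior_odds(p_reference)

for prior in (0.01, 0.2):
    event = posterior_abnormal(1e-4, prior)      # very rare observation
    reference = posterior_abnormal(1e-2, prior)  # reference event
    print(prior, relative_severity(event, reference))
```

Both priors print the same relative severity, log(100) ≈ 4.61, because under these assumptions the log Bayes factor of each event is simply −log P(M > m).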
In practice, a threshold must be set to decide which events are severe enough to be handled first. Empirically, the following functions are used to rescale the severity so that it falls in a convenient range, such as [0, 100].
The severity value can then be rescaled to:
where α and β are adjustment parameters.
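The exact rescaling function is not given here, so the Python sketch below substitutes one plausible choice: a logistic curve that maps any relative severity into (0, 100), with α as the slope and β as the midpoint. Both the functional form and the sample parameter values are assumptions for illustration only.

```python
import math

def scaled_severity(s, alpha, beta):
    """Map a relative severity s into (0, 100) with a logistic curve.

    Hypothetical: alpha sets the slope and beta the midpoint; this is not
    necessarily the adjustment function used in the source.
    """
    return 100.0 / (1.0 + math.exp(-alpha * (s - beta)))

for s in (0.0, 4.6, 10.0):
    print(s, round(scaled_severity(s, alpha=0.8, beta=4.6), 1))
```

An event with severity equal to β scores exactly 50, and increasing α makes the transition from low to high scores sharper, which is convenient when choosing an alerting threshold on a fixed [0, 100] scale.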