Anomaly Models

Here we post our collection of anomaly models to be used with FLAME. We have three different classes of anomalies: network scans, spam, and denial of service attacks.

Network scans

We have captured and analyzed the characteristics of five different network scans: the Nachi scan, an SSH scan, a Radmin scan, a DCOM/RPC scan, and a Netbios scan. Network scans are by far the most commonly observed class of anomalous activity. They typically originate at a single source IP address and target many different destinations. While the time of large worm outbreaks has passed, we still occasionally observe scanning activity related to worms such as Nachi, Blaster, or Welchia. Network scans may use TCP, UDP, or ICMP as the transport protocol. Scans are of importance to network administrators when they originate from internal hosts, since this might be a sign of an infection on the affected machine.

Nachi Scan
We found several instances of an ICMP scan in our traces that can be attributed to the Nachi worm released in 2003. Nachi uses fixed-size 92-byte ICMP echo packets to scan for vulnerable hosts and is thus easy to recognize. The Nachi scan flow attributes have the following characteristics.

Transport protocol: is set to ICMP.
Source IP addresses: are set to the address of the scanning host.
Source port numbers: are set to ICMP code 0 type 8, which corresponds to echo-request messages.
Destination port numbers: are set to 0 or not set.
Flow sizes: are set to 1 packet and 92 bytes.
Flow durations: are set to 0 msecs, which is the default for flows that contain 1 packet.
Destination IP addresses: show the periodic pattern shown in the Figure below, which resembles a fishbone with two different cycles. Every 200 flows we observe a few positive and negative shifts by 400 IP addresses. Moreover, every 800 flows we observe periods with shifts of 45 to 110 IP addresses. All remaining flows have IP address differences in the range [-40:40].
Inter-arrival times: show a periodic behavior that has a stochastic component. The Figure below shows, for flows with index i, the inter-arrival time ti measured relative to the previous flow. After five scans that arrive at the router with an inter-arrival time of 0 or 1 msec, no flow is observed for 61 msec until the next scan period starts. Furthermore, we do not see any flows in reply to the Nachi scans in our traces, probably because the targeted network filters these packets.
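
The Nachi timing pattern described above can be sketched as a small generator. This is only an illustration and not FLAME's own implementation: the function name, the seed handling, and the equal probability of 0 and 1 msec gaps are assumptions. The burst length of five flows and the 61 msec pause come from the description.

```python
import random

def nachi_interarrival_times(n_flows, seed=0):
    """Return n_flows inter-arrival times in msec for a Nachi-like scan.

    Assumptions: the first flow of each burst of five carries the
    61 msec pause, and gaps of 0 and 1 msec are equally likely
    inside a burst.
    """
    rng = random.Random(seed)
    times = []
    for i in range(n_flows):
        if i % 5 == 0:
            times.append(61)                  # pause before a new burst
        else:
            times.append(rng.choice((0, 1)))  # back-to-back scans in a burst
    return times

# Fixed flow attributes of the Nachi scan, as listed above.
NACHI_FLOW = {"protocol": "ICMP", "packets": 1, "bytes": 92, "duration_msec": 0}
```

Summing the returned gaps yields flow start times relative to the first burst.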

SSH scan
Password-probing scans for SSH are quite common in today's networks. We have extracted two SSH scan instances from the SWITCH traces. Both instances show very similar behavior.

Transport protocol: is set to TCP.
Source IP addresses: are set to the address of the scanning host.
Destination port numbers: are set to 22.
Destination IP addresses: show a very irregular pattern that contains many atomic scan periods of 30 to 200 flows that each cover a range of approximately 400 IP addresses. In the Figure below we plot the histogram of the difference in destination IP addresses between consecutive flows within such an atomic scan period. After each period a positive or negative shift by 200 to 400 IP addresses occurs.
Source port numbers: are selected randomly from the range [32,000:61,000].
Flow sizes: Each flow contains between 1 and 4 packets, where approximately 80% of the flows have a size of 2 packets and 120 bytes.
Inter-arrival times: show a periodic behavior with two cycles and a stochastic component. The inter-arrival times for 8000 SSH scan flows are plotted in the Figure below. Every 800th flow has an inter-arrival time of 5 seconds, every 300th flow has an inter-arrival time between 10 and 50 msec, while all other flows have inter-arrival times of either 0 or 1 msec.
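
The two timing cycles of the SSH scan can be expressed with simple modular arithmetic on the flow position. The sketch below is a hedged illustration: the function name is hypothetical, the 10 to 50 msec gap is drawn uniformly, 0 and 1 msec are treated as equally likely, and the longer cycle is given precedence when both cycles coincide. None of these choices is stated in the description.

```python
import random

def ssh_interarrival_ms(index, rng):
    """Inter-arrival time in msec for the flow at 1-based position `index`."""
    if index % 800 == 0:
        return 5000                 # every 800th flow: 5 seconds
    if index % 300 == 0:
        return rng.randint(10, 50)  # every 300th flow: 10 to 50 msec
    return rng.choice((0, 1))       # all other flows: 0 or 1 msec

rng = random.Random(42)
gaps = [ssh_interarrival_ms(i, rng) for i in range(1, 8001)]
```

For 8000 flows this produces ten 5-second pauses, matching the number of long gaps one would expect in a plot of that length.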

Radmin scan
We have observed one instance of a scan on destination port number 4899. This port is used by the Radmin remote administration application. A remotely exploitable vulnerability in the Radmin server version 2.0 and 2.1 that allows for code execution was reported in July 2004. The Radmin scan has the following characteristics.

Transport protocol: is set to TCP.
Source IP addresses: are set to the address of the scanning host.
Destination port numbers: are set to 4899.
Destination IP addresses: show a highly irregular pattern. The difference between consecutive IP addresses of the majority of flows varies in the range [-40:40] as shown in the histogram given in the Figure below. In addition we observe positive or negative shifts that exhibit no particular patterns.
Source port numbers: also show a highly irregular pattern, but in addition they are limited to the interval [1,000:5,000]. The distribution of the difference between port numbers of consecutive flows is very similar to the distribution of IP address differences.
Flow sizes: 89% of all scan flows contain 2 packets and 96 bytes, while 10% of the scan flows contain 1 packet and 48 bytes.
Flow durations: 2-packet flows have a duration of either 46*64 msec or 47*64 msec.
Inter-arrival times: show a periodic behavior with two cycles and a stochastic component. The timing behavior of the Radmin scan anomaly is illustrated in the Figure below, which shows the inter-arrival times for 400 flows. Every 25th flow is received with a delay of 25 to 30 msec, and every 4000th flow is received with a delay of 1 to 2 seconds. All remaining flows have an inter-arrival time of either 0 or 1 msec.
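
The flow size and flow duration attributes of the Radmin scan can be combined into a weighted sampler. This is a minimal sketch under stated assumptions: the names are hypothetical, the roughly 1% of flows that the description does not characterize is left out, the two durations are treated as equally likely, and 1-packet flows are given a duration of 0 msec.

```python
import random

# (packets, bytes, weight) from the description. The remaining ~1% of
# flows is not characterized there and is omitted here (assumption).
RADMIN_FLOW_CLASSES = [(2, 96, 0.89), (1, 48, 0.10)]

def sample_radmin_flow(rng):
    """Draw (packets, bytes, duration_msec) for one Radmin-like scan flow."""
    weights = [w for _, _, w in RADMIN_FLOW_CLASSES]
    packets, size, _ = rng.choices(RADMIN_FLOW_CLASSES, weights=weights, k=1)[0]
    if packets == 2:
        duration = rng.choice((46, 47)) * 64  # 2944 or 3008 msec
    else:
        duration = 0  # assumption: single-packet flows last 0 msec
    return packets, size, duration
```

`random.Random.choices` normalizes the weights itself, so they do not need to sum to 1.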

DCE-RPC scan
Destination port 135 is one of the most frequently scanned ports, as various vulnerabilities have been reported in the RPC service running on this port. The well-known Blaster worm also used port 135 for propagation. DCE-RPC flows have the following characteristics.

Transport protocol: is set to TCP.
Source IP addresses: are set to the address of the scanning host.
Destination port numbers: are set to 135.
Flow sizes: are set to 3 packets and 144 bytes as port 135 is open on most machines.
Flow durations: are set to either 19*64, 20*64, or 21*64 msec.
Destination IP addresses: do not show any regular patterns. The difference between successively scanned IP addresses varies in the range [-256:256]. Their distribution is depicted in the Figure below. Note that this distribution differs from the ones we have previously encountered. Additionally, we find random shifts at irregular times.
Source port numbers: are irregular as well, but in addition they are limited to the range starting at 1,200 and ending at 4,800. The range of variation between port numbers of successive flows is approximately [-200:200]. Again, the distribution for source port differences resembles the distribution of IP address differences. Additionally, we have a periodic component that introduces a positive shift of 250 to 500 source port numbers after 300 to 600 received flows and has the effect that certain port ranges are skipped.
Inter-arrival times: show a periodic behavior with a stochastic component. Every 256th flow has a delay of 2 seconds, while all other flows have an inter-arrival time of either 0 or 1 msec.
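
The source port behavior of the DCE-RPC scan amounts to a bounded random walk with an occasional large positive shift. The following sketch illustrates that idea only. The uniform step distribution is an assumption that stands in for the measured histogram, and wrapping around at the end of the port range is also an assumption, as are all names.

```python
import random

PORT_MIN, PORT_MAX = 1200, 4800  # observed source port range

def rpc_source_ports(n_flows, seed=0):
    """Return n_flows source ports for a DCE-RPC-like scan."""
    rng = random.Random(seed)
    span = PORT_MAX - PORT_MIN + 1
    port = rng.randint(PORT_MIN, PORT_MAX)
    until_shift = rng.randint(300, 600)  # flows until the next large shift
    ports = []
    for _ in range(n_flows):
        ports.append(port)
        until_shift -= 1
        if until_shift == 0:
            step = rng.randint(250, 500)   # periodic shift skips a port range
            until_shift = rng.randint(300, 600)
        else:
            step = rng.randint(-200, 200)  # assumption: uniform differences
        # Assumption: wrap around to stay inside the observed range.
        port = PORT_MIN + (port + step - PORT_MIN) % span
    return ports
```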

Netbios scan
We found two instances of scans for the Netbios service, which runs on UDP port 137, in our traces. Several vulnerabilities for the Netbios service exist.

Transport protocol: is set to UDP.
Source IP addresses: are set to the address of the scanning host.
Destination port numbers: are set to 137.
Source port numbers: are set to a fixed value larger than 10,000.
Flow sizes: are set to 1 packet and 78 bytes.
Flow durations: are set to 0 msec.
Destination IP addresses: show a periodic behavior. The IP addresses for 100 to 200 flows are selected sequentially until a negative shift of 60 to 70 IP addresses occurs. The sequential target selection behavior within each scan interval of the Netbios scan is plotted in the Figure below. Most of the time the scanner simply increases the destination IP address by one. However, from time to time we observe a positive shift of 2, i.e., one IP address is skipped, followed by a negative shift of 1, i.e., the missed IP address is scanned, followed by a positive shift of 2, i.e., the normal scanning continues.
Inter-arrival times: show a periodic behavior. The timing behavior of the Netbios scan is plotted in the Figure below. Every 5th scan has a delay of 0 msec, while all other scans have an inter-arrival time between 60 and 70 msec. Hence, this Netbios scan is considerably slower than the previously analyzed scans.
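
The sequential target selection of the Netbios scan, including the skip pattern (+2, -1, +2) and the backward jump after 100 to 200 flows, can be sketched as follows. Offsets are relative to a hypothetical start address. The parameter `skip_prob` is an assumption, since the description only says that skips happen "from time to time".

```python
import random

def netbios_ip_offsets(n_flows, skip_prob=0.05, seed=0):
    """Return destination IP offsets for a Netbios-like sequential scan."""
    rng = random.Random(seed)
    offsets = []
    current = 0
    run_left = rng.randint(100, 200)  # flows until the next backward jump
    pending = []                      # remaining steps of a skip pattern
    while len(offsets) < n_flows:
        offsets.append(current)
        run_left -= 1
        if pending:
            step = pending.pop(0)        # finish the +2, -1, +2 pattern
        elif run_left <= 0:
            step = -rng.randint(60, 70)  # negative shift ends the interval
            run_left = rng.randint(100, 200)
        elif rng.random() < skip_prob:
            step = 2                     # skip one address ...
            pending = [-1, 2]            # ... scan it, then continue
        else:
            step = 1                     # plain sequential scanning
        current += step
    return offsets
```

A started skip pattern is always completed before a backward jump is applied, so the jump may be delayed by up to two flows.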

Spam

We did not find any anomalies related to e-mail spam such as massive spam campaigns caused by botnets in the three analyzed weeks of data. Instead, we detected several instances of Windows Messenger pop-up spam. We call them variant A and variant B. Windows Messenger Popup spam targets UDP destination ports 1026 and 1027.

Popup Spam Variant A
Transport protocol: is set to UDP.
Source IP addresses: are set to the address of the host that is sending the spam.
Destination port numbers: are set to 1026 or 1027.
Flow sizes: are set to 1 packet and 925 bytes.
Flow durations: are set to 0 msec.
Inter-arrival times: show a periodical behavior with two cycles and a stochastic component. Approximately every 200th flow has a delay of 64 msec, and every 550th flow has a delay of 250 msec. The remaining flows have an inter-arrival time of either 0 or 1 msec.
Destination IP addresses: show no regular patterns. The difference distribution of variant A is shown in the Figure below. IP address difference values vary in the range [-200:200].
Source port numbers: Variant A selects the source port sequentially from the range [32,000:61,000]. The difference between source ports of consecutive flows varies in the range [-1,000:1,000] and resembles the distribution of IP address differences. Additionally we observe a periodical component: After 550 flows a positive shift of 2,000 source port numbers occurs.

Popup Spam Variant B
In the following, we report only the attributes that differ from popup spam variant A.

Destination IP addresses: Variant B selects destination IP addresses more or less randomly from blocks of 3000 IP addresses according to the difference distribution given in the Figure below. In this distribution the spikes at multiples of 256 IP addresses are interesting. However, no regular pattern involving multiples of 256 IP addresses is visible. After approximately 300 flows the next block of IP addresses is used.
Source port numbers: Variant B uses a different mechanism for sequential port selection. It randomly chooses a source port number to start with. After 550 flows have been sent with the same source port, it increases the port number by 1 to 4 ports.
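
The source port mechanism of variant B is simple enough to sketch directly. The range for the random start port and the cap at 65535 are assumptions, as the description does not state them, and the function name is hypothetical.

```python
import random

def popup_b_source_ports(n_flows, flows_per_port=550, seed=0):
    """Return n_flows source ports for popup spam variant B."""
    rng = random.Random(seed)
    port = rng.randint(1024, 60000)  # assumption: start range not given
    ports = []
    for i in range(n_flows):
        if i > 0 and i % flows_per_port == 0:
            # After 550 flows, move on by 1 to 4 ports (capped at 65535).
            port = min(port + rng.randint(1, 4), 65535)
        ports.append(port)
    return ports
```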

Denial of Service

The third large group of anomalies that we have found are denial of service (DoS) attacks. DoS attacks have been extensively studied in previous work. Mirkovic et al. provide a taxonomy of DDoS attacks and defense mechanisms. We complement this work by providing a detailed analysis of the network behavior for different types of denial of service attacks such as UDP bandwidth flood or TCP SYN flood.

UDP Bandwidth Flood Variant A
We have found three instances of two one-to-one UDP bandwidth flood variants. Again, we call them variant A and variant B. In the following we report the characteristics for variant A.

Transport protocol: is set to UDP.
Source IP addresses: are set to the address of the attacking host.
Destination IP addresses: are set to the address of the victim host.
Flow sizes: are set to 1 packet and 540 bytes for variant A.
Source port numbers: are selected uniformly from the range [x:x+19] where x is randomly chosen.
Destination port numbers: are selected sequentially between 20 and 1024. The destination port number is increased by 1 with every flow that is sent.
Inter-arrival times: show a periodical behavior. The flow inter-arrival time distribution of variant A is depicted in the Figure below. Every 40th flow has a delay of 60 or 120 msec, while the remaining flows have shorter inter-arrival times of 0 or 1 msec.
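
The port selection of UDP bandwidth flood variant A can be sketched with a cyclic iterator for the destination port and a uniform draw for the source port. Two points are assumptions: the range from which the base port x is drawn, and the wrap-around to port 20 once port 1024 has been reached. The function name is hypothetical.

```python
import itertools
import random

def udp_flood_a_ports(n_flows, seed=0):
    """Return (source_port, destination_port) pairs for n_flows flows."""
    rng = random.Random(seed)
    x = rng.randint(1024, 65535 - 19)  # assumption: range of base port x
    # Assumption: destination ports restart at 20 after reaching 1024.
    dst_ports = itertools.cycle(range(20, 1025))
    return [(rng.randint(x, x + 19), next(dst_ports)) for _ in range(n_flows)]
```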

UDP Bandwidth Flood Variant B
Again, we report only the differences from variant A of this attack.

Flow sizes: are set to 1 packet and 1028 bytes for variant B.
Source port numbers: are selected randomly from the interval [1:6,000].
Destination port numbers: are selected from the range [1,000:5,000]. The distribution of port number differences between consecutive flows is shown in the Figure below. The positive and negative difference values of 200 to 300 port numbers stem from the fact that two processes with smaller port differences run in parallel.
Inter-arrival times: show a periodical behavior. Every 75th flow is received with a delay of 60 msec, while all other flows have inter-arrival times of 0 or 1 msec.

TCP Flood Variant A
We have observed two instances of one-to-one TCP floods on destination port 80. Both attacks target the same web server.

Transport protocol: is set to TCP.
Source IP addresses: are set to the address of the attacking host.
Destination IP addresses: are set to the address of the victim host.
Destination port numbers: are set to 80.
Flow sizes: are set to 3 packets and 128 bytes.
Flow durations: are set to either 11*64 or 12*64 msec.
Source port numbers: are selected from the interval [1,000:3,000] and the difference between consecutive flows shows the regular but rather complex pattern depicted in the Figure below.
Inter-arrival times: Every 10th flow of TCPFlood-A has a delay of either 60 or 120 msec, while all other flows are sent with an inter-arrival time of either 0 or 1 msec.

TCP Flood Variant B
Flow sizes: are set to 1 packet (26.4%) or 2 packets (73.6%).
Flow durations: 2-packet flows have lengths between 2*64 msec and 15*64 msec. 1-packet flows have a length of 0 msec.
Source port numbers: show no particular patterns and are selected from the interval [49,000:65,400]. The difference between consecutive flows has the distribution shown in the Figure below.
Inter-arrival times: Every 10th flow of TCPFlood-B has a delay between 21 and 35 msec, while the remaining flows are sent with inter-arrival times less than or equal to 1 msec.

TCP Backscatter
We found 11 instances of TCP backscatter in the SWITCH traces. Backscatter flows are replies of a DoS victim that has been flooded with packets carrying spoofed source IP addresses. The replies of the victim are then routed towards the owner of the spoofed address space.

Transport protocol: is set to TCP.
Source IP addresses: are set to the victim of the DoS attack.
Flow sizes: are set to 1 packet and 44 or 46 bytes.
Destination port numbers: are selected randomly from the interval [1,000:2,000] according to the distribution given in the Figure below.
Destination IP addresses: show no regular pattern. The difference in destination IP addresses between consecutive flows varies in the range [-600:600].
Inter-arrival times: show a periodic behavior with three cycles. Approximately every 720th flow has an inter-arrival time of 1 msec, every 3000th flow has a delay of 60 msec, and every 8000th flow has a delay of 380 msec. The remaining flows have inter-arrival times of 0 msec.