P4-based IPS-Charles H.-P. Wen

Abstract

IoT security is increasingly important as the number of IoT devices connected to the Internet grows rapidly. An intrusion prevention system (IPS) detects and blocks malicious packets. Besides hardware IPSs, another approach uses SDN with the VNF technique, deploying software (e.g., Zeek) in a VM or on a server. However, existing SDN-based IPS methods face one major challenge: long response time.

 

P4-IPS: In-switch malware detection and blocking

  • Flow Filter: Matches packets against the malware detection table and either applies the matched action (e.g., forward, drop) or passes them on for feature extraction.
  • Feature Extractor: Mirrors packets, rebuilt with self-defined headers and a truncated payload, to the control plane. ⇨ Offloads the forwarding pipeline
  • Malware Detector: Uses multiple threads to classify packets in parallel with the neural network model and adds entries to the table. ⇨ Reduces packet detection time
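
The interaction of the three components can be sketched in Python as follows. This is only an illustration under stated assumptions: `flow_table`, `classify`, and the flow keys are hypothetical names, and the byte-pattern check merely stands in for the neural network model, which is not described here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the malware detection table.
flow_table = {}  # flow key -> "forward" or "drop"


def classify(features):
    """Placeholder for the neural network model (assumption, not the real model)."""
    return "drop" if b"EVIL" in features else "forward"


def flow_filter(flow_key):
    """Return the installed action, or None if the flow still needs feature extraction."""
    return flow_table.get(flow_key)


def detect_and_install(item):
    """Classify one mirrored packet and add a table entry for its flow."""
    flow_key, features = item
    action = classify(features)
    flow_table[flow_key] = action
    return flow_key, action


def run_detector(mirrored, workers=8):
    """Classify mirrored packets in parallel with a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(detect_and_install, mirrored))


mirrored = [
    (("10.0.0.1", "10.0.0.2", 80), b"GET /index"),
    (("10.0.0.3", "10.0.0.2", 443), b"EVIL payload"),
]
results = run_detector(mirrored)
```

Once an entry is installed, later packets of the same flow are handled by the table lookup alone, without consulting the detector again.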

Fig. 1: P4-IPS Architecture

Environment setting

Host1 acts as the sender and uses "tcpreplay" to send flows from the pcap file.

Fig. 2: Simple Test Environment

Evaluation

Zeek has a processing capacity of less than 2 flow/s and a response time of 119.63 ms.

P4-IPS processing capacity: 2950 flow/s with a single thread and 9345 flow/s with 8 threads (about 4672x that of Zeek)

P4-IPS average response time: 0.339 ms with a single thread (about 352x faster than Zeek)
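
The quoted ratios follow directly from the reported numbers. The short check below recomputes them; treating Zeek's capacity as exactly 2 flow/s is an assumption (the text only gives it as an upper bound), so the capacity gain is a lower bound.

```python
# Reported figures from the evaluation above.
zeek_capacity = 2.0         # flow/s, upper bound reported for Zeek (assumed exact here)
zeek_response_ms = 119.63   # ms
p4_single_thread = 2950.0   # flow/s
p4_eight_threads = 9345.0   # flow/s
p4_response_ms = 0.339      # ms

capacity_gain = p4_eight_threads / zeek_capacity       # lower bound on the gain
response_gain = zeek_response_ms / p4_response_ms      # speedup in response time
thread_scaling = p4_eight_threads / p4_single_thread   # gain from 1 to 8 threads

print(f"capacity gain over Zeek: at least {capacity_gain:.1f}x")
print(f"response time speedup:   {response_gain:.1f}x")
print(f"8-thread scaling:        {thread_scaling:.2f}x")
```

The last ratio shows that going from one to eight threads gives roughly a threefold gain rather than an eightfold one.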

Fig. 3: Processing Capacity for Edgecore Wedge 100BF-32X

 

 

Using a P4 Hardware Switch to Block Trackers and Ads for All Devices on an Edge Network-Shie-Yuan Wang

Abstract

Nowadays, when a user downloads a web page, many unwanted trackers and advertisements are embedded in it. To solve this problem, in this paper we design and implement a method inside a P4 hardware switch to block trackers and advertisements. Our method is deployed at a switch that connects an edge network to the Internet. With this configuration, all devices on the edge network are automatically protected without the need to install per-device protection software.

Fig. 1: Ads on the web page

Design and Implementation

We use a P4 hardware switch to parse all DNS request packets passing through it. We also import a filter list that contains many domain names related to trackers and advertisements. If the domain name carried in a DNS request packet is on the filter list, our method returns 0.0.0.0 in the DNS reply packet as the resolved IP address for the queried domain name. HTTP requests to these domains then fail because the IP address is invalid.
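
The filtering decision can be illustrated in software. The sketch below is not the P4 implementation; it assumes an exact-match filter list with hypothetical entries, and `build_query`, `parse_qname`, `resolve`, and the `upstream` callback are illustrative names.

```python
import struct

# Hypothetical filter list entries; the real list is imported from an external source.
FILTER_LIST = {"ads.example.com", "tracker.example.net"}


def build_query(name, txid=0x1234):
    """Build a minimal DNS query for an A record (for demonstration only)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_qname(packet):
    """Extract the queried domain name from the question section."""
    labels, offset = [], 12  # the DNS header is 12 bytes long
    while packet[offset] != 0:
        length = packet[offset]
        labels.append(packet[offset + 1: offset + 1 + length].decode("ascii"))
        offset += 1 + length
    return ".".join(labels).lower()


def resolve(packet, upstream):
    """Answer 0.0.0.0 for filtered names; otherwise defer to the upstream resolver."""
    name = parse_qname(packet)
    if name in FILTER_LIST:
        return "0.0.0.0"
    return upstream(name)
```

For example, `resolve(build_query("ads.example.com"), upstream)` yields `0.0.0.0` without ever calling `upstream`, while a name that is not on the list is resolved normally.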

Performance and Evaluation

We activated our method and observed its blocking effects on many commercial websites in this test. To clearly show its effectiveness, we selected Yahoo’s home page because it always displays many ads. Fig. 1 shows the same web page with and without ads. The page was downloaded successfully without losing its original contents. The differences between the top and bottom pages are visible to the naked eye: our method blocked the ads and thus left blank space at the top and bottom-right corner of the page.

We measured how much ad traffic the “P4 ad blocker” blocks relative to the original traffic. We collected 34 websites that contain many trackers and ads, wrote a Python script to browse these websites automatically and repeatedly, and recorded the difference in traffic with and without our method. As shown in Fig. 2, the “P4 ad blocker” reduces the number of “All protocols” packets by 40.92% and the number of TCP-only packets by 40.33%. We also compared the average number of “All protocols” bytes and TCP-only bytes. After enabling the “P4 ad blocker,” the number of “All protocols” bytes is reduced by 20.73%, while the number of TCP-only bytes is reduced by 20.55%.

Fig. 2: P4 AD Blocker

 

Enhancing the Security of a Private Network by Using A Multi-level Hierarchical NAT Scheme-Shie-Yuan Wang

Motivation and Objective

In this work, we exploit NATs and propose a multi-level hierarchical NAT scheme to protect and enhance the security of a private network. We have implemented NAT in P4 hardware switches and cascaded them so that protected hosts can hide behind multi-level NATs. Based on the mechanism of NAT, our scheme prevents adversaries on the Internet from directly accessing the hosts in a private network unless they compromise all of the NATs in the hierarchy.

Implementation

We have implemented both a software-based NAT scheme and a hardware-based NAT scheme. For the software-based NAT, we configured Netfilter with iptables on an Ubuntu 18.04 Linux machine. We implemented the hardware-based NAT in a P4 hardware switch in two versions: static and dynamic. In the static version, we installed predefined entries into the NAT table. In the dynamic version, we ran a local controller on the P4 hardware switch that dynamically installs entries into the NAT table when it detects the first packet of a new outbound flow.
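
The behavior of the dynamic version, and of cascading several NATs, can be modeled in a few lines of Python. This is a simplified sketch: the class and function names, the port allocation starting at 40000, and keying flows only by source address and port are all assumptions, not details of the P4 implementation.

```python
import itertools


class DynamicNat:
    """Toy model of one NAT level with entries installed on demand."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_out(self, src_ip, src_port):
        key = (src_ip, src_port)
        if key not in self.outbound:  # first packet of a new outbound flow
            port = next(self._ports)
            self.outbound[key] = port
            self.inbound[port] = key
        return self.public_ip, self.outbound[key]

    def translate_in(self, dst_port):
        """Return the private endpoint, or None when no entry exists (packet dropped)."""
        return self.inbound.get(dst_port)


def through_hierarchy(nats, src_ip, src_port):
    """Apply the outbound translation of each NAT level, innermost first."""
    for nat in nats:
        src_ip, src_port = nat.translate_out(src_ip, src_port)
    return src_ip, src_port
```

In this model, an inbound packet reaches a protected host only if every level already holds a matching entry, which mirrors why an outside adversary cannot reach the host directly.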

Experimental Setup

There are two scenarios for evaluation. In Fig. 1 (top), each host is behind a 2-level NAT scheme. We used iPerf to confirm functionality. In Fig. 1 (bottom), the client host is on the left and behind an N-level NAT scheme, where N can be 1, 2, 3, or 4. We used iPerf, curl, and ping to evaluate the performance. We also used a simple Python program to test the stability of our scheme. In all of the topologies used for experiments, the bandwidth of every link is 10 Gbps, and the length of every link is 1 meter.

Fig. 1: The Experiment Topologies

Evaluation

In the functionality test, experimental results confirm that our schemes can function correctly.
In Fig. 2 (left), the throughput of software-based NAT gradually declines as the number of hops (i.e., the value of N) increases. However, the throughput of the hardware-based NAT is not affected. In Fig. 2 (right), the latency of dynamic hardware-based NAT increases with the number of hops and is higher than the other two schemes.

Fig. 2: iPerf and curl testing results

 

In Fig. 3 (left), the latency of software-based NAT rises linearly as the number of hops increases, whereas the latency of static hardware-based NAT is not affected. To gain insight into the phenomenon in Fig. 2 (right), we used ping to measure the latency of the first few packets of a new outbound flow. In Fig. 3 (right), except for the first ping packet, the latency of all subsequent ping packets dropped to only 0.1x milliseconds. This means that our simplified controller is the primary reason for the large latency of dynamic hardware-based NAT.

Fig. 3: curl and ping testing results

After the 3-hour test, all connections initiated from the client host had finished successfully. This result shows that our dynamic hardware-based NAT scheme can operate stably under high load.

 

 

Video Link: https://youtu.be/KvD2ju0bCrE

 

Longer Stay Less Priority: Flow Length Approximation Used in Information-Agnostic Traffic Scheduling in Data Center Networks-Chien Chen

Abstract

Numerous scheduling approaches have been proposed to improve user experience in a data center network (DCN) by reducing flow completion time (FCT). Mimicking shortest job first (SJF) has proven to be a prominent way to improve FCT. To do so, some approaches require flow size or completion time information in advance, which is not available in scenarios like HTTP chunked transfer or database query responses. Some information-agnostic schemes require end hosts to count the number of bytes sent. We present Longer Stay Less Priority (LSLP), an information-agnostic flow scheduling scheme that, like the Multi-Level Feedback Queue (MLFQ) scheduler in operating systems, aims to mimic SJF using P4 switches in a DCN. LSLP initially treats all flows as short flows and assigns them to the highest priority queue; flows are demoted to lower priority queues over time. LSLP estimates the active time of a flow by leveraging the programmable nature of state-of-the-art P4 switches. It estimates the active time of a group of new flows that arrive during a time interval and assigns their packets to the highest priority. At the beginning of the next time interval, arriving packets of old flows are placed one priority lower, except for those already in the lowest priority queue. Therefore, short flows can complete in the few higher priority queues while long flows are demoted to lower priority queues. We have evaluated LSLP via a series of tests and shown that its performance is comparable to that of existing scheduling schemes.

 

LSLP Mechanism

Beginning of new time slot

  • All new flows (arriving during a time slot) are grouped into one version: their packets are assigned the same priority
  • All existing flows: demoted by one level, except those already at the lowest priority
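
Under this rule, the priority of a flow depends only on how many time slots have passed since it first appeared. A minimal sketch, assuming 8 priority queues (a value not given here) with 0 as the highest priority; `queue_priority` is a hypothetical helper name:

```python
NUM_QUEUES = 8  # assumed number of priority queues; 0 is the highest priority


def queue_priority(first_seen_slot, current_slot, num_queues=NUM_QUEUES):
    """Demote a flow by one level per elapsed time slot, capped at the lowest queue."""
    age = current_slot - first_seen_slot
    return min(age, num_queues - 1)
```

A flow that finishes within its first slot never leaves the highest priority queue, while a long flow sinks to the lowest queue after `num_queues - 1` slots and stays there.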

Fig. 1: Overview of LSLP

Implementation In P4

Packet matches the Time-version table

  • On a successful match, get the TimeSlotID
  • Match the Priority table to get the Qid and submit the packet to that queue

Packet does not match the Time-version table

  • Submit the packet to the controller
  • The controller updates the Time-version and Priority tables
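
The two-table lookup with a controller fallback can be mimicked with two dictionaries. The sketch below is an assumption-laden model, not the P4 program: `LslpSwitch`, its method names, and the default of four queues are hypothetical.

```python
class LslpSwitch:
    """Toy model of the Time-version and Priority tables."""

    def __init__(self, num_queues=4):
        self.num_queues = num_queues
        self.current_slot = 0
        self.time_version = {}  # flow_id -> TimeSlotID
        self.priority = {}      # TimeSlotID -> Qid (0 is the highest priority)

    def advance_slot(self):
        """Controller action at the start of a new time slot: demote older versions."""
        self.current_slot += 1
        for slot in self.priority:
            self.priority[slot] = min(self.priority[slot] + 1, self.num_queues - 1)

    def controller_install(self, flow_id):
        """Controller action on a table miss: record the flow under the current slot."""
        self.time_version[flow_id] = self.current_slot
        self.priority.setdefault(self.current_slot, 0)

    def enqueue(self, flow_id):
        """Return the Qid for a packet of the given flow."""
        if flow_id not in self.time_version:  # miss: submit to the controller
            self.controller_install(flow_id)
        slot = self.time_version[flow_id]
        return self.priority[slot]
```

Because priorities are stored per time slot rather than per flow, demoting every flow of one version requires updating a single Priority table entry.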

Fig. 2: Workflow of LSLP

Performance Evaluation

  • Overall average FCT at different load: (a) Web search workload, (b) Data mining workload

  • Web Search (Left) and Data Mining (Right) Workload FCT across different flow sizes: (a) (0; 100KB] Avg. (b) (0; 100KB] 99th Percentile (c) (100KB; 10MB] (d) (10MB; ∞)

 

Video Link: https://youtu.be/YUVZ-BGVoYo

Neural-Network Based Malware Detection on P4 Switch-Charles H.-P. Wen

Introduction

A traditional IDS (Intrusion Detection System) costs too much time and bandwidth, so we use machine learning and a P4 switch to improve the efficiency of malware detection.

Fig. 1: IDS and P4-IDS

P4 Malware Detection

  • A machine learning model identifies malware faster than a traditional IDS.
  • A P4 switch is programmable, so we can define the fields used for ML prediction.
  • Combining the strengths of ML and P4, we propose P4 malware detection.
  • The process is shown in Fig. 2.

Fig. 2: Flowchart of P4 Malware Detection

 

In-switch P4 Tofino ASIC Pipeline side:

  • Truncate packets and send the self-defined fields to the CPU for machine learning model prediction.
  • One table to block malware.
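
To make the truncation step concrete, the sketch below packs a few flow fields into a small header and keeps only the first bytes of the payload. The header layout, the 64-byte truncation length, and all names are assumptions chosen for illustration; the actual self-defined fields are not listed here.

```python
import socket
import struct

TRUNCATE_LEN = 64  # assumed number of payload bytes kept; not taken from the source

# Hypothetical self-defined header: src IPv4, dst IPv4, src port, dst port, original length
HEADER_FMT = "!4s4sHHH"
HEADER_LEN = struct.calcsize(HEADER_FMT)

blocked_flows = set()  # stand-in for the single blocking table in the pipeline


def build_mirror(src_ip, dst_ip, src_port, dst_port, payload):
    """Build the message sent to the CPU: self-defined header plus truncated payload."""
    header = struct.pack(
        HEADER_FMT,
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        src_port,
        dst_port,
        len(payload),  # must fit in 16 bits
    )
    return header + payload[:TRUNCATE_LEN]


def pipeline_action(flow):
    """Drop packets of flows present in the blocking table; forward everything else."""
    return "drop" if flow in blocked_flows else "forward"
```

With these assumed sizes, a 1500-byte packet is reduced to 78 bytes before it reaches the CPU, which is where the bandwidth saving comes from.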

In-switch x86 CPU Platform side:

  • Identify uncertain flows with the machine learning model.
    • Neural network for fast prediction
    • Details of the NN model are presented in Fig. 3
    • Model accuracy: 99.6%
  • Add/modify entries in the pipeline

Fig. 3: NN Model (Accuracy: 99.6%)

Experiments

  • Time saving:
    • It does not mirror packets to an external device.
    • ML model prediction is faster than a software IDS.
  • Bandwidth saving:
    • The P4 switch truncates packets for the ML model.
  • Comparison with a software IDS (Zeek):
    • Identification speed improves about 200 times.
    • Response time improves about 240 times.

 

Publication: H.-F. Chang, M. I.-C. Wang, C.-H. Hung, and C. H.-P. Wen, “Enabling Malware Detection with Machine Learning on Programmable Switch,” in NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, Apr. 2022, pp. 1–5. doi: 10.1109/NOMS54207.2022.9789939.

A Novel Per-Hop Per-Flow Flow Control Scheme-Shie-Yuan Wang

Abstract

Performing flow control inside a network can effectively avoid packet loss due to buffer overflow in switches. IEEE 802.1Qbb Priority-based Flow Control (PFC) exercises a scheme to achieve this goal. But it still suffers from several serious problems such as congestion spreading, deadlock, and packet loss.

In this work, we propose a Per-hop per-Flow Flow Control scheme (PFFC) to avoid all of these problems. We design, implement, and evaluate the performance of PFFC in P4 hardware switches. Experimental results show that

  • PFFC outperforms PFC in many aspects including avoiding congestion spreading, deadlock, and packet loss
  • The bandwidth overhead of PFFC is only slightly higher than that of PFC.

Design and Implementation

Our design and implementation include three main mechanisms: (1) per-flow buffer usage accounting, (2) per-flow control frame generation, and (3) per-flow control frame reaction.

  • Per-flow Buffer Usage Accounting

Fig. 1 illustrates the process of per-flow buffer usage accounting. When a packet enters the IMAP (Ingress Match Action Pipeline), the FBU (Per-flow Buffer Usage) and PBU (Packet Buffer Usage) counters are each incremented by the packet size. When a packet is cloned in the EMAP (Egress Match Action Pipeline), PFFC recirculates the cloned packet (it re-enters the IMAP) to decrement the FBU and PBU counters.

Fig. 1: The per-flow buffer usage accounting in PFFC

  • Per-flow Control Frames Generation

If the FBU or PBU counter reaches the given threshold Xoff / Xon, a PAUSE/RESUME frame and a MIGRATE/MIGRATE-BACK frame are generated and sent upstream.
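
The accounting and frame generation steps can be sketched together. This model tracks only the FBU counter, uses made-up threshold values, and returns the two frames in an arbitrary order; `FlowAccounting` and its method names are hypothetical.

```python
XOFF = 3000  # assumed thresholds in bytes; the text gives no concrete values
XON = 1000


class FlowAccounting:
    """Toy model of per-flow buffer usage accounting with Xoff/Xon thresholds."""

    def __init__(self):
        self.fbu = {}        # flow_id -> bytes currently buffered
        self.paused = set()  # flows for which PAUSE has been sent upstream

    def on_ingress(self, flow_id, size):
        """Packet enters the ingress pipeline: increment FBU, maybe emit PAUSE/MIGRATE."""
        self.fbu[flow_id] = self.fbu.get(flow_id, 0) + size
        if self.fbu[flow_id] >= XOFF and flow_id not in self.paused:
            self.paused.add(flow_id)
            return ["PAUSE", "MIGRATE"]
        return []

    def on_recirculated_clone(self, flow_id, size):
        """Cloned packet re-enters ingress: decrement FBU, maybe emit RESUME/MIGRATE-BACK."""
        self.fbu[flow_id] -= size
        if self.fbu[flow_id] <= XON and flow_id in self.paused:
            self.paused.discard(flow_id)
            return ["RESUME", "MIGRATE-BACK"]
        return []
```

Keeping Xon below Xoff gives hysteresis, so a flow hovering near one threshold does not generate a stream of alternating control frames.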

  • Per-flow Control Frames Reaction

As shown in Fig. 2, if a flow is not paused, we direct its packets to the DEQ (Default Egress Queue). When the node receives a PAUSE frame, it pauses a PEQ (Paused Egress Queue). The MIGRATE frame directs the flow that is to be paused to a PEQ. When the node receives a RESUME frame, it resumes a PEQ. The MIGRATE-BACK frame redirects the paused flow to the DEQ.

Fig. 2: The DEQ and PEQs in an output port in PFFC
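
The reaction of an upstream node to the four control frames can be sketched as follows. The model uses a single PEQ and serves the DEQ before the PEQ; both are simplifying assumptions, and `OutputPort` is a hypothetical name.

```python
from collections import deque


class OutputPort:
    """Toy model of one output port with a DEQ and a single PEQ."""

    def __init__(self):
        self.deq = deque()
        self.peq = deque()
        self.peq_paused = False
        self.migrated = set()  # flows currently directed to the PEQ

    def on_frame(self, kind, flow_id=None):
        """React to a control frame received from the downstream node."""
        if kind == "PAUSE":
            self.peq_paused = True
        elif kind == "RESUME":
            self.peq_paused = False
        elif kind == "MIGRATE":
            self.migrated.add(flow_id)
        elif kind == "MIGRATE-BACK":
            self.migrated.discard(flow_id)

    def enqueue(self, flow_id, packet):
        """Place the packet in the PEQ if its flow was migrated, else in the DEQ."""
        queue = self.peq if flow_id in self.migrated else self.deq
        queue.append((flow_id, packet))

    def dequeue(self):
        """Send from the DEQ first; send from the PEQ only while it is not paused."""
        if self.deq:
            return self.deq.popleft()
        if self.peq and not self.peq_paused:
            return self.peq.popleft()
        return None
```

Only the migrated flow is held back while the PEQ is paused; other flows on the same port keep draining through the DEQ, which is how pausing one flow avoids blocking the rest.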

Contributions

  • PFFC avoids many serious problems in PFC such as congestion spreading, deadlock and packet loss.
  • In addition, the average flow completion time of mice flows in PFFC is shorter than that in PFC.
  • The bandwidth overhead of PFFC is only slightly higher than that of PFC.
  • PFFC is a good candidate scheme for lossless networks.

Publication: S.Y. Wang, Y.R. Chen, H.C. Hsieh, R.S. Lai, and Y.B. Lin, “A Flow Control Scheme based on Per Hop and Per Flow in Commodity Switches for Lossless Networks,” IEEE Access, Volume 9, pages 156013-166029, 2021. (Digital Object Identifier: 10.1109/ACCESS.2021.3129595)

Aggregating and Disaggregating Packets with Various Sizes of Payload in P4 Switches at 100 Gbps Line Rate-Shie-Yuan Wang

Project Description

Aggregating multiple small packets into a large packet provides many advantages. For example, multiple small packets can share a single copy of common Ethernet/IP/UDP headers to reduce the percentage of network bandwidth spent on transmitting headers. In the past, packet aggregation and disaggregation were done by a server CPU or a switch CPU, resulting in low throughputs. In this paper, we design and implement packet aggregation and disaggregation functions in the packet processing pipelines of P4 switches. Our novel designs allow packets with various sizes of payload to be aggregated and disaggregated purely in the data plane of a P4 switch. Experimental results show that the achieved throughputs of our aggregation and disaggregation methods can reach 100 Gbps, which is the line rate of the used P4 switch.

  • Propose a design to aggregate multiple small packets of various sizes into a large one in a P4 hardware switch
  • Propose a design to disaggregate a large packet back to multiple original small packets in a P4 hardware switch
  • Implement both designs in Tofino-based P4 hardware switches
  • Experimental results show that the achieved throughputs of our aggregation and disaggregation methods can reach 100 Gbps, which is the line rate of the used P4 switch.
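
The header-sharing benefit can be illustrated with a small software model. The 2-byte length prefix used to delimit payloads is purely an assumption made for this sketch and is not the in-switch design; the function names are hypothetical as well.

```python
import struct

HEADER_LEN = 14 + 20 + 8  # Ethernet + IPv4 (no options) + UDP headers, in bytes


def aggregate(payloads):
    """Concatenate payloads, each preceded by a 2-byte length (assumed framing)."""
    return b"".join(struct.pack("!H", len(p)) + p for p in payloads)


def disaggregate(blob):
    """Recover the original payloads from an aggregated payload."""
    payloads, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from("!H", blob, offset)
        offset += 2
        payloads.append(blob[offset: offset + length])
        offset += length
    return payloads


def header_share(payloads, aggregated):
    """Fraction of transmitted bytes spent on headers and framing."""
    total = sum(len(p) for p in payloads)
    if aggregated:
        overhead = HEADER_LEN + 2 * len(payloads)  # one shared header plus length prefixes
    else:
        overhead = HEADER_LEN * len(payloads)      # one full header per small packet
    return overhead / (overhead + total)
```

For three payloads of 10, 25, and 40 bytes, the overhead fraction drops from 126/201 when sent separately to 48/123 when aggregated under this framing.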

Team Members

Shie-Yuan Wang (shieyuan@cs.nctu.edu.tw)

Jun-Yi Li (gary841208c2@gmail.com)

Yi-Bing Lin (liny@nctu.edu.tw)

Open Source Code

Not available

Publications

  • S.Y. Wang, J.Y. Li, and Y.B. Lin, “Aggregating and Disaggregating Packets with Various Sizes of Payload in P4 Switches at 100 Gbps Line Rate,” Journal of Network and Computer Applications, Vol. 165, September 1, 2020.
  • S.Y. Wang, C.M. Wu, Y.B. Lin, and C.C. Huang, “High-Speed Data-Plane Packet Aggregation and Disaggregation by P4 Switches,” Journal of Network and Computer Applications, Vol. 142, pp. 98-110, 2019.
  • Y.B. Lin, S.Y. Wang, C.C. Huang, and C.M. Wu, “The SDN Approach for Aggregation/Disaggregation of Sensor Data,” Sensors, 18(7), 2018.