Simplified Networking: Crafting an Isolated Echo Server with Rust

21 December 2023

In recent days, I've been deeply immersed in exploring eBPF, particularly focusing on XDP (eXpress Data Path) and Traffic Control (TC) for some moderately complex projects. If you've been following my eBPF series, you might recall my last two articles delved into the intricacies of XDP. eBPF is a revolutionary technology, especially in the realm of networking, where it plays a pivotal role.

As my exploration into eBPF has grown, so has my understanding of its applications and the environments needed for effective experimentation. Real-world network scenarios often involve intricate segmentation and isolation – conditions that are essential to replicate for meaningful eBPF experimentation. Recognizing this, I saw an opportunity to create something that would not only aid my learning but could also be a valuable tool for others: a simple program to run TCP/UDP servers in isolated network namespaces. This setup would allow for a more realistic testing ground, crucial for experimenting with various eBPF programs, particularly those that handle network data across different layers.

Building on this need for a practical network environment, I decided to challenge myself with a new project, especially since I am still learning Rust. My goal was to create a simple and user-friendly command-line tool that could launch TCP and UDP echo servers, each in its separate network namespace. To achieve this, I used the clone syscall for creating isolated network spaces. I also set up a network bridge and veth pairs to enable communication within these spaces. This setup is particularly great for testing because I can attach different eBPF programs to the virtual interfaces to see how they work. While the project is relatively straightforward, it's been a significant part of my learning journey, allowing me to apply Rust in a practical setting and tackle some real network programming challenges.

Why an Isolated Network?

In the dynamic field of network programming, the ability to launch isolated network spaces is invaluable. This isolation offers a multitude of benefits: it ensures secure and controlled environments for testing and deploying applications, mitigates the risk of system-wide disruptions, and allows for the simulation of complex network configurations. Tools like Docker excel in this area by providing lightweight, standalone containers that replicate production environments without the overhead of full virtual machines. This isolation is especially critical for developing and experimenting with network-related programs, such as those involving eBPF, where precise control over network behavior and interactions is essential.

Why Not Just Use Docker?

Sure, Docker is a great tool for creating isolated networks, but for what I needed, it seemed a bit too much. Docker comes with a lot of features that I wouldn't use for this project. I wanted something simpler and more focused. Plus, I was looking for a good excuse to really dive into Rust programming. Building this tool from scratch gave me the perfect opportunity to learn more about Rust and network programming, while keeping things simple and tailored to my specific needs.

Understanding Process Isolation: The Basics

Process isolation is a key concept in computing where different processes are kept separate from each other, ensuring they don't interfere or compromise the overall system. Imagine it like having several different workspaces on the same desk, where each task is contained in its own area. Docker, a popular containerization platform, uses process isolation effectively. It creates containers, each acting like a mini-computer within your main computer, running its own applications and using its own isolated portion of the system resources.

Linux namespaces, chroot, and cgroups are foundational elements for achieving isolation in Linux, and they are crucial for Docker's containerization technology. Namespaces in Linux provide a way to isolate and virtualize system resources, allowing processes to run in separate environments as if they were on different machines. For instance, network namespaces isolate network interfaces, ensuring that processes in different namespaces don't interfere with each other's network communications. Chroot, short for 'change root', is a way of isolating process filesystems. It changes the apparent root directory for a process, effectively sandboxing its access to the file system. Lastly, cgroups, or control groups, manage the allocation of resources such as CPU time, system memory, network bandwidth, or combinations of these resources among user-defined groups of tasks. Together, these technologies form the backbone of Linux containerization, providing robust isolation and resource control.
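
If you want to see namespaces for yourself, every Linux process exposes the ones it belongs to as symlinks under /proc/<pid>/ns. A tiny Rust snippet (just an illustration, not part of this project) can list them for the current process:

use std::fs;

fn main() -> std::io::Result<()> {
    // Each entry under /proc/self/ns is a symlink like "net -> net:[4026531840]".
    // Two processes that share a namespace point at the same inode number.
    for entry in fs::read_dir("/proc/self/ns")? {
        let entry = entry?;
        let target = fs::read_link(entry.path())?;
        println!("{:?} -> {:?}", entry.file_name(), target);
    }
    Ok(())
}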

In a highly simplified explanation, when you create a container using a platform like Docker, which internally utilizes containerd and runc, what actually happens is that a new process gets initiated. This process is then moved into its own set of isolated namespaces. These namespaces include network (for isolating network interfaces), PID (for process ID isolation), UTS (for hostname isolation), among others. Alongside this, Docker uses chroot to change the apparent root directory for the container, effectively sandboxing its filesystem. Additionally, cgroups are employed to manage and limit the container's resource usage, such as CPU and memory. This setup is more complex than it sounds, but it's what allows each container to work like it's in its own little world. This means every container is kept separate from the others and from the main computer it's running on.

A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes. One use of namespaces is to implement containers. https://man7.org/linux/man-pages/man7/namespaces.7.html

If you're interested in delving deeper into this topic, I highly recommend checking out a couple of YouTube videos by Liz Rice. Her explanations are fantastic for gaining a more in-depth understanding of containers and how they work from the ground up. Liz Rice - Containers from Scratch

For this project, I don't need to use all the capabilities that namespaces, chroot, and cgroups offer. I'm not trying to build a full-blown containerization system like Docker. My aim is simpler: to run a server in its own network space using just network namespaces. This way, I can quickly launch a server with a single command, and it will have its own isolated network area without the complexities of a complete container setup.

The clone Syscall

Like I mentioned earlier, I'm going to use the clone syscall for this project. But it's worth noting that there's another syscall, unshare, that can do similar things. clone is great for creating new processes that start out already separated into their own network namespaces, while unshare lets an already-running process (the caller) detach itself into new namespaces. Both are pretty handy tools in Linux when you want to create isolated environments, like what I need for my server.

By contrast with fork(2), these system calls provide more precise control over what pieces of execution context are shared between the calling process and the child process. For example, using these system calls, the caller can control whether or not the two processes share the virtual address space, the table of file descriptors, and the table of signal handlers. These system calls also allow the new child process to be placed in separate namespaces(7). https://man7.org/linux/man-pages/man2/clone.2.html

The reason I'm opting for clone over unshare is because of the specific network setup I'm planning. After creating a new process with clone, the parent process needs to perform some network configurations. This setup is a bit easier to manage with clone, as it allows the parent process to set up the network right after the new process starts. Essentially, clone fits well with the flow of creating and then immediately configuring isolated network spaces for my servers, simplifying the whole process.

+--------------------------+         +----------------------------------------------+
| Parent Process           |         | Child Process                                |
| 1. Executes clone        | ------> | 2. Starts in New Net NS                      |
|   with CLONE_NEWNET flag |         |  Inherits other namespaces (PID, Mount, etc.)|
+--------------------------+         +----------------------------------------------+
                               |
                               |  3. Parent configures the network
                               v
+-------------------------------------------------------------+
| Network Configuration.                                      |
| (e.g., setup veth pair, bridge and more)                    |
+-------------------------------------------------------------+
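
The diagram above shows the clone flow. For comparison, here is a minimal sketch of what the unshare route would look like with the nix crate. In that model the current process detaches itself from the host network namespace instead of spawning a child that starts out isolated (and, like the rest of this project, it needs root or CAP_SYS_ADMIN):

use nix::sched::{unshare, CloneFlags};

fn main() -> nix::Result<()> {
    // Detach the *current* process from the host network namespace.
    // From here on, this process only sees its own (initially empty) set
    // of interfaces, plus a loopback device that starts out down.
    unshare(CloneFlags::CLONE_NEWNET)?;

    // ... configure interfaces and run the server in this same process ...
    Ok(())
}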

Isolated Network NS Needs Communication

Just having a new process in an isolated network namespace isn't enough. I also need a way for this process to talk to the host and for the host to talk back. To do this, I'm going to set up a bridge and a pair of virtual Ethernet (veth) interfaces. This is kind of like what Docker does. The bridge acts like a link between the isolated network and the main network, and the veth pair creates a network tunnel between the isolated process and the rest of the system. It's a simple but effective way to make sure the host can communicate with the isolated servers.

bridge: A bridge is a way to connect two Ethernet segments together in a protocol independent way. Packets are forwarded based on Ethernet address, rather than IP address (like a router). https://wiki.linuxfoundation.org/networking/bridge

veth: Packets transmitted on one device in the pair are immediately received on the other device. When either device is down, the link state of the pair is down. https://man7.org/linux/man-pages/man4/veth.4.html

+----------------------+                   +---------------------------------+
|     Host System      |                   |   Isolated Network Namespace   |
|                      |                   |   (e.g., ns-01)                |
|  +----------------+  |                   |                                |
|  |    Bridge      |  |                   |  +--------------------------+  |
|  |     (br0)      |  |                   |  |                          |  |
|  |                |  |                   |  |  Virtual Ethernet Pair   |  |
|  |  +-----------+ |  |                   |  |    (veth, veth-peer)     |  |
|  |  |   veth    |---- Network Tunnel -------|                          |  |
|  |  +-----------+ |  |                   |  +--------------------------+  |
|  +----------------+  |                   +---------------------------------+
|                      |        
+----------------------+       

Using isolated network namespaces with veth pairs offers significant advantages, particularly in terms of network traffic management and security. Each isolated environment in this setup is connected to the host system through a veth pair, akin to a virtual wire. This configuration allows for precise monitoring and control of the network traffic entering and exiting each isolated network. By attaching eBPF programs to the veth pairs, we can efficiently inspect and manage the network traffic. This includes attaching eBPF programs to the host side of these veth pairs, enabling detailed monitoring and policy enforcement on all traffic from them.
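
To give a flavour of that, here is a rough sketch of attaching an XDP program to the host side of a veth pair using the aya and anyhow crates. The object file path, program name, and interface name are placeholders for whatever your own eBPF project produces; this is not part of isoserver itself:

use aya::programs::{Xdp, XdpFlags};
use aya::{include_bytes_aligned, Bpf};

fn main() -> Result<(), anyhow::Error> {
    // Load a compiled eBPF object (hypothetical path and program name).
    let mut bpf = Bpf::load(include_bytes_aligned!("../ebpf/target/monitor.o"))?;

    // Grab the XDP program by name and attach it to the host-side veth.
    let program: &mut Xdp = bpf.program_mut("monitor").unwrap().try_into()?;
    program.load()?;
    program.attach("veth0", XdpFlags::default())?;

    Ok(())
}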

Crafting the program in Rust

To keep things simple, the project relies on three main Rust libraries, plus one for the user interface. First, there's nix, which I use for all the Linux syscall API stuff – it's really handy for interacting directly with the operating system. Then I've got rtnetlink for setting up the network, which makes handling network configurations a lot smoother. And for the asynchronous runtime, I'm using tokio, ensuring the program remains efficient and responsive, especially during network operations. Lastly, for creating a user-friendly command-line interface, I'm using clap. It's great for parsing command-line arguments and making the tool easy to use. Together, these libraries form the backbone of this network isolation tool, combining functionality with ease of use.

Let's take a look at the main parts of the code, presented in a straightforward way. I’ll explain each part clearly, focusing on the essentials and leaving out any uninteresting or extra checks. For those eager to dive into all the details, the complete code is waiting in the repo.

Network setup

So, the first step in the program is to set up a bridge. This bridge will let the isolated process talk to the host and other processes. Here's the create_bridge function that does just that:


async fn create_bridge(name: String, bridge_ip: &str, subnet: u8) -> Result<u32, NetworkError> {
    let (connection, handle, _) = new_connection()?;
    tokio::spawn(connection);
    
    // Create a bridge
    handle.link().add().bridge(name.clone()).execute().await.map_err(...)?;
    let bridge_idx = handle.link().get().match_name(name).execute()
        .try_next().await?
        .ok_or_else(...)?.header.index;

    // add ip address to bridge
    let bridge_addr = std::net::IpAddr::V4(Ipv4Addr::from_str(bridge_ip)?);
    AddressHandle::new(handle.clone())
        .add(bridge_idx, bridge_addr, subnet).execute().await
        .map_err(...)?;
        
    // set bridge up
    handle.link().set(bridge_idx).up().execute().await.map_err(...)?;
    Ok(bridge_idx)
}

This function does what you'd typically do with network setup commands in Linux: it creates a bridge, assigns it an IP address, and brings it up, which you might normally do with commands like:

$ ip link add name br0 type bridge
$ ip addr add 172.18.0.1/16 dev br0
$ ip link set br0 up

When we run these commands, or the function with similar settings, what we get is a network bridge on the host system labeled br0, with the IP address 172.18.0.1/16 assigned to it. This IP serves as the network identity for the bridge within the host system. Think of it like the main door to a building.

 +---------------------------+
 |         Host System       |
 |                           |
 |  +---------------------+  |
 |  | Network Bridge (br0)|  |
 |  | IP: 172.18.0.1/16   |  |
 |  +---------------------+  |
 |                           |
 +---------------------------+

Next up in the project is creating the veth pair. This step is crucial because the veth pair is what connects our isolated network namespace to the host system, using the bridge we set up.

async fn create_veth_pair(bridge_idx: u32) -> Result<(u32, u32), NetworkError> {
    let (connection, handle, _) = new_connection()?;
    tokio::spawn(connection);

    // create veth interfaces
    let veth: String = format!("veth{}", random_suffix());
    let veth_2: String = format!("{}_peer", veth.clone());

    handle.link().add().veth(veth.clone(), veth_2.clone()).execute()
        .await.map_err(...)?;

	// Get veth pair idxs
    let veth_idx = handle.link().get().match_name(veth.clone())
        .execute().try_next().await?.ok_or_else(...)?.header.index;

    let veth_2_idx = handle.link().get().match_name(veth_2.clone())
        .execute().try_next().await?.ok_or_else(...)?.header.index;

    // set master veth up
    handle.link().set(veth_idx).up().execute().await.map_err(...)?;

    // set master veth to bridge
    handle.link().set(veth_idx).controller(bridge_idx).execute()
    .await.map_err(...)?;

    Ok((veth_idx, veth_2_idx))
}

In this function, we create a pair of virtual Ethernet (veth) interfaces. One part of this pair will be connected to our isolated network namespace later. The other part stays in the host system and gets connected to our bridge, br0. This is how we create a communication path between the isolated environment and the host network using the bridge.

You could set up something similar manually with Linux ip commands. Here’s how it goes:

$ ip link add veth0 type veth peer name veth0_peer # create the veth pair
$ ip link set veth0 up # activate the veth interface
$ ip link set veth0 master br0 # connect one end of the veth to the bridge

So, what we've achieved with this is something like this setup:

+-----------------------+
|      Host System      |
|                       |
|  +-----------------+  |
|  |    Bridge (br0) |  |
|  |  172.18.0.1/16  |  |
|  |   +---------+   |  |
|  |   |  veth   |   |  |
|  |   +----|----+   |  |
|  +--------|-------+   |
|           |           |
|  +--------|--------+  |
|  |  veth-peer      |  |
|  +-----------------+  |
|                       |
+-----------------------+

After we've connected veth to the bridge in the host system, we need to move veth-peer into a specific isolated network namespace. To do this, we require two key pieces of information: the index of veth-peer (the veth_2_idx returned by create_veth_pair, which is passed to join_veth_to_ns as veth_idx) and the process ID (PID) of the process that owns the namespace we want to use. Here's the function that handles this:

pub async fn join_veth_to_ns(veth_idx: u32, pid: u32) -> Result<(), NetworkError> {
    let (connection, handle, _) = new_connection()?;
    tokio::spawn(connection);
    // set veth to the process network namespace
    handle.link().set(veth_idx).setns_by_pid(pid).execute().await.map_err(...)?;

    Ok(())
}

We’re assigning veth-peer to the network namespace of a process by its PID. This is crucial for ensuring that veth-peer is part of the desired isolated environment. By executing this function, veth-peer becomes attached to the network namespace of the process with the given PID, allowing it to communicate within that isolated space, while veth remains connected to the host's bridge.

+----------------------+                   +--------------------------------+
|     Host System      |                   |   Isolated Network Namespace   |
|                      |                   |   (e.g., newns)                |
|  +----------------+  |                   |                                |
|  |    Bridge      |  |                   |                                |
|  |    (br0)       |  |                   |                                |
|  |  172.18.0.1/16 |  |                   |  +--------------------------+  |
|  |  +-----------+ |  |                   |  |                          |  |
|  |  |   veth    |---------------------------|        veth-peer         |  |
|  |  +-----------+ |  |                   |  +--------------------------+  |
|  +----------------+  |                   +--------------------------------+
|                      |                                                     
+----------------------+                                                      

The final step in setting up our network is configuring veth-peer within the new network namespace. We need to give it an IP address and get it ready to use. It's important to make sure that this IP address is in the same subnet as the bridge's IP, so they can talk to each other properly.

pub async fn setup_veth_peer(
    veth_idx: u32,
    ns_ip: &String,
    subnet: u8,
) -> Result<(), NetworkError> {
    let (connection, handle, _) = new_connection()?;
    tokio::spawn(connection);

    info!("setup veth peer with ip: {}/{}", ns_ip, subnet);

    // set veth peer address
    let veth_2_addr = std::net::IpAddr::V4(Ipv4Addr::from_str(ns_ip)?);
    AddressHandle::new(handle.clone()).add(veth_idx, veth_2_addr, subnet)
    .execute().await.map_err(...)?;

    handle.link().set(veth_idx).up().execute().await.map_err(...)?;

    // set lo interface to up
    let lo_idx = handle.link().get().match_name("lo".to_string()).execute().try_next()
    .await?.ok_or_else(...)?.header.index;
        
    handle.link().set(lo_idx).up().execute().await.map_err(...)?;

    Ok(())
}

This function is doing something similar to what we've done before, but this time it's inside the new network namespace. Basically, we're giving veth-peer an IP address that matches the subnet of our bridge. This lets them communicate with each other. After assigning the IP, we activate veth-peer by bringing it online. This step is key to making sure that everything in our isolated network environment is connected and ready to go. If you were doing this manually, you'd use ip commands like these:

# Assign IP in the namespace
$ ip netns exec newns ip addr add 172.18.0.2/16 dev veth-peer
# Set veth-peer up in the namespace
$ ip netns exec newns ip link set veth-peer up

So, that wraps up our network setup. Now we should have everything in place.

+----------------------+                   +--------------------------------+
|     Host System      |                   |   Isolated Network Namespace   |
|                      |                   |   (e.g., newns)                |
|  +----------------+  |                   |                                |
|  |    Bridge      |  |                   |                                |
|  |    (br0)       |  |                   |                                |
|  |  172.18.0.1/16 |  |                   |  +--------------------------+  |
|  |  +-----------+ |  |                   |  |      IP: 172.18.0.2      |  |
|  |  |   veth    |---------------------------|        veth-peer         |  |
|  |  +-----------+ |  |                   |  +--------------------------+  |
|  +----------------+  |                   +--------------------------------+
|                      |                                                     
+----------------------+                                                     

At this point, if we take a look at our system's routing setup using the ip route command, we'll see an entry for our bridge. This entry is crucial. It tells our system how to handle traffic to and from the 172.18.0.0/16 network. Essentially, whenever our system needs to send a packet to an address within this range, it knows to use the isobr0 interface, all thanks to this route in the routing table.

$ ip route
...
172.18.0.0/16 dev isobr0 proto kernel scope link src 172.18.0.1
...

The Main Program

Before we dive into the main function where we'll bring all these pieces together, let's take a closer look at the invocation of the clone function provided by the nix crate inside main. Understanding this is key to how we set up our isolated environments.

// prepare child process
let cb = Box::new(|| c_process(&args, veth2_idx));
let mut tmp_stack: [u8; STACK_SIZE] = [0; STACK_SIZE];
let child_pid = unsafe {
	clone(
		cb,
		&mut tmp_stack,
		CloneFlags::CLONE_NEWNET,
		Some(Signal::SIGCHLD as i32),
	)
}

In this part of the code, we're setting up a new child process. We use the clone system call with the CLONE_NEWNET flag to ensure the child starts in its own separate network namespace. We also allocate a memory stack for the process and define what it should do using the closure cb. The clone call returns the child's process ID, which we store in child_pid; the parent needs that PID later to move the veth peer into the child's namespace.

The complete code for the main function is outlined below. A crucial element within it is the c_process function. This function is central to our setup — it's what runs as the child process in the newly created network namespace. What c_process essentially does is: firstly, it calls setup_veth_peer, which configures the network interface (veth-peer) inside this new namespace. This step is vital for establishing network communication within the isolated environment. Secondly, c_process executes the execute function. This is where the server core functionality lies — based on our initial choice, execute launches either a TCP or UDP echo server.

fn main() {
    env_logger::init();
    let args = Args::parse();
    let rt = tokio::runtime::Runtime::new().expect("Failed to create Tokio runtime");
    let (_, _, veth2_idx) = rt
        .block_on(prepare_net(
            args.bridge_name.clone(),
            &args.bridge_ip,
            args.subnet,
        ))
        .expect("Failed to prepare network");

    // prepare child process
    let cb = Box::new(|| c_process(&args, veth2_idx));
    let mut tmp_stack: [u8; STACK_SIZE] = [0; STACK_SIZE];
    let child_pid = unsafe {
        clone(
            cb,
            &mut tmp_stack,
            CloneFlags::CLONE_NEWNET,
            Some(Signal::SIGCHLD as i32),
        )
    }
    .expect("Clone failed");

    info!("Parent pid: {}", nix::unistd::getpid());

    rt.block_on(async {
        join_veth_to_ns(veth2_idx, child_pid.as_raw() as u32)
            .await
            .expect("Failed to join veth to namespace");
    });

    thread::sleep(time::Duration::from_millis(500));

    match waitpid(child_pid, None) {... Wait for the child process}
}

fn c_process(args: &Args, veth_peer_idx: u32) -> isize {
    info!("Child process (PID: {}) started", nix::unistd::getpid());
    // Spawn a new blocking task on the current runtime
    let rt = tokio::runtime::Runtime::new().expect("Failed to create Tokio runtime");
    let process = rt.block_on(async {
        setup_veth_peer(veth_peer_idx, &args.ns_ip, args.subnet).await?;
        execute(args.handler.clone(), args.server_addr.clone()).await
    });
    
    info!("Child process finished");
    0
}

No cleanup process?

In the code, you might wonder about the cleanup process for the network resources. Here's how it works: the kernel plays a crucial role in resource management, especially with network namespaces. When the child process, which runs in its own network namespace, finishes its task and terminates, the associated network namespace is destroyed with it. This is the key point: destroying the network namespace triggers the kernel to automatically clean up any network interfaces inside it, including our end of the veth pair. And because the two ends of a veth pair only exist together, deleting the end inside the namespace also removes the corresponding end attached to the bridge on the host. This automatic cleanup by the kernel ensures that the system stays free of leftover network resources once the child process completes its job.
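
The one thing that does stick around between runs is the host-side bridge (you'll see "bridge isobr0 already exist" in the log output later). That's fine for repeated experiments, but if you ever wanted to tear it down from Rust instead of with ip link delete, a small rtnetlink sketch along these lines would do it. This is a hypothetical helper, not part of isoserver, and like the project's other helpers it has to run inside a tokio runtime:

use futures::TryStreamExt;
use rtnetlink::new_connection;

async fn delete_bridge(name: &str) -> Result<(), Box<dyn std::error::Error>> {
    let (connection, handle, _) = new_connection()?;
    tokio::spawn(connection);

    // Look the bridge up by name and delete it if it exists.
    if let Some(link) = handle
        .link()
        .get()
        .match_name(name.to_string())
        .execute()
        .try_next()
        .await?
    {
        handle.link().del(link.header.index).execute().await?;
    }
    Ok(())
}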

All the stuff we've talked about — setting up the network, creating isolated spaces, and all that code — is part of a project I've called isoserver. I chose a simple name because, honestly, naming things isn't my strong suit! It's a no-nonsense program that shows these ideas in action. If you're curious to see the code or maybe want to help out, you can find it all in the isoserver repository.

Running the Server

Now, let's look at how to run the app, similar to what's in the README of the repo. We'll go through the command-line arguments (CLI args) and see how to launch the server. The good news is that most of these arguments have default values, so you might not need to specify them all, depending on your setup.

To run the server, use the following command. Remember, you can skip some arguments if the default values fit your needs:

sudo RUST_LOG=info ./isoserver --server-addr [server address] --handler [handler] \
  --bridge-name [bridge name] --bridge-ip [bridge IP] --subnet [subnet mask] \
  --ns-ip [namespace IP]

Values

  • --server-addr: No default value, must be specified (e.g., "0.0.0.0:8080").
  • --handler: Default is "tcp-echo". Options are "tcp-echo" or "udp-echo".
  • --bridge-name: Default is "isobr0".
  • --bridge-ip: Default is "172.18.0.1".
  • --subnet: Default is "16".
  • --ns-ip: No default value, must be specified (e.g., "172.18.0.2").
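
For reference, here is a rough sketch of what the clap definition behind these flags could look like. The names and defaults mirror the list above, while the exact field types and doc comments are my assumptions; the real definition lives in the repo:

use clap::Parser;

/// Hypothetical sketch of isoserver's CLI arguments (not the actual source).
#[derive(Parser, Debug)]
struct Args {
    /// Address the echo server binds to inside the namespace, e.g. "0.0.0.0:8080"
    #[arg(long)]
    server_addr: String,

    /// Which echo handler to run: "tcp-echo" or "udp-echo"
    #[arg(long, default_value = "tcp-echo")]
    handler: String,

    /// Name of the bridge created (or reused) on the host
    #[arg(long, default_value = "isobr0")]
    bridge_name: String,

    /// IP address assigned to the bridge
    #[arg(long, default_value = "172.18.0.1")]
    bridge_ip: String,

    /// Prefix length shared by the bridge and the namespace address
    #[arg(long, default_value_t = 16)]
    subnet: u8,

    /// IP address given to the veth peer inside the namespace, e.g. "172.18.0.2"
    #[arg(long)]
    ns_ip: String,
}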

You can easily test the TCP echo server. First, run it with the default network configuration in one terminal. Then, open another terminal and use telnet 172.18.0.2 8080. This will let you see the program in action.

sudo RUST_LOG=info ./isoserver --server-addr 0.0.0.0:8080 --ns-ip 172.18.0.2
[2023-12-21T21:02:54Z INFO  isoserver::net] Interact with bridge isobr0 at cidr 172.18.0.1/16
[2023-12-21T21:02:54Z INFO  isoserver::net] bridge isobr0 already exist
[2023-12-21T21:02:54Z INFO  isoserver] Parent pid: 30396
[2023-12-21T21:02:54Z INFO  isoserver] Child process (PID: 30413) started
[2023-12-21T21:02:54Z INFO  isoserver::net] setup veth peer with ip: 172.18.0.2/16
[2023-12-21T21:02:54Z INFO  isoserver::handlers::tcp] TCP echo server listening on: 0.0.0.0:8080
[2023-12-21T21:02:54Z INFO  isoserver::handlers::tcp] waiting for new client connection
[2023-12-21T21:02:57Z INFO  isoserver::handlers::tcp] new client connection
[2023-12-21T21:02:58Z INFO  isoserver::handlers::tcp] Read 4 bytes from the socket
[2023-12-21T21:02:58Z INFO  isoserver::handlers::tcp] Wrote 4 bytes to the socket
[2023-12-21T21:03:03Z INFO  isoserver::handlers::tcp] Read 6 bytes from the socket
[2023-12-21T21:03:03Z INFO  isoserver::handlers::tcp] Wrote 6 bytes to the socket
...
[2023-12-21T21:03:10Z INFO  isoserver::handlers::tcp] Client disconnected

I opt for 0.0.0.0 to listen on all interfaces within the new network namespace. This choice is strategic because it allows the server to accept connections on any network interface that's available in its isolated environment, including the veth pair connected to the bridge. If we were to use 127.0.0.1, the server would only listen for connections originating from within the same network namespace, essentially limiting its reach to local-only interactions. By choosing 0.0.0.0, we eliminate the need for additional configurations that would be required to make the server accessible beyond the local scope of 127.0.0.1, like setting up specific routing or port forwarding rules.
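
If you'd rather script the check than type into telnet, a minimal Rust client against the default address works just as well (adjust the IP and port if you changed the flags):

use std::io::{Read, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Connect to the echo server running inside the isolated namespace.
    let mut stream = TcpStream::connect("172.18.0.2:8080")?;
    stream.write_all(b"ping\n")?;

    // Read back whatever the server echoes and print it.
    let mut buf = [0u8; 64];
    let n = stream.read(&mut buf)?;
    println!("echoed back: {}", String::from_utf8_lossy(&buf[..n]));
    Ok(())
}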

So there you have it: I've created a simple method to launch isolated servers, each with its own veth. It's set up so I can attach eBPF programs for interaction and monitoring. This might not be the most complex program out there, but for me, it was both fun and incredibly useful to build.

To Conclude

And that wraps up our journey through the isoserver project. We've covered everything from setting up isolated network namespaces to configuring veth pairs, all through straightforward Rust code. Remember, if you're curious about the details or want to experiment with the code yourself, the entire project is available in the repo.

Thank you for reading along. This blog is a part of my learning journey and your feedback is highly valued. There's more to explore and share, so stay tuned for upcoming posts. Your insights and experiences are welcome as we learn and grow together in this domain. Happy coding!
