In my previous article, we introduced some basic concepts about networking and sockets. We discussed different models and created a simple example of network communication using sockets, starting with Stream TCP sockets.
We started with this socket type and family because it is one of the most common implementations and the one that the majority of developers work with most frequently. Stream TCP sockets are widely used due to their reliability and ability to establish a connection-oriented communication channel. This makes them ideal for scenarios where data integrity and order are crucial, such as web browsing, email, and file transfers. By understanding the fundamentals of Stream TCP sockets, developers can build robust network applications and troubleshoot issues more effectively.
In this article, we will continue working with this socket type and dive a bit deeper into socket programming. So far, we have seen how to create a listener socket as a server and how to use another socket to connect to it. Now, we will explore one key detail of this communication: endianness!
Endianness
Endianness refers to the order in which the bytes of a multi-byte value are arranged in memory. In a big-endian system, the most significant byte is stored at the smallest memory address, while in a little-endian system, the least significant byte is stored first. Understanding endianness is crucial in socket programming because data transmitted over a network may need to be converted between byte orders to be interpreted correctly by different systems. This conversion keeps the data consistent and accurate regardless of the underlying architecture of the communicating devices.
In this example, the 32-bit value 1,200,000 (hex 0x00124F80) shows endianness. In a big-endian system, the bytes are stored as 0x00 0x12 0x4F 0x80 from lowest to highest address. In a little-endian system, the order is reversed: 0x80 0x4F 0x12 0x00.
Endianness
(Order of bytes in memory)
Big-Endian System
+-----------------+-----------------+-----------------+------------------+
| Most Significant| | | Least Significant|
| Byte | | | Byte |
| (MSB) | | | (LSB) |
+-----------------+-----------------+-----------------+------------------+
| 0x00 | 0x12 | 0x4F | 0x80 |
+-----------------+-----------------+-----------------+------------------+
| Address: 0x00 | Address: 0x01 | Address: 0x02 | Address: 0x03 |
+-----------------+-----------------+-----------------+------------------+
Little-Endian System
+------------------+-----------------+-----------------+------------------+
| Least Significant| | | Most Significant |
| Byte | | | Byte |
| (LSB) | | | (MSB) |
+------------------+-----------------+-----------------+------------------+
| 0x80 | 0x4F | 0x12 | 0x00 |
+------------------+-----------------+-----------------+------------------+
| Address: 0x00 | Address: 0x01 | Address: 0x02 | Address: 0x03 |
+------------------+-----------------+-----------------+------------------+
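Rust's integer types expose these layouts directly through `to_be_bytes`, `to_le_bytes`, and `to_ne_bytes`. A minimal sketch reproducing the two diagrams; `layouts` is a hypothetical helper name used only for illustration, and index 0 of each array plays the role of the lowest address:

```rust
/// Hypothetical helper: returns the big-endian and little-endian
/// byte layouts of a 32-bit value (index 0 = lowest address).
fn layouts(value: u32) -> ([u8; 4], [u8; 4]) {
    (value.to_be_bytes(), value.to_le_bytes())
}

fn main() {
    let value: u32 = 1_200_000; // 0x00124F80
    let (be, le) = layouts(value);

    println!("big-endian:    {:02X?}", be);
    println!("little-endian: {:02X?}", le);

    // to_ne_bytes() returns the layout this machine actually uses in memory.
    let native = if cfg!(target_endian = "little") { le } else { be };
    assert_eq!(value.to_ne_bytes(), native);
}
```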
Imagine you receive the four bytes 0x00 0x12 0x4F 0x80, in that order, and you don't know whether they are in big-endian or little-endian format. If you interpret these bytes using the wrong endianness, you'll end up with a completely different value. In big-endian format, the first byte is the most significant, so the bytes read as 0x00124F80 (decimal 1,200,000). In little-endian format, the first byte is the least significant, so the same bytes read as 0x804F1200 (decimal 2,152,665,600). This discrepancy can lead to significant errors in data processing, which is why handling endianness correctly is essential.
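The same four bytes can be decoded under both conventions with `from_be_bytes` and `from_le_bytes`. A short sketch; `interpret` is a hypothetical helper:

```rust
/// Hypothetical helper: interprets the same four bytes under both byte orders.
/// Returns (big-endian reading, little-endian reading).
fn interpret(bytes: [u8; 4]) -> (u32, u32) {
    (u32::from_be_bytes(bytes), u32::from_le_bytes(bytes))
}

fn main() {
    // The bytes exactly as they arrive, in this order.
    let received = [0x00, 0x12, 0x4F, 0x80];
    let (as_be, as_le) = interpret(received);

    println!("read as big-endian:    {as_be} ({as_be:#010X})");
    println!("read as little-endian: {as_le} ({as_le:#010X})");
}
```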
To illustrate the concept with a simpler decimal analogy, consider the number 574, made of the digits 5, 7, and 4. Writing the most significant digit first (5, 7, 4, meaning 500 + 70 + 4) is like big-endian. Writing the least significant digit first (4, 7, 5) is like little-endian. Both are valid conventions, but a reader who assumes the wrong one will read 475 instead of 574.
Network byte order
For this reason, machines must agree on endianness when communicating to ensure data is interpreted correctly. For example, in internet communication, standards like RFC 1700 and RFC 9293 define the use of big-endian format, also known as network byte order. This standardization ensures that all devices on the network interpret the data consistently, preventing errors and miscommunication.
RFC 1700: The convention in the documentation of Internet Protocols is to express numbers in decimal and to picture data in "big-endian" order ...
RFC 9293: Source Address: the IPv4 source address in network byte order Destination Address: the IPv4 destination address in network byte order
https://datatracker.ietf.org/doc/html/rfc768 https://datatracker.ietf.org/doc/html/rfc1700
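In Rust's standard library, `std::net::Ipv4Addr::octets()` returns the address bytes in network byte order. A small sketch showing that big-endian encoding of the address, taken as an integer, gives exactly that order; `to_network_order` is a hypothetical helper, and port 8080 is just an example:

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: returns the wire (network byte order) bytes
/// of an IPv4 address and a port.
fn to_network_order(addr: Ipv4Addr, port: u16) -> ([u8; 4], [u8; 2]) {
    // u32::from(Ipv4Addr) yields a host-order integer (127.0.0.1 -> 0x7F000001);
    // to_be_bytes() lays it out most significant byte first.
    (u32::from(addr).to_be_bytes(), port.to_be_bytes())
}

fn main() {
    let localhost = Ipv4Addr::new(127, 0, 0, 1);
    let (addr_bytes, port_bytes) = to_network_order(localhost, 8080);

    // Network byte order matches the order in which we write the address.
    assert_eq!(addr_bytes, localhost.octets());
    println!("address bytes: {:?}", addr_bytes);
    println!("port bytes:    {:02X?}", port_bytes);
}
```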
Machine agreements
If your application defines a protocol that specifies byte order, both the sender and receiver must adhere to it. If the protocol specifies that certain fields in the data should be in network byte order, then you must convert those fields accordingly. Many application data formats that rely on plain text, such as JSON and XML, are either byte order independent (treating data as strings of bytes) or have their own specifications for encoding and decoding multi-byte values.
Endianness only applies to multi-byte values, meaning it affects how the sequences of bytes representing larger data types (like integers and floating-point numbers) are ordered. Single-byte values are not affected by endianness.
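A quick sketch of the difference between single-byte and multi-byte values; the helper names are hypothetical:

```rust
/// Hypothetical helper: true if a byte has the same layout in both orders.
fn same_in_both_orders_u8(value: u8) -> bool {
    value.to_be_bytes() == value.to_le_bytes()
}

/// Hypothetical helper: true if a 16-bit value has the same layout in both orders.
fn same_in_both_orders_u16(value: u16) -> bool {
    value.to_be_bytes() == value.to_le_bytes()
}

fn main() {
    // A single byte has only one possible layout.
    assert!(same_in_both_orders_u8(0x2A));
    // Two distinct bytes can be arranged in two different ways.
    assert!(!same_in_both_orders_u16(0x2A3B));

    // Text sent as UTF-8 is a sequence of bytes whose order is defined
    // by the encoding itself, not by the machine's endianness.
    let text = "lorem ipsum";
    println!("{:?}", &text.as_bytes()[..5]);
}
```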
For example, in Rust, the crates bincode and postcard use little-endian by default. This means that when you serialize data using these crates, multi-byte values will be ordered in little-endian format unless specified otherwise.
Socket address for binding
If you have read or watched anything about socket programming, you have almost certainly encountered some C examples, such as the following code. It looks like this because, as we learned earlier, IP addresses and ports need to be in big-endian format:
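struct sockaddr_in address;
address.sin_family = AF_INET;
address.sin_addr.s_addr = htonl(...); /* 32-bit address: host -> network order */
address.sin_port = htons(PORT);       /* 16-bit port: host -> network order */
bind(sockfd, (struct sockaddr *)&address, sizeof(address));
...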
The functions htons and htonl are part of a family of C helper functions that convert multi-byte values between host byte order and network byte order. They ensure that data is correctly interpreted regardless of the underlying system's endianness:
The htonl() function converts the unsigned integer hostlong from host byte order to network byte order.
The htons() function converts the unsigned short integer hostshort from host byte order to network byte order.
The ntohl() function converts the unsigned integer netlong from network byte order to host byte order.
The ntohs() function converts the unsigned short integer netshort from network byte order to host byte order.
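In Rust, the same conversions are available as integer methods (`to_be`, `from_be`, and their `_bytes` variants). A sketch with hypothetical Rust functions named after the C helpers, only to make the correspondence explicit; they are not part of any Rust library:

```rust
// Hypothetical Rust counterparts of the C helpers, built on std methods.
// to_be()/from_be() swap bytes on little-endian targets and do nothing
// on big-endian ones, just like htonl()/ntohl() in C.
fn htonl(hostlong: u32) -> u32 { hostlong.to_be() }
fn htons(hostshort: u16) -> u16 { hostshort.to_be() }
fn ntohl(netlong: u32) -> u32 { u32::from_be(netlong) }
fn ntohs(netshort: u16) -> u16 { u16::from_be(netshort) }

fn main() {
    let port: u16 = 8080;
    let net_port = htons(port);
    // In memory, net_port's bytes are big-endian on any machine.
    assert_eq!(net_port.to_ne_bytes(), port.to_be_bytes());
    // Converting back restores the original value.
    assert_eq!(ntohs(net_port), port);

    let addr: u32 = 0x7F00_0001; // 127.0.0.1 as a host-order integer
    assert_eq!(ntohl(htonl(addr)), addr);
    println!("port {port} -> network-order value {net_port}");
}
```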
Modern programming languages and frameworks typically handle these conversions for you and also offer specialized functions to manage them. For example, if we revisit our previous socket server example in Rust, we see the following when creating the address:
let server_address = SockaddrIn::from_str("127.0.0.1:8080").expect("...");
In the previous line, we don't have to deal with byte ordering because SockaddrIn takes care of it. However, we can convert the address into the underlying sockaddr_in struct, inspect how its fields are actually stored, and compare the results of converting s_addr to big-endian and to little-endian. My machine is little-endian.
use nix::libc::sockaddr_in;
use nix::sys::socket::SockaddrIn;
use std::str::FromStr;

fn main() {
    // Create a socket address
    let sock_addr = SockaddrIn::from_str("127.0.0.1:6797").expect("...");
    // Convert it into the raw libc struct to inspect its fields
    let sockaddr: sockaddr_in = sock_addr.into();
    println!("sockaddr: {:?}", sockaddr);
    println!("s_addr Default: {}", sockaddr.sin_addr.s_addr);
    // big endian
    println!("s_addr be: {:?}", sockaddr.sin_addr.s_addr.to_be());
    // little endian
    println!("s_addr le: {:?}", sockaddr.sin_addr.s_addr.to_le());
}
When we run this code, we get the following output:
$ cargo run --bin addr
sockaddr: sockaddr_in { sin_len: 16, sin_family: 2, sin_port: 36122, sin_addr: in_addr { s_addr: 16777343 }, sin_zero: [0, 0, 0, 0, 0, 0, 0, 0] }
s_addr Default: 16777343
s_addr be: 2130706433
s_addr le: 16777343
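These numbers make sense once we remember that sockaddr_in already stores the address and port in network byte order. The address bytes are 127, 0, 0, 1, and printing them as a native integer on my little-endian machine gives 0x0100007F, or 16777343. Calling to_be() on a little-endian machine swaps the bytes and gives 0x7F000001, or 2130706433, which is 127.0.0.1 as a host-order integer, while to_le() changes nothing. The port behaves the same way: 6797 is 0x1A8D, and its network-order bytes read as little-endian give 0x8D1A, or 36122. As we mentioned, although Rust and other modern programming languages handle byte ordering at some level and offer convenient abstractions, it's always valuable to understand these foundational concepts.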
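The values printed by that program can be reproduced with the standard library alone. This sketch models a little-endian host, like the one used above, with explicit `from_le_bytes` calls; `raw_on_little_endian` is a hypothetical helper:

```rust
/// Hypothetical helper: reads network-order address and port bytes as
/// little-endian integers, which is what printing the raw sockaddr_in
/// fields does on a little-endian host.
fn raw_on_little_endian(addr: [u8; 4], port: u16) -> (u32, u16) {
    (u32::from_le_bytes(addr), u16::from_le_bytes(port.to_be_bytes()))
}

fn main() {
    // 127.0.0.1:6797 in network byte order
    let addr_bytes = [127, 0, 0, 1];
    let (raw_addr, raw_port) = raw_on_little_endian(addr_bytes, 6797);

    println!("s_addr as stored:   {raw_addr}"); // 16777343, as in the output above
    println!("sin_port as stored: {raw_port}"); // 36122, as in the output above
    // Reading the same bytes as big-endian gives the host-order value.
    println!("s_addr as big-endian: {}", u32::from_be_bytes(addr_bytes)); // 2130706433
}
```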
Practical Example: Handling Endianness in Client-Server Communication
Now that we know about byte ordering, let's build a simple example that illustrates what we have covered so far: a program that receives a file over a socket. The client first sends the size of the file, and then it sends the file's data. Note that this is a deliberately straightforward implementation, keeping the code simple to illustrate the concept.
Server
We are going to reuse the code from the last article for creating a stream socket server in the INET family. We will create a simple function that sets up this server socket and returns its file descriptor. This keeps the implementation clean and lets us reuse it in future examples. The function is named create_tcp_server_socket, and it is quite simple, as shown below:
use nix::sys::socket::{bind, listen, socket, Backlog, SockaddrIn};
use std::os::fd::{AsRawFd, OwnedFd};
use std::str::FromStr;

pub fn create_tcp_server_socket(addr: &str) -> Result<OwnedFd, nix::Error> {
    // Create a TCP (stream) socket in the IPv4 (INET) family
    let socket_fd = socket(
        nix::sys::socket::AddressFamily::Inet, // Socket family
        nix::sys::socket::SockType::Stream,    // Socket type
        nix::sys::socket::SockFlag::empty(),
        None,
    )?;
    // Create a socket address
    let sock_addr = SockaddrIn::from_str(addr).expect("...");
    // Bind the socket to the address
    bind(socket_fd.as_raw_fd(), &sock_addr)?;
    // Listen for incoming connections
    let backlog = Backlog::new(1).expect("...");
    listen(&socket_fd, backlog)?;
    Ok(socket_fd)
}
Now we can use that function to create a server socket and accept incoming connections.
fn main() {
let socket_fd = create_tcp_server_socket("127.0.0.1:8000").expect("...");
// Accept incoming connections
let conn_fd = accept(socket_fd.as_raw_fd()).expect("...");
}
Next, after accepting the connection, we expect to receive the size of the file that the client will transmit over the network. We expect to receive a 32-bit unsigned integer, so we prepare a 4-byte buffer to receive the size. We then read from the socket and put the data in the buffer:
// Receive the size of the file
let mut size_buf = [0; 4];
recv(conn_fd, &mut size_buf, MsgFlags::empty()).expect("...");
let file_size = u32::from_ne_bytes(size_buf);
println!("File size: {}", file_size);
Notice that we are using from_ne_bytes here. This function assumes that the bytes are arranged in the native endianness of our machine (ne stands for native endianness). In other words, we expect the client to send the value in the same byte order as our machine.
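Put differently, `from_ne_bytes` behaves like one of the two explicit decoders, chosen by the target the code is compiled for. A sketch that spells this out; `decode_native` is a hypothetical name:

```rust
/// Hypothetical helper: decodes a u32 the way from_ne_bytes does,
/// by picking the explicit decoder that matches the compile target.
fn decode_native(bytes: [u8; 4]) -> u32 {
    if cfg!(target_endian = "little") {
        u32::from_le_bytes(bytes)
    } else {
        u32::from_be_bytes(bytes)
    }
}

fn main() {
    let size_buf = 574u32.to_ne_bytes();
    assert_eq!(decode_native(size_buf), u32::from_ne_bytes(size_buf));
    println!("File size: {}", decode_native(size_buf));
}
```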
Finally, we will use that size to create an in-memory buffer to receive the file’s data over the connection and print the bytes read. As mentioned, this is a naive and basic implementation just to illustrate the case:
// Receive the file data
let mut file_buf = vec![0; file_size as usize];
let bytes_read = recv(conn_fd, &mut file_buf, MsgFlags::empty()).expect("...");
println!("File data bytes read: {:?}", bytes_read);
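Because this is a stream socket, a single `recv` call may return fewer bytes than requested, especially for larger files. A sketch of the same read logic on top of `std::io::Read` (which `std::net::TcpStream` implements), using `read_exact` so each buffer is filled completely. `read_sized_message` is a hypothetical helper, and an in-memory `Cursor` stands in for the connection; the size is still decoded in native byte order, as in the server above:

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: reads a 4-byte native-endian size, then exactly
/// that many bytes. read_exact keeps reading until each buffer is full.
fn read_sized_message<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut size_buf = [0u8; 4];
    reader.read_exact(&mut size_buf)?;
    let file_size = u32::from_ne_bytes(size_buf);

    let mut file_buf = vec![0u8; file_size as usize];
    reader.read_exact(&mut file_buf)?;
    Ok(file_buf)
}

fn main() -> io::Result<()> {
    // Simulate what the client sends: the size, then the data.
    let data = b"lorem ipsum";
    let mut wire = (data.len() as u32).to_ne_bytes().to_vec();
    wire.extend_from_slice(data);

    let received = read_sized_message(&mut Cursor::new(wire))?;
    println!("File data bytes read: {}", received.len());
    Ok(())
}
```

If the connection closes before all bytes arrive, `read_exact` returns an `UnexpectedEof` error instead of silently handing back a partial buffer.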
Full code:
use nix::sys::socket::{accept, recv, MsgFlags};
use socket_net::server::create_tcp_server_socket;
use std::os::fd::AsRawFd;
fn main() {
let socket_fd = create_tcp_server_socket("127.0.0.1:8000").expect("...");
// Accept incoming connections
let conn_fd = accept(socket_fd.as_raw_fd()).expect("...");
// Receive the size of the file
let mut size_buf = [0; 4];
recv(conn_fd, &mut size_buf, MsgFlags::empty()).expect("...");
let file_size = u32::from_ne_bytes(size_buf);
println!("File size: {}", file_size);
// Receive the file data
let mut file_buf = vec![0; file_size as usize];
let bytes_read = recv(conn_fd, &mut file_buf, MsgFlags::empty()).expect("...");
println!("File data bytes read: {:?}", bytes_read);
}
Client
For our client, the logic is again pretty simple and similar to what we had before, but with some subtle additions, like reading the file and sending its size before sending the actual data.
... socket creation and connection
// Read the file into a buffer
let buffer = std::fs::read("./src/data/data.txt").expect("Failed to read file");
// send the size of the file to the server
let size: u32 = buffer.len() as u32;
send(socket_fd.as_raw_fd(), &size.to_ne_bytes(), MsgFlags::empty()).expect("...");
// send the file to the server
send(socket_fd.as_raw_fd(), &buffer, MsgFlags::empty()).expect("...");
Two things to notice: as we mentioned earlier, we create a u32 for the file size, and we use the function to_ne_bytes to send its bytes in the native endianness, which in my case is little-endian.
The data.txt file is a simple text file containing some lorem ipsum data. We can inspect its size using the stat command:
$ stat -c%s ./src/data/data.txt
574
If you are wondering why we have to deal with endianness for the size but not for the file content itself, remember what we learned earlier:
Many application data formats that rely on plain text, such as JSON and XML, are either byte order independent (treating data as strings of bytes) or ...
The file content is sent as a plain sequence of bytes, so there is no multi-byte value to reorder.
Running
Now we can run our server and client to see the output and understand how it works. The size of the file is 574 bytes, and we send the entire file.
# Server
$ cargo run --bin tcp-file-server
Socket file descriptor: 3
Socket bound to address: 127.0.0.1:8000
File size: 574
File data bytes read: 574
# Client
$ cargo run --bin tcp-file-client
Socket file descriptor: 3
Sending file size: 574
Sending file data
So far, so good, right? The behavior is as expected: we send a file of 574 bytes, and the server receives it. This works because both the client and the server use ne (native endianness), and since they run on the same machine, both are little-endian. But what if the client uses a different endianness? What if it sends the size in big-endian, for example?
We can simulate what happens by modifying this line in the client so that it sends the size in big-endian instead of little-endian. For that, we use to_be_bytes (be stands for big-endian):
send(socket_fd.as_raw_fd(), &size.to_be_bytes(), MsgFlags::empty()).expect("...");
If we run our programs again, we will see why understanding endianness is important. From the client's perspective, nothing looks different: we are sending the same value, just in a different byte order:
# Client
$ cargo run --bin tcp-file-client
Socket file descriptor: 3
Sending file size: 574
Sending file data
The server, however, still treats the size as if it were in its own native byte order, which is little-endian. But it is not little-endian anymore; it is big-endian. 574 is 0x0000023E, so the client now sends the bytes 0x00 0x00 0x02 0x3E, and reading them as little-endian gives 0x3E020000, or 1,040,318,464. In this simple case, it causes the server to allocate roughly 1 GB of memory for a 574-byte file. In more complex scenarios, the impact could be much larger and more serious. Here is the server's output:
# Server
$ cargo run --bin tcp-file-server
Socket file descriptor: 3
Socket bound to address: 127.0.0.1:8000
File size: 1040318464
File data bytes read: 574
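We can reproduce the server's number without any sockets. This sketch models the little-endian server with an explicit `from_le_bytes` call; `misread_u32` is a hypothetical helper:

```rust
/// Hypothetical helper: the size the server computes when the client sends
/// big-endian bytes but the server decodes them as little-endian.
fn misread_u32(size: u32) -> u32 {
    let wire = size.to_be_bytes(); // client side: to_be_bytes()
    u32::from_le_bytes(wire) // server side on a little-endian host: from_ne_bytes()
}

fn main() {
    let size = 574u32; // 0x0000023E
    println!("wire bytes: {:02X?}", size.to_be_bytes());
    println!("misread size: {}", misread_u32(size)); // 1040318464, as in the server output
}
```

Misreading is just a byte swap, so applying it twice gives back the original 574.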
And if we used our machine's native word size, a 64-bit integer in my case, with a u64 type for the size variable in both the client and server, the issue could be even worse. We can modify the following lines in the client and server and see the result:
// server
let mut size_buf = [0; 8];
...
let file_size = u64::from_ne_bytes(size_buf);
// client
let size: u64 = buffer.len() as u64;
After making these changes, let’s run the server and client again:
# Server
$ cargo run --bin tcp-file-server
Socket file descriptor: 3
Socket bound to address: 127.0.0.1:8000
File size: 4468133780304953344
memory allocation of 4468133780304953344 bytes failed
Aborted (core dumped)
In this case, the integer occupies more bytes, so the misinterpreted value becomes colossal. The client sends 574 as the eight bytes 0x00 0x00 0x00 0x00 0x00 0x00 0x02 0x3E, and reading them as little-endian gives 0x3E02000000000000, or 4,468,133,780,304,953,344. The server tries to allocate about 4.47 exabytes of memory, far more than any current machine has. This illustrates how critical it is to handle endianness correctly, as incorrect handling can lead to severe issues like memory allocation failures and program crashes.
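The usual fix is to make the byte order part of the protocol, as the RFCs do, instead of relying on each machine's native order. A sketch of that idea with the 64-bit size sent in network byte order on both sides. `write_frame` and `read_frame` are hypothetical helpers written against `std::io::Write` and `std::io::Read` (both implemented by `std::net::TcpStream`), not the code from the repository; a `Vec<u8>` and a `Cursor` stand in for the connection:

```rust
use std::io::{self, Cursor, Read, Write};

/// Hypothetical helper: writes a u64 length prefix in network byte order
/// (big-endian), followed by the data.
fn write_frame<W: Write>(writer: &mut W, data: &[u8]) -> io::Result<()> {
    let size = data.len() as u64;
    writer.write_all(&size.to_be_bytes())?;
    writer.write_all(data)
}

/// Hypothetical helper: reads a big-endian u64 length prefix, then exactly
/// that many bytes.
fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut size_buf = [0u8; 8];
    reader.read_exact(&mut size_buf)?;
    let size = u64::from_be_bytes(size_buf);
    let mut data = vec![0u8; size as usize];
    reader.read_exact(&mut data)?;
    Ok(data)
}

fn main() -> io::Result<()> {
    // The client side writes into a Vec<u8> in place of a socket.
    let mut wire = Vec::new();
    write_frame(&mut wire, b"lorem ipsum")?;

    // The prefix is big-endian on every machine: 11 is 0,0,0,0,0,0,0,11.
    assert_eq!(&wire[..8], &11u64.to_be_bytes()[..]);

    // The server side reads it back from an in-memory cursor.
    let received = read_frame(&mut Cursor::new(wire))?;
    assert_eq!(received, b"lorem ipsum".to_vec());
    Ok(())
}
```

With `to_be_bytes`/`from_be_bytes` on both ends, client and server agree on the size regardless of their native endianness. A real server would also reject implausibly large sizes before allocating.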
You can find the code for this example and future ones in this repo.
To Conclude
Understanding endianness is crucial for developing robust network applications. As demonstrated, even a simple task like sending a file over a network can lead to significant issues if endianness is not handled correctly. By adhering to standards and properly managing byte order, we ensure that data is accurately interpreted across different systems, preventing errors and enhancing the reliability of our applications. Modern programming languages like Rust provide helpful abstractions, but a solid grasp of these foundational concepts allows developers to troubleshoot and optimize their code more effectively. Always be mindful of endianness when working with multi-byte values in network communication to avoid potential pitfalls.
Thank you for reading along. This blog is a part of my learning journey and your feedback is highly valued. There's more to explore and share regarding sockets and networking, so stay tuned for upcoming posts. Your insights and experiences are welcome as we learn and grow together in this domain. Happy coding!