Waterwall: a cooperative, distributed firewall for wireless mesh networks
EURASIP Journal on Wireless Communications and Networking volume 2013, Article number: 225 (2013)
Abstract
Firewalls are network devices dedicated to analyzing and filtering the traffic in order to separate network segments with different levels of trust. Generally, they are placed on the network perimeter and are used to separate the intranet from the Internet. Firewalls are used to forbid some protocols, to shape the bandwidth resources, and to perform deep packet inspection in order to spot malicious or unauthorized contents passing through the network. In a wireless multihop network, the concept of perimeter is hard to identify and the firewall function must be implemented on every node together with routing. But when the network size grows, the ruleset used to configure the firewall may grow accordingly and introduce latencies and instabilities for the low-power mesh nodes. We propose a novel concept of firewall in which every node filters the traffic only with a portion of the whole ruleset in order to reduce its computational burden. Even if at each hop we commit some errors, we show that the filtering efficiency measured for the whole network can achieve the desired precision, with a positive effect on the available network resources. This approach is different from the protection of a space behind a wall: we use the term waterwall to indicate a distributed and homogeneous filtering function spread among all the nodes in the network.
1 Introduction
Protecting a network from unsolicited, often malicious traffic is one of the constant concerns of any network administrator. Apart from standard networking devices such as switches and routers, middleboxes such as NATs and firewalls are normally installed on the network boundary to separate trusted portions of the network from the global Internet and, in general, from less trusted ones.
In some cases, however, even the separation between the internal and the external network is not straightforward, and identifying boundaries and points of interconnection is even more difficult. A typical example is a wireless mesh network, in which a collection of subnets are interconnected through a backbone of mesh nodes, but each subnet is only loosely coupled with the others. Moreover, many points of access to the global Internet may exist (see Figure 1 for a pictorial representation). Mesh networks are often used with this configuration in order to bring connectivity in a cost-effective way to areas where other technologies would be too expensive [1]. As a concrete example, community networks [2, 3] use this approach to share network resources between hundreds or even thousands of users and represent one of the most successful applications of mesh networking. Projects like Guifi or Awmn (see http://guifi.net and http://awmn.gr) are examples of how this technology integrates with standard networks and how successful this approach can be. Future advances will open new possibilities for this technology [4].
In this paper, we tackle the problem of firewalling in large mesh networks. In such networks, each mesh node applies a specific firewall ruleset to the traffic directed to itself (or to subnets attached to it). The firewall is used to defend the local network from attacks, to shape the access to the Internet across a connection, or to forbid the access to certain logical resources. If all nodes share their rulesets and enforce them also on the outgoing traffic, the traffic is not filtered at the destination but directly at the source. This reduces the waste of network resources but forces each node to filter with a global ruleset made of thousands of rules, which is not practical for most low-power Linux-based mesh routers. We propose to split the global ruleset into pieces and enforce only a portion of it at every hop with the goal of filtering the packets as close as possible to their source node in order to save network resources.
A correctly configured traditional firewall does not introduce false positives (packets that should be dropped but are instead forwarded). With our approach, instead, each node individually introduces some false positives, but as a global network function, the firewall will work with an arbitrary accuracy that can be tuned to the needs. To stress the difference from a typical firewall, we chose the term waterwall to indicate a distributed and homogeneous filtering function spread among all of the nodes in the network. Note that we use filtering as the target application for the sake of simple explanation and by way of example, but the same logic can be applied to other traffic analysis functions such as intrusion detection.
2 Related works and motivation
Distributed firewalling has not received much attention in the literature, but an initial model has been proposed by Bellovin et al. in [5], where the firewall was moved from a bastion host to endpoints in a traditional architecture network. Recently, the subject has been investigated with more attention. Bellovin again, followed by other authors, proposed a distributed policy enforcement platform [6–10]. These works are not focused on the complexity introduced by large rulesets. Other works focus on the application of hash functions to speed up rule matching [11–13] or on limiting the nodes that enforce the firewall [14]. None of these works focus on techniques to reduce the rulesets on single nodes.
The work whose idea comes closer to the contribution of this paper is [15], where the most recently matching rules are stored in a cache that is used to enforce filtering, thus using only a subset of the entire ruleset. The cache is split in two halves, with each one regulated with a different policy in order to ensure efficiency and fairness. This approach requires a feedback from the nodes that generate the rulesets in order to organize the cache; moreover, as with every caching strategy, its performance depends on the characteristics of the underlying traffic.
Our work assumes that the default filtering policy is to forward a packet. Packets are dropped only when there is a rule that matches them. This approach is more viable in a mesh network than a deny-by-default one: for the latter to be usable, the rulesets must be perfectly synchronized and updated; otherwise, there is the risk of dropping legitimate traffic. In networks where a high security level is required, a deny-by-default ruleset can also be used as in [16, 17], but it does not match our network scenario.
Additional similarities can be found in the field of intrusion detection since filtering with large rulesets presents the same difficulties of traffic inspection with a large database of fingerprints. An approach to distributed intrusion detection systems (IDS) like [18] could benefit from the solution we propose. More affinities can be found in [19] that, as our work, exploits the distributed nature of an ad hoc network to spread the IDS function over the entire network.
2.1 Motivation
Figure 1 shows a widely used configuration for a wireless mesh network where a set of mesh routers interconnects separated local area networks (LANs). Each LAN has its own internet protocol (IP) addressing, and the routing protocol running on the mesh routers allows the clients of distinct LANs to communicate. In some cases, nodes may physically roam from one LAN to another; depending on the kind of routing protocol, they may or may not maintain their initial IP addresses to keep their sessions alive. Finally, some of the LANs have direct access to the Internet and share it with other users that are not equipped with it.
In this scenario, the owner of a mesh node is generally also the manager of the corresponding LAN, and he is interested in protecting it. We take into consideration three use cases applicable to the simple network in Figure 1:

1. The manager of network C wants to protect his network from unwanted traffic coming from the outside. For instance, he wants to block connections to remote shell protocols coming from the mesh network to host A in LAN C.
2. With a finer granularity, he may want to limit access only to some logical resources; for instance, host A may have some folders that are shared only on the LAN while some others are shared with the whole mesh network. The access to these resources can be denied or simply limited to a maximum bit rate.
3. The manager of network C wants to forbid some traffic types that come from the mesh network and are directed to the Internet using his connection. This is normally due to the commercial agreements that the manager has with his network service provider. Again, traffic can be forbidden or it can be limited to a certain maximum bit rate.
Now imagine that a node in the network labeled E starts an attack against, let us say, host A. This may be due to a malicious user or to a virus that took control of a host in the network and starts a denial of service (DoS) or a brute force attack. We can add a fourth use case:

4. The manager of network C detects an attack and reactively enforces network filters to protect his resources.
The issues described in the use cases can be partially resolved by configuring a firewall on each mesh node in order to filter the traffic directed to its LAN. The first three use cases can be tackled by setting up a mixture of layer-4 and application layer firewall rules on the mesh router in C, which will drop or shape some traffic. The fourth one can be approached with dynamic rules that are activated when the firewall detects an anomaly in the usage of the resources, for instance, an abnormal number of internet control message protocol packets. Modern Linux-based firewalls support all these features. What remains unsolved are the consequences for the rest of the mesh network and for the other LANs. Clearly, the malicious traffic coming from network E will still traverse the mesh network and subtract useful resources from the other allowed communications. Considering that in a mesh network the available bandwidth is shared between upload and download, this can severely impact not only the victim LAN but also the other networks on the way from the attacker to the victim. This example shows how the concept of border firewall does not correctly apply to the mesh network scenario, where the border of the network itself is very hard to define.
To solve this problem, the mesh routers can share their rulesets in order to apply them directly on the other mesh routers. The ruleset of mesh router C applies only to the traffic directed to LAN C, to some logical resources it controls, or to the internet traffic flowing across its connection. Each mesh router will publish its ruleset and collect all the other rulesets in a global ruleset. Then it will enforce it directly on the packets that it is forwarding so that the traffic is filtered as close as possible to the source. This approach indeed protects not only the resources of each LAN but also the shared resources of the mesh network. It is not the goal of this paper to investigate how the rulesets are securely distributed. In the simplest case, rulesets can be known in advance and every node just sponsors the identifier (ID) of one or more predefined rulesets in routing messages.
Now imagine that this model is applied to a large mesh network. As an extreme but realistic use case, imagine that this model is applied to a community wireless mesh network like the Guifi network. Guifi is made up of thousands of nodes^{a} and used by tens of thousands of users that daily access the network from various places (see [20] for a characterization of its topology features). What happens if even only 10% of the mesh routers start distributing a ruleset made of, let us say, 30 rules each? The result will be a global ruleset with tens of thousands of rules. Corporate firewalls can handle large rulesets up to tens of thousands of rules, but this is not the case for wireless routers that are generally low-cost devices designed for minimal energy consumption. The most used products are commercial devices that embed a low-power processor (e.g., a 133-MHz Intel or AMD low-end device), one or more IEEE 802.11b/g/a/n wireless cards, and run a customized Linux kernel. The whole hardware is enclosed in an outdoor shell powered over LAN and costs no more than 100 €. A 133-MHz processor cannot easily handle a ruleset made up of thousands of rules organized in a linear list; it will introduce processing delays and packet dropping.
To improve filtering performance, rulesets can be preprocessed with various approaches, none of which is easy to port in this context. For instance, once the whole ruleset has been created, wildcards and numeric ranges can be used to group rules and reduce their total number. This involves a complex and costly preprocessing of the ruleset but speeds up the lookup time during the routing decision. It is convenient when the ruleset is mostly static and when the hardware is powerful enough for the preprocessing. In the case we consider, rules can be dynamically generated, nodes can be added to or removed from the network, and links may be temporarily unavailable. Each of these events will change the rulesets or the network topology (and consequently add/remove rulesets associated with nodes). Assuming that the nodes are powerful enough to perform the preprocessing, they would spend most of their CPU time repeating this task.
Techniques based on complex data structures, such as trees or graphs, can be used instead of using a linear list. The more complex the data structure, the more memory and preprocessing are needed. The less complex the data structure, the less the technique will be flexible and high performing. For instance, rules can be grouped using their target netmask, but this is meaningless for application layer rules, for multicast rules, or when a node that has a certain resource to be filtered roams to a new network. Moreover, with a mesh network made of thousands of nodes, there are thousands of netmasks, so filtering is still cumbersome. This gets even worse with networks based on IPv6 addresses. Both these approaches are hardly applicable when the rules do not match IP addresses and TCP/UDP ports but layer-7 data inside a packet.
In this work, we take a different direction. We keep the simplicity of linear lists, but we exploit the cooperative nature of mesh networks to reduce the overhead for each single node.
2.2 Firewalls with large rulesets
Before we detail the proposed approach, we further investigate the consequences of large rulesets on the performance of the network. Figure 2 reports the increment in the processing time of a single packet when the ruleset size grows. The data have been measured using an embedded system equipped with a 400-MHz processor and 128 Mbytes of RAM over a wired network. Fifty percent of the rules match the network and transport layer fields; the rest match the packet contents at layer 7. Contrary to the results we obtained in a previous work [14], where the tests were carried out without traffic, the measures have been taken when the node is under a load of 1 Mbit/s.
With up to 3,000 rules, the delay grows almost linearly, meaning that the system is able to handle the load as expected. After that threshold, the delay grows at a faster pace and arrives close to 0.5 s with 5,000 rules. Since this delay is introduced by every node for every hop, the total round-trip time in a mesh network using large rulesets makes the network unusable. Filtering is simply not a function that can be introduced ‘for free’ when the rulesets get large.
3 Filtering based on route length
Consider a network N like the one in Figure 1 where a proactive routing protocol is running (from now on, we refer to mesh nodes simply as ‘nodes’). Each node j is connected to a subnet, and for each node j, there exists a ruleset r_{j} that is used to filter the traffic directed to its own subnet, to itself, or to the Internet across the connection attached to its subnet. Node j will sponsor its own ruleset to the rest of the nodes so that every node is aware of a global ruleset $R=\bigcup_{j} r_{j},\ \forall j\in N$. The routing table of any node i contains the next hop and the distance in terms of hops to reach j (and all the nodes in the subnet of j). This is the usual configuration of a mesh network configured, for instance, with the optimized link state routing (OLSR) protocol [21].
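As a minimal sketch of this setup, the snippet below builds a global ruleset R as a linear list and derives the hop distances that a proactive routing protocol would expose in the routing table. The four-node topology, the node names, and the rule strings are hypothetical placeholders, and a plain breadth-first search stands in for the routing protocol.

```python
from collections import deque

def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical 4-node mesh with symmetric links (not taken from the paper).
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

# Hypothetical per-node rulesets r_j; the rule syntax is illustrative only.
rulesets = {
    "A": ["drop tcp dst A:22"],
    "B": [],
    "C": ["drop tcp dst C:22", "drop tcp dst C:23"],
    "D": ["drop udp dst D:69"],
}

# Global ruleset R: every rule published by any node, kept as a linear list.
R = [rule for node in sorted(rulesets) for rule in rulesets[node]]

# Distances that node A would read from its routing table.
dist_from_A = hop_distances(adj, "A")
```

Here `dist_from_A["D"]` plays the role of the hop distance from A to D that the rest of the section relies on.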
Now consider a packet p coming from the subnet of node k that is forwarded by node i and is destined to the subnet of node j. Assume that for this packet, there exists a rule in R that will drop it when it arrives at j. The aim of the waterwall is to drop the packet as close as possible to the source node k. The simplest solution is to enforce the whole R directly in k. This solution has two drawbacks: it is impossible if R is made up of thousands of rules for the considerations introduced in Section 2.2, and it would be extremely easy to circumvent since, when the packet leaves its own subnet, it is not filtered anymore. A node k that behaves in a malicious way can start an attack against a node j, and all the traffic will arrive at the destination^{b}. To tackle the second issue, more nodes on the path from k to j will have to apply the filter, thus aggravating the first issue. The strategy we propose is to filter at each hop with only a subset of the global ruleset that is dynamically chosen for each packet and for each hop. We aim to use larger rulesets for nodes close to the source and smaller rulesets for nodes far from the source. The definition of the strategy behind this intuition, however, requires some more discussion and formalization. Table 1 contains a set of definitions that are used (and detailed) in the rest of the paper.
First of all, how should a node i estimate its distance from the source node k of a packet p with destination j? The simplest way is to look at the time-to-live (TTL) field in the IP header, but it is also the easiest to circumvent. The attacker could simply forge packets with a low TTL and prevent the waterwall from being effective.
Another way is to use the distance from the source node k to i, but the attacker can set the source IP to the address of another node w and, contrary to what happens in the Internet, it would still be able to intercept the replies provided it is in the shortest path between w and i. Summing up, node i cannot trust the contents of a packet coming from a node that is possibly an attacker, so the distance from the source must be estimated with other means.
What we propose is that each node uses a subset of rules R_{ i } whose size depends on the ratio between the distance from the destination and m(i), the average distance of node i to any node in the network. In practice, node i compares the length of the remaining path to the destination with the average length of the path of packets generated by i itself. We define P^{f} as the probability that node i filters a packet going from node k to node j:
δ is a parameter that can be used to limit the maximum number of rules enforced in a single node. Node i will use a random subset R_{i} of R of size P^{f}(k,i,j)×|R|, ensuring that P^{f}(k,i,j) is the probability that i filters p. If R is organized as a linear list, this can be implemented by starting to scan the list from a random point and covering a portion of the list of size |R_{i}|. When i is close to j, the fraction $\frac{\text{sp}_{l}(i,j)}{m(i)}$ decreases; conversely, when i is close to k, the value of P^{f}(k,i,j) is close to 1·δ. We also define t(i) as the value of sp_{l}(i,j) averaged over all routes passing through i between any couple (k,j). If we call C(i) the set of all the couples (k,j) for which i∈sp(k,j), then
$$t(i)=\frac{1}{|C(i)|}\sum_{(k,j)\in C(i)}\text{sp}_{l}(i,j)$$
To understand how our approach scales with the size and shape of the network graph, we have to understand the behavior of t(i). Let us define m and t as the average m(i) and t(i) computed on every node, respectively; m is the average number of hops in the network, and t is the average number of hops remaining after a packet is forwarded by any node, averaged on all the nodes. Intuitively, t must be smaller than m, but how do t(i) and m(i) change depending on the position of i in the network? In the next sections, we will first present the results based on an example of linear topology, then we will analyze a more complex two-dimensional (2D) topology.
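The per-hop selection described above can be sketched as follows. The closed form used for P^{f} is an assumption inferred from the text (the ratio sp_{l}(i,j)/m(i), capped at 1 and scaled by δ), and the function names, the placeholder rules, and the numeric values are hypothetical.

```python
import random

def filter_probability(sp_l_ij, m_i, delta):
    """Assumed form of P^f: min(1, sp_l(i,j) / m(i)) * delta."""
    return min(1.0, sp_l_ij / m_i) * delta

def pick_rules(R, p_f, rng=random):
    """Scan a contiguous, wrapping window of round(p_f * |R|) rules.

    The window starts at a random position of the linear list R.
    """
    size = round(p_f * len(R))
    if size == 0:
        return []
    start = rng.randrange(len(R))
    return [R[(start + x) % len(R)] for x in range(size)]

R = [f"rule-{n}" for n in range(100)]  # placeholder rules

# Far from the destination the ratio exceeds 1 and is capped, giving 1 * delta.
p_near_source = filter_probability(sp_l_ij=6, m_i=3.0, delta=0.5)
# Close to the destination the ratio, and hence the window, shrinks.
p_near_dest = filter_probability(sp_l_ij=1, m_i=3.0, delta=0.5)

subset = pick_rules(R, p_near_dest, random.Random(1))
```

Because the window wraps around the end of the list, every rule has the same chance p_f of being tested, whatever the starting point.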
3.1 1D Linear topology
As a clarifying example, we take a linear topology with 10 nodes and report the average values of t(i), m(i), and t(i)/m(i) in Figure 3.
It can be noticed that the values of m(i) are influenced by the position of i in the topology. In particular, nodes that are close to the periphery will have larger values compared to nodes that are in the center of the topology. This can be explained noting that when i is in the periphery of the network, its average distance from the other nodes is larger than when i is in the center, so m(i) is higher on the periphery. It is also easy to see that in this simple topology, if we compute t(i) excluding the packets that are generated by i itself, t(i) is constant. This would make the ratio t(i)/m(i) decrease for nodes close to the extreme ends of the network. In the figure, instead, we plot t(i) including also the packets generated by node i, which increases the values of t(i) on the periphery. This takes into account that in our scenario, each node is a gateway for its own subnet, so the first hop is counted in its own subnet. Even in this case, t(i)/m(i) is still larger for nodes that are central in the topology.
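The behavior of m(i) and of the transit-only t(i) on the 10-node line can be checked with a few lines of code. This is a sketch under stated assumptions: nodes are numbered 0 to 9, the hop distance is the absolute difference of the node IDs, and t(i) counts only routes that cross i (neither generated by i nor destined to i). It does not reproduce the variant plotted in Figure 3, which also includes the packets generated by i.

```python
def m(i, n):
    """Average hop distance from node i to every other node on an n-node line."""
    return sum(abs(i - j) for j in range(n) if j != i) / (n - 1)

def t_transit(i, n):
    """Average remaining hops sp_l(i, j) over the routes (k, j) that cross i.

    Only transit traffic is counted (k != i and j != i). The two end nodes
    forward no transit traffic, so None is returned for them.
    """
    remaining = [abs(i - j)
                 for k in range(n) for j in range(n)
                 if k != i and j != i and min(k, j) < i < max(k, j)]
    return sum(remaining) / len(remaining) if remaining else None

n = 10
for i in range(n):
    t_i = t_transit(i, n)
    ratio = None if t_i is None else round(t_i / m(i, n), 3)
    print(i, round(m(i, n), 3), t_i, ratio)
```

Under these assumptions the transit-only t(i) equals (n+1)/4 = 2.75 for every interior node, while m(i) grows from the center towards the ends, so the ratio shrinks near the periphery.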
We expect the central nodes of the network to be more congested than the nodes in the borders since the number of shortest paths that pass across them is higher. Considering this, the shape of t(i)/m(i) introduces a positive effect: the more p gets close to the center of the network, the higher is the chance of being filtered. The practical consequence is that when p is moving from the periphery to the center of the network, which is more congested, its chances of being filtered are increased. When p has already passed the central region of the network, the chances of being filtered decrease. If we look at it from a different perspective, we impose a larger filtering effort for packets that are going towards the most loaded area of the network because we want to save resources in that area where they are more precious. When the packets have passed the central area, we spend less effort to filter them since they are directed to the periphery of the network, which is less congested; in any case, packets will be filtered at the destination.
We now define the probability that a packet p is filtered after h hops from the source node k when it is destined to node j:
This definition moves the dependency of P^{f}() from the node i where the packet is filtered to the position of i in the route from k to j. $P_{h}^{f}()$ can be averaged over all the couples (k,j) in order to keep only the dependency on h. Exploiting $P_{h}^{f}()$, we can compute the probability that p arrives at h hops from the source node, which we call P_{a}(h) (arrival probability):
In Figure 4, we report P_{ a } for the same network considered in Figure 3 when δ=0.5.
The diameter of the network is equal to nine hops. When the packet arrives at the destination, it is filtered with a destination-specific ruleset, so we do not include the last hop in the curve. We have numbered them from 0 to 8, indicating that the first chance of being filtered is on node k itself. Figure 4 shows that we obtain indeed the desired effect: the chances of a packet being filtered are higher close to the source and decrease when it gets close to the destination.
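The arrival probability can be sketched for a single route as the product of the per-hop survival probabilities. The code below assumes the form min(1, sp_{l}(i,j)/m(i))·δ for P^{f} and a line with nodes numbered 0 to 9; it handles one couple (k,j) only, whereas the curve in Figure 4 is averaged over all couples.

```python
def m(i, n):
    """Average hop distance from node i to every other node on an n-node line."""
    return sum(abs(i - j) for j in range(n) if j != i) / (n - 1)

def p_filter(i, j, n, delta):
    """Assumed form of P^f on the line: min(1, sp_l(i,j) / m(i)) * delta."""
    return min(1.0, abs(i - j) / m(i, n)) * delta

def arrival_probabilities(k, j, n, delta):
    """P_a(h) for h = 0 .. sp_l(k,j) along the route k -> j.

    P_a(h) is the chance of surviving the first h filtering nodes.
    """
    step = 1 if j > k else -1
    p_a = [1.0]  # the packet always reaches hop 0, that is, node k itself
    for i in range(k, j, step):  # every node on the route except destination j
        p_a.append(p_a[-1] * (1.0 - p_filter(i, j, n, delta)))
    return p_a

p_a = arrival_probabilities(k=0, j=9, n=10, delta=0.5)
```

With δ=0.5 the source applies the capped probability 0.5, so `p_a[1]` is 0.5, and each later hop removes a smaller share of the surviving packets.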
3.2 2D Topologies
In the linear topology described so far, the distance between two nodes is given by the modulus of the difference between their node IDs, so the results are obtained by means of simple algebra. When the network topology is defined on a 2D plane, more complex instruments must be used. The most suitable instrument to study the behavior of a mesh network with a 2D topology is computer simulation; nevertheless, we want to test our technique against networks that may grow up to hundreds of nodes. Network simulators cannot handle scenarios of such size; thus, we use the Python NetworkX library to evaluate the characteristics associated with large topologies. For some applications, approximating a wireless mesh network with an abstract graph may be a simplification that is too far from reality. In our case, we rely on the existence of a proactive routing protocol running in the mesh network. We are not interested in physical layer and MAC layer performance (which is more sensitive to the simplifications introduced by graph analysis); we operate directly on the graph that the routing protocol generates, assuming that it is able to find neighbor nodes, to identify and use only symmetric links, and to build the routing table from any source k to any destination j. This is perfectly compatible with, for instance, the widely used OLSR protocol. Note also that we assume the routing protocol uses a shortest-path metric, and we use NetworkX functions in order to compute the values of sp_{l}(i,j) directly on the graph. It is out of the scope of this paper to show it, but we believe that the same approach can be applied even when the routing is not a simple minimum hop. In this case, the graph will be a weighted one where it is still possible to compute m(i) and t(i) taking into account the weights of each graph edge.
To test the performance of the waterwall, we will use two metrics introduced in previous works [14, 22] and defined as follows:

M_{1}(k,j). It counts each false positive on the route from k to j, that is, it is incremented each time an unwanted packet is forwarded on the path from the sender to the destination. It is normalized on the route length from k to j, so it expresses the fraction of the path that p is able to reach before being filtered.

M_{2}(k,j). It counts each false positive end to end, that is, it is incremented each time an unwanted packet arrives at j. It is normalized to 1, so it represents the probability of unwanted traffic arriving at destination j.
When averaged on every couple (k,j), M_{1} gives an estimation of the impact of false positives on the whole network traffic. For instance, when a node that has been infected by a worm starts a DoS attack against another host, M_{1} tells how much the waterwall fails to mitigate this attack in terms of wasted network resources.
M_{2} instead measures the inefficiency in filtering traffic directed against a specific host. In our scenario, the destination node j applies its own ruleset so that M_{2} always goes to zero when p arrives at its destination. We nevertheless consider it since it is useful in other scenarios (for instance, for intrusion detection or when some traffic is forbidden by a network administrator but not all nodes support filtering).
M_{2}(k,j)=P_{ a }(k,j) as it is the probability of not being filtered on the whole path from k to j. M_{1} is defined as the average number of hops that p makes before being discarded:
The first term of the equation takes into account packets that are filtered before they arrive at the destination (including node k). It is the sum of the path length from k to i, multiplied by the probability of reaching i and multiplied again by the probability of being filtered on node i. The second term takes into consideration the packets that arrive at the destination j.
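The two terms described above can be turned into a small routine for a single route. This is a sketch, assuming the form min(1, sp_{l}(i,j)/m(i))·δ for P^{f}, a line topology with nodes numbered 0 to n-1, and a normalization of M_{1} by the route length as stated in its definition; the function names are hypothetical.

```python
def m(i, n):
    """Average hop distance from node i to every other node on an n-node line."""
    return sum(abs(i - j) for j in range(n) if j != i) / (n - 1)

def p_filter(i, j, n, delta):
    """Assumed form of P^f on the line: min(1, sp_l(i,j) / m(i)) * delta."""
    return min(1.0, abs(i - j) / m(i, n)) * delta

def metrics(k, j, n, delta):
    """Return (M1, M2) for the route k -> j on an n-node line."""
    step = 1 if j > k else -1
    route_len = abs(j - k)
    p_arrive = 1.0     # probability of reaching the current node
    wasted_hops = 0.0  # expected hops travelled by an unwanted packet
    for i in range(k, j, step):  # all nodes except the destination
        p_f = p_filter(i, j, n, delta)
        # First term: packet travelled sp_l(k,i) hops, reached i, dropped at i.
        wasted_hops += abs(i - k) * p_arrive * p_f
        p_arrive *= 1.0 - p_f
    # Second term: packet survived every hop and crossed the whole route.
    wasted_hops += route_len * p_arrive
    return wasted_hops / route_len, p_arrive

m1_low, m2_low = metrics(k=0, j=9, n=10, delta=0.2)
m1_high, m2_high = metrics(k=0, j=9, n=10, delta=0.5)
```

With δ=0 no node filters, so both metrics equal 1; raising δ lowers both, which is the trend discussed for Figures 5 and 6.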
One more evaluation parameter we consider is the average end-to-end delay for every route in the network. For a network in which every node j has a ruleset of size |r_{j}|=30, for each route, we compute the average end-to-end delay introducing at every node a processing delay d that depends on P^{f}(k,i,j). The value of d is taken directly from the data measured on a real platform and reported in Figure 2. The delay thus depends on the total number of nodes and on the value of the δ parameter.
Figures 5, 6, and 7 report the value of the metrics M_{1}, M_{2}, and delay for a 2D topology with random placement of nodes, increasing the network size and varying δ. The nodes are placed in an area of growing size with constant spatial density of nodes, and each node is connected to the neighbors that fall inside a radius of 70 m. Using NetworkX primitives, we are able to compute the shortest paths on the considered graphs and compute the equations we have defined so far.
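A topology of this kind can be generated with the standard library alone, as in the sketch below; the NetworkX primitives mentioned in the text could replace both functions. The density value, the seed, and the function names are assumptions made for illustration, and a graph generated this way is not guaranteed to be connected.

```python
import math
import random
from collections import deque

def random_mesh(n_nodes, density, radius=70.0, seed=0):
    """Place nodes uniformly in a square sized to keep the density constant.

    Two nodes are linked (symmetrically) when they are within `radius` meters.
    """
    rng = random.Random(seed)
    side = math.sqrt(n_nodes / density)  # area grows with the node count
    pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_nodes)]
    adj = {u: set() for u in range(n_nodes)}
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            if math.dist(pos[u], pos[v]) <= radius:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Assumed density: one node every 1500 square meters (not from the paper).
adj = random_mesh(n_nodes=50, density=1 / 1500.0)
dist = hop_distances(adj, 0)
# m(0): average hop distance from node 0 to the nodes it can reach.
m_0 = sum(dist.values()) / (len(dist) - 1) if len(dist) > 1 else None
```

Running `hop_distances` from every node yields the sp_{l}(i,j) values from which m(i), t(i), and the metrics can be computed.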
We can see that, as expected, M_{1} and M_{2} decrease when δ is increased (recall that M_{1} and M_{2} measure false positives, so they are measures of badness). This is intuitive since a larger δ corresponds to fewer false positives. Less intuitive is the fact that given a certain δ, a larger network has smaller values of M_{1} and M_{2}. In the previous section, we have shown that the values of t(i) are smaller if i is close to the periphery of the network; this is true also in 2D topologies. As a consequence, the ratio t(i)/m(i) is smaller in the periphery of the network as can be seen in Figure 3. In a 2D topology, the periphery of the network is represented by nodes that are placed on the perimeter of the covered area and that have fewer neighbors compared to the ones that are at the center of the area. If we keep the density constant and increase the number of nodes, we increase the covered area and, consequently, its perimeter. But the perimeter of the network grows more slowly compared to the area, so in larger networks the fraction of the nodes on the perimeter becomes less relevant. As a consequence, a larger network will have a larger average value of the t/m ratio and will filter more packets per hop. Figure 7 shows the average end-to-end delay for the networks under consideration. A larger δ corresponds to higher processing delays introduced at every hop.
To interpret these results, consider a network with 200 nodes and 30 rules per node, thus R=6,000. With such a large ruleset, each hop would introduce a delay larger than 0.45 s, as can be seen in Figure 2. If we consider that m in such a network has an average larger than 8, this would produce an average delay larger than 3.6 s, which would make the network unusable. Instead, with the waterwall approach, we can configure the δ parameter in order to find the right equilibrium between latency and filtering efficiency; for instance, if δ=0.4 we obtain an average M_{1} lower than 50% and keep the delay around 0.15 s. That is, we decrease the filtering efficiency to one half, but we reduce the delay by a factor of 24.
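The figures quoted in this example follow from a short calculation, taking 8 hops and 0.45 s per hop as the lower bounds stated above and reading 0.15 s as the average end-to-end delay obtained with δ=0.4 (this reading is an assumption):

$$
8 \times 0.45\ \text{s} = 3.6\ \text{s}, \qquad \frac{3.6\ \text{s}}{0.15\ \text{s}} = 24
$$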
Still, if a higher performance of the firewall is needed with a large network size, the delay introduced by the waterwall must be further reduced. In the next section, we introduce an optimization that, at the cost of a simple ordering function applied to the ruleset, can further reduce the false-positive rate.
3.3 Smart ruleset partition
When node i processes a packet p from k to j, it randomly chooses a position λ in R and uses a portion R_{i} of the ruleset of size P^{f}(k,i,j)×|R| starting from position λ. Packet p is tested against all the rules in R_{i}. Given that each choice of λ is independent at each hop, the probability of evaluating the same rule twice in the path from source to destination is high due to the so-called birthday paradox. If we are able to use minimally overlapping R_{i} sets, then we can expect that M_{1} and M_{2} decrease faster with the distance from the source. The results obtained with disjoint rulesets are reported in Figures 8 and 9 and represent an upper bound of the gain reachable with this improvement. Comparing Figures 5 and 8, we can see that to obtain similar results, a lower value of δ is sufficient; for instance, to have M_{1} below 50%, δ=0.3 is sufficient (even if δ=0.2 is below 50% for a network larger than 100 nodes), which corresponds in Figure 7 to a tolerable delay even for a network with 300 nodes.
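The gain from disjoint windows can be quantified with a simple sketch. It assumes that each hop tests a wrapping window covering a fraction f of R, so that a given rule falls in the window with probability f; the list of fractions is a hypothetical example, not data from the paper.

```python
import math

def coverage_independent(fractions):
    """Chance that a given rule is tested at least once along the path
    when every hop picks its window start independently at random."""
    return 1.0 - math.prod(1.0 - f for f in fractions)

def coverage_disjoint(fractions):
    """Same chance when the windows never overlap: the fractions add up."""
    return min(1.0, sum(fractions))

# Hypothetical P^f values met along a 5-hop path, largest near the source.
fractions = [0.3, 0.25, 0.2, 0.15, 0.1]

independent = coverage_independent(fractions)
disjoint = coverage_disjoint(fractions)
```

In this example the same filtering effort covers about 68% of R with independent windows, against essentially the whole ruleset with disjoint ones.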
We can thus try to find a smarter way to choose λ in order to minimize the intersection between the different R_{ i } along the path. In this paper, we introduce two proposals to be further evaluated in future work.
The first is to choose λ as a function of specific network parameters of p and node i:
where

⊕ is the XOR operator,  is the concatenation operator, and (mod) is the modulo operation

IP_{dest} and IP_{src} are the destination and source IP addresses of p, respectively

IP_{chk} and IP_{id} are checksum and identification field of the IP header, respectively. Those fields are immutable from source to destination, and their combination is unique for each packet. They are concatenated since their size is just half of the size of an IP address.

$\hat{{\text{IP}}_{i}}$ is the IP address of node i with inverted byte order. Bytes are swapped since we want the host identifier of the IP, which has a larger variability, to be the most significant byte. Otherwise, the modulo operation may just return the same value for each host.
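One way to combine the quantities listed above can be sketched as follows. The exact combination is an assumption inferred from the operator list (XOR of the four 32-bit terms, then modulo the ruleset size R) and should not be read as a transcription of the equation; the function names, the example addresses and the header values are hypothetical.

```python
import ipaddress

def swap_byte_order(addr: int) -> int:
    """Reverse the four bytes of a 32-bit IPv4 address."""
    return int.from_bytes(addr.to_bytes(4, "big"), "little")

def choose_lambda(ip_src, ip_dest, ip_node, ip_chk, ip_id, ruleset_size):
    """Derive a starting position in the ruleset from packet and node fields.

    Assumption: all terms are XORed together and reduced modulo the ruleset size.
    """
    src = int(ipaddress.IPv4Address(ip_src))
    dest = int(ipaddress.IPv4Address(ip_dest))
    node = swap_byte_order(int(ipaddress.IPv4Address(ip_node)))
    # The checksum and identification fields are 16 bits each, so their
    # concatenation has the same width as an IPv4 address.
    chk_id = ((ip_chk & 0xFFFF) << 16) | (ip_id & 0xFFFF)
    return (dest ^ src ^ chk_id ^ node) % ruleset_size

# Hypothetical packet crossing two different nodes of the same path.
R = 6000
lam_a = choose_lambda("10.0.0.1", "10.0.3.7", "10.0.1.2", 0xB1E6, 0x1C46, R)
lam_b = choose_lambda("10.0.0.1", "10.0.3.7", "10.0.1.3", 0xB1E6, 0x1C46, R)
print(lam_a, lam_b)
```

Because every input is either carried by the packet or known to the node, each node can compute its position without exchanging any state with its neighbors.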
The rationale of this choice is to produce a λ that changes from hop to hop depending on a unique parameter of node i. In this way, we spread the choices of λ with a deterministic algorithm and try to get a better coverage of R. Nevertheless, node i must not deterministically select the same λ for all the packets belonging to the same flow (identified by IPs and ports). If it did, the node would always filter those packets with the same portion of the ruleset, and some portions of R might never be covered.
Moreover, an attacker could try to precompute the behavior of the nodes between the attacker and the destination and choose the route with the highest probability of not being filtered. For this reason, we introduce in Equation 6 the unique identifier IP_{id} and the checksum IP_{chk} in order to make λ hard to predict along the evolution of the traffic flow.
As an alternative approach, we could use $\frac{{\text{sp}}_{l}(i,j)}{m\left(i\right)}$ to determine not only the size of R_{ i }, but also its position in R. This way, λ would not depend on some identifier of the node that is performing the filtering (IP_{ i }) but on the estimation of its position in the path from source to destination. This proposal, as the previous one, is an initial design that needs further analysis.
For both these approaches to be applicable, every node must keep the rules in its ruleset in an ordered list. Nevertheless, we do not lose the generality of the approach since the ordering is independent of the semantics of the rules, so it can be applied to rules of any kind. For instance, given the data structure that is used to store a rule in the operating system, an ordering based on a fingerprint of this data structure is sufficient.
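A fingerprint-based ordering of this kind can be sketched as follows. The choice of SHA-256 as the fingerprint and the representation of each rule as an opaque byte string are assumptions made for illustration; the rule strings are hypothetical examples and are never parsed.

```python
import hashlib

def fingerprint(rule: bytes) -> bytes:
    """Fingerprint of the serialized rule; the rule content is treated as opaque."""
    return hashlib.sha256(rule).digest()

def order_ruleset(rules):
    """Return the rules sorted by fingerprint, independently of their semantics."""
    return sorted(rules, key=fingerprint)

# Hypothetical serialized rules.
rules = [
    b"-A FORWARD -p tcp --dport 23 -j DROP",
    b"-A FORWARD -s 10.0.5.0/24 -j DROP",
    b"-A FORWARD -p udp --dport 161 -j DROP",
]

# Two nodes that received the same rules in a different order
# end up with the same ordered list.
assert order_ruleset(rules) == order_ruleset(list(reversed(rules)))
```

Since the ordering depends only on the stored bytes, all the nodes agree on the position of every rule in R without any coordination.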
Note that in all the results we have shown so far, M_{1} and M_{2} hardly reach values lower than 0.1. This is due to the fact that P^{f}(k,i,j) in Equation 1 may not be equal to 1 even if δ=1 at the first hop for the case sp_{ l }(i,j)>m(i). To have values of P^{f}(k,i,j) closer to 1, we can use δ>1. In this case, Equation 1 must be modified in order to make the value of P^{f}() bounded by 1, as follows:
The rest of the equations do not change. In Figure 10, we report M_{1} and M_{2} when δ is larger than 1; for the sake of clarity, we report only the smallest scenario (50 nodes). It can be noticed that the metrics follow the same trend observed for values lower than 1 and reach values lower than 0.1.
4 Conclusions
In this paper, we have introduced a new model to perform distributed firewalling in mesh networks that takes advantage of the multihop nature of those networks to share the load needed for the filtering function. To stress the difference with a traditional firewall, we chose the term waterwall, indicating a fluid and distributed network function instead of a single filtering host. We have shown that the waterwall can be used to greatly reduce the unwanted traffic in a mesh network. To quantify the cost of the filtering function, we used the delay introduced by large rulesets, measured on an embedded processor, and have shown that our approach scales well up to mesh networks of hundreds of nodes. The source code used to realize the test is available on the website of the main project financing this work (http://www.pervacy.eu).
As future work, we intend to implement the filtering strategy on a network simulator in order to test and optimize the enhancement described in Section 3.3. Afterwards, we plan to embed this technique in some widely used routing protocol implementation, such as OLSR, in order to test on real networks.
Endnotes
^{a} At the time of writing, the Guifi network is made up of about 22,000 nodes and is growing at a pace of a hundred nodes per week. The network is divided into zones, each of which can be formed by hundreds of nodes.
^{b} The attacker we take into consideration is able to mangle the contents of packets, but we imagine that the routing protocol implements some security measures to avoid, or at least identify, attacks on network routing.
References
 1.
Akyildiz IF, Wang X, Wang W: Wireless mesh networks: a survey. Elsevier Comput. Netw 2005, 47(4):445–487. 10.1016/j.comnet.2004.12.001
 2.
Frangoudis P, Polyzos G, Kemerlis V: Wireless community networks: an alternative approach for nomadic broadband network access. IEEE Commun. Mag 2011, 49(5):206–213.
 3.
Vega D, Cerda-Alabern L, Navarro L, Meseguer R: Topology patterns of a community network: Guifi.net. In IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). Barcelona; 8–10 Oct 2012.
 4.
Min G, Wu Y, Al-Dubai AY: Performance modelling and analysis of cognitive mesh networks. IEEE Transactions on Communications 2012, 60(6):1474–1478.
 5.
Ioannidis S, Keromytis AD, Bellovin SM, Smith JM: Implementing a distributed firewall. In 7th ACM Conference on Computer and Communications Security (CCS). Athens; 01–04 Nov 2000.
 6.
Zhao H, Chau CK, Bellovin SM: ROFL: routing as the firewall layer. In Workshop on New security paradigms (NSPW). Lake Tahoe; 22–25 Sept 2008.
 7.
Zhao H, Bellovin SM: Source prefix filtering in ROFL. Technical report CUCS-033-09. Columbia University, 2009
 8.
Zhao H, Bellovin SM: High performance firewalls in MANETs. In IEEE International Conference on Mobile Ad-hoc and Sensor Networks. Hangzhou; 20–22 Dec 2010.
 9.
Zhao H, Lobo J, Roy A, Bellovin SM: Policy refinement of network services for MANETs. In IFIP/IEEE International Symposium on Integrated Network Management (IM). Dublin; 23–27 May 2011.
 10.
Alicherry M, Keromytis A, Stavrou A: Distributed firewall for MANETs. Technical report. Columbia University (Computer Science Technical Report Series), 2008
 11.
Fantacci R, Maccari L, Ayuso P, Gasca R: Efficient packet filtering in wireless ad hoc networks. IEEE Commun. Mag 2008, 46(2):104–110.
 12.
Maccari L, Fantacci R, Neira P, Gasca R: Mesh network firewalling with bloom filters. In IEEE International Conference on Communications (ICC). Glasgow; 24–28 June 2007.
 13.
Neira P, Gasca R, Maccari L, Lefevre L: Stateful firewalling for wireless mesh networks. In International Conference on New Technologies, Mobility and Security, 2008 (NTMS). Tangier; 5–7 Nov 2008.
 14.
Maccari L: A collaborative firewall for wireless ad-hoc social networks. In International Conference on Security and Cryptography (SECRYPT). Rome; 24–27 July 2012.
 15.
Taghizadeh M, Khakpour A, Liu A, Biswas S: Collaborative firewalling in wireless networks. In IEEE International Conference on Computer Communications (INFOCOM). Shanghai; 10–15 Apr 2011.
 16.
Zhang H, DeCleene B, Kurose J, Towsley D: Bootstrapping deny-by-default access control for mobile ad-hoc networks. In IEEE Military Communications Conference (MILCOM). San Diego; 17–19 Nov 2008.
 17.
Alicherry M, Keromytis AD, Stavrou A: Evaluating a collaborative defense architecture for MANETs. In IEEE International Conference on Internet Multimedia Services Architecture and Applications (IMSAA). Bangalore; 9–11 Dec 2009.
 18.
Esposito M, Mazzariello C, Oliviero F, Peluso L, Romano SP, Sansone C: Intrusion detection and reaction: an integrated approach to network security. In Intrusion Detection Systems, ed. by R di Pietro, LV Mancini. Advances in Information Security, vol. 38. New York: Springer; 2008:172–210.
 19.
Panos C, Xenakis C, Stavrakakis I: A novel intrusion detection system for MANETs. In International Conference on Security and Cryptography (SECRYPT). Athens; 26–28 July 2010.
 20.
Cerda-Alabern L: On the topology characterization of Guifi.net. In IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). Barcelona; 8–10 Oct 2012.
 21.
Clausen T, Jacquet P (Eds): RFC 3626 - Optimized Link State Routing Protocol (OLSR). 2003.
 22.
Maccari L, Lo Cigno R: Privacy in the pervasive era: a distributed firewall approach. In IEEE/IFIP Conference on Wireless On demand Network Systems and Services (WONS), Poster Session. Courmayeur; 23 Dec 2012.
Acknowledgements
This work has been financed by Provincia di Trento under The Trentino programme of research, training and mobility of postdoctoral researchers, incoming Postdocs 2010 CALL 1, PCOFUND-GA-2008-226070. Renato Lo Cigno has been partially funded by the European Commission under grant agreement no. FP7-288535 ‘CONFINE’: Open Call 1, Open Source P2P Streaming for Community Networks –OSPS–.
Additional information
Competing interests
Both authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Maccari, L., Cigno, R.L. Waterwall: a cooperative, distributed firewall for wireless mesh networks. J Wireless Com Network 2013, 225 (2013). https://doi.org/10.1186/1687-1499-2013-225
Keywords
 Mesh Network
 Internet Protocol
 Intrusion Detection System
 Wireless Mesh Network
 Mesh Router