IPv6 Neighbor Discovery Responder for KVM VPS
This article was originally published on the yoursunny.com blog: https://yoursunny.com/t/2021/ndpresponder/
I Want IPv6 for Docker
I'm playing with Docker these days, and I want IPv6 in my Docker containers.
The best guide for enabling IPv6 in Docker is "How to enable IPv6 for Docker containers on Ubuntu 18.04".
The first method in that article assigns private IPv6 addresses to containers, and uses IPv6 NAT similar to how Docker handles IPv4 NAT.
I quickly got it working, but noticed an undesirable behavior: Network Address Translation (NAT) changes the source port number of outgoing UDP datagrams, even if there's a port forwarding rule for inbound traffic. Consequently, a UDP flow with the same source and destination ports is recognized as two separate flows, as these two NFD faces show:
$ docker exec nfd nfdc face show 262
faceid=262
remote=udp6://[2001:db8:f440:2:eb26:f0a9:4dc3:1]:6363
local=udp6://[fd00:2001:db8:4d55:0:242:ac11:4]:6363
congestion={base-marking-interval=100ms default-threshold=65536B}
mtu=1337
counters={in={25i 4603d 2n 1179907B} out={11921i 14d 0n 1506905B}}
flags={non-local permanent point-to-point congestion-marking}
$ docker exec nfd nfdc face show 270
faceid=270
remote=udp6://[2001:db8:f440:2:eb26:f0a9:4dc3:1]:1024
local=udp6://[fd00:2001:db8:4d55:0:242:ac11:4]:6363
expires=0s
congestion={base-marking-interval=100ms default-threshold=65536B}
mtu=1337
counters={in={11880i 0d 0n 1498032B} out={0i 4594d 0n 1175786B}}
flags={non-local on-demand point-to-point congestion-marking}
The second method in that article allows every container to have a public IPv6 address.
It avoids NAT and the problems that come with it, but requires the host to have a routed IPv6 subnet.
However, routed IPv6 is hard to come by on KVM servers, because virtualization platforms such as Virtualizor do not support routed IPv6 subnets and can only provide on-link IPv6.
On-Link IPv6 vs Routed IPv6
So what's the difference between on-link IPv6 and routed IPv6, anyway?
They differ in how the router at the previous hop is configured to reach a destination IP address.
Let me explain in IPv4 terms first:
|--------| 192.0.2.1/24           |--------| 198.51.100.1/24        |-----------|
| router |------------------------| server |------------------------| container |
|--------|           192.0.2.2/24 |--------|        198.51.100.2/24 |-----------|
               (192.0.2.16-23/24)     |
                                      | 192.0.2.17/28               |-----------|
                                      \-----------------------------| container |
                                                      192.0.2.18/28 |-----------|
The server has on-link IP address 192.0.2.2.
- The router knows this IP address is on-link because it is in the 192.0.2.0/24 subnet that is configured on the router interface.
- To deliver a packet to 192.0.2.2, the router sends an ARP query for 192.0.2.2 to learn the server's MAC address, and the server should answer it.
The server has routed IP subnet 198.51.100.0/24.
- The router must be configured to know: 198.51.100.0/24 is reachable via 192.0.2.2.
- To deliver a packet to 198.51.100.2, the router first looks up its routing table and finds the above entry, then sends an ARP query to learn the MAC address of 192.0.2.2 (which the server should answer), and finally delivers the packet to the learned MAC address.
The main difference is what IP address is enclosed in the ARP query:
- If the destination IP address is an on-link IP address, the ARP query contains the destination IP address itself.
- If the destination IP address is in a routed subnet, the ARP query contains the nexthop IP address, as determined by the routing table.
If I want to assign an on-link IPv4 address (e.g. 192.0.2.18/28) to a container, the server must answer ARP queries for that IP address, so that the router delivers packets to the server, which then forwards them to the container.
- This technique is called ARP proxy, in which the server responds to ARP queries on behalf of the container.
The situation is a bit more complex in IPv6 because each network interface can have multiple IPv6 addresses, but the same concept applies.
Instead of the Address Resolution Protocol (ARP), IPv6 uses the Neighbor Discovery Protocol (NDP), which is part of ICMPv6.
Some terminology differs:
IPv4 | IPv6 |
---|---|
ARP | Neighbor Discovery Protocol (NDP) |
ARP query | ICMPv6 Neighbor Solicitation |
ARP reply | ICMPv6 Neighbor Advertisement |
ARP proxy | NDP proxy |
If I want to assign an on-link IPv6 address to a container, the server should respond to neighbor solicitations for that IP address, so that the router delivers packets to the server.
After that, the server's Linux kernel can route the packets to the container's bridge, as if the destination IPv6 address were in a routed subnet.
NDP Proxy Daemon to the Rescue, I Hope?
ndppd, or NDP Proxy Daemon, is a program that listens for neighbor solicitations on a network interface and responds with neighbor advertisements.
It is often recommended for the scenario where the server has only on-link IPv6 but needs a routed IPv6 subnet.
I installed ndppd on one of my servers, and it worked as expected with this configuration:
proxy uplink {
  rule 2001:db8:fbc0:2:646f:636b:6572::/112 {
    auto
  }
}
I can start a Docker container with a public IPv6 address.
It can reach the IPv6 Internet and can be pinged from outside.
$ docker network create --ipv6 --subnet=172.26.0.0/16 --subnet=2001:db8:fbc0:2:646f:636b:6572::/112 ipv6exposed
118c3a9e00595262e41b8cb839a55d1bc7bc54979a1ff76b5993273d82eea1f4
$ docker run -it --rm --network ipv6exposed --ip6 2001:db8:fbc0:2:646f:636b:6572:d002 alpine
# wget -q -O- https://www.cloudflare.com/cdn-cgi/trace | grep ip
ip=2001:db8:fbc0:2:646f:636b:6572:d002
However, when I repeated the same setup on another KVM server, things didn't go well: the container cannot reach the IPv6 Internet at all.
$ docker run -it --rm --network ipv6exposed --ip6 2001:db8:f440:2:646f:636b:6572:d003 alpine
/ # ping -c 4 ipv6.google.com
PING ipv6.google.com (2607:f8b0:400a:809::200e): 56 data bytes
--- ipv6.google.com ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
What's Wrong with ndppd?
Why does ndppd work on the first server but not on the second?
What's the difference?
We need to go deeper, so I turned to tcpdump.
On the first server, I see:
$ sudo tcpdump -pi uplink icmp6
19:13:17.958191 IP6 2001:db8:fbc0::1 > ff02::1:ff72:d002: ICMP6, neighbor solicitation, who has 2001:db8:fbc0:2:646f:636b:6572:d002, length 32
19:13:17.958472 IP6 2001:db8:fbc0:2::2 > 2001:db8:fbc0::1: ICMP6, neighbor advertisement, tgt is 2001:db8:fbc0:2:646f:636b:6572:d002, length 32
- The neighbor solicitation from the router comes from a global IPv6 address.
- The server responds with a neighbor advertisement from its global IPv6 address. Note that this address differs from the container's address.
- IPv6 works in the container.
On the second server, I see:
$ sudo tcpdump -pi uplink icmp6
00:07:53.617438 IP6 fe80::669d:99ff:feb1:55b8 > ff02::1:ff72:d003: ICMP6, neighbor solicitation, who has 2001:db8:f440:2:646f:636b:6572:d003, length 32
00:07:53.617714 IP6 fe80::216:3eff:fedd:7c83 > fe80::669d:99ff:feb1:55b8: ICMP6, neighbor advertisement, tgt is 2001:db8:f440:2:646f:636b:6572:d003, length 32
- The neighbor solicitation from the router comes from a link-local IPv6 address.
- The server responds with a neighbor advertisement from its link-local IPv6 address.
- IPv6 does not work in the container.
Since IPv6 has been working on the second server for IPv6 addresses assigned to the server itself, I added a new IPv6 address and captured its NDP exchange:
$ sudo tcpdump -pi uplink icmp6
00:29:39.378544 IP6 fe80::669d:99ff:feb1:55b8 > ff02::1:ff00:a006: ICMP6, neighbor solicitation, who has 2001:db8:f440:2::a006, length 32
00:29:39.378581 IP6 2001:db8:f440:2::a006 > fe80::669d:99ff:feb1:55b8: ICMP6, neighbor advertisement, tgt is 2001:db8:f440:2::a006, length 32
- The neighbor solicitation from the router comes from a link-local IPv6 address, same as above.
- The server responds with a neighbor advertisement from the target global IPv6 address.
- IPv6 works on the server from this address.
In IPv6, each network interface can have multiple IPv6 addresses.
When the Linux kernel responds to a neighbor solicitation in which the target address is assigned to the same network interface, it uses that particular address as the source address.
On the other hand, ndppd transmits neighbor advertisements via a PF_INET6 socket and does not specify the source address.
In this case, the kernel's default source address selection rules (RFC 6724) come into play.
One of these rules prefers a source address that has the same scope as the destination address (here, the router's address).
On my first server, the router uses a global address, and the server selects a global address as the source address on its neighbor advertisement.
On my second server, the router uses a link-local address, and the server selects a link-local address, too.
In an unfiltered network, the router wouldn't care where the neighbor advertisements come from.
However, on a KVM server running on Virtualizor, the hypervisor treats such packets as IP spoofing attempts and drops them via ebtables rules.
Consequently, the neighbor advertisement never reaches the router, and the router has no way to know how to reach the container's IPv6 address.
ndpresponder: NDP Responder for KVM VPS
I tried a few tricks such as deprecating the link-local address, but none of them worked.
Thus, I made my own NDP responder that sends neighbor advertisements from the target address.
ndpresponder is a Go program using the GoPacket library.
- The program opens an AF_PACKET socket, with a BPF filter for ICMPv6 neighbor solicitation messages.
- When a neighbor solicitation arrives, it checks the target address against a user-supplied IP range.
- If the target address is in the range used for Docker containers, the program constructs an ICMPv6 neighbor advertisement message and transmits it through the same AF_PACKET socket.
A major difference from ndppd is that the source IPv6 address of a neighbor advertisement is always set to the target address of the neighbor solicitation, so that the message isn't dropped by the hypervisor.
This is possible because I'm sending the message via an AF_PACKET socket, instead of the AF_INET6 socket used by ndppd.
ndpresponder operates similarly to ndppd in "static" mode.
It does not forward neighbor advertisements to the destination subnet like ndppd does in its "auto" mode, but this feature isn't important on a KVM server.
If ndppd doesn't seem to work on your KVM VPS, give ndpresponder a try!
Head to my GitHub repository for installation and usage instructions:
https://github.com/yoursunny/ndpresponder
Comments
Nice. Candidate for the Blog?
Keith
Interesting. I will read more about it.
One question: what tool did you use to draw the network diagram (under "Let me explain in IPv4 terms first:")?
I don't read the LES blog myself.
I'd like the Google Search result to index my blog instead.
You do not need a tool to type ASCII art.
The Overtype mode for Visual Studio Code is very helpful.
Occasionally I type SVG source code to create more complicated artwork.
My article What is a "Face" in Named Data Networking? has two samples.
If you made the above by hand, then I salute you. Hopefully you can join the content writers group; I look forward to new info from you.
I spent the whole day getting ndpresponder to work on Webhosting24 Cloud.
@tomazu gives everyone a /48, but the router does not deliver neighbor solicitation packets to the server if I ping one of the addresses in my subnet from a client machine.
Nevertheless, adding an IPv6 address with the ip addr add command works.
After carefully comparing every address, every packet, and every bit, I found the difference.
Their router expects the KVM server to transmit a neighbor solicitation from the newly added IPv6 address targeting the router; the router then delivers a neighbor solicitation to my new IPv6 address, and after that the address becomes reachable.
I had to rewrite half of ndpresponder to adjust to this procedure: the program now hooks onto the Docker event stream.
This allows the program to know when a new container is connected, so that it can transmit a neighbor solicitation packet on behalf of the container and let the router know the new address.
Next year, I'll ask for routed IPv6.
Free NAT KVM | Free NAT LXC
Routed IPv6 Hall of Fame
Include routed IPv6, at least /64 subnet, to get listed.
Well, thank you for trying without opening a ticket, but you could have asked :-)
Just to be sure, is this in Munich or in Singapore?
Isn't that the way it is supposed to be? Otherwise you would need a single IPv6 address assigned out of an IPv6 subnet, with your /48 IPv6 subnet routed to that!?
Webhosting24: We are a Hoster, so You can be a Builder.® Build something today: 1 IPv4 & /48 IPv6 subnet always included
Munich Cloud Servers: NVMe RAID10 & Unmetered Bandwidth Singapore Launch Thread - Premium Connectivity, Ryzen CPU & NVMe RAID1
If I open a ticket, you fiddle with some settings without telling me what changed; then I run into the same problem on another network, and the cycle repeats.
That's why I try to poke around first, and hope the identified solutions can work in more places.
Unless there's a hardware fault:
https://talk.lowendspirit.com/discussion/comment/62635/#Comment_62635
Munich, wh24-1617893523.local in the billing panel.
For on-link IPv6, this setup differs from other providers, such as Nexril, Evolution Host, and WebHorizon:
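The setup at Webhosting24 is that the KVM server must actively declare the existence of an IPv6 address by transmitting:
- a gratuitous neighbor solicitation targeting the new IPv6 address, and
- a neighbor solicitation from the new IPv6 address targeting the router.
Without those, no incoming packets or neighbor solicitations will be delivered.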
I'm not well versed in IPv6 related protocols.
If your Cisco router expects these packets, I suppose some RFC requires the host system to transmit them upon address assignment.
ndppd would not transmit them because it's a hack, not a fully compliant implementation.
For routed IPv6, TunnelBroker gives each client two prefixes in separate ranges:
However, I'm told that Virtualizor lacks support for routed prefixes when live migration is needed:
https://www.lowendtalk.com/discussion/comment/3209268/#Comment_3209268
Interesting read.
My ISP does a similar thing, a /64 for ND with a separate /48 for PD.
We assign a random IPv6 address from a /96 when the VM is deployed, with a /64 routed to be available to the VM - although the /96 and each /64 come out of the same "larger" range (except the Helsinki Storage as each /64 comes from a separate /56). We're working on adding the ability to route large prefixes if required.
Hey, you use Virtualizor? I'm curious how.
Webhosting - NVMe SSD, Cloudlinux, Litespeed, SSH Access
KVM VPS Singapore | 256MB NAT VPS - LA, NY, CH, NL, IN, SG, JP starts $7 per year!
We add aaaa:bbbb:cccc:dddd:eeee:ffff:gggg:1/96 as an IP onto an interface on the host. Create an IP Pool and fill in the first part of the "Generate IPv6" section with the details from the /96 and set the gateway/etc.
We always leave the "Generate subnets" bit blank, as the /64 per client is all done manually. It adds a bit of work, but it was the best way with Virtualizor to give each VM its own /64.
Some people are happy enough with just having IPv6 connectivity so they don't use the /64 but it's there if they do.
I've been doing some testing of a setup like Hetzner's, using fe80::1 as the gateway so there's no need for the VM to have an IP address and a separate /64, but I'm not sure how this would play with Virtualizor either.
OK, could you please confirm that your IPv6 gateway is ending in ::fffe ?
If you have your "main" IPv6 address up & running and you are not using any "strange" IPv6 gateway, then this should work. Otherwise I would have to use another configuration (like the ones mentioned, with a manual /64 IPv6 plus a subnet routed over that and/or using the "universal" gateway fe80::1).
Please let me know about the IPv6 gateway, thank you for providing feedback regarding this!
Nice hack, thanks for sharing. I can imagine it may be a pain to manage manually for monthly committed orders, but it works!
The gateway is xxxx:xxxx:0:100::1, which is not in my subnet.
When the server was delivered, it was already installed from a template (most likely Debian 10), and I copied the settings from there.
I tried xxxx:xxxx:yyyy::fffe, and it is also usable as a gateway.
Once again, we blame Virtualizor for:
Even if I have xxxx:xxxx:yyyy::1/48 assigned to the KVM server, the container is still unreachable unless I actively declare its presence by transmitting a gratuitous neighbor solicitation targeting the new IPv6 address and a neighbor solicitation from the new IPv6 address targeting the router.
This condition is the same regardless of whether xxxx:xxxx:0:100::1 or xxxx:xxxx:yyyy::fffe is used as the gateway.
You should offer routed IPv6, which would earn you a spot in the routed IPv6 Hall of Fame.
I know this will not comfort you, but I set this up as routed IPv6 in Virtualizor, so I was sure this was tested and working, and honestly I never had any problem with the configuration when using ::fffe as the default gateway.
Will create a Debian VM myself and perform some additional tests.
Is there an easy way (i.e. other than recompiling) to change the logging level of ndpresponder?
I believe loglevel cannot be changed.
Good news is, recompiling a Go program only takes one second.
Thanks. Think I'm ready to get rid of IPv6 on my VPSes, so might not need to do that.
It's a bad idea to create duplicate content on websites where you can't add a canonical tag. Because of the missing canonical tag, both your LES and LET forum posts are outranking your blog post.
If you want Google to index your website, you should improve your SEO. I can recommend reading https://ahrefs.com/seo and following their blog.
I've spoken to Virtualizor about this and it is on their todo/release list.
There's no planned release version for it yet.
https://clients.mrvm.net
Changing log level is now implemented in revision b119b09cfbbc188bf013e17b4afdc44621b20181.
https://github.com/yoursunny/ndpresponder
Thanks for suggestion.
Thank you for the update! Some (hopefully constructive) feedback:
It works well. The biggest issue I found is memory use. I clocked it at 90 MB compared to ndppd's 4 MB. Not massive, but in the scenario it targets (KVM VPS), 90 MB can translate to 10-20% of RAM (obviously depending on specs). I already used the -s -w flags and ran upx. I wonder if it is possible to cut out some of the linked/included libraries?
Another situation it doesn't seem to handle (neither does ndppd) is the reverse direction, NDP to the gateway, when using nested virtualization. Let's say I have an LXC container running inside a KVM VPS. If there's no IPv6 traffic coming from the KVM "host", there's no NS message to the router and the neighbor entry goes invalid. When the LXC container generates IPv6 traffic, it causes an NS message to the gateway, but that message is sent from the link-local address and so gets ignored by the gateway. Thus something has to run on the KVM "host" to keep the gateway's neighbor info "fresh", e.g. a periodic ndisc6 or even a continual ping. What would be good is to regenerate the NS message from the host to the gateway using the global IPv6 address as source. Of course, if IPv6 were set up properly and routed to the KVM, this wouldn't be a problem, but you know that often doesn't happen in reality.
docker stats shows the ndpresponder container uses about 10MB RAM, but htop shows 90MB.
The Go compiler always statically links everything, so memory usage is higher than that of a C program, but lower than Python or Node.js.
I can consider making the Docker network feature optional using build constraints, and see how it affects memory usage.
https://github.com/yoursunny/ndpresponder/blob/b119b09cfbbc188bf013e17b4afdc44621b20181/hostinfo.go#L53-L78
There's already some simple logic that inserts a static neigh entry:
An obvious drawback is that, if the router MAC address suddenly changes, the static neigh entry would prevent outgoing packets.
Sending Neighbor Solicitation manually would not be helpful, because the response wouldn't update kernel's neigh entry.
That (setting a static neighbor) is basically what I do manually as a workaround, which seems a bit ugly. As you say, there's a problem if the gateway's MAC changes. I generally reboot the VPS in that case for other reasons. I hadn't noticed that your program already does it.