Anycast: Useless? DNS or BGP?
I've been toying around with the idea of using anycast DNS to host a geographically distributed network of servers for lower latency.
I'm using two of HostHatch's test IPs, one in Los Angeles and one in Oslo:
https://ping.sx/ping?t=anycast-dns.dash-cloud.com
As you can see, routing is insane... Some East Coast locations are routed to Oslo, while some European locations are routed to Los Angeles! After averaging out everything, I figure there's practically no benefit to using anycast DNS.
So, it seems anycast DNS is out of the question.
That leads me to my next question: are anycast IPs any better? E.g., if I have HostHatch announce the same IPs in multiple locations.
Or will that end up being useless like anycast DNS?
Alternatively, I can just use AWS Route 53 with geographic routing or Cloudflare load balancers and call it a day, but I'm curious.
What are your thoughts?
Comments
There are two benefits in using anycast.
First, you can reduce latency without using GeoDNS.
A drawback of GeoDNS is that if the client uses a resolver far from its own location and that resolver doesn't support the EDNS Client Subnet extension, the client gets a server IP that is far away and experiences suboptimal performance.
However, you need a lot of POPs to realize this benefit.
Just having New York, Las Vegas, and Luxembourg isn't enough.
Moreover, you need to adjust BGP routing from time to time, to ensure clients are connecting to the nearest replicas.
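To make the GeoDNS drawback above concrete, here is a minimal Python sketch of how a GeoDNS server might pick an answer. Everything in it is hypothetical: the prefix table, region names, and server IPs are made-up documentation addresses, and real GeoDNS software uses a proper geolocation database. It only illustrates that without EDNS Client Subnet the server can only see the resolver's address.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix-to-region table (documentation prefixes, not real data).
GEO_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): "eu",
    ipaddress.ip_network("203.0.113.0/24"): "us-west",
}
# Hypothetical server IP per region.
SERVERS = {"eu": "192.0.2.10", "us-west": "192.0.2.20"}
DEFAULT_REGION = "us-west"


def region_for(ip: str) -> str:
    """Map an address to a region using the toy prefix table."""
    addr = ipaddress.ip_address(ip)
    for net, region in GEO_TABLE.items():
        if addr in net:
            return region
    return DEFAULT_REGION


def geodns_answer(resolver_ip: str, ecs_subnet: Optional[str] = None) -> str:
    """Pick a server IP: use the client subnet if the resolver sent one,
    otherwise fall back to the resolver's own address."""
    if ecs_subnet is not None:
        net = ipaddress.ip_network(ecs_subnet, strict=False)
        lookup_ip = str(net.network_address)
    else:
        lookup_ip = resolver_ip
    return SERVERS[region_for(lookup_ip)]


# A European client using a resolver located in the western US:
print(geodns_answer("203.0.113.53"))                     # no ECS: resolver location wins
print(geodns_answer("203.0.113.53", "198.51.100.0/24"))  # with ECS: client location wins
```

With anycast the resolver's location doesn't matter, since the client's own packets are routed to whichever POP BGP considers closest.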
Second, you can achieve better reliability in case of a major failure.
If a routing announcement is withdrawn at one location, the same IP is still accessible from other locations.
If you synchronize aggressively, you could even keep the TCP connections working, although the synchronization overhead would be high.
On the other hand, GeoDNS has a delay in dealing with a failure: it has to wait for the client to resend a DNS query.
DNS caching would cause at least 120 seconds of service outage, while BGP could converge in less than 30 seconds.
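As a rough back-of-the-envelope comparison of the two failover paths, here is a small Python sketch. The 120-second and 30-second figures are the ones quoted above; the 10-second failure detection time is purely an assumption for illustration.

```python
def dns_failover_seconds(detect_s: float, cache_s: float) -> float:
    """Worst case for GeoDNS: a client cached the dead IP right before
    the record was changed, so it waits out the full cache lifetime."""
    return detect_s + cache_s


def bgp_failover_seconds(detect_s: float, convergence_s: float) -> float:
    """Worst case for anycast: detection plus time for the route
    withdrawal to propagate."""
    return detect_s + convergence_s


DETECT_S = 10.0  # assumed monitoring delay, not from the thread

print(dns_failover_seconds(DETECT_S, 120.0))  # 130.0
print(bgp_failover_seconds(DETECT_S, 30.0))   # 40.0
```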
Muchas gracias!
Very insightful, totally understand.
I’m seeing the same issues in my tests with Route53. Some users are directed to the other side of the world. Keep in mind though that most geodistributed ping tests use datacenter IPs, while your visitors have residential IPs. These tend to do better with GeoDNS.
Don't forget to factor in some crazy routes, because many providers will go with "cheap" transit. This oftentimes results in some utterly useless routing despite anycast's "best intentions". This is especially true for mass broadband/residential/mobile carriers and the like (esp. if they are "big").
Unfortunately, I've not seen very consistent anycast routing except from a few (like Cloudflare, who seem to do a good job because of the sheer number of POPs they have, which results in somewhat decent routing most of the time even if cheap transit is involved).
I wish there was some sort of real magic transit that was consistent and provider agnostic (like how the Internet was supposed to be, neutral in the fullest sense without any commercial bias).
If you're a big enough operation you might be able to routinely measure client pings to different POPs and dynamically route based on that.
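A minimal Python sketch of that idea, assuming you already collect round-trip times from a client (or client network) to each POP somehow. The POP names and the `min_samples` cutoff are made up for illustration; a real system would also need to age out old measurements.

```python
from statistics import median
from typing import Dict, List, Optional


def best_pop(samples: Dict[str, List[float]], min_samples: int = 3) -> Optional[str]:
    """Return the POP with the lowest median RTT (ms), ignoring POPs
    with too few measurements. Returns None if nothing qualifies."""
    medians = {
        pop: median(rtts)
        for pop, rtts in samples.items()
        if len(rtts) >= min_samples
    }
    if not medians:
        return None
    return min(medians, key=medians.get)


# Hypothetical measurements for one client network.
rtts = {
    "lax": [150.0, 160.0, 155.0],
    "osl": [30.0, 35.0, 400.0],  # one outlier; the median ignores it
}
print(best_pop(rtts))  # osl
```

Median is used instead of mean so a single bad sample doesn't flip the decision.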
It seems your GeoDNS setup is weird. We do that on a daily basis and it works fine for all of our customers. Drop us a message at support [at] gbshouse.com and we can help you with both, as we also offer anycast-as-a-service.
Just checked your DNS answers and they always return 2 IPs:
one from LAX and one from OSL. It seems that you've misconfigured your GeoDNS settings :-)
If you're willing to play around a bit with CF Workers, you can get it done pretty nicely (you can almost achieve anything you want). Here's an example you can get started with: https://community.cloudflare.com/t/geographic-routing-and-load-balancing-with-cloudflare-workers/21900
I thought these were pretty good comments.
Part of the consideration is what resources you have at your disposal. I've mostly experimented with Anycast for GeoDNS and am definitely in the "low-end user" category. My experience (from that perspective) is that Anycast is only as useful as your monitoring. While it is true that BGP can adjust quickly, it has to be triggered to do that. Practically, if you're a low-end user like me, cases where a network connection is lost are fairly easily handled (maybe upstream by the provider), but what is more problematic is where the software isn't working properly but the VPS/container is up and running, so all traffic is still being sent to that node. If you just have one Anycast IP, this can leave you with a bit of a "black hole".
My current approach is to use two Anycast IPv4 addresses for GeoDNS, each at a different location on each of 5 continents, so speed-wise it is slightly suboptimal compared to a 10-location Anycast, but a bit more robust (which works for me; I still get response times in the 9-19 msec range from the locations I care about, but to each his own) and costs around €6/mo in total.
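One way to avoid that "black hole" is a local health check that decides when the node should stop announcing its anycast prefix. Below is a minimal Python sketch of just the decision logic; the class name and thresholds are hypothetical, and actually withdrawing or re-announcing the route depends on whatever BGP daemon or provider API you use, which is not shown here.

```python
class AnycastHealthGate:
    """Tracks consecutive health-check results and decides whether this
    node should keep announcing its anycast prefix."""

    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 5):
        self.fail_threshold = fail_threshold        # failures before withdrawing
        self.recover_threshold = recover_threshold  # successes before re-announcing
        self.announced = True
        self._fails = 0
        self._oks = 0

    def observe(self, healthy: bool) -> bool:
        """Record one check result; return True if the route should be announced."""
        if healthy:
            self._oks += 1
            self._fails = 0
            if not self.announced and self._oks >= self.recover_threshold:
                self.announced = True
        else:
            self._fails += 1
            self._oks = 0
            if self.announced and self._fails >= self.fail_threshold:
                self.announced = False
        return self.announced


gate = AnycastHealthGate()
# The service (e.g. the DNS daemon) stops answering while the VPS stays up:
for result in [False, False, False]:
    state = gate.observe(result)
print(state)  # False: time to withdraw the route
```

Requiring more consecutive successes to re-announce than failures to withdraw keeps a flaky service from toggling the announcement back and forth.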
Since DNS doesn't take much bandwidth, I've got quite a bit of spare capacity. I've thought about putting landing pages at the edge, but am inclined to leave that to CDNs. Instead what I'm considering doing is putting key-value stores on the edge nodes.
My answer above assumes you have a large scale (think Cloudflare and Google).
Redundancy at each POP ensures a server failure would not affect traffic at that POP, because the load balancer should have caught that.
Then, BGP helps you when the whole POP goes offline.
If all the servers in one POP have failed but routes are not withdrawn, it would indeed cause a service outage.
Also, be careful with BGP route dampening.
If you change routes too frequently, neighbors will ignore your announcements for a while.
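For intuition, damping is usually modeled as a penalty that grows with each flap and decays exponentially over time; once it crosses a suppress limit, the route is ignored until the penalty decays again. Here is a minimal Python sketch of that model. The numbers (1000 per flap, suppress at 2000, 15-minute half-life) are illustrative assumptions, and actual values depend on each neighbor's configuration.

```python
from typing import Iterable

PER_FLAP = 1000.0       # assumed penalty added per flap
SUPPRESS_LIMIT = 2000.0  # assumed threshold above which the route is suppressed
HALF_LIFE_S = 900.0      # assumed half-life of the penalty (15 minutes)


def decayed(penalty: float, elapsed_s: float, half_life_s: float = HALF_LIFE_S) -> float:
    """Exponential decay: the penalty halves every half_life_s seconds."""
    return penalty * 0.5 ** (elapsed_s / half_life_s)


def penalty_at(flap_times_s: Iterable[float], now_s: float) -> float:
    """Accumulated penalty at now_s given flap timestamps (all <= now_s)."""
    penalty = 0.0
    last = None
    for t in sorted(flap_times_s):
        if last is not None:
            penalty = decayed(penalty, t - last)
        penalty += PER_FLAP
        last = t
    if last is None:
        return 0.0
    return decayed(penalty, now_s - last)


# Three flaps one minute apart push the penalty over the suppress limit:
print(penalty_at([0, 60, 120], 120) > SUPPRESS_LIMIT)  # True
# A single flap has decayed to half after one half-life:
print(penalty_at([0], 900))  # 500.0
```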
What Anycast solution are you using that costs €6/month in total? Pretty curious.
Building custom containers on fly.io