Reliability of benchmarking scripts
I was doing some benchmarking when I noticed some weird numbers being reported. I decided to investigate by running the script three times. The reported download speeds of the 100MB test file from Softlayer SG to my server are:
5.92 MiB/s
5.67 MiB/s
4.16 MiB/s
When I downloaded the same file using curl three consecutive times, the speeds are:
8.422 MB/s
7.890 MB/s
8.403 MB/s
Finally, I tried the same file three consecutive times using wget:
8.57 MB/s
9.19 MB/s
8.60 MB/s
There seems to be a clear, consistent pattern that the script is under-reporting the bandwidth in the range of 40 - 80% relative to plain curl or wget. I have seen even wilder swings in other instances, which makes me curious about the cause. Nench uses the curl command in its script, but it doesn't seem to be curl's fault, because when I execute the curl command manually, the speeds are consistent with wget.
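A side note on units: the script prints MiB/s while the curl and wget figures above are labelled MB/s. Here is a minimal sketch of the conversion, taking those labels at face value (an assumption, since some tools print MB when they mean MiB). The helper name `mib_to_mb` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert MiB/s (2^20 bytes) to MB/s (10^6 bytes).
mib_to_mb() {
    awk -v v="$1" 'BEGIN { printf "%.2f\n", v * 1048576 / 1000000 }'
}

# The three script results from above, re-expressed in MB/s.
for speed in 5.92 5.67 4.16; do
    echo "$speed MiB/s = $(mib_to_mb "$speed") MB/s"
done
```

The factor is only about 1.05, so units alone would account for roughly 5% of the difference, not the whole gap.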
I think the download speeds of benchmark scripts need to be taken with quite a big grain of salt. I am not sure why, but manual downloads seem to be a much better reflection of downloading speed.
Now I have to think about how to tweak my benchmarking procedure.
Deals and Reviews: LowEndBoxes Review | Avoid dodgy providers with The LEBRE Whitelist | Free hosting (with conditions): Evolution-Host, NanoKVM, FreeMach, ServedEZ | Get expert copyediting and copywriting help at The Write Flow
Comments
@poisson - (just curious) are you using the same flags for curl as it is invoked in nench.sh?
and I guess you might also see if there is any effect from piping to that awk command (the guts of the Bps_to_MiBps() function). (Just as a methodical first pass, to rule out low-hanging fruit.)
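For anyone following along, the pipeline in question looks roughly like the sketch below. It is not a copy of nench.sh: the function bodies, the exact flags, the 10s cap and the URL argument are assumptions for illustration.

```shell
#!/bin/sh
# Convert a bytes-per-second figure on stdin to MiB/s.
Bps_to_MiBps() {
    awk '{ printf "%.2f MiB/s\n", $0 / 1024 / 1024 }'
}

# Sketch of a download test: curl prints its average speed in bytes/s
# via --write-out and the number is piped through the converter.
# Assumption: these flags and the 10 s cap are illustrative only.
download_benchmark() {
    curl --max-time 10 -s -o /dev/null -w '%{speed_download}\n' "$1" | Bps_to_MiBps
}

# awk only sees the final number that curl prints once the transfer ends,
# so the conversion stage can be checked on its own without any download:
echo 8808038 | Bps_to_MiBps
```

Running the converter by itself like this is one way to rule the awk stage in or out before looking at the curl flags.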
HS4LIFE (+ (* 3 4) (* 5 6))
I think any single benchmark in a shared-resource environment that is used to make an absolute judgement about anything says more about the level of knowledge of the person using it than the results say about the service.
A generic benchmark probably tests nothing that resembles real-world sources or scenarios; you need to make your own, based on your own use case, to have anything even remotely accurate.
https://inceptionhosting.com
Please do not use the PM system here for Inception Hosting support issues.
Nope, I didn't use the same flags, but they should not affect the result because there is no timeout (--max-time accounted for) and -s simply silences curl. I did use -o because I had to output the file to disk. I cannot see how they can reasonably explain the difference.
I am not sure about the piping, but it looks like it should not impact the download speeds.
I actually have lots of data points, not just a single bench. This is why I became curious and started to dig. It seems like scripts tend to under-report for reasons I don't know, even when everything else is pretty much controlled for (same file, same servers, all within the same few minutes).
@poisson - well ... I'm inclined to agree with regard to "no obvious reason it should run slower" ...
And yet - it does! So presumably there is a reason, and presumably that reason is not obvious (to me) when looking at the entire nench.sh script as a whole.
And so, I would simply try to decompose the system methodically (without thinking too much about how things "should" work, given that my delusional capacity for pure reason and perfect knowledge has failed me yet again - while perhaps some simple experimentation and observation would suffice to enlighten instead). Yea, verily I should endeavor to test each possible component and combination of components as I essentially put the script back together.
So - along those lines - the next questions might be "does running curl inside a bash function take longer?" ... "does running bash functions from inside a script in a file take longer?" ... And so forth.
The rest is left as an exercise for the interested reader. Q.E.D., etc, etc, etc.
My guess is the script could be doing some CPU/memory-intensive bench before starting the download, so it could be affecting it.
I bench YABS 24/7/365 unless it's a leap year.
edit the benchmark script.
add before the network portion of the benchmark
See if that makes a difference, alternatively move the networking part of the benchmark to run first before all else.
If it does change things, I can probably explain what's going on; if not, then I am puzzled without having a look myself.
I like the new batch of benchmark scripts that use a handful of 10G Iperf3 servers. Is it YABs?
disk IO: fio with 4k/8k blocksizes if you are evaluating iops headroom.
Cpu: CPU steal tells you a lot about how the node is being managed by operator.
YABS uses iperf but the problem is there are few Asian public iperf servers. The one in YABS often doesn't work.
Indeed. I tried to find all the public iperf3 servers out there, but Asia is pretty dark on that front. Mark from DirectAdmin did say he'd sponsor a few iperf3 POPs for the bench, so there's a possibility a couple more locations might be added. Problem will be finding a location in Asia that has good connectivity and a ton of bandwidth (as iperf servers will naturally push a ton of bw every month).
Humble janitor of LES
Proud papa of YABS
That's why for my own benchmarking purposes, I might set up a private iperf on my mikho SG box. Can't make it public.
My 2c: if you are downloading from any public server, the chances of getting a consistent number will be slim because odds are other people/scripts etc. are already using it.
Nexus Bytes Ryzen Powered NVMe VPS | NYC|Miami|LA|London|Netherlands| Singapore|Tokyo
Storage VPS | LiteSpeed Powered Web Hosting + SSH access | Switcher Special |
Yes and no. Many of the tests use big providers like Linode and Softlayer, which is ok. Results are fairly consistent. I am contemplating changing to 1GB download tests instead of the 100MB most scripts use (or to iperf) because often it seems to take time to reach maximum speed using curl, and the download often completes at lower speeds by 100MB. At least 500MB is necessary to average out fluctuations.
Thanks everyone for the input, especially @uptime. I thought there was no curl timeout issue, but when I investigated more, it seems like timeout was indeed an issue for certain locations. I modified the timeout parameter and now the speeds are similar to a direct command-line curl or wget.
But, I noticed that it was only particular locations that were exhibiting problems, which were APAC locations. Then I decided to leave the timeout untouched and added another location to do a speed test and VOILA, I got my answer. The problem is that Softlayer's networks need a higher timeout.
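The timeout finding lines up with some simple arithmetic on the manual curl speeds from the first post. This is a sketch of the arithmetic only, assuming a 100 MB file and a constant speed; it does not model how curl reports a transfer that gets cut short. The helper names are made up.

```shell
#!/bin/sh
# Hypothetical helper: seconds needed to fetch size (MB) at speed (MB/s).
seconds_needed() {
    awk -v size="$1" -v speed="$2" 'BEGIN { printf "%.1f\n", size / speed }'
}

# Hypothetical helper: prints "fits" if the transfer would finish within
# the timeout (seconds), otherwise "cut off".
timeout_verdict() {
    awk -v size="$1" -v speed="$2" -v cap="$3" \
        'BEGIN { print ((size / speed <= cap) ? "fits" : "cut off") }'
}

# Manual curl speeds from the first post, against a 10 s and a 60 s cap.
for speed in 8.422 7.890 8.403; do
    t=$(seconds_needed 100 "$speed")
    echo "100 MB at $speed MB/s needs ${t}s:" \
         "10s -> $(timeout_verdict 100 "$speed" 10)," \
         "60s -> $(timeout_verdict 100 "$speed" 60)"
done
```

At anything under 10 MB/s the 100 MB file cannot finish inside a 10s cap, so the measurement stops before the download does.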
Here's a sample output from Europe without modifying the original curl timeout. Note all the Softlayer locations and the corresponding alternative locations (marked by asterisks for your easy reference):
Here's a sample output from Europe increasing the curl timeout from 10s to 60s (note all the Softlayer locations again):
Again, Softlayer generally performed worse than alternative locations, but note the double asterisks for APAC locations with 60s timeout compared to the previous test with only 10s timeout.
Still somewhat skeptical, I decided to run the same thing on a VPS located in America this time. The nomenclature in terms of asterisks is the same as above. First up is the original curl timeout of 10s:
Now, the results when the curl timeout is increased to 60s:
The difference from timeout is not that obvious for the US location. However, I think one thing is clear; probably Softlayer locations should not be used for benchmarking because they are generally slower for unknown reasons. This seems to be unique to Softlayer's networks as the other networks I used do not exhibit such symptoms.
It's taken quite a while to figure this shit out but I think I am pretty convinced that the problem is with Softlayer.
It's good you fixed it.
I've observed that most downloads can tend to start off slower then ramp up to the potential speed, but it really seems to depend on the network or the time of day. I've never really been able to make much sense of it without actually researching the phenomenon.
It'd be nice if the community could collaborate on a benchmark script that we could all agree to use, to keep all the results as fair as possible (or at least consistent)
Most of the time, I put network problems down to "You're in Australia, Dave, your connectivity is fucked sometimes, that's just how shit is in the land down under, ya silly cunt."
Get the best deal on your next VPS or Shared/Reseller hosting from RacknerdTracker.com - The original aff garden.
@poisson - awesome analysis! and thank you for actually doing the needful to figure this one out.
one question/suggestion for future reference - do you ever iperf, bro?
I prefer iperf, but the problem is there are few public hosts in APAC. I may run my own APAC iperfs using my bundle from @mikho (never done it before, but it should not be too difficult I guess), but there is no way I can make those servers public.
This is why 500MB to 1GB files are usually better because the effect is greatly diluted after a while. If I have time I will modify an existing script to use 1GB files instead (with a warning on high bandwidth use)
That's normal TCP behavior. In a perfect world all data streams would eventually ramp up to the full advertised capacity of their pipes albeit slowly. However, a lot of factors (network load, latency, laws of physics) contribute to the actual observed speed. Cerf, Kahn and their team tried to think of all these possibilities in the 1970s and the best they could come up with was TCP.
Yea, iperf is a much better test because we can ignore the first few seconds in the bandwidth test to get a better sense of actual throughput at time of testing.
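The idea of discarding the ramp-up can be sketched on any list of per-second samples. The numbers below are made up for illustration and `steady_avg` is a hypothetical helper; iperf3 itself has an `--omit` option for the same purpose.

```shell
#!/bin/sh
# Hypothetical helper: average the samples on stdin (one per line),
# skipping the first $1 lines as warm-up.
steady_avg() {
    awk -v skip="$1" '
        NR > skip { sum += $1; n++ }
        END {
            if (n) { printf "%.2f\n", sum / n }
            else   { print "no samples" }
        }'
}

# Made-up per-second throughput in MB/s: slow start, then steady.
samples="2.0
5.0
8.0
9.0
9.0
9.0"

echo "all samples:    $(printf '%s\n' "$samples" | steady_avg 0)"
echo "skip first 3 s: $(printf '%s\n' "$samples" | steady_avg 3)"
```

With these invented samples the overall average is well below the steady rate, which is the same effect a short 100MB download shows.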
how about BBR to improve testing
Stop reading random Chinese blogs.
My pronouns are asshole/asshole/asshole. I will give you the same courtesy.