Need help troubleshooting slow I/O on Synology NAS device
Perhaps a bit of an unusual topic for this forum, but since a Synology NAS basically is a ‘stripped down’ Linux machine, I figured it wouldn’t hurt to try.
I’ve received a Synology DS218Play NAS from a friend of mine to fix. It died completely from one day to the other. Lights would blink, but he was unable to reach it via the web interface most of the time and if it worked, it was painfully slow.
The device has 2 HDDs (spinning rust) in RAID and I found one of these to be defective. It would basically hang up the entire system, so as soon as I removed that drive, it seemed to work fine. Until I tried to do some I/O intensive tasks (like downloading a file); I/O would cap out at 8 megabytes per second. Even for spinning rust on an ARM based machine that’s slow. So, I swapped the drive for a WD RED Plus (CMR) HDD. Still slow… now what?
I reset the NAS to factory settings, formatted the drive and started off fresh but still; as soon it needs to do some I/O intensive tasks (like restore a backup for example) it craps out. What could this be causing?
I ran some disk benchmarks using a script I found here: https://gist.github.com/gxfxyz/0d81429fc48a82ef6c2402b178a82566 but I’m having trouble interpreting the results:
sudo hdparm -Tt /dev/md3
# create a 1 GiB test file with random data (it can take minutes) head -c 1G </dev/urandom >test # clear cache sync; echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null # read from test file, read 1 MiB each time, read 1024 times, total read 1 GiB dd if=test of=/dev/null bs=1M count=1024
1024+0 records in 1024+0 records out 1073741824 bytes (1.1 GB) copied, 6.85157 s, 157 MB/s
# write 0 to test file, write 1 GiB each time, write 2 times, total write 2 GiB dd if=/dev/zero of=test bs=1G count=2 oflag=dsync
2+0 records in 2+0 records out 2147483648 bytes (2.1 GB) copied, 67.5115 s, 31.8 MB/s
# write 0 to test file, write 128 MiB each time, write 8 times, total write 1 GiB dd if=/dev/zero of=test bs=128M count=8 oflag=dsync
8+0 records in 8+0 records out 1073741824 bytes (1.1 GB) copied, 8.26623 s, 130 MB/s
# write 0 to test file, write 1 MiB each time, write 1024 times, total write 1 GiB dd if=/dev/zero of=test bs=1M count=1024 oflag=dsync
1024+0 records in 1024+0 records out 1073741824 bytes (1.1 GB) copied, 22.2873 s, 48.2 MB/s
# write 0 to test file, write 128 KiB each time, write 1024 times, total write 128 MiB dd if=/dev/zero of=test bs=128k count=1024 oflag=dsync
1024+0 records in 1024+0 records out 134217728 bytes (134 MB) copied, 15.573 s, 8.6 MB/s
# write 0 to test file, write 4 KiB each time, write 1024 times, total write 4 MiB dd if=/dev/zero of=test bs=4k count=1024 oflag=dsync
1024+0 records in 1024+0 records out 4194304 bytes (4.2 MB) copied, 15.5282 s, 270 kB/s
# write 0 to test file, write 512 bytes each time, write 1024 times, total write 512 KiB dd if=/dev/zero of=test bs=512 count=1024 oflag=dsync
1024+0 records in 1024+0 records out 524288 bytes (524 kB) copied, 17.2119 s, 30.5 kB/s
fio xfio.conf
seq-read-1m: (g=0): rw=read, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=1 seq-write-1m: (g=1): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=1 seq-read-4k: (g=2): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1 seq-write-4k: (g=3): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1 rand-read-1m: (g=4): rw=randread, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=1 rand-write-1m: (g=5): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=1 rand-read-4k: (g=6): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1 rand-write-4k: (g=7): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1 fio-2.13 Starting 8 processes seq-read-1m: Laying out IO file(s) (1 file(s) / 5120MB) seq-read-1m: (groupid=0, jobs=1): err= 0: pid=32244: Tue Jan 5 13:44:52 2021 read : io=5120.0MB, bw=162974KB/s, iops=159, runt= 32170msec clat (msec): min=2, max=69, avg= 6.28, stdev= 2.14 lat (msec): min=2, max=69, avg= 6.28, stdev= 2.14 clat percentiles (usec): | 1.00th=[ 5408], 5.00th=[ 5472], 10.00th=[ 5600], 20.00th=[ 5664], | 30.00th=[ 5792], 40.00th=[ 6048], 50.00th=[ 6240], 60.00th=[ 6304], | 70.00th=[ 6368], 80.00th=[ 6496], 90.00th=[ 6688], 95.00th=[ 6944], | 99.00th=[ 8160], 99.50th=[12096], 99.90th=[47872], 99.95th=[50944], | 99.99th=[69120] bw (KB /s): min=117375, max=174384, per=98.83%, avg=161061.86, stdev=11020.53 lat (msec) : 4=0.16%, 10=99.34%, 20=0.20%, 50=0.25%, 100=0.06% cpu : usr=0.09%, sys=4.48%, ctx=5128, majf=0, minf=274 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=5120/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=1 seq-write-1m: (groupid=1, jobs=1): err= 0: pid=32321: Tue Jan 5 13:44:52 2021 write: io=5120.0MB, bw=164819KB/s, iops=160, runt= 31810msec clat (msec): min=2, max=48, avg= 6.05, stdev= 1.38 lat (msec): min=2, max=48, avg= 6.21, stdev= 1.38 clat percentiles (usec): | 1.00th=[ 5280], 5.00th=[ 5344], 10.00th=[ 5408], 20.00th=[ 5472], | 30.00th=[ 5600], 40.00th=[ 5920], 50.00th=[ 6048], 60.00th=[ 6176], | 70.00th=[ 6240], 80.00th=[ 6368], 90.00th=[ 6560], 95.00th=[ 6816], | 99.00th=[ 6944], 99.50th=[ 8256], 99.90th=[21632], 99.95th=[47872], | 99.99th=[48384] bw (KB /s): min=122529, max=164259, per=85.70%, avg=141244.06, stdev=6802.38 lat (msec) : 4=0.23%, 10=99.43%, 20=0.21%, 50=0.12% cpu : usr=2.96%, sys=1.82%, ctx=5154, majf=0, minf=17 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=0/w=5120/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=1 seq-read-4k: (groupid=2, jobs=1): err= 0: pid=32375: Tue Jan 5 13:44:52 2021 read : io=1024.0MB, bw=76316KB/s, iops=19078, runt= 13740msec clat (usec): min=45, max=41219, avg=48.55, stdev=159.47 lat (usec): min=45, max=41220, avg=49.08, stdev=159.47 clat percentiles (usec): | 1.00th=[ 45], 5.00th=[ 45], 10.00th=[ 46], 20.00th=[ 46], | 30.00th=[ 46], 40.00th=[ 46], 50.00th=[ 46], 60.00th=[ 46], | 70.00th=[ 46], 80.00th=[ 47], 90.00th=[ 47], 95.00th=[ 49], | 99.00th=[ 76], 99.50th=[ 78], 99.90th=[ 89], 99.95th=[ 95], | 99.99th=[ 1512] bw (KB /s): min=42861, max=74722, per=91.51%, avg=69832.15, stdev=6667.91 lat (usec) : 50=95.49%, 100=4.47%, 250=0.02%, 500=0.01%, 750=0.01% lat (usec) : 1000=0.01% lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01% cpu : usr=9.10%, sys=39.67%, ctx=262179, majf=0, minf=19 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=262144/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=1 seq-write-4k: (groupid=3, jobs=1): err= 0: pid=32406: Tue Jan 5 13:44:52 2021 write: io=1024.0MB, bw=70097KB/s, iops=17524, runt= 14959msec clat (usec): min=50, max=39594, avg=53.15, stdev=119.00 lat (usec): min=50, max=39596, avg=53.78, stdev=119.01 clat percentiles (usec): | 1.00th=[ 50], 5.00th=[ 51], 10.00th=[ 51], 20.00th=[ 51], | 30.00th=[ 51], 40.00th=[ 51], 50.00th=[ 51], 60.00th=[ 51], | 70.00th=[ 52], 80.00th=[ 52], 90.00th=[ 53], 95.00th=[ 55], | 99.00th=[ 81], 99.50th=[ 89], 99.90th=[ 131], 99.95th=[ 145], | 99.99th=[ 1288] bw (KB /s): min=53700, max=67940, per=93.76%, avg=65722.48, stdev=3553.70 lat (usec) : 100=99.83%, 250=0.15%, 500=0.01%, 750=0.01%, 1000=0.01% lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01% cpu : usr=9.09%, sys=26.41%, ctx=262191, majf=0, minf=23 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=1 rand-read-1m: (groupid=4, jobs=1): err= 0: pid=32439: Tue Jan 5 13:44:52 2021 read : io=1024.0MB, bw=76869KB/s, iops=75, runt= 13641msec clat (msec): min=2, max=83, avg=13.31, stdev= 5.13 lat (msec): min=2, max=83, avg=13.31, stdev= 5.13 clat percentiles (usec): | 1.00th=[ 2160], 5.00th=[ 6368], 10.00th=[ 8512], 20.00th=[ 9792], | 30.00th=[11072], 40.00th=[12224], 50.00th=[13376], 60.00th=[14272], | 70.00th=[15552], 80.00th=[16768], 90.00th=[17792], 95.00th=[18304], | 99.00th=[25728], 99.50th=[36608], 99.90th=[58624], 99.95th=[83456], | 99.99th=[83456] bw (KB /s): min=61188, max=86383, per=88.10%, avg=67719.48, stdev=5093.78 lat (msec) : 4=2.83%, 10=19.53%, 20=76.17%, 50=1.17%, 100=0.29% cpu : usr=0.22%, sys=1.91%, ctx=1027, majf=0, minf=272 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=1024/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=1 rand-write-1m: (groupid=5, jobs=1): err= 0: pid=32465: Tue Jan 5 13:44:52 2021 write: io=1024.0MB, bw=119523KB/s, iops=116, runt= 8773msec clat (msec): min=2, max=49, avg= 8.39, stdev= 5.32 lat (msec): min=2, max=49, avg= 8.56, stdev= 5.32 clat percentiles (usec): | 1.00th=[ 2288], 5.00th=[ 2288], 10.00th=[ 2288], 20.00th=[ 2384], | 30.00th=[ 6048], 40.00th=[ 7520], 50.00th=[ 7968], 60.00th=[ 8512], | 70.00th=[ 9664], 80.00th=[11456], 90.00th=[15040], 95.00th=[17792], | 99.00th=[24448], 99.50th=[27520], 99.90th=[48896], 99.95th=[49408], | 99.99th=[49408] bw (KB /s): min=76487, max=176390, per=77.51%, avg=92647.88, stdev=22157.86 lat (msec) : 4=24.32%, 10=48.05%, 20=25.00%, 50=2.64% cpu : usr=2.05%, sys=1.60%, ctx=1029, majf=0, minf=17 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=0/w=1024/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=1 rand-read-4k: (groupid=6, jobs=1): err= 0: pid=32489: Tue Jan 5 13:44:52 2021 read : io=102400KB, bw=1140.2KB/s, iops=285, runt= 89812msec clat (usec): min=70, max=327343, avg=3502.41, stdev=4495.53 lat (usec): min=70, max=327345, avg=3503.04, stdev=4495.54 clat percentiles (usec): | 1.00th=[ 74], 5.00th=[ 83], 10.00th=[ 87], 20.00th=[ 90], | 30.00th=[ 92], 40.00th=[ 94], 50.00th=[ 1768], 60.00th=[ 3856], | 70.00th=[ 5792], 80.00th=[ 7840], 90.00th=[ 9664], 95.00th=[10816], | 99.00th=[11584], 99.50th=[11712], 99.90th=[12352], 99.95th=[36096], | 99.99th=[52992] bw (KB /s): min= 333, max= 4192, per=75.94%, avg=865.74, stdev=313.03 lat (usec) : 100=43.69%, 250=3.53%, 500=0.46%, 750=0.04%, 1000=0.04% lat (msec) : 2=3.30%, 4=9.80%, 10=30.63%, 20=8.44%, 50=0.06% lat (msec) : 100=0.01%, 500=0.01% cpu : usr=0.21%, sys=0.80%, ctx=25613, majf=0, minf=20 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=25600/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=1 rand-write-4k: (groupid=7, jobs=1): err= 0: pid=32698: Tue Jan 5 13:44:52 2021 write: io=102400KB, bw=2853.5KB/s, iops=713, runt= 35887msec clat (usec): min=77, max=66798, avg=1396.62, stdev=2198.08 lat (usec): min=78, max=66799, avg=1397.36, stdev=2198.08 clat percentiles (usec): | 1.00th=[ 86], 5.00th=[ 94], 10.00th=[ 97], 20.00th=[ 104], | 30.00th=[ 109], 40.00th=[ 118], 50.00th=[ 1128], 60.00th=[ 1288], | 70.00th=[ 1496], 80.00th=[ 2008], 90.00th=[ 3248], 95.00th=[ 4512], | 99.00th=[11456], 99.50th=[13888], 99.90th=[22400], 99.95th=[23936], | 99.99th=[41728] bw (KB /s): min= 2232, max= 5128, per=98.43%, avg=2808.11, stdev=400.27 lat (usec) : 100=14.48%, 250=30.50%, 500=0.16%, 750=0.06%, 1000=0.03% lat (msec) : 2=34.76%, 4=13.58%, 10=5.06%, 20=1.18%, 50=0.18% lat (msec) : 100=0.01% cpu : usr=0.47%, sys=1.31%, ctx=25615, majf=0, minf=16 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=0/w=25600/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): READ: io=5120.0MB, aggrb=162974KB/s, minb=162974KB/s, maxb=162974KB/s, mint=32170msec, maxt=32170msec Run status group 1 (all jobs): WRITE: io=5120.0MB, aggrb=164818KB/s, minb=164818KB/s, maxb=164818KB/s, mint=31810msec, maxt=31810msec Run status group 2 (all jobs): READ: io=1024.0MB, aggrb=76315KB/s, minb=76315KB/s, maxb=76315KB/s, mint=13740msec, maxt=13740msec Run status group 3 (all jobs): WRITE: io=1024.0MB, aggrb=70096KB/s, minb=70096KB/s, maxb=70096KB/s, mint=14959msec, maxt=14959msec Run status group 4 (all jobs): READ: io=1024.0MB, aggrb=76869KB/s, minb=76869KB/s, maxb=76869KB/s, mint=13641msec, maxt=13641msec Run status group 5 (all jobs): WRITE: io=1024.0MB, aggrb=119523KB/s, minb=119523KB/s, maxb=119523KB/s, mint=8773msec, maxt=8773msec Run status group 6 (all jobs): READ: io=102400KB, aggrb=1140KB/s, minb=1140KB/s, maxb=1140KB/s, mint=89812msec, maxt=89812msec Run status group 7 (all jobs): WRITE: io=102400KB, aggrb=2853KB/s, minb=2853KB/s, maxb=2853KB/s, mint=35887msec, maxt=35887msec Disk stats (read/write): md3: ios=300032/299963, merge=0/0, ticks=177520/109410, in_queue=285050, util=93.63%, aggrios=300056/300101, aggrmerge=4/21, aggrticks=178200/108570, aggrin_queue=286380, aggrutil=93.39% sda: ios=300056/300101, merge=4/21, ticks=178200/108570, in_queue=286380, util=93.39%
For me, as a some-what noob, these look ‘fine’. Am I missing the bigger picture?
All help and pointers are welcome ?
Thanks!
Comments
Are you running ZFS? This has considerable overhead, especially if you are running it with compression and de-duplication.
Cheap dedis are my drug, and I'm too far gone to turn back.
Hey, What are the Temperatures of the Subject in pain?
I had one time, that the Dumb thing ran hot and down clock itself to a death machine.
After I blasted it with a compressor it ran like a charm again. It may or may not help it in this case.
Grab your vmWare toblerone today.
No, I'm afraid not. I'm running ext4.
Good one! But according to DSM the temps are 'Normal' (40 degrees celcius).