★ VirMach ★ RYZEN ★ NVMe ★★ $30/2YR- 768MB ★ $40/2YR - 1.5GB ★ $50/2YR - 2.5GB ★ $70/2YR - 4GB ★ LES

AlwaysSkint · July 2022

Guys, find a room, please.

skorous · July 2022

@AlwaysSkint said: Guys, find a room, please.

No need. That'll be my last comment.

Mumbly · July 2022

@skorous said: You're right. You never said a single production node. You said a 24h planned maintenance window was too long. I inferred from that it must be a maintenance with an outage, since otherwise who would care if it takes 24h?

I said also:

@Mumbly said: There are several ways to avoid this migration downtime while you sleep (i.e., temporarily moving your stuff to another provider and moving it back after the maintenance, etc.), but generally speaking it's just unnecessarily annoying.

And no, it's just a discussion of why a 24-hour window for PLANNED maintenance (migration) isn't the most user-friendly option, as there's no way for you to know exactly when to be online to fix things and get them back up after the migration.

With securedragon and justhost.ru you simply push the migration button, your stuff gets migrated, and you carry on working on it. Pretty simple and efficient, with no need to worry that your stuff will reappear with a new IP on the new node at some random, unspecified time, let's say 20 hours later at 3:00am when you're in bed.
That's why I think the suggestion of a 24-hour migration window isn't the most efficient or user-friendly option for those who actually host something on the VPS.

VirMach · July 2022

@Papa said:

@VirMach said:

Finally got an important setting to stick on this one.

And it's alive now except for the network. Network reconfigure shows "Unknown error", but the files in /etc/network are being rewritten.
networking.service shows a warning that /etc/resolv.conf is not a symbolic link to /run/resolv.conf

Unfortunately there's really not much we can do about that. I think SolusVM recently made some updates to that tool and broke it for older operating systems. They've been breaking a lot of things: libvirtd incompatibility, the migration tool not working for a while, operating systems not templating properly, and templates often not syncing properly. They've just been updating all their PHP versions and whatever else, racing forward without actually checking anything.

They broke the entire installer the other day.

@AlwaysSkint said:
@VirMach
ATLZ007 has been in a flap all morning (started ~04:00 UTC+1), with Hetrix showing the network going down for a minute each time, approx. 30 minutes apart. Anything funky going on? Abusers?

(Edit: sorry, the wrong node was being reported. D'oh! It's Atlanta, which had been fine lately, until now.)

Edit2: Seems to have stopped now - last/latest one:

Downtime: 2 min
Noticed at: 2022-07-06 10:10:37 (UTC+00:00)

I'm seeing all the flapping. A lot's been flapping. People are getting settled in, which means testing the network, and a lot of files are still flying around on our end. Unfortunately I can't focus on that right now, and it's nearly impossible to get it all right with everything happening at once. I did fix a server or two where networking was completely unusable, but flappy flaps are A-OK right now; they're the least of our problems.

VirMach · July 2022

@Mumbly said: And no, it's just a discussion of why a 24-hour window for PLANNED maintenance (migration) isn't the most user-friendly option, as there's no way for you to know exactly when to be online to fix things and get them back up after the migration.

I agree with you on this. We didn't really have another choice for these. It's painful and bad, for us as well. Probably about 5% of people have been stuck for 48 hours now, and I don't like that, but we're doing all we physically can.

Everything keeps breaking, and I don't just mean on our end. We're using 10-year-old PHP software to manage the virtual servers, so that alone really hurts.

@Mumbly said: With securedragon and justhost.ru you simply push the migration button, your stuff gets migrated, and you carry on working on it. Pretty simple and efficient, with no need to worry that your stuff will reappear with a new IP on the new node at some random, unspecified time, let's say 20 hours later at 3:00am when you're in bed.

That's why I think the suggestion of a 24-hour migration window isn't the most efficient or user-friendly option for those who actually host something on the VPS.

The developer we hired delayed and then bailed on the project, but it was originally planned to work that way: you'd push the button yourself, and only if you didn't would you be subjected to this mess.

Mumbly · July 2022

@VirMach that's a misunderstanding now.
I didn't comment on or criticize VirMach's migrations here; I was discussing @yoursunny's suggestion (well, he made a few good suggestions, but I feel this one wasn't the best) about a 24-hour migration queue in the future.

skorous · July 2022

@VirMach said: I agree with you on this. We didn't really have another choice for these. It's painful and bad, for us as well. There are probably 5% of people that have been stuck for 48 hours now and I don't like that but we're doing all we physically can.

Just in case you're thinking this is about the migrations, the discussion you're referring back to is actually about the Ryzen Location Change button and @yoursunny's opinions on it.

Papa · July 2022

@VirMach said:
Unfortunately there's really not much we can do about that. I think SolusVM recently made some updates to that tool and broke it for older operating systems. They've been breaking a lot of things: libvirtd incompatibility, the migration tool not working for a while, operating systems not templating properly, and templates often not syncing properly. They've just been updating all their PHP versions and whatever else, racing forward without actually checking anything.

OK, I understand that, but is there a way to fix the network manually? What settings should I set to get the VM working? I couldn't see any problems at first glance: the eth0 (ens3) interface is up, and there is a route to the gateway xxx.xxx.xxx.1.
I can ping nearby nodes, but pinging the gateway returns "Network host unreachable".

VirMach · July 2022

@Mumbly said:
@VirMach that's a misunderstanding now.
I didn't comment on or criticize VirMach's migrations here; I was discussing @yoursunny's suggestion (well, he made a few good suggestions, but I feel this one wasn't the best) about a 24-hour migration queue in the future.

I understand. I'm just adding that I agree with you that it's not user-friendly, and that unfortunately our current situation is similar and also not how we intended the developer to code it. I did read what @yoursunny suggested, and it definitely has its benefits, but our ideal version would be immediate.

I remember we improved our script to cut downtime to only about an hour per VM and had that concept working pretty well, but it didn't work out once we were planning for efficiency across many servers.

How the project was specified for the developer who never completed it:

The customer is eligible to use it when their server is queued for migration; later on it would be open to everyone (the latter being the Ryzen-to-Ryzen idea).
The customer receives one credit per term length, i.e., if they could have cancelled and re-ordered, then they're eligible for it. Therefore a customer in their first month of service naturally wouldn't be able to abuse it.
Migrations are queued in a batch, but the batch gets processed almost immediately. It would essentially only wait for the right load conditions and be throttled by the number of requests. If 1,000 requests came in at once, it could theoretically take 24 hours.
Since AFAIK SolusVM doesn't have an API to run the script that completes a migration (i.e., marks the VM in the database as being on a new node), our idea was that it'd power up a new service, then replace the details of the existing service in WHMCS.
The old service gets marked for deletion, but not deleted immediately in case anything goes wrong, so the data is still there for a few days. More aggressive pruning may happen during peak usage: there'd essentially be, let's say, a 200GB pool where old data stays for a week, and anything beyond that maybe only a day. This also allows an easy revert button to be added later in case the customer regrets their decision, instead of them contacting us and us moving it twice.

It would have been pretty nice to have that ready in time for these migrations, so they didn't end up as these vague day-long windows.
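The throttled batch queue described in the list above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the developer's actual design: the fixed per-hour capacity, the node_busy load check, and every name here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MigrationRequest:
    vm_id: str
    submitted_hour: int  # hour offset at which the request was queued


def schedule_migrations(
    requests: List[MigrationRequest],
    per_hour_capacity: int,
    node_busy: Callable[[int], bool] = lambda hour: False,
    max_hours: int = 24 * 30,
) -> Dict[str, int]:
    """Assign each request the hour in which it would be processed.

    Requests are handled first-come, first-served. Each hour handles at most
    per_hour_capacity requests, and hours where node_busy(hour) is True
    (hypothetically: nodes under too much load) are skipped entirely.
    """
    if per_hour_capacity < 1:
        raise ValueError("per_hour_capacity must be at least 1")
    queue = sorted(requests, key=lambda r: r.submitted_hour)
    processed: Dict[str, int] = {}
    i = 0
    for hour in range(max_hours):
        if i == len(queue):
            break
        if node_busy(hour):
            continue  # wait for better load conditions
        slots = per_hour_capacity
        # Take requests in arrival order, but never before they were submitted.
        while slots > 0 and i < len(queue) and queue[i].submitted_hour <= hour:
            processed[queue[i].vm_id] = hour
            i += 1
            slots -= 1
    if i < len(queue):
        raise RuntimeError("queue not drained within max_hours")
    return processed


if __name__ == "__main__":
    burst = [MigrationRequest(f"vm{n}", 0) for n in range(1000)]
    done = schedule_migrations(burst, per_hour_capacity=42)
    print("last migration runs in hour", max(done.values()))  # hour 23
```

With an assumed capacity of 42 migrations per hour, a burst of 1,000 simultaneous requests drains in hour 23 (hours are counted from 0), which matches the "could theoretically take 24 hours" estimate; any hours skipped for load push the tail out further.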

flips · July 2022

Trying to figure out from the network status page and the backlog here: should FFME002 be operational?
I've never gotten it to work here...

Troubleshooter reports:
Main IP pings: false
Node Online: false
Service online: offline
Operating System: linux-ubuntu-16.04-server-x86_64-minimal-latest
Service Status:Active

Daevien · July 2022

@flips said:
Trying to figure out from the network status page and the backlog here: should FFME002 be operational?
I've never gotten it to work here...

(pokes FFME002 with a stick) Nope, still dead. It has been for a while; it worked for most of a day after I migrated there, then died.

risturiz · July 2022

I see the new Los Angeles Ryzen network is better... 10k in traffic vs 100k in Atlanta

Papa · July 2022

Something is strange about the network config of my VM on FFME003. If I set the network interface to DHCP, I receive the correct IP address (xxx.xxx.163.xxx), but from a DHCP server on another subnet, xxx.xxx.162.2. And the IP address I receive is from the same subnet as my FFME004 VM. Is this working as intended? If I restore the network config from the control panel, I get the same IP as a static address, apart from the missing symlink to resolvconf.

yoursunny · July 2022

@VirMach said:

The customer receives one credit per term length, i.e., if they could have cancelled and re-ordered, then they're eligible for it. Therefore a customer in their first month of service naturally wouldn't be able to abuse it.

Most services are paid annually.
One migration per year is not enough.
That's why I suggested once per month.

Some starter credits should be granted on the current services, because:

Some services have been auto-migrated to undesirable locations, such as Amsterdam to Frankfurt.
Looking glass nodes are inoperable, so for all the migrations done so far, the chosen location may be unsuitable for the customer's needs.

Once all the looking glass nodes are up, can we at least have 2~3 credits per year?
That's a lot more flexible than only one credit per year, in case network conditions deteriorate mid-term.

realEthanZou · July 2022

It seems many JP VMs had their IPs changed without prior notice.

rhinoduck · July 2022

@realEthanZou said:
It seems many JP VMs had their IPs changed without prior notice.

Indeed. No email, and no information I could find on the Network Status page.

While I understand that a host can face many challenges that cannot be predicted or immediately explained, this is not one of those situations. And the lack of a warning and the lack of information about why the change happened (Was it intentional and is it here to stay, or was it just a configuration mistake that will be reverted?) is a big fat NO NO in my book.

Papa · July 2022

What the F is going on with FFME? I migrated from FFME004 yesterday for a 3-buck fee and got the same FFME004, but online and working; today it's totally offline: no boot, no VNC, nothing. And no information about what's going on. I waited patiently for two weeks and created zero tickets, but now I have to give up and leave as soon as I can get my data off FFME003. Honestly, even my home server, with my hobbyist approach, has less downtime and more reliability, and migrates faster.

NerdUno · July 2022

While our Control Panel still shows 0 IPv4 addresses, we can finally get back in at the assigned IPv4 address and also through our private VPN. So progress has been made at least on the RYZE.SEA-Z002.VMS node in Seattle. We'll see if it stays up.

VirMach · July 2022

We're temporarily ramping up the abuse script. If your VPS is at around full CPU usage for more than 2 hours, it'll be powered down. Too many VMs are currently stuck on OS boot and negatively affecting others, because for some customers the operating system gets stuck on boot after the Ryzen change. I apologize in advance for any false positives, but I do want to note that under our terms, 2 hours of that type of usage can technically be considered abuse; we just usually try to be much more lenient.
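The internals of the abuse script weren't shared, so the following is only an assumption-laden Python sketch of the rule as stated (around full CPU for more than 2 hours leads to a power-down). The 95% threshold, the 5-minute sampling used in the example, and the function name are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def should_power_down(
    samples: Iterable[Tuple[int, float]],
    threshold: float = 95.0,
    limit_minutes: int = 120,
) -> bool:
    """Return True if CPU stayed at or above threshold for more than limit_minutes.

    samples is a time-ordered sequence of (minute, cpu_percent) readings.
    Any reading below the threshold resets the streak.
    """
    streak_start: Optional[int] = None
    for minute, cpu in samples:
        if cpu >= threshold:
            if streak_start is None:
                streak_start = minute  # a new high-CPU streak begins
            if minute - streak_start > limit_minutes:
                return True
        else:
            streak_start = None  # usage dropped, so the streak is broken
    return False


if __name__ == "__main__":
    pegged = [(m, 100.0) for m in range(0, 126, 5)]  # 125 minutes at 100%
    print(should_power_down(pegged))  # True
```

A VM stuck in a boot loop that pins its vCPU would trip this check, while a short spike that drops back below the threshold resets the streak. Per the follow-up post, the action is a power-down, not a suspension.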

@realEthanZou said:
It seems many JP VMs had their IPs changed without prior notice.

@rhinoduck said:

@realEthanZou said:
It seems many JP VMs had their IPs changed without prior notice.

Indeed. No email, and no information I could find on the Network Status page.

While I understand that a host can face many challenges that cannot be predicted or immediately explained, this is not one of those situations. And the lack of a warning and the lack of information about why the change happened (Was it intentional and is it here to stay, or was it just a configuration mistake that will be reverted?) is a big fat NO NO in my book.

We did send out emails, but it's possible not all of them went out because SolusVM was also overloaded around that time. Please also check your spam folder; we sent these directly from SolusVM.

VirMach · July 2022

@yoursunny said:

@VirMach said:

The customer receives one credit per term length, i.e., if they could have cancelled and re-ordered, then they're eligible for it. Therefore a customer in their first month of service naturally wouldn't be able to abuse it.

Most services are paid annually.
One migration per year is not enough.
That's why I suggested once per month.

Some starter credits should be granted on the current services, because:

Some services have been auto-migrated to undesirable locations, such as Amsterdam to Frankfurt.

Looking glass nodes are inoperable, so for all the migrations done so far, the chosen location may be unsuitable for the customer's needs.

Once all the looking glass nodes are up, can we at least have 2~3 credits per year?
That's a lot more flexible than only one credit per year, in case network conditions deteriorate mid-term.

Well, initially the way it's going to work is that there will be a period where you're allowed to "Ryzen Migrate" to your desired location (without data) so that everyone lands where they want. It'll be announced here and on OGF, most likely with an "Announcement" on our website as well, and there will probably be a 1-2 week period in which everyone eligible can do this.

The credits I described are for a system not yet coded.

VirMach · July 2022

@rhinoduck said: Indeed. No email, and no information I could find on the Network Status page.

I'll make a network status page for it since it seems a lot of the emails failed to send.

VirMach · July 2022

@Papa said:
What the F is going on with FFME? I migrated from FFME004 yesterday for a 3-buck fee and got the same FFME004, but online and working; today it's totally offline: no boot, no VNC, nothing. And no information about what's going on. I waited patiently for two weeks and created zero tickets, but now I have to give up and leave as soon as I can get my data off FFME003. Honestly, even my home server, with my hobbyist approach, has less downtime and more reliability, and migrates faster.

On FFME004 we found an ECC error, and the settings also dropped off again. A memory swap fixed FFME004; we couldn't send out migration emails in time because xTom worked very quickly to get the memory replaced. The settings drop-off caused a disk to drop, so the node was online but VMs weren't booting. That has been resolved. I'm checking again to see if the settings stick; if they don't, there might be another reboot, which we'll post a network status for, but hopefully this will be stable moving forward.

FFME has had 3 network status posts, many updates here and on OGF, and probably a fair share of emails. It can be considered an ongoing issue until the nodes prove themselves by staying online for more than 2 days.

VirMach · July 2022

The settings stuck on FFME004, but I'm pretty sure I've said that once before. There's zero information on this, but there have been constant Linux kernel bugs regarding these fixes. I can't rewrite the Linux kernel right now, so until Linux figures out how it's going to treat these problems, I don't know what else to do about it.

These issues have come and gone ever since NVMe SSDs have existed; you can find reports online going back to around 2015. If you just look at Linux bug trackers, it seems like every version fixes one thing and breaks another. I'll have to come up with some kind of kernel update plan moving forward to try to keep the issues from reappearing. But of course the solution can't just be to stay on the same version for 5 years.

We're using literal copies of the same nodes in FFM... in Tokyo, and they're not having the same problems, with the only difference being kernel versions.

NerdUno · July 2022

@NerdUno said:
While our Control Panel still shows 0 IPv4 addresses, we can finally get back in at the assigned IPv4 address and also through our private VPN. So progress has been made at least on the RYZE.SEA-Z002.VMS node in Seattle. We'll see if it stays up.

Spoke too soon. Status back to dead in the water this afternoon.

kheng86 · July 2022

@VirMach said:
The settings stuck on FFME004, but I'm pretty sure I've said that once before. There's zero information on this, but there have been constant Linux kernel bugs regarding these fixes. I can't rewrite the Linux kernel right now, so until Linux figures out how it's going to treat these problems, I don't know what else to do about it.

These issues have come and gone ever since NVMe SSDs have existed; you can find reports online going back to around 2015. If you just look at Linux bug trackers, it seems like every version fixes one thing and breaks another. I'll have to come up with some kind of kernel update plan moving forward to try to keep the issues from reappearing. But of course the solution can't just be to stay on the same version for 5 years.

We're using literal copies of the same nodes in FFM... in Tokyo, and they're not having the same problems, with the only difference being kernel versions.

My VM on FFME004 seems fine now, but I have a couple of VMs on FFME005 and FFME006 with status "Offline": they can't boot up and I can't reinstall the OS. Hope you can help, thanks! Ticket #754039

yoursunny · July 2022

@VirMach said:
We're temporarily ramping up the abuse script. If your VPS is at around full CPU usage for more than 2 hours, it'll be powered down. Too many VMs are currently stuck on OS boot and negatively affecting others, because for some customers the operating system gets stuck on boot after the Ryzen change. I apologize in advance for any false positives, but I do want to note that under our terms, 2 hours of that type of usage can technically be considered abuse; we just usually try to be much more lenient.

A boot loop after migrating to a different CPU or changing to a different IP is not abuse.
The customer purchased the service on a specific CPU and a specific IP, which are not expected to change.
The kernel and userland could have been compiled with -march=native so that they would not start on any other CPU.
The services could have been configured to bind to a specific IP, which would cause a service restart loop if the IP disappeared.

The safest approach is not to automatically power on the service after the migration.
The customer needs to press the Power On button themselves and then fix the machine right away.

Running -march=native code on an unsupported CPU triggers undefined behavior.
Undefined behavior means anything could happen, such as pink unicorns appearing in VirMach offices, @deank ceasing to believe in the end, or @FrankZ receiving 1000 free servers.
The simple act of automatically powering on a migrated server could cause these severe consequences, and you don't want that.

VirMach · July 2022

@NerdUno said:

@NerdUno said:
While our Control Panel still shows 0 IPv4 addresses, we can finally get back in at the assigned IPv4 address and also through our private VPN. So progress has been made at least on the RYZE.SEA-Z002.VMS node in Seattle. We'll see if it stays up.

Spoke too soon. Status back to dead in the water this afternoon.

This node has an issue with a piece of software getting stuck and spawning copies of its process over and over until the node overloads and we have to reboot it. We made some changes; if it happens again, we'll try to catch it earlier to avoid a reboot.

VirMach · July 2022

@kheng86 said: but I have a couple of VMs on FFME005 and FFME006 with status "Offline": they can't boot up and I can't reinstall the OS. Hope you can help, thanks!

Have these been offline ever since you used the Ryzen Migrate button?

VirMach · July 2022

@yoursunny said:

@VirMach said:
We're temporarily ramping up the abuse script. If your VPS is at around full CPU usage for more than 2 hours, it'll be powered down. Too many VMs are currently stuck on OS boot and negatively affecting others, because for some customers the operating system gets stuck on boot after the Ryzen change. I apologize in advance for any false positives, but I do want to note that under our terms, 2 hours of that type of usage can technically be considered abuse; we just usually try to be much more lenient.

A boot loop after migrating to a different CPU or changing to a different IP is not abuse.
The customer purchased the service on a specific CPU and a specific IP, which are not expected to change.
The kernel and userland could have been compiled with -march=native so that they would not start on any other CPU.
The services could have been configured to bind to a specific IP, which would cause a service restart loop if the IP disappeared.

The safest approach is not to automatically power on the service after the migration.
The customer needs to press the Power On button themselves and then fix the machine right away.

Running -march=native code on an unsupported CPU triggers undefined behavior.
Undefined behavior means anything could happen, such as pink unicorns appearing in VirMach offices, @deank ceasing to believe in the end, or @FrankZ receiving 1000 free servers.
The simple act of automatically powering on a migrated server could cause these severe consequences, and you don't want that.

We're ramping up the abuse script. That's just what it's called. I didn't say a boot loop after migrating is abuse.

The abuse script will just power it down, not suspend it. I don't see the harm in powering down something stuck in a boot loop. I was just posting this as a PSA for anyone reading who might be doing something unrelated that also uses a lot of CPU, and for general transparency: we're making the abuse script stricter so that it automatically powers down the VMs stuck in a boot loop more quickly.

The safest approach is not to automatically power on the service after the migration.
The customer needs to press the Power On button themselves and then fix the machine right away.

Not possible; we have to power all of them up to fix other issues. Otherwise we can't tell the ones that are stuck and won't boot apart from the rest. Plus, many customers immediately open tickets instead of trying to power up the VPS after it goes offline, so in any case keeping them powered on has more benefits than keeping them offline.

kheng86 · July 2022

@VirMach said:

@kheng86 said: but I have a couple of VMs on FFME005 and FFME006 with status "Offline": they can't boot up and I can't reinstall the OS. Hope you can help, thanks!

Have these been offline ever since you used the Ryzen Migrate button?

The "Migration" button was not used. They've been offline ever since the migration, which was the planned migration from AMS to FFE.
