This isn’t a gloat post. In fact, I was completely oblivious to this massive outage until I tried to check my bank balance and it wouldn’t log in.
Apparently Visa Paywave, banks, some TV networks, EFTPOS, etc. have gone down. Flights have had to be cancelled as some airlines' systems have also gone down. Gas stations and public transport systems are inoperable, and numerous Windows systems and Microsoft services are affected too. (At least according to one of my local MSM outlets.)
Seems insane to me that one company's messed-up update could cause so much global disruption and take down so many systems :/ This is exactly why centralisation of services, and large corporations gobbling up smaller companies to become behemoth services, is so dangerous.
A couple of days ago a Windows 2016 server started a license strike in my farm … Coincidence?
Yes.
Unless your server was running Crowdstrike and also hosted in a time machine, yes it is.
I wanted to share the article with friends and copy the part of the text I wanted to draw attention to, but the asshole site has selection disabled. Now I will not do that, and timesnownews can go fuck themselves.
here's the entire article:
Latest Crowdstrike Update Issue: Many Windows users are experiencing Blue Screen of Death (BSOD) errors due to a recent CrowdStrike update. The issue affects various sensor versions, and CrowdStrike has acknowledged the problem and is investigating the cause, as stated in a pinned message on the company’s forum.
Who Has Been Affected
Australian banks, airlines, and TV broadcasters first reported the issue, which quickly spread to Europe as businesses began their workday. UK broadcaster Sky News couldn’t air its morning news bulletins, while Ryanair experienced IT issues affecting flight departures. In the US, the Federal Aviation Administration grounded all Delta, United, and American Airlines flights due to communication problems, and Berlin airport warned of travel delays from technical issues.
In India too, numerous IT organisations were reporting company-wide issues. Akasa Airlines and Spicejet experienced technical issues affecting online services. Akasa Airlines' booking and check-in systems were down at Mumbai and Delhi airports due to service provider infrastructure issues, prompting manual check-in and boarding. Passengers were advised to arrive early, and the airline assured swift resolution. Spicejet also faced problems updating flight disruptions and was actively working to fix the issue. Both airlines apologized for the inconvenience caused and promised updates as soon as the problems were resolved.
CrowdStrike's Response
CrowdStrike acknowledged the problem, linked to their Falcon sensor, and reverted the faulty update. However, affected machines still require manual intervention. IT admins are resorting to booting into safe mode and deleting specific system files, a cumbersome process for cloud-based servers and remote laptops. Reports from IT professionals on Reddit highlight the severity, with entire companies offline and many devices stuck in boot loops. The outage underscores the vulnerability of interconnected systems and the critical need for robust cybersecurity solutions. IT teams worldwide face a long and challenging day to resolve the issues and restore normal operations.
What to Expect:
- A Technical Alert (TA) detailing the problem and potential workarounds is expected to be published shortly by CrowdStrike.
- The forum thread will remain pinned to provide users with easy access to updates and information.
What Users Should Do:
- Hold off on troubleshooting: avoid attempting to fix the issue yourself until the official Technical Alert is released.
- Monitor the pinned thread: it will be updated with the latest information, including the TA and any temporary solutions.
- Be patient: resolving software conflicts can take time. CrowdStrike is working on a solution, and updates will be posted as soon as they become available.
In an automated reply, CrowdStrike stated: "CrowdStrike is aware of reports of crashes on Windows hosts related to the Falcon Sensor. Symptoms include hosts experiencing a blue screen error related to the Falcon Sensor. Our Engineering teams are actively working to resolve this issue and there is no need to open a support ticket. Status updates will be posted as we have more information to share, including when the issue is resolved."
For Users Experiencing BSODs:
If you’re encountering BSOD errors after a recent CrowdStrike update, you’re not alone. This appears to be a widespread issue. The upcoming Technical Alert will likely provide specific details on affected CrowdStrike sensor versions and potential workarounds while a permanent fix is developed.
If you have urgent questions or concerns, consider contacting CrowdStrike support directly.
Something tells me that isn’t going to provide the comfort it was meant to.
It is annoying. Some possible solutions:
On desktop: using Shift + Alt you can often override this and select text anyway.
On mobile: Using the reader mode or the Print preview often works. It does for me on this website.
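And if neither works, you can always pull the text out of the HTML yourself. A minimal sketch using only the Python standard library; the URL is a placeholder, not the actual article, and some sites also block the default user agent, hence the header:

```python
# Hypothetical sketch: extract an article's visible text when a site blocks
# selection. Standard library only; the URL below is a placeholder.
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

url = "https://example.com/some-article"  # placeholder URL
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
html = urlopen(req).read().decode("utf-8", errors="replace")

extractor = TextExtractor()
extractor.feed(html)
print("\n".join(extractor.chunks))
```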
The Firefox reader mode button doesn't show up for me on that site. I wonder if it's just a poor site or if they are intentionally trying to break it.
Could be both. You can enforce it: https://addons.mozilla.org/en-US/firefox/addon/activate-reader-view/
This is exactly why centralisation of services and large corporations gobbling up smaller companies and becoming behemoth services is so dangerous.
It's true, but the other side of the same coin is that with too much solo implementation you lose the benefits of economies of scale.
But indeed the world seems like a village today.
you lose benefits of economy of scale.
I think you mean - the shareholders enjoy the profits of scale.
When a company scales up, prices are rarely reduced. Users do get increased community support through shared experiences, especially when official channels are congested during events like today's, but that's about the only benefit the consumer sees.
What?! No, it must be Kaspersky!
/s
Is there a chance that this makes organisations move to Linux?
Not really. This isn’t a Windows problem. This is a faulty software problem. People can write faulty software on Linux too.
I guess they would want some cybersecurity software like Crowdstrike in either case? If so, this could probably have happened on any system, as it’s a bug in third party software that crashes the computer.
Not that I know much about this, but if this leads to a push towards Linux, it would be among companies that already wanted to make the switch but were unwilling because they thought they needed Crowdstrike specifically. This might lead them to consider alternative cybersecurity software.
No because Windows Indoctrination starts with Academia.
There will have to be heavy monetary losses before IT is forced to leave their golden goose that keeps them employed with “problems” to “fix” that soak up hours each.
But maybe they will notice the monetary losses, and competitors not using their trash will pull ahead - that will get their attention. Still, they require the cognition to understand the problem and select a solution, and the Linux jungle is hard for corporate minds to navigate without smart IT help.
You’d think maybe not being reliant on a 90 billion dollar company to un-fuck security would be a bigger deal than it is.
Windows usage isn’t the cause of dysfunction in corporate IT but a symptom of it. All you would get is badly managed Linux systems compromised by bloated insecure commercial security/management software.
It’s also reported in Danish news now: https://www.dr.dk/nyheder/udland/store-it-problemer-flere-steder-i-verden
Dutch media are reporting the same thing: https://nos.nl/l/2529468 (liveblog) https://nos.nl/l/2529464 (Normal article)
I just saw it on the Swedish national broadcaster’s website:
https://www.svt.se/nyheter/snabbkollen/it-storningar-varlden-over-e1l936
The annoying aspect, from somebody with decades of IT experience, is this: what should happen is that CrowdStrike gets sued into oblivion, and the people responsible for buying that shit have an epiphany and properly look at how they are doing their infra.
But what will happen is that they'll just buy a new CrowdStrike product that promises to mitigate the fallout of them fucking up again.
decades of IT experience
Do you test any changes - especially upgrades - on local test environments before applying them in production?
The scary bit is what most in the industry already know: critical systems are held together with duct tape and maintained by juniors 'cos they're the cheapest Big Money can find. And even if not, "There's no time" or "It's too expensive" are probably the most common answers a PowerPoint manager will give to a serious technical issue being raised.
The Earth will keep turning.
Not OP, but that is how it used to be done. The issue is the attacks we have seen over the years, i.e. ransomware attacks etc. They have made corps feel they need to patch and update instantly to avoid attacks, so they depend on the corp they pay for the software to test the rollout.
Auto-update is a two-edged sword. Without it, attackers will take advantage of the delay. With it... well, today.
I’d wager most ransomware relies on old vulnerabilities. Yes, keep your software updated but you don’t need the latest and greatest delivered right to production without any kind of test first.
Very much so. But the vulnerabilities do not tend to be discovered (by developers) until an attack happens, and auto-updates are generally how the spread of attacks is limited.
Open source can help slightly: with both good and bad actors unrelated to development seeing the code, it is more common for alerts to land before attacks. But it's far from a fix-all.
Generally, though, the time between discovery and fix is a worry for big corps, which is why auto-updates have been accepted with less manual intervention than was common in the past.
I would add that a lot of attacks happen after a fix has been released - i.e. compare the previous release with the patch and bingo, there's the vulnerability.
But agreed, patching should happen regularly, just with a few days' delay after the supplier releases it.
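That deferral idea is simple enough to sketch. A toy example in Python; the ring names and delay values are made up for illustration, not taken from any real product:

```python
# Toy sketch of "patch with a delay": apply a vendor release only after it
# has aged a few days, staged through rings rather than everywhere at once.
from datetime import date, timedelta

RING_DELAYS_DAYS = {"dev": 0, "pilot": 2, "production": 5}  # illustrative values

def should_apply(release_date: date, ring: str, today: date | None = None) -> bool:
    """True once a release has aged past the deferral window for this ring."""
    today = today or date.today()
    return today >= release_date + timedelta(days=RING_DELAYS_DAYS[ring])

# A release published on 2024-07-19 hits dev machines immediately...
assert should_apply(date(2024, 7, 19), "dev", today=date(2024, 7, 19))
# ...but production hosts would not have taken it two days later.
assert not should_apply(date(2024, 7, 19), "production", today=date(2024, 7, 21))
```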
I get the sentiment, but defense in depth is a methodology to live by in IT, and auto-updating via the Internet is not a good risk to take in general. For example, should Crowdstrike just disappear one day, your entire infrastructure shouldn't be at enormous risk, nor should critical services. Even if it's your anti-virus, a virus or ransomware shouldn't be able to easily propagate through the enterprise. If it did, then it is doubtful something like Crowdstrike is going to be able to update and suddenly reverse course. If it can, then you're just lucky the ransomware that made it through didn't do anything in defense of itself (disconnecting from the network, blocking CIDRs like Crowdstrike's update servers, blocking processes, whatever). And frankly, you can still update those clients anyway from your own AV update server - a product you'd be using anyway if you aren't allowing updates from the Internet, in order to roll them out in dev first, with phasing and/or schedules, from your own infrastructure.
Crowdstrike is just another lesson in that.
Some years back I was the 'Head' of systems stuff at a national telco that provided the national telco infra. Part of my job was to manage the national systems upgrades. I had the stop/go decision to deploy, and indeed pushed the 'enter' button to do it. I was a complete PowerPoint Manager and had no clue what I was doing; it was total Accidental Empires, and I should not have been there. Luckily I got away with it for a few years. It was horrifically stressful and not the way to mitigate national risk. I feel for the CrowdStrike engineers. I wonder if the latest embargo on Russian oil sales is in any way connected?
I wonder if the latest embargo on Russian oil sales is in any way connected?
Doubt it, but it’s ironic that this happens shortly after Kaspersky gets banned.
Unfortunately Falcon self-updates, and it will not work properly if you don't let it.
Also add “customer has rejected the maintenance window” to your list.
Turns out it doesn’t work properly if you do let it
Well, “don’t have self-upgrading shit on your production environment” also applies.
As in, "if you bought something like this, there's a problem with you".
While I don't totally disagree with you, this has mostly nothing to do with Windows and everything to do with a piece of corporate spyware garbage that some IT Manager decided to install. If tools like that existed for Linux, doing what they do to the OS, trust me, we would be seeing kernel panics as well.
How is it not a Windows problem?
The fault seems to be 90/10 CS, MS.
MS allegedly pushed a bad update. Ok, it happens. Crowdstrike’s initial statement seems to be blaming that.
CS software csagent.sys took exception to this and royally shit the bed, disabling the entire computer. I don’t think it should EVER do that, so the weight of blame must lie with them.
The really problematic part is, of course, the need to manually remediate these machines. I’ve just spent the morning of my day off doing just that. Thanks, Crowdstrike.
Why should it be? A faulty software update from a 3rd party crashes the operating system. The exact same thing could happen to Linux hosts as well, with how much access those endpoint security programs usually get.
But that patch is for Windows, not Linux. Not a hypothetical; this is happening.
You're fixated on the wrong part of the story. "Synchronized supply-chain update takes out global infrastructure" isn't a Windows problem; this happens on Linux too!
Just because a drunk driver crashes their BMW into a school doesn’t mean drunk driving is only a BMW vehicle problem.
If BMW makes a car that has square wheels, and everyone needs to install round wheels so the fucking thing works, you can't blame a company for making wheels.
It’s a Microsoft problem through and through.
Your counter to the BMW drunk-driver example didn't address drunk driving in Volvos, Toyotas, Fords… you just introduced a variable that you're upset with. BMWs having weird wheels has nothing to do with drunk-driving incidents.
Again, you're focused on the wrong thing; this story is a warning about supply-chain issues.
You're just memeing on the hate for Windows.
Have you never seen a DNS outage, an Ansible outage, a Terraform outage, a RADIUS outage, a database schema change outage, a router firmware update outage?
Again, you're talking about something I am not. I am talking about THIS problem, right here, which is categorically a Windows problem, in that it's not on the Linux kernel stack, or Mac. How is this NOT a Windows problem??
I love how quickly everyone has forgotten about that xz attack.
I use and love Linux and have for over two decades now, but I’m not going to sit here and claim that something similar to the current Windows issue can’t happen to Linux.
xz attack
That has nothing to do with this. That was a security vulnerability, solved in record time, blame where it was due, and patched in hours.
You’re missing the point. That compromised xz made it into some production distributions. The point here is that shit can happen to Linux, too.
It is in the sense that Windows admins are the ones that like to buy this kind of shit and use it. It's not in the sense that Windows was broken somehow.
Hate to break it to you, but CrowdStrike falcon is used on Linux too…
And Macs, we have it on all three OSs. But only Windows was affected by this.
And if it was a kernel-level driver that failed Linux machines would fail to boot too. The amount of people seeing this and saying “MS Bad,” (which is true, but has nothing to do with this) instead of “how does an 83 billion dollar IT security firm push an update this fucked” is hilarious
Falcon uses eBPF on Linux nowadays. It's still an irritating piece of software, but it won't make your boxen fail to boot.
Even if it doesn’t kernel panic, a broken eBPF program can break all networking and I/O and effectively cripple a “running” system.
eBPF is better in a lot of aspects, but it won’t prevent software intended to block syscalls from breaking your machines if the code breaks.
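For a concrete sense of what operating at that layer looks like, here's a minimal sketch using the bcc Python bindings - not CrowdStrike's code, just an illustration of an eBPF program hooked into a syscall. The in-kernel verifier keeps a program like this from panicking the kernel, but a sensor enforcing a broken policy at this point could still starve a running system:

```python
# Minimal bcc sketch (not CrowdStrike code): attach a tiny eBPF program to
# the openat syscall entry point and log each call. Requires root and the
# bcc Python bindings. The verifier prevents this program from crashing the
# kernel, but a sensor that *blocks* calls here by a buggy policy could
# still cripple a "running" system.
from bcc import BPF

prog = r"""
int trace_openat(void *ctx) {
    bpf_trace_printk("openat observed\n");
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("openat"), fn_name="trace_openat")
print("Tracing openat syscalls... Ctrl-C to stop")
b.trace_print()
```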
The solution posted everywhere - simply delete the broken driver files - isn't difficult or time consuming, except in situations where tens of thousands of devices stop responding at once, or where every machine is asking for the encryption key because you've altered your boot parameters. Linux's saving grace here may be that BitLocker-style encryption is a pain to set up, so Linux servers typically don't do encryption at all, but the recovery process for enterprise customers would still be very manual and time consuming.
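For reference, that posted solution amounts to something like this sketch, in Python purely for illustration. The path and the C-00000291*.sys pattern are as publicly reported at the time; check CrowdStrike's official guidance before running anything like this, especially on BitLocker-protected machines:

```python
# Sketch of the widely reported manual workaround: after booting the
# affected Windows host into Safe Mode, delete the faulty "channel file"
# from the CrowdStrike driver directory, then reboot normally.
# Path and filename pattern are as publicly reported -- verify against
# CrowdStrike's official Technical Alert first.
from pathlib import Path

driver_dir = Path(r"C:\Windows\System32\drivers\CrowdStrike")

for channel_file in driver_dir.glob("C-00000291*.sys"):
    print(f"Removing faulty channel file: {channel_file}")
    channel_file.unlink()
```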
It was panicking RHEL 9.4 boxes a month ago.
Were you using the kernel module? We’re using Flatcar which doesn’t support their .ko, and we haven’t been getting panics on any of our machines (of which there are many).
Nah it was specifically related to their usage of BPF with the Red Hat kernel, since fixed by Red Hat. Symptom was, you update your system and then it panics. Still usable if you selected a previous kernel at boot though.
You’re asking the wrong question: why does a security nightmare need a 90 billion dollar company to unfuck it?
What’s your solution to cyberattacks?
Linux in the hands of professionals. There’s a reason IIS isn’t used anymore.
That doesn’t solve anything. Linux is also subject to cyberattacks.
Hate to break it to you, but most IT Managers don't care about CrowdStrike: they're forced to choose some kind of EDR to complete audits. But yes, things like CrowdStrike, Huntress, SentinelOne, and even Microsoft Defender all run on Linux too.
Yeah, you’re right.
I wouldn't call Crowdstrike corporate spyware garbage. I work as a Red Teamer in cybersecurity, and EDRs are the bane of my existence - they are useful, and pretty good at what they do. In the last few years, I've been struggling more and more with the engagements we do, because EDRs just get in the way and catch a lot of what would have passed undetected a month ago. Staying on top of them with our tooling is getting more and more difficult, and I would call that a good thing.
I recently tested a company without EDR, and boy was it a treat. Not defending Crowdstrike - to call that a major fuckup is a great understatement - but calling it "corporate spyware garbage" feels a little unfair. EDRs do make a difference, and this wasn't an issue with their product in itself, but with the irresponsibility of their patch management.
Fair enough.
Still, this fiasco proved once again that the biggest threat to IT is sometimes on the inside. At the end of the day, a bunch of people decided to buy Crowdstrike and got screwed over. Some of them actually had good reason to use a product like that; for others it was just paranoia and FOMO.
Crowdstrike already killed some Linux machines. Let’s not pretend Windows is at fault here or Linux is magically better in this area. No one is immune from software that can run as a kernel module going bad.
But, my superiority!
Every system has its faults, and I'm still going to dogpile the system with the most faults. But hell, Microsoft did buy GitHub, Halo, Minecraft, and a million other things; they will probably find a way to buy Linux and ruin it for us just like they ruin everything else.
Let’s see, …we are somewhere in between Extend and Extinguish on the roadmap.
Edit: Case in point, RIP Red Hat & IBM, and GitHub Copilot, what a great idea. RIP the Atom editor, and probably a million other things. Do we have a KilledByMicrosoft website yet? I hope people in the pharmacy could get their prescriptions or we might have to add people's names to the list.
None of this has to do with the current outage though.
I hope people in the pharmacy could get their prescriptions or we might have to add people's names to the list.
Which isn’t Microsoft’s fault. Linux systems have also been taken down by Crowdstrike’s fuck ups in the recent past.
Microsoft has many faults and I’ll criticize them as I please. And if Linux is a culprit in a global outage someday I’ll contemplate criticizing them too.
This “Not Microsoft’s Fault” comes off as white knighting for Muh Billion Dolla Corporation.
Do we really need to SIMP for the company town?
Microsoft, Google, Apple, Amazon, and others deserve every ounce of vitriol they earn through their shitty practices. Again, I am criticizing them for being shitty - not for the particulars of System X vs System Z, but for the aftermath.
I get where you are coming from, but this event is pretty much entirely the fault of Crowdstrike and the countless organizations that trusted them. It’s definitely a show of how massive outages are more likely when things are overly centralized and proprietary, and managed by big, shitty, profit driven organizations. Since crowdstrike operates in kernel space, it doesn’t matter which operating system it’s on, it can break it if it does something stupid. In fact they managed to break some redhat machines not too long ago, and some Debian machines not long before that. It’s just the impact wasn’t as far reaching as this recent utter fuckup, just because fewer critical machines were affected, so we didn’t hear about those smaller fuckups in the news.
Yes, thank you, exactly. The centralized model has its benefits but it also can act as a single point of failure.
If I were going to analyze this from an engineering perspective, I would focus on whether, when these inevitable human-error events occur, we have adequate tools to roll back updates. Do we snapshot OS drives before updates? Are there adequate safe-mode or fallback tools to diagnose which files are the offenders, so the user can remove them?
In my view the Windows user isn't credited with the skills or intelligence needed to work around a "setback" issue like the one yesterday.
It doesn't help that NTFS is missing modern capabilities, or that there isn't an easy-to-use diff for the layman to understand which files were added to the filesystem that may be causing the breakage.
To be fair, though, even with those potholes filled, the entire design paradigm of Windows as a proprietary platform is part of the problem. Software is not broken up into package modules that can be assembled into a functioning system; it is encumbered with "anti-piracy" boogeyman measures where the software treats the user as an enemy and is designed to break.
Linux isn’t like that. I’ve cloned many distro drives and swapped them into new machines and with 1 or 2 tweaks they JustWork
I see many people on the net defending Microsoft as blameless for technical reasons.
My criticisms were that Microsoft just sucks, as you interpreted correctly and summarized eloquently. Thank you.
Where I think the entire conversation should move is:
What are the design flaws that allowed this to happen?
"More Rust & less C," I see some people suggest, as this was allegedly a null-pointer issue.
And is Windows Broken By Design? My opinion answer - Yes.
(Okay, and what to do about it before the next billion dollars is lost. I would think critical infrastructure should have a model similar to NixOS in immutability but that’s just my opinion.)
Windows does have a fallback mode called safe mode and that’s exactly what’s being used to fix this utter mess.
Package management isn’t going to save you from this as it didn’t save the Linux systems affected last time. It didn’t stop Arch Linux from failing to boot after a Grub update either.
Windows also has drive cloning tools, that isn’t unique to Linux.
NixOS isn't immutable. It's not an A/B-root system and / isn't read-only; rather, it's what's known as reproducible. I am not convinced NixOS would make this any easier either, given how simple the fix was. Funnily enough, though, tools called Ansible and Puppet exist for configuring systems in repeatable ways, and they apply to other Linux systems, Windows systems, and even macOS.
There are like one or two valid points in this whole comment and the rest is pretty much falsehoods and misconceptions.
Edit: Forgot to mention tools exist to make Windows immutable as well. So that is an option.
Windows does have a fallback mode called safe mode and that’s exactly what’s being used to fix this utter mess.
The other fix was reboot your Windows computer at least 15 times.
Package management isn’t going to save you from this as it didn’t save the Linux systems affected last time. It didn’t stop Arch Linux from failing to boot after a Grub update either.
Not everyone was affected though:
How come not everyone was impacted?
Prior to the most recent version, grub only registered the fwsetup command if it detected support. If your machine detected support, you would have had the fwsetup command registered and the failure wouldn't occur.
Sure, you can criticize as much as you want, but if you are wrong in your criticism it just damages all of your criticism overall.
In my opinion it is important to state facts, not fiction. This was not Microsoft's fault; no matter how much you hate Microsoft, it still wasn't their fault, and saying that it was is incorrect and doesn't solve the issue.
Well said, that’s one of the points I have been trying to get across.
Except they haven't done anything shitty this time. What you are doing would be a bit like claiming the Nazis are responsible for microplastics. Yeah, Nazis are shit, but making false allegations just gives their defenders something to throw in your face. It makes you, and everyone who is critical of Microsoft, look dumb. How about you criticize the company that actually screwed up? They are also a multi-billion dollar company, yet you aren't blaming them for something that is clearly their fault.
Also fyi Red Hat and IBM are still around and aren’t really a force for good anyway. Stop SIMPing for large companies.
Hilarious. I am sure that, out of principle, you have stopped using all the software that Red Hat contributes to your distribution.
If it is OK with you, I am not going to define my morality in terms of corporate interest. They are not my friends, but I do not believe that shitting on their contributions does much for me either.
I am not shitting on their contributions. All I am saying is that, as a large company, they aren't any more my friend than Microsoft. They still exist and still make contributions; Microsoft didn't kill them like the person I am replying to is insinuating.
Am on holiday this week - called in to help deal with this shit show :(
Don’t worry, George Kurtz (crowdstrike CEO) is unavailable today. He’s got racing to do #04 https://www.gt-world-challenge-america.com/event/95/virginia-international-raceway
i hope you get overtime!
Microsoft should test all its products on its own computers, not on ours. Make an update, test it, and only then post it online.
Microsoft has nothing to do with this. This is entirely on Crowdstrike.
We’re all going to be so smug.
Me too. Additionally, I use guix so if a system update ever broke my machine I can just rollback to a prior system version (either via the command line or grub menu).
Immutable systems sound like something desperately needed, tbh. It's just such an obvious solution; I'm surprised it was invented so late.
That’s assuming grub doesn’t get broken in the update…
True, then I'd be screwed. But because my system config is declared in a single file (plus a file for channels), I could re-install my system and be back in business relatively quickly. There's also guix home, but I haven't had a chance to try that.
Is there an easy way to silence every fuckdamn sanctimonious linux cultist from my lemmy experience?
Secondly, this update fucked Linux just as badly as Windows, but keep huffing your own farts. You seem to like it.
username… checks out?
Oh you really have no fucking clue. It's medical, and no treatment has worked for more than a few weeks. It's only a matter of time before I am banned. Now imagine living with that for 4+ decades and being the butt of every thread's joke.
A real shame that can’t be considered medical discrimination.
That sounds exhausting. I hope you find peace, one day.
I’d unsubscribe from !linux for a start.
I'm pretty sure this update didn't get pushed to Linux endpoints, but sure, Linux machines running the CrowdStrike driver are probably vulnerable to panicking on malformed config files. There are a lot of weirdos claiming this is a uniquely Windows issue.
Hi there! Looks like you linked to a Lemmy community using a URL instead of its name, which doesn’t work well for people on different instances. Try fixing it like this: !linux@lemmy.ml
Thanks for the tip, so glad Lemmy makes it easy to block communities.
Also: it seems everyone is claiming it didn't affect Linux, but as part of our corporate cleanup yesterday I had 8 Linux boxes I needed to drive to the office, throw a head on, and reset their iDRAC. So sure, maybe they all just happened to fail at the same time, but in my 2 years at this site we've never had more than one down at a time, ever, and never for the same reason. I'm not the tech head of the site by any means, and it certainly could be unrelated, but people with significantly greater experience than me in my org chalked this up to Crowdstrike.