<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[azureopslab.com]]></title><description><![CDATA[Real-world Azure admin scenarios from daily work experience.]]></description><link>https://azureopslab.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1593680282896/kNC7E8IR4.png</url><title>azureopslab.com</title><link>https://azureopslab.com</link></image><generator>RSS for Node</generator><lastBuildDate>Tue, 09 Jun 2026 14:15:03 GMT</lastBuildDate><atom:link href="https://azureopslab.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[The Ticket
One Missed Reboot Took Down a Production VM for a Week— Here's What We Learned]]></title><description><![CDATA[Step 1 — Check the VM from Azure Portal First stop — the Azure Portal.
Azure Portal → Virtual Machines → [VM Name] → Overview
The VM was showing as Running. Power state was fine — the issue was clearl]]></description><link>https://azureopslab.com/the-ticket-one-missed-reboot-took-down-a-production-vm-for-a-week-here-s-what-we-learned</link><guid isPermaLink="true">https://azureopslab.com/the-ticket-one-missed-reboot-took-down-a-production-vm-for-a-week-here-s-what-we-learned</guid><category><![CDATA[azure-devops]]></category><category><![CDATA[azure backup]]></category><category><![CDATA[azure troubleshooting]]></category><category><![CDATA[linux vm]]></category><dc:creator><![CDATA[azureopslab.com]]></dc:creator><pubDate>Tue, 28 Apr 2026 15:01:28 GMT</pubDate><content:encoded><![CDATA[<h3>Step 1: Check the VM from the Azure Portal</h3>
<p>First stop: the Azure Portal.</p>
<blockquote>
<p><strong>Azure Portal → Virtual Machines → [VM Name] → Overview</strong></p>
</blockquote>
<p>The VM was showing as Running. Power state was fine — the issue was clearly inside the OS.</p>
<h3>Step 2: Try the Serial Console</h3>
<p>Since SSH was completely dead, we opened the Azure Serial Console to look directly inside the VM.</p>
<blockquote>
<p><strong>Azure Portal → Virtual Machines → [VM Name] → Serial Console</strong></p>
</blockquote>
<p><strong>Result: No output. Complete silence.</strong></p>
<p>The console was returning nothing — which told us this was deeper than a simple OS hang. The VM was not even reaching a stage where it could output anything to the screen.</p>
<h3>Step 3: Reboot the VM</h3>
<p>We stopped and started the VM from the Azure Portal, hoping a clean boot would bring it back.</p>
<blockquote>
<p><strong>Azure Portal → Virtual Machines → [VM Name] → Stop → Start</strong></p>
</blockquote>
<p><strong>Result: No change.</strong></p>
<p>The VM returned to the Running state but remained completely inaccessible. The Serial Console was still silent and SSH was still dead.</p>
<h3>Step 4: Try Restoring from Azure Backup</h3>
<p>Since the VM was not responding to anything, we decided to try restoring it from backup before escalating further.</p>
<blockquote>
<p><strong>Azure Portal → Backup Center → [VM Name] → Restore</strong></p>
</blockquote>
<p>We had multiple restore points available. We selected the most recent one and initiated the restore.</p>
<p><strong>Result: Restore failed.</strong></p>
<p>We tried the next restore point, and then the one after that. Both failed.</p>
<p>Every restore point we tried returned errors, and none completed successfully.</p>
<h3>Step 5: Raise a Case with Microsoft</h3>
<p>With the VM unresponsive and backups failing, we had no choice but to escalate to Microsoft Azure Support.</p>
<p>After investigating on their end, Microsoft came back with a critical finding:</p>
<p>"The EFI system partition for this VM has been altered."</p>
<p>This was the breakthrough we needed. The EFI system partition does not change on its own; something must have modified it. It was time to dig into the history.</p>
<h3>Step 6: Investigate the Change History</h3>
<p>We went back and checked every ticket ever raised against this VM.</p>
<p>We found it.</p>
<p>One week earlier, two changes had been made:</p>
<ul>
<li><p><strong>Change 1 (Azure admin team):</strong> The OS disk was expanded from 64 GB to 128 GB from the Azure Portal.</p>
</li>
<li><p><strong>Change 2 (Linux team):</strong> After the disk expansion, the Linux team logged into the VM and expanded the /var partition from 4 GB to 10 GB using Linux disk management tools.</p>
</li>
</ul>
<p>The critical mistake: The server was never rebooted after these changes.</p>
<h3>What Actually Happened?</h3>
<p>When the Linux team expanded /var on a live system, the partition table was modified on disk. Because the server was never rebooted, the OS kept running with the old partition layout in memory.</p>
<p>The EFI system partition — which tells the system exactly how to boot — was affected by this mismatch. When the VM eventually restarted due to a platform-level event, it tried to boot using a layout that no longer matched what was physically on disk.</p>
<p>Because we only discovered the issue one week after it happened, every single backup in that period had captured the VM in its broken state. Restoring from any of these backups simply restored the broken VM again.</p>
<h3>Step 7: Find a Clean Restore Point</h3>
<p>We had to go much further back, to before the disk expansion happened.</p>
<p>We carefully checked the dates on all available restore points against the original change ticket timestamp. After searching through the full backup history, we finally found a restore point taken before the OS disk was expanded from 64 GB to 128 GB.</p>
<p>This was our last hope.</p>
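<p>The date comparison in Step 7 can be sketched in a few lines of Python. This is a minimal illustration, not the process we actually followed: the function name <code>latest_clean_restore_point</code>, the restore point names, and all timestamps are hypothetical, and the restore point list is assumed to have been exported by hand into a list of dictionaries.</p>

```python
from datetime import datetime, timezone


def latest_clean_restore_point(restore_points, change_time):
    """Return the newest restore point taken strictly before change_time, or None."""
    candidates = [rp for rp in restore_points if change_time > rp["time"]]
    return max(candidates, key=lambda rp: rp["time"], default=None)


# Hypothetical data: names and timestamps are invented for illustration only.
change_time = datetime(2026, 1, 14, 10, 0, tzinfo=timezone.utc)
restore_points = [
    {"name": "rp-jan-12", "time": datetime(2026, 1, 12, 1, 0, tzinfo=timezone.utc)},
    {"name": "rp-jan-13", "time": datetime(2026, 1, 13, 1, 0, tzinfo=timezone.utc)},
    {"name": "rp-jan-15", "time": datetime(2026, 1, 15, 1, 0, tzinfo=timezone.utc)},
    {"name": "rp-jan-16", "time": datetime(2026, 1, 16, 1, 0, tzinfo=timezone.utc)},
]

clean = latest_clean_restore_point(restore_points, change_time)
print(clean["name"])
```

<p>Comparing against the change ticket timestamp, rather than against the day the outage was noticed, is what rules out every restore point that already contains the modified disk.</p>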
<h3>Step 8: Restore from the Old Clean Backup</h3>
<p>We initiated the restore from this older backup point.</p>
<blockquote>
<p><strong>Azure Portal → Backup Center → [VM Name] → Restore → [Pre-Expansion Restore Point]</strong></p>
</blockquote>
<p><strong>Result: Success.</strong></p>
<p>The VM was restored to its pre-expansion state with:</p>
<ul>
<li><p>Original 64 GB OS disk</p>
</li>
<li><p>Original /var partition at 4 GB</p>
</li>
<li><p>EFI system partition intact and clean</p>
</li>
<li><p>VM booting normally</p>
</li>
<li><p>SSH access restored</p>
</li>
</ul>
<p>The application team confirmed services were back online.</p>
<h3>Key Takeaways for Azure or Linux Admins</h3>
<ul>
<li><p><strong>Always reboot after disk or partition changes.</strong> Never leave a production server running with an unconfirmed partition layout. The same applies after attaching an additional disk and mounting it alongside the OS disk.</p>
</li>
<li><p><strong>Always take a snapshot before making any changes.</strong> In this case we had taken a snapshot, but our policy does not retain snapshots beyond 6 days, so it had already been deleted.</p>
</li>
<li><p><strong>The EFI system partition is sacred.</strong> Any disk operation that touches partition tables can silently affect it.</p>
</li>
<li><p><strong>Backups are only as good as what they capture.</strong> If your VM is silently broken, every backup is backing up a broken VM.</p>
</li>
</ul>
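<p>The snapshot retention gap from this incident is easy to express as a check. The sketch below is illustrative only: the function name <code>snapshot_available</code> and the dates are hypothetical, and it assumes a snapshot is deleted once it is older than the retention period.</p>

```python
from datetime import datetime, timedelta


def snapshot_available(taken_at, retention_days, checked_at):
    """Return True if a snapshot taken at taken_at still exists at checked_at."""
    expires_at = taken_at + timedelta(days=retention_days)
    return expires_at >= checked_at


# Hypothetical dates: snapshot taken on the change day, problem found a week later.
taken_at = datetime(2026, 1, 14, 10, 0)
found_at = taken_at + timedelta(days=7)

still_there = snapshot_available(taken_at, retention_days=6, checked_at=found_at)
print(still_there)
```

<p>With a 6-day retention and a problem that surfaces after 7 days, the check comes back false. A pre-change snapshot only helps if it is retained at least until the first reboot after the change.</p>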
]]></content:encoded></item><item><title><![CDATA[ Unable to Access Linux VM — How We Diagnosed and Recovered It ?]]></title><description><![CDATA[The Ticket
It started with a straightforward request — "Unable to access Linux VM."
No error message. No context. Just a server that wasn't responding. As an Azure Admin, this is one of the most commo]]></description><link>https://azureopslab.com/unable-to-access-linux-vm-how-we-diagnosed-and-recovered-it</link><guid isPermaLink="true">https://azureopslab.com/unable-to-access-linux-vm-how-we-diagnosed-and-recovered-it</guid><category><![CDATA[Azure]]></category><category><![CDATA[linux vm]]></category><category><![CDATA[real time analytics]]></category><category><![CDATA[troubleshoot azure vm]]></category><dc:creator><![CDATA[azureopslab.com]]></dc:creator><pubDate>Fri, 24 Apr 2026 04:38:09 GMT</pubDate><content:encoded><![CDATA[<h2>The Ticket</h2>
<p>It started with a straightforward request — <strong>"Unable to access Linux VM."</strong></p>
<p>No error message. No context. Just a server that wasn't responding. As an Azure Admin, this is one of the most common tickets you'll receive — and the key is to follow a systematic process rather than jumping straight to conclusions.</p>
<h3>Step 1 — Try Access from the Jump Server</h3>
<p>The first thing we always do is <strong>rule out the obvious</strong>.</p>
<p>We logged into our <strong>Jump Server</strong> (the secure bastion host we use to access VMs internally) and attempted to SSH into the Linux VM from there.</p>
<p><strong>Result: Connection failed.</strong> The VM was not reachable from the jump server either — so this wasn't a local network or VPN issue on the user's side. The problem was with the VM itself.</p>
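<p>A quick way to repeat this reachability check from the jump server is a plain TCP probe against the SSH port. This is a minimal sketch rather than the tooling used in this incident: the function name <code>tcp_port_open</code>, the 3-second timeout, and port 22 as the SSH port are all assumptions.</p>

```python
import socket


def tcp_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

<p>Calling <code>tcp_port_open("10.0.0.4")</code> (a made-up private IP) from the jump server returns <code>False</code> when the VM does not accept connections on port 22. The probe only shows that the port is unreachable; it cannot tell an NSG block from a hung OS, which is why the next steps look at the VM itself.</p>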
<h3>Step 2 — Check if the VM is Running</h3>
<p>Before panicking, check the basics. We went to the <strong>Azure Portal</strong> and checked the VM status.</p>
<blockquote>
<p><strong>Azure Portal → Virtual Machines → [VM Name] → Overview</strong></p>
</blockquote>
<p><strong>Result: VM was showing as Running.</strong></p>
<p>This was an important clue. The VM was powered on and Azure believed it was healthy — but something inside the OS was wrong.</p>
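<p>The same check can be scripted. The sketch below assumes you have already fetched the VM's instance view (for example with <code>az vm get-instance-view</code>) and are holding its <code>statuses</code> list; the helper name <code>power_state</code> and the sample payload are hand-written for illustration.</p>

```python
def power_state(statuses):
    """Return the power state (e.g. 'running') from instance-view statuses, or None."""
    for status in statuses:
        code = status.get("code", "")
        if code.startswith("PowerState/"):
            return code.split("/", 1)[1]
    return None


# Hand-written sample shaped like the "statuses" list of a VM instance view.
sample_statuses = [
    {"code": "ProvisioningState/succeeded", "displayStatus": "Provisioning succeeded"},
    {"code": "PowerState/running", "displayStatus": "VM running"},
]

state = power_state(sample_statuses)
print(state)
```

<p>A result of <code>running</code> only confirms that the platform has the VM powered on. It says nothing about whether the guest OS is responsive, which is exactly the situation in this ticket.</p>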
<h3>Step 3 — Check Serial Console</h3>
<p>Since the VM was running but unreachable, we opened the <strong>Azure Serial Console</strong> to look directly at what was happening inside the VM without needing a network connection.</p>
<blockquote>
<p><strong>Azure Portal → Virtual Machines → [VM Name] → Serial Console</strong></p>
</blockquote>
<p>What we saw immediately told us the story — the console output showed that the <strong>CPU was stuck</strong>.</p>
<p>This is a classic sign that the VM's operating system has <strong>hung</strong> — the kernel or a process has locked up, consuming the CPU and making the system completely unresponsive. No new connections can come in, no commands can run, and the OS is essentially frozen.</p>
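<p>If you save the Serial Console output to a file, a small script can pick out stuck-CPU messages for you. The exact wording we saw is not reproduced in this post, so the sketch below assumes the standard Linux kernel soft lockup message format; the function name <code>find_stuck_cpus</code> and the sample log lines are made up for illustration.</p>

```python
import re

# Matches the kernel watchdog message, e.g. "soft lockup - CPU#1 stuck for 22s!"
SOFT_LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s")


def find_stuck_cpus(log_text):
    """Return a list of (cpu_number, seconds_stuck) tuples found in the log text."""
    return [(int(m.group(1)), int(m.group(2))) for m in SOFT_LOCKUP.finditer(log_text)]


# Invented sample lines, not taken from the incident.
sample_log = """\
[ 1234.567890] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:2:345]
[ 1258.567890] watchdog: BUG: soft lockup - CPU#1 stuck for 45s! [kworker/1:2:345]
"""

stuck = find_stuck_cpus(sample_log)
print(stuck)
```

<p>Repeated entries for the same CPU with a growing duration suggest the system is not recovering on its own, which supports the decision to request a reboot.</p>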
<h3>Step 4 — Raise Approval on the Ticket</h3>
<p>In any professional environment, you <strong>never reboot a production server without approval</strong>. Because this was a production server, we received approval within 15 minutes.</p>
<h3>Step 5 — Reboot the VM</h3>
<p>With approval in hand, we <strong>stopped and started the VM from the Azure Portal</strong>:</p>
<blockquote>
<p><strong>Azure Portal → Virtual Machines → [VM Name] → Stop → Start</strong></p>
</blockquote>
<p>We monitored the Serial Console during the reboot to watch the boot sequence in real time.</p>
<p>The VM booted cleanly — no errors, no missing files, no kernel panic.</p>
<h3>Step 6 — Verify SSH Access</h3>
<p>Once the VM was back up, we went back to the <strong>Jump Server</strong> and attempted SSH again.</p>
]]></content:encoded></item><item><title><![CDATA[Welcome to AzureOpsLab — real Azure scenarios, not just theory]]></title><description><![CDATA[If you have ever searched for an Azure solution online and found only official Microsoft documentation or generic tutorials that don't quite match what you're dealing with at work, this blog is for yo]]></description><link>https://azureopslab.com/welcome-to-azureopslab-real-azure-scenarios-not-just-theory</link><guid isPermaLink="true">https://azureopslab.com/welcome-to-azureopslab-real-azure-scenarios-not-just-theory</guid><category><![CDATA[Azure]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Devops]]></category><category><![CDATA[azure admin]]></category><dc:creator><![CDATA[azureopslab.com]]></dc:creator><pubDate>Mon, 20 Apr 2026 16:15:42 GMT</pubDate><content:encoded><![CDATA[<p>If you have ever searched for an Azure solution online and found only official Microsoft documentation or generic tutorials that don't quite match what you're dealing with at work, this blog is for you.</p>
<p>AzureOpsLab exists for one reason: to document real Azure administration scenarios as they actually happen in production environments. Not simplified examples. Not sanitized textbook cases. Real problems, real troubleshooting steps, and real fixes.</p>
<p>Every post on this blog comes from an actual situation encountered while working as an Azure administrator.</p>
<h2><strong>Who this is for</strong></h2>
<p>This blog is most useful if you are:</p>
<ul>
<li><p>An Azure admin dealing with a problem right now and searching for a real answer</p>
</li>
<li><p>Preparing for AZ-104 or AZ-305 and wanting to see how concepts apply in the real world</p>
</li>
<li><p>Someone curious about what Azure administration actually looks like day to day</p>
</li>
</ul>
<h2><strong>What this blog is not</strong></h2>
<p>This is not a Microsoft documentation mirror. It is not a place for basic "what is Azure" explainers. There are thousands of those already. This is a working notes blog — written by someone in the field, for people in the field.</p>
]]></content:encoded></item></channel></rss>