<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GeekISP Tech Blog</title>
	<atom:link href="http://blog.geekisp.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.geekisp.com</link>
	<description>Tech info about GeekISP&#039;s Shared and VPS Hosting Service</description>
	<lastBuildDate>Fri, 11 May 2012 17:10:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Minor interruption</title>
		<link>http://blog.geekisp.com/2012/05/minor-interruption/</link>
		<comments>http://blog.geekisp.com/2012/05/minor-interruption/#comments</comments>
		<pubDate>Fri, 11 May 2012 16:25:07 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[GeekISP]]></category>

		<guid isPermaLink="false">http://blog.geekisp.com/?p=110</guid>
		<description><![CDATA[We had a few machines reboot just now. Still investigating&#8230; UPDATE 1 &#8211; 12:45pm ET &#8211; All services are restored. Investigation continuing. UPDATE 2 &#8211; 1:10pm ET &#8211; As best I can tell this was caused by a cascading power issue in the datacenter. I believe due to a batch task plus other normal activity [...]]]></description>
			<content:encoded><![CDATA[<p>We had a few machines reboot just now.  Still investigating&#8230;</p>
<p>UPDATE 1 &#8211; 12:45pm ET &#8211; All services are restored.  Investigation continuing.</p>
<p>UPDATE 2 &#8211; 1:10pm ET &#8211; As best I can tell, this was caused by a cascading power issue in the datacenter.  I believe that, due to a batch task plus other normal activity, our peak power requirements are greater than the capabilities of our connected UPS devices.  Then, due to the cross-connected nature of our environment, load was combined onto another UPS, thereby exceeding the capacity there.</p>
<p>We&#8217;re working with our datacenter provider now to plan a migration of some servers in such a way that we can spread the power load more evenly, and better isolate servers from each other.</p>
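<p>To make the cascade concrete, here is a minimal Python sketch of how one overloaded UPS can take down its neighbour. Everything in it is an assumption for illustration: the function name <code>simulate_cascade</code>, the UPS names and the wattages are hypothetical, not measurements from the GeekISP racks.</p>

```python
def simulate_cascade(loads, capacity, failover):
    """Return the names of the UPS units that trip, in the order they trip.

    loads:    dict of UPS name to peak load in watts (hypothetical figures)
    capacity: dict of UPS name to rated capacity in watts
    failover: dict of UPS name to the UPS that inherits its load
    """
    loads = dict(loads)  # work on a copy so the caller's data is untouched
    tripped = []
    progress = True
    while progress:
        progress = False
        for name in sorted(loads):
            if name in tripped or loads[name] <= capacity[name]:
                continue
            # This unit is over capacity: it drops out and hands its load on.
            tripped.append(name)
            progress = True
            target = failover.get(name)
            if target is not None and target not in tripped:
                loads[target] += loads[name]
            loads[name] = 0
    return tripped


# Hypothetical numbers: a batch job pushes ups_a just past its rating.
CAPACITY = {"ups_a": 1000, "ups_b": 1000}
CROSS_CONNECT = {"ups_a": "ups_b", "ups_b": "ups_a"}
uneven = simulate_cascade({"ups_a": 1100, "ups_b": 700}, CAPACITY, CROSS_CONNECT)
even = simulate_cascade({"ups_a": 900, "ups_b": 900}, CAPACITY, CROSS_CONNECT)
```

<p>With these made-up figures the uneven layout trips both units while the even one trips neither, even though the total load is the same. That is the reasoning behind spreading the load.</p>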
]]></content:encoded>
			<wfw:commentRss>http://blog.geekisp.com/2012/05/minor-interruption/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PHP Upgraded to 5.3.13</title>
		<link>http://blog.geekisp.com/2012/05/php-upgraded-to-5-3-13/</link>
		<comments>http://blog.geekisp.com/2012/05/php-upgraded-to-5-3-13/#comments</comments>
		<pubDate>Thu, 10 May 2012 14:06:33 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[GeekISP]]></category>

		<guid isPermaLink="false">http://blog.geekisp.com/?p=107</guid>
		<description><![CDATA[Just a quick note that this morning our PHP install was upgraded from 5.3.10 to 5.3.13 to patch several security issues. Please contact support if you have any issues resulting from the upgrade.]]></description>
			<content:encoded><![CDATA[<p>Just a quick note that this morning our PHP install was upgraded from 5.3.10 to 5.3.13 to patch several security issues.  Please contact support if you have any issues resulting from the upgrade.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.geekisp.com/2012/05/php-upgraded-to-5-3-13/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Major Outage &#8211; NFS server forced migration</title>
		<link>http://blog.geekisp.com/2012/04/major-outage-nfs-server-forced-migration/</link>
		<comments>http://blog.geekisp.com/2012/04/major-outage-nfs-server-forced-migration/#comments</comments>
		<pubDate>Sun, 08 Apr 2012 06:03:31 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[GeekISP]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://blog.geekisp.com/?p=100</guid>
		<description><![CDATA[Outage estimated at approximately 6 hours, from Saturday 4/7 @ 7pm ET through 4/8 @ 1am ET. Power issues affecting our primary NFS server appeared to cause some sort of boot failure. Emergency NFS migration performed. All services restored, all data should be intact. More info to come. Update &#8211; 10:15am 4/8 &#8211; MySQL InnoDB [...]]]></description>
			<content:encoded><![CDATA[<p>Outage estimated at approximately 6 hours, from Saturday 4/7 @ 7pm ET through 4/8 @ 1am ET.  Power issues affecting our primary NFS server appeared to cause some sort of boot failure.  Emergency NFS migration performed.</p>
<p>All services restored, all data should be intact.  More info to come.</p>
<p>Update &#8211; 10:15am 4/8 &#8211; MySQL InnoDB issues reported, likely due to inadvertent config file changes.  Investigating now.  (Resolved @10:35am: the InnoDB log file size was changed inadvertently.  It has been reverted, but I may go through the proper procedure to increase its size since the old size is a bit small).</p>
<p>Update 2 &#8211; More details.</p>
<p>This outage was caused by a variety of factors.  I&#8217;ll proceed chronologically in the interest of full disclosure.</p>
<p>First, some background.  At GeekISP, we really strive for reliability above almost everything else, since we need to have a stable platform as a prerequisite for building anything bigger.  To that end, we try to have redundancy for every critical component &#8211; preferably of the auto-failover variety.  For instance, there is a redundant pair of firewalls, each connected to a separate power controller and each power controller connected to a separate UPS.  This way it would require a simultaneous failure of at least 2 components to cause both firewalls to be offline.  Other servers at GeekISP have redundant power supplies for the same reason.  Our typical deployment might have 1 power supply connected to a UPS and the other to utility power, or sometimes we connect both power supplies to a UPS.  Either way we have a reasonable level of protection from single-component power problems (and, in some but not all cases, from multi-component problems).</p>
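<p>The arithmetic behind that redundancy can be sketched in a few lines of Python. This is only an illustration under a strong assumption: every component fails independently of the others. The function names and the probabilities are hypothetical.</p>

```python
def path_failure(component_probs):
    """Chance that one power path fails: it fails if any component in series fails."""
    ok = 1.0
    for p in component_probs:
        ok *= 1.0 - p
    return 1.0 - ok


def all_paths_fail(paths):
    """Chance that every redundant path fails in the same window (independence assumed)."""
    result = 1.0
    for components in paths:
        result *= path_failure(components)
    return result


# Two firewalls, each fed by its own power controller and its own UPS.
# 0.01 per component is a made-up figure, not a measured failure rate.
firewall_paths = [[0.01, 0.01], [0.01, 0.01]]
chance_both_firewalls_down = all_paths_fail(firewall_paths)
```

<p>The result is only as good as the independence assumption. Anything the paths share, such as an upstream circuit or a common load pattern, makes a simultaneous failure far more likely than this estimate suggests.</p>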
<p>With the above in mind, the first failure observed was at approximately 7pm ET on 4/7.  Both firewalls suddenly went offline, cutting off connectivity to the datacenter.  Some of the other machines connected to the same power controllers as the firewalls also went offline at this time, but those machines rebooted cleanly &#8211; the firewalls did not.  The monitoring station external to GeekISP noticed the problem, but unfortunately I did not.  At the time I was in a loud environment and simply did not hear my phone.</p>
<p>At approximately 9:15pm ET I heard my pager go off again, and responded right away.  I immediately recognized the problem and contacted our datacenter provider for support, and I had someone in front of the main GeekISP rack within 30 minutes.  The firewalls were up at this point, and then it became clear that there were larger issues.</p>
<p>Whatever knocked both firewalls offline at the same time also affected a number of other machines.  Most critically, GeekISP&#8217;s primary NFS server was knocked offline, and it did not recover cleanly on boot.  Normally if power is yanked on a FreeBSD server it will boot up and do a deferred fsck, allowing the machine to resume its normal duties while the disk is analyzed slowly in the background.  Instead of that, we observed the machine booting up normally, but it would hang after reporting its plan to defer all filesystem checks.  The front LEDs showed no sign of disk activity, and a ctrl-c on both real and virtual keyboards did not allow the boot to proceed.  Booting to single-user mode exhibited the same behavior.</p>
<p>After several failed attempts to get the machine booted in creative ways, we made some progress.  We were able to boot the machine using a FreeBSD 9.0 live-cd and bring it up on the network.  The livecd didn&#8217;t exhibit any problems and I was able to see the data on the disk just fine.  This was at approximately 1am ET.</p>
<p>For a few months prior to this event, I had been planning to upgrade the main NFS server at GeekISP.  Life and other work prevented me from ever being able to make the final switchover, but I had the other machine about 90% configured and also seeded with about 85% of the data.  It was clear that the migration was happening immediately, so I began making a checklist and rsync&#8217;ing the data.</p>
<p>By about 2am ET web traffic and mail had been fully restored (mail was up a bit earlier actually &#8211; there is a small, separate cluster for it) and I was tidying up some of the many loose ends.  At 3am, literally moments before I was going to turn in for the night, my terminals hung.  No response.  Both firewalls had disappeared again.</p>
<p>It took about 20 more minutes to get a member of the datacenter team in front of the rack, but the story was largely the same.  Except more widespread.  This time we had lost power to 2 of the 3 power controllers in our main rack, and also to the power controller on our auxiliary rack.  3 separate power controllers, each connected to a dedicated UPS (none of which were operating at more than ~50% capacity), all rebooted at approximately the same time.  1 power controller stayed online throughout both events.  Fortunately, this time we didn&#8217;t have any boot hangs, and I was able to restore order quickly.</p>
<p>With me running low on brain power and phone power, the datacenter crew helped me reroute some power cables on the theory that we were somehow overrunning the capacity of the UPSes.  The datacenter team and I were (and still are) skeptical that that is the actual cause, but there just isn&#8217;t much else that can cause the events we observed.  We concluded the response at approximately 3:45am ET.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.geekisp.com/2012/04/major-outage-nfs-server-forced-migration/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SquirrelMail Misbehaving</title>
		<link>http://blog.geekisp.com/2012/03/squirrelmail-misbehaving/</link>
		<comments>http://blog.geekisp.com/2012/03/squirrelmail-misbehaving/#comments</comments>
		<pubDate>Wed, 14 Mar 2012 13:55:06 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[GeekISP]]></category>

		<guid isPermaLink="false">http://blog.geekisp.com/?p=96</guid>
		<description><![CDATA[A couple customers have written in this morning to report issues with SQMail. I&#8217;m looking at this now. UPDATE @10:25am ET &#8211; I found the cause &#8211; one of GeekISP&#8217;s firewalls rebooted last night, and the daemon startup sequence was such that haproxy, which is used to handle load balancing of some internal traffic, started [...]]]></description>
			<content:encoded><![CDATA[<p>A couple customers have written in this morning to report issues with SQMail.  I&#8217;m looking at this now.</p>
<p>UPDATE @10:25am ET &#8211; I found the cause &#8211; one of GeekISP&#8217;s firewalls rebooted last night, and the daemon startup sequence was such that haproxy, which is used to handle load balancing of some internal traffic, started after the dns cache.  That caused haproxy to be unhappy and thus not forward connections to the backend mail server, though it was perfectly happy to accept and then close them immediately!  Thus the empty server reply error message.</p>
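<p>The post does not say exactly why the start order upset haproxy, so the following is a generic guard rather than the fix that was applied: a minimal Python sketch that polls a readiness check before a dependent daemon is started. The name <code>wait_until_ready</code> and its parameters are hypothetical.</p>

```python
import time


def wait_until_ready(check, attempts=10, delay=1.0, sleep=time.sleep):
    """Poll check() until it returns True; give up after the given number of attempts.

    check is any zero-argument callable, for example one that tries to
    resolve a backend hostname or open a port on a dependency.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt + 1 < attempts:
            sleep(delay)  # no point sleeping after the final attempt
    return False
```

<p>A boot script could call this with a check for whatever the daemon depends on, and refuse to start the daemon when it returns False, so that a bad start order fails loudly instead of silently dropping connections.</p>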
]]></content:encoded>
			<wfw:commentRss>http://blog.geekisp.com/2012/03/squirrelmail-misbehaving/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Webserver work this morning</title>
		<link>http://blog.geekisp.com/2012/03/webserver-work-this-morning/</link>
		<comments>http://blog.geekisp.com/2012/03/webserver-work-this-morning/#comments</comments>
		<pubDate>Tue, 13 Mar 2012 13:19:38 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[GeekISP]]></category>

		<guid isPermaLink="false">http://blog.geekisp.com/?p=91</guid>
		<description><![CDATA[Just a quickie heads-up &#8211; I&#8217;m doing some work on the webservers this morning, so they may be a little slow until I&#8217;m done. Shouldn&#8217;t be more than 15-20 minutes, hopefully. EDIT &#8211; Took a little longer than expected (I had to remember that there is a crazy manual patch step required for libxml2, see [...]]]></description>
			<content:encoded><![CDATA[<p>Just a quickie heads-up &#8211; I&#8217;m doing some work on the webservers this morning, so they may be a little slow until I&#8217;m done.  Shouldn&#8217;t be more than 15-20 minutes, hopefully.</p>
<p>EDIT &#8211; Took a little longer than expected (I had to remember that there is a crazy manual patch step required for libxml2, see <a href="http://forums.freebsd.org/showthread.php?t=8965">here</a>), but it&#8217;s all sorted out and everything is back to normal.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.geekisp.com/2012/03/webserver-work-this-morning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quick update &#8211; &#8216;Learn FP&#8217; folder adjustment</title>
		<link>http://blog.geekisp.com/2012/02/quick-update-learn-fp-folder-adjustment/</link>
		<comments>http://blog.geekisp.com/2012/02/quick-update-learn-fp-folder-adjustment/#comments</comments>
		<pubDate>Sat, 18 Feb 2012 16:34:09 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[GeekISP]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://blog.geekisp.com/?p=88</guid>
		<description><![CDATA[Some of you may have noticed that the &#8216;Learn FP&#8217; mail folder has been a little less reliable lately. If you got a message that was incorrectly flagged as Spam and dropped it into &#8216;Learn FP&#8217;, it might end up back in your Spam folder. I&#8217;ve addressed this and it should not happen again [...]]]></description>
			<content:encoded><![CDATA[<p>Some of you may have noticed that the &#8216;Learn FP&#8217; mail folder has been a little less reliable lately.  If you got a message that was incorrectly flagged as Spam and dropped it into &#8216;Learn FP&#8217;, it might end up back in your Spam folder.  I&#8217;ve addressed this and it should not happen again &#8211; instead the message will be delivered to your inbox.</p>
<p>The reason for the odd behavior is simple if you follow what&#8217;s going on behind the scenes.  The &#8216;Learn FP&#8217; folder gets scanned periodically by a job on one of the mail servers.  This job removes the markup from the message and queues it for redelivery, but during redelivery SpamAssassin scans the message again.  Previously, when we were using SpamAssassin&#8217;s Bayesian filter, we&#8217;d also process the message as &#8216;ham&#8217; to train the filter.  Then when redelivery happened, SpamAssassin would scan the message and generally not re-mark it as Spam due to the new information it had from the learning process (though it still did mark things wrong rarely).  Since we stopped using the Bayesian filter, the SpamAssassin scans are more consistent, so once spam, always spam.  The adjustment I made was simple &#8211; skip SpamAssassin on redelivery and just go straight to the inbox.</p>
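<p>The routing change can be sketched in Python as follows. This is an illustration, not the actual mail server job: the function names are hypothetical, <code>scan</code> stands in for the SpamAssassin pass, and treating the markup as headers starting with <code>X-Spam-</code> is an assumption.</p>

```python
def strip_markup(headers):
    """Drop spam-filter headers; the X-Spam- prefix is an assumed markup format."""
    return {k: v for k, v in headers.items() if not k.lower().startswith("x-spam-")}


def deliver(headers, scan, redelivery=False):
    """Return (folder, headers) for a message.

    scan is any callable returning True for spam. It stands in for the
    SpamAssassin pass and is skipped entirely on redelivery.
    """
    if redelivery:
        # The user already flagged this as a false positive, so trust them.
        return "INBOX", strip_markup(headers)
    return ("Spam" if scan(headers) else "INBOX"), headers
```

<p>With a deterministic scanner, a message that scored as spam once would score as spam again, which is why the redelivery path has to bypass the scan rather than repeat it.</p>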
]]></content:encoded>
			<wfw:commentRss>http://blog.geekisp.com/2012/02/quick-update-learn-fp-folder-adjustment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Monday Updates &#8211; PHP to 5.3.10</title>
		<link>http://blog.geekisp.com/2012/02/monday-updates-php-to-5-3-10/</link>
		<comments>http://blog.geekisp.com/2012/02/monday-updates-php-to-5-3-10/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 15:53:15 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[GeekISP]]></category>

		<guid isPermaLink="false">http://blog.geekisp.com/?p=83</guid>
		<description><![CDATA[Happy Monday folks&#8230; I&#8217;ll be upgrading GeekISP&#8217;s PHP install to 5.3.10 to patch a security vulnerability announced a few days ago. Expect a little bit of a slowdown as I take various backends out of service for the upgrade. I&#8217;ll try to handle everything as quickly as possible. PS &#8211; I obviously didn&#8217;t get to [...]]]></description>
			<content:encoded><![CDATA[<p>Happy Monday folks&#8230;  I&#8217;ll be upgrading GeekISP&#8217;s PHP install to 5.3.10 to patch a security vulnerability announced a few days ago.  Expect a little bit of a slowdown as I take various backends out of service for the upgrade.  I&#8217;ll try to handle everything as quickly as possible.</p>
<p>PS &#8211; I obviously didn&#8217;t get to the database upgrade I mentioned in a previous post &#8211; I&#8217;ll have to find another time and give that another go.</p>
<p>EDIT 1 &#8211; 11:45am ET &#8211; Upgrade complete, everything ought to be back to normal.  Please email support@geekisp.com if you see any issues!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.geekisp.com/2012/02/monday-updates-php-to-5-3-10/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Shell server crash &#8211; 1/27 c. 4:45pm ET</title>
		<link>http://blog.geekisp.com/2012/01/shell-server-crash-127-c-445pm-et/</link>
		<comments>http://blog.geekisp.com/2012/01/shell-server-crash-127-c-445pm-et/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 22:35:46 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[GeekISP]]></category>

		<guid isPermaLink="false">http://blog.geekisp.com/?p=81</guid>
		<description><![CDATA[Twice in one week&#8230; fun times. It&#8217;s been back now for a bit, and I&#8217;m continuing to investigate the cause and any possible fixes.]]></description>
			<content:encoded><![CDATA[<p>Twice in one week&#8230;  fun times.  It&#8217;s been back now for a bit, and I&#8217;m continuing to investigate the cause and any possible fixes.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.geekisp.com/2012/01/shell-server-crash-127-c-445pm-et/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Shell server crash Jan 25 c. 9:45am ET</title>
		<link>http://blog.geekisp.com/2012/01/shell-server-crash-jan-25-c-945am-et/</link>
		<comments>http://blog.geekisp.com/2012/01/shell-server-crash-jan-25-c-945am-et/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 14:59:37 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[GeekISP]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://blog.geekisp.com/?p=78</guid>
		<description><![CDATA[Well that was quick&#8230; we just had our first crash with the shell server since the upgrade to OpenBSD 5.0. This one was different from prior crashes under 4.7. I&#8217;m still investigating the issue in the hopes of uncovering a fix. The machine is back up and working properly again &#8211; total downtime was around [...]]]></description>
			<content:encoded><![CDATA[<p>Well that was quick&#8230; we just had our first crash with the shell server since the upgrade to OpenBSD 5.0.  This one was different from prior crashes under 4.7.  I&#8217;m still investigating the issue in the hopes of uncovering a fix.</p>
<p>The machine is back up and working properly again &#8211; total downtime was around 5 minutes, and my apologies for any inconvenience that it caused.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.geekisp.com/2012/01/shell-server-crash-jan-25-c-945am-et/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL database moving to a new server</title>
		<link>http://blog.geekisp.com/2012/01/mysql-database-moving-to-a-new-server/</link>
		<comments>http://blog.geekisp.com/2012/01/mysql-database-moving-to-a-new-server/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 18:26:01 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[GeekISP]]></category>

		<guid isPermaLink="false">http://blog.geekisp.com/?p=72</guid>
		<description><![CDATA[Just a quickie heads-up: I&#8217;ll be moving GeekISP&#8217;s MySQL database to a new host fairly soon. This migration will bring us up to a more recent version of MySQL, but we&#8217;re staying on the 5.0 series for now. The new box has a lot more RAM and a lot more CPU cores, so it should [...]]]></description>
			<content:encoded><![CDATA[<p>Just a quickie heads-up: I&#8217;ll be moving GeekISP&#8217;s MySQL database to a new host fairly soon.  This migration will bring us up to a more recent version of MySQL, but we&#8217;re staying on the 5.0 series for now.  The new box has a lot more RAM and a lot more CPU cores, so it should be a bit faster.</p>
<p>If you&#8217;d like to test out the new DB, just connect to db.geekisp.com on port *3307*.  Through some proxy magic that will get you to the new host.  Note that your data will be slightly stale until the cutover happens, and any changes made on the new server ahead of the cutover will be lost.</p>
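<p>One quick way to see which server answers on a port is to read the version string from the greeting a MySQL server sends when a client connects. The Python sketch below uses only the standard library. The function names are hypothetical, it assumes the whole greeting arrives in a single read, and it handles only the common protocol 10 handshake.</p>

```python
import socket


def parse_server_version(packet):
    """Extract the version string from a MySQL initial handshake packet."""
    if len(packet) < 6:
        raise ValueError("packet too short to be a handshake")
    payload = packet[4:]  # skip the 3-byte length and 1-byte sequence id
    if payload[0] != 10:
        raise ValueError("not a protocol 10 handshake")
    end = payload.index(b"\x00", 1)  # the version string is NUL-terminated
    return payload[1:end].decode("ascii")


def read_server_version(host, port, timeout=5.0):
    """Connect, read the greeting, and return the reported server version."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return parse_server_version(conn.recv(4096))
```

<p>Calling <code>read_server_version("db.geekisp.com", 3307)</code> and again with the port you normally use should show whether the two ports report different versions.</p>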
<p>This migration should have minimal impact on websites, since we&#8217;re not changing the server version except for a few patchlevel updates.  That said, email support if you experience any problems.</p>
<p>Note: The exact cutover date is TBD, but likely sometime on the weekend of January 28th.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.geekisp.com/2012/01/mysql-database-moving-to-a-new-server/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

