What a difference a host makes

Over the past few months I have been struggling to get a website I manage onto a level footing as the site’s traffic began to exceed 750k page impressions per month. The VPS it was on continued to struggle despite the extra CPU and RAM I added to it: slow page load times and Apache web server crashes were becoming commonplace.

My initial strategy was to move the site to a Media Temple Grid-Service hosting account. I won’t bore you with the details, but the Grid service has lots of space and bandwidth. Rather than running your site on one server, your site has computing resources available across the grid.

Unfortunately, each account has a limit on the amount of computing power it can use, measured in Grid Performance Units (GPUs), and the site looked likely to require 5 to 10 times the number of GPUs included in the plan. That overage would have run into several hundred dollars of additional charges a month. Not feasible.

The other disappointment was that although the hosting platform had become stable with the move to Media Temple, page load times were still disappointingly slow. This wasn’t a real surprise though as I had noticed this with other Media Temple hosted sites.

So off I went to find another hosting alternative. During my research I came across a crowd called SimpleHelix. I was a little sceptical at first, as their site looks, ahem, rather similar to Media Temple’s, particularly the hosting plan names. Some research on blogs and forums turned up mixed but overall positive reviews, so I decided to give it a whirl. I was particularly interested in its technology for speeding up web applications.

A month in, and no complaints. Traffic is still growing, but page load times are much faster, and I have experienced no down time or other support issues.

It is not exactly scientific, but looking at the Google Webmaster stats for the web crawler page load time, I can see that pages have been loading much faster since the change in hosts.

Google Crawl Stats

Now actual page load times will not be as fast as they are for the Google web crawler, but I am sure this does give a sound indicator that page load times have decreased significantly since changing over to SimpleHelix. 

So, not as much disk space or bells and whistles as Media Temple, but excellent performance on a $20/month hosting package for a site serving 750,000 pages a month. Well worth it.


MySQL Cluster Hosting (for 30 euro a month)

I am currently studying for the MySQL Cluster exam, primarily because I find it a pretty interesting (for nerds) technology, but also because it is a high availability, high performance database system that is ideal not just for the ISPs and telcos using it now, but also for web based businesses and, well, anyone who needs a reliable and fast database.

I am fortunate enough to be able to get some real world experience with a MySQL Cluster at work through a client, but given that the system is a business critical application that must be up 24x7 (and has been since 2006), I won’t be tinkering around with that.

As you might have guessed from the name, a MySQL Cluster is in fact a collection of servers tied together into one big database. It is a highly available system because two or more servers each hold a copy of the same data, allowing the cluster to continue to work even with the failure of one or more servers within the cluster (known as nodes). It is also ultra fast, as data and indexes are held primarily in RAM, allowing queries to be processed in jig time across the cluster.
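To make the availability idea a little more concrete, here is a toy model in Python. It assumes the usual arrangement in which data nodes are organised into groups and every node in a group holds a copy of the same data, so the cluster stays up as long as each group has at least one live node. The node names and the two-group layout are made up for illustration; this is a sketch of the principle, not of how the cluster software itself works.

```python
from typing import Dict, List, Set


def cluster_is_available(node_groups: Dict[int, List[str]], failed: Set[str]) -> bool:
    """Return True if every node group still has at least one live data node."""
    return all(
        any(node not in failed for node in nodes)
        for nodes in node_groups.values()
    )


# Hypothetical layout: four data nodes in two groups, two copies of the data per group.
groups = {0: ["ndb1", "ndb2"], 1: ["ndb3", "ndb4"]}

print(cluster_is_available(groups, set()))             # True: nothing has failed
print(cluster_is_available(groups, {"ndb1", "ndb3"}))  # True: one node lost in each group
print(cluster_is_available(groups, {"ndb1", "ndb2"}))  # False: a whole group is gone
```

The last case is the one to watch: losing every copy of one slice of the data takes the database down, no matter how many other nodes are still healthy.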

Ideally, even for a test cluster, you will need 3-4 servers (at least 2 data nodes, a management node and a MySQL server). This is a lot of hardware to just muck around with a new technology. Even for me…. Clearly I wasn’t going to get this at work, so I had to look elsewhere to supply my needs.

There were basically 3 options:
1) Resurrect some old machines into a cluster at home. Lots of pain with networking, old hardware etc.
2) Virtualize a decent box into 3 or 4 virtual machines using VMware or other virtualisation technology
3) Get hosting for the cluster!

I was looking at option 2 for a while, but in the end I felt it was going to be too much work, and crucially, I felt the need to hook my cluster into a web application to get a real feel as to how it was going to work.

So that left purchasing some hosted servers for my cluster. I would need my own boxes as setting up a cluster requires root access to the servers. Shared hosting won’t do.

After realizing that $300 a month, even at today’s exchange rate, was a bit too much to spend on enhancing my training, I decided that a combination of options 2 & 3 might be the best approach: purchasing a Virtual Private Server (VPS) solution from a hosting provider. A VPS would give me complete control over the server without my having to buy a full dedicated server.

As price was an important factor, I shopped around a bit and finally settled on Tektonik, where I now have 3 CentOS 5 VPS servers running my cluster: 2 data nodes and a management node also serving as the MySQL server. The plan I have chosen is quite a low spec one, with just 256MB of RAM (RAM being the key factor for cluster data nodes), but the cluster database will be quite small so I don’t anticipate needing any more than that for the time being. VPS resources can be increased immediately via the control panel if I do need more RAM anyway.

The total cost of each VPS server is USD $15 per month, but there was a special offer at the time knocking 10% off the price, so 3 * $13.50 = $40.50, at today’s rate, a bit under 30 euro.
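For anyone who wants to check the sums, here they are as a small Python sketch. The prices and the discount come from the paragraph above; the exchange rate is not something I recorded, so the last line only works out the highest euro-per-dollar rate at which the total still comes in under 30 euro.

```python
from decimal import Decimal

LIST_PRICE_USD = Decimal("15.00")  # per VPS per month
DISCOUNT = Decimal("0.10")         # the 10% special offer
SERVERS = 3
BUDGET_EUR = Decimal("30")


def monthly_cost(list_price: Decimal, discount: Decimal, servers: int):
    """Return (discounted price per server, total for all servers) in USD."""
    per_server = list_price * (1 - discount)
    return per_server, per_server * servers


per_server, total = monthly_cost(LIST_PRICE_USD, DISCOUNT, SERVERS)

# Highest EUR-per-USD rate at which the total stays within the euro budget.
max_rate = BUDGET_EUR / total

print(f"per server: ${per_server:.2f}, total: ${total:.2f}")
print(f"under 30 euro while 1 USD is worth less than {max_rate:.4f} EUR")
```

Decimal is used rather than floats so that the money values stay exact.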

So for 30 euro I have a working MySQL Cluster, albeit a pretty small one, but it lets me cut my teeth on the technology while it serves a real-world application (a Facebook application I developed), and now that the cluster is set up, additional servers can easily be added to it.

More on how I set up the cluster in a future posting.


Keeping an eye on your hosting

Since I no longer manage servers directly, I have removed the GSM phone that was attached to my office server and used to send me text alerts whenever a server or service went down.

I do still wish to keep an eye on client websites of course, as well as this one (which has had some small downtime issues in the past month due to capacity issues with its host).

There is a plethora of website monitoring services out there, but the one I decided to take for a test drive was Site24x7.com. It offers the usual monitoring of a website and sends email and text alerts when a site is down. Best of all, it is free!

I had my concerns about the quality of the service given it is free – it is the kind of thing I would happily pay 10-15 euro a month for. I have checked the logs a few times and there are plenty of visits from the monitoring bot and it has reported downtimes on the sites it is monitoring within a minute or two. All in all, it seems a quality service.

One element which I am just about to take a look at in more detail is its ability to monitor transactions. This would allow you to ensure not only that a site is up, but also that it is functioning correctly by, say, performing a search or adding something to the website’s shopping cart.

In doing so, you can ensure that the database server is running ok too, and that nobody has made a change to your code that has broken the site!
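The idea behind a transaction check can be sketched in a few lines of Python. To be clear, this is not how Site24x7 does it; it is just a minimal illustration, assuming that "functioning correctly" can be approximated by loading a page and looking for some text that only appears when the search (and so the database behind it) has worked. The function names and the stub pages are hypothetical.

```python
import urllib.request
from typing import Callable


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check_transaction(url: str, expected_text: str,
                      fetcher: Callable[[str], str] = fetch) -> bool:
    """Return True only if the page loads and contains the expected text."""
    try:
        body = fetcher(url)
    except Exception:
        # Any failure to load the page counts as the site being down.
        return False
    return expected_text in body


# Stub fetchers stand in for a real site so the example runs without a network.
def fake_healthy_site(url: str) -> str:
    return "<h1>Search results for widget</h1><li>Blue widget</li>"


def fake_dead_site(url: str) -> str:
    raise OSError("connection refused")


print(check_transaction("http://shop.example/search?q=widget", "Blue widget", fake_healthy_site))
print(check_transaction("http://shop.example/search?q=widget", "Blue widget", fake_dead_site))
```

Against a live site you would leave out the third argument so that the real `fetch` is used, and run the check every few minutes from a machine outside your own hosting.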


The more you know, the less happy you are

My first (and certainly not last) 4am post!
Prepare to be underwhelmed.

For the last 36 hours I have been doing battle with two servers located within the same data centre and the guys who manage the network therein.

At about 12pm Tuesday I noticed that a number of mails from server 1 had not yet reached server 2. A quick look and push of the mail queue, and Exim is telling me it has “No route to host”. Hmm, not a good sign.

My first priority was to see just how big an issue it was. After running some checks using external mail programs and some log file checking, I satisfied myself that at least the routing problem seemed to be restricted to those two servers. Or rather those two subnets within the data centre network.

Neither server could ping the other, and even switching off the software firewalls, iptables and APF, did nothing for me.

At this stage I felt the issue lay outside of either box and bumped it to DC support staff. I had a sense of foreboding about this as I knew in my heart that this kind of small, localised and fairly complex issue would take a lot of to-ing and fro-ing before it landed in the inbox of the sort of knowledgeable techie that could correctly identify and resolve the issue.

Meanwhile, back at the mail servers, mails are piling up in both mail queues. Gah! So a few “message not yet delivered after 4 hours” mails will be received, nothing serious.

I won’t bore you with the mind numbing problem tennis I played with the support staff to get my issue aired. We all have our tales of woe in that department.

Eventually it was declared (second hand in the form “the network engineer said..”) that the network config on either or both machines was not correct.

Now, I’m not the world’s smartest guy, but if I have one strength, it is the ability to approach problem solving in an intelligent way, i.e. if two servers have been working fine for the past 12 months, then why the hell would you need to go about messing with their settings?

It was mentioned that a Cisco router (I think the brand was important as many techies seem to think Cisco is some sort of mythical place that nobody should ever enter or discuss in detail) was recently replaced and that the settings might have worked in the past but, blah blah blah.

“Hmm, ok, give me the settings and I will see how they compare to what I currently have”. And yes, on one server they were different. But changing the config for eth0 is not something to enter into with haste. In fact, it is up there with marrying a woman with a very nosy mother.

So I called them back to double check. No, no, I was assured, those values are correct.

Correct they may be, but not for my server, which was down for 3 hours after I applied said settings. I didn’t think it would take them 3 hours to get it back up after I called and explained that they needed to get someone over to the box, log in and apply my “wrong” settings once more, but they spent an hour rebooting the machine that was fine to get themselves warmed up for the main event.

So several hours later, I have my server back up and a gleeful response to the trouble ticket informing me that the server is now back online and the issue was resolved.

Apart from my original problem that is.

It was at that point that I remembered why I had tied bubble wrap to my forehead earlier in the day.

So, I have been immersing myself in learning more about networking, routing and eth0 than I really want to know, but after some careful information gathering, I have been able to create a static route between the two servers to re-establish the network route that was so cruelly taken from me by fate on Tuesday morning.

I had better remember to put a script in that adds the routes back in on next reboot.
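Something along these lines would do for that script. It is only a sketch in Python: it builds an `ip route add` command for a single-host route through a given gateway on eth0, which is an assumption about the shape of the fix rather than the exact route I ended up with. The addresses are placeholders from the documentation ranges, and the function name is made up.

```python
import ipaddress
from typing import List


def host_route_command(peer_ip: str, gateway_ip: str, device: str = "eth0") -> List[str]:
    """Build an `ip route add` command for a static route to one IPv4 host."""
    # IPv4Address raises ValueError on a malformed address, so typos fail early.
    peer = ipaddress.IPv4Address(peer_ip)
    gateway = ipaddress.IPv4Address(gateway_ip)
    return ["ip", "route", "add", f"{peer}/32", "via", str(gateway), "dev", device]


# Placeholder addresses: the other server and the gateway that can reach it.
command = host_route_command("192.0.2.10", "198.51.100.1")
print(" ".join(command))  # ip route add 192.0.2.10/32 via 198.51.100.1 dev eth0
```

To actually apply the route, the list can be passed to `subprocess.run(command, check=True)` as root from whatever the box runs at boot, once on each server with the other server’s address.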

I’m left with a slightly more advanced knowledge of networking and a distrust of asking front line tech support about such issues in future.

Which most likely means the next time something like this comes up I will try fixing it myself.

Apparently ignorance is bliss. I wouldn’t know.

