Archive for March, 2009

March 30th Outage

By John Dillon | March 31st, 2009 at 4:03AM

Dear Engine Yard Customers,

As many of you know, we experienced a severe outage at our west coast data center yesterday; many of our customers were affected and experienced several hours of downtime. Our engineers became aware of the problem as soon as it occurred, and began the relevant data center escalation procedures.

Engine Yard customers rely on us to run and support their business-critical applications, and that includes relying on our selection of vendors. In this case, we have failed to meet our service level agreements with our west coast customers, and we will, of course, be providing customers with the appropriate service credits.

In the attached report, I have detailed yesterday’s issues, as well as the swift steps we are taking to ensure that this does not happen again. We sincerely apologize for this outage, and we are more committed than ever to providing the level of service Engine Yard customers have come to expect.

If you have any additional inquiries, members of our technical teams are available to answer any and all questions; emails can be sent to info@engineyard.com.

Here at Engine Yard we are major supporters of Ruby and Rails. We understand that in order to grow, our ecosystem needs a network of reliable and professional service providers, and we intend to deliver.

John Dillon
CEO


What Happened

Yesterday, March 30th, at 9:00 a.m. (PDT), our west coast data center experienced a loss of internet connectivity. Our support engineers detected the outage immediately and began investigating the cause. Once we confirmed that the cause was connectivity, we posted the first update to our status blog (9:19 a.m.). We continued to post updates for customers as Herakles, our data center provider, communicated new information.

We were in touch with Herakles senior management for updates at 15-minute intervals. Connectivity began to be restored at approximately 1:30 p.m., and all customers were fully restored by 3:45 p.m. The outage affected about two-thirds of our customer base.

Why Did It Happen

Our data center provider, Herakles, maintains redundant internet uplinks with redundant equipment. Normally the failure of a single internet uplink or switch prompts a failover event, with minimal loss of connectivity. In this case, however, the route processor of one of the redundant switches (a Cisco 6509) malfunctioned. As part of the malfunction, the device stopped seeing its BGP peers as active and therefore determined that they had failed. As a result, the device incorrectly promoted itself to master switch and stopped passing traffic inbound or outbound. Complicating the matter, the alerts from the malfunctioning switch that should have notified Herakles' monitoring systems of the failure were themselves not routed past the switch.
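
To illustrate the failure mode described above, here is a minimal sketch in Python. It is a toy model only: the `Switch` class, its fields, and the two check functions are hypothetical names chosen for illustration, and nothing here represents actual Cisco behavior or Herakles' monitoring setup. It shows why an alert that must travel through the failed device never arrives, while a check made from outside the data path would still notice the outage.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    """Toy model of one switch in a redundant pair (hypothetical)."""
    name: str
    sees_bgp_peers: bool = True   # does the switch believe its BGP peers are up?
    is_master: bool = False
    passes_traffic: bool = True

    def evaluate(self) -> None:
        # A switch that cannot see its peers assumes they have failed
        # and promotes itself to master.
        if not self.sees_bgp_peers:
            self.is_master = True
            # In the incident described, the self-promoted switch also
            # stopped forwarding traffic in either direction.
            self.passes_traffic = False


def in_band_alert_delivered(switch: Switch) -> bool:
    """An alert routed through the switch arrives only if it forwards traffic."""
    return switch.passes_traffic


def external_probe_detects_outage(switch: Switch) -> bool:
    """A probe from outside the data path notices when traffic stops flowing."""
    return not switch.passes_traffic


# Simulate the route processor malfunction on one switch of the pair.
faulty = Switch(name="switch-a")
faulty.sees_bgp_peers = False
faulty.evaluate()

print("promoted itself to master:", faulty.is_master)
print("in-band alert delivered:", in_band_alert_delivered(faulty))
print("external probe detects outage:", external_probe_detects_outage(faulty))
```

In this model the in-band alert is lost for the same reason customer traffic is lost, which is the complication noted above.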

How It Was Repaired

Herakles data center network engineers worked with Cisco on-site engineers and began debugging the failed switch immediately. The first attempt to repair the switch, by replacing its route processor, failed. After additional troubleshooting steps, the support engineers physically disconnected the malfunctioning switch, forcing the redundant switch to take over as master. This fully restored traffic but has left the internet uplink without switch-level redundancy.

Next Steps

Herakles is currently testing a new redundant switch in its test lab, and will install this during a scheduled maintenance window as soon as possible. When we receive notice of the scheduled maintenance window from Herakles, we will immediately communicate this to customers.

Engine Yard Plans

In September 2008, we began the process of adding an alternative connectivity provider to our west coast data center. We chose the provider that already serves our east coast data center.

Since the new provider did not yet have a presence in Herakles, this process has taken several months to implement. By April 15th, we will be able to offer this provider as an alternative. At that time we will coordinate with customers who wish to move to the new provider.

Looking For Some Ruby This Weekend?

By Leah Silber | March 26th, 2009 at 7:03AM

We’ve got you covered!

Engine Yard engineers are spanning the globe this weekend; meet co-founder Ezra Zygmuntowicz at the Emerging Technologies for the Enterprise event in Philadelphia, and Rails Core Team member Yehuda Katz at Scotland on Rails.

Got a question on the future of Rails? Deployment best practices? Something else Ruby- or Rails-related? Track them down and ask; we’d love to get to know you!

Routing Issues with Some Clusters

By Nick French | March 10th, 2009 at 7:03AM

A routing issue at our Sacramento data center affected several clusters early this morning. The data center failed over to a second provider, and the issue was resolved at 7:37 a.m. PDT. We take these issues very seriously and have posted a summary of the interruption.

You can read the original blog post here: http://engineyard.wordpress.com/2009/03/10/routing-issue-rca-notice/, and the latest update here: http://engineyard.wordpress.com/2009/03/11/xo-communications-link-re-enabled-by-herakles/

For all maintenance and downtime updates, please see our Status Blog: http://engineyard.wordpress.com/

MountainWest Hackfest: Featuring Solo!

By Leah Silber | March 9th, 2009 at 9:03AM

Those of you who were lucky enough to make it to last year’s MountainWest RubyConf probably remember the original Engine Yard Hackfest. We had a swanky hotel suite, food, drinks and snacks, and round-the-clock hacking. We met well over 100 developers and even hired a few!

When organizers Mike Moore and Pat Eyler announced this year’s MWRC, we jumped right back on the bandwagon. Engine Yard is proud to be sponsoring the event again, and pleased to bring back the MWRC Hackfest — with a twist!

This year, the suite is bigger, the food is better and we’ve got a new offering to play with. Hackfest attendees will receive free access to the Engine Yard Solo offering throughout the show, and an exclusive discount code for new signups.

There’s been a lot of interest and excitement around Solo, and this is your chance to work with it firsthand. It’s a great, affordable option for hosting Ruby apps in the cloud, and the perfect choice for many new projects. Engine Yard Solo developers and experts will be on hand to answer questions and talk tech. What more could you ask for?

The Hackfest will be held both nights of MountainWest RubyConf at the Hilton Salt Lake City Center. It will begin immediately after each day's conference sessions and stay open until the last hacker leaves.

Here’s to hoping we see you there; the drinks are on us!
