Blue Box System Status

Apr 02

Intermittent network issues

We are receiving reports of intermittent network issues that appear to be limited to a small set of customers. We are investigating and will post an update soon.

Update 17:00EST: This issue seems to be related to a long haul provider upstream from our network. We have checked our networks and have not found any issues. 

If you are still seeing issues, please send us a traceroute. It will help us determine which providers are having problems.
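For customers who want to script the capture, here is a minimal Python sketch. It assumes the standard `traceroute` tool (or `tracert` on Windows) is installed; the helper names and the target host are hypothetical and should be replaced with the server you are having trouble reaching.

```python
import platform
import subprocess


def traceroute_command(host, system=None):
    """Build the traceroute command for the given operating system."""
    system = system or platform.system()
    # Windows ships the tool as "tracert"; Linux and macOS use "traceroute".
    tool = "tracert" if system == "Windows" else "traceroute"
    return [tool, host]


def run_traceroute(host):
    """Run the traceroute and return its text output for pasting into a ticket."""
    result = subprocess.run(
        traceroute_command(host),
        capture_output=True,
        text=True,
        timeout=120,  # assumption: give up after two minutes
    )
    return result.stdout
```

Calling `run_traceroute("example.com")` (a placeholder host) returns the hop-by-hop output, which can be pasted into a support ticket or email.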

(Web only post)

Mar 27

Intermittent phone service issues

Our phone service is experiencing some issues, and your contact with support may be interrupted. If you need to contact support, please email us or contact us in customer chat.

Update: Phones are now working again. Total outage time was about 8 minutes.

(Web Only Post)

Mar 26

Scheduled Maintenance on Sunday, April 1st 23:00 to April 2nd 0:00 PDT

This is an alert regarding scheduled maintenance for this coming weekend.

Our transport provider will be performing preventative maintenance on our transport circuit between our Tukwilla data center and our Seattle backup site. During this maintenance, connectivity between the two sites will be interrupted.

Maintenance Window: Sunday, April 1st 23:00 to April 2nd 0:00 PDT

Internet connectivity to our Tukwilla and Ashburn data centers will not be affected, and our Seattle backup site will fail over to its backup Internet connection. Servers affected at the backup site have IP addresses starting with 208.85.149.xxx; other servers will not be affected.
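To check whether a given server falls in the affected range, a short Python sketch like the following may help. It assumes that "starting with 208.85.149" corresponds to the 208.85.149.0/24 block; the function name and sample addresses are hypothetical.

```python
import ipaddress

# Assumption: "starting with 208.85.149" is read as the 208.85.149.0/24 block.
BACKUP_SITE_NET = ipaddress.ip_network("208.85.149.0/24")


def is_affected(address: str) -> bool:
    """Return True if the address falls in the backup-site range."""
    return ipaddress.ip_address(address) in BACKUP_SITE_NET


# Hypothetical server addresses, for illustration only.
for addr in ["208.85.149.17", "208.85.148.17"]:
    print(addr, "affected" if is_affected(addr) else "not affected")
```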

If you are running database replication or other sensitive applications between servers at our Tukwilla data center and backup site, we recommend stopping database replication during the window.

If you need assistance or have any questions, please contact us at support@bluebox.net.

(web only post)

Mar 22

Network uplink upgrade [completed]

The maintenance has been successfully completed. Everything went according to plan. If you encounter any connectivity issues, please let us know via our support system.

(Web only post)

Network uplink upgrade tonight

Tonight at 10:00pm PDT (05:00 UTC) we will be upgrading the capacity of one of our upstream links in order to stay ahead of growth.  As part of this upgrade, we will be gracefully shutting down the BGP session with this peer to allow traffic to naturally abate before the work is performed.  Because of this, no packet loss, latency increases or other problems are anticipated as a result of this work.  However, as it does affect our core networking gear, an increased risk of an outage does exist.

Please note that while this upgrade will greatly increase our capacity, this work does not yet address all the factors which led to last Sunday evening’s and Wednesday morning’s network problems.  (Though this is a step in that process.)

(Web only post)

Mar 21

Network Event

We are investigating reports of a network event and will post more information here shortly.

If you are experiencing any issues, please open an urgent support ticket, or call us at 800-613-4305 and leave us an urgent voicemail.  Our team will get back to you immediately.

Update 5:00am - At this point in time, we believe this to be related to a BGP event with upstream providers.  The network appears stable at this time and we are now researching what happened.

Update 5:05am - Monitoring indicates a disruption based on which BGP route a user was taking connecting to our network.  It appears total disruption was between 4 and 9 minutes and that connectivity was restored at 4:59am.  The network continues to function normally and our team continues to investigate.

Update 5:15am - The network continues to function normally and our team continues to investigate. If you are experiencing issues, please contact support at support@bluebox.net.

Update 5:30am - This event appears related to a single upstream connection that bounced, causing traffic coming over that link to be interrupted and the BGP session to be reset. 

Update 5:45am - Attached is the associated alerting from Pingdom indicating the duration of the event.  Pingdom reports this beginning at 03/21/2012 4:53:32AM pacific, and clearing at 03/21/2012 4:55:34AM.

Uptime for cr01.tkwl01-PING: 03/20/2012 - 03/21/2012
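As a quick check, the duration implied by the two Pingdom timestamps quoted above can be computed directly. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
from datetime import datetime

# Timestamp layout used in the Pingdom report quoted above (Pacific time).
FMT = "%m/%d/%Y %I:%M:%S%p"

start = datetime.strptime("03/21/2012 4:53:32AM", FMT)
end = datetime.strptime("03/21/2012 4:55:34AM", FMT)

outage = end - start
print(outage)  # 0:02:02
```

That is 2 minutes and 2 seconds as seen from Pingdom's polling.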

Update 6am - We’ve confirmed this event was the result of a flap in a connection to one of our redundant upstreams.  This is the same upstream that experienced issues on Sunday night.  The issue occurred on the remote end of the gear, not controlled by BBG.  We are awaiting final details from that provider as to what happened and will update this post later today once we have that information.

The total visible impact would have been an interruption in routing for a portion of the customer base transiting via this link for a period of approximately 3-5 minutes.  External reporting may indicate a slightly longer outage based on the polling window of the service utilized.

Based on these last two events, Blue Box is evaluating options for turning up a replacement provider and discontinuing existing service with the provider in question.  We had begun that process on Monday based on the event Sunday evening, but we will now escalate that activity.  Additionally, our networking team will be evaluating options for reducing our exposure to these types of events prior to our transition to the different transit provider.

Update 2:54pm - Upstream provider confirmed that the disruption this morning was caused by a failed line card, which caused a systematic cycle on one side of their Border Router Infrastructure as well as a hung CPU on the other side. We are expediting steps to isolate Blue Box from this provider in an effort to eliminate our exposure to possible future events.

(Web only post)

Mar 19

BGP Re-convergence Event

This evening, starting at 10:00pm Pacific, one of our upstream connectivity partners experienced a failure in their routing infrastructure.  BGP has re-routed that traffic over our redundant connections and all traffic should be flowing normally at this time.  Please open an urgent support ticket if you are experiencing any residual issues.

Update 10:29pm - We’ve learned from our provider that this was an unplanned emergency maintenance event that started at 10:00pm Pacific.  Our networking team was not notified of this event in advance.  Traffic continues to flow over our redundant connections normally.

(Web only post)

Mar 13

Network Issues

We have been made aware of a brief period of network-related issues affecting some customers. The issues have now been cleared and everything should be working as expected again. We are still investigating the root cause.

Update:  The above issue caused some intermittent packet loss for around 10 minutes between 3:45 and 3:55am PDT (10:45-10:55 UTC).  The issue was traced back to a compromised client machine on the network attempting to DOS another host on the internet.  We have shut this machine down and are working with the client in question on clean-up.

We will also be rolling out a small network configuration change later today that will make it much more difficult for a single compromised machine on our network to cause problems for any of our other clients.

(Web only post)

Mar 07

block-lb02 to be returned to production

As part of Monday’s power issue, one of our Blocks Load Balancer machines (block-lb02) was removed from production when it lost power.  Its HA partner automatically took over quickly thereafter.  However, to minimize the potential impact on customers during peak hours we kept block-lb02 out of a production role until we were sure the power situation had been addressed.

Tonight shortly after 11:00pm PST (07:00 UTC) we will be returning block-lb02 to a production role.  When we do this, block-lb02 will be taking over load balancer services from its standby partner (block-lb01).  During this transition, some customers with load balanced services may experience up to 2 minutes of interruption as the services transition from block-lb01 to block-lb02.

(Web only post)

Mar 05

All Systems Normal

All customers on vsh222 are now back online.  We will be following up with all affected customers directly. If you are experiencing issues, please contact support at support@bluebox.net.