February 2012
24 posts
Support system issues [RESOLVED]
The support system is now back online and fully operational. Please let us know if you encounter any issues.
Support system issues
We are currently experiencing issues with our support system that are making it inaccessible. We apologize for the inconvenience and are working on a fix to bring the system back online as soon as possible.
Reason for Outage: Update on Delivery
This is a follow-up to our original post regarding delivery of the Reason for Outage (RFO). We apologize for the delay; we did not complete it by our original deadline.
We have identified the underlying issues and are working to complete the formal RFO. We expect to have it completed, including action items, by early next week.
All Services Operating Normally
At this time, we believe all services are operating normally. If you find anything awry with your infrastructure, please open an urgent ticket in our support system.
A full RFO detailing this event, along with information on service credits, will follow within the next 24 hours.
Thank you
Network Outage Update
All switches including s43-1 are online and pushing traffic normally. Our staff are combing through customer resources at this time. If you are experiencing an issue with your service, please open an escalated support ticket and we’ll take an immediate look.
Thank you
Network outage update
One of our access-layer switches appears to have gone down in the chaos. s43-1.tkwl01 is one of our newer switches; we’re working to get it online again ASAP.
Clean-up continues...
We’re continuing the clean-up after today’s outage.
We’ve just restored our highly available, load-balanced
services (each half of the HA pair needed to resynchronize with the other
half). More updates to follow as things progress.
Update on outage
Network traffic appears to be returning to normal. Our initial
investigation indicates that one half of one of our stacked
distribution-layer switches had a failing supervisor module, which
created a spanning-tree loop at around 16:20 PST today.
Severe network issues
We appear to be having a severe problem with our core routing
infrastructure, which is causing extremely high packet loss across our entire
network. We are investigating now and will update this status page as
we learn more.
Presidents' Day Office Closure
Our offices will be closed tomorrow, Monday, February 20th, in observance of Presidents’ Day. We will have reduced staff for support issues and limited phone support. As always, our on-call staff will be available for emergencies.
If you have an urgent issue that needs to be addressed while our offices are closed, please make sure to open an urgent ticket or call our phone system and leave...
Network connectivity issues [RESOLVED]
We have identified the problem and have a fix in place. All connectivity issues should now be resolved. We are continuing to investigate the root cause, but initial data indicates a failing optic or fiber strand between one distribution router and our core. This event affected a fraction of our data center facilities.
Timeline
2:45am - Initial flap of one redundant link between distribution...
Network connectivity issues
A portion of our network is currently experiencing some packet loss. We are investigating and will provide further updates once we have more details.
Load-balancer upgrade complete
block-lb02 has been successfully upgraded and put back into its primary
role. During the transition, load-balanced services experienced between 20
and 60 seconds of downtime, depending on where they fell in the start-up
order. This concludes the work we will be performing on this server, and no
further interruptions are expected. If you are experiencing any further
problems with load...
Load balancer update.
We’ve determined that block-lb02 experienced a partial failure: the
portions of its software stack that load balance requests stopped
functioning due to memory starvation, but the portions of the stack that
respond to heartbeat requests kept working. As a result, the other unit
in this highly available pair (block-lb01) did not take
over as it...
Load Balancer outage update.
The load balancer outage referenced in the latest update also affected a
number of customer applications. It appears that the load balancer was
timing out while processing requests; however, because it didn’t go down
completely and was still answering its health check, it didn’t fail
over to its redundant peer. Now that things are back online, we are
investigating the cause of this...
Boxpanel unavailable for 10 minutes
Our Box Panel application was offline for 10 minutes today (3:12pm - 3:22pm
PT), during which time the API was likely also offline.
Our load balancer had become unresponsive; after a reboot, everything
appears normal.
Emergency Networking Upgrade Complete
We have completed the emergency network upgrade.
If you see any abnormal latency, reduction in service or dropped packets please open an escalated ticket with our support team or give us a call at 800.613.4305.
Emergency Networking Update: 0200 PST Status
We’re about 65-70% complete with the networking work. There have been some
BGP issues with our upstream link that have caused brief outages for some
sites, and some internal traffic between servers has been slower than usual.
We’ll publish a complete wrap-up early next week.
Emergency Network upgrade - updated
The network upgrade is proceeding according to plan. Some hosts may
experience diminished functionality as certain changes take
effect.
If you are noticing connectivity issues, please let us know at
support@bluebox.net or via customer chat.
Network Issue this morning.
From 10:35 to 10:55, a portion of our network experienced packet loss
due to a distributed denial-of-service (DDoS) attack directed at a server
on our network. We have addressed the issue, and any network connectivity
issues should now be resolved. If you continue to experience any network
problems, please get in touch with our support team, and we’ll address them
as best we can.
Emergency Network Upgrade
We will be performing an emergency upgrade of our core routers and the distribution layer of our network from GigE to 10-GigE on Friday, February 10th, 2012 starting at 23:00 PST. This network upgrade will address network performance issues that have been raised and alleviate network congestion. No downtime should be incurred as a result of this upgrade. We will have our senior administrators as...
All Systems Go
All systems are go at Blue Box.
Networking Issue: Updated
From 11:25 to 11:45pm Pacific Time, a small portion of our network experienced packet loss due to a traffic flood generated by a customer’s server. This server has been terminated and all network activity is operating normally at this time. We will continue to monitor the situation closely and our technology and networking teams will be reviewing the event in full detail tomorrow.
Experiencing Networking Issue
This issue is affecting approximately 30% of our network.