Mar 14, 2017
http://ironsysadmin.com/wp-content/uploads/2017/03/IronSysadmin-EP10.mp3
Welcome to Episode 10
News
https://www.bloomberg.com/news/articles/2017-03-08/microsoft-pledges-to-use-arm-server-chips-threatening-intel-s-dominance
Firefox 52 will be the last version of Firefox for Windows XP and Vista
https://www.cnet.com/news/look-out-windows-android-is-catching-up/
https://www.wired.com/2017/03/atari-chip-set-off-bitter-war-among-neuroscientists/?mbid=nl_3817_p2&CNDID=21798766
http://www.nature.com/nature/journal/v543/n7644/full/nature21371.html
NIST’s new password rules – what you need to know
https://xkcd.com/936/
Announcements
Feedback
@Gangrif and @Xenophage make a great pair that will titillate
one’s ears! They cover things in the ops and
infosec news categories and topics that are relatable or at least
interesting to discuss. It’s not your typical
format of a podcast, but that is what makes it refreshing.
Keep up the great content guys!
Patreon, you guys are awesome
$10 tier.
The face!
Youtube stream for this episode! https://youtu.be/EeD5y34oKNY
Chat
Main topic
Trouble in the cloud: the 2/28/2017 US-EAST-1 S3 outage
https://aws.amazon.com/message/41926/
An Amazon S3 team member was debugging an issue that was causing the
S3 billing system to run more slowly than expected.
While running a command from an established playbook, one input was
entered incorrectly, and far more servers were taken down than intended.
The removal, which was meant to affect only a few billing servers,
also took down servers supporting two subsystems essential to both
reads and writes through the S3 API.
Losing that much capacity forced a full restart of both subsystems.
Full restarts of the index and placement subsystems (the two
subsystems that lost capacity) had not been performed in Amazon's
larger regions for many years.
Due to the dependencies between these systems, the restarts took
quite some time.
The downtime caused a backlog of requests, which had to be
processed once the systems were operational again.
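Amazon's summary explains that the placement subsystem needs a working index subsystem, that reads (GET/LIST) depend on the index, and that writes (PUT) also need placement. The Python sketch below is a simplified, hypothetical model of that restart ordering, not Amazon's tooling: `DEPENDS_ON` and `restart_waves` are invented names, and it relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group services into "waves" that can only come back once everything they depend on is healthy.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified dependency map: each key needs the listed
# services running before it can come back up.
DEPENDS_ON = {
    "index": set(),
    "placement": {"index"},
    "reads (GET/LIST)": {"index"},
    "writes (PUT)": {"index", "placement"},
}


def restart_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group services into waves; a wave starts only after earlier waves are done."""
    sorter = TopologicalSorter(deps)  # values are each node's predecessors
    sorter.prepare()  # also raises CycleError if the dependencies loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are met
        waves.append(ready)
        sorter.done(*ready)  # mark the wave healthy, unlocking the next one
    return waves


if __name__ == "__main__":
    print(restart_waves(DEPENDS_ON))
    # [['index'], ['placement', 'reads (GET/LIST)'], ['writes (PUT)']]
```

Nothing in the second or third wave can start until the index is back, which is why a slow index restart delays everything behind it.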
Remediation
The core issues here were the number of systems unintentionally
taken offline, and the fact that systems that depended on each other
were taken down at the same time.
Amazon has changed its tooling to remove capacity more slowly and to
prevent systems from dropping below service-affecting thresholds.
They are also working to remove some of the inter-dependencies.
On top of everything, the S3 status page (the AWS Service Health
Dashboard) depended on S3 in order to be updated.
This made it difficult for customers to see the real status of S3.
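The tooling change in the remediation notes (removing capacity more slowly and refusing to drop a subsystem below a safe threshold) can be illustrated with a small guard. This is a hypothetical Python sketch, not Amazon's actual tool: `Subsystem`, `min_servers`, `CapacityError`, `plan_removal`, and the default `batch_size` of 2 are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subsystem:
    name: str
    active_servers: int
    min_servers: int  # hypothetical minimum needed to keep serving requests


class CapacityError(Exception):
    """Raised when a removal would take a subsystem below its minimum capacity."""


def plan_removal(subsystem: Subsystem, count: int, batch_size: int = 2) -> list[int]:
    """Return the batch sizes to remove, or raise if the request is unsafe.

    The whole request is checked up front, so a mistyped count fails
    before any server is touched; the work is then split into small
    batches so capacity is removed gradually.
    """
    if count <= 0 or batch_size <= 0:
        raise ValueError("count and batch_size must be positive")
    remaining = subsystem.active_servers - count
    if remaining < subsystem.min_servers:
        raise CapacityError(
            f"removing {count} from {subsystem.name} would leave {remaining} "
            f"servers, below the minimum of {subsystem.min_servers}"
        )
    batches = []
    while count > 0:
        step = min(batch_size, count)
        batches.append(step)
        count -= step
    return batches


if __name__ == "__main__":
    billing = Subsystem("billing", active_servers=40, min_servers=30)
    print(plan_removal(billing, 5))  # [2, 2, 1]
    try:
        plan_removal(billing, 50)  # a typo'd input, e.g. 50 instead of 5
    except CapacityError as err:
        print(err)
```

Validating the full request before removing anything is the key design choice here: a mistyped count is rejected outright instead of being partly carried out.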
Intro and Outro music credit: Tri Tachyon, Digital MK 2
http://freemusicarchive.org/music/Tri-Tachyon/