Systems Reliability Engineer
Company: The Walt Disney Company
Posted on: November 18, 2022
Systems Reliability Engineers use a software engineering
approach to architect, design, automate, monitor, and build
applications at scale. This includes operating and engineering
software with close business segment alignment to deliver platforms
through efficient, effective and resilient architectures. SREs are
talented engineers who focus on improving quality through a
data-driven approach: instrumentation, automation, and
functional/unit testing.
Responsibilities:
- The SRE will help create, build and deliver new technologies or
platforms.
- This will include consulting on, designing, building, and
supporting development pipelines; automating infrastructure and
operations; creating telemetry for monitoring; engineering for high
reliability; and reinforcing best practices to secure our company
and guest data.
- Have expert-level systems administration skills on both the
Linux and Windows platforms
- Work with CI/CD platforms (GitLab CI or Jenkins), strong
systems development skills (Go, Python, Ruby, Node), cloud
automation tools (Boto, CloudFormation, Terraform), source control,
cloud hosting, container computing, and web technologies
- Maintain expertise on systems, operational excellence and
application stability, security, performance, and capacity
management, as well as documentation.
- Work closely with development teams across Disney to
brainstorm, architect, gather requirements, troubleshoot, and
provide stellar customer support
- Be prepared to work in an extremely collaborative and
high-energy environment.
- Lead project/planning efforts, architectural design, and
engineering, and attend meetings with various teams.
- Implement, integrate and configure solutions, tools,
infrastructure and systems.
- Provide systems administration and application support
- Level 2 & 3 maintenance and support
Basic Qualifications:
- Understand how to install and configure operating systems,
specifically with expertise in Linux and Windows Server.
- Software Development Continuous Integration (CI) Pipeline
knowledge (GitLab CI, GitHub Actions)
- Experience with Source Control Management systems (Git)
- Experience in public and private cloud hosting services (AWS,
Google Cloud, Azure, OpenStack, CloudStack) as well as familiarity
with container computing (e.g. Docker, ECS, Kubernetes,
- Experience as a subject matter expert on at least one OS and
proficient in multiple operating systems, including OS performance
monitoring, setup, configuration, tuning, and troubleshooting.
- Proficiency in web or web server technologies: Java, Node.js,
Tomcat, IIS, Apache/nginx, MySQL, PostgreSQL, etc., including being
able to perform basic setup, configuration, and
- Understanding of internet technologies and network protocols,
including HTTP, basic load balancing configurations, security
zones, VIPs, SNMP, REST and DNS.
- Ability to implement existing base standards for new systems
and/or applications with mentoring for all of the following:
- Site monitoring and instrumentation
- Application monitoring and instrumentation
- System monitoring and instrumentation
- Resiliency and performance
- Able to diagnose simple to complex system problems.
- Has experience on one or more load balancer platforms (setting
up pools, VIPs, layer 7 routing, debugging).
- Able to author tools and scripts to be used by others to
automate repeatable production tasks in standard languages like
Bash, Ruby, Python, or Go.
- Advanced skills in at least one programming language such as
Python, PHP, Ruby, Java, Go, Swift or C++ and able to build unit
test suites for all software being developed.
- Experience supporting and/or developing backend tools or
- Able to perform and provide in depth analysis on load test runs
against a moderately complex system.
- Demonstrates exceptional troubleshooting methodology, including
the ability to author and instruct new methodologies to the SRE
- Independently resolve moderately to highly complex system and
- Able to identify and propose system and application fixes for
- Able to evaluate new application requirements for capacity and
run-time best practices.
- Able to evaluate new system and/or infrastructure solutions for
technical feasibility against known requirements and
- Effective at dealing with change: Able to transition in role or
handle a significant modification to workflow or technology with
minimal ramp-up time and with very little guidance.
- Excellent verbal and written communication to all levels in the
- Serves as primary point of contact with Manager.
- Demonstrates curiosity and continuous learning and
- Ability to lead functional teams in systems integration and
design including writing operational specs, architectural diagrams,
test plans and requirements management.
- Effective project management and planning on large-scale
projects (familiarity with agile/scrum and waterfall project
management a plus).
- Construction of concise and complete technical documentation
and the ability to design and deliver training to other staff
- Detailed understanding of the goals and requirements of the
business supported.
Required Education:
Bachelor of Science degree in computer science or related field, or
equivalent experience in technical operations and software
engineering
Keywords: The Walt Disney Company, Burbank, Systems Reliability Engineer, Other, Burbank, California