AlpharettaRecruiter Since 2001
the smart solution for Alpharetta jobs

Principal Site Reliability Engineer

Company: Morgan Stanley
Location: Alpharetta
Posted on: May 3, 2021

Job Description:


The Principal Site Reliability Engineer will be responsible for understanding our core technology, developing a repeatable, automated resolution process, and having the right talent and skill set to develop and improve that process. The role is responsible for managing the point where the main elements of incidents converge and for automating recovery from future incidents in our test environments. Collaboration and communication across multiple departments are a must for this role. If you are highly motivated and goal-oriented, can handle interruptions while fluidly switching between several projects, and take an automation-first approach to solving problems, this job will be ideal.


Ensure long-term service reliability in test environments, increasing the odds that when a problem gets fixed, it stays fixed.

Use automation to enable quicker response and resolution, along with repeatable workflows that accelerate the remediation process.

Improve service observability: set SLOs, SLAs, and SLIs, working with product teams and technology teams alike.

Own end-to-end availability of key services and build automation to prevent problem recurrence.

Assist in incident action items for control breaks to ensure issues do not result in repeat incidents.

Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.

Automate response to all non-exceptional service conditions.

Lead by example, mentor the team and establish credibility through quality technical execution.

Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.

Minimize manual systems work and encourage a focus on efforts that bring long-term value to the system.

Evaluate potential failures and their effects on the system.

Develop and deploy operational test cases to catch issues in lower environments.


7 years of experience in the following areas:

Algorithms, data structures, complexity analysis and software design.

One or more of the following: C, C++, Java, Python, Go, Perl, Ruby.

Designing, analyzing and troubleshooting large-scale distributed systems.

Systematic problem-solving approach coupled with effective communication skills and a sense of ownership and drive.

Debugging and optimizing code and automating routine tasks.

Test driven development concepts.

General security concepts as well as secure coding practices.

Excellent communication skills in both verbal and written English.


A degree in computer science or a related field.

Proficient with Linux OS.

Proficient with deployments on one or more public clouds (AWS, GCP, Azure).

Morgan Stanley is an equal opportunity employer committed to diversifying its workforce (M/F/Disability/Vet).

