
Site Reliability/DevOps Engineer - Cisco Digital Network Architecture 1290985

Cisco Systems Inc.
San Jose, CA 95113

What You'll Do  

You are a deeply motivated Site Reliability Engineer with a background in DevOps/SRE software development and operations. The ideal candidate must have experience building, shipping, and operating a software-as-a-service (SaaS) product, and will have managed such products using cloud-native principles with exposure to cloud technologies. This position will enable continuous monitoring and management of infrastructure while providing timely response, within designated SLA times, to service-affecting faults and performance issues. As an SRE, you will work closely with our Managed Services Team to diagnose and characterize issues, provide continuous improvement, and develop infrastructure best practices. You will be driven to build highly scalable, fault-tolerant, and easy-to-administer infrastructure. You must be proactive and organized, diligent about documentation, and passionate about monitoring and automating everything.

Who You'll Work With  

Cisco is transforming the networking industry. To make this happen, we are heavily investing in the team responsible for The Network. Intuitive. We are disrupting the industry by building a new networking platform that can learn, adapt, and secure itself at the speed of today's businesses. This Digital Network Architecture platform automates network management and provides our customers with state-of-the-art analytics and insights. This team's innovations span artificial intelligence, machine learning, analytics, IoT, security, automation, and more.

Who You Are

This role is primarily to apply your SRE skills to create a complete self-serve Software Delivery Machine. The targeted platform will support a vast number of cloud and hybrid customers. The candidate is expected to have strong hands-on skills and will guide and contribute technically to the infrastructure engineering.

Develop full-fledged software tooling to deliver programmable infrastructure (infrastructure as code)

Develop tooling to drive end-to-end microservices monitoring and management

Implement Kubernetes compliance and best practices in terms of security, audits, network policies, and reporting

Develop a self-service console to provide infrastructure visibility

 

Responsibilities   

Manage the availability, scalability and performance of the Infrastructure platforms   

Create the tools and infrastructure leveraged by the rest of the engineering teams     

Diagnose and repair network, application, and hardware bottlenecks     

Test and tune network, hardware, and software configurations to maximize performance

Deploy and manage monitoring and diagnostic tools     

Monitor systems, databases, and networks for proper operation and performance

Provide 7x24 on-call support for the operations infrastructure

Create and maintain continuous integration (CI) and continuous deployment (CD) environments to facilitate an agile development process

Work is generally expected to take place during normal working hours; however, the Platform Operations Team provides Tier 2 and Tier 3 7x24x365 on-call escalation, and candidates should be flexible with schedules to meet the needs and demands of the business.

Qualifications  

Strong knowledge of core Enterprise Linux (Red Hat/CentOS) with a focus on building, maintaining, securing, and performance-tuning systems

Proven experience with capacity planning, performance tuning, and infrastructure architecture, including scaling web, application, and data systems horizontally and vertically

Experience with Kubernetes (K8s) and other virtual infrastructure platforms

High-level shell fluency plus one or more scripting languages (Python, Go, Perl, or similar)

Experience with system automation using Ansible     

Experience with monitoring, alerting, and pipeline analysis tools     

Experience with queuing/data-pipelining  

Experience with SQL/NoSQL systems such as PostgreSQL, MySQL, Cassandra, or Redis

Experience in the development of operational procedures, processes, and scripts  

The candidate is expected to have strong hands-on skills and will guide and contribute technically to the product

BS/MS in Computer Science or related area

Four or more years of relevant work experience

Hands-on experience working with Kubernetes infrastructure

Kubernetes certification is highly preferred

Expert understanding of Kubernetes internals (clustering, scheduling, controllers, API server, etc.)

Very good understanding of container networking

Very good software programming skills using Go/Python/YM

Excellent understanding of microservices architecture

Experience with Kubernetes monitoring tools (Prometheus)

 Why Cisco  

At Cisco, each person brings their unique talents to work as a team and make a difference. Yes, our technology changes the way the world works, lives, plays and learns, but our edge comes from our people.

We connect everything: people, process, data, and things. We use those connections to change our world for the better.

We innovate everywhere - from launching a new era of networking that adapts, learns, and protects, to building Cisco Services that accelerate businesses and business results. Our technology powers entertainment, retail, healthcare, education, and more, from Smart Cities to your everyday devices.

We benefit everyone - we do all of this while striving for a culture that empowers every person to be the difference, at work and in our communities.



Posted: 2020-08-28 Expires: 2020-12-03