Lead Systems Admin (Datadog Admin)
Job description
Position: Lead Systems Admin
Location: 100% Remote
Job Type: 9-month contract (long-term role with extensions)
Hiring Manager Notes:
- Straight contract role. Preference for someone local who can go to the office when needed, but open to non-local candidates as long as they can support EST hours.
- Datadog is the primary tool; BigPanda (nice to have) is used for centralized event management.
- First initiative: Datadog currently runs in GovCloud, and the plan is to move it to Datadog's commercial cloud.
- Second initiative: move all claims apps from on-prem to the cloud (AWS) and configure Datadog on the AWS platform.
- Everything runs on Linux. The candidate must be able to install, configure, and maintain everything involved with Datadog.
- This resource should be able to configure all Datadog functionality (Infrastructure Monitoring, Application Performance Monitoring, Log Management, Real User Monitoring, and Security Monitoring).
Skills required:
- Deep expertise in Datadog administration (both on-prem and in AWS)
- Deep AWS experience
- ServiceNow integration experience (specifically ITOM): highly preferred, not required.
Interview Process: Initial round with the hiring manager, followed by a virtual panel interview.
Job Description:
Job Summary: We are seeking a seasoned Lead Systems Engineer with deep expertise in Datadog, AWS, and ServiceNow integration. In this leadership role, you will oversee the design, implementation, and maintenance of comprehensive monitoring, observability, and incident management solutions for cloud-based infrastructure and applications. You will play a key role in guiding the team to ensure operational excellence, system reliability, and seamless collaboration across IT and engineering teams.
Responsibilities:
- Lead the architecture, design, and implementation of end-to-end monitoring solutions using Datadog, ensuring high availability and performance of cloud-based services.
- Oversee the deployment and management of AWS resources (EC2, RDS, Lambda, ECS/EKS, S3, etc.), ensuring adherence to best practices for scalability, security, and cost optimization.
- Define monitoring strategies and best practices, including Datadog dashboards, monitors, alerts, and custom metrics for comprehensive observability.
- Architect and manage the integration of Datadog with ServiceNow to automate incident management workflows, event correlation, and CMDB synchronization.
- Provide technical leadership and mentorship to junior engineers on best practices for monitoring, logging, and observability.
- Collaborate with cross-functional teams to integrate monitoring and logging into CI/CD pipelines and cloud infrastructure.
- Drive continuous improvement in system reliability, including SLO/SLI definitions, synthetic monitoring, and anomaly detection.
- Contribute to and enforce Infrastructure as Code (IaC) standards using Terraform, CloudFormation, or similar tools.
- Participate in high-severity incident management, root cause analysis, and the implementation of corrective actions to prevent future occurrences.
Dexian is a leading provider of staffing, IT, and workforce solutions with over 12,000 employees and 70 locations worldwide. As one of the largest IT staffing companies and the 2nd largest minority-owned staffing company in the U.S., Dexian was formed in 2023 through the merger of DISYS and Signature Consultants. Combining the best elements of its core companies, Dexian's platform connects talent, technology, and organizations to produce game-changing results that help everyone achieve their ambitions and goals. Dexian's brands include Dexian DISYS, Dexian Signature Consultants, Dexian Government Solutions, Dexian Talent Development and Dexian IT Solutions. Visit https://dexian.com/ to learn more. Dexian is an Equal Opportunity Employer that recruits and hires qualified candidates without regard to race, religion, sex, sexual orientation, gender identity, age, national origin, ancestry, citizenship, disability, or veteran status.