Posted Mar 26, 2026

Bare Metal Support Engineer

CoreWeave is The Essential Cloud for AI™, delivering a platform that enables innovators to build and scale AI with confidence. As a Bare Metal Support Engineer, you will support, operate, and maintain CoreWeave’s GPU fleet, ensuring reliability and performance while collaborating with customers and engineering teams.

Responsibilities

- Provide high-level support for customers utilizing bare-metal GPU fleets on CoreWeave Cloud
- Diagnose, triage, and investigate reported customer issues and high-priority incidents, identifying root causes and escalating when necessary
- Develop a deep understanding of customer workloads and use cases to provide tailored technical support
- Coordinate remote troubleshooting and hardware interventions with Data Center Technicians
- Create and maintain internal documentation, including troubleshooting guides, best practices, and knowledge base articles
- Participate in an on-call rotation to support production clusters and ensure operational reliability
- Collaborate with engineering teams to improve hardware reliability, software stability, and system performance
- Implement automation and scripting to streamline support workflows and reduce manual interventions
- Perform in-depth log analysis and debugging across multiple layers of the stack (firmware, drivers, hardware)
- Provide feedback to internal teams on common support issues to drive continuous improvements
- Work with networking teams to troubleshoot connectivity issues affecting customer workloads
- Support supercomputing infrastructure running GPU workloads at scale
- Drive operational excellence by refining internal processes and support methodologies

Skills

- Experience in data centers, GPU clusters, server deployments, system administration, or hardware troubleshooting
- Demonstrated experience driving resolutions and continuous improvements across cross-functional environments and teams within a data center environment
- Intermediate knowledge of Linux (Ubuntu, CentOS, or similar), including command-line proficiency
- Experience with NVIDIA GPUs, SuperMicro systems, Dell systems, high-performance computing (HPC), and large-scale data center environments
- Experience in networking fundamentals (TCP/IP, VLANs, DNS, DHCP) and troubleshooting tools
- Hands-on experience with firmware updates, BIOS configurations, and driver management
- Experience analyzing system logs and debugging issues across firmware, drivers, and hardware layers
- Experience working with Jira, Confluence, Notion, or other issue-tracking and documentation platforms
- Experience in scripting and automation (Python, Bash, Ansible, or similar)
- You're curious about Kubernetes, Docker, and containerized infrastructure
- You have strong problem-solving skills with a proactive and analytical mindset
- You have excellent communication skills and a demonstrated ability to work collaboratively in a fast-paced environment

Benefits

- Medical, dental, and vision insurance (100% paid for by CoreWeave)
- Company-paid Life Insurance
- Voluntary supplemental life insurance
- Short and long-term disability insurance
- Flexible Spending Account
- Health Savings Account
- Tuition Reimbursement
- Ability to Participate in Employee Stock Purchase Program (ESPP)
- Mental Wellness Benefits through Spring Health
- Family-Forming support provided by Carrot
- Paid Parental Leave
- Flexible, full-service childcare support with Kinside
- 401(k) with a generous employer match
- Flexible PTO
- Catered lunch each day in our office and data center locations
- A casual work environment
- A work culture focused on innovative disruption

Company Overview

CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads. It was founded in 2017, and is headquartered in Livingston, New Jersey, USA, with a workforce of 1001-5000 employees. Its website is https://www.coreweave.com.