DataLake AI Platform Operation Engineer

3 days ago


Shanghai, China SAP Full time

 We help the world run better

At SAP, we enable you to bring out your best. Our company culture is focused on collaboration and a shared passion to help the world run better. How? We focus every day on building the foundation for tomorrow and creating a workplace that embraces differences, values flexibility, and is aligned to our purpose-driven and future-focused work. We offer a highly collaborative, caring team environment with a strong focus on learning and development, recognition for your individual contributions, and a variety of benefit options for you to choose from.

 

We are seeking a skilled and motivated individual to join our team as a DataLake AI Platform Operations Engineer. This role focuses on Cloud Infrastructure, Kubernetes (K8S), and Machine Learning, as well as AI Model Training tooling solutions. In this position, you will be responsible for setting up and managing AI and general computing infrastructure connected to an OpenStack-based private cloud, provisioning cloud resources from IaaS, implementing various service components to support distributed model training tasks and productive use-case serving instances across K8S clusters, and overseeing the runtime metrics of each component while continuously optimizing them.

 

 

What You'll Do:

----------------

  • Infrastructure Operation: Utilize OpenStack-based IaaS resources and optimize their provisioning to ensure efficient infrastructure operations.
  • Cross-Node Resource Management: Manage Kubernetes clusters across different regions and availability zones, ensuring optimal performance for use-cases and shared services while minimizing resource consumption.
  • Logging, Auditing, and Metrics: Implement distributed logging solutions using Loki and OpenSearch. Configure auditing for each use-case and collect Prometheus-based metrics from both platform services and use-cases.
  • Dashboarding and Monitoring: Develop dashboards tailored to specific needs and monitor the platform using the dashboard tools you create.
  • Support Platform Use-Cases: Assist use-case development teams in maximizing the platform's capabilities for their projects.
  • TCO Management: Automate the calculation of the total cost of ownership for platform infrastructure and licenses, and allocate these costs to each specific use-case.
  • Collaboration, Documentation, and Training: Collaborate with peers across regions to support various projects, document new changes, and provide training to platform users.

 

 

 

What You Bring:

----------------

  • Bachelor's degree in Computer Science, Engineering, or a related field; advanced degrees are a plus.
  • Basic understanding of GPU-based computing concepts, and familiarity with AI/ML frameworks and tools such as CUDA, Kubeflow, Spark, or PyTorch.
  • Solid knowledge of Kubernetes and container orchestration concepts.
  • Proficiency in coding languages (e.g., Python, Go, Shell) for automation and infrastructure management.
  • Proven experience in infrastructure and operations management for cloud service solutions.
  • Strong problem-solving skills and the ability to diagnose and resolve complex technical issues.
  • Excellent communication and collaboration skills to work effectively with cross-functional teams.
  • Strong attention to detail and the ability to manage multiple priorities in a fast-paced environment.

 

Join our dynamic team and contribute to cutting-edge solutions in AI and cloud infrastructure.

 

 

 

Bring out your best

SAP innovations help more than four hundred thousand customers worldwide work together more efficiently and use business insight more effectively. Originally known for leadership in enterprise resource planning (ERP) software, SAP has evolved to become a market leader in end-to-end business application software and related services for database, analytics, intelligent technologies, and experience management. As a cloud company with two hundred million users and more than one hundred thousand employees worldwide, we are purpose-driven and future-focused, with a highly collaborative team ethic and commitment to personal development. Whether connecting global industries, people, or platforms, we help ensure every challenge gets the solution it deserves. At SAP, you can bring out your best.

We win with inclusion

SAP’s culture of inclusion, focus on health and well-being, and flexible working models help ensure that everyone – regardless of background – feels included and can run at their best. At SAP, we believe we are made stronger by the unique capabilities and qualities that each person brings to our company, and we invest in our employees to inspire confidence and help everyone realize their full potential. We ultimately believe in unleashing all talent and creating a better and more equitable world.
SAP is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to the values of Equal Employment Opportunity and provide accessibility accommodations to applicants with physical and/or mental disabilities. If you are interested in applying for employment with SAP and are in need of accommodation or special assistance to navigate our website or to complete your application, please send an e-mail with your request to Recruiting Operations Team: Careers@sap.com
For SAP employees: Only permanent roles are eligible for the SAP Employee Referral Program, according to the eligibility rules set in the SAP Referral Policy. Specific conditions may apply for roles in Vocational Training.

EOE AA M/F/Vet/Disability:

Qualified applicants will receive consideration for employment without regard to their age, race, religion, national origin, ethnicity, gender (including pregnancy, childbirth, et al), sexual orientation, gender identity or expression, protected veteran status, or disability.
Successful candidates might be required to undergo a background verification with an external vendor.

Requisition ID: 398759  | Work Area: Software-Development Operations  | Expected Travel: 0 - 10%  | Career Status: Professional  | Employment Type: Regular Full Time   | Additional Locations: #LI-Hybrid.


