DevOps Engineer (GPU cluster) _BD

2 weeks ago


Shanghai, Shanghai, China Bosch Group Full time
Job Description
  • Working in an international DevOps team, you will be responsible for the operation and development of the GPU cluster for the AI Deep Learning Platform.
  • Development of additional features for the service, such as rolling out new software and implementing new cluster interfaces (e.g. RESTful API, load balancing)
  • Implementation of performance monitoring (e.g. dashboards)
  • Automation & Deployment (e.g. patch management, integration of new compute nodes into cluster)
  • Preparation and execution of maintenance for all clusters, e.g. security updates, compatibility testing and rollout.
  • Resolution of user incidents via various channels, e.g. issues with GPU devices or the scheduling system, and user issues in cluster usage (e.g. access, compute jobs, software management).
  • Software deployment and maintenance (e.g. new versions)
  • Sysadmin housekeeping tasks (config cleanup, etc.)
  • Build, expand, and maintain the knowledge base
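  • As a member of Bosch's global GPU cluster DevOps team, you will be responsible for the ongoing development and operation of the GPU cluster that serves as the AI deep learning platform.
  • Develop new feature modules on top of the platform's existing services (e.g. RESTful API, load balancing).
  • Develop platform features such as performance monitoring (e.g. dashboards).
  • Automated deployment (e.g. software package management, integrating newly added compute nodes into the cluster).
  • Operation and maintenance of Bosch's GPU compute clusters worldwide, e.g. security updates, compatibility testing, and capacity expansion.
  • Support users through various channels and resolve issues that may arise, e.g. GPU device problems, job scheduling problems, and other problems users may encounter when using the cluster (access, compute jobs, software management, ...).
  • Development and maintenance of software packages, e.g. new version iterations.
  • Routine system administrator tasks, e.g. refreshing configuration items.
  • Build, and continuously maintain and enrich, the shared knowledge base.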
Qualifications
  • Major in Computer Science, Mathematics, Engineering, or a relevant technical discipline (bachelor's or master's degree)
  • 3+ years of hands-on experience with Linux and DevOps.
  • Deep knowledge of general Linux server administration, such as Linux system management, networking, security and container technologies.
  • Python for operations automation, Kubernetes and Docker for scheduling, MLflow deployment
  • Software development experience is a plus.
  • Know-how in the GPU computing domain (CUDA, cuDNN, NCCL, TensorFlow, PyTorch, CST, etc.)
  • Basic know-how of networking and GPU hardware (network, performance, model)
  • Good teamwork and cooperation with a global team
  • Quick learner of new data technologies
  • English (listening, speaking, reading, writing).
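  • Degree in Computer Science, Mathematics, Engineering, or a related technical discipline (bachelor's or master's).
  • 3+ years of hands-on experience with Linux and DevOps.
  • Proficient in Linux server administration, with in-depth knowledge of related areas, e.g. Linux system management, networking, security, and container technologies.
  • Familiar with Python scripting for operations automation; experience with Kubernetes and Docker, or with MLflow.
  • Knowledge of the GPU computing domain is a plus, e.g. CUDA, cuDNN, NCCL, TensorFlow, PyTorch, CST.
  • Basic knowledge of networking and GPU hardware, such as network and hardware performance.
  • Able to work well with a global team.
  • Able to quickly learn and master new data technologies.
  • English (listening, speaking, reading, writing).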


  • Shanghai, Shanghai, China Bosch Full time

    Job Title: GPU Cluster DevOps Engineer About the Company: Join an international DevOps team at a leading tech company specializing in AI Deep Learning Platforms. As a GPU Cluster DevOps Engineer, you will play a key role in the operation and development of cutting-edge technology. Job Description: Work in an international DevOps team responsible for GPU...


  • Shanghai, Shanghai, China SAP Full time

    What you'll do: You are expected to have practical experience with best practices, tools, and processes in the DevOps space. This is required to engage in daily DevOps activities, which make up a portion of your responsibilities. Other responsibilities include helping to move our product forward to its next generation architecture. Have capability of...


  • Shanghai, Shanghai, China SAP Full time

    We help the world run better. Our company culture is focused on helping our employees enable innovation by building breakthroughs together. How? We focus every day on building the foundation for tomorrow and creating a workplace that embraces differences, values flexibility, and is aligned to our purpose-driven and future-focused work. We offer a highly...

  • DevOps Engineer

    2 weeks ago


    Shanghai, Shanghai, China Goodyear Full time

    Location: CN - Shanghai Goodyear Talent Acquisition Representative: Joa Xu Sponsorship Available: No Relocation Assistance Available: No Primary Purpose of the Position This position is a member of Global IT Digital and Analytics team, reporting to the Cloud services team leader. The incumbent will work in partnership with global & cross functional...

  • DevOps Engineer

    2 weeks ago


    Shanghai, Shanghai, China Goodyear Full time

    Location: CN - Shanghai Goodyear Talent Acquisition Representative: Joa Xu Sponsorship Available: No Relocation Assistance Available: No Primary Purpose of the Position This position is a member of Global IT Digital and Analytics team, reporting to the Cloud services team leader. The incumbent will work in partnership with global & cross functional...

  • DevOps Engineer

    3 weeks ago


    Shanghai, Shanghai, China Goodyear Full time

    Location: CN - Shanghai Goodyear Talent Acquisition Representative: Joa Xu Sponsorship Available: No Relocation Assistance Available: No Primary Purpose of the Position This position is a member of Global IT Digital and Analytics team, reporting to the Cloud services team leader. The incumbent will work in partnership with global & cross functional...


  • Shanghai, Shanghai, China Optiver Full time

    WHO WE ARE: Optiver is a global market maker with offices in Amsterdam, London, Chicago, Austin, Sydney, Shanghai, Hong Kong, Singapore and Taipei. Founded in 1986, today we are a leading liquidity provider, with close to 2,000 employees in offices around the world, united in our commitment to improve the market through competitive pricing, execution and...

  • DevOps Engineer

    2 weeks ago


    Shanghai, Shanghai, China Scopely Full time

    Description: At Scopely, we are on the lookout for a talented DevOps Engineer to join our Star Trek Fleet Command team in Shanghai. We are passionate about what we do and aim to bring joy and excitement to our players every single day. Our team consists of dedicated individuals who are passionate about gaming and are at the forefront of developing and...


  • Shanghai, Shanghai, China Mercedes-Benz Full time

    Field of activity: Research & Development, including Design Department: Software Quality and Management Company: Mercedes-Benz Group China Ltd. Location: Mercedes-Benz Group China Ltd., Beijing Start date: immediately Publication date: ..4 Job number: MERXE Working hours: Full time Tasks: Responsible for building and maintaining CICD...

  • DevOps Engineer

    2 weeks ago


    Shanghai, Shanghai, China Scopely Full time

    Scopely is looking for a DevOps Engineer to join our Star Trek Fleet Command team in Shanghai. At Scopely, we care deeply about what we do and want to inspire play, every day - whether in our work environments alongside our talented colleagues, or through our deep connections with our communities of players. We are a global team of game lovers who are...

  • DevOps Engineer

    4 weeks ago


    Shanghai, Shanghai, China Scopely Full time

    Scopely is looking for a DevOps Engineer to join our Star Trek Fleet Command team in Shanghai. At Scopely, we care deeply about what we do and want to inspire play, every day - whether in our work environments alongside our talented colleagues, or through our deep connections with our communities of players. We are a global team of game lovers who are...


  • Shanghai, Shanghai, China RELX Full time

    About the Role The Senior DevOps Engineer performs complex research, design, and software development assignments within a software functional area or product line, working as part of an application team to build and maintain the right cloud infrastructure architecture, balancing performance and resilience with cost. Responsibilities Working within an...


  • Shanghai, Shanghai, China Carrier Full time

    Country: China Location: LOC3254: No.3239 Shenjiang Road, Shanghai, Pudong New Area, Shanghai, China Role Responsibilities: As a Senior Computational Engineer, you provide technical expertise in the area of scientific computing and promote the best practice of DevOps and CI/CD. You engage with global product teams and work with modeling and optimization...

  • Senior HPC Engineer

    2 weeks ago


    Shanghai, Shanghai, China NVIDIA Full time

    NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology—and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots,...

  • Digital Engineer

    2 weeks ago


    Shanghai, Shanghai, China ThermoFisher Scientific Full time

    Work Schedule: Standard (Mon-Fri) Environmental Conditions: Office Job Description: We are seeking a dedicated Full Stack Engineer to join our China Digital Engineering team. In this role, you will primarily focus on backend and devops technologies, playing a key role in delivering various digital products in the life science and laboratory industry at Thermofisher....

  • Senior Data Scientist

    2 weeks ago


    Shanghai, Shanghai, China SAP Full time

    Unleash Your Potential: SAP innovations are utilized by over four hundred thousand clients globally to enhance collaboration and leverage business insights effectively. Initially renowned for its expertise in enterprise resource planning (ERP) software, SAP has transformed into a leading provider of comprehensive business application software and associated...

  • Engineering Manager

    2 weeks ago


    Shanghai, Shanghai, China Maersk Full time

    Introduction to Maersk: A.P. Moller - Maersk is an integrated container logistics company that is responsible for moving 20% of global trade every year. With a dedicated team of over 100,000 employees across 130 countries, we go all the way to connect and simplify global trade, and help our customers grow and thrive. Maersk's vision is to be the global...


  • Shanghai, Shanghai, China Luxoft Full time

    Project description: Luxoft is one of the major software services companies worldwide. In particular, we develop high-quality software in the automotive industry for the most famous car makers. The software inside a vehicle was traditionally expected to be a very controlled and self-contained environment. Equipping cars with perception and machine intelligence...


  • Shanghai, Shanghai, China Third Bridge Full time

    Job Description Product & Technology Overview Our Chief Information Officer (CIO) leads the technology function, and our Chief Product and Data Officer (CPDO) leads the product function. We invest heavily in product, data, and technology capabilities, enabling us to deliver innovative products, solutions, and deep market intelligence to our clients. ...


  • Shanghai, Shanghai, China RELX Full time

    About the Role The Software Engineering Lead performs complex research, design, and software development assignments within a software functional area or product line, and provides direct input to project plans, schedules, and methodology in the development of cross-functional software products. This Lead performs software design - typically across...