Job Overview

Title:

Communication Libraries Engineer

Description:

Company overview:

At Turiyam AI, we are pioneering world-leading GenAI semiconductor solutions from India, for India and the world. Our breakthrough solutions are set to redefine the future of AI computing, driving unparalleled efficiency, performance, and accessibility for enterprises worldwide.


Job description:

We are looking for an experienced individual to join our AI accelerator communications software team to build hyper-optimized cluster networking solutions. You will contribute to the architecture and design of our networking and collective communication software. You must be passionate about optimizing networking and communication performance at scale.


Responsibilities:

  • Design collective communication software libraries in C++, assembly, and Python for datacenter AI
  • Stay abreast of cutting-edge collective algorithms for a wide variety of network topologies and implement them in our communication libraries
  • Help hyper-optimize distributed computing algorithms, including compute-communication overlap
  • Analyse communication bottlenecks for AI workloads and guide better system design
  • Collaborate with hardware and software architects and system engineers to hyper-optimize the deployment of our AI systems


Requirements:

  • Bachelor's, Master's, or Ph.D. degree in computer science, engineering, or a related field
  • 3+ years' experience developing hyper-optimized model C++ code
  • Experience with one or more of the following:
      • Implementing communication middleware such as MPI/SHMEM
      • Developing and optimizing collective communication algorithms (e.g. allreduce, allgather, scatter, gather)
      • Implementing lower-level communication frameworks such as UCX and libfabric, or developing with RDMA APIs
      • Working with GPU collective libraries such as NCCL or GPU-optimized MPI
  • Experience in software performance evaluation, optimization, and debugging
  • Excellent problem-solving skills and the ability to work independently as well as part of a fast-paced team in a startup environment
  • Strong communication skills to effectively convey technical concepts to non-technical stakeholders


Preferred qualifications / experience:

  • Experience developing communication algorithms for large-scale CPU/GPU/accelerator clusters is a big plus
  • Experience architecting and developing communication software solutions for AI accelerators using RDMA and proprietary communication fabrics, from device drivers through OS layers to applications and AI/ML frameworks
  • Familiarity with Python programming and PyTorch is a plus


Benefits:

  • Competitive salary and benefits package
  • Opportunity to work on cutting-edge AI technology
  • Collaborative, dynamic, and inclusive work environment
  • Professional growth and development opportunities


How to apply: Interested candidates are invited to submit their resume and a cover letter detailing their relevant experience and why they are a good fit for this role to


Salary:

$795711-$1199022 Annual

Company:

Turiyam AI

Location:

Bangalore, Karnataka, India