AI/ML Inference Kernel Engineer Intern
Remote · Full-time internship
- Stipend: INR 10,000 – INR 25,000 / month
- Duration: 6 months
- Start date: Immediately
- Openings: 3
- Work mode: Work from home
- Resume: Required to apply
Candidates must be available for a remote internship, able to start between 29 June 2026 and 3 August 2026, available for the full 6-month duration, and have relevant skills and interest.
About the Company
Hubnine India Private Limited is a software company focused on using modern AI and machine learning approaches to create better outcomes for clients.
About the Internship
This internship is for an ML Systems & Inference Engineer who will help build the core technical layer of a platform designed for Physical AI and advanced multimodal workloads. The work centers on model serving, GPU-oriented systems, performance tuning, and cloud infrastructure.
The position is not meant for a generic backend profile or a pure research profile. The selected intern should be comfortable reading model code, profiling execution, finding bottlenecks, implementing optimizations, validating the impact, and delivering dependable infrastructure solutions.
What You’ll Work On
- Designing and improving cloud inference workflows for Physical AI, multimodal, generative, simulation, and world-model use cases.
- Improving performance metrics such as startup time, queue latency, end-to-end latency, throughput, GPU utilization, reliability, and cost per output or job.
- Building execution-layer components such as model packaging, warm pools, artifact and model caching, batching, queueing, scheduling, and model-aware execution rules.
- Using methods like dynamic batching, quantized model variants, torch.compile, TensorRT, ONNX Runtime, caching, routing, and distributed execution to improve runtime efficiency.
- Finding bottlenecks in GPU compute, memory bandwidth, CPU preprocessing, I/O, model loading, serialization, queueing, and serving overhead.
- Creating benchmarking and evaluation tools to track latency, throughput, startup time, memory use, GPU utilization, cost, reliability, and workload quality.
- Turning runtime telemetry into product features such as performance reports, cost visibility, configuration suggestions, and workload comparisons.
- Developing reusable model adapters, container images, and serving templates to speed up onboarding for new model families.
- Working with founders, design partners, and early customers to convert real use cases into product requirements and technical plans.
- Partnering with GPU performance experts to assess and confirm runtime and kernel-level optimizations.
- Helping shape engineering standards, technical hiring criteria, and the architecture of the core platform.
Eligibility
Applicants must be available for a work-from-home internship, able to begin between 29 June 2026 and 3 August 2026, willing to commit for 6 months, and have relevant skills and interest in the role.
Other Requirements
- Prior exposure to serving or optimizing multimodal workloads such as video, image generation, 3D, VLM, VLA, world models, simulations, synthetic data, gaming, or similar AI applications.
- Experience working with GPU cloud environments such as AWS, GCP, Azure, CoreWeave, Lambda, RunPod, Modal, or similar platforms.
- Background in multi-GPU inference, cluster scheduling, or multi-tenant serving systems.
- Familiarity with CUDA C++, Triton kernels, CUTLASS, PTX, or GPU performance optimization tools.
- Knowledge of compiler and runtime stacks like TVM, MLIR, XLA, TorchInductor, or TensorRT.
- Strong exposure to ML systems, inference serving, GPU-backed deployment, and performance optimization.
- Deep PyTorch knowledge, including the ability to take research code and turn it into production-ready systems.
- Experience running and maintaining large AI models in cloud GPU environments.
- Comfort using profiling tools such as PyTorch Profiler and NVIDIA Nsight.
- Hands-on work with inference frameworks such as TensorRT, Triton, ONNX Runtime, Ray Serve, vLLM, SGLang, Modal, or BentoML.
- Understanding of batching, scheduling, quantization, compilation, distributed inference, Docker, cloud GPUs, and backend engineering.
- Strong problem-solving ability, good engineering judgment, and readiness to operate in a fast-moving startup setting.
Perks
- Certificate
- Flexible work hours
- 5-day work week
Internship Details
This is a 6-month work-from-home internship with an immediate start window. The stipend is INR 10,000 to INR 25,000 per month, and there are 3 openings. The application deadline is 29 July 2026 at 11:59 PM.
Skills Mentioned
Python, algorithms, data structures, machine learning, artificial intelligence, APIs, PyTorch, MLOps, AWS Lambda, LLMOps, and Python libraries.