SAINITHIN ARTHAM
sai artham at bits-pilani ac in

I'm Sainithin Artham, currently a Research Fellow at IIIT Hyderabad, working under the guidance of Prof. C.V. Jawahar in the CVIT Lab. My research lies at the intersection of multimodal AI, autonomous driving, and deep learning for structured and unstructured environments.

Before this, I worked at BITS Pilani as a Junior Research Fellow, where I applied computer vision techniques for structural health monitoring in smart infrastructure systems. Prior to that, I completed my undergraduate studies at BML Munjal University, where I had the opportunity to work with Dr. Soharab Hossain Shaikh on video understanding, and with Dr. Arijit Maitra on deep learning and machine learning for healthcare applications.

CV  /  Google Scholar  /  LinkedIn  /  Twitter  /  GitHub

Research

My research focuses on building intelligent systems that combine vision, language, and geometry to enable real-world decision-making in complex environments. At IIIT Hyderabad, I work on developing agentic AI pipelines for autonomous driving by integrating multimodal foundation models. Previously, I worked on video understanding using Bayesian Neural ODEs and transformers. For a full list of publications please see Google Scholar.

Procedure segmentation in videos with Bayesian Neural ODE model (BNODE)
Sainithin Artham, Soharab Hossain Shaikh
Neural Computing and Applications, 2024  
arXiv / code, models, data, project page

This paper proposes Bayesian neural ordinary differential equations (BNODEs) as an alternative approach to video event localization.
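At the core of any neural ODE approach is the idea of evolving a feature vector continuously in time under a learned vector field, rather than through discrete layers. The sketch below is purely illustrative (NumPy, fixed-step Euler, random untrained weights) and is not the paper's Bayesian model; it only shows the ODE-integration idea that BNODE builds on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer MLP acting as the learned vector field f(h, t).
# In a trained neural ODE these weights are learned; here they are random.
W1 = rng.normal(scale=0.1, size=(16, 8))
W2 = rng.normal(scale=0.1, size=(8, 16))

def dynamics(h, t):
    """dh/dt = f(h, t): a small tanh MLP."""
    return np.tanh(h @ W1) @ W2

def odeint_euler(h0, t0, t1, steps=20):
    """Fixed-step Euler integration of dh/dt = dynamics(h, t)."""
    h, t = h0.copy(), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * dynamics(h, t)
        t += dt
    return h

# Evolve a frame-level feature vector continuously between two timestamps.
h0 = rng.normal(size=16)
h1 = odeint_euler(h0, 0.0, 1.0)
print(h1.shape)  # (16,)
```

A Bayesian variant would place distributions over `W1` and `W2` and integrate multiple sampled trajectories to obtain uncertainty over the evolved state.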

A Neural ODE and Transformer‑based Model for Temporal Understanding and Dense Video Captioning
Sainithin Artham, Soharab Hossain Shaikh
Multimedia Tools and Applications, 2024  
arXiv / code, models, data, project page

This work combines a Neural ODE with a transformer for temporal understanding and dense video captioning.

A transformer-based convolutional local attention (ConvLoA) method for temporal action localization
Sainithin Artham, Soharab Hossain Shaikh
International Journal of Machine Learning and Cybernetics, 2024  
arXiv / code, models, data, project page

We propose a novel framework that leverages an encoder-decoder mechanism powered by VidSwin to extract global features, which are subsequently combined with the local context. To achieve this, we designed ConvLoA, a convolutional local attention mechanism dedicated to computing contextual focus within localized areas of video frames.
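The essence of restricting attention to localized areas is a banded attention matrix: each frame attends only to its temporal neighbors. The NumPy sketch below shows that banding idea in its simplest form; it is not the paper's ConvLoA module (which is convolutional and learned), and the `window` parameter is an assumption for illustration.

```python
import numpy as np

def local_attention(q, k, v, window=2):
    """Scaled dot-product attention where each position t attends
    only to positions in [t - window, t + window] (a local band)."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)            # (T, T) similarities
    # Mask positions outside the local temporal window.
    idx = np.arange(T)
    band = np.abs(idx[:, None] - idx[None, :]) > window
    scores[band] = -np.inf
    # Row-wise softmax over the unmasked band.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))                 # 10 frame features, dim 4
out = local_attention(x, x, x, window=2)
print(out.shape)  # (10, 4)
```

Fusing such locally attended features with the global VidSwin encoder output is what lets the model weigh fine-grained temporal context against the whole clip.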

Pred-AHCP: Robust feature selection enabled Sequence Specific Prediction of Anti-Hepatitis C Peptides via Machine Learning
Akash Saraswat, Utsav Sharma, Aryan Gandotra, Lakshit Wasan, Sainithin Artham, Arijit Maitra, Bipin Singh
Journal of Chemical Information and Modeling, 2024  
arXiv / code, models, data, project page

We developed an explainable ML model that harnesses the amino acid sequence of a peptide to predict its potential as an anti-HepC (AHC) agent.

Deep Learning for Autonomous Vehicle Object Detection
Sainithin Artham, Swarali Borde, Shashank Shekhar
IEEE IATSM Conference, 2024
arXiv

This research explores deep neural networks (VGG16, AlexNet, and GoogLeNet) for object classification and detection in autonomous vehicles.


This guy is good at website design.