Work related to Computer Vision & Image Processing

Some casual Instant-NGP/NeRF results:

I've run Instant-NGP (a NeRF technique), which uses MLPs to generate 3D scenes from 2D images only. It was even able to reconstruct parts of the scene that were absent from the 2D images. How amazing! This could automate some of the work of creating 3D content in Unreal.
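For intuition, here is a minimal, hedged sketch of the core NeRF idea in Keras (illustrative only; Instant-NGP actually replaces the positional encoding below with a multiresolution hash grid and a much smaller MLP):

    # Sketch of the NeRF core: an MLP maps an encoded 3D position to colour and density.
    import tensorflow as tf

    def positional_encoding(xyz, n_freqs=10):
        """Sin/cos features so the MLP can represent high-frequency scene detail."""
        feats = [xyz]
        for i in range(n_freqs):
            feats += [tf.sin(2.0 ** i * xyz), tf.cos(2.0 ** i * xyz)]
        return tf.concat(feats, axis=-1)          # (N, 3 + 3*2*n_freqs)

    def make_nerf_mlp(hidden=256, n_layers=8, n_freqs=10):
        inp = tf.keras.Input(shape=(3 + 3 * 2 * n_freqs,))
        h = inp
        for _ in range(n_layers):
            h = tf.keras.layers.Dense(hidden, activation="relu")(h)
        rgb = tf.keras.layers.Dense(3, activation="sigmoid")(h)      # colour in [0, 1]
        sigma = tf.keras.layers.Dense(1, activation="softplus")(h)   # non-negative density
        return tf.keras.Model(inp, [rgb, sigma])

    # To render a pixel, sample points along its camera ray, query the MLP at each
    # encoded point, and alpha-composite the (rgb, sigma) samples. Training compares
    # rendered pixels with the 2D photos, so no 3D supervision is ever needed.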

Fig 1 & 2 : 3D interiors (indoor station and indoor church)


Fig 3 & 4 : 3D exteriors (buildings reconstructed with Instant-NGP)





PROJECT 1: RGB-D Fusion for Segmentation on Indoor data for Autonomous Wheelchair

Research Project - Worked on image + depth (RGB-D) fusion techniques, using a transformer network and the traditional YOLOv7, for segmentation on an indoor dataset collected for an autonomous-wheelchair scenario.

Problem - Automate the motorized wheelchair by creating an understanding of the surrounding scene

Solution - Using various segmentation methods, analyze the pros and cons of the outputs based on the model architectures, and improve the performance on the collected dataset.

  • With the goal of contributing to the automation of a motorized wheelchair, this research project focuses on deep learning approaches and data-fusion techniques for indoor data. The objective is to perform segmentation on an indoor dataset collected manually with a camera at Carleton University.

  • I mainly explore and compare two segmentation methods, a traditional CNN-based network (YOLOv7) and a multimodal encoder-decoder transformer (TokenFusion), in terms of performance and outputs.

  • YOLOv7 : relies solely on image inputs, using traditional CNNs for segmentation tasks. Originally trained on a much larger, image-only dataset (MS COCO). YOLO was originally an object detector, but version 7 can also perform segmentation.

  • TokenFusion : a transformer method that depends on multimodal data (image and depth), fusing images with their corresponding depth maps inside the encoder-decoder model. Originally trained on a smaller dataset (NYU Depth v2). Can perform both segmentation and object detection.

  • The results demonstrate that including depth images is highly beneficial for segmenting the interior structure of the indoor environment; training time and accuracy are also compared between the two models. The two methods differ considerably in input modalities and backbone networks, and the advantages and disadvantages of each are discussed.

  • Questions I asked myself while using the ML model:
    • All these models are shown to perform really well on the dataset they were trained on, but do they actually perform well on ANY indoor dataset? Let's explore!
    • If it doesn't, what are the possible reasons? What should be fine-tuned?
    • What information is it failing to understand or perceive properly?
    • Is this a generalization problem?
    • Is this a data preprocessing problem?


*If you're interested in discussing further or viewing the code, please Email Me. Due to privacy constraints, you will need extra permission from the professor to view the code.


First Image : Input Image (RGB)
[Collected from Intel RealSense D455 camera sensor]

Second Image : Depth (D)
[Collected from Intel RealSense D455 camera sensor]

Third Image : Output from Yolov7

Fourth Image : Output from TokenFusion
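For context, this is roughly how such aligned RGB-D pairs can be captured with the pyrealsense2 SDK; the stream settings below are my own illustrative assumptions, not the project's actual capture configuration:

    # Hedged sketch: grabbing one aligned colour + depth frame from a RealSense D455.
    import numpy as np
    import pyrealsense2 as rs

    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
    pipeline.start(config)

    align = rs.align(rs.stream.color)       # map depth pixels onto the colour image grid
    try:
        frames = align.process(pipeline.wait_for_frames())
        depth = np.asanyarray(frames.get_depth_frame().get_data())   # (480, 640) uint16
        color = np.asanyarray(frames.get_color_frame().get_data())   # (480, 640, 3) uint8
    finally:
        pipeline.stop()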



PROJECT 2 : GANs (Generative Adversarial Networks) on Limited Data

Problem : Shortage of Image data with labels

Solution : Synthetically generate image data with Generative AI Models


  • Used pre-trained StyleGAN2, GP-GAN and SinGAN models, applied to breast cancer images.

  • It is important to see how a model performs initially, without fine-tuning. To no surprise, the models performed horribly at first. But this gives insight into where the problem might be and what needs to be changed.

  • Fine-tuned learning rates and decay values

  • Changed the regularizer loss to the LeCam loss to improve the generalization of the generated images (a sketch of this idea appears after this list).

  • Synthetically generated better images by experimenting with different batch sizes

  • With more GPU power, it would be possible to obtain finer detail in the generated images.

  • Questions that still need to be answered:
    • How do you define good results here? I know the loss and FID metrics provide insight, but even for us humans it is not clear how to measure the goodness of abstract images. If you have any insights, please Email Me.
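The sketch referenced above: a simplified view of a LeCam-style regularizer (following Tseng et al., 2021), where the discriminator's real and fake outputs are pulled toward exponential moving averages of its predictions on the opposite class. Variable names and the weight value are my own, not the project's code:

    # Simplified sketch of a LeCam-style regularizer added to the discriminator loss
    # when training GANs on limited data. Names and the weight are illustrative.
    import tensorflow as tf

    class EMA:
        """Exponential moving average of a scalar (e.g. mean discriminator output)."""
        def __init__(self, decay=0.99):
            self.decay, self.value = decay, 0.0
        def update(self, x):
            self.value = self.decay * self.value + (1.0 - self.decay) * float(x)
            return self.value

    ema_real, ema_fake = EMA(), EMA()

    def lecam_regularizer(d_real, d_fake, weight=0.01):
        """Pull D(real) toward the fake anchor and D(fake) toward the real anchor."""
        anchor_fake = ema_fake.update(tf.reduce_mean(d_fake))
        anchor_real = ema_real.update(tf.reduce_mean(d_real))
        reg = tf.reduce_mean(tf.square(d_real - anchor_fake)) + \
              tf.reduce_mean(tf.square(d_fake - anchor_real))
        return weight * reg          # added to the usual discriminator loss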




Original Images Fed

Synthetically Generated Images


PROJECT 3: Artistic Images with Dijkstra's Algorithm

  • Inspired by the various elements in rocks and stones, used Dijkstra's algorithm to create artistic images from a graph structure.
  • Taking each pixel as a node and connecting it 4-ways to its neighbors, initialized the cost of every node to infinity at the start
  • Used 2 distributions for the edge weights : Gaussian and random
  • Based on the distributions, selected sites as user inputs from which dispersion will occur.
  • Ran Dijkstra's algorithm from all the sites/source points (see the sketch after this list).
  • Used various types of costs for experimentation : the usual cost, a thresholded cost, the reciprocal of the cost, edge differences, putting extra weight on maximum edges, etc.
  • Normalized and recolored the resulting images
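The sketch referenced in the list above: a hedged, self-contained version of the basic setup, with each pixel a node in a 4-connected grid, random edge weights, and a multi-source Dijkstra run from the user-chosen sites. Names and the recoloring step are my own illustration, not the project code:

    # Multi-source Dijkstra over a 4-connected pixel grid with random edge weights.
    import heapq
    import numpy as np

    def dijkstra_art(h, w, sites, weight_dist="gaussian", seed=0):
        rng = np.random.default_rng(seed)
        if weight_dist == "gaussian":
            weights = np.abs(rng.normal(1.0, 0.5, size=(h, w)))   # cost of entering a pixel
        else:
            weights = rng.random((h, w))

        dist = np.full((h, w), np.inf)            # every node starts at infinite cost
        heap = []
        for (r, c) in sites:                      # user-chosen dispersion sites
            dist[r, c] = 0.0
            heapq.heappush(heap, (0.0, r, c))

        while heap:
            d, r, c = heapq.heappop(heap)
            if d > dist[r, c]:
                continue                          # stale queue entry
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):     # 4-way neighbours
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and d + weights[nr, nc] < dist[nr, nc]:
                    dist[nr, nc] = d + weights[nr, nc]
                    heapq.heappush(heap, (dist[nr, nc], nr, nc))

        return (dist - dist.min()) / (dist.max() - dist.min())    # normalize to [0, 1]

    # e.g. img = dijkstra_art(256, 256, sites=[(40, 40), (200, 180)]), then recolor with a colormap.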


What I've Learned from DeepLearning.ai Specialization (2019)


  • Emotion Detection Model

  • A CNN model built with Keras in Python that detects whether a person is happy based on their smile. The training accuracy is 99% and the test accuracy is 97%.
    Training set = 600 pictures, array shape (600, 64, 64, 3)
    Test set = 150 pictures, array shape (150, 64, 64, 3)
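A minimal Keras sketch of this kind of small binary CNN classifier; the layer sizes are my own assumptions, not the exact architecture from the course:

    # Hedged sketch of a small CNN for "happy / not happy" classification on 64x64x3 images.
    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        layers.Input(shape=(64, 64, 3)),
        layers.Conv2D(32, 7, padding="same"),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),    # probability that the person is smiling
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(X_train, Y_train, epochs=20, batch_size=32)   # X_train shape: (600, 64, 64, 3)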



  • Face Recognition Model

  • What's the difference between face verification and recognition?
    Face verification is a simpler task that involves matching the input image with the claimed person's image. It is a 1:1 matching problem!
    However, face recognition is a bit more complex. It is matching a given input image across a database of images to find who the particular individual is. It is a 1:K matching problem!

    Both face recognition and verification are implemented here with a triplet loss function and a pre-trained model that maps face images into 128-dimensional encodings. NumPy, Keras, TensorFlow, pandas and cv2 are used to build this model.
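For reference, a short sketch of the triplet loss on those 128-dimensional encodings: the anchor should be closer to the positive (same person) than to the negative (different person) by at least a margin alpha:

    # Triplet loss sketch: encourage ||f(A)-f(P)||^2 + alpha <= ||f(A)-f(N)||^2.
    import tensorflow as tf

    def triplet_loss(anchor, positive, negative, alpha=0.2):
        pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
        neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
        return tf.reduce_sum(tf.maximum(pos_dist - neg_dist + alpha, 0.0))

    # Verification (1:1): accept if the distance between two encodings is below a threshold.
    # Recognition (1:K): return the database identity with the smallest encoding distance.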





  • Neural Style Transfer

  • I absolutely LOVE this project as it generates ART with neural networks. The Neural Style Transfer algorithm merges one image's content with another image's style to output unique styled art. A pre-trained model is used for this task.
    The content cost function is computed with TensorFlow. The style cost is computed using the style (Gram) matrix and style weights. The total cost is the weighted sum of these costs, and the Adam optimizer is used.
    Libraries used are matplotlib, numpy, pprint, scipy, tensorflow, PIL, etc.
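A hedged sketch of the costs described above: content cost from one layer's activations, style cost from Gram matrices, and a weighted total cost minimized with Adam. The weights alpha and beta are illustrative values:

    # Sketch of the neural style transfer cost functions.
    import tensorflow as tf

    def gram_matrix(a):
        """Style (Gram) matrix: correlations between feature channels, shape (C, C)."""
        a = tf.reshape(a, (-1, a.shape[-1]))              # (H*W, C)
        return tf.matmul(a, a, transpose_a=True)

    def content_cost(a_content, a_generated):
        return tf.reduce_mean(tf.square(a_content - a_generated))

    def style_cost(a_style, a_generated):
        return tf.reduce_mean(tf.square(gram_matrix(a_style) - gram_matrix(a_generated)))

    def total_cost(j_content, j_style, alpha=10.0, beta=40.0):
        return alpha * j_content + beta * j_style         # minimized over the generated pixels with Adam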



  • Autonomous Driving using YOLO algorithm

  • Here, we are mainly performing object detection on a car detection dataset and dealing with bounding boxes

    Libraries used : keras, numpy, scipy, matplotlib, tensorflow, pandas, PIL

    "You Only Look Once" (YOLO) performs object detection, and then can be applied it to car detection. As YOLO model is very computationally expensive to train, we have loaded pre-trained weights. It is a popular algorithm because it achieves high accuracy while also being able to run in real-time. This algorithm "only looks once" at the image in the sense that it requires only one forward propagation pass through the network to make predictions. After non-max suppression, it then outputs recognized objects together with the bounding boxes.
    The YOLO architecture is: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).

    The input is a batch of images of shape (m, 608, 608, 3). The output is a list of bounding boxes along with the recognized classes.
    Anchor boxes are chosen by exploring the training data to choose reasonable height/width ratios that represent the different classes.
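A hedged sketch of that post-processing on a single image's encoding: per-box class scores are computed, low-confidence boxes are filtered out, and TensorFlow's built-in non-max suppression keeps the best remaining boxes. The thresholds are my own illustrative values:

    # YOLO post-processing sketch: (19, 19, 5, 85) encoding -> final boxes.
    import tensorflow as tf

    def filter_and_nms(box_confidence, box_class_probs, boxes,
                       score_threshold=0.6, iou_threshold=0.5, max_boxes=10):
        # box_confidence: (19, 19, 5, 1), box_class_probs: (19, 19, 5, 80), boxes: (19, 19, 5, 4)
        scores = box_confidence * box_class_probs          # per-box, per-class scores
        classes = tf.argmax(scores, axis=-1)
        best_scores = tf.reduce_max(scores, axis=-1)

        keep = best_scores >= score_threshold              # drop low-confidence boxes
        boxes = tf.boolean_mask(boxes, keep)               # (N, 4) as [y1, x1, y2, x2]
        best_scores = tf.boolean_mask(best_scores, keep)
        classes = tf.boolean_mask(classes, keep)

        idx = tf.image.non_max_suppression(boxes, best_scores, max_boxes, iou_threshold)
        return tf.gather(boxes, idx), tf.gather(best_scores, idx), tf.gather(classes, idx)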