Computer Vision: Building AI That Sees

What is Computer Vision?

Computer Vision is the field of AI that enables machines to interpret and understand visual information from the world - images, videos, and real-time camera feeds. It powers autonomous vehicles, medical imaging, facial recognition, and countless other applications.

Deep learning, particularly Convolutional Neural Networks (CNNs), has transformed computer vision, with models matching or exceeding human-level accuracy on some benchmark tasks such as ImageNet image classification.

Common Computer Vision Tasks

Image Classification: Assign a label to an entire image (cat, dog, car)
Object Detection: Locate and classify multiple objects with bounding boxes
Semantic Segmentation: Classify every pixel in the image
Instance Segmentation: Separate individual object instances, each with its own pixel mask
Pose Estimation: Detect human body keypoints
Face Recognition: Identify or verify individuals
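
To make the differences between the first four tasks concrete, here is a minimal sketch of what each output might look like as NumPy arrays. The image size, coordinates, class IDs, and scores are made up for illustration, and real libraries each use their own output formats.

```python
import numpy as np

H, W = 4, 6  # tiny hypothetical image: 4 rows, 6 columns

# Image classification: one label for the whole image
class_label = "cat"

# Object detection: one row per object (x1, y1, x2, y2, class_id, score)
detections = np.array([
    [0, 0, 2, 3, 1, 0.92],
    [3, 1, 5, 3, 1, 0.88],
])

# Semantic segmentation: one class ID per pixel (0 = background, 1 = cat)
semantic_mask = np.zeros((H, W), dtype=np.int64)
semantic_mask[0:3, 0:2] = 1
semantic_mask[1:3, 3:5] = 1

# Instance segmentation: one boolean mask per object
instance_masks = np.zeros((2, H, W), dtype=bool)
instance_masks[0, 0:3, 0:2] = True
instance_masks[1, 1:3, 3:5] = True

# Both cats share class 1 in the semantic mask but stay separate as instances
assert np.array_equal(instance_masks.any(axis=0), semantic_mask == 1)
print(detections.shape, semantic_mask.shape, instance_masks.shape)
```

The final assertion captures the relationship between the two segmentation tasks: merging all instance masks of a class recovers that class's region in the semantic mask.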

Image Processing with OpenCV

import cv2
import numpy as np

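# Read image (OpenCV loads channels in BGR order and returns None on failure)
img = cv2.imread('image.jpg')
if img is None:
    raise FileNotFoundError("Could not read 'image.jpg'")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)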

# Resize
resized = cv2.resize(img, (224, 224))

# Grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Blur
blurred = cv2.GaussianBlur(img, (5, 5), 0)

# Edge detection
edges = cv2.Canny(gray, 100, 200)

# Draw on images (placeholder box: top-left corner, width, height in pixels)
x, y, w, h = 50, 50, 100, 80
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.putText(img, 'Label', (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

# Save
cv2.imwrite('output.jpg', img)
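
The two color conversions in this section are easy to misread, so here is a small NumPy-only sketch of what they do to the pixel values. It assumes a made-up two-pixel image and uses the standard luma weights (0.299, 0.587, 0.114) for the grayscale step; it is an illustration of the arithmetic, not a replacement for cv2.cvtColor.

```python
import numpy as np

# A made-up 1x2 image in OpenCV's BGR order: one pure-blue and one pure-red pixel
img_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis gives the same channel order as cv2.COLOR_BGR2RGB
img_rgb = img_bgr[..., ::-1]

# Grayscale is a weighted sum of the R, G, and B channels
r, g, b = (img_rgb[..., i].astype(np.float64) for i in range(3))
gray = np.round(0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

print(img_rgb[0, 0].tolist())  # the blue pixel, now in RGB order
print(gray.tolist())           # blue comes out darker than red
```

Forgetting the BGR order is a common source of wrong colors when an image read with OpenCV is passed to a library that expects RGB.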

Image Classification with PyTorch

import torch
import torchvision.transforms as transforms
import torchvision.models as models
from PIL import Image

# Load pretrained model (the weights argument replaces the deprecated pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

# Image preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Load and process image (convert to RGB so grayscale or RGBA files also work)
img = Image.open('cat.jpg').convert('RGB')
img_tensor = transform(img).unsqueeze(0)

# Predict
with torch.no_grad():
    outputs = model(img_tensor)
    _, predicted = torch.max(outputs, 1)

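# Get class name from the ImageNet category list bundled with the weights
categories = models.ResNet50_Weights.IMAGENET1K_V1.meta["categories"]
print(f"Predicted class: {categories[predicted.item()]} (index {predicted.item()})")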
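
The model returns raw scores (logits), and taking the maximum gives only the single best class. A minimal sketch of turning logits into probabilities and listing the top few classes is shown here in plain NumPy so it runs without downloading a model. The four labels and logit values are hypothetical; with PyTorch tensors, torch.softmax and torch.topk do the same job.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def top_k(logits, labels, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    order = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in order]

# Hypothetical logits for a 4-class model (not real ResNet output)
labels = ["cat", "dog", "car", "tree"]
logits = [2.0, 1.0, 0.1, -1.0]

for label, prob in top_k(logits, labels):
    print(f"{label}: {prob:.3f}")
```

Reporting the top few probabilities rather than one index makes it easier to see when the model is uncertain between similar classes.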

Transfer Learning

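Start from a pretrained model and fine-tune it for your specific task. The example below freezes the pretrained backbone and trains only a new classification head: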

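import torch
import torch.nn as nn
import torchvision.models as models

# Load pretrained ResNet (the weights argument replaces the deprecated pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)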

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace final layer for your number of classes
num_classes = 5
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(256, num_classes)
)

# Only train new layers
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)
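
Freezing the backbone leaves only the new head to train, and the size of that head can be worked out by hand. The sketch below does the arithmetic, assuming the head defined above and ResNet-50's final-layer input width of 2048; the helper name linear_params is made up for this example. ReLU and Dropout contribute no parameters.

```python
def linear_params(in_features, out_features):
    """Parameters in a fully connected layer: a weight matrix plus one bias per output."""
    return in_features * out_features + out_features

in_features = 2048   # input width of ResNet-50's original final layer
hidden = 256         # width of the new hidden layer
num_classes = 5      # number of target classes

head_params = linear_params(in_features, hidden) + linear_params(hidden, num_classes)
print(f"Trainable parameters in the new head: {head_params:,}")
```

With a model loaded, the same number can be checked with sum(p.numel() for p in model.parameters() if p.requires_grad), which is a quick way to confirm that the freeze worked.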

Object Detection with YOLO

from ultralytics import YOLO

# Load pretrained YOLOv8
model = YOLO('yolov8n.pt')  # nano model for speed

# Run inference
results = model('image.jpg')

# Process results
for result in results:
    boxes = result.boxes
    for box in boxes:
        # Bounding box coordinates
        x1, y1, x2, y2 = box.xyxy[0]
        # Confidence score
        confidence = box.conf[0]
        # Class ID
        class_id = box.cls[0]
        class_name = model.names[int(class_id)]

        print(f"{class_name}: {confidence:.2f}")

# Save annotated image
results[0].save('detected.jpg')

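# Real-time detection from webcam (stream=True yields results frame by frame
# instead of accumulating them all in memory)
for result in model(source=0, show=True, stream=True):
    pass  # each result holds the detections for one frame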
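
Detections usually need some post-processing before they are drawn or stored. The following sketch filters by confidence and converts corner coordinates to the (x, y, w, h) form used with cv2.rectangle earlier. It works on plain dictionaries with hypothetical values rather than on Ultralytics result objects, and both helper names are made up for this example.

```python
def xyxy_to_xywh(x1, y1, x2, y2):
    """Convert corner coordinates to top-left corner plus width and height."""
    return x1, y1, x2 - x1, y2 - y1

def filter_detections(detections, min_confidence=0.5):
    """Keep detections at or above the threshold, most confident first."""
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)

# Hypothetical detections, shaped like the values read in the detection loop
detections = [
    {"class_name": "person", "confidence": 0.91, "box": (34.0, 50.0, 210.0, 400.0)},
    {"class_name": "dog", "confidence": 0.32, "box": (220.0, 300.0, 330.0, 410.0)},
    {"class_name": "bicycle", "confidence": 0.67, "box": (100.0, 180.0, 380.0, 420.0)},
]

for det in filter_detections(detections):
    x, y, w, h = xyxy_to_xywh(*det["box"])
    print(f"{det['class_name']}: {det['confidence']:.2f} "
          f"at x={x:.0f}, y={y:.0f}, w={w:.0f}, h={h:.0f}")
```

The 0.5 threshold is an arbitrary starting point; a lower value keeps more boxes at the cost of more false positives.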

Data Augmentation

Artificially expand your training data by applying random transformations to each image during training:

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(
        brightness=0.2, contrast=0.2,
        saturation=0.2, hue=0.1
    ),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# For more advanced augmentations
import albumentations as A

transform = A.Compose([
    A.RandomCrop(224, 224),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.GaussNoise(p=0.1),
    A.Normalize(),
])
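
To show what two of these transforms do under the hood, here is a simplified NumPy sketch of a random crop and a random horizontal flip. The function names and the tiny stand-in image are assumptions for illustration; the library versions handle more cases, such as padding and resizing.

```python
import numpy as np

def random_horizontal_flip(image, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    if rng.random() < p:
        return image[:, ::-1]
    return image

def random_crop(image, size, rng):
    """Cut a size x size patch at a random position."""
    height, width = image.shape[:2]
    top = rng.integers(0, height - size + 1)
    left = rng.integers(0, width - size + 1)
    return image[top:top + size, left:left + size]

rng = np.random.default_rng(seed=0)             # seeded so runs are repeatable
image = np.arange(6 * 8 * 3).reshape(6, 8, 3)   # stand-in for a 6x8 RGB image

augmented = random_horizontal_flip(random_crop(image, 4, rng), rng)
print(augmented.shape)
```

Because the randomness is drawn fresh on every call, the same source image produces a different training sample each epoch.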

Best Practices

Start with pretrained models: ImageNet pretrained weights transfer well to many natural-image tasks
Augment your data: Reduces overfitting and improves generalization
Use appropriate image sizes: Balance detail against memory and compute cost
Normalize inputs: Match the normalization used in pretraining
Handle class imbalance: Use a weighted loss or oversample rare classes
Visualize predictions: Grad-CAM highlights the image regions that most influence a prediction
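
For the class imbalance point, one common way to build a weighted loss is to weight each class inversely to its frequency. The sketch below assumes that scheme and a made-up label list; the function name is hypothetical, and other weighting schemes are equally valid.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical, imbalanced label list: 8 cats and 2 dogs
labels = ["cat"] * 8 + ["dog"] * 2
weights = inverse_frequency_weights(labels)
print(weights)
```

In PyTorch, weights like these can be passed, in class-index order, as the weight argument of torch.nn.CrossEntropyLoss so that mistakes on rare classes cost more.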

