What is Computer Vision?
Computer vision is the field of AI that enables machines to interpret and understand visual information from the world: images, videos, and real-time camera feeds. It powers autonomous vehicles, medical imaging, facial recognition, and many other applications.
Deep learning, particularly Convolutional Neural Networks (CNNs), has transformed computer vision, matching or exceeding human-level accuracy on some benchmark tasks such as ImageNet classification.
Common Computer Vision Tasks
- Image Classification: Assign a label to an entire image (cat, dog, car)
- Object Detection: Locate and classify multiple objects with bounding boxes
- Semantic Segmentation: Classify every pixel in the image
- Instance Segmentation: Separate individual object instances, each with its own pixel mask
- Pose Estimation: Detect human body keypoints
- Face Recognition: Identify or verify individuals
Image Processing with OpenCV
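import cv2

# Read image (OpenCV loads images in BGR channel order)
img = cv2.imread('image.jpg')
if img is None:
    raise FileNotFoundError("Could not read 'image.jpg'")
# Convert to RGB for libraries that expect it (e.g. Matplotlib, PIL)
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# Resize (the target size is given as (width, height))
resized = cv2.resize(img, (224, 224))
# Grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Blur with a 5x5 Gaussian kernel (sigma 0 lets OpenCV derive it from the kernel size)
blurred = cv2.GaussianBlur(img, (5, 5), 0)
# Edge detection (lower and upper hysteresis thresholds)
edges = cv2.Canny(gray, 100, 200)
# Draw on images (modifies img in place); example box coordinates
x, y, w, h = 50, 50, 100, 100
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.putText(img, 'Label', (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
# Save
cv2.imwrite('output.jpg', img)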
Image Classification with PyTorch
import torch
import torchvision.transforms as transforms
import torchvision.models as models
from PIL import Image
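# Load pretrained model
# (torchvision 0.13+ uses a weights enum; pretrained=True is deprecated)
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()
# Image preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
# Load and process image (convert to RGB in case the file is grayscale or RGBA)
img = Image.open('cat.jpg').convert('RGB')
img_tensor = transform(img).unsqueeze(0)  # add a batch dimension
# Predict
with torch.no_grad():
    outputs = model(img_tensor)
    _, predicted = torch.max(outputs, 1)
# Get class name from the metadata shipped with the weights
class_name = weights.meta["categories"][predicted.item()]
print(f"Predicted class: {class_name} (index {predicted.item()})")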
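The classification code above reports only the single highest-scoring class. The model's raw outputs are logits, so a softmax is needed before they can be read as probabilities. The sketch below shows that step in plain Python on made-up logits. The `softmax` and `top_k` helpers and the three-class label list are illustrative assumptions, not part of torchvision.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, labels, k=2):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical logits for a three-class model (illustrative values only)
labels = ["cat", "dog", "car"]
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)
for label, p in top_k(probs, labels, k=2):
    print(f"{label}: {p:.3f}")
```

With real tensors, `torch.softmax(outputs, dim=1)` followed by `torch.topk` would play the same role.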
Transfer Learning
Start from a pretrained model and fine-tune it for your specific task:
import torch
import torch.nn as nn
import torchvision.models as models
# Load pretrained ResNet (the weights enum replaces the deprecated pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
# Freeze all layers
for param in model.parameters():
    param.requires_grad = False
# Replace final layer for your number of classes
# (newly created layers have requires_grad=True by default)
num_classes = 5
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(256, num_classes)
)
# Only train new layers
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)
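The setup above stops at the optimizer, so a minimal training loop may help show how the frozen layers and the new head interact. This is a sketch under stated assumptions: a tiny convolutional `backbone` stands in for the pretrained ResNet so the example runs without downloading weights, and a random synthetic batch stands in for a real `DataLoader`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pretrained backbone (assumption: a tiny conv net keeps the sketch fast)
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
for param in backbone.parameters():
    param.requires_grad = False  # frozen, as in the ResNet example

num_classes = 5
head = nn.Linear(8, num_classes)  # new trainable layer
model = nn.Sequential(backbone, head)

# Synthetic batch standing in for a real DataLoader (hypothetical data)
images = torch.randn(16, 3, 32, 32)
targets = torch.randint(0, num_classes, (16,))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(head.parameters(), lr=0.001)

frozen_before = [p.clone() for p in backbone.parameters()]
losses = []
model.train()
backbone.eval()  # keep frozen layers in inference mode (matters for BatchNorm/Dropout)
for epoch in range(3):
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()  # gradients flow only into the head
    optimizer.step()
    losses.append(loss.item())
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```

Keeping the frozen part in `eval()` mode is one possible design choice: it stops BatchNorm running statistics from drifting while only the head is trained.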
Object Detection with YOLO
from ultralytics import YOLO
# Load pretrained YOLOv8
model = YOLO('yolov8n.pt')  # nano model for speed
# Run inference
results = model('image.jpg')
# Process results
for result in results:
    boxes = result.boxes
    for box in boxes:
        # Bounding box coordinates (converted from a tensor to plain floats)
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        # Confidence score
        confidence = float(box.conf[0])
        # Class ID
        class_id = int(box.cls[0])
        class_name = model.names[class_id]
        print(f"{class_name}: {confidence:.2f}")
# Save annotated image
results[0].save(filename='detected.jpg')
# Real-time detection from webcam
# stream=True yields results frame by frame instead of accumulating them in memory
for result in model(source=0, show=True, stream=True):
    pass  # handle each frame's result here
Data Augmentation
Artificially expand your training data:
from torchvision import transforms
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(
        brightness=0.2, contrast=0.2,
        saturation=0.2, hue=0.1
    ),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
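# For more advanced augmentations
import albumentations as A
transform = A.Compose([
    A.RandomCrop(height=224, width=224),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.GaussNoise(p=0.1),
    A.Normalize(),
])
# Albumentations works on NumPy arrays of shape (H, W, C) and is called with keywords:
# augmented = transform(image=image)["image"]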
Best Practices
- Start with pretrained models: ImageNet-pretrained weights transfer well to many tasks
- Augment your data: Reduces overfitting and improves generalization
- Use appropriate image sizes: Balance detail against memory and compute
- Normalize inputs: Match the normalization used in pretraining
- Handle class imbalance: Use a weighted loss or oversample rare classes
- Visualize predictions: Tools such as Grad-CAM highlight the image regions that drive a prediction
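To make the weighted-loss suggestion concrete, here is a small sketch that derives per-class weights from label counts. It uses the common "balanced" heuristic, n_samples / (n_classes * class_count), which is one of several reasonable choices. The `balanced_class_weights` helper and the label list are hypothetical, for illustration only.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical label list: 8 'cat' images and 2 'dog' images
labels = ["cat"] * 8 + ["dog"] * 2
weights = balanced_class_weights(labels)
print(weights)  # {'cat': 0.625, 'dog': 2.5}
```

In PyTorch, values like these could be passed, ordered by class index, as the `weight` tensor of `nn.CrossEntropyLoss` so that mistakes on the rare class cost more.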