Computer Vision: Image Preprocessing 🖼️
The Story of Teaching a Robot to See
Imagine you’re teaching a little robot friend to recognize pictures. But here’s the thing—your robot can only understand pictures that are clean, organized, and the same size. It’s like making sure all the puzzle pieces fit perfectly before solving the puzzle!
Image preprocessing is like being a helpful assistant who prepares photos before showing them to the robot. We clean them up, resize them, adjust the colors, and sometimes even add fun changes to help our robot learn better.
🎯 The Big Picture
Think of image preprocessing like packing a lunchbox:
| Step | Lunchbox Analogy | Image Preprocessing |
|---|---|---|
| 1 | Open the fridge | Image Loading |
| 2 | Cut food into bite-sized pieces | Image Transformations |
| 3 | Add some seasoning | Color Adjustments |
| 4 | Make portions equal | Normalization |
| 5 | Mix things up for variety | Random Augmentation |
| 6 | Pack everything in order | Data Pipelines |
1. Image Loading 📂
What is it?
Loading an image is like opening a book to read it. Before we can do anything with a picture, we need to bring it into our computer’s memory.
Simple Example
Imagine you have a photo on your phone. To edit it, you first need to open the photo app and select the picture. That’s loading!
import tensorflow as tf
# Read the raw bytes of the file from disk
image = tf.io.read_file('cat.jpg')
# Decode the bytes into a grid of pixels;
# channels=3 forces a 3-channel RGB result
image = tf.image.decode_jpeg(image, channels=3)
print(image.shape)
# Output: (height, width, 3)
What happens inside?
graph TD
  A["📁 Image File on Disk"] --> B["Read File as Bytes"]
  B --> C["Decode into Pixels"]
  C --> D["🖼️ Tensor Ready to Use"]
Key Point: Once decoded, an image is a grid of numbers. Each pixel holds three values (red, green, blue), and each value ranges from 0 to 255.
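To make the "grid of numbers" idea concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The 2x2 image and the helper name `shape_of` are made up for illustration; a decoded tensor holds the same kind of values, just stored more efficiently.

```python
# A tiny 2x2 RGB "image": rows -> pixels -> [red, green, blue].
# Each value is an integer from 0 (none) to 255 (full intensity).
tiny_image = [
    [[255, 0, 0], [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
]

def shape_of(image):
    """Return (height, width, channels) for a nested-list image."""
    return (len(image), len(image[0]), len(image[0][0]))

print(shape_of(tiny_image))   # (2, 2, 3)
print(tiny_image[0][1])       # [0, 255, 0] -> the green pixel
```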
2. Image Transformations 🔄
What is it?
Transformations are like playing with Play-Doh—you can stretch it, squish it, flip it, or rotate it!
The Most Common Transformations
Resize (Change Size)
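Make images bigger or smaller. Like zooming in on a map! Note that tf.image.resize returns float32 values and stretches the image to the new size unless you pass preserve_aspect_ratio=True.
# Make image 224x224 pixels
resized = tf.image.resize(
    image,
    [224, 224]
)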
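As a rough picture of what resizing does, the sketch below implements nearest-neighbor resizing in plain Python on a grid of single numbers. This is a simplification under stated assumptions: `tf.image.resize` defaults to bilinear interpolation, which blends neighboring pixels instead of copying one, and the function name `resize_nearest` is hypothetical.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2D grid by copying the nearest source pixel."""
    old_height, old_width = len(image), len(image[0])
    resized = []
    for row in range(new_height):
        # Map each output row/column back to a source row/column.
        src_row = row * old_height // new_height
        resized.append([
            image[src_row][col * old_width // new_width]
            for col in range(new_width)
        ])
    return resized

small = [[1, 2],
         [3, 4]]
big = resize_nearest(small, 4, 4)
for row in big:
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

Every source pixel is simply repeated, which is why nearest-neighbor upscaling looks blocky compared with bilinear.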
Flip (Mirror Image)
Like looking in a mirror!
# Flip left to right
flipped = tf.image.flip_left_right(
image
)
Rotate
Spin the image around!
# Rotate 90 degrees counter-clockwise
# (k is the number of quarter turns)
rotated = tf.image.rot90(image, k=1)
Crop (Cut a Piece)
Like cutting out your favorite part of a magazine picture.
# Keep the central 50% of the height and width
cropped = tf.image.central_crop(
    image,
    central_fraction=0.5
)
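The three geometric operations above can be sketched on a small grid in plain Python. The function names mirror the TensorFlow ones but are stand-ins written for illustration, and the crop rounding is an assumption that may differ slightly from the library for odd sizes.

```python
def flip_left_right(image):
    """Mirror each row."""
    return [row[::-1] for row in image]

def rot90(image):
    """Rotate 90 degrees counter-clockwise."""
    # The last column becomes the first row, and so on.
    width = len(image[0])
    return [[row[col] for row in image] for col in range(width - 1, -1, -1)]

def central_crop(image, fraction):
    """Keep the central `fraction` of rows and columns."""
    height, width = len(image), len(image[0])
    new_h, new_w = int(height * fraction), int(width * fraction)
    top, left = (height - new_h) // 2, (width - new_w) // 2
    return [row[left:left + new_w] for row in image[top:top + new_h]]

grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]

print(flip_left_right(grid)[0])   # [4, 3, 2, 1]
print(rot90(grid)[0])             # [4, 8, 12, 16]
print(central_crop(grid, 0.5))    # [[6, 7], [10, 11]]
```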
Why Transform?
Our robot brain works best when all pictures are the same size, and most models need one fixed input size so that images can be stacked into batches. Imagine trying to compare a tiny stamp to a huge poster. That’s hard! Making them the same size helps.
3. Image Color Adjustments 🎨
What is it?
Color adjustments are like using Instagram filters! We can make pictures brighter, more colorful, or even change them to black and white.
Brightness
Make the picture lighter or darker.
# Make it brighter (+0.2)
bright = tf.image.adjust_brightness(
image,
delta=0.2
)
Think of it like opening curtains to let more light in!
Contrast
Make the difference between light and dark parts stronger.
# Increase contrast
contrast = tf.image.adjust_contrast(
image,
contrast_factor=1.5
)
Like turning up the “pop” in your photo.
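Under the hood, brightness and contrast are simple arithmetic. The sketch below works on one channel of pixel values already scaled to 0-1. The helper names are hypothetical, and the clamping step is an assumption added here to keep values in range; with float tensors in TensorFlow you may need `tf.clip_by_value` yourself.

```python
def clamp(value, low=0.0, high=1.0):
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))

def adjust_brightness(pixels, delta):
    """Add the same amount to every pixel, then clamp to [0, 1]."""
    return [clamp(p + delta) for p in pixels]

def adjust_contrast(pixels, factor):
    """Push every pixel away from (or toward) the mean."""
    mean = sum(pixels) / len(pixels)
    return [clamp((p - mean) * factor + mean) for p in pixels]

pixels = [0.25, 0.5, 0.75]   # one channel, values scaled to 0-1

print(adjust_brightness(pixels, 0.25))  # [0.5, 0.75, 1.0]
print(adjust_contrast(pixels, 2.0))     # [0.0, 0.5, 1.0]
```

Brightness shifts everything by the same amount, while contrast stretches values around their average, so the middle pixel stays put.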
Saturation
Make colors more or less intense.
# More colorful
saturated = tf.image.adjust_saturation(
image,
saturation_factor=2.0
)
Hue
Shift all colors around the rainbow!
# Shift colors
hue_shifted = tf.image.adjust_hue(
image,
delta=0.1
)
Grayscale
Turn the picture into shades of gray (one channel instead of three).
# No more colors: output shape is (height, width, 1)
gray = tf.image.rgb_to_grayscale(image)
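Converting to grayscale is a weighted sum of the red, green, and blue values. The sketch below uses the widely used luminance weights; treat them as an assumption about what a library does internally, since exact values can differ slightly between libraries. The name `to_gray` is made up for illustration.

```python
# Commonly used luminance weights (assumed here). Green counts the
# most because human eyes are most sensitive to it.
WEIGHTS = (0.299, 0.587, 0.114)

def to_gray(pixel):
    """Collapse one [r, g, b] pixel into a single brightness value."""
    r, g, b = pixel
    return round(WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b)

print(to_gray([255, 255, 255]))  # 255 (white stays white)
print(to_gray([0, 0, 0]))        # 0
print(to_gray([255, 0, 0]))      # 76
print(to_gray([0, 255, 0]))      # 150
```

Pure green comes out much brighter than pure red, even though both channels were at full intensity.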
graph TD
  A["🌈 Original Image"] --> B{Color Adjustments}
  B --> C["☀️ Brightness"]
  B --> D["📊 Contrast"]
  B --> E["🎨 Saturation"]
  B --> F["🔄 Hue"]
  B --> G["⬛ Grayscale"]
4. Image Normalization 📏
What is it?
Normalization is like making sure everyone speaks the same language. Raw pixel values range from 0 to 255, but neural networks usually train more reliably on small values, such as 0 to 1 or -1 to 1.
Why Normalize?
Imagine you’re comparing test scores. One test is out of 100, another out of 1000. It’s confusing! If we convert both to percentages, comparing becomes easy.
Simple Normalization (0 to 1)
# Convert to float first (decoded images are uint8),
# then divide by 255
# Values go from 0-255 to 0-1
normalized = tf.cast(image, tf.float32) / 255.0
Centered Normalization (-1 to 1)
# Convert to float, then center around zero
# Values go from 0-255 to -1 to 1
normalized = (tf.cast(image, tf.float32) / 127.5) - 1.0
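A quick worked check of both formulas in plain Python shows where the ends and the midpoint of the 0-255 range land. The function names are hypothetical, and 127.5 is used only as the mathematical midpoint (real pixels are whole numbers).

```python
def to_unit_range(value):
    """Map 0..255 to 0..1."""
    return value / 255.0

def to_centered_range(value):
    """Map 0..255 to -1..1."""
    return value / 127.5 - 1.0

for value in (0, 127.5, 255):
    print(value, to_unit_range(value), to_centered_range(value))
# 0 0.0 -1.0
# 127.5 0.5 0.0
# 255 1.0 1.0
```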
Standard Normalization
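Using a per-channel mean and standard deviation (here, the values commonly used for models pretrained on ImageNet). These statistics assume pixels already scaled to 0-1, so scale first:
# Scale to 0-1, subtract mean, divide by std
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
scaled = tf.cast(image, tf.float32) / 255.0
normalized = (scaled - mean) / std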
| Method | Range | When to Use |
|---|---|---|
| Simple | 0 to 1 | Most cases |
| Centered | -1 to 1 | Some networks prefer this |
| Standard | Centered near 0, no fixed bounds | Pretrained models that expect it |
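To see what range the "Standard" row produces, here is a minimal plain-Python sketch that applies the per-channel statistics to a pure black and a pure white pixel. It assumes the mean and std values are defined for pixels scaled to 0-1; the function name `standardize` is made up for illustration.

```python
# Per-channel statistics commonly used with ImageNet-pretrained models.
# They are defined for pixel values already scaled to 0..1.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def standardize(pixel):
    """Normalize one [r, g, b] pixel given in 0..255."""
    scaled = [channel / 255.0 for channel in pixel]   # 0..255 -> 0..1
    return [(s - m) / d for s, m, d in zip(scaled, MEAN, STD)]

black = standardize([0, 0, 0])
white = standardize([255, 255, 255])
print([round(v, 2) for v in black])   # [-2.12, -2.04, -1.8]
print([round(v, 2) for v in white])   # [2.25, 2.43, 2.64]
```

Skipping the divide-by-255 step would produce values in the hundreds, which is a common source of poor results with pretrained models.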
5. Random Image Augmentation 🎲
What is it?
Augmentation is like teaching with different versions of the same thing. If you only show a robot one picture of a cat, it might not recognize cats from different angles. But if you show MANY variations, it learns better!
The Magic of Randomness
Each time an image is loaded during training, we apply a fresh random change, so the robot rarely sees exactly the same pixels twice. This creates “new” training data from what we already have!
Random Flip
# 50% chance to flip
maybe_flipped = tf.image.random_flip_left_right(
image
)
Random Brightness
# Random brightness change
maybe_bright = tf.image.random_brightness(
image,
max_delta=0.3
)
Random Contrast
# Random contrast change
maybe_contrast = tf.image.random_contrast(
image,
lower=0.8,
upper=1.2
)
Random Crop
# Randomly cut out a 200x200 piece
# (the image must be at least 200x200 with 3 channels)
maybe_cropped = tf.image.random_crop(
    image,
    size=[200, 200, 3]
)
Random Saturation & Hue
# Random color changes
image = tf.image.random_saturation(
image, 0.8, 1.2
)
image = tf.image.random_hue(
image, 0.1
)
Why Random?
graph TD
  A["1 Original Cat Photo"] --> B["Random Augmentation"]
  B --> C["🐱 Flipped Cat"]
  B --> D["🐱 Bright Cat"]
  B --> E["🐱 Cropped Cat"]
  B --> F["🐱 Dark Cat"]
  B --> G["🐱 Rotated Cat"]
  H["Robot sees 5+ versions!"] --> I["Learns Better!"]
One image becomes many training examples!
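The idea of one image turning into many can be sketched in plain Python with the standard `random` module. The helper names echo the TensorFlow ones but are simplified stand-ins written for this example, and the seed value is arbitrary.

```python
import random

def random_flip_left_right(image, rng):
    """Mirror the image with 50% probability."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image

def random_brightness(image, max_delta, rng):
    """Shift every pixel by one random amount in [-max_delta, max_delta]."""
    delta = rng.uniform(-max_delta, max_delta)
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in image]

rng = random.Random(42)          # fixed seed so the run is repeatable
original = [[0.1, 0.5, 0.9]]     # one row, values scaled to 0..1

# Five augmented variants made from the same original image.
variants = [
    random_brightness(random_flip_left_right(original, rng), 0.2, rng)
    for _ in range(5)
]
for variant in variants:
    print([round(p, 2) for p in variant[0]])
```

The original stays untouched, and each pass through the loop yields a slightly different version of it.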
6. Data Augmentation Pipelines 🏭
What is it?
A pipeline is like an assembly line in a factory. Images go in one end, get processed step by step, and come out the other end perfectly prepared!
Building a Pipeline
Instead of doing each step separately, we chain them together:
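def preprocess_image(image_path):
    # Step 1: Load
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)
    # Step 2: Resize (result is float32, still 0-255)
    image = tf.image.resize(
        image, [224, 224]
    )
    # Step 3: Augment (training only)
    image = tf.image.random_flip_left_right(
        image
    )
    # Pixels are still on the 0-255 scale here,
    # so the brightness delta is scaled to match
    image = tf.image.random_brightness(
        image, 0.2 * 255.0
    )
    # Step 4: Normalize and keep values in 0-1
    image = image / 255.0
    image = tf.clip_by_value(image, 0.0, 1.0)
    return image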
Using tf.data for Speed
TensorFlow’s tf.data makes pipelines super fast:
# Create dataset from file paths
dataset = tf.data.Dataset.list_files(
    'images/*.jpg'
)
# Apply preprocessing to several images in parallel
dataset = dataset.map(
    preprocess_image,
    num_parallel_calls=tf.data.AUTOTUNE
)
# Batch images together
dataset = dataset.batch(32)
# Prefetch: prepare the next batch while the
# current one is being used for training
dataset = dataset.prefetch(
    tf.data.AUTOTUNE
)
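Batching itself is a simple idea: group items into fixed-size chunks. The plain-Python generator below is a minimal sketch of that grouping, not how tf.data is implemented; the function name `batch` and the file names are hypothetical.

```python
def batch(items, batch_size):
    """Group an iterable into lists of up to `batch_size` items."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:            # the last batch may be smaller
        yield current

paths = [f"images/{i}.jpg" for i in range(70)]   # hypothetical file names
batches = list(batch(paths, 32))
print([len(b) for b in batches])   # [32, 32, 6]
```

The tf.data `batch` method also keeps a smaller final batch by default; passing `drop_remainder=True` discards it when every batch must have the same size.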
The Complete Flow
graph TD
  A["📁 Image Files"] --> B["Load & Decode"]
  B --> C["Resize to 224x224"]
  C --> D["Random Augmentations"]
  D --> E["Normalize 0-1"]
  E --> F["Batch into Groups"]
  F --> G["🤖 Ready for Training!"]
Training vs Testing
Important: We only use random augmentation during training, not testing!
def preprocess_train(image):
    image = tf.image.resize(image, [224, 224])
    image = tf.image.random_flip_left_right(image)
    # Delta is scaled because pixels are still 0-255 here
    image = tf.image.random_brightness(image, 0.2 * 255.0)
    image = image / 255.0
    return tf.clip_by_value(image, 0.0, 1.0)

def preprocess_test(image):
    image = tf.image.resize(image, [224, 224])
    image = image / 255.0  # No random changes!
    return image
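An alternative to two separate functions is one function with a `training` flag. The plain-Python sketch below shows the pattern on a short list of pixel values; the function name, the flag, and the 0.2 brightness range are assumptions chosen for illustration.

```python
import random

def preprocess(pixels, training, rng=None):
    """Scale 0..255 pixels to 0..1; add random brightness only when training."""
    scaled = [p / 255.0 for p in pixels]
    if training:
        rng = rng or random.Random()
        delta = rng.uniform(-0.2, 0.2)
        scaled = [min(1.0, max(0.0, p + delta)) for p in scaled]
    return scaled

pixels = [0, 51, 255]
print(preprocess(pixels, training=False))   # [0.0, 0.2, 1.0]
print(preprocess(pixels, training=True, rng=random.Random(0)))
```

With `training=False` the output is identical on every call, which is what you want when measuring how well the robot really does.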
🎉 Putting It All Together
Here’s a complete example that combines every step:
import tensorflow as tf

def create_training_pipeline(file_paths):
    def process(path):
        # Load
        img = tf.io.read_file(path)
        img = tf.image.decode_jpeg(img, channels=3)
        # Transform (result is float32, still 0-255)
        img = tf.image.resize(img, [224, 224])
        # Augment (brightness delta scaled to 0-255)
        img = tf.image.random_flip_left_right(img)
        img = tf.image.random_brightness(img, 0.1 * 255.0)
        img = tf.image.random_contrast(img, 0.9, 1.1)
        # Normalize and keep values in 0-1
        img = img / 255.0
        img = tf.clip_by_value(img, 0.0, 1.0)
        return img
    dataset = tf.data.Dataset.from_tensor_slices(
        file_paths
    )
    dataset = dataset.map(process)
    dataset = dataset.batch(32)
    dataset = dataset.prefetch(tf.data.AUTOTUNE)
    return dataset
🧠 Key Takeaways
| Concept | One-Liner |
|---|---|
| Loading | Open the image file |
| Transformations | Resize, flip, rotate, crop |
| Color Adjustments | Brightness, contrast, saturation, hue, grayscale |
| Normalization | Scale values to 0-1 or -1 to 1 |
| Augmentation | Random changes = more training data |
| Pipelines | Chain all steps efficiently |
💪 You’ve Got This!
Image preprocessing might seem like a lot, but remember:
- Load the image
- Resize it to the right size
- Augment it randomly (training only)
- Normalize the pixel values
- Batch and prefetch for speed
That’s it! You’re now ready to prepare images like a pro. Your robot friend will be so happy with the clean, organized pictures you give it! 🤖✨
