Friday, 24 January 2020

Human Vision (Object Detection) and YOLO

How we humans detect and recognise objects with our eyes:
  1. Location: Turn our eyes to some direction (left, right, top, bottom, centre, or some random point)
  2. Boundary: Detect the boundary
  3. Classification: Is there an object? If so, classify it.
The YOLO (You Only Look Once) network works in a somewhat similar way:
  1. Location: Divide an image (or video frame) into an NxN grid, for example 3x3, so the cells also correspond to left, right, top, bottom, centre, etc.
  2. Boundary: From a point in a grid cell, predict a rectangle (the bounding box); that is the boundary
  3. Classification: Is there an object? If so, classify it.
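
The three steps can be sketched in a few lines of Python. This is only a rough illustration, not the actual YOLO code: the function names (cell_of, decode_box, classify), the 0.5 threshold and the sample numbers are all made up for this post. It assumes the original YOLO convention, where the box centre is given as an offset inside its grid cell and the width and height are fractions of the whole image.

```python
# Hypothetical helpers for illustration only; a real YOLO network
# predicts these numbers, here they are simply passed in by hand.

def cell_of(cx, cy, img_w, img_h, s=3):
    """Location: (row, col) of the grid cell containing pixel (cx, cy)."""
    col = min(int(cx * s / img_w), s - 1)  # clamp points on the right edge
    row = min(int(cy * s / img_h), s - 1)  # clamp points on the bottom edge
    return row, col

def decode_box(row, col, x_off, y_off, w_rel, h_rel, img_w, img_h, s=3):
    """Boundary: turn a cell-relative prediction into a pixel rectangle.

    x_off, y_off are in [0, 1] relative to the cell (assumed convention);
    w_rel, h_rel are fractions of the whole image.
    Returns (left, top, right, bottom).
    """
    cx = (col + x_off) / s * img_w
    cy = (row + y_off) / s * img_h
    w = w_rel * img_w
    h = h_rel * img_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def classify(objectness, class_scores, labels, threshold=0.5):
    """Classification: None if no object, else the best-scoring label."""
    if objectness < threshold:
        return None
    best = max(range(len(class_scores)), key=class_scores.__getitem__)
    return labels[best]

if __name__ == "__main__":
    # A 300x300 image on a 3x3 grid, with an object centred in the image.
    row, col = cell_of(150, 150, 300, 300)
    print("cell:", (row, col))
    print("box:", decode_box(row, col, 0.5, 0.5, 0.5, 0.5, 300, 300))
    print("class:", classify(0.9, [0.1, 0.7, 0.2], ["cat", "dog", "car"]))
```

In the real network all grid cells are processed in a single forward pass, which is where the name "You Only Look Once" comes from; the sketch handles one cell at a time only to keep the three steps visible.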