
Video API

Package

video

Video helpers for the drone API.

Optional backends are lazy-loaded so importing the core API does not require OpenCV, PyAV, ONNX Runtime, Flask, or YOLO dependencies.

FrameCallback module-attribute

FrameCallback = Callable[[Frame], Frame | None]

Frame dataclass

Video frame with optional detections and metadata.

The frame flows through the callback pipeline, allowing each callback to add detections or modify the image.
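The pipeline semantics can be sketched as a fold over registered callbacks, where returning None leaves the current frame unchanged. A standalone sketch with minimal stand-in types (not the library's internals):

```python
from dataclasses import dataclass, field

@dataclass
class Frame:  # minimal stand-in mirroring the documented fields
    image: list
    detections: list = field(default_factory=list)

def run_pipeline(frame, callbacks):
    # Each callback may return a modified Frame, or None to keep the current one.
    for cb in callbacks:
        result = cb(frame)
        if result is not None:
            frame = result
    return frame

def tag(frame):
    frame.detections.append("person")  # e.g. a detector adding results
    return frame

def passthrough(frame):
    return None  # observes the frame without replacing it

out = run_pipeline(Frame(image=[]), [tag, passthrough])
```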

image instance-attribute

image: NDArray[uint8]

timestamp class-attribute instance-attribute

timestamp: float = field(default_factory=time)

frame_number class-attribute instance-attribute

frame_number: int = 0

detections class-attribute instance-attribute

detections: list[Detection] = field(default_factory=list)

metadata class-attribute instance-attribute

metadata: dict = field(default_factory=dict)

shape property

shape: tuple[int, int, int]

Image shape (height, width, channels).

height property

height: int

Image height in pixels.

width property

width: int

Image width in pixels.

size property

size: tuple[int, int]

Image size as (width, height).

copy

copy() -> Frame

Create a deep copy of the frame.

draw_detections

draw_detections(thickness: int = 2, font_scale: float = 0.6) -> NDArray[uint8]

Draw detection boxes and labels on a copy of the image.

Parameters:

    thickness (int): Line thickness for boxes. Default: 2
    font_scale (float): Font scale for labels. Default: 0.6

Returns:

    NDArray[uint8]: Annotated image copy

to_rgb

to_rgb() -> NDArray[uint8]

Convert BGR image to RGB.

to_jpeg

to_jpeg(quality: int = 85) -> bytes

Encode frame as JPEG bytes.

Parameters:

    quality (int): JPEG quality (0-100). Default: 85

Returns:

    bytes: JPEG encoded bytes

Detection dataclass

Single object detection result.

label instance-attribute

label: str

confidence instance-attribute

confidence: float

bbox instance-attribute

bbox: BoundingBox
class_id class-attribute instance-attribute

class_id: int | None = None

color class-attribute instance-attribute

color: tuple[int, int, int] = (0, 255, 0)

metadata class-attribute instance-attribute

metadata: dict = field(default_factory=dict)
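Detections are plain dataclasses, so standard list operations cover common post-processing such as thresholding and ranking. A self-contained sketch using a stand-in type (the shipped FilterDetector wraps this kind of logic):

```python
from dataclasses import dataclass

@dataclass
class Detection:  # stand-in with the documented label/confidence fields
    label: str
    confidence: float

dets = [Detection("person", 0.91), Detection("car", 0.40), Detection("person", 0.65)]

# Keep only confident "person" detections, best first.
people = sorted(
    (d for d in dets if d.label == "person" and d.confidence >= 0.6),
    key=lambda d: d.confidence,
    reverse=True,
)
```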

BoundingBox dataclass

Bounding box for detected object.

x instance-attribute

x: int

y instance-attribute

y: int

width instance-attribute

width: int

height instance-attribute

height: int

x2 property

x2: int

Bottom-right X coordinate.

y2 property

y2: int

Bottom-right Y coordinate.

center property

center: tuple[int, int]

Center point (x, y).

area property

area: int

Area in pixels.

to_tuple

to_tuple() -> tuple[int, int, int, int]

Return as (x, y, w, h) tuple.

to_xyxy

to_xyxy() -> tuple[int, int, int, int]

Return as (x1, y1, x2, y2) tuple.
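The derived properties follow directly from (x, y, width, height); a self-contained sketch of the documented arithmetic:

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:  # mirrors the documented fields and properties
    x: int
    y: int
    width: int
    height: int

    @property
    def x2(self) -> int:          # bottom-right X
        return self.x + self.width

    @property
    def y2(self) -> int:          # bottom-right Y
        return self.y + self.height

    @property
    def center(self) -> tuple:    # center point (x, y)
        return (self.x + self.width // 2, self.y + self.height // 2)

    @property
    def area(self) -> int:        # area in pixels
        return self.width * self.height

    def to_xyxy(self) -> tuple:   # (x1, y1, x2, y2)
        return (self.x, self.y, self.x2, self.y2)

box = BoundingBox(x=10, y=20, width=100, height=50)
```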

StreamConfig dataclass

Video stream configuration.

drone_ip class-attribute instance-attribute

drone_ip: str = field(default_factory=lambda: drone_ip)

drone_id class-attribute instance-attribute

drone_id: int = 1

timeout class-attribute instance-attribute

timeout: float = field(default_factory=lambda: timeout_sec)

buffer_size class-attribute instance-attribute

buffer_size: int = field(default_factory=lambda: buffer_size)

rtp_port property

rtp_port: int

Calculate RTP port from drone ID.

generate_sdp

generate_sdp(local_ip: str = '0.0.0.0') -> str

Generate SDP content for RTP stream.

Parameters:

    local_ip (str): Local IP address to receive the stream on. Default: '0.0.0.0' (any interface)

Returns:

    str: SDP file content string

Note

Format based on C# Unity app's SDP generation, with required SDP header fields for FFmpeg/PyAV compatibility.
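For orientation, a representative minimal SDP body for a raw H.264-over-RTP stream is sketched below. This is an assumption about the general shape only; the actual generate_sdp output may differ in detail:

```python
def make_sdp(local_ip: str, rtp_port: int) -> str:
    # Representative SDP for an H.264-over-RTP stream (illustrative, not
    # the library's exact output).
    return "\n".join([
        "v=0",                             # protocol version
        f"o=- 0 0 IN IP4 {local_ip}",      # origin
        "s=Drone Video",                   # session name
        f"c=IN IP4 {local_ip}",            # connection address
        "t=0 0",                           # timing (unbounded session)
        f"m=video {rtp_port} RTP/AVP 96",  # media line, dynamic payload type 96
        "a=rtpmap:96 H264/90000",          # H.264 at the standard 90 kHz clock
    ])

sdp = make_sdp("0.0.0.0", 5600)
```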

StreamState

Bases: IntEnum

Video stream connection state.

DISCONNECTED class-attribute instance-attribute

DISCONNECTED = 0

CONNECTING class-attribute instance-attribute

CONNECTING = 1

CONNECTED class-attribute instance-attribute

CONNECTED = 2

STREAMING class-attribute instance-attribute

STREAMING = 3

ERROR class-attribute instance-attribute

ERROR = 4

STOPPED class-attribute instance-attribute

STOPPED = 5
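Because StreamState is an IntEnum, states compare like integers, which makes "is the stream up" checks straightforward (values as documented above):

```python
from enum import IntEnum

class StreamState(IntEnum):  # values as documented
    DISCONNECTED = 0
    CONNECTING = 1
    CONNECTED = 2
    STREAMING = 3
    ERROR = 4
    STOPPED = 5

state = StreamState.STREAMING
active = state in (StreamState.CONNECTED, StreamState.STREAMING)
```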

Streaming

stream

Video stream decoder using PyAV for RTP/RTSP H.264 streams.

RTSPStream

RTSP video stream decoder using PyAV.

Useful for testing without a drone, or for connecting to any generic RTSP source.

Example:

stream = RTSPStream("rtsp://localhost:8554/stream")
def detect(frame):
    frame.detections = model.detect(frame.image)
    return frame

stream.add_callback(detect)
stream.add_callback(VideoDisplay())
stream.start()
stream.wait()

url property

url: str

RTSP URL.

state property

state: StreamState

Current stream state.

is_streaming property

is_streaming: bool

Check if stream is actively running.

fps property

fps: float

Current frames per second.

frame_count property

frame_count: int

Total frames decoded.

latest_frame property

latest_frame: Frame | None

Get the most recent frame (thread-safe).

last_error property

last_error: str | None

Last error message, if any.

add_callback

add_callback(callback: FrameCallback) -> None

Add a frame processing callback.

remove_callback

remove_callback(callback: FrameCallback) -> bool

Remove a callback.

clear_callbacks

clear_callbacks() -> None

Remove all callbacks.

get_buffered_frames

get_buffered_frames(count: int | None = None) -> list[Frame]

Get frames from the buffer.

start

start(blocking: bool = False) -> None

Start the video stream.

stop

stop(timeout: float = 5.0) -> None

Stop the video stream.

wait

wait() -> None

Block until stream stops.

VideoStream

RTP H.264 video stream decoder using PyAV.

Decodes drone video stream and passes frames through a callback pipeline. Supports multiple callbacks for detection, display, recording, etc.

Example:

stream = VideoStream(drone_ip="192.168.100.1")

# Add detection callback
def detect(frame):
    frame.detections = my_model.detect(frame.image)
    return frame
stream.add_callback(detect)

# Add display
stream.add_callback(display)

stream.start()
stream.wait()  # Block until stopped

config instance-attribute

config = StreamConfig(drone_ip=drone_ip or drone_ip, drone_id=drone_id, timeout=timeout if timeout is not None else timeout_sec, buffer_size=buffer_size if buffer_size is not None else buffer_size)

state property

state: StreamState

Current stream state.

is_streaming property

is_streaming: bool

Check if stream is actively running.

fps property

fps: float

Current frames per second.

frame_count property

frame_count: int

Total frames decoded.

latest_frame property

latest_frame: Frame | None

Get the most recent frame (thread-safe).

last_error property

last_error: str | None

Last error message, if any.

add_callback

add_callback(callback: FrameCallback) -> None

Add a frame processing callback.

Callbacks are executed in order. Each receives the frame (potentially modified by previous callbacks) and can:

  • Add detections
  • Modify the image
  • Return modified Frame or None

Parameters:

    callback (FrameCallback): Function taking a Frame and returning a Frame or None. Required.

remove_callback

remove_callback(callback: FrameCallback) -> bool

Remove a callback.

Parameters:

    callback (FrameCallback): Callback to remove. Required.

Returns:

    bool: True if removed, False if not found

clear_callbacks

clear_callbacks() -> None

Remove all callbacks.

get_buffered_frames

get_buffered_frames(count: int | None = None) -> list[Frame]

Get frames from the buffer.

Parameters:

    count (int | None): Number of frames to get (None for all). Default: None

Returns:

    list[Frame]: List of frames (oldest first)

start

start(blocking: bool = False) -> None

Start the video stream.

Parameters:

    blocking (bool): If True, block until the stream stops. Default: False

stop

stop(timeout: float = 5.0) -> None

Stop the video stream.

Parameters:

    timeout (float): Seconds to wait for the thread to stop. Default: 5.0

wait

wait() -> None

Block until stream stops.

VideoStreamSimple

Simplified video stream for quick testing.

Opens stream and yields frames directly without callbacks.

Example:

for frame in VideoStreamSimple("192.168.100.1"):
    cv2.imshow("Video", frame.image)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

config instance-attribute

config = StreamConfig(drone_ip=drone_ip or drone_ip, drone_id=drone_id, timeout=timeout if timeout is not None else timeout_sec)

close

close() -> None

Close stream and cleanup.

Display

display

OpenCV-based video display for drone stream.

VideoDisplay

OpenCV window display as a frame callback.

Displays frames with optional detection overlays, FPS counter, and recording indicator.

Example:

stream = VideoStream(drone_ip="192.168.100.1")
display = VideoDisplay(window_name="Drone", show_fps=True)
stream.add_callback(display)
stream.start()

# Display handles its own window events
# Press 'q' to quit, 's' to screenshot

window_name instance-attribute

window_name = window_name

show_fps instance-attribute

show_fps = show_fps

show_detections instance-attribute

show_detections = show_detections

show_info instance-attribute

show_info = show_info

scale instance-attribute

scale = scale

fullscreen instance-attribute

fullscreen = fullscreen

on_key instance-attribute

on_key = on_key

fps property

fps: float

Current display FPS.

should_stop property

should_stop: bool

Check if user requested stop (pressed 'q').

set_recording

set_recording(recording: bool) -> None

Set recording indicator state.

close

close() -> None

Close the display window.

VideoDisplayAsync

Async display that runs in its own thread.

Useful when you want display to run independently of the frame processing pipeline.

Example:

display = VideoDisplayAsync()
display.start()

for frame in stream:
    detections = detector.detect(frame)
    frame.detections = detections
    display.update(frame)  # Non-blocking

display.stop()

should_stop property

should_stop: bool

Check if user requested stop.

start

start() -> None

Start display thread.

stop

stop(timeout: float = 2.0) -> None

Stop display thread.

update

update(frame: Frame) -> None

Update displayed frame (non-blocking).

Parameters:

    frame (Frame): New frame to display. Required.

show_frame

show_frame(frame: Frame, window_name: str = 'Frame', wait_key: int = 0, show_detections: bool = True) -> int

Quick utility to show a single frame.

Parameters:

    frame (Frame): Frame to display. Required.
    window_name (str): Window name. Default: 'Frame'
    wait_key (int): cv2.waitKey argument (0 = wait forever). Default: 0
    show_detections (bool): Draw detections if present. Default: True

Returns:

    int: Key code pressed

Recording

recording

Video recording functionality using PyAV.

Provides callback-based recording for the VideoStream pipeline.

RecordingConfig dataclass

Configuration for video recording.

output_path instance-attribute

output_path: str

codec class-attribute instance-attribute

codec: str = 'libx264'

fps class-attribute instance-attribute

fps: float = 30.0

bitrate class-attribute instance-attribute

bitrate: int | None = None

pix_fmt class-attribute instance-attribute

pix_fmt: str = 'yuv420p'

preset class-attribute instance-attribute

preset: str = 'fast'

crf class-attribute instance-attribute

crf: int = 23

container_format class-attribute instance-attribute

container_format: str | None = None

VideoRecorder

Video recording callback for VideoStream.

Records frames to a video file using PyAV (FFmpeg). Can be used as a callback in the VideoStream pipeline.

Example:

stream = VideoStream(drone_ip="192.168.100.1")

# Basic recording
recorder = VideoRecorder("flight.mp4")
stream.add_callback(recorder)
stream.start()
# ... stream video ...
stream.stop()
recorder.close()

# With context manager
with VideoRecorder("flight.mp4") as recorder:
    stream.add_callback(recorder)
    stream.start()
    stream.wait()
# File automatically finalized

# Record with detections drawn
recorder = VideoRecorder("flight_annotated.mp4", draw_detections=True)

config instance-attribute

config = RecordingConfig(output_path=str(output_path), codec=codec, fps=fps, bitrate=bitrate, preset=preset, crf=crf)

frame_count property

frame_count: int

Number of frames recorded.

duration property

duration: float

Recording duration in seconds.

is_recording property

is_recording: bool

Check if recorder is active.

write_frame

write_frame(frame: Frame) -> None

Manually write a frame (alternative to callback usage).

Parameters:

    frame (Frame): Frame to record. Required.

close

close() -> None

Finalize and close the video file.

Must be called when recording is complete to flush remaining frames and write file trailer.

SegmentedRecorder

Records video in segments of fixed duration.

Useful for long recordings or when you want multiple smaller files instead of one large file.

Example:

# Record in 5-minute segments
recorder = SegmentedRecorder(
    output_dir="recordings",
    segment_duration=300,  # 5 minutes
    filename_pattern="flight_{timestamp}_{segment}.mp4"
)
stream.add_callback(recorder)
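The filename_pattern placeholders shown above can be filled with plain str.format, and duration-based rotation reduces to integer division of elapsed time by the segment length. A sketch of that bookkeeping (helper names are hypothetical, not the implementation):

```python
def segment_index(elapsed_s: float, segment_duration: float = 300.0) -> int:
    # Which segment a frame at `elapsed_s` seconds belongs to.
    return int(elapsed_s // segment_duration)

pattern = "flight_{timestamp}_{segment}.mp4"
# A frame 601 s into the recording falls in the third 5-minute segment (index 2).
name = pattern.format(timestamp="20240501T120000", segment=segment_index(601.0))
```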

close

close() -> None

Close current segment and finalize.

Detection

detection

Object detection integration for video streaming.

Provides YOLO and other detector wrappers as VideoStream callbacks.

DetectorConfig dataclass

Configuration for object detectors.

confidence_threshold class-attribute instance-attribute

confidence_threshold: float = 0.25

iou_threshold class-attribute instance-attribute

iou_threshold: float = 0.45

max_detections class-attribute instance-attribute

max_detections: int = 100

classes class-attribute instance-attribute

classes: list[int] | None = None

device class-attribute instance-attribute

device: str = 'auto'

half class-attribute instance-attribute

half: bool = False

verbose class-attribute instance-attribute

verbose: bool = False

BaseDetector

Bases: ABC

Abstract base class for object detectors.

Subclass this to integrate custom detection models.

config instance-attribute

config = config or DetectorConfig()

avg_inference_time property

avg_inference_time: float

Average inference time in milliseconds.

detect abstractmethod

detect(image: ndarray) -> list[Detection]

Run detection on an image.

Parameters:

    image (ndarray): BGR image (numpy array). Required.

Returns:

    list[Detection]: List of Detection objects
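A custom detector only needs to implement detect(). A minimal sketch against a stand-in base class (the real BaseDetector lives in the video package and additionally tracks inference time; the toy model here is purely illustrative):

```python
from abc import ABC, abstractmethod

class BaseDetector(ABC):  # stand-in for the documented ABC
    @abstractmethod
    def detect(self, image) -> list:
        ...

class BrightnessDetector(BaseDetector):
    """Toy detector: reports a hit when the mean pixel value is high."""

    def __init__(self, threshold: float = 128.0):
        self.threshold = threshold

    def detect(self, image) -> list:
        flat = [v for row in image for v in row]      # flatten 2-D grayscale
        mean = sum(flat) / len(flat)
        return ["bright"] if mean >= self.threshold else []

det = BrightnessDetector()
hits = det.detect([[200, 220], [210, 240]])
```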

YOLODetector

Bases: BaseDetector

YOLO object detector using ultralytics library.

Supports YOLOv8, YOLOv9, YOLOv10, YOLO11, and YOLO-World models.

Example:

from pypack.video import VideoStream, YOLODetector, VideoDisplay

# Basic usage
detector = YOLODetector("yolov8n.pt")
stream = VideoStream(drone_ip="192.168.100.1")
stream.add_callback(detector)
stream.add_callback(VideoDisplay())
stream.start()

# With custom configuration
detector = YOLODetector(
    model_path="yolov8s.pt",
    confidence=0.5,
    classes=[0, 1, 2],  # person, bicycle, car
    device="cuda",
)

# Using YOLO-World for open vocabulary detection
detector = YOLODetector("yolov8s-world.pt")
detector.set_classes(["person", "drone", "car"])

class_names property

class_names: dict

Get model class names (loads model if needed).

set_classes

set_classes(classes: list[str]) -> None

Set classes for YOLO-World open vocabulary detection.

Parameters:

    classes (list[str]): List of class names to detect. Required.

detect

detect(image: ndarray) -> list[Detection]

Run YOLO detection on image.

Parameters:

    image (ndarray): BGR image (numpy array). Required.

Returns:

    list[Detection]: List of Detection objects

YOLOSegmentDetector

Bases: YOLODetector

YOLO instance segmentation detector.

Returns detections with segmentation masks in metadata.

Example:

detector = YOLOSegmentDetector("yolov8n-seg.pt")
stream.add_callback(detector)

# Access masks
for det in frame.detections:
    mask = det.metadata.get("mask")  # Binary mask

detect

detect(image: ndarray) -> list[Detection]

Run YOLO segmentation.

FilterDetector

Bases: BaseDetector

Wrapper that filters detections from another detector.

Useful for filtering by class, confidence, size, or region.

Example:

# Only detect persons with high confidence
detector = YOLODetector("yolov8n.pt")
filtered = FilterDetector(
    detector,
    classes=["person"],
    min_confidence=0.7,
)
stream.add_callback(filtered)

detect

detect(image: ndarray) -> list[Detection]

Run detection with filtering.

DrawDetections

Callback that draws detections on frames.

Use after a detector in the pipeline to visualize results.

Example:

stream.add_callback(detector)
stream.add_callback(DrawDetections())
stream.add_callback(display)

DetectionLogger

Callback that logs detections for analysis.

Example:

logger = DetectionLogger()
stream.add_callback(detector)
stream.add_callback(logger)
# ... stream ...
print(logger.get_summary())

total_detections property

total_detections: int

Total number of detections logged.

class_counts property

class_counts: dict

Detection counts per class.

get_summary

get_summary() -> str

Get detection summary string.

clear

clear() -> None

Clear logged data.

FrameCrop

Callback that crops margins from video frames.

Use BEFORE detection to remove unwanted regions (e.g., propellers, sky, ground). Detection coordinates will be relative to the cropped frame.

Example:

# Remove top 300px and bottom 200px from each frame
crop = FrameCrop(top=300, bottom=200)
stream.add_callback(crop)       # 1. Crop first
stream.add_callback(detector)   # 2. Then detect
stream.add_callback(display)

top instance-attribute

top = max(0, top)

bottom instance-attribute

bottom = max(0, bottom)

left instance-attribute

left = max(0, left)

right instance-attribute

right = max(0, right)

enabled instance-attribute

enabled = True
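The crop itself amounts to slicing the image array; with numpy the equivalent operation is image[top:h-bottom, left:w-right]. A list-based sketch of the same arithmetic:

```python
def crop(image, top=0, bottom=0, left=0, right=0):
    # Equivalent of image[top:h-bottom, left:w-right] on a numpy array.
    h = len(image)
    w = len(image[0])
    return [row[left:w - right] for row in image[top:h - bottom]]

# A 4x6 "image" where pixel value encodes (row, column).
img = [[c + 10 * r for c in range(6)] for r in range(4)]
out = crop(img, top=1, bottom=1, left=2, right=1)
```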

SaveDetectionCrop

Callback that saves cropped images of detected objects.

Use AFTER detection to save each detected object as a separate image file.

Example:

saver = SaveDetectionCrop(save_dir="detections", one_per_class=True)
stream.add_callback(detector)   # 1. Detect first
stream.add_callback(saver)      # 2. Save crops
stream.add_callback(display)

# After streaming, check saved images:
print(saver.saved_files)  # List of saved file paths

saved_files property

saved_files: list[str]

List of saved file paths.

saved_classes property

saved_classes: set

Set of class labels that have been saved.

reset

reset() -> None

Reset saved state to allow saving again.

ONNXDetector

Bases: BaseDetector

ONNX Runtime detector for YOLO-style models.

Handles models with output shape [1, 4+num_classes, num_boxes]. Applies NMS to filter detections.

Example:

from pypack.video import ONNXDetector, DrawDetections, VideoDisplay

detector = ONNXDetector(
    model_path="model.onnx",
    class_names=["House", "Tank", "Tree"],
    confidence=0.3,
)

stream = drone.start_video_stream(display=False)
stream.add_callback(detector)
stream.add_callback(DrawDetections())
stream.add_callback(VideoDisplay())

detect

detect(image: ndarray) -> list[Detection]

Run detection on image.

Parameters:

    image (ndarray): BGR image (numpy array). Required.

Returns:

    list[Detection]: List of Detection objects
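Decoding the documented [1, 4+num_classes, num_boxes] output amounts to taking, per candidate box, the best class score and thresholding it (NMS omitted here). A standalone sketch with toy numbers, not the library's implementation:

```python
def decode(output, class_names, conf=0.3):
    # output rows: [cx, cy, w, h, score_class0, score_class1, ...],
    # i.e. the model tensor transposed so each row is one candidate box.
    dets = []
    for cx, cy, w, h, *scores in output:
        best = max(range(len(scores)), key=lambda i: scores[i])
        if scores[best] >= conf:
            dets.append((class_names[best], scores[best], (cx, cy, w, h)))
    return dets

boxes = [
    [50, 50, 20, 20, 0.05, 0.80, 0.10],  # confident "Tank"
    [10, 10, 5, 5, 0.10, 0.10, 0.15],    # below threshold, dropped
]
dets = decode(boxes, ["House", "Tank", "Tree"], conf=0.3)
```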

Web

web

Web streaming support for drone video.

Provides MJPEG streaming over HTTP for browser viewing. Can be used standalone or as a frame callback.

Requires Flask (optional dependency):

pip install flask

MJPEGStreamer

MJPEG streamer as a frame callback.

Buffers frames and provides a generator for HTTP streaming. Compatible with Flask, FastAPI, or any WSGI/ASGI framework.

Example with Flask:

from flask import Flask, Response
from pypack.video import VideoStream, MJPEGStreamer

app = Flask(__name__)
streamer = MJPEGStreamer()

# Add to video stream
stream = VideoStream()
stream.add_callback(streamer)
stream.start()

@app.route('/video')
def video_feed():
    return Response(
        streamer.generate(),
        mimetype='multipart/x-mixed-replace; boundary=frame'
    )

app.run(host='0.0.0.0', port=5000)

quality instance-attribute

quality = quality

max_fps instance-attribute

max_fps = max_fps

draw_detections instance-attribute

draw_detections = draw_detections

frame_count property

frame_count: int

Number of frames processed.

client_count property

client_count: int

Number of active streaming clients.

get_frame

get_frame() -> bytes | None

Get latest JPEG frame.

Returns:

    bytes | None: JPEG bytes or None if no frame available

generate

generate() -> Generator[bytes, None, None]

Generate MJPEG stream for HTTP response.

Yields:

    bytes: MJPEG frame bytes with boundary markers
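Each yielded chunk is one multipart part: a boundary line, part headers, then the JPEG payload. A sketch of the standard multipart/x-mixed-replace framing (matching the boundary=frame used in the Flask example above; the exact bytes the library emits may vary):

```python
def mjpeg_part(jpeg: bytes, boundary: str = "frame") -> bytes:
    # One part of a multipart/x-mixed-replace MJPEG stream.
    return (
        f"--{boundary}\r\n"
        "Content-Type: image/jpeg\r\n"
        f"Content-Length: {len(jpeg)}\r\n\r\n"
    ).encode() + jpeg + b"\r\n"

part = mjpeg_part(b"\xff\xd8...jpeg bytes...\xff\xd9")
```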

WebStreamServer

Standalone web server for video streaming.

Provides a simple Flask-based server for viewing the drone video stream in a web browser.

Example:

from pypack.video import VideoStream, WebStreamServer

stream = VideoStream()

# Start web server (runs in background)
server = WebStreamServer(stream, port=5000)
server.start()

print("Open http://localhost:5000 in browser")

stream.start(blocking=True)

host instance-attribute

host = host

port instance-attribute

port = port

url property

url: str

Server URL.

streamer property

streamer: MJPEGStreamer

Get the MJPEG streamer instance.

start

start(blocking: bool = False) -> None

Start the web server.

Parameters:

    blocking (bool): If True, block until the server stops. Default: False

stop

stop() -> None

Stop the web server.