FiVE-Bench: Fine-grained Video Editing Benchmark Leaderboard

Welcome to the FiVE-Bench Leaderboard

This leaderboard compares diffusion-based and flow-based video editing methods on the FiVE benchmark.

Metrics evaluated:

  • Structure Preservation: Structure distance (Structure Dist.) between the edited video and the source video
  • Background Preservation: PSNR, LPIPS, MSE, SSIM metrics
  • Text Alignment: CLIPS and CLIPS_edit scores
  • Image Quality: NIQE (Natural Image Quality Evaluator)
  • Motion Fidelity: MFS (Motion Fidelity Score)
  • Efficiency: Processing time per frame, in seconds
  • VLM-based Editing Accuracy: FIVE-YN, FIVE-MC, FiVE-U, FiVE-∩, and FiVE-Acc metrics

In the column headers, ↑ means higher is better and ↓ means lower is better. Methods marked with * require optimization; methods marked with † require depth or segmentation maps.
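
The marker convention can be handled programmatically when processing method names. The following is a minimal sketch, assuming the markers always appear as trailing characters of the label (as in "DMT*"); the `MethodEntry` class and `parse_method_label` function are hypothetical names introduced here for illustration and are not part of the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodEntry:
    name: str                    # label with trailing markers removed
    requires_optimization: bool  # True if the label carried a "*"
    requires_maps: bool          # True if the label carried a "†"


def parse_method_label(label: str) -> MethodEntry:
    """Split a label such as "DMT*" into a clean name and marker flags.

    Assumption: markers are only ever appended to the end of the label.
    """
    name = label.rstrip("*† ")    # strip trailing markers and spaces
    suffix = label[len(name):]    # whatever was stripped off
    return MethodEntry(
        name=name,
        requires_optimization="*" in suffix,
        requires_maps="†" in suffix,
    )


if __name__ == "__main__":
    for label in ["DMT*", "Wan-Edit"]:
        print(parse_method_label(label))
```

Keeping the flags separate from the name makes it possible to filter, for example, to methods that need neither optimization nor auxiliary maps before comparing them.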

{
  "headers": ["model", "model_type", "Structure Dist.↓", "PSNR↑", "LPIPS↓", "MSE↓", "SSIM↑", "CLIPS↑", "CLIPS_edit↑", "NIQE↓", "MFS↑", "Time (s)↓", "FIVE-YN↑", "FIVE-MC↑", "FiVE-U↑", "FiVE-∩↑", "FiVE-Acc↑"],
  "data": [
    ["DMT*", "Diffusion-based", 85.95, 51.64, 51.64, 51.64, 51.64, 21.44, 21.44, 5.24, 82.3, 25.98, 48.42, 48.42, 48.42, 48.42, 48.42],
    ["Wan-Edit", "Flow-based", 12.53, 82.55, 82.55, 82.55, 82.55, 21.23, 21.23, 6.54, 89.43, 3.07, 46.97, 46.97, 46.97, 46.97, 46.97],
    ["Pyramid-Edit", "Flow-based", 28.65, 71.72, 71.72, 71.72, 71.72, 20.2, 20.2, 5.48, 80.59, 1.44, 43.84, 43.84, 43.84, 43.84, 43.84],
    ["AnyV2V", "Diffusion-based", 71.36, 50.77, 50.77, 50.77, 50.77, 19.72, 19.72, 5.04, 60.36, 6.11, 38.02, 38.02, 38.02, 38.02, 38.02],
    ["VideoGrain", "Diffusion-based", 12.4, 79.13, 79.13, 79.13, 79.13, 20.31, 20.31, 4.08, 88.57, 27.12, 37.23, 37.23, 37.23, 37.23, 37.23],
    ["TokenFlow", "Diffusion-based", 35.62, 72.51, 72.51, 72.51, 72.51, 21.15, 21.15, 4.01, 89, 8.04, 27.43, 27.43, 27.43, 27.43, 27.43],
    ["VidToMe", "Diffusion-based", 22.37, 70.69, 70.69, 70.69, 70.69, 21.05, 21.05, 4.68, 90.06, 3.25, 26.77, 26.77, 26.77, 26.77, 26.77],
    ["Source videos", "Baseline", 0, 100, 100, 100, 100, 19.87, 19.87, 6.33, 93.76, 0, 0, 0, 0, 0, 0]
  ],
  "metadata": null
}
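
Because each column header ends in ↑ or ↓, the sort direction for a ranking can be read directly from the header. The sketch below illustrates this on a three-column, three-row excerpt of the table above, using the same "headers"/"data" layout. The `rank_by` function is a hypothetical helper written for this example, not part of the leaderboard code.

```python
import json

# Excerpt of the leaderboard table (values copied from the rows above).
TABLE_JSON = """
{
  "headers": ["model", "Structure Dist.↓", "MFS↑"],
  "data": [
    ["Wan-Edit", 12.53, 89.43],
    ["VideoGrain", 12.4, 88.57],
    ["TokenFlow", 35.62, 89]
  ]
}
"""


def rank_by(table: dict, column: str) -> list[tuple[str, float]]:
    """Return (model, value) pairs ordered best-first for the given column.

    The direction comes from the arrow at the end of the header:
    "↑" sorts descending, "↓" sorts ascending.
    """
    headers = table["headers"]
    col = headers.index(column)
    name_col = headers.index("model")
    if column.endswith("↑"):
        descending = True
    elif column.endswith("↓"):
        descending = False
    else:
        raise ValueError(f"column {column!r} has no direction arrow")
    rows = sorted(table["data"], key=lambda row: row[col], reverse=descending)
    return [(row[name_col], row[col]) for row in rows]


table = json.loads(TABLE_JSON)

if __name__ == "__main__":
    for header in ("Structure Dist.↓", "MFS↑"):
        print(header, rank_by(table, header))
```

When ranking the full table, the "Source videos" baseline row would probably need to be filtered out first, since its values describe unedited videos rather than an editing method.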