Acceptable Audio Latency and Lip Sync Error

What is the “Best” Output Audio Latency?

Outside of music creation and live performance technology, output audio latency measurements for consumer electronics matter almost entirely because of audio/video synchronization (lip sync). Even audio latency issues with Bluetooth headsets are most prominent when the headset is paired with video output on a smartphone, tablet, or PC.

For this reason, the “best” output audio latency for any given device is the audio latency that will most closely match the video latency of the display it will be used alongside. Said differently, a system’s total lip sync error should be as close to zero as possible.
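As a rough illustration of this idea, the sketch below computes lip sync error as video latency minus audio latency, so that a negative value means audio lags video (this sign convention is an assumption inferred from the TV example later in this article). The device names and latency values are hypothetical, and `best_audio_match` is just an illustrative helper.

```python
def lip_sync_error_ms(video_latency_ms: float, audio_latency_ms: float) -> float:
    """Lip sync error in ms: positive means audio leads video,
    negative means audio lags video (assumed sign convention)."""
    return video_latency_ms - audio_latency_ms


def best_audio_match(video_latency_ms: float, candidates: dict) -> tuple:
    """Return the (name, audio_latency_ms) pair whose latency is closest
    to the display's video latency, i.e. the smallest absolute error."""
    return min(
        candidates.items(),
        key=lambda item: abs(lip_sync_error_ms(video_latency_ms, item[1])),
    )


# Hypothetical audio devices and latencies, for illustration only.
candidates = {"soundbar": 60.0, "headphones": 5.0, "receiver": 35.0}

# With a display that has 40 ms of video latency, the lowest-latency
# device (headphones) is not the closest match.
print(best_audio_match(40.0, candidates))
```

In this made-up case the receiver is the closest match even though the headphones have the lowest latency, which is the point of the paragraph above.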

Lip sync issues are more perceptible when audio leads video than when video leads audio [1]. This is likely why the HDMI Specification forbids an HDMI device whose audio latency is lower than its video latency by more than 20ms: correcting that difference would “mean that an upstream device would have to delay the video (compared to the audio) which is cumbersome or impossible” [2]. For these reasons, when video latency is high, a lower audio latency is worse than a higher one.
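The 20ms rule described above can be expressed as a simple check. This is only a sketch of the rule as paraphrased in this article, not the wording or any API of the HDMI Specification; the function name and the example latencies are hypothetical.

```python
# Limit quoted in the paragraph above: audio latency may not be lower
# than video latency by more than this many milliseconds.
MAX_AUDIO_LEAD_MS = 20.0


def audio_leads_too_much(video_latency_ms: float,
                         audio_latency_ms: float,
                         limit_ms: float = MAX_AUDIO_LEAD_MS) -> bool:
    """True when audio arrives earlier than video by more than limit_ms."""
    return (video_latency_ms - audio_latency_ms) > limit_ms


# Hypothetical devices, for illustration only.
print(audio_leads_too_much(50.0, 20.0))  # audio leads by 30 ms
print(audio_leads_too_much(50.0, 30.0))  # audio leads by exactly 20 ms
print(audio_leads_too_much(4.0, 96.0))   # audio lags, so the rule is not triggered
```

Note that the check is one-sided: a device whose audio lags its video is not caught by it, since that direction can be corrected by delaying audio elsewhere in the chain.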

What is an Acceptable Lip Sync Error?


Esports

Humans can react to auditory stimuli faster than to visual stimuli [3], so audio latency should be as close to zero as possible without causing a noticeable lip sync error. A display used for esports is assumed to have a low video latency, much less than 20ms, so the preferred choice is simply the audio device with the lowest possible latency. When that device is paired with a low-latency display, lip sync error will not be a significant issue because both devices have low latency.

Author’s note: I have measured the latency of the BenQ ZOWIE gaming monitor and found 1ms of video latency and 1ms of audio latency. With this as a key reference point, I believe it is reasonable to expect esports setups to have no more than 4ms of total audio or video latency.

General Use

While a significant amount of research has been performed to assess what lip sync error is acceptable to humans, it cannot be expected that all media and game content will have perfectly synchronized audio and video. For example, a television show or movie may have some scenes with poor audio/video synchronization due to production quality limitations, or a video game’s sound effect playback might be slightly delayed due to processing and lead-in time. When these existing lip sync errors are added to the lip sync error of the user’s setup, the combined error may cross the threshold at which it becomes noticeable to the user. For these reasons, a lip sync error as close to zero as possible is recommended.
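The way a content error and a setup error add up can be sketched as follows. The threshold values used here (45ms of audio lead, 125ms of audio lag) are figures often quoted as detectability thresholds from ITU-R BT.1359-1; treat them as assumptions and check the recommendation itself before relying on them. The error values are hypothetical, with negative numbers meaning audio lags video.

```python
def combined_error_ms(content_error_ms: float, setup_error_ms: float) -> float:
    """Errors in the content and in the playback setup simply add."""
    return content_error_ms + setup_error_ms


def is_noticeable(error_ms: float,
                  lead_threshold_ms: float,
                  lag_threshold_ms: float) -> bool:
    """True when audio leads by more than lead_threshold_ms or lags by
    more than lag_threshold_ms (negative error means audio lags)."""
    return error_ms > lead_threshold_ms or error_ms < -lag_threshold_ms


# Assumed thresholds (see the note above); not taken from this article.
LEAD_MS, LAG_MS = 45.0, 125.0

content_error = -50.0  # hypothetical: a scene mastered with audio 50 ms late
setup_error = -90.0    # hypothetical: the user's setup adds 90 ms of audio lag

total = combined_error_ms(content_error, setup_error)
print(is_noticeable(content_error, LEAD_MS, LAG_MS))  # content alone
print(is_noticeable(setup_error, LEAD_MS, LAG_MS))    # setup alone
print(is_noticeable(total, LEAD_MS, LAG_MS))          # both together
```

Under these assumed thresholds, neither error crosses the line on its own, but their sum does, which is why leaving headroom in the setup matters.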

Common Scenarios

The following list describes common consumer electronics setups and the challenges of audio/video synchronization that arise.

  1. A single device that handles both audio and video output, such as a TV.
    • Audio/video synchronization may be affected by different operation modes, such as a TV’s game mode. For example, the Sony X800H TV, which has approximately 4ms of video latency in game mode, has 96ms of audio latency in game mode when using speaker output [4]. This results in a lip sync error of around -92ms and demonstrates that a device which handles both audio and video output may still have problematic audio/video synchronization.
  2. Different devices for audio and video output where these devices cannot be automatically synchronized.
    • When video latency is low: audio latency should be 20ms or less to match the low video latency.
      • Examples: computer monitor, computer projector, smartphone, tablet, or TV in game mode
    • When video latency may be high: it is impossible to give a general recommendation for the best audio latency in this scenario. If using a receiver, an audio delay may be added through device settings to attempt to match audio latency to video latency and reduce lip sync error.
      • Examples: TV or home theatre projector that is not in game/computer mode
  3. Different devices for audio and video output where these devices can be automatically synchronized, for example through HDMI “Auto Lip Sync” or ARC/eARC/SPDIF output.
    • HDMI “Auto Lip Sync” should reduce lip sync error, but cannot be expected to do so in every case. For example, a low-latency TV operating in game mode paired with a receiver that has a high minimum latency will still be limited by that minimum latency, which will result in a high lip sync error. An example of such a receiver is the Yamaha RX-V4A when given a stereo input [5].
    • ARC/eARC/SPDIF output may or may not be delayed by a TV to match its video latency, or may be delayed because of the TV’s audio processing time [6]. This may have a substantial impact on lip sync error. The audio output device receiving the ARC/eARC/SPDIF signal may introduce additional latency that could have a problematic impact on lip sync error.
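The scenarios above share one constraint: an audio delay can be added, but audio latency cannot be removed. A minimal sketch of that reasoning, using hypothetical function names and the same video-minus-audio sign convention as the -92ms TV example:

```python
def audio_delay_to_add_ms(video_latency_ms: float, audio_latency_ms: float) -> float:
    """Delay to add on the audio side so that audio latency matches video
    latency. Returns 0 when audio already lags, since latency cannot be removed."""
    return max(0.0, video_latency_ms - audio_latency_ms)


def residual_error_ms(video_latency_ms: float, audio_latency_ms: float) -> float:
    """Lip sync error left over after applying the best possible audio delay."""
    delay = audio_delay_to_add_ms(video_latency_ms, audio_latency_ms)
    return video_latency_ms - (audio_latency_ms + delay)


# Hypothetical high-latency TV with a faster receiver: fully correctable.
print(audio_delay_to_add_ms(80.0, 30.0), residual_error_ms(80.0, 30.0))

# Latencies from the TV example above (4 ms video, 96 ms audio):
# no added delay helps, so the -92 ms error remains.
print(audio_delay_to_add_ms(4.0, 96.0), residual_error_ms(4.0, 96.0))
```

The second case mirrors the low-latency display paired with a high minimum-latency audio path: automatic or manual delay settings have nothing to correct with.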

Last updated on May 3rd, 2022.

  1. ITU-R BT.1359-1, Figure 2
  2. HDMI Specification 2.0, Section “EDID Latency Info – Devices without HDMI output”
  3. Comparison between Auditory and Visual Simple Reaction Times, Section 3 “Results”
  4. How to Measure HDMI Audio Latency Using a Wii U, “HDMI Audio Latency List”
  5. How to Measure HDMI Audio Latency Using a Wii U, “HDMI Audio Latency List”
  6. HDMI Specification 2.0, Section 10.7.3 “Latency of TV’s Audio Outputs”