Acceptable Audio Latency and Lip Sync Error

What is the “Best” Output Audio Latency?

Outside of music creation and live performance technology, output audio latency in consumer electronics matters almost entirely because of audio/video synchronization (lip sync). Even the audio latency of Bluetooth headsets is most noticeable when they are paired with video output on a smartphone, tablet, or PC.

For this reason, the “best” output audio latency for any given device is the audio latency that will most closely match the video latency of the display it will be used alongside. Said differently, a system’s total lip sync error should be as close to zero as possible.

Audio latency issues are more perceptible when audio leads video than when video leads audio.1 This is likely why the HDMI Specification forbids an HDMI device whose audio latency is more than 20ms lower than its video latency: correcting such a difference would “mean that an upstream device would have to delay the video (compared to the audio) which is cumbersome or impossible”.2 For these reasons, when video latency is high, a lower audio latency is worse than a higher one.
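The relationship described above can be made concrete with a small calculation. The sketch below assumes lip sync error is defined as video latency minus audio latency, which is the sign convention implied by the TV example under Common Scenarios (4ms of video latency and 96ms of audio latency giving about -92ms), so a positive value means audio leads video. The function names and the `HDMI_MAX_AUDIO_LEAD_MS` constant are illustrative and are not taken from the specification text.

```python
# Assumed constant: the 20ms limit described in the paragraph above.
HDMI_MAX_AUDIO_LEAD_MS = 20.0


def lip_sync_error_ms(video_latency_ms, audio_latency_ms):
    """Return lip sync error in milliseconds.

    Assumed convention: positive means audio leads video,
    negative means audio lags video.
    """
    return video_latency_ms - audio_latency_ms


def audio_leads_too_much(video_latency_ms, audio_latency_ms,
                         limit_ms=HDMI_MAX_AUDIO_LEAD_MS):
    """True when audio latency is lower than video latency by more than limit_ms."""
    return lip_sync_error_ms(video_latency_ms, audio_latency_ms) > limit_ms
```

With hypothetical values, `audio_leads_too_much(50, 10)` is true (audio leads by 40ms), while `audio_leads_too_much(4, 96)` is false because the audio lags the video in that case.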

Common Scenarios

  1. A device where both audio and video output are handled by the same device, such as a TV.
    • Audio/video synchronization may be affected by different operation modes, such as a TV’s game mode. For example, the Sony X800H TV, which has approximately 4ms of video latency in game mode, has approximately 96ms of audio latency in game mode when using speaker output.3 This results in a lip sync error of around -92ms (audio lagging video by about 92ms) and demonstrates that a device which handles both audio and video output may still have problematic audio/video synchronization.
  2. Different devices for audio and video output where these devices cannot be automatically synchronized.
    • When video output is from a computer monitor, computer projector, smartphone, tablet, or TV in game mode: it is expected that video latency is low (usually less than 20ms), so audio latency should be as close to zero as possible for the best experience.
    • When video output is from a TV or home theatre projector that is not in game/computer mode: it is impossible to give a general recommendation for the best audio latency in this scenario. If using a receiver, an audio delay may be added through device settings to attempt to match audio latency to video latency and reduce lip sync error.
  3. Different devices for audio and video output where these devices can be automatically synchronized, for example through HDMI “Auto Lip Sync” or ARC/eARC/SPDIF output.
    • HDMI “Auto Lip Sync” should reduce lip sync error, but cannot always be expected to do so. For example, a low-latency TV operating in game mode paired with a receiver that has a high minimum latency will still be limited by that minimum latency, which may result in a high lip sync error. An example of such a receiver is the Yamaha RX-V4A when given a stereo input.4
    • A TV may or may not delay its ARC/eARC/SPDIF output to match its video latency, and this output may also be delayed by the TV’s audio processing time.5 Either can have a substantial impact on lip sync error. The audio output device receiving the ARC/eARC/SPDIF signal may introduce additional latency that could further worsen lip sync error.
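For scenarios 2 and 3, the arithmetic behind matching an audio device to a display can be sketched as follows. This is a simplified model and not a description of how any receiver or the HDMI auto lip sync feature is actually implemented: it assumes the audio device can only add delay on top of a fixed minimum latency. All function names are hypothetical, and lip sync error again uses the video-minus-audio convention.

```python
def added_audio_delay_ms(video_latency_ms, audio_min_latency_ms):
    """Delay to add at the audio device so its latency matches the display.

    A delay can only be added, never removed, so the result is clamped at zero.
    """
    return max(0.0, video_latency_ms - audio_min_latency_ms)


def residual_lip_sync_error_ms(video_latency_ms, audio_min_latency_ms):
    """Lip sync error left over after the best possible added delay."""
    delay = added_audio_delay_ms(video_latency_ms, audio_min_latency_ms)
    return video_latency_ms - (audio_min_latency_ms + delay)
```

With made-up numbers, a display with 100ms of video latency and an audio device with a 30ms minimum latency needs 70ms of added delay and ends up with no residual error. A 10ms display paired with an audio device that has an 80ms minimum latency cannot be corrected this way, leaving a residual error of -70ms.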

What is an Acceptable Lip Sync Error?


Esports

Humans can react to auditory stimuli faster than visual stimuli,6 so audio latency should be as close to zero as possible without causing a noticeable amount of lip sync error. It is assumed that a display used for esports will have a low video latency, much less than 20ms, so simply using the audio device with the lowest possible latency is preferred. When paired with a low-latency display, lip sync error will not be a significant issue because both devices will have low latency.

Author’s note: I have measured the latency of the BenQ ZOWIE gaming monitor and found a 1ms video latency and a 1ms audio latency. With this as a key reference point, I believe it is reasonable to expect esports setups to have no more than 4ms of total audio or video latency.

General Use

While a significant amount of research has been performed to assess what lip sync error is acceptable to humans, it cannot be expected that all media and game content will be perfectly synchronized. For example, a television show or movie may have some scenes with poor audio/video synchronization due to production quality limitations, or a video game’s sound effect playback might be slightly delayed due to processing and lead-in time. When these existing lip sync errors are added to a lip sync error in the user’s setup, the combined error may exceed the threshold at which it becomes noticeable to the user. For these reasons, a lip sync error as close to zero as possible is recommended.
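Because these errors add together, a setup that is acceptable on its own can become noticeable with poorly synchronized content. The sketch below is a minimal illustration of that point. The thresholds are parameters supplied by the caller rather than recommendations, the function names are hypothetical, and positive values are assumed to mean audio leads video.

```python
def combined_error_ms(content_error_ms, setup_error_ms):
    """Total lip sync error when content and setup errors stack."""
    return content_error_ms + setup_error_ms


def is_noticeable(total_error_ms, lead_threshold_ms, lag_threshold_ms):
    """True when the error exceeds the caller-supplied thresholds.

    Separate lead and lag thresholds are used because audio leading video
    is more perceptible than audio lagging video.
    """
    return total_error_ms > lead_threshold_ms or total_error_ms < -lag_threshold_ms
```

For example, with placeholder thresholds of 40ms (lead) and 80ms (lag), a setup error of -60ms is within bounds on its own, but combined with a -40ms error in the content it reaches -100ms and would be flagged.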

Author’s note: Given the goal of a near-zero lip sync error and the introduction of mandatory auto lip sync in HDMI 2.1, I do not currently have enough recent data to give a formal recommendation on an acceptable threshold. Please let me know if you have any thoughts on this.

Last updated on July 12th, 2021.

  1. ITU-R BT.1359-1, Figure 2
  2. HDMI Specification 2.0, Section “EDID Latency Info – Devices without HDMI output”
  3. How to Measure HDMI Audio Latency Using a Wii U, “HDMI Audio Latency List”
  4. How to Measure HDMI Audio Latency Using a Wii U, “HDMI Audio Latency List”
  5. HDMI Specification 2.0, Section 10.7.3 “Latency of TV’s Audio Outputs”
  6. Comparison between Auditory and Visual Simple Reaction Times, Section 3 “Results”