
I recently re-encoded a video with ffmpeg and ended up with an audio desynchronization that is inaudible but clearly visible in the waveform.

Source information of MKV file and audio stream:

Mediainfo shows a 6ms audio delay relative to the video:

Audio
ID                             : 2
Format                         : E-AC-3
Format/Info                    : Enhanced Audio Coding 3
Format settings, Endianness    : Big
Codec ID                       : A_EAC3
Duration                       : 1 h 0 min
Bit rate mode                  : Constant
Bit rate                       : 640 kb/s
Channel(s)                     : 6 channels
Channel positions              : Front: L C R, Side: L R, LFE
Sampling rate                  : 48.0 kHz
Frame rate                     : 187.500 FPS (256 SPF)
Compression mode               : Lossy
Delay relative to video        : 6 ms
Stream size                    : 276 MiB (7%)
Language                       : English
Service kind                   : Complete Main
Default                        : Yes
Forced                         : No

The same offset shows up in the PTS of the first audio frames:

ffprobe -of compact -select_streams a:0 -show_frames -show_entries frame=pkt_pts,pkt_pts_time,pkt_dts,pkt_dts_time,best_effort_timestamp_time,pkt_duration,nb_samples %sourcemkv%

frame|pkt_pts=6|pkt_pts_time=0.006000|pkt_dts=6|pkt_dts_time=0.006000|best_effort_timestamp_time=0.006000|pkt_duration=32|nb_samples=1536

frame|pkt_pts=38|pkt_pts_time=0.038000|pkt_dts=38|pkt_dts_time=0.038000|best_effort_timestamp_time=0.038000|pkt_duration=32|nb_samples=1536

frame|pkt_pts=70|pkt_pts_time=0.070000|pkt_dts=70|pkt_dts_time=0.070000|best_effort_timestamp_time=0.070000|pkt_duration=32|nb_samples=1536
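
As a quick sanity check on these numbers: the 32 ms packet duration follows from 1536 samples at the 48 kHz rate Mediainfo reports, and the listed PTS values are that spacing shifted by the 6 ms offset. A small Python sketch (the variable names are mine, not ffprobe's):

```python
# Values copied from the ffprobe and Mediainfo output above.
SAMPLE_RATE_HZ = 48_000
NB_SAMPLES = 1536      # samples per E-AC-3 frame, per ffprobe
FIRST_PTS_MS = 6       # pkt_pts of the first audio frame (pkt_pts is in ms here)

# Duration of one audio frame in milliseconds.
frame_duration_ms = 1000 * NB_SAMPLES / SAMPLE_RATE_HZ

# PTS of the first three frames if they are evenly spaced after the offset.
expected_pts = [FIRST_PTS_MS + n * frame_duration_ms for n in range(3)]

print(frame_duration_ms)
print(expected_pts)
```

The computed values can be compared directly with the pkt_duration and pkt_pts fields in the output above.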

The delay also shows up in the stream start_time values:

ffprobe -show_entries stream=codec_type,duration,start_time -of compact %sourcemkv%

stream|codec_type=video|start_time=0.000000|duration=N/A
stream|codec_type=audio|start_time=0.006000|duration=N/A
stream|codec_type=subtitle|start_time=0.000000|duration=3617.638000
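
For anyone who wants to read this offset programmatically, here is a minimal Python sketch that parses the compact output shown above. The helper name parse_compact is hypothetical, and the sketch assumes one stream per codec type, as in this file:

```python
def parse_compact(line: str) -> dict:
    """Turn 'stream|key=value|key=value' into {'key': 'value', ...}."""
    _section, *fields = line.strip().split("|")
    return dict(field.split("=", 1) for field in fields)

# Lines copied from the ffprobe output above.
ffprobe_lines = [
    "stream|codec_type=video|start_time=0.000000|duration=N/A",
    "stream|codec_type=audio|start_time=0.006000|duration=N/A",
    "stream|codec_type=subtitle|start_time=0.000000|duration=3617.638000",
]

# Keyed by codec type, so a second audio stream would overwrite the first.
start_times = {}
for line in ffprobe_lines:
    entry = parse_compact(line)
    start_times[entry["codec_type"]] = float(entry["start_time"])

audio_offset_ms = round((start_times["audio"] - start_times["video"]) * 1000, 3)
print(f"audio starts {audio_offset_ms} ms after video")
```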

Re-encoding process

I opted for transcoding EAC3 to AAC and downmixing to stereo, using only these flags (no timestamp manipulation): -acodec aac -b:a 160k -ac 2

One command that produces the delayed result: ffmpeg -i %sourcemkv% -vcodec libx264 -profile:v high -crf 23 -tune film -level 4.1 -acodec aac -b:a 160k -ac 2 re-encoded-with-aac.mkv

Detecting the delay

I detected the delay by inspecting the waveform corresponding to a given video frame. The delay is not really audible: the re-encoded audio turned out to be 15.333 ms late compared to the original. It is nevertheless visible with the following AviSynth+ script in AvsPmod:

FFmpegSource2("re-encoded-with-aac.mkv", vtrack=-1, atrack=-1, cache=True, cachefile="", fpsnum=-1, fpsden=1, threads=-1, timecodes="", seekmode=1, overwrite=False, width=-1, height=-1, resizer="BICUBIC", colorspace="", rffmode=0, adjustdelay=-1, utf8=False, varprefix="")
ConvertToMono()
# Test if DelayAudio will sync it!
#AmplifydB(50.0)
#DelayAudio(-0.015333)
waveform( window=1, height=0.6)

I applied the same script to the original and compared the resulting waveform graphs for certain video frames, for example the source MKV's waveform for frame 139 vs. the AAC re-encode's waveform for the same frame. As can be seen, the two are not synchronized unless DelayAudio(-0.015333) is applied to the re-encoded AAC audio stream. Thankfully, this delay is constant throughout the video!
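
The visual comparison could probably be cross-checked numerically. The following is a minimal Python sketch of cross-correlation, a technique I did not use in the measurements above. It runs on synthetic samples rather than on audio decoded from the two files, and the function name estimate_lag is hypothetical:

```python
import random

def estimate_lag(reference, candidate, max_lag):
    """Return the shift (in samples) that best aligns candidate with reference.

    A positive result means candidate is late relative to reference.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(candidate):
                score += value * candidate[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic stand-in for decoded mono samples (not real audio from the files).
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(400)]
true_lag = 7
candidate = [0.0] * true_lag + reference[:-true_lag]

SAMPLE_RATE_HZ = 48_000
lag = estimate_lag(reference, candidate, max_lag=20)
print(lag, "samples =", round(1000 * lag / SAMPLE_RATE_HZ, 3), "ms")
```

With real material, both tracks would first have to be decoded to PCM at the same sample rate, and a brute-force loop like this one would only be practical on a short excerpt.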

Other information of re-encode

The re-encoded audio stream has start_time=0.000000, and its first audio frame's pts is 0.000000. Oddly, although the original audio is delayed by 6 ms relative to the video, the offset between the original and the AAC re-encode is 15.333 ms. I don't know where this comes from; maybe it is related to the duration of the output's audio frames, which is 21.333 ms each?
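
One reading of these numbers, offered as an assumption rather than a confirmed diagnosis: the output gained one AAC frame of leading delay and at the same time lost the source's 6 ms offset. The arithmetic in Python, assuming 1024-sample AAC frames (which matches the 21.333 ms frame duration):

```python
SAMPLE_RATE_HZ = 48_000
AAC_FRAME_SAMPLES = 1024   # assumption: AAC-LC frame length
SOURCE_DELAY_MS = 6.0      # delay of the original audio relative to the video
OBSERVED_LAG_MS = 15.333   # measured with the AviSynth+ script

# Duration of one AAC frame at 48 kHz.
aac_frame_ms = 1000 * AAC_FRAME_SAMPLES / SAMPLE_RATE_HZ

# Hypothesis: one extra frame of delay, minus the dropped 6 ms offset.
hypothesis_ms = aac_frame_ms - SOURCE_DELAY_MS

print(round(aac_frame_ms, 3), round(hypothesis_ms, 3))
```

The result agrees with the measured lag to within rounding, which makes the hypothesis plausible but does not prove it.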

Questions

  • Is this a bug with ffmpeg when transcoding from EAC3 to AAC?
  • Did I misuse ffmpeg or expect too much from it to correctly re-encode this?
  • Is there a need to specify some extra ffmpeg parameters in order to have things synchronized?
  • What is the best way to handle situations like this, other than visually inspecting waveforms frame by frame to determine the seemingly arbitrary delay?
  • Does the 6ms delay of the original audio stream cause all these issues with the re-encode?

Other efforts

I also tried the following experiments:

Experiment 1

  • Open %sourcemkv% in gMKVExtractGUI. The 6ms of delay in the EAC3 audio stream is detected. Extract this audio stream.
  • Remux the re-encoded video stream in mkvtoolnix-gui with the original EAC3 audio, and specify the 6ms delay explicitly in the Delay field.

Result: AUDIO IS SYNCHRONIZED with the original, judging by the inspection method described above.

Experiment 2

  • Convert the original audio to 2-channel AAC with FFmpeg.
  • Remux the resulting AAC audio stream with the video in mkvtoolnix-gui, also specifying a 6ms delay for it - just as the original has.

Result: Audio is not synchronized between the original and the result

2 Answers


As counterintuitive as it sounds, this is not an error.

The point is that many audio codecs cannot start at PTS zero, and AAC is one of them. There is a "settling phase" at the beginning (I don't know the exact English term; in my native German it is "Einschwingphase") during which the audio output cannot yet carry payload, so silence is presented instead.

What all correct encoding tools do is start the audio track just enough earlier than the corresponding video track that the PTS of the first image coincides with the PTS of the first payload-carrying audio packet. This implies that there are earlier audio packets, which are rendered as silence.

Things become even messier when you transcode from a codec with a settling phase of X ms into another codec with a settling phase of Y ms. In this case there might even be a positive delay, when the old codec needed longer to settle than the new codec does.

This is, by the way, the reason why exact edits are always done in a non-settling codec (typically a member of the PCM family).
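
The compensation described above can be sketched as a tiny timeline model. The function name and the encoder parameters below are hypothetical, chosen only to illustrate the idea; they are not measured from any particular encoder:

```python
def packet_start_times_ms(priming_samples, frame_samples, sample_rate_hz, count):
    """Start time of each audio packet when the track begins early by the
    priming length, so that the first payload sample lands at 0 ms."""
    return [
        1000.0 * (index * frame_samples - priming_samples) / sample_rate_hz
        for index in range(count)
    ]

# Hypothetical encoder: 1024-sample frames and 1024 samples of priming at 48 kHz.
times = packet_start_times_ms(
    priming_samples=1024, frame_samples=1024, sample_rate_hz=48_000, count=3
)
print([round(t, 3) for t in times])
```

In this model the first packet starts before 0 ms and carries only the settling phase. If a tool instead places that first packet at 0 ms, everything after it ends up late by the priming length.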


Based on Eugen's answer, I searched and found more information in this VideoHelp Forum thread about codec-dependent silence, i.e. the delay generated at the beginning of encodes. I then stumbled upon the qaac encoder (a command-line front end to Apple's AAC encoder) and one of its options, which allows trimming the encoder-specific delay and was also suggested in a VideoHelp thread on delays:

--no-delay             Compensate encoder delay by prepending 960 samples
                       of scilence, then trimming 3 AAC frames from
                       the beginning (and also tweak iTunSMPB).
                       This option is mainly intended for resolving
                       A/V sync issue of video.
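
The numbers in this help text appear to add up if one assumes an encoder delay of 2112 samples. That figure is the priming length commonly documented for Apple's AAC-LC encoder; it is not stated in the help text itself, so treat it as an assumption. A short Python check:

```python
AAC_FRAME_SAMPLES = 1024      # AAC-LC frame length
PREPENDED_SILENCE = 960       # from the --no-delay help text
TRIMMED_FRAMES = 3            # from the --no-delay help text
ASSUMED_ENCODER_DELAY = 2112  # assumption: Apple AAC-LC priming, not in the help text

# Samples sitting in front of the real audio after prepending the silence.
leading_samples = PREPENDED_SILENCE + ASSUMED_ENCODER_DELAY

# Samples removed by dropping whole AAC frames from the beginning.
trimmed_samples = TRIMMED_FRAMES * AAC_FRAME_SAMPLES

residual_samples = leading_samples - trimmed_samples
print(leading_samples, trimmed_samples, residual_samples)
```

Under that assumption, the prepended silence pads the delay to a whole number of frames, so trimming whole frames leaves no residual offset.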

By using the above option I managed to achieve my goal of keeping the audio synchronized with the original, as described below. I decided to post this as an answer because the main question was whether there is a way to downmix to AAC while keeping the audio in perfect sync with the source, and I have found one way to do this.

Steps:

AAC downmix extraction:

ffmpeg -i %sourcemkv% -vn -ac 2 -acodec pcm_f32le -f wav - | qaac -v 160 --delay 0.006 --no-delay - -o outw6ms.m4a

The command does the following:

  • reads the original mkv's EAC3 audio stream and downmixes it to stereo by using -ac 2
  • it decodes it as floating point PCM (pcm_f32le) - this is needed so that the downmix retains the original stream's loudness
  • pipes the output to the qaac encoder
  • the qaac encoder has the --no-delay flag which will cut the encoder-specific delay
  • I also added the --delay 0.006 option to bake in 6 ms of silence, because the original had this amount of delay relative to the video (I could have omitted this and instead remuxed the result in mkvmerge, specifying explicitly there that the audio stream needs a 6 ms delay; I tested this and it also worked)
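
For scripting, the same pipe can be assembled in Python. This is only a sketch: the function names build_commands and run_pipeline are hypothetical, the flags are exactly the ones from the command above, and it assumes that ffmpeg and qaac are both on the PATH:

```python
import shlex
import subprocess

def build_commands(source, output, delay_s=0.006, quality=160):
    """Return the (ffmpeg, qaac) argument lists for the pipe described above."""
    ffmpeg_cmd = ["ffmpeg", "-i", source, "-vn", "-ac", "2",
                  "-acodec", "pcm_f32le", "-f", "wav", "-"]
    qaac_cmd = ["qaac", "-v", str(quality), "--delay", str(delay_s),
                "--no-delay", "-", "-o", output]
    return ffmpeg_cmd, qaac_cmd

def run_pipeline(source, output):
    """Decode and downmix with ffmpeg, then pipe the WAV stream into qaac."""
    ffmpeg_cmd, qaac_cmd = build_commands(source, output)
    decoder = subprocess.Popen(ffmpeg_cmd, stdout=subprocess.PIPE)
    encoder = subprocess.Popen(qaac_cmd, stdin=decoder.stdout)
    decoder.stdout.close()  # so ffmpeg stops if qaac exits early
    encoder_rc = encoder.wait()
    decoder_rc = decoder.wait()
    return decoder_rc, encoder_rc

# Only print the commands here; call run_pipeline() to actually encode.
ffmpeg_cmd, qaac_cmd = build_commands("source.mkv", "outw6ms.m4a")
print(shlex.join(ffmpeg_cmd), "|", shlex.join(qaac_cmd))
```

Calling run_pipeline("source.mkv", "outw6ms.m4a") would run both tools and return their exit codes; the input file name is a placeholder.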

Then, I remuxed the resulting audio with the transcoded video directly in ffmpeg:

ffmpeg -i encvideo.mkv -i outw6ms.m4a -codec copy -map 0:v -map 1:a result.mkv

I checked the result in AvsPmod with waveform() and the audio is in perfect sync with the source, from beginning to end.

So I have finally found one way of preserving the original audio-to-video sync. I will wait a while before accepting this answer, because I wonder whether any of the audio encoders available in ffmpeg offer something like qaac's --no-delay option. It seems very useful and made precisely for this sort of thing.
