GitHub - YapengTian/AVVP-ECCV20: Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing (To appear in ECCV 2020) [Paper]

Yapeng Tian, Dingzeyu Li, and Chenliang Xu

Audio-visual video parsing

We define audio-visual video parsing as the task of grouping video segments and parsing a video into temporal audio, visual, and audio-visual events, each associated with a semantic label.

LLP Dataset & Features

# LLP dataset annotations
cd data
AVVP_dataset_full.csv: full dataset with weak annotations
AVVP_train.csv: training set with weak annotations
AVVP_val_pd.csv: val set with weak annotations
AVVP_test_pd.csv: test set with weak annotations
AVVP_eval_audio.csv: audio event dense annotations for videos in val and test sets
AVVP_eval_visual.csv: visual event dense annotations for videos in val and test sets

Note that audio-visual events can be derived from audio and visual events.
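As a minimal sketch of that derivation, the snippet below intersects audio and visual events that share a label and overlap in time. The `(label, onset, offset)` tuple layout, the function name, and the sample events are assumptions made for illustration; they are not the schema of the CSV files above.

```python
def derive_audio_visual_events(audio_events, visual_events):
    """Intersect same-label audio and visual events in time.

    Each event is a (label, onset, offset) tuple in seconds. This layout is a
    hypothetical one chosen for the example, not the repository's CSV format.
    """
    av_events = []
    for a_label, a_on, a_off in audio_events:
        for v_label, v_on, v_off in visual_events:
            if a_label != v_label:
                continue
            # The audio-visual event covers only the time both modalities share.
            onset, offset = max(a_on, v_on), min(a_off, v_off)
            if onset < offset:
                av_events.append((a_label, onset, offset))
    # Sort by onset, then offset, then label for a stable, readable order.
    return sorted(av_events, key=lambda e: (e[1], e[2], e[0]))


# Made-up events for a single 10-second video.
audio = [("Speech", 0, 6), ("Dog", 4, 10)]
visual = [("Dog", 2, 7), ("Speech", 5, 9)]
av = derive_audio_visual_events(audio, visual)
print(av)  # [('Dog', 4, 7), ('Speech', 5, 6)]
```

Events that merely touch at a boundary (one ends exactly when the other starts) produce no audio-visual event in this sketch.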

We use VGGish, ResNet152, and ResNet (2+1)D to extract audio, 2D frame-level, and 3D snippet-level features, respectively. The audio and visual features of videos in the LLP dataset can be downloaded from this Google Drive link. The features are in the "feats" folder.

Requirements

pip install -r requirements.txt

Weakly supervised audio-visual video parsing

Testing:

python main_avvp.py --mode test --audio_dir /xx/feats/vggish/ --video_dir /xx/feats/res152/ --st_dir /xx/feats/r2plus1d_18/

Training:

python main_avvp.py --mode train --audio_dir /xx/feats/vggish/ --video_dir /xx/feats/res152/ --st_dir /xx/feats/r2plus1d_18/
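The two commands differ only in `--mode` and share the same three feature directories. A small sketch like the following can build either command from a single features root. The helper name `build_avvp_command` is hypothetical, and the sketch assumes the downloaded "feats" folder contains the `vggish`, `res152`, and `r2plus1d_18` subfolders shown in the commands above.

```python
import subprocess


def build_avvp_command(feats_root, mode="test"):
    """Return the main_avvp.py command line as a list of arguments.

    Only the two modes shown in this README are accepted here; the script
    itself may support others.
    """
    if mode not in ("train", "test"):
        raise ValueError(f"unsupported mode: {mode!r}")
    root = feats_root.rstrip("/")
    return [
        "python", "main_avvp.py",
        "--mode", mode,
        "--audio_dir", f"{root}/vggish/",
        "--video_dir", f"{root}/res152/",
        "--st_dir", f"{root}/r2plus1d_18/",
    ]


cmd = build_avvp_command("/xx/feats", mode="train")
print(" ".join(cmd))
# To actually launch the run from the repository root:
# subprocess.run(cmd, check=True)
```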

Download videos (optional)

Download the raw videos of the LLP dataset. The downloaded videos will be saved in the data/LLP_dataset/video folder. Pandas and FFmpeg are required.

python ./scripts/download_dataset.py

Data pre-processing & Feature extraction (optional)

Extract audio waveforms from videos. The extracted audio files will be saved in the data/LLP_dataset/audio folder. The moviepy library is used to read the videos and extract the audio.

python ./scripts/extract_audio.py
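The repository script relies on moviepy. As an alternative illustration of the same step, the sketch below builds an FFmpeg command that writes a WAV file for one video. The function name, the example file name, and the 16 kHz mono settings are assumptions (chosen because VGGish operates on 16 kHz mono audio); none of them are taken from extract_audio.py.

```python
import subprocess


def ffmpeg_audio_command(video_path, audio_path, sample_rate=16000):
    """Build an FFmpeg command that extracts a mono WAV track from a video."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample the audio
        audio_path,
    ]


# Hypothetical file name; the folders follow the layout described above.
cmd = ffmpeg_audio_command(
    "data/LLP_dataset/video/example.mp4",
    "data/LLP_dataset/audio/example.wav",
)
print(" ".join(cmd))
# Requires FFmpeg on the PATH:
# subprocess.run(cmd, check=True)
```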

Extract video frames from videos. The extracted frames will be in the data/LLP_dataset/frame folder.

python ./scripts/extract_frames.py

The audio feature extractor can be found here.

2D visual features. The pretrainedmodels library is required.

python ./scripts/extract_rgb_feat.py

3D visual features.

python ./scripts/extract_3D_feat.py

Citation

If you find this work useful, please consider citing it.

@InProceedings{tian2020avvp,
  author={Tian, Yapeng and Li, Dingzeyu and Xu, Chenliang},
  title={Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing},
  booktitle = {ECCV},
  year = {2020}
}

License

This project is released under the GNU General Public License v3.0.
