---
license: apache-2.0
datasets:
- lighthouse-emnlp2024/Clotho-Moment
language:
- en
---
# Audio Moment-DETR
This is Audio Moment DETR (AM-DETR), the model proposed in [Language-based Audio Moment Retrieval](https://arxiv.org/abs/2409.15672).
Given a text query, AM-DETR retrieves the segments of a long audio recording that are relevant to the query.

## Install
Installing [Lighthouse](https://github.com/line/lighthouse) is required.
Check that the dependencies below match your environment.
```bash
apt install ffmpeg
```
```bash
pip install 'git+https://github.com/line/lighthouse.git'
```
```bash
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 torchtext==0.16.0 transformers==4.51.3 --index-url https://download.pytorch.org/whl/cu118
```


## Sample script
```python
import io
import requests

import torch
from transformers import AutoModel, AutoConfig


repo_id = "lighthouse-emnlp2024/AM-DETR"

# Load the config and model; the repository ships custom code, hence trust_remote_code=True
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
config.device = "cpu"
model = AutoModel.from_pretrained(repo_id, config=config, trust_remote_code=True)

# Download the example recording into an in-memory buffer
response = requests.get('https://github.com/line/lighthouse/raw/refs/heads/main/api_example/1a-ODBWMUAE.wav')
response.raise_for_status()  # fail early if the download did not succeed
audio_bytes = io.BytesIO(response.content)
query = "Heavy rain falls"

# Extract audio features, then retrieve the moments that match the query
feats = model.encode_audio(audio_path=audio_bytes)
prediction = model.predict(query, feats)
for start, end, score in prediction["pred_relevant_windows"]:
    print(f"Moment, Score: {start:05.2f} - {end:05.2f}, {score:.2f}")
```
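
The sample script prints every predicted window. If only the most confident moments are needed, the list can be filtered afterwards. The following is a minimal sketch, not part of Lighthouse: `select_moments`, `min_score`, and `top_k` are hypothetical names, the threshold is arbitrary, and the example values are made up. It assumes each entry of `prediction["pred_relevant_windows"]` is a `[start, end, score]` triple, as the loop in the sample script suggests.

```python
from typing import List, Sequence, Tuple

Window = Tuple[float, float, float]  # (start, end, score)


def select_moments(
    windows: Sequence[Sequence[float]],
    min_score: float = 0.5,
    top_k: int = 3,
) -> List[Window]:
    """Hypothetical helper: keep windows with score >= min_score, best first, at most top_k."""
    kept = [(float(s), float(e), float(c)) for s, e, c in windows if c >= min_score]
    kept.sort(key=lambda w: w[2], reverse=True)  # highest score first
    return kept[:top_k]


# Made-up values standing in for prediction["pred_relevant_windows"].
example_windows = [
    [0.0, 12.5, 0.91],
    [30.0, 41.0, 0.32],
    [14.0, 20.0, 0.67],
    [50.0, 55.0, 0.58],
]

for start, end, score in select_moments(example_windows, min_score=0.5, top_k=2):
    print(f"Moment, Score: {start:05.2f} - {end:05.2f}, {score:.2f}")
```

With real output, pass `prediction["pred_relevant_windows"]` in place of `example_windows`; a suitable threshold depends on the recording and the query, so it is worth tuning on your own data.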


## Citation
```bibtex
@inproceedings{munakata2025language,
  title={Language-based Audio Moment Retrieval},
  author={Munakata, Hokuto and Nishimura, Taichi and Nakada, Shota and Komatsu, Tatsuya},
  booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2025},
  organization={IEEE}
}
```