



DistilBert for Dense Passage Retrieval trained with Balanced Topic Aware Sampling (TAS-B)

We provide a retrieval-trained, DistilBERT-based model trained with Balanced Topic Aware Sampling (TAS-B) on MSMARCO-Passage. We call the architecture, a dual encoder followed by dot-product scoring, BERT_Dot.
This instance was trained with a batch size of 256. It can be used to re-rank a candidate set or directly for dense retrieval over a vector index. The architecture is a 6-layer DistilBERT without architectural additions or modifications (only the weights change during training). To obtain a query or passage representation, we pool the CLS vector. We use the same BERT layers for both query and passage encoding, which yields better results and lowers memory requirements.
If you want to know more about our efficient batch composition procedure and dual supervision for dense retrieval training (which can be done on a single consumer GPU in 48 hours), check out our paper: https://arxiv.org/abs/2104.06967
For more information and a minimal usage example, please visit: https://github.com/sebastian-hofstaetter/tas-balanced-dense-retrieval
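
The BERT_Dot scoring described above (pool the CLS vector, then take a dot product) can be illustrated with a minimal sketch. It does not load the model: the token vectors are made-up stand-ins for encoder outputs, and the helper names `cls_pool`, `dot`, and `rerank` are hypothetical, chosen only for this example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cls_pool(token_vectors: Sequence[Vector]) -> List[float]:
    """Return the first token vector (the [CLS] position) as the text representation."""
    return list(token_vectors[0])


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return sum(x * y for x, y in zip(a, b))


def rerank(query_vec: Vector, passages: Sequence[Tuple[str, Vector]]) -> List[Tuple[str, float]]:
    """Score each (passage_id, vector) pair against the query; best score first."""
    scored = [(pid, dot(query_vec, vec)) for pid, vec in passages]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy encoder outputs: one vector per token, [CLS] first (values are made up).
query_tokens = [[1.0, 0.0, 2.0], [0.3, 0.1, 0.0]]
passage_tokens = {
    "p1": [[0.5, 1.0, 0.0], [0.9, 0.9, 0.9]],
    "p2": [[1.0, 0.0, 1.5], [0.0, 0.2, 0.1]],
}

# In a real setup the same encoder would produce both kinds of token vectors.
query_vec = cls_pool(query_tokens)
candidates = [(pid, cls_pool(tokens)) for pid, tokens in passage_tokens.items()]
ranking = rerank(query_vec, candidates)
```

Because the score is a plain dot product, passage vectors can be computed once and stored. The same function then serves for re-ranking a candidate list, while a vector index would replace the exhaustive loop for first-stage retrieval.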

Effectiveness on MSMARCO Passage & TREC-DL’19

We trained our model on the standard MSMARCO (“small”, 400K-query) training triples, re-sampled with our TAS-B method. As teachers we used the BERT_CAT pairwise scores as well as the ColBERT model for in-batch-negative signals, published here: https://github.com/sebastian-hofstaetter/neural-ranking-kd
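
To make the idea of learning from pairwise teacher scores concrete, here is a minimal sketch of a margin-based distillation loss. The text above does not name the loss function, so this is an assumption: the sketch uses Margin-MSE, which compares the student's positive-minus-negative score margin with the teacher's. The function name `margin_mse` and all score values are illustrative only.

```python
from typing import Sequence


def margin_mse(
    student_pos: Sequence[float],
    student_neg: Sequence[float],
    teacher_pos: Sequence[float],
    teacher_neg: Sequence[float],
) -> float:
    """Mean squared error between student and teacher score margins (pos - neg)."""
    n = len(student_pos)
    if n == 0 or not (len(student_neg) == len(teacher_pos) == len(teacher_neg) == n):
        raise ValueError("all score lists must be non-empty and of equal length")
    total = 0.0
    for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        # Only the margin is matched, so student and teacher may use different score scales.
        total += ((sp - sn) - (tp - tn)) ** 2
    return total / n


# Made-up scores for two (query, positive passage, negative passage) triples.
loss = margin_mse(
    student_pos=[3.0, 1.0],
    student_neg=[1.0, 2.0],
    teacher_pos=[5.0, 4.0],
    teacher_neg=[2.0, 1.0],
)
```

In this sketch the student is penalized most on the second triple, where it ranks the negative above the positive while the teacher strongly prefers the positive.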


Model                        MRR@10   NDCG@10   Recall@1K
BM25                         .194     .241      .857
TAS-B BERT_Dot (Retrieval)   .347     .410      .978
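
For readers unfamiliar with the metrics in the table, the following sketch shows how MRR@10 and Recall@1K are commonly computed from ranked lists and relevance judgments. It is not the evaluation script used for the numbers above; the function names and toy data are hypothetical, and NDCG@10 is left out for brevity.

```python
from typing import Dict, List, Set


def mrr_at_k(rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean over queries of 1/rank of the first relevant passage in the top k (0 if none)."""
    if not rankings:
        raise ValueError("rankings must not be empty")
    total = 0.0
    for qid, ranked in rankings.items():
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in relevant.get(qid, set()):
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


def recall_at_k(rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 1000) -> float:
    """Mean over queries of the fraction of relevant passages found in the top k."""
    if not rankings:
        raise ValueError("rankings must not be empty")
    total = 0.0
    for qid, ranked in rankings.items():
        rel = relevant.get(qid, set())
        if rel:  # queries without judgments contribute 0
            total += len(rel.intersection(ranked[:k])) / len(rel)
    return total / len(rankings)


# Toy data: two queries with ranked passage ids and their relevant passages.
rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
relevant = {"q1": {"b"}, "q2": {"d", "x"}}

mrr = mrr_at_k(rankings, relevant, k=10)
recall = recall_at_k(rankings, relevant, k=3)
```

In the toy data, q1 finds its relevant passage at rank 2 and q2 at rank 1, so MRR is the mean of 1/2 and 1. Passage "x" is never retrieved, which halves the recall for q2.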





