SciNCL
SciNCL is a pre-trained BERT language model that generates document-level embeddings of research papers.
It uses the citation graph neighborhood to generate samples for contrastive learning.
Prior to the contrastive training, the model is initialized with weights from scibert-scivocab-uncased.
The underlying citation embeddings are trained on the S2ORC citation graph.
Paper: Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022).
Code: https://github.com/malteos/scincl
PubMedNCL: Working with biomedical papers? Try PubMedNCL.
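The contrastive objective mentioned above can be illustrated with a triplet loss: a query paper should end up closer to a positive (a nearby paper in the citation graph) than to a negative. The following is only a minimal sketch in plain Python. The triplet margin form, the Euclidean distance, and the margin of 1.0 are assumptions made for illustration, and `euclidean` and `triplet_margin_loss` are hypothetical helper names, not part of the SciNCL code.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(query, positive, negative, margin=1.0):
    """Loss is zero once the negative is at least `margin` farther
    from the query than the positive (margin value is an assumption)."""
    d_pos = euclidean(query, positive)
    d_neg = euclidean(query, negative)
    return max(0.0, d_pos - d_neg + margin)


# toy 2-d "embeddings": the positive is close, the negative is far
loss = triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
```

Here the negative is already much farther away than the positive, so the triplet contributes no loss. Swapping the positive and the negative would yield a positive loss that training would push down.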
How to use the pretrained model
```python
from transformers import AutoTokenizer, AutoModel

# load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('malteos/scincl')
model = AutoModel.from_pretrained('malteos/scincl')

papers = [{'title': 'BERT', 'abstract': 'We introduce a new language representation model called BERT'},
          {'title': 'Attention is all you need', 'abstract': ' The dominant sequence transduction models are based on complex recurrent or convolutional neural networks'}]

# concatenate title and abstract with [SEP] token
title_abs = [d['title'] + tokenizer.sep_token + (d.get('abstract') or '') for d in papers]

# preprocess the input
inputs = tokenizer(title_abs, padding=True, truncation=True, return_tensors="pt", max_length=512)

# inference
result = model(**inputs)

# take the first token ([CLS] token) in the batch as the embedding
embeddings = result.last_hidden_state[:, 0, :]
```
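Once embeddings are available, a common next step is to compare papers by similarity. The sketch below assumes cosine similarity, which the model card does not prescribe, and works on plain Python lists of floats so that it runs without any dependencies. `cosine_similarity` and `rank_by_similarity` are hypothetical helper names introduced for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v))
              for i, v in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# toy 2-d vectors standing in for real paper embeddings
ranking = rank_by_similarity([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
```

With the usage example above, the model output could be converted first, for instance via `embeddings.detach().tolist()`, and each row passed to these helpers.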
Triplet Mining Parameters
Setting | Value |
---|---|
seed | 4 |
triples_per_query | 5 |
easy_positives_count | 5 |
easy_positives_strategy | 5 |
easy_positives_k | 20-25 |
easy_negatives_count | 3 |
easy_negatives_strategy | random_without_knn |
hard_negatives_count | 2 |
hard_negatives_strategy | knn |
hard_negatives_k | 3998-4000 |
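To make the settings above more concrete, here is a rough sketch of how triplets might be mined for one query paper from a neighbor list ranked by distance in the citation embedding space. It is an interpretation, not the authors' implementation: reading `20-25` and `3998-4000` as half-open rank ranges, pairing positives with negatives one to one, and excluding the whole neighbor list up to the hard-negative cutoff for `random_without_knn` are all assumptions. `mine_triplets` and its parameters are hypothetical names.

```python
import random


def mine_triplets(query, ranked_neighbors, all_ids, seed=4,
                  pos_k=(20, 25), hard_neg_k=(3998, 4000),
                  easy_neg_count=3):
    """Build (query, positive, negative) triples for one query paper.

    ranked_neighbors: paper ids sorted by increasing distance to the
    query in citation embedding space (query itself excluded).
    """
    rng = random.Random(seed)

    # easy positives: neighbors at ranks 21..25 (5 papers for k = 20-25)
    positives = ranked_neighbors[pos_k[0]:pos_k[1]]

    # hard negatives: distant neighbors at ranks 3999..4000 (2 papers)
    hard_negatives = ranked_neighbors[hard_neg_k[0]:hard_neg_k[1]]

    # easy negatives: random papers outside the kNN list (assumed meaning
    # of "random_without_knn"), also never the query itself
    excluded = set(ranked_neighbors[:hard_neg_k[1]])
    excluded.add(query)
    candidates = [p for p in all_ids if p not in excluded]
    easy_negatives = rng.sample(candidates, easy_neg_count)

    # 5 positives paired with 3 easy + 2 hard negatives -> 5 triples
    negatives = easy_negatives + hard_negatives
    return [(query, p, n) for p, n in zip(positives, negatives)]


# toy corpus: ids 0..4999, paper 0 is the query, neighbors ranked 1..4499
triples = mine_triplets(0, list(range(1, 4500)), list(range(5000)))
```

Under this reading the counts line up with the table: 5 easy positives and 3 + 2 negatives give the 5 triples per query.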