roberta-large-mnli

2023-12-28 04:18 微浪网


Table of Contents

Model Details

How to Get Started with the Model

Uses

Risks, Limitations and Biases

Training

Evaluation

Environmental Impact

Technical Specifications

Citation Information

Model Card Authors

Model Details

Model Description: roberta-large-mnli is the RoBERTa large model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus. The underlying RoBERTa large model was pretrained on English-language text using a masked language modeling (MLM) objective.

Developed by: See GitHub Repo for model developers

Model Type: Transformer-based language model

Language(s): English

License: MIT

Parent Model: This model is a fine-tuned version of the RoBERTa large model. Users should see the RoBERTa large model card for relevant information.

Resources for more information:

Research Paper

GitHub Repo

How to Get Started with the Model

Use the code below to get started with the model. The model can be loaded with the zero-shot-classification pipeline like so:
from transformers import pipeline
classifier = pipeline('zero-shot-classification', model='roberta-large-mnli')

You can then use this pipeline to classify sequences into any of the class names you specify. For example:
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(sequence_to_classify, candidate_labels)
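To make the call above less opaque, here is a minimal sketch of how an NLI model can be turned into a zero-shot classifier: each candidate label is written into a hypothesis sentence, the model scores every (sequence, hypothesis) pair, and the entailment logits are normalized across labels. The template string, the position of the entailment class, and the logits are assumptions for illustration; the helper names are hypothetical and this is not the pipeline's actual source code.

```python
import math

# Assumed label order for the NLI head (contradiction, neutral, entailment).
# Check the model's config (id2label) before relying on this index.
ENTAILMENT_INDEX = 2


def build_hypotheses(candidate_labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in candidate_labels]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scores_from_nli_logits(candidate_labels, nli_logits):
    """Rank labels by a softmax over their entailment logits (single-label case)."""
    entailment = [row[ENTAILMENT_INDEX] for row in nli_logits]
    probs = softmax(entailment)
    ranked = sorted(zip(candidate_labels, probs), key=lambda pair: pair[1], reverse=True)
    return {
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


# Made-up logits, one row per (sequence, hypothesis) pair; not real model output.
labels = ["travel", "cooking", "dancing"]
fake_logits = [[-2.0, 0.5, 3.0], [1.5, 0.2, -1.0], [1.0, 0.4, -0.5]]
result = scores_from_nli_logits(labels, fake_logits)
```

With these invented logits the label with the largest entailment logit ("travel") is ranked first, and the scores sum to 1 because they are normalized across the candidate labels.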

Uses

Direct Use

This fine-tuned model can be used for zero-shot classification tasks, including zero-shot sentence-pair classification (see the GitHub repo for examples) and zero-shot sequence classification.

Misuse and Out-of-scope Use

The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to produce factual or true representations of people or events, so using it to generate such content is outside the scope of its abilities.

Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware that this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). The RoBERTa large model card notes that: “The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral.”
Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:
sequence_to_classify = "The CEO had a strong handshake."
candidate_labels = ['male', 'female']
hypothesis_template = "This text speaks about a {} profession."
classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Training

Training Data

This model was fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus. Also see the MNLI data card for more information.
As described in the RoBERTa large model card:

The RoBERTa model was pretrained on the union of five datasets:

BookCorpus, a dataset consisting of 11,038 unpublished books;

English Wikipedia (excluding lists, tables and headers);

CC-News, a dataset containing 63 million English news articles crawled between September 2016 and February 2019;

OpenWebText, an open-source recreation of the WebText dataset used to train GPT-2;

Stories, a dataset containing a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas.

Together these datasets amount to 160GB of text.

Also see the bookcorpus data card and the wikipedia data card for additional information.

Training Procedure

Preprocessing

As described in the RoBERTa large model card:

The texts are tokenized using a byte-level version of Byte-Pair Encoding (BPE) with a vocabulary size of 50,000. The inputs to
the model are sequences of 512 contiguous tokens that may span multiple documents. The beginning of a new document is marked
with <s> and the end of one with </s>.
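As a rough illustration of the packing step just described, the sketch below concatenates already-tokenized documents, wraps each one in <s> and </s>, and cuts the resulting stream into fixed-length blocks that may cross document boundaries. The function name and the choice to drop the trailing partial block are assumptions; the actual RoBERTa data pipeline may handle boundaries and remainders differently.

```python
def pack_documents(documents, block_size=512, bos="<s>", eos="</s>"):
    """Concatenate tokenized documents and cut the stream into fixed-size blocks."""
    stream = []
    for tokens in documents:
        stream.append(bos)      # mark the beginning of a document
        stream.extend(tokens)
        stream.append(eos)      # mark the end of a document
    # Keep only full blocks; the trailing remainder is dropped in this sketch.
    usable = len(stream) - len(stream) % block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


# Toy documents and a tiny block size so the result is easy to inspect.
docs = [["a", "b", "c"], ["d", "e"], ["f", "g", "h", "i"]]
blocks = pack_documents(docs, block_size=5)
```

In this toy run the second block ends with the <s> of the third document, which shows how a single input can span more than one document.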
The details of the masking procedure for each sentence are the following:

15% of the tokens are masked.

In 80% of the cases, the masked tokens are replaced by <mask>.

In 10% of the cases, the masked tokens are replaced by a random token different from the one they replace.

In the 10% remaining cases, the masked tokens are left as is.

Contrary to BERT, the masking is done dynamically during pretraining (i.e., it changes at each epoch and is not fixed).
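The masking rules above can be sketched as a small function that is called afresh every time a sequence is served, which is what makes the masking dynamic. The token ids, `mask_id` value, and function name are hypothetical placeholders, and special tokens are not excluded from selection here, so this is an illustration of the 15% / 80-10-10 scheme rather than the original training code.

```python
import random


def mask_tokens(token_ids, vocab_size, mask_id, rng, mask_prob=0.15):
    """Return (inputs, labels); labels are None where no prediction is required."""
    inputs = list(token_ids)
    labels = [None] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif roll < 0.9:
            # 10%: replace with a random id that differs from the original
            new = rng.randrange(vocab_size - 1)
            inputs[i] = new if new < tok else new + 1
        # remaining 10%: leave the token unchanged
    return inputs, labels


rng = random.Random(0)
tokens = list(range(10, 1010))  # 1000 hypothetical token ids
# Calling the function again for the "next epoch" draws a fresh mask.
epoch1_inputs, epoch1_labels = mask_tokens(tokens, vocab_size=50000, mask_id=4, rng=rng)
epoch2_inputs, epoch2_labels = mask_tokens(tokens, vocab_size=50000, mask_id=4, rng=rng)
```

A static scheme would compute the mask once during preprocessing and reuse it; here the same sequence receives a different mask on each call.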

Pretraining

Also as described in the RoBERTa large model card:

The model was trained on 1024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512. The
optimizer used is Adam with a learning rate of 4e-4, $\beta_{1} = 0.9$, $\beta_{2} = 0.98$ and
$\epsilon = 10^{-6}$, a weight decay of 0.01, learning rate warmup for 30,000 steps and linear decay of the learning
rate after.
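The warmup-then-decay schedule can be written as a short function of the step count. This is a sketch under two assumptions not stated in the text: that warmup rises linearly from zero, and that the linear decay reaches zero exactly at the final step. The function name is hypothetical.

```python
def learning_rate(step, peak_lr=4e-4, warmup_steps=30_000, total_steps=500_000):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Sample the schedule at a few points: start, mid-warmup, peak, mid-decay, end.
samples = {step: learning_rate(step) for step in (0, 15_000, 30_000, 265_000, 500_000)}
```

Under these assumptions the rate is half the peak both at step 15,000 (halfway through warmup) and at step 265,000 (halfway through the decay phase).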

Evaluation

The following evaluation information is extracted from the associated GitHub repo for RoBERTa.

Testing Data, Factors and Metrics

The model developers report that the model was evaluated on the following tasks and datasets using the listed metrics:

Dataset: Part of GLUE (Wang et al., 2019), the General Language Understanding Evaluation benchmark, a collection of 9 datasets for evaluating natural language understanding systems. Specifically, the model was evaluated on the Multi-Genre Natural Language Inference (MNLI) corpus. See the GLUE data card or Wang et al. (2019) for further information.

Tasks: NLI. Wang et al. (2019) describe the inference task for MNLI as:

The Multi-Genre Natural Language Inference Corpus (Williams et al., 2018) is a crowd-sourced collection of sentence pairs with textual entailment annotations. Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are gathered from ten different sources, including transcribed speech, fiction, and government reports. We use the standard test set, for which we obtained private labels from the authors, and evaluate on both the matched (in-domain) and mismatched (cross-domain) sections. We also use and recommend the SNLI corpus (Bowman et al., 2015) as 550k examples of auxiliary training data.

Metrics: Accuracy
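For reference, accuracy on a three-way NLI task is simply the fraction of premise-hypothesis pairs whose predicted label matches the gold label. The sketch below uses invented predictions and gold labels, not MNLI data, and the function name is hypothetical.

```python
def accuracy(predictions, references):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Invented labels for four premise-hypothesis pairs.
gold = ["entailment", "neutral", "contradiction", "entailment"]
pred = ["entailment", "contradiction", "contradiction", "entailment"]
toy_accuracy = accuracy(pred, gold)
```

The same function would be applied separately to the matched and mismatched sections to obtain the two reported scores.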

Dataset: XNLI (Conneau et al., 2018), the extension of the Multi-Genre Natural Language Inference (MNLI) corpus to 15 languages: English, French, Spanish, German, Greek, Bulgarian, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, Hindi, Swahili and Urdu. See the XNLI data card or Conneau et al. (2018) for further information.

Tasks: Translate-test (i.e., input sentences in other languages are translated into the training language before the model classifies them)

Metrics: Accuracy

Results

GLUE results (dev set, single model, single-task fine-tuning): 90.2 accuracy on MNLI
XNLI test results:

Language  en    fr     es     de     el     bg     ru     tr     ar     vi     th     zh    hi    sw     ur
Accuracy  91.3  82.91  84.27  81.24  81.74  83.13  78.28  76.79  76.64  74.17  74.05  77.5  70.9  66.65  66.81
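To summarize the per-language figures, the short sketch below computes the mean accuracy over all 15 languages and over the 14 non-English ones. The numbers are copied from the XNLI results above; the variable names are illustrative.

```python
# Per-language XNLI accuracy, copied from the results reported above.
xnli_accuracy = {
    "en": 91.3, "fr": 82.91, "es": 84.27, "de": 81.24, "el": 81.74,
    "bg": 83.13, "ru": 78.28, "tr": 76.79, "ar": 76.64, "vi": 74.17,
    "th": 74.05, "zh": 77.5, "hi": 70.9, "sw": 66.65, "ur": 66.81,
}

# Mean over all languages, and over the translated (non-English) ones only.
mean_all = sum(xnli_accuracy.values()) / len(xnli_accuracy)
non_english = [acc for lang, acc in xnli_accuracy.items() if lang != "en"]
mean_non_english = sum(non_english) / len(non_english)

# Language with the lowest reported accuracy.
lowest_lang = min(xnli_accuracy, key=xnli_accuracy.get)

print(f"mean (15 languages): {mean_all:.2f}")
print(f"mean (non-English):  {mean_non_english:.2f}")
print(f"lowest: {lowest_lang} ({xnli_accuracy[lowest_lang]})")
```

The gap between the English score and the non-English mean gives a rough sense of how much accuracy is lost when inputs go through translation first.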
