Update README.md #1
by TimeRobber - opened
README.md CHANGED

```diff
@@ -11,7 +11,9 @@ license: apache-2.0
 
 ## Version 1.1
 
-[T5 Version 1.1](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/released_checkpoints.md#t511) includes the following improvements compared to the original T5 model
+[T5 Version 1.1](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/released_checkpoints.md#t511) includes the following improvements compared to the original T5 model
+
+- GEGLU activation in feed-forward hidden layer, rather than ReLU - see [here](https://arxiv.org/abs/2002.05202).
 
 - Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning.
 
```
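For context on the GEGLU bullet added in this diff: GEGLU replaces the single ReLU hidden layer of the feed-forward block with a GELU-gated pair of projections. A minimal NumPy sketch, with made-up dimensions and the common tanh approximation of GELU (illustrative only, not T5's actual implementation):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (the exact erf-based form also works).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def geglu_ffn(x, w_gate, w_lin, w_out):
    # GEGLU feed-forward: elementwise product of a GELU-activated "gate"
    # projection and a plain linear projection, then project back to d_model.
    return (gelu(x @ w_gate) * (x @ w_lin)) @ w_out

# Tiny demo with hypothetical sizes (d_model=8, d_ff=16).
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))        # (batch, d_model)
w_gate = rng.normal(size=(8, 16))  # gating projection
w_lin = rng.normal(size=(8, 16))   # linear projection
w_out = rng.normal(size=(16, 8))   # output projection back to d_model
y = geglu_ffn(x, w_gate, w_lin, w_out)
print(y.shape)
```

Note the gated form uses two input projections instead of one, so a GEGLU layer with the same `d_ff` has roughly 50% more feed-forward parameters than the ReLU version.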