appvoid's Collections
cool datasets
Updated Apr 1
some interesting datasets to use for language modeling
54rt1n/wikipedia-summary-dataset • Updated Sep 10, 2024 • 5.32M • 24 • 2
appvoid/raw-corpus • Updated Feb 23, 2025 • 1.6M • 24
pszemraj/simple_wikipedia • Updated Dec 29, 2025 • 238k • 313 • 8
common-pile/youtube • Updated Jun 6, 2025 • 1.13M • 878 • 12
srinivasbilla/self-instruct-base • Updated Jan 24, 2023 • 82.6k • 87 • 5
agentlans/high-quality-english-sentences • Updated Oct 1, 2024 • 1.71M • 1.26k • 34
agentlans/note-taking-v2 • Updated Sep 22, 2025 • 17.6k • 42
PleIAs/SYNTH • Updated 10 days ago • 68M • 12.7k • 263