• Efficient Tool Use with Chain-of-Abstraction Reasoning (arXiv:2401.17464)
• Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation (arXiv:2401.15688)
• SliceGPT: Compress Large Language Models by Deleting Rows and Columns (arXiv:2401.15024)
• From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities (arXiv:2401.15071)
• Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI (arXiv:2401.14019)
• ChatQA: Building GPT-4 Level Conversational QA Models (arXiv:2401.10225)
• Do Large Language Models Latently Perform Multi-Hop Reasoning? (arXiv:2402.16837)
• Divide-or-Conquer? Which Part Should You Distill Your LLM? (arXiv:2402.15000)
• Linear Transformers with Learnable Kernel Functions are Better In-Context Models (arXiv:2402.10644)
• DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows (arXiv:2402.10379)
• Chain-of-Thought Reasoning Without Prompting (arXiv:2402.10200)
• Generative Representational Instruction Tuning (arXiv:2402.09906)
• Self-Discover: Large Language Models Self-Compose Reasoning Structures (arXiv:2402.03620)
• Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks (arXiv:2402.04248)
• Rethinking Optimization and Architecture for Tiny Language Models (arXiv:2402.02791)
• OLMo: Accelerating the Science of Language Models (arXiv:2402.00838)
• Can Large Language Models Understand Context? (arXiv:2402.00858)
• Can large language models explore in-context? (arXiv:2403.15371)
• Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models (arXiv:2406.04271)
• Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models (arXiv:2406.06563)
• Instruction Pre-Training: Language Models are Supervised Multitask Learners (arXiv:2406.14491)
• Unlocking Continual Learning Abilities in Language Models (arXiv:2406.17245)
• Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning (arXiv:2406.15334)
• A Closer Look into Mixture-of-Experts in Large Language Models (arXiv:2406.18219)
• Characterizing Prompt Compression Methods for Long Context Inference (arXiv:2407.08892)
• Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning (arXiv:2407.18248)
• jina-embeddings-v3: Multilingual Embeddings With Task LoRA (arXiv:2409.10173)