Libraries

How to use ContextualAI/Contextual_KTO_Mistral_PairRM with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ContextualAI/Contextual_KTO_Mistral_PairRM")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ContextualAI/Contextual_KTO_Mistral_PairRM")
model = AutoModelForCausalLM.from_pretrained("ContextualAI/Contextual_KTO_Mistral_PairRM")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ContextualAI/Contextual_KTO_Mistral_PairRM with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ContextualAI/Contextual_KTO_Mistral_PairRM"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ContextualAI/Contextual_KTO_Mistral_PairRM",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
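
The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_chat_request` and `ask` are hypothetical, and it assumes the server is reachable at the default address shown above and returns the usual OpenAI-style chat completion JSON.

```python
import json
import urllib.request

MODEL_ID = "ContextualAI/Contextual_KTO_Mistral_PairRM"


def build_chat_request(content: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request that the curl example sends (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(content: str, base_url: str = "http://localhost:8000") -> str:
    """Send one user message and return the assistant's reply text.

    Assumes an OpenAI-compatible response: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(content, base_url)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, once the server is running:
#   print(ask("What is the capital of France?"))
```

Passing a different `base_url` (for example one ending in port 30000) points the same helper at another OpenAI-compatible server.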

Use Docker

docker model run hf.co/ContextualAI/Contextual_KTO_Mistral_PairRM

SGLang

How to use ContextualAI/Contextual_KTO_Mistral_PairRM with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ContextualAI/Contextual_KTO_Mistral_PairRM" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ContextualAI/Contextual_KTO_Mistral_PairRM",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ContextualAI/Contextual_KTO_Mistral_PairRM" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ContextualAI/Contextual_KTO_Mistral_PairRM",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ContextualAI/Contextual_KTO_Mistral_PairRM with Docker Model Runner:
```
docker model run hf.co/ContextualAI/Contextual_KTO_Mistral_PairRM
```
This repo contains the model and tokenizer checkpoints for:

model family mistralai/Mistral-7B-Instruct-v0.2
optimized with the loss KTO
aligned using the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
via 3 iterations of KTO on one epoch of each training partition, with each iteration's model serving as the reference model for the next.
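
The iteration scheme can be pictured as a simple loop in which the reference model is rolled forward each round. This is only an illustrative sketch: `run_iterations` and `fake_kto_step` are hypothetical names, the training step is a placeholder that does no real training, and it assumes each round also initializes the policy from the previous round's model, which the description above does not state explicitly.

```python
from typing import Callable, List, Tuple

# (policy_init, reference, partition) -> name of the newly trained model
TrainStep = Callable[[str, str, str], str]


def run_iterations(base_model: str,
                   partitions: List[str],
                   train_step: TrainStep) -> Tuple[str, List[Tuple[str, str]]]:
    """Run one training round per partition, chaining the reference model."""
    policy = base_model
    log = []
    for partition in partitions:
        # The model from the previous round is the reference for this round.
        reference = policy
        policy = train_step(policy, reference, partition)
        log.append((reference, policy))
    return policy, log


def fake_kto_step(policy_init: str, reference: str, partition: str) -> str:
    # Placeholder: a real step would run one epoch of KTO on `partition`.
    return f"{policy_init}+{partition}"


final_model, log = run_iterations("base", ["part1", "part2", "part3"], fake_kto_step)
for reference, trained in log:
    print(f"reference={reference} -> trained={trained}")
```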

[03/06/2024]: We are #2 on the (verified) Alpaca Eval 2.0 Leaderboard scoring 33.23!

To prompt this model, ensure that the format is consistent with that of TuluV2. For example, a prompt should be formatted as follows, where <|user|> corresponds to the human's role and <|assistant|> corresponds to the LLM's role. The human should speak first:


<|user|>
Hi! I'm looking for a cake recipe.
<|assistant|>
What kind of cake?
<|user|>
Chocolate cake.
<|assistant|>

Note that a beginning-of-sequence (BOS) token is automatically added at tokenization time, so you do not need to add it yourself. No end-of-sequence (EOS) token is added to the prompt. You may also use our tokenizer's apply_chat_template when running inference with chatml set or when evaluating generations through non-local clients.
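
As a minimal sketch of the format described above, the helper below turns a list of role/content messages into a TuluV2-style prompt string. The function name `build_tulu_prompt` is hypothetical, and the exact newline placement (one newline after each role tag and after each turn) is an assumption read off the example rather than taken from the tokenizer config, so compare it against the output of apply_chat_template before relying on it. No BOS or EOS token is added, in line with the note above.

```python
from typing import Dict, List

ROLE_TAGS = {"user": "<|user|>", "assistant": "<|assistant|>"}


def build_tulu_prompt(messages: List[Dict[str, str]]) -> str:
    """Format chat messages as a TuluV2-style prompt (hypothetical helper)."""
    if not messages or messages[0]["role"] != "user":
        raise ValueError("the conversation must start with a user turn")
    parts = []
    for message in messages:
        tag = ROLE_TAGS[message["role"]]  # raises KeyError for unknown roles
        parts.append(f"{tag}\n{message['content']}\n")
    # When the last turn is the user's, leave an open assistant tag
    # so the model generates the next reply.
    if messages[-1]["role"] == "user":
        parts.append(f"{ROLE_TAGS['assistant']}\n")
    return "".join(parts)


prompt = build_tulu_prompt([
    {"role": "user", "content": "Hi! I'm looking for a cake recipe."},
    {"role": "assistant", "content": "What kind of cake?"},
    {"role": "user", "content": "Chocolate cake."},
])
print(prompt)
```

The resulting string can be passed to the tokenizer as ordinary text.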

For more details on KTO and its methodology, refer to our code repository or blog.

If you found this work useful, feel free to cite our work:

@techreport{ethayarajh2023halos,
  author = {Ethayarajh, Kawin and Xu, Winnie and Jurafsky, Dan and Kiela, Douwe},
  title = {Human-Centered Loss Functions (HALOs)},
  institution = {Contextual AI},
  note = {https://github.com/ContextualAI/HALOs/blob/main/assets/report.pdf},
  year = {2023},
}

Safetensors

Model size: 7B params
Tensor type: BF16

Paper for ContextualAI/Contextual_KTO_Mistral_PairRM

KTO: Model Alignment as Prospect Theoretic Optimization (arXiv 2402.01306, published Feb 2, 2024)