How to use hiyouga/Baichuan-7B-sft with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="hiyouga/Baichuan-7B-sft", trust_remote_code=True)
# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("hiyouga/Baichuan-7B-sft", trust_remote_code=True, dtype="auto")
How to use hiyouga/Baichuan-7B-sft with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "hiyouga/Baichuan-7B-sft"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "hiyouga/Baichuan-7B-sft",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
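The same completion request can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the server replies in the usual OpenAI completions shape (`choices[0].text`).

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="hiyouga/Baichuan-7B-sft",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumption: the server is already running and answers in the
    OpenAI-compatible completions format.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `print(complete("Once upon a time,"))` mirrors the curl call. For a server listening on another port, such as 30000, pass `base_url="http://localhost:30000"`.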
How to use hiyouga/Baichuan-7B-sft with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "hiyouga/Baichuan-7B-sft" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "hiyouga/Baichuan-7B-sft",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "hiyouga/Baichuan-7B-sft" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "hiyouga/Baichuan-7B-sft",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use hiyouga/Baichuan-7B-sft with Docker Model Runner:
docker model run hf.co/hiyouga/Baichuan-7B-sft
A bilingual instruction-tuned LoRA model of https://huggingface.co/baichuan-inc/baichuan-7B
Please follow the baichuan-7B License to use this model.
Usage:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
tokenizer = AutoTokenizer.from_pretrained("hiyouga/baichuan-7b-sft", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("hiyouga/baichuan-7b-sft", trust_remote_code=True).cuda()
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
query = "晚上睡不着怎么办"
template = (
"A chat between a curious user and an artificial intelligence assistant. "
"The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
"Human: {}\nAssistant: "
)
inputs = tokenizer([template.format(query)], return_tensors="pt")
inputs = inputs.to("cuda")
generate_ids = model.generate(**inputs, max_new_tokens=256, streamer=streamer)
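When sending several queries, it can help to wrap the prompt template in a small function so that every request uses exactly the same wording. This is only a sketch: `build_prompt` is a hypothetical helper name, and it covers the single-turn format shown above, not a multi-turn conversation.

```python
# Same template string as in the usage snippet above.
TEMPLATE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
    "Human: {}\nAssistant: "
)


def build_prompt(query: str) -> str:
    """Return the full single-turn prompt for one user query."""
    # Strip surrounding whitespace so the "Human: " line stays clean.
    return TEMPLATE.format(query.strip())


# Build prompts for a batch of queries.
prompts = [build_prompt(q) for q in ["晚上睡不着怎么办", "How do I sort a list in Python?"]]
```

Each entry of `prompts` can be passed to the tokenizer in place of `template.format(query)`.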
Alternatively, you can launch a CLI demo using the script in https://github.com/hiyouga/LLaMA-Factory:
python src/cli_demo.py --template default --model_name_or_path hiyouga/baichuan-7b-sft
You can reproduce our results with the following script, using LLaMA-Factory:
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage sft \
--model_name_or_path baichuan-inc/baichuan-7B \
--do_train \
--dataset alpaca_gpt4_en,alpaca_gpt4_zh,codealpaca \
--template default \
--finetuning_type lora \
--lora_rank 16 \
--lora_target all \
--output_dir baichuan_lora \
--overwrite_cache \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 8 \
--preprocessing_num_workers 16 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 100 \
--eval_steps 100 \
--learning_rate 5e-5 \
--max_grad_norm 0.5 \
--num_train_epochs 2.0 \
--val_size 0.01 \
--evaluation_strategy steps \
--load_best_model_at_end \
--plot_loss \
--fp16
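As a quick sanity check on these settings, the number of examples contributing to each optimizer update is the per-device batch size times the gradient accumulation steps times the number of devices. The sketch below works this out for the command above, assuming a single GPU (as `CUDA_VISIBLE_DEVICES=0` suggests); `effective_batch_size` is a hypothetical helper, not part of LLaMA-Factory.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of training examples consumed per optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values taken from the training command: batch size 8, accumulation 8, one GPU.
batch = effective_batch_size(8, 8, num_devices=1)
print(batch)  # 64
```

If the per-device batch size has to be lowered to fit in memory, raising `--gradient_accumulation_steps` by the same factor keeps this product unchanged.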