Test-Heretic Model

This is an attempt at a decensored version of ThijsL202/test, made using Heretic v1.2.0. However, it is more of a failure than a success, mostly because of limited time and because I would have to modify Heretic to get much faster trial runs.

Abliteration Parameters (Trial 1)

  • direction_index: per layer
  • attn.o_proj.max_weight: 0.88
  • attn.o_proj.max_weight_position: 38.23
  • attn.o_proj.min_weight: 0.28
  • attn.o_proj.min_weight_distance: 25.46
  • mlp.down_proj.max_weight: 0.87
  • mlp.down_proj.max_weight_position: 40.85
  • mlp.down_proj.min_weight: 0.70
  • mlp.down_proj.min_weight_distance: 33.35
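
For readers unfamiliar with these parameters, the sketch below shows one way a per-layer ablation weight could be derived from them. It assumes, without confirmation from this card, that the weight peaks at max_weight at the layer given by max_weight_position, falls off linearly to min_weight at min_weight_distance layers away, and is zero beyond that. The function name layer_weight and the dictionary names are hypothetical and are not Heretic's API; check the Heretic source for the exact formula.

```python
# Hypothetical sketch: NOT Heretic's actual code. Assumes a linear falloff
# from max_weight (at max_weight_position) to min_weight (at a distance of
# min_weight_distance layers), with no ablation applied beyond that distance.

def layer_weight(layer_index, max_weight, max_weight_position,
                 min_weight, min_weight_distance):
    """Return the assumed ablation weight for one layer."""
    distance = abs(layer_index - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # layer is outside the kernel, so it is left untouched
    # Linear interpolation between max_weight and min_weight.
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)


# Trial 1 values copied from the list above.
O_PROJ = {
    "max_weight": 0.88,
    "max_weight_position": 38.23,
    "min_weight": 0.28,
    "min_weight_distance": 25.46,
}
DOWN_PROJ = {
    "max_weight": 0.87,
    "max_weight_position": 40.85,
    "min_weight": 0.70,
    "min_weight_distance": 33.35,
}

if __name__ == "__main__":
    # Layer indices are chosen for illustration; the model's layer count
    # is not stated in this card.
    for layer in (0, 20, 40):
        print(layer,
              round(layer_weight(layer, **O_PROJ), 3),
              round(layer_weight(layer, **DOWN_PROJ), 3))
```

Under this assumed shape, the attn.o_proj weight varies strongly across layers (0.88 down to 0.28), while the mlp.down_proj weight stays comparatively flat (0.87 down to 0.70).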

Performance

  • Refusals: 17/100
  • KL divergence: 0.0380
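
As a rough illustration of the KL divergence figure, the sketch below computes the KL divergence between two next-token probability distributions. It assumes the metric compares the original model's distribution (P) with the modified model's distribution (Q) on the same prompt; the direction, the prompts used, and any averaging are assumptions here and are not specified in this card. The logits are made-up toy values, not measurements from this model.

```python
import math

# Hypothetical sketch: toy values only, not data from this model.

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    original_logits = [2.0, 1.0, 0.1]   # assumed: original model
    modified_logits = [1.9, 1.1, 0.1]   # assumed: abliterated model
    p = softmax(original_logits)
    q = softmax(modified_logits)
    print(kl_divergence(p, q))
```

A value near zero means the modified model's output distribution stays close to the original's; identical distributions give exactly zero.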