Saturday, 6 September 2025

axolotl


## Axolotl (alternative to huggingface/transformers)

It is a tool for full fine-tuning, parameter-efficient fine-tuning, and alignment techniques, with support for multiple model architectures such as Llama, Mistral, Phi, Qwen, Mixtral (MoE), Gemma, GPT-J, Pythia, etc.

Support includes fp16/fp32, LoRA, QLoRA, GPTQ, and Flash Attention, and it covers pre-training, fine-tuning, and preference-based post-training (DPO, ORPO, and PRMs).

Its installation requires **packaging==23.2, setuptools==75.8.0, wheel, and ninja**, along with **flash-attn** and **deepspeed**.

Fine-tuning runs are configured through a single YAML file that describes the model, datasets, and training hyperparameters.
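As a rough sketch of what that looks like, the config below is written from Python with PyYAML; the key names (`base_model`, `datasets`, `adapter`, etc.) follow common Axolotl example configs and should be checked against the current docs rather than taken as authoritative:

```python
import yaml  # PyYAML

# Minimal, illustrative config dict; key names are assumptions based on
# common Axolotl examples, not a guaranteed-current or exhaustive schema.
config = {
    "base_model": "NousResearch/Llama-2-7b-hf",
    "datasets": [{"path": "data/alpaca.jsonl", "type": "alpaca"}],
    "output_dir": "./outputs/llama2-lora",
    "adapter": "lora",
    "lora_r": 16,
    "lora_alpha": 32,
    "sequence_len": 2048,
    "micro_batch_size": 2,
    "num_epochs": 1,
    "learning_rate": 2e-4,
}

with open("config.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# The run is then launched against this file, e.g.:
#   axolotl train config.yml
```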

### Dataset Format required for pre-training. 

```jsonl
{"text": "first row"}
{"text": "second row"}
...
```
i.e. one JSON object per line, in **.jsonl format**.
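A minimal sketch of producing such a file (the file name and rows are placeholders):

```python
import json

rows = [
    {"text": "first row"},
    {"text": "second row"},
]

# One JSON object per line -> .jsonl
with open("pretrain.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```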

```python
datasets.load_dataset
```
This function loads datasets in various formats, including *jsonl, csv, arrow, parquet, sql, and WebDataset*.
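For example, loading the JSONL file from the sketch above (the `"json"` builder handles both .json and .jsonl):

```python
from datasets import load_dataset

# "json" is the builder name; data_files points at the placeholder file above.
ds = load_dataset("json", data_files="pretrain.jsonl", split="train")
print(ds[0])  # {'text': 'first row'}
```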

### Dataset Format required for SFT

SFT (supervised fine-tuning) means training the model to respond to an instruction or chat input (as chatbots like ChatGPT and Gemini do).

Formats supported are **conversation datasets** and **instruction datasets**, along with *pre-tokenized datasets*.

#### Conversation dataset 

Each message usually contains a **role** and a **content** key.

The formatting is handled by a **chat_template**: a Jinja2 template that renders the list of messages into a single prompt string.


#### <|im_start|> and <|im_end|>

These are **delimiters**: special markers in the prompt that separate the different speakers, allowing the model to identify which portion of the text belongs to whom.
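As a sketch of how a chat template inserts these delimiters, here is Hugging Face's `apply_chat_template` with a tokenizer whose template happens to use ChatML-style markers (Qwen2.5-0.5B-Instruct is only an example of such a model; the exact rendered text depends on the model's own template). It takes a list of role/content messages like the OpenAI format shown below:

```python
from transformers import AutoTokenizer

# Any tokenizer whose chat template uses <|im_start|>/<|im_end|> works here.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is an axolotl?"},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# What is an axolotl?<|im_end|>
# <|im_start|>assistant
```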

##### Sharegpt format

{"conversations": [{"from": "...", "value": "..."}]}

##### OpenAI format

{"messages": [{"role": "...", "content": "..."}]}

Possible roles are *user, system, and assistant*.
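In practice you often have to convert one layout into the other; here is a small sketch mapping ShareGPT records to the OpenAI layout (the `human`/`gpt` to `user`/`assistant` mapping is the usual convention):

```python
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_openai(record):
    # {"conversations": [{"from", "value"}]} -> {"messages": [{"role", "content"}]}
    return {
        "messages": [
            {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
            for turn in record["conversations"]
        ]
    }

example = {"conversations": [{"from": "human", "value": "Hi"},
                             {"from": "gpt", "value": "Hello!"}]}
print(sharegpt_to_openai(example))
```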

A related question is **what to mask**: typically only the assistant turns contribute to the loss, while system and user turns are masked out, so the model learns to produce replies rather than to imitate the user.

We can bring our own custom template via the config key:

**`chat_template_jinja: # your Jinja2 template`**

#### Instruction Dataset

Used for training instruction-following models.

The most common format:

```{"instruction": "...", "input": "...", "output": "..."}```

This is called the Alpaca instruction dataset format.

Custom instruction prompt formats are also supported.
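For reference, this is roughly how the standard Alpaca template renders such a record into a single prompt (the preamble wording follows the original Alpaca repo; treat the exact text as illustrative):

```python
def alpaca_prompt(rec):
    # Records with an "input" field get the longer preamble and an Input section.
    if rec.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{rec['instruction']}\n\n"
            f"### Input:\n{rec['input']}\n\n"
            f"### Response:\n{rec['output']}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{rec['instruction']}\n\n"
        f"### Response:\n{rec['output']}"
    )

print(alpaca_prompt({
    "instruction": "Name a salamander that never fully metamorphoses.",
    "input": "",
    "output": "The axolotl.",
}))
```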

## RLHF

RLHF (Reinforcement Learning from Human Feedback) means optimizing a language model with human feedback: human preferences over model outputs are turned into a training signal, so the model learns to produce responses people rate higher.


### Methods for RLHF

#### DPO 

Direct Preference Optimization


#### IPO

Identity Preference Optimization

#### KTO

Kahneman-Tversky Optimization

#### GRPO

Group Relative Policy Optimization

#### ORPO

Odds Ratio Preference Optimization
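Most of these methods (DPO, IPO, and ORPO in particular) train on preference pairs rather than single responses. A minimal sketch of one such record, using the common prompt/chosen/rejected field names (the exact keys Axolotl expects depend on the configured dataset type):

```python
import json

# One preference pair: the same prompt with a preferred and a rejected answer.
pair = {
    "prompt": "Explain in one sentence what an axolotl is.",
    "chosen": "An axolotl is a neotenic Mexican salamander that keeps its larval "
              "features, such as external gills, into adulthood.",
    "rejected": "It's a kind of fish.",
}

with open("preference_pairs.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(pair) + "\n")
```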



