instruction-fine-tuning

Acronym: IFT.

Not to be confused with: Reinforcement Learning with Human Feedback (RLHF), a separate alignment step applied after instruction fine-tuning (see the quote below).


Journal entries linked to this note:

Journal entry from Thursday, June 6, 2024 at 22:57 #llm, #JeMeDemande, #JaiLu

#JeMeDemande what the differences are between models whose names end in nothing, in -instruct, and in -chat.

This brings us to the heart of the innovation behind the wildly popular ChatGPT: it uses an enhancement of GPT3 that (besides having a lot more parameters), was explicitly fine-tuned on instructions (and dialogs more generally) -- this is referred to as instruction-fine-tuning or IFT for short. In addition to fine-tuning instructions/dialogs, the models behind ChatGPT (i.e., GPT-3.5-Turbo and GPT-4) are further tuned to produce responses that align with human preferences (i.e. produce responses that are more helpful and safe), using a procedure called Reinforcement Learning with Human Feedback (RLHF). (from)
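As a minimal sketch of the practical difference: an instruction-fine-tuned model (-instruct / -chat) is prompted through the chat template it was trained with, while a base model (no suffix) simply continues raw text. The snippet below assumes the Hugging Face `transformers` library, and the model name is only an example of an -instruct model; swap in whichever base/-instruct pair you want to compare.

```python
# Minimal sketch (assumption: `transformers` is installed and the example
# model name is reachable on the Hugging Face Hub).
from transformers import AutoTokenizer

# A base model (name ending in "nothing") only does next-token completion
# on raw text, so the prompt is just plain text to be continued.
base_prompt = "The capital of France is"

# An -instruct / -chat model expects the chat template it was
# instruction-fine-tuned with (special tokens around each turn).
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]
instruct_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return the formatted string, not token ids
    add_generation_prompt=True,  # append the marker that starts the assistant turn
)

print("Base model prompt:\n", base_prompt)
print("Instruct model prompt:\n", instruct_prompt)
```

The visible difference is the special tokens the chat template wraps around the user message; a base model that was never instruction-fine-tuned has no such template and will just keep completing the text it is given.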