instruction-fine-tuning

Acronym: IFT.

Not to be confused with: Reinforcement Learning with Human Feedback (RLHF), a separate alignment step applied after instruction fine-tuning (see the quote below).


Journal entries linked to this note:

Journal entry from Thursday, June 6, 2024 at 22:57 #llm, #JeMeDemande, #JaiLu

#JeMeDemande what the differences are between models whose names end in nothing, in -instruct, and in -chat.

This brings us to the heart of the innovation behind the wildly popular ChatGPT: it uses an enhancement of GPT3 that (besides having a lot more parameters), was explicitly fine-tuned on instructions (and dialogs more generally) -- this is referred to as instruction-fine-tuning or IFT for short. In addition to fine-tuning instructions/dialogs, the models behind ChatGPT (i.e., GPT-3.5-Turbo and GPT-4) are further tuned to produce responses that align with human preferences (i.e. produce responses that are more helpful and safe), using a procedure called Reinforcement Learning with Human Feedback (RLHF). (from)
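As a minimal sketch of the practical difference: an instruction-fine-tuned model (-instruct / -chat) is prompted through the chat template it was trained with, while a base model (no suffix) simply continues raw text. The snippet below assumes the Hugging Face `transformers` library, and the model name is only an example of an -instruct model; swap in whichever base/-instruct pair you want to compare.

```python
# Minimal sketch (assumption: `transformers` is installed and the example
# model name is reachable on the Hugging Face Hub).
from transformers import AutoTokenizer

# A base model (name ending in "nothing") only does next-token completion
# on raw text, so the prompt is just plain text to be continued.
base_prompt = "The capital of France is"

# An -instruct / -chat model expects the chat template it was
# instruction-fine-tuned with (special tokens around each turn).
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]
instruct_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return the formatted string, not token ids
    add_generation_prompt=True,  # append the marker that starts the assistant turn
)

print("Base model prompt:\n", base_prompt)
print("Instruct model prompt:\n", instruct_prompt)
```

The visible difference is the special tokens the chat template wraps around the user message; a base model that was never instruction-fine-tuned has no such template and will just keep completing the text it is given.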