Reinforcement Learning from Human Feedback

DeepLearning.AI

Course Overview

Large language models (LLMs) are trained on human-generated text, but additional methods are needed to align an LLM with human values and preferences. Reinforcement Learning from Human Feedback (RLHF) is currently the main method for this alignment. RLHF can also be used to further tune a base LLM toward values and preferences that are specific to your use case. In this course, you will gain a conceptual understanding of the RLHF training process and then practice applying RLHF to tune an LLM. You will:

1. Explore the two datasets used in RLHF training: the "preference" dataset and the "prompt" dataset (see the sketch after this list).
2. Use the open-source Google Cloud Pipeline Components Library to fine-tune the Llama 2 model with RLHF.
3. Assess the tuned LLM against the original base model by comparing loss curves and using the "Side-by-Side (SxS)" evaluation method.
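For context, the two datasets mentioned above play different roles: the preference dataset (prompts with paired candidate responses and a human choice) trains the reward model, while the prompt dataset (prompts only) drives the reinforcement-learning step. The minimal sketch below illustrates what individual records might look like; the exact field names (`input_text`, `candidate_0`, `candidate_1`, `choice`) are illustrative assumptions, so consult the course materials and the Google Cloud Pipeline Components documentation for the required schema.

```python
import json

# Illustrative sketch of the two RLHF datasets (field names are assumptions,
# not the confirmed schema used by the course or the library).

# Preference dataset record: a prompt plus two candidate completions, where a
# human labeler has marked which candidate they prefer. Used to train the
# reward model.
preference_example = {
    "input_text": "Summarize the following article: ...",
    "candidate_0": "A short, accurate summary of the article.",
    "candidate_1": "An off-topic or lower-quality response.",
    "choice": 0,  # index of the human-preferred candidate
}

# Prompt dataset record: a prompt only, with no completions. During the
# reinforcement-learning step, the policy model generates responses to these
# prompts and the reward model scores them.
prompt_example = {
    "input_text": "Explain RLHF to a new machine-learning engineer.",
}

# Both datasets are typically stored as JSON Lines files (one JSON object per line).
with open("preference_data.jsonl", "w") as f:
    f.write(json.dumps(preference_example) + "\n")

with open("prompt_data.jsonl", "w") as f:
    f.write(json.dumps(prompt_example) + "\n")
```

In the hands-on portion of the course, files like these are referenced by the library's RLHF tuning pipeline, which runs reward-model training followed by reinforcement learning against the base model.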

Course FAQs

What are the prerequisites for 'Reinforcement Learning from Human Feedback'?

Prerequisites for this continuing education class are set by DeepLearning.AI. Most professional development online classes benefit from some prior knowledge. Please check the provider's page for specific requirements.

Will I receive a certificate for this CE class?

Yes, upon successful completion, DeepLearning.AI typically offers a shareable certificate to showcase your new skills and fulfill your continuing education requirements.

How long does this online course take to complete?

Completion times for online continuing education courses vary. The provider's website will have the most accurate estimate of the time commitment needed.