Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

LLM-as-a-Judge has become a popular evaluation approach, so the judge's own ability needs to improve along with it. The authors propose a method in which the model improves its own judgement in three main steps: (1) the model generates responses, (2) the same model evaluates those responses, and (3) the model is trained on the results of that evaluation. According to the authors, this process improves both the model's judgement and its instruction following.

The method starts from a seed model that has already undergone SFT and can follow instructions. The model then plays three roles: (1) as an Actor, it produces responses to an input; (2) as a Judge, it evaluates the input and the responses, typically writing out its reasoning before giving a verdict; and (3) as a Meta-Judge, it compares and evaluates its own judgements.
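The three roles can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `Model` type, the prompt wording, the `Judgement` class, and `toy_model` are all hypothetical stand-ins, and a real setup would call the SFT seed model instead of the toy function.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for the seed model: any function from prompt text to text.
Model = Callable[[str], str]


@dataclass
class Judgement:
    response: str  # the response being judged
    text: str      # the judge's written reasoning and verdict


def act(model: Model, question: str, n: int = 2) -> List[str]:
    """Actor role: sample n candidate responses to the question."""
    return [model(f"ACTOR sample {i}\n{question}") for i in range(n)]


def judge(model: Model, question: str, response: str) -> Judgement:
    """Judge role: the same model writes a judgement of one response."""
    prompt = f"JUDGE\nQuestion: {question}\nResponse: {response}"
    return Judgement(response=response, text=model(prompt))


def meta_judge(model: Model, question: str, a: Judgement, b: Judgement) -> Judgement:
    """Meta-Judge role: the same model picks the better of two judgements."""
    prompt = (
        f"META-JUDGE\nQuestion: {question}\nResponse: {a.response}\n"
        f"Judgement A: {a.text}\nJudgement B: {b.text}\n"
        "Answer with A or B."
    )
    verdict = model(prompt).strip().upper()
    return a if verdict.startswith("A") else b


def run_round(model: Model, question: str) -> Tuple[List[str], Judgement]:
    """One pass through all three roles for a single question."""
    responses = act(model, question)
    # Two independent judgements of the same response, then a comparison.
    first = judge(model, question, responses[0])
    second = judge(model, question, responses[0])
    return responses, meta_judge(model, question, first, second)


def toy_model(prompt: str) -> str:
    """Deterministic placeholder so the sketch runs without a real LLM."""
    if prompt.startswith("META-JUDGE"):
        return "A"
    if prompt.startswith("JUDGE"):
        return "Reasoning about the response, then a score."
    return "Answer: " + prompt.splitlines()[-1]
```

Calling `run_round(toy_model, "What is 2 + 2?")` returns the sampled responses together with the judgement the meta-judge preferred. The point of the sketch is that one model serves all three roles, distinguished only by the prompt it receives.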

The third stage is the core of the method, and the corresponding prompt is as follows:

Review the user's question and the corresponding response, along with two judgments. Determine which judgment is more accurate according to the rubric provided below.
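As a rough illustration, the instruction above could be assembled into a full meta-judge prompt as follows. The section labels, their order, and the function name are assumptions made for this sketch; the rubric itself is not reproduced in this summary, so it is passed in by the caller rather than filled in.

```python
# The instruction text is quoted from the summary; everything else is illustrative.
META_JUDGE_INSTRUCTION = (
    "Review the user's question and the corresponding response, along with two "
    "judgments. Determine which judgment is more accurate according to the "
    "rubric provided below."
)


def build_meta_judge_prompt(
    question: str,
    response: str,
    judgment_a: str,
    judgment_b: str,
    rubric: str,
) -> str:
    """Join the instruction, the inputs, and a caller-supplied rubric."""
    sections = [
        META_JUDGE_INSTRUCTION,
        f"Question:\n{question}",
        f"Response:\n{response}",
        f"Judgment A:\n{judgment_a}",
        f"Judgment B:\n{judgment_b}",
        f"Rubric:\n{rubric}",  # hypothetical placement; supply the actual rubric here
    ]
    return "\n\n".join(sections)
```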

The judgement that the meta-judge ranks highest is then fed back into the model's training, which is how the model's judging ability improves over time.
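One way to picture this step is to rank several judgements by pairwise meta-judge comparisons and keep the best and worst as a training pair. The sketch below makes several assumptions that the summary does not confirm: ranking by simple win counts, the `prompt`/`chosen`/`rejected` record layout (a common convention for preference training), and the `prefer_first` callback, which stands in for a real meta-judge call.

```python
from itertools import combinations
from typing import Callable, Dict, List


def rank_judgements(
    judgements: List[str],
    prefer_first: Callable[[str, str], bool],
) -> List[str]:
    """Order judgements from most to fewest pairwise wins."""
    wins = {i: 0 for i in range(len(judgements))}
    for i, j in combinations(range(len(judgements)), 2):
        # prefer_first is a hypothetical meta-judge verdict for the pair (i, j).
        winner = i if prefer_first(judgements[i], judgements[j]) else j
        wins[winner] += 1
    order = sorted(wins, key=lambda i: wins[i], reverse=True)
    return [judgements[i] for i in order]


def to_preference_pair(prompt: str, ranked: List[str]) -> Dict[str, str]:
    """Keep the top-ranked judgement as 'chosen' and the bottom one as 'rejected'."""
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}


# Toy comparator for demonstration only: prefer the longer judgement.
def longer_wins(a: str, b: str) -> bool:
    return len(a) >= len(b)


example_judgements = [
    "Score: 3",
    "The answer is correct but terse. Score: 4",
    "Good. Score: 5",
]
ranked = rank_judgements(example_judgements, longer_wins)
pair = to_preference_pair("Judge this response.", ranked)
```

With the toy comparator, the most detailed judgement ends up as `chosen` and the shortest as `rejected`. In practice the comparator would be the model itself acting as meta-judge, and the resulting pairs would feed whatever preference-based training procedure is in use.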
