We are experimenting with ways to use ChatGPT to get better answers more reliably, reduce hallucinations, etc.
This little library will generate multiple draft responses and then use a second model to judge the answers and pick a winner, which is then returned to the user. Google's Bard uses this same approach.
With this library you can apply the pattern to gpt-3.5 and gpt-4.
Drafts are generated in parallel and all drafts are evaluated with a single prompt.
This will use a lot of tokens. For example, to generate 3 drafts you are already at 3x; you then need to feed those drafts into the judging prompt and read back its response, so expect more than 7x overall.
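A minimal sketch of the pattern, under assumptions: all function names here are hypothetical, and a stub stands in for the real gpt-3.5/gpt-4 calls. Drafts are requested in parallel, then a single judging prompt presents all of them to a second model.

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    # Stub standing in for a real chat-completion request
    # (e.g. to gpt-3.5 or gpt-4).
    return f"draft answering: {prompt}"

def generate_drafts(question: str, n: int = 3) -> list[str]:
    # Drafts are generated in parallel, one request each.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(call_model, [question] * n))

def build_judge_prompt(question: str, drafts: list[str]) -> str:
    # All drafts are evaluated with a single prompt to the judge model.
    numbered = "\n\n".join(
        f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts)
    )
    return (
        f"Question: {question}\n\n{numbered}\n\n"
        "Pick the best draft and reply with its number."
    )

drafts = generate_drafts("What is a monad?")
prompt = build_judge_prompt("What is a monad?", drafts)
```

The token math above follows directly: n draft completions, plus a judge prompt whose input contains all n drafts, plus the judge's own completion.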
Nice! I noticed that Bard lets you see the drafts it made prior to selecting its final response, and I kind of wanted ChatGPT to do the same.
This is not just useful for reducing hallucinations or improving reliability in general; you can also get as precise and specific as you want with the criteria for selecting the winning draft, which is something you can't control with Bard either. You could extend the idea further by having another model extract and combine the best aspects of each draft, and so on.
This seems like a pattern / approach that would also be particularly great for cases where the output from the LLM has to be precise to be useful, such as writing code.
Great work on the Gladiator package! It's fascinating to see parallel drafting and model judging applied to ChatGPT's responses. Generating multiple drafts and having a second model pick the best one adds reliability, and the selection criteria are fully under your control.
I also like the potential for extending the idea by having another model extract and combine the best aspects of each draft. That versatility makes the approach especially valuable where the output must be precise to be useful, such as code.
Looking forward to exploring the Streamlit demo and seeing Gladiator in action. Keep up the great work!
It could be interesting to use this approach in a product that also lets humans pick the answer they thought was best (in the cases where they're curious to see all three).
That data could be gathered internally by that product into an RLHF data set used to train future LLMs.
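A sketch of how such a product might log those human picks as preference data; the record shape and names here are hypothetical, JSONL is just one common interchange format for preference datasets:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class PreferenceRecord:
    # One RLHF-style comparison: the prompt, all drafts the
    # user saw, and the index of the draft they preferred.
    prompt: str
    drafts: list[str] = field(default_factory=list)
    chosen_index: int = 0

def to_jsonl_line(record: PreferenceRecord) -> str:
    # Serialize one comparison as a single JSONL line.
    return json.dumps(asdict(record))

record = PreferenceRecord(
    prompt="Explain big-O notation.",
    drafts=["Draft A...", "Draft B...", "Draft C..."],
    chosen_index=1,
)
line = to_jsonl_line(record)
```

Accumulating these lines over time would give exactly the kind of comparison data that preference-tuning pipelines consume.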
Streamlit demo: https://theoremone-gptgladiator-streamlit-ui-5ljwmm.streamli...