Overview

We fine-tune large language models (LLMs) for BatchPrompting, the ability to answer multiple questions in a single inference pass. Existing BatchPrompting techniques rely on lengthy prompts with few-shot examples and suffer degraded performance as the number of questions grows. We demonstrate that, after fine-tuning, LLMs maintain consistent performance across a wide range of batch sizes without relying on lengthy prompts or few-shot examples. This enables users to efficiently include any number of questions in a single prompt while preserving the model's response quality.
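
To make the setup concrete, below is a minimal sketch of how multiple questions might be packed into one prompt and answered in a single inference pass. The checkpoint path, prompt format, and numbering scheme are illustrative assumptions, not the exact format used in this work.

```python
# Sketch of batch prompting: pack several questions into one prompt and run a
# single generation pass. Model path and prompt template are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/batch-finetuned-model"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

questions = [
    "What is the capital of France?",
    "Who wrote 'Pride and Prejudice'?",
    "What is 17 * 23?",
]

# Number each question so the model can return correspondingly numbered answers.
batch_prompt = "Answer each question below.\n" + "\n".join(
    f"Q{i + 1}: {q}" for i, q in enumerate(questions)
)

inputs = tokenizer(batch_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because all questions share one forward pass over the prompt, the per-question cost drops as the batch grows; the fine-tuning described above is what keeps answer quality stable as more questions are added.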