In her presentation "Inference Scaling: A New Frontier for AI Capabilities" at Sutter Hill Ventures, Azalia Mirhoseini shared her team's research showing that giving AI models multiple attempts at a task, and carefully selecting the best result, can significantly improve performance. Here are my notes from her talk:
Improving Model Performance
- Pre-training and fine-tuning have been key focus areas for scaling language models.
- Traditional fine-tuning starts with next-token prediction on high-quality, specialized data.
- Reinforcement Learning from Human Feedback (RLHF) introduced human preferences into the process: people rate or rank model outputs, and those rankings are used to steer model behavior.
- Constitutional AI moves beyond collecting thousands of human labels to using ~10 human-written principles in a two-stage approach: first, models generate, critique, and revise their own outputs against these principles; then RLAIF (Reinforcement Learning from AI Feedback) uses model-generated preference labels in place of human ones (see the sketch after this list).
- This improves both harmlessness and helpfulness while reducing dependency on human data collection.
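To make the two-stage idea concrete, here is a minimal sketch of a Constitutional-AI-style critique-and-revise loop plus AI preference labeling. The `generate(prompt)` helper, the principles, and the prompt wording are placeholders I'm assuming for illustration, not the actual implementation described in the talk.

```python
# Sketch of a Constitutional-AI-style pipeline (illustrative, not the original implementation).
# Assumes a hypothetical generate(prompt: str) -> str helper that queries a base model.

PRINCIPLES = [
    "Choose the response that is least harmful.",
    "Choose the response that is most helpful and honest.",
]

def critique_and_revise(prompt: str, generate) -> tuple[str, str]:
    """Stage 1: generate a draft, critique it against each principle, then revise."""
    draft = generate(prompt)
    for principle in PRINCIPLES:
        critique = generate(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Point out any way the response violates the principle."
        )
        draft = generate(
            f"Response: {draft}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return prompt, draft  # (prompt, revised) pairs become supervised fine-tuning data

def ai_preference_label(prompt: str, response_a: str, response_b: str, generate) -> str:
    """Stage 2 (RLAIF): the model itself labels which response better follows the principles."""
    verdict = generate(
        f"Principles: {PRINCIPLES}\nPrompt: {prompt}\n"
        f"A: {response_a}\nB: {response_b}\n"
        "Answer with the single letter of the better response."
    )
    return "A" if "A" in verdict.upper() else "B"
```

The point is the shape of the data flow: stage 1 turns a handful of written principles into revised responses for supervised fine-tuning, and stage 2 replaces human preference labels with model-generated ones.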
Inference Time Scaling
- The "Large Language Monkeys" project showed that repeated sampling (trying multiple times) during inference can significantly improve performance on complex tasks like math and coding
- Even smaller models showed major gains from increased sampling
- Performance improvements follow an exponential power law relationship
- Some correct solutions only appeared in <10 out of 10,000 attempts
- Key inference-time techniques that can be combined: repeated sampling (generating multiple attempts), fusion (synthesizing multiple responses into one), criticism and ranking of responses, and verification of outputs.
- Verification problems fall into two categories: automated (coding, formal math proofs) and manual (requires human judgment).
- Basic approaches like majority voting don't work well on their own; better verifiers are needed (a best-of-n sketch with a simple automated verifier follows this list).
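One common way to quantify the "rare success" point is the standard unbiased pass@k estimator: given n sampled attempts of which c were correct, it estimates the probability that at least one of k draws would succeed. The sample counts below are made up for illustration, not numbers from the talk.

```python
# Unbiased pass@k estimator, commonly used to measure how coverage grows with more samples.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n total attempts is correct,
    given that c of the n attempts were correct."""
    if n - c < k:
        return 1.0  # not enough incorrect samples to fill k draws, so success is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: a problem where only 8 of 10,000 attempts were correct. A single try
# almost never succeeds, but large sample budgets recover it.
for k in (1, 100, 1000, 10000):
    print(k, round(pass_at_k(n=10_000, c=8, k=k), 4))
```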
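To show how repeated sampling combines with automated verification (rather than plain majority voting), here is a minimal best-of-n sketch for a coding task. The `generate(prompt)` helper, the `solve` entry point, and the test format are assumptions for illustration, not any specific system from the talk.

```python
# Sketch of repeated sampling plus automated verification (best-of-n for code generation).
# generate() and the test cases are placeholders, not a specific system's API.

def verify(candidate_fn, tests) -> bool:
    """Automated verifier: run the candidate function against unit tests."""
    try:
        return all(candidate_fn(x) == expected for x, expected in tests)
    except Exception:
        return False

def best_of_n(prompt: str, generate, tests, n: int = 100):
    """Sample n candidate solutions and return the first that passes verification."""
    for _ in range(n):
        source = generate(prompt)            # one attempt from the model
        namespace = {}
        try:
            exec(source, namespace)          # turn the sampled source into callables
            candidate = namespace["solve"]   # assumed entry point name
        except Exception:
            continue                         # unparsable or crashing sample: discard
        if verify(candidate, tests):
            return source                    # verified solution found
    return None                              # no sample passed; raise n or improve the verifier
```

Unlike majority voting over final answers, this keeps only candidates that pass an explicit check, which is why the strength of the verifier ends up mattering as much as the sampling budget.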
Future Directions
- Need deeper investigation into whether parallel inference (many independent samples) or serial inference (iteratively refining one attempt) is more effective.
- As inference becomes a larger part of both training and deployment, high-throughput model serving infrastructure becomes increasingly critical.
- The line between inference and training is blurring, with inference results being fed back into training processes to improve model capabilities.
- Future models will need seamless self-improvement cycles that continuously enhance their capabilities, more like how humans learn through constant interaction and feedback than through discrete training periods.