Chat Interfaces & Declaring Intent

February 3, 2025

There's plenty of debate within UI design circles about the explosion of chat interfaces driven by large-scale AI models. While there are certainly pros and cons to open text fields, one thing they are great at is capturing user intent, which today's AI-driven systems can increasingly fulfill.

At their inception, computers required humans to adapt to how they worked. Developers had to learn languages with sometimes punishing syntax (don't leave the semicolon out!). People needed to learn CLI commands like cd, ls, and pwd. Even with graphical user interfaces (GUIs), we couldn't simply tell a computer what to do—we had to understand what computers could do and modify our behavior accordingly by clicking on windows, icons, menus, and more.

Google's home page with an empty search box

Google changed this paradigm with a simple yet powerful way for people to declare their intent: an empty search box. Just type whatever you want into Google and it will find you relevant information in response. This open-ended interface not only became hugely popular (close to 9 billion searches per day), it also created an enormous business for Google, because matching people's expressed needs with businesses that can fulfill them monetizes extremely well.

But Google's empty text box was limited to information retrieval.

The emergence of large language models (LLMs) expanded what an open-ended declaration of intent could do. Instead of information retrieval, LLMs enabled information manipulation through an empty text box often referred to as a "chat interface". People could now tell systems using natural language (and even misspellings) to summarize content, transform text into poetry, and generate or restructure information in countless ways. And once again this open-ended interface became hugely popular: ChatGPT has amassed 300 million weekly active users since launching in 2022.

The next logical step was combining these capabilities—merging information retrieval with manipulation, as seen in retrieval-augmented generation (RAG) applications (like Ask Luke!), Perplexity, and ChatGPT with search integration.

ChatGPT's home page with an empty prompt box

But finding and manipulating information is just a subset of the things computers allow us to do. An enormous set of computer applications exists to enable actions of all shapes and sizes, from editing images to managing sales teams. Finding the right action amongst these capabilities requires remembering which app offers it and how to access and use the feature.

Increasingly, though, AI models can not only find the right action for a task but even create an action if one doesn't exist. Through tool use and tool synthesis, LLMs are continuously getting better at action retrieval and manipulation. So today's AI models can combine information retrieval and manipulation with action retrieval and manipulation.
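To make "tool use" concrete, here's a minimal sketch of the pattern underneath it: actions are registered in a catalog, and the model, rather than the user, picks the right one by name and arguments from a natural-language request. The tool names, registry, and dispatch logic below are hypothetical illustrations, not any specific vendor's API.

```python
from typing import Callable

# A registry of available "actions" a model could retrieve from.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Register a function as an action the model may invoke."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("summarize")
def summarize(text: str) -> str:
    # Stand-in for an information-manipulation action.
    return text[:40] + "..." if len(text) > 40 else text

@tool("resize_image")
def resize_image(path: str, width: int) -> str:
    # Stand-in for an application action like image editing.
    return f"resized {path} to {width}px wide"

def dispatch(tool_call: dict) -> str:
    """Execute a model-emitted call, e.g. {"name": ..., "args": {...}}."""
    fn = TOOLS.get(tool_call["name"])
    if fn is None:
        raise KeyError(f"no such tool: {tool_call['name']}")
    return fn(**tool_call["args"])

# In practice an LLM emits the structured call from the user's
# open-ended request; here we hard-code what one might look like.
print(dispatch({"name": "resize_image",
                "args": {"path": "photo.png", "width": 800}}))
```

The key shift is that the user never has to know the registry exists: they declare intent in the text box, and the system handles finding (or synthesizing) the matching action.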

If that sounds like a mouthful, it is. But the user interface for these systems is still primarily an open text field that allows people to declare their intent. What's changed dramatically is that today's technology can do so much more to fulfill that intent. With such vast and emergent capabilities, why would we want to constrain them with UI?

We've moved from humans learning to speak computer to computers learning to understand humans, and I, for one, don't want to go backwards. That's why I'm increasingly hesitant to add more UI to communicate the possibilities of AI-driven systems (despite 30 years of designing GUIs). Let's make the computers figure out what we want, not the other way around.