OpenAI's vision: a chatbot in every app
Monday's event had a lot for developers to love—starting with price cuts.
When you have a successful tech product, a natural next step is to turn it into a platform. The original 2007 iPhone, for example, was a major breakthrough for smartphones. But it was the introduction of the App Store in 2008—and the subsequent creation of apps like Uber and Instagram—that cemented Apple’s position as the world’s most profitable smartphone maker.
You can think of ChatGPT as the iPhone of generative AI. If OpenAI wants to maintain its early lead, it needs to turn its technology into a platform other companies use to bring AI capabilities to their customers.
OpenAI has been hard at work on this since March, when the company announced an application programming interface (API) for ChatGPT that let third parties integrate chatbot functionality into their own apps. On Monday, OpenAI held its first developer day in San Francisco and announced a bunch of new features to enable developers to create software powered by GPT-4.
Charlie Guo, the author of the excellent AI newsletter Artificial Ignorance, attended the conference. He told me that for a lot of developers, the most exciting news was lower prices.
“GPT-4 is head and shoulders above other models that you can access via an API,” Guo told me. “But the big issue for many developers and startups is that it's extremely expensive.”
To address those concerns, OpenAI announced a new model, called GPT-4 Turbo. OpenAI CEO Sam Altman described it as better than the previous version of GPT-4 (some dispute this) and announced that it will be two to three times cheaper on a per-token basis.
What struck me about Monday’s announcements was how many options OpenAI is offering to developers. When Apple launched the App Store in 2008, it offered developers a single, clear vision for how apps should work. In contrast, OpenAI is offering a menu of possible approaches to developing AI-powered software and letting developers choose.
GPTs: Special-purpose chatbots
One of OpenAI’s first experiments in integrating ChatGPT with third-party services was called plugins. This feature, announced back in March, extends the capabilities of ChatGPT by allowing it to pull in data from third-party software.
For example, if I ask ChatGPT to find me a flight to San Francisco, it might use the Expedia plugin to perform a flight search. If plugins like this became popular, you could imagine ChatGPT challenging Google as the default starting point for accessing information online.
But plugins haven’t been popular, possibly due to an awkward user interface. To prevent abuse, OpenAI requires users to manually enable each plugin before it can be used. But there isn’t a great way for users to discover plugins.
By late May, Sam Altman was downplaying plugins, reportedly saying that “a lot of people thought they wanted their apps to be inside ChatGPT but what they really wanted was ChatGPT in their apps.”
Monday’s announcements reflect this evolution in Altman’s thinking. Instead of one chatbot that does everything, OpenAI’s new vision is that there will be many different chatbots, each optimized for a different purpose.
There are two ways this could work. One would be to have OpenAI host an “app store” of custom chatbots that people access on OpenAI’s website. The other is a “copilot” model where developers add chatbots to their own apps. OpenAI’s plan is to support both approaches and let developers choose which one they prefer.
On Monday, OpenAI announced a new platform for OpenAI-hosted custom chatbots, confusingly called GPTs. A GPT will accomplish a narrow task like planning meals or troubleshooting problems with your computer. GPTs can be private, limited to a specific organization, or available to the general public.
GPTs are simple to set up. During a demo, Altman created a GPT for advising startup founders. The GPT Builder app asked Altman for the GPT’s purpose (“I want to help startup founders think through their business ideas and get advice,” Altman replied), and suggested a name and logo. Based on just this information, the GPT Builder created the first version of a “Startup Mentor” chatbot.
There are a couple of ways developers can extend the capabilities of GPTs. First, users can upload documents for a GPT to consult when answering questions. For example, Altman uploaded a lecture he’s given about his advice for founders.
Second, GPTs can communicate with the outside world using actions. The documentation for GPTs states that “the design of actions builds upon insights from our plugins beta.” Like plugins, actions allow chatbots to invoke external APIs to do things like search for flights or look up restaurant reviews. But OpenAI says actions give developers more control than plugins did.
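To make this concrete, here is a hedged sketch of the kind of OpenAPI description a flight-search action might be built on. The service URL, path, parameters, and operation name are all invented for illustration, and I’ve written the schema as a Python dictionary for readability; in practice a developer supplies a JSON or YAML schema when configuring the GPT.

```python
# Hypothetical OpenAPI description for a flight-search action.
# Everything here (URL, path, parameter names) is made up for illustration.
flight_search_schema = {
    "openapi": "3.0.0",
    "info": {"title": "Flight Search", "version": "1.0.0"},
    "servers": [{"url": "https://api.example-travel.com"}],
    "paths": {
        "/flights": {
            "get": {
                "operationId": "searchFlights",
                "summary": "Find flights between two cities on a given date",
                "parameters": [
                    {"name": "origin", "in": "query", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "destination", "in": "query", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "date", "in": "query", "required": True,
                     "schema": {"type": "string", "format": "date"}},
                ],
            }
        }
    },
}
```

The model reads the operation name, summary, and parameter descriptions to decide when to call the endpoint and what arguments to pass, which is why well-written descriptions matter as much as the code behind the API.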
“It seems kind of obvious to me that OpenAI is going to phase out plugins and tell everyone they need to migrate to a GPT,” Guo told me.
It’s not hard to think of promising use cases for GPTs. For example, every large company has an intranet with an employee handbook and other official documents. It’s easy to imagine companies creating GPTs that answer new employee questions.
But I worry GPTs will suffer from the same basic discovery problem as plugins. One of the big selling points of a chatbot like ChatGPT is its versatility. Users can ask natural-language questions on almost any topic. With GPTs, in contrast, a user needs to find the right chatbot before they can ask a question. My guess is that many users won’t bother when there are plenty of other ways to find information.
Assistants: Copilots everywhere
A lot of companies are rushing to integrate AI into their apps, and a popular approach is to add a chatbot that sits on the right side of the user interface. The email app Shortwave, for example, takes this approach.
Microsoft has added a chatbot to its Office suite called Microsoft 365 Copilot, and Google is developing a chatbot called Duet that will appear alongside Gmail, Docs, Sheets, and other productivity apps.
On Monday, OpenAI introduced an API called Assistants. Assistants have many of the same capabilities as GPTs, but they are designed to be integrated into third-party apps.
The Assistants API will handle many of the tedious aspects of adding a chatbot to another piece of software. For example, a feature called Threads remembers the state of long-running conversations, saving apps the hassle of re-sending the chat history each time they send a new message.
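Here is a minimal sketch of what that looks like with the openai Python SDK. The assistant’s name, instructions, and messages are my own inventions, and because these endpoints are in beta, the exact method names and fields could change.

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create an assistant once, then reuse it across many conversations.
assistant = client.beta.assistants.create(
    name="Support Assistant",                                # hypothetical
    instructions="You answer questions about the Acme email app.",
    model="gpt-4-1106-preview",
)

# A thread holds the state of one long-running conversation, so the app
# doesn't have to re-send the chat history with every new message.
thread = client.beta.threads.create()

client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="How do I snooze an email?"
)

# A run asks the assistant to respond to the thread; poll until it finishes.
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Messages come back newest-first; print the assistant's latest reply.
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```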
Another common problem developers face is the limited size of the context window—that is, the number of tokens a large language model can take as input. OpenAI’s new GPT-4 Turbo model has a context window of 128,000 tokens—around 300 pages of text.
That’s four times the largest context window available for the original GPT-4 (and sixteen times the 8,000-token window most API customers actually had access to), and it’s plenty for personal use. But the limit starts to matter when developers try to build full-scale applications around large language models.
For example, suppose a company wants to make a collection of 10,000 corporate memos available to an internal chatbot. Even GPT-4 Turbo doesn’t have enough space to include the text of 10,000 PDFs in a prompt.
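A back-of-the-envelope calculation shows why. The per-memo figures below are assumptions for illustration, not data about any real corpus:

```python
# Back-of-the-envelope math (the per-memo figures are assumptions, not data):
tokens_per_page = 500          # a rough rule of thumb for English prose
pages_per_memo = 2
total_tokens = 10_000 * pages_per_memo * tokens_per_page   # 10 million tokens
context_window = 128_000                                    # GPT-4 Turbo's limit

print(total_tokens // context_window)   # ~78: the corpus is dozens of times too big for one prompt
```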
A lot of companies handle this with an approach called retrieval-augmented generation (RAG). First, documents are split into chunks and put into a database. Then when the user makes a query, the app performs a vector-based search to find the 10 or 20 most relevant documents. These are then added to the prompt, making them available for the chatbot to analyze and reference in its response.
RAG has rapidly evolved from a bleeding-edge technique to a standard part of the AI engineering toolkit. A bunch of startups have built tools to help with various steps in the RAG pipeline. For example, there are a number of vector databases that help developers compute vectors (also known as embeddings) for each document and then find documents that are closest in vector space (this is similar to the word vectors I explained back in July).
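Here is a stripped-down sketch of the idea using the openai Python SDK, with an in-memory list standing in for a real vector database. The document snippets, question, and model names are placeholders; a production system would use a proper database and a real chunking strategy.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Stand-in for a real document store: a few corporate-memo snippets.
docs = [
    "Expense reports are due by the 5th of each month.",
    "The VPN setup guide lives on the IT intranet page.",
    "New hires get laptops from the facilities desk on day one.",
]

def embed(texts):
    """Turn each text into an embedding vector using OpenAI's embeddings API."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in resp.data])

doc_vectors = embed(docs)

def retrieve(query, k=2):
    """Return the k documents whose embeddings are closest to the query's."""
    q = embed([query])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-scores)[:k]]

question = "When do I have to file my expense report?"
context = "\n".join(retrieve(question))

# Add the retrieved documents to the prompt so the model can cite them.
answer = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
        {"role": "system", "content": f"Answer using only these documents:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```

In a real deployment, the list and the brute-force cosine-similarity search would be replaced by a vector database, but the shape of the pipeline stays the same: embed, retrieve, then prompt.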
The Assistants API includes a feature called Knowledge Retrieval that handles most of these details on behalf of users. “OpenAI will automatically chunk your documents, index and store the embeddings, and implement vector search to retrieve relevant content to answer user queries,” the company writes.
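Assuming the same beta SDK as in the earlier sketch, wiring that up might look like this. The file name and assistant details are hypothetical:

```python
from openai import OpenAI

client = OpenAI()

# Upload a document and let OpenAI handle chunking, embeddings, and retrieval.
handbook = client.files.create(
    file=open("employee_handbook.pdf", "rb"),   # hypothetical file
    purpose="assistants",
)

hr_assistant = client.beta.assistants.create(
    name="HR Helper",                                        # hypothetical
    instructions="Answer employee questions using the handbook.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],   # turns on Knowledge Retrieval
    file_ids=[handbook.id],
)
```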
The practical upshot will be to lower the barrier to entry for simple RAG apps. It’ll be easier for a small team or even a single developer to add chatbot functionality to an existing application or to build a custom chatbot based on a company’s proprietary documents.
Nathan Labenz, host of the Cognitive Revolution podcast, predicted that OpenAI’s RAG offering won’t be a good choice for everyone. “There are all kinds of reasons people are going to want to host their own vector database,” Labenz told me.
Some companies will want full control over how documents are chunked, indexed, and searched. But for a lot of companies, OpenAI’s solution will be good enough, and it will be a lot less work to set up.
Fine-tuning: custom LLMs for specialized use cases
Before the advent of large language models, it was common to train different, specialized language models for different tasks. There might be one model for translating between French and English, another for generating simple news articles, and a third for doing sentiment analysis.
One of the most remarkable things about large language models is their versatility: you can simply describe the task you want ChatGPT to perform (like “translate this sentence to French”) and often it will do a good job.
There’s a related strategy called few-shot learning, where a prompt gives the chatbot examples of the desired output. For example, a prompt might say “English: milk, Spanish: leche. English: tree, Spanish: árbol” before asking the model to translate another word from English to Spanish. Providing examples like this gives the model more clarity about what the user is looking for, and makes it more likely it will produce a satisfactory result.
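In code, few-shot examples are typically passed as earlier turns of the conversation. Here is a minimal sketch with the chat completions API; the examples are my own:

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Translate English words into Spanish."},
        # Two "shots" demonstrating the desired input/output format.
        {"role": "user", "content": "milk"},
        {"role": "assistant", "content": "leche"},
        {"role": "user", "content": "tree"},
        {"role": "assistant", "content": "árbol"},
        # The actual query.
        {"role": "user", "content": "house"},
    ],
)
print(resp.choices[0].message.content)  # expected: "casa"
```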
However, these techniques have their limits. Few-shot learning may not work if the examples are complex or too different from examples in the model’s original training data. A general-purpose chatbot is more likely to get confused and produce off-topic responses. Moreover, OpenAI charges on a per-token basis, so adding examples to prompts will make queries more expensive.
Enter fine-tuning. Instead of training a model from scratch, fine-tuning tweaks an existing model by training it on additional examples focused on a particular use case. Whereas few-shot learning may only allow a handful of examples, fine-tuning can be done on dozens, thousands, or millions of examples. In general, the more examples used, the better the results.
Traditionally, fine-tuning requires access to a model’s weights, and OpenAI has not published the weights for its newer models. But in August, OpenAI announced fine-tuning as a cloud service. A customer provides training examples to OpenAI, OpenAI uses those examples to fine-tune GPT-3.5, and then the user runs the fine-tuned model on OpenAI’s servers.
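A sketch of that workflow with the openai Python SDK looks roughly like this. The file name is hypothetical, and the training file is expected to be JSONL, with each line holding an example conversation in OpenAI’s chat format:

```python
from openai import OpenAI

client = OpenAI()

# Upload a JSONL file where each line is {"messages": [...]} in chat format.
training_file = client.files.create(
    file=open("examples.jsonl", "rb"),   # hypothetical training data
    purpose="fine-tune",
)

# Kick off a fine-tuning job on top of GPT-3.5 Turbo.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

# Later, once the job has finished, query the customized model by name.
job = client.fine_tuning.jobs.retrieve(job.id)
resp = client.chat.completions.create(
    model=job.fine_tuned_model,
    messages=[{"role": "user", "content": "Give me feedback on my pitch deck."}],
)
print(resp.choices[0].message.content)
```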
OpenAI made a couple of important announcements related to fine-tuning on Monday. First, the cost of running fine-tuned models will fall two- to four-fold. These fine-tuned models are still more expensive than vanilla GPT-3.5, but the lower prices should bring the technique within reach for more users.
Second, OpenAI is going to start letting people fine-tune GPT-4, though it will be limited to an “experimental access program.” I don’t think OpenAI has explained why it’s limiting access to GPT-4 fine-tuning, but my guess is that they are worried about it being misused. Recent research has shown that a small amount of fine-tuning can cripple safeguards that prevent an LLM from producing dangerous or offensive responses.
Finally, OpenAI is now offering to help companies build completely custom models. However, OpenAI’s website states that “pricing starts at $2-$3 million,” which will put it far out of reach for most companies.
Fine-tuning and custom models are best thought of as the do-it-yourself segment of OpenAI’s offerings for developers. GPTs and Assistants offer developers convenient, turnkey ways to build apps powered by chatbots. But some customers are going to want maximum control and flexibility, and OpenAI is supporting those customers too.
“I was impressed at OpenAI's shipping velocity,” Charlie Guo told me. OpenAI started out as a research organization, Guo noted, but the company has built an impressive lineup of new products over the last year.
Disclosure: My brother is the CEO of Shortwave and I’m an investor in the company.
Couple of things.
Very few people really had access to the GPT-4 API with the 32k context window. Most just had the 8k version, so if the 128k context window is widely available, that's really a sixteen-fold increase.
Also I agree about the problems with GPTs...people will have to find them and know what they want. I feel like they're so simplistic that when you combine them with discoverability issues it's an uphill battle to get people using them. There's no reason you can't just tell ChatGPT you want it to give startup advice...
Why not ask ChatGPT (with GPT-4) for the optimal public GPTs for your desired task, for instance to improve English from B2 to C1 level?
I would expect ChatGPT to be super fast and efficient at searching through the fewer than 1,000 GPTs that exist today. And next month maybe 10,000. But maybe not 100,000 until next year.