Being lazy, I’m very interested in using agents powered by LLMs to accomplish tasks for me. In this post, I explore how this is done with Boxcars, a Ruby gem inspired by Langchain for building LLM apps.

Quick intro to Boxcars

Boxcars is a Ruby gem that makes it easy to build applications with LLMs. I’ve found it much easier to use than Langchain as it provides “just enough” abstractions to interact with an LLM and act on the output. See the getting started docs to get going on your own.

A single boxcar train for realtime weather

In this example, I’ll set up a train with a custom Boxcar, GoogleAnswerBox (source). GoogleAnswerBox returns the answer box in Google search results (the box at the top of the page that is displayed if Google can answer your question directly) as JSON.

boxcars = [GoogleAnswerBox.new]
train = Boxcars.train.new(boxcars: boxcars)
train.run("what is the temperature in Fort Collins?")
 => "The current temperature in Fort Collins is 45 degrees Fahrenheit." 

This matches the result when I run the same query directly on Google in my web browser.

How about a stock price?

train.run("what is the Tesla stock price?")
=> "The current Tesla stock price in USD is $169.36.\nNext Actions:\n1. What was the opening price of Tesla stock today?\n2. How has the Tesla stock price changed over the past week?\n3. What is the market capitalization of Tesla?" 

Or a holiday?

train.run("When is Memorial Day?")
 => "Memorial Day is on Monday, May 29, 2023.\nNext Actions: None, as the answer is straightforward." 

Or the time?

train.run("what time is it in Denver?")
 => "The current time in Denver is 08:41 AM. \nNext Actions: None, as the user's question has been answered." 

How does Boxcars take my query, interact with an external tool (Google Search), and generate an answer?

ReAct (Reason + Act) on Ruby

If I asked you for the current temperature, time, or score of an NBA playoff game, you would need an external tool to provide me with this information. It’s not stored in your brain, but your brain can determine which tool to use, interact with the tool, process the data displayed in the tool, and finally provide me with an answer.

Just like your brain, an LLM cannot provide you with information on current events on its own, but you can tell an LLM about external tools it can use to fetch realtime data. Perhaps the most popular approach for having an LLM reason and use external tools is the ReAct (Reason + Act) framework, introduced in this paper (Yao et al., 2022). In the example above, Boxcars uses a zero-shot ReAct prompt (no fine-tuning and no worked examples in the prompt) to provide answers.

Let’s walk through how Boxcars implements ReAct when using the GoogleAnswerBox tool.

First prompt

I’ll start by looking at the LLM prompt generated by the Boxcars Train:

>>>>>> Role: system <<<<<<
Answer the following questions as best you can. You have access to the following actions:

AnswerBox: useful for when you need to answer questions that require realtime data.You should ask targeted questions

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one from this list: [AnswerBox]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation sequence can repeat N times)
Thought: I know the final answer
Final Answer: the final answer to the original input question
Next Actions: Up to 3 logical suggested next questions for the user to ask after getting this answer.
Remember to start a line with "Final Answer:" to give me the final answer.
Begin!
>>>>>> Role: user <<<<<<
Question: what is the temperature in Fort Collins?
>>>>>> Role: assistant <<<<<<
Thought: 

Taking a step back: this is fascinating. There’s no training involved. It takes under 140 words of system instructions to get the answer to our question. The prompt is first broken down into three ChatGPT-specific roles:

  1. >>>>>> Role: system <<<<<< - these instructions guide the model throughout the conversation.
  2. >>>>>> Role: user <<<<<< - the person asking questions to ChatGPT.
  3. >>>>>> Role: assistant <<<<<< - responses from ChatGPT to questions.

You can learn more about ChatGPT roles from the OpenAI docs.

Note how Thought: (the last line) is empty. This is the start of the Thought/Action/Action Input/Observation sequence we’re asking ChatGPT to complete.
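To make the structure concrete, here is a minimal sketch of how a prompt like the one above could be assembled from a list of tools. This is not the Boxcars implementation: Tool and build_react_messages are hypothetical names I made up for illustration, and the template is a shortened copy of the prompt shown earlier.

```ruby
# Hypothetical stand-in for a boxcar: just a name and a description.
Tool = Struct.new(:name, :description)

# Build the three role-tagged messages for a zero-shot ReAct prompt.
def build_react_messages(tools, question)
  tool_lines = tools.map { |t| "#{t.name}: #{t.description}" }.join("\n")
  tool_names = tools.map(&:name).join(", ")

  system = <<~PROMPT
    Answer the following questions as best you can. You have access to the following actions:

    #{tool_lines}

    Use the following format:

    Question: the input question you must answer
    Thought: you should always think about what to do
    Action: the action to take, should be one from this list: [#{tool_names}]
    Action Input: the input to the action
    Observation: the result of the action
    ... (this Thought/Action/Action Input/Observation sequence can repeat N times)
    Thought: I know the final answer
    Final Answer: the final answer to the original input question
    Begin!
  PROMPT

  [
    { role: "system", content: system.strip },
    { role: "user", content: "Question: #{question}" },
    # Left empty on purpose: the model completes the Thought.
    { role: "assistant", content: "Thought: " }
  ]
end

tools = [Tool.new("AnswerBox", "useful for when you need to answer questions that require realtime data.")]
messages = build_react_messages(tools, "what is the temperature in Fort Collins?")
messages.each { |m| puts ">>>>>> Role: #{m[:role]} <<<<<<", m[:content] }
```

The only parts that change from one train to the next are the tool descriptions and the list of action names, which is why adding a boxcar is enough to teach the model about a new tool.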

ChatGPT responds with the following:

Thought: I need to use the AnswerBox action to get the current temperature in Fort Collins.
Observation: I need to use the AnswerBox action to get the current temperature in Fort Collins

ChatGPT generates this reasoning from the GoogleAnswerBox boxcar description that is included in the prompt:

AnswerBox: useful for when you need to answer questions that require realtime data.You should ask targeted questions

If I omit the actions portion of the prompt and send the prompt to ChatGPT, I’ll get a response like this:

Thought: I should look up the current temperature in Fort Collins.
Action: Use a search engine to find the current temperature in Fort Collins.
Action Input: "Fort Collins current temperature" in a search engine.
Observation: The current temperature in Fort Collins is displayed on a search results page.
Thought: I should provide the temperature to the user.
Final Answer: The current temperature in Fort Collins is [insert current temperature].

ChatGPT realizes it should use a search to collect current information, but it doesn’t have access to an action to fetch the current temperature.

Second prompt

The Boxcars train is now ready to continue the thought/action/action input/observation loop by sending a second prompt. For brevity, I’ve omitted the system and user roles which remain the same:

>>>>>> Role: assistant <<<<<<
Thought:  I need to use the AnswerBox action to get the current temperature in Fort Collins.
Observation: I need to use the AnswerBox action to get the current temperature in Fort Collins.
Thought:

ChatGPT responds with:

I should ask for the current temperature in Fort Collins.
Action: AnswerBox
Action Input: "What is the current temperature in Fort Collins?"

The Boxcars::Train object takes the ChatGPT response and parses out the Action and Action Input, mapping these to the available actions (just AnswerBox for now). GoogleAnswerBox#run is called with the Action Input, returning the text below:
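The parse-and-dispatch step can be sketched in a few lines of plain Ruby. This is my own simplified illustration, not the Boxcars source: ACTION_PATTERN, parse_action, and the stubbed AnswerBox lambda are all hypothetical.

```ruby
# Matches an "Action:" line immediately followed by an "Action Input:" line.
ACTION_PATTERN = /^Action:[ \t]*(?<name>[^\n]+)\nAction Input:[ \t]*(?<input>[^\n]+)/

# Pull the Action and Action Input out of a ReAct-style completion.
# Returns nil when the completion has no action (e.g. only a Thought).
def parse_action(completion)
  match = completion.match(ACTION_PATTERN)
  return nil unless match

  # Strip the quotes the model tends to wrap around the input.
  input = match[:input].strip.delete_prefix('"').delete_suffix('"')
  { name: match[:name].strip, input: input }
end

completion = <<~TEXT
  I should ask for the current temperature in Fort Collins.
  Action: AnswerBox
  Action Input: "What is the current temperature in Fort Collins?"
TEXT

# Map action names to callables. The lambda is a stub, not a real search.
tools = { "AnswerBox" => ->(input) { "stubbed result for: #{input}" } }

action = parse_action(completion)
observation = tools.fetch(action[:name]).call(action[:input])
puts observation
```

Using fetch means an action name the model invented (one that is not in the list) raises a KeyError instead of silently doing nothing.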

Answer: {"type":"weather_result","temperature":"49","unit":"Fahrenheit","precipitation":"0%%","humidity":"65%%","wind":"3 mph","location":"Weather","date":"Monday 7:00 AM","weather":"Mostly sunny","thumbnail":"https://serpapi.com/searches/6458f883ce87f81e4d7973c2/images/dccefb93a84c042f2c5d64fd510927300f2ceedeb076f4ff.png","forecast":[{"day":"Monday","weather":"Partly cloudy","temperature":{"high":"73","low":"45"},"thumbnail":"https://serpapi.com/searches/6458f883ce87f81e4d7973c2/images/dccefb93a84c042f85705f9091acff95c6669fc2faa0664facd6d2464297ada0.png"},{"day":"Tuesday","weather":"Mostly sunny","temperature":{"high":"78","low":"48"},"thumbnail":"https://serpapi.com/searches/6458f883ce87f81e4d7973c2/images/dccefb93a84c042f85705f9091acff9581bc24fa381ee107ae5c17df0e61b10a.png"},{"day":"Wednesday","weather":"Scattered thunderstorms","temperature":{"high":"72","low":"50"},"thumbnail":"https://serpapi.com/searches/6458f883ce87f81e4d7973c2/images/dccefb93a84c042f85705f9091acff9502e1868124e15c11dc1beb

Yes, that is just an ugly truncated JSON representation of the Google answer box in the search result. GoogleAnswerBox does not parse the contents of the answer box: there are many variations of answer box formats. Why not let ChatGPT parse this for me?
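The post doesn’t show where the truncation happens, so treat the following as a rough sketch of the idea rather than what GoogleAnswerBox actually does: serialize whatever the answer box contains, cap the length, and hand the raw text to the model. The observation_text helper and the max_chars limit are assumptions of mine.

```ruby
require "json"

# Serialize a tool result and cap its length before it goes into the prompt.
# max_chars is an arbitrary illustrative limit, not a Boxcars setting.
def observation_text(result, max_chars: 120)
  "Answer: #{JSON.generate(result)[0, max_chars]}"
end

answer_box = {
  "type" => "weather_result",
  "temperature" => "49",
  "unit" => "Fahrenheit",
  "weather" => "Mostly sunny",
  "forecast" => [{ "day" => "Monday", "temperature" => { "high" => "73", "low" => "45" } }]
}

text = observation_text(answer_box, max_chars: 60)
puts text
```

Nothing here knows about weather results in particular, which is the point: the same code works for stock prices, holidays, or any other answer box format.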

Third and final prompt

The Train now sends off a prompt with the answer box JSON above in the Observation to ChatGPT:

>>>>>> Role: assistant <<<<<<
Thought:  I need to use the AnswerBox action to get the current temperature in Fort Collins.
Observation: I need to use the AnswerBox action to get the current temperature in Fort Collins.
Thought: I should ask for the current temperature in Fort Collins by using the AnswerBox action.
Action: AnswerBox
Action Input: "What is the current temperature in Fort Collins?"
Observation: Answer:{"type":"weather_result","temperature":"49","unit":"Fahrenheit","precipitation":"0%%","humidity":"65%%","wind":"3 mph","location":"Weather","date":"Monday 7:00 AM","weather":"Mostly sunny","thumbnail":"https://serpapi.com/searches/6458f883ce87f81e4d7973c2/images/dccefb93a84c042f2c5d64fd510927300f2ceedeb076f4ff.png","forecast":[{"day":"Monday","weather":"Partly cloudy","temperature":{"high":"73","low":"45"},"thumbnail":"https://serpapi.com/searches/6458f883ce87f81e4d7973c2/images/dccefb93a84c042f85705f9091acff95c6669fc2faa0664facd6d2464297ada0.png"},{"day":"Tuesday","weather":"Mostly sunny","temperature":{"high":"78","low":"48"},"thumbnail":"https://serpapi.com/searches/6458f883ce87f81e4d7973c2/images/dccefb93a84c042f85705f9091acff9581bc24fa381ee107ae5c17df0e61b10a.png"},{"day":"Wednesday","weather":"Scattered thunderstorms","temperature":{"high":"72","low":"50"},"thumbnail":"https://serpapi.com/searches/6458f883ce87f81e4d7973c2/images/dccefb93a84c042f85705f9091acff9502e1868124e15c11dc1beb
Thought:

ChatGPT returns:

Final Answer: The current temperature in Fort Collins is 49 degrees Fahrenheit.

Our ZeroShot train sees the magic Final Answer text, exits, and returns the result.
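Putting the three prompts together, the whole loop could be sketched like this. It is a simplification under my own assumptions, not the Boxcars train: run_react, the scripted fake_llm, and the fake_answer_box stub are hypothetical, the sketch keeps only the first line of the final answer, and it does not echo a bare Thought back as an Observation the way the transcripts above do.

```ruby
FINAL_ANSWER = /^Final Answer:[ \t]*(?<answer>.+)/
ACTION = /^Action:[ \t]*(?<name>[^\n]+)\nAction Input:[ \t]*(?<input>[^\n]+)/

# llm:   any callable that takes the transcript so far and returns a completion.
# tools: hash of action name => callable taking the action input.
def run_react(question, llm:, tools:, max_steps: 5)
  transcript = "Question: #{question}\nThought: "
  max_steps.times do
    completion = llm.call(transcript)

    # Stop as soon as the model produces the "Final Answer:" marker.
    if (final = completion.match(FINAL_ANSWER))
      return final[:answer].strip
    end

    transcript += completion.strip
    if (action = completion.match(ACTION))
      input = action[:input].strip.delete_prefix('"').delete_suffix('"')
      observation = tools.fetch(action[:name].strip).call(input)
      transcript += "\nObservation: #{observation}"
    end
    transcript += "\nThought: "
  end
  raise "no final answer after #{max_steps} steps"
end

# A scripted stand-in for ChatGPT so the sketch runs without an API key.
scripted = [
  "I should use AnswerBox to get the current temperature.\n" \
  "Action: AnswerBox\n" \
  "Action Input: \"What is the current temperature in Fort Collins?\"",
  "Final Answer: The current temperature in Fort Collins is 49 degrees Fahrenheit."
]
fake_llm = ->(_transcript) { scripted.shift }
fake_answer_box = ->(_input) { '{"type":"weather_result","temperature":"49","unit":"Fahrenheit"}' }

answer = run_react(
  "what is the temperature in Fort Collins?",
  llm: fake_llm,
  tools: { "AnswerBox" => fake_answer_box }
)
puts answer
```

The max_steps cap is there because nothing guarantees the model ever emits a final answer; without it, a confused model could loop forever.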

Note: the temperature increased from 45 to 49 degrees while writing this blog post.

Going deeper: a two boxcar train to query both a Rails DB and Google

Just as you’ll use multiple tools to accomplish a task, an agent using the ReAct framework can do the same. Let’s set up a Boxcars train with two boxcars: ActiveRecord for searching my Rails database, and GoogleAnswerBox. I’ll find out which country a user’s timezone belongs to:

boxcars = [Boxcars::ActiveRecord.new, GoogleAnswerBox.new]
train = Boxcars.train.new(boxcars: boxcars)
train.run("what country does the user derek@glass.io user's timezone belong to?")
 => "The timezone belonging to the user derek@glass.io belongs to the United States." 

First prompt => completion cycle highlights

There are now two actions listed in the prompt (note that Glass is the name of the Rails application):

You have access to the following actions:

Glass: useful for when you need to query a database for an application named .
AnswerBox: useful for when you need to answer questions that require realtime data.You should ask targeted questions
Use the following format:

ChatGPT returns:

We need to use a database to query the user's timezone and then determine the country it belongs to.

Second prompt => completion cycle highlights

The following is now appended to the second prompt:

>>>>>> Role: assistant <<<<<<
Thought:  We need to use a database to query the user's timezone and then determine the country it belongs to.
Observation: We need to use a database to query the user's timezone and then determine the country it belongs to.

ChatGPT responds with:

We can use Glass to query the database for the user's timezone and then use that information to determine the country it belongs to.
Action: Glass
Action Input: "Query timezone for derek@glass.io user"

Our ZeroShot Train parses the output above and calls Boxcars::ActiveRecord#run with the Action Input above. This uses ChatGPT to take a natural language query and convert it to ActiveRecord code (see this post for more on how the ActiveRecord boxcar works). ChatGPT returns:

ARCode: `User.where(email: 'derek@glass.io').pluck(:time_zone)` 

The boxcar executes this code, returning:

America/Denver
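Before the boxcar can run anything, it has to pull the code out of the completion. A minimal sketch of that extraction step, assuming the code always arrives between backticks on an ARCode: line (extract_ar_code is a hypothetical helper, not the Boxcars method), might look like this. It stops short of evaluating the string, since that needs a Rails environment with a User model.

```ruby
# Extract the code between backticks on an "ARCode:" line, or nil if absent.
def extract_ar_code(completion)
  match = completion.match(/^ARCode:[ \t]*`(?<code>[^`]+)`/)
  match && match[:code]
end

completion = "ARCode: `User.where(email: 'derek@glass.io').pluck(:time_zone)`"
code = extract_ar_code(completion)
puts code
```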

Third prompt => completion cycle highlights

Our assistant role now looks like:

>>>>>> Role: assistant <<<<<<
Thought:  We need to use a database to query the user's timezone and then determine the country it belongs to.
Observation: We need to use a database to query the user's timezone and then determine the country it belongs to.
Thought: We can use Glass to query the database for the user's timezone and then use that information to determine the country it belongs to.
Action: Glass
Action Input: "Query timezone for derek@glass.io user"
Observation: America/Denver
Thought:

ChatGPT responds with:

Now that we have the user's timezone, we can use AnswerBox to determine the country it belongs to.
Action: AnswerBox
Action Input: "What country does the timezone America/Denver belong to?"

Our train now executes the GoogleAnswerBox boxcar, returning the following truncated JSON result:

{"type":"organic_result","title":"America/Denver - Time zone in United States - TRAVEL GUIDE","link":"https://www.travelguide.de/en/timezone/america-denver/#:~:text=America%%2FDenver%%20is%%20a%%20time,Mountain%%20Standard%%20Time%%20(MST).","displayed_link":"https://www.travelguide.de › Home › Time zones","hover_info":[{"text":"Mountain Standard Time","info":"United States Six states are fully contained in the Mountain Time Zone: Colorado. Montana. New Mexico.","title":"Mountain Time Zone - Wikipedia","link":"https://en.wikipedia.org/wiki/Mountain_Time_Zone","displayed_link":"https://en.wikipedia.org › wiki › Mountain_Time_Zone"}],"snippet":"America/Denver is a time zone in United States. It corresponds to the Mountain Standard Time (MST).","snippet_highlighted_words":["United States"],"thumbnail":"https://serpapi.com/searches/64590f407f836163a84eff66/images/6d20911aeb0d64ec26e1f63ef2e6c3f28ed8ae8265617e28.png","about_this_result":{"source":{"description":"travelguide.de was first indexed by

Fourth prompt => completion cycle highlights

We’re now most of the way through. The train appends the JSON result above as an Observation to the prompt:

>>>>>> Role: assistant <<<<<<
Thought:  We need to use a database to query the user's timezone and then determine the country it belongs to.
Observation: We need to use a database to query the user's timezone and then determine the country it belongs to.
Thought: We can use Glass to query the database for the user's timezone and then use that information to determine the country it belongs to.
Action: Glass
Action Input: "Query timezone for derek@glass.io user"
Observation: America/Denver
Thought: Now that we have the user's timezone, we can use AnswerBox to determine the country it belongs to.
Action: AnswerBox
Action Input: "What country does the timezone America/Denver belong to?"
Observation: Answer: {"type":"organic_result","title":"America/Denver - Time zone in United States - TRAVEL GUIDE","link":"https://www.travelguide.de/en/timezone/america-denver/#:~:text=America%%2FDenver%%20is%%20a%%20time,Mountain%%20Standard%%20Time%%20(MST).","displayed_link":"https://www.travelguide.de › Home › Time zones","hover_info":[{"text":"Mountain Standard Time","info":"United States Six states are fully contained in the Mountain Time Zone: Colorado. Montana. New Mexico.","title":"Mountain Time Zone - Wikipedia","link":"https://en.wikipedia.org/wiki/Mountain_Time_Zone","displayed_link":"https://en.wikipedia.org › wiki › Mountain_Time_Zone"}],"snippet":"America/Denver is a time zone in United States. It corresponds to the Mountain Standard Time (MST).","snippet_highlighted_words":["United States"],"thumbnail":"https://serpapi.com/searches/64590f407f836163a84eff66/images/6d20911aeb0d64ec26e1f63ef2e6c3f28ed8ae8265617e28.png","about_this_result":{"source":{"description":"travelguide.de was first indexed by
Thought:

ChatGPT responds with a final answer. The train sees the magic Final Answer: text and returns the result:

Based on the AnswerBox response, the timezone America/Denver belongs to the United States.
Final Answer: The timezone belonging to the user derek@glass.io belongs to the United States.

TL;DR

Just as an LLM can generate text for a blog post (not this one though!), it can also generate a plan to answer a question that requires using external tools. The most popular framework for this is ReAct (Reason + Act), which we can use in Ruby via the Boxcars gem. Boxcars handles generating the ZeroShot ReAct prompt, parsing the model completions for actions, and running those actions.