Playing with GPT-3, LangChain, and the OpenAI Embeddings API
Over the last few weeks, my imagination has been captured by the potential of Large Language Models (LLMs) to bring interesting new applications to old, established patterns.
It all started while playing with ChatGPT, a GPT-3 powered interactive chatbot that responds to your messages as if it were a human that had read the entire body of the internet. The thing that struck me was how well it performed when summarizing text.
After playing with it for a few hours, I had the idea that it might be cool to try and build a tool that could read an entire corpus of developer documentation, and then answer arbitrary questions by referencing the documentation text. The endgame with this idea is that eventually, you could make something that responds with fully functional code responses based on your questions. Perhaps this could be done by using the Codex model.
Luckily, the AI space is moving very quickly these days, and new ideas, techniques, and tools are being released every day. One such tool is LangChain, which helps you build applications with LLMs through composable chains of inputs and outputs. For example, you can run a Google search for a question through SerpAPI, then feed those search results as context into your GPT-3 prompt, as a means of giving it the ability to work with recently published information.
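To make the "composable chain" idea concrete, here is a minimal sketch in plain Python. It does not use LangChain's actual API: `make_chain`, `fake_search`, `to_prompt`, and `fake_llm` are all hypothetical names, and the search and model calls are stubbed out so the example runs without any API keys.

```python
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]


def make_chain(*steps: Step) -> Step:
    """Compose steps left to right: each step's output is the next step's input."""
    def run(value: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, value)
    return run


def fake_search(question: str) -> dict:
    """Hypothetical stand-in for a search API call; returns a canned result."""
    return {"question": question, "results": [f"(search result for: {question})"]}


def to_prompt(state: dict) -> str:
    """Place the search results ahead of the question as context."""
    context = "\n".join(state["results"])
    return f"Context:\n{context}\n\nQuestion: {state['question']}\nAnswer:"


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a completion call; only reports the prompt size."""
    return f"(completion for a {len(prompt)}-character prompt)"


# search -> prompt -> model, wired together as one callable.
search_then_answer = make_chain(fake_search, to_prompt, fake_llm)
print(search_then_answer("who won the game last night?"))
```

Swapping `fake_search` for a real search client and `fake_llm` for a real completion call would leave the wiring unchanged, which is the appeal of the pattern.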
Another cool use case is wiring together the Wolfram Alpha API with GPT-3. GPT-3 is bad at math, numbers and statistics, but if you first query Wolfram Alpha, then pass its output as context to a GPT-3 prompt, you can get shockingly good results. GPT-3 shines at the language, while Wolfram Alpha shines at math and statistics -- a perfect pairing! It's worth a few minutes to play with the demo here.
LangChain also has some interesting examples that employ ideas from the paper "ReAct: Synergizing Reasoning and Acting in Language Models", which demonstrates using chain-of-thought reasoning to work through the input and query the underlying resources that may hold the data. Geoffrey Litt recently shared an excellent post about stringing together NBA statistics via Statmuse with GPT-3 to generate reasonable-ish answers to questions like "how many points are the boston celtics allowing on defense per game this nba season 2022-2023? how does that compare to their average last season, as a percent change?"
Building a Documentation Question Answering Agent
I wanted to build a tool that could read an entire corpus of developer documentation, and then answer arbitrary questions by referencing the documentation text. Unfortunately, that's a hard place to start, so I landed somewhere else entirely!
Instead, I chose to start with a much easier proof of concept. The idea is to take a body of text, split it into chunks, convert each chunk into an embedding vector, and store those vectors in an index that can be searched with a tool like FAISS. Then, when a user asks a question, I can use the FAISS index to find the closest matching text, feed that into GPT-3 as context, and return a GPT-3 generated answer grounded in that text.
For generating the embedding vectors, I used the OpenAI Embeddings API with the text-embedding-ada-002 model. For each piece of text you pass in, it returns a 1536-dimensional vector: an array of 1536 floating point numbers that acts as a coordinate in a high-dimensional space, where similar text ends up close together. Then, using FAISS, you can quickly perform a "similarity search", which finds the stored vectors nearest to a query vector, and therefore the most similar text. I chose FAISS because it's free and easy to use locally, with the index and a pickled (.pkl) vector store saved as plain files on disk.
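As a rough illustration of what a similarity search does, here is a brute-force version in plain Python over tiny 3-dimensional toy vectors (real embeddings would have 1536 dimensions). This is a sketch, not how FAISS is implemented, and it uses cosine similarity as an assumed metric; FAISS supports several distance measures, and the index built in this post may use a different one. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float], vectors: List[Sequence[float]], k: int = 1
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings": vectors 0 and 2 point in nearly the same direction.
toy_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
best = most_similar([1.0, 0.0, 0.0], toy_vectors, k=2)
print(best)
```

A brute-force scan like this compares the query against every stored vector, which is fine for a toy list; a library like FAISS exists to make the same lookup fast over a large collection.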
I chose to start with the Cloudflare markdown documentation, which is just a collection of markdown files hosted on GitHub.
How it works
The flow of the text indexing process works something like:
- Clone the Cloudflare docs repo.
- Split up all of the text into 1500 character chunks.
- Send those chunks to the OpenAI Embeddings API, which returns a 1536 dimensional vector for each chunk.
- Index all of the vectors into a FAISS index.
- Save the FAISS index to a docs.index file, and the rest of the vector store to a .pkl file.
The code, which I've written to a file named ingest.py, looks like this:
import faiss
import pickle
from pathlib import Path
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
# Build a list of all markdown files in the cloudflare docs repo.
ps = list(Path("cloudflare-md-docs/").glob("**/*.md"))
# Build a list of all of the text, while also tracking the source of each text chunk.
data = []
sources = []
for p in ps:
    with open(p) as f:
        data.append(f.read())
    sources.append(p)
# Split the documents into smaller chunks of roughly 1500 characters.
# We do this because of the context limit of the LLM: GPT-3 can currently
# only handle a maximum of about 4,000 tokens per request.
text_splitter = CharacterTextSplitter(chunk_size=1500, separator="\n")
docs = []
metadatas = []
for i, d in enumerate(data):
    splits = text_splitter.split_text(d)
    docs.extend(splits)
    metadatas.extend([{"source": sources[i]}] * len(splits))
# Create a list of the indexes of docs that are longer than 1600 characters.
bad_docs = [i for i, d in enumerate(docs) if len(d) > 1600]
# Delete docs that are > 1600 characters (these are mostly noise, like PEM keys).
for i in sorted(bad_docs, reverse=True):
    print('deleting doc due to size', f'size:{len(docs[i])} doc: {docs[i]}' )
    del docs[i]
    del metadatas[i]
# Here we create a vector store from the documents and save it to disk.
store = FAISS.from_texts(docs, OpenAIEmbeddings(), metadatas=metadatas)
faiss.write_index(store.index, "docs.index")
store.index = None
with open("cloudflare_docs.pkl", "wb") as f:
    pickle.dump(store, f)
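To show roughly what the splitting step in ingest.py does, here is a simplified greedy splitter in plain Python. It is a sketch under stated assumptions, not LangChain's CharacterTextSplitter: `split_into_chunks` is a hypothetical function, and it ignores details such as chunk overlap. It also shows why a chunk can still exceed the size limit: a single line longer than the limit has no separator to break on, which is the kind of oversized chunk the ingest script filters out afterwards.

```python
from typing import List


def split_into_chunks(
    text: str, chunk_size: int = 1500, separator: str = "\n"
) -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size.

    A single piece longer than chunk_size is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the piece still fits, or it is the first piece of a new chunk.
            current = candidate
        else:
            # The piece would overflow the chunk: close it and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\nbbbb\ncccc"
print(split_into_chunks(sample, chunk_size=9))
```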
Then the flow of the question answering process works something like:
- User asks a question.
- FAISS index is loaded into RAM.
- User's question is sent to the OpenAI Embeddings API, which returns a 1536 dimensional vector.
- The FAISS index is queried for the closest matching vectors.
- The closest matching vectors are returned, along with the text chunks they were generated from.
- The returned text is fed into GPT-3 as context in a GPT-3 prompt.
- GPT-3 generates a response, which is returned to the user.
The code, which I've written to a file named qa.py, looks like this:
import argparse
import pickle
import faiss
from langchain import OpenAI
from langchain.chains import VectorDBQAWithSourcesChain
# Parse the question string from the first CLI argument.
parser = argparse.ArgumentParser(description='Ask a question to the cloudflare docs.')
parser.add_argument('question', type=str, help='The question to ask the cloudflare docs')
args = parser.parse_args()
# Load the FAISS index from disk.
index = faiss.read_index("docs.index")
# Load the vector store from disk.
with open("cloudflare_docs.pkl", "rb") as f:
    store = pickle.load(f)
# merge the index and store
store.index = index
# Build the question answering chain.
llm = OpenAI(temperature=0, max_tokens=1500, model_name='text-davinci-003')
chain = VectorDBQAWithSourcesChain.from_llm(llm=llm, vectorstore=store)
# Run the chain.
result = chain({"question": args.question})
# Print the answer and the sources.
print(f"Answer: {result['answer']}")
print(f"Sources: {result['sources']}")
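The chain hides the step where retrieved chunks are turned into a prompt. The sketch below shows one way that step could look in plain Python. It is an assumption-laden illustration, not the prompt LangChain actually uses: `build_qa_prompt`, the prompt wording, and the `max_chars` budget are all hypothetical.

```python
from typing import Dict, List, Tuple


def build_qa_prompt(
    question: str, chunks: List[Dict[str, str]], max_chars: int = 6000
) -> Tuple[str, List[str]]:
    """Pack retrieved chunks into a prompt and collect the sources that were used.

    Chunks are added in order until the (hypothetical) character budget is hit.
    """
    parts: List[str] = []
    sources: List[str] = []
    used = 0
    for chunk in chunks:
        block = f"Content: {chunk['text']}\nSource: {chunk['source']}"
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
        if chunk["source"] not in sources:
            sources.append(chunk["source"])
    context = "\n\n".join(parts)
    prompt = (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, sources


retrieved = [
    {"text": "Chunk one text.", "source": "docs/a.md"},
    {"text": "Chunk two text.", "source": "docs/b.md"},
]
prompt, sources = build_qa_prompt("What does chunk one say?", retrieved)
print(prompt)
print(sources)
```

Keeping the source path next to each chunk is what makes it possible to report sources alongside the answer.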
Results
I spent about an hour slapping this all together and making sense of all of the moving pieces, and I'm honestly surprised by the results: both that I was able to build a proof of concept in about an hour, and that it seems to work at all! I was able to ask a few random questions and get reasonable answers.
Below I've included a few examples of questions I asked, along with the answers I got back.
python qa.py "How do I add a new domain to cloudflare?"
Answer:
To add a new domain to Cloudflare, log in to the Cloudflare dashboard, click "Add site", enter the website's root domain, and then click "Add Site". Then, log in to your domain registrar account and update the nameservers to the provided Cloudflare nameservers. Finally, select your domain in the dashboard > Overview > Re-check now.
Sources:
cloudflare-md-docs/fundamentals/get-started/setup/add-site.md
cloudflare-md-docs/fundamentals/account-and-billing/account-setup/create-account/
cloudflare-md-docs/dns/zone-setups/full-setup/setup/
cloudflare-md-docs/fundamentals/get-started/basic-tasks/manage-domains/move-domain.md
python qa.py "Can you summarize how to setup email routing, in 5 steps?"
Answer:
To setup email routing, the following 5 steps should be followed:
1) Log in to the Cloudflare dashboard and select your account and domain;
2) Go to Email > Email Routing and select Get started;
3) Enter the custom email address you want to use in Custom address;
4) Enter the full email address you want your emails to be forwarded to in Destination address;
5) Select Create and continue.
Sources:
cloudflare-md-docs/email-routing/get-started/enable-email-routing.md
python qa.py "what is web3?"
Answer:
Web3 is a set of technologies that enables hosting of content and web applications in a serverless manner by leveraging distributed systems and consensus protocols.
Sources:
cloudflare-md-docs/web3/_index.md
python qa.py "how does cloudflare work with web3?"
Answer: 
Cloudflare's Web3 Gateways provide HTTP-accessible interfaces to Web3 networks, allowing users to read and write data to the Ethereum network without installing any software on their computer. Cloudflare's edge network of data centers in over 270 cities allows content to be cached and served from data centers close to end users.
Sources:
cloudflare-md-docs/web3/about.md
cloudflare-md-docs/web3/_index.md
cloudflare-md-docs/web3/ethereum-gateway/_index.m
cloudflare-md-docs/web3/get-started.md
python qa.py "what does Cloudflare Security Insights do?"
Answer:
Cloudflare Security Insights provides a list of insights related to Cloudflare account settings, DNS record configuration, SSL/TLS certificates configuration, Cloudflare Access configuration, Cloudflare WAF configuration, threats stopped, threat identification and mitigation, rate limiting protection, and bad bot detection and mitigation.
Sources:
cloudflare-md-docs/security-center/about/how-it-works.md
cloudflare-md-docs/security-center/tasks/review-insights.m
cloudflare-md-docs/security-center/about/_index.md
cloudflare-md-docs/fundamentals/data-products/analytics-integrations/elastic/index.md
This next one is my favorite. I thought I was throwing it a curveball, and it handled it with grace, by responding with "I don't know."
python qa.py "is cloudflare better than cloudfront?"
Answer:
I don't know.
Sources:
None
Conclusion
This was a fun little hello-world toy project, heavily based on the LangChain Notion QA example by @hwchase17. I've been spending a bunch of time in the LangChain Discord channel. If any of this stuff interests you, I'd recommend checking it out; it's a blossoming community of smart people working on interesting problems.
If you're interested in playing with this on your own, I've published the code on GitHub, and also included the FAISS index and vector store that I built from the Cloudflare docs. To run the code, you'll need to make sure you have an OpenAI API key, and that the OPENAI_API_KEY environment variable is set.
You can find the code on GitHub at jakedahn/cloudflare-agent-qa.
Also, for what it's worth, this is not a very practical real-life application, due to cost. Getting all of the text embeddings from the OpenAI Embeddings API cost ~$4, and each query costs ~$0.06. To do this "for real", you'd probably want to explore open source embeddings like Sentence Transformers, which are much cheaper to compute.
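As a rough sanity check on those numbers, the arithmetic can be sketched as below. The per-1K-token prices are assumptions, based on my recollection of OpenAI's published pricing around the time of writing, and may be out of date; the helper names are hypothetical, and only the ~$4 and ~$0.06 figures come from the post.

```python
# Assumed prices in USD per 1,000 tokens (not from the post; may be out of date).
ASSUMED_EMBEDDING_PRICE_PER_1K = 0.0004   # text-embedding-ada-002
ASSUMED_COMPLETION_PRICE_PER_1K = 0.02    # text-davinci-003


def tokens_for_cost(cost_usd: float, price_per_1k_usd: float) -> float:
    """How many tokens a given spend buys at a per-1K-token price."""
    return cost_usd / price_per_1k_usd * 1000


def cost_for_tokens(tokens: float, price_per_1k_usd: float) -> float:
    """What a given number of tokens costs at a per-1K-token price."""
    return tokens / 1000 * price_per_1k_usd


# ~$4 of embeddings works out to roughly 10 million tokens of documentation.
embedded_tokens = tokens_for_cost(4.00, ASSUMED_EMBEDDING_PRICE_PER_1K)

# ~$0.06 per query works out to roughly 3,000 tokens of prompt plus answer.
tokens_per_query = tokens_for_cost(0.06, ASSUMED_COMPLETION_PRICE_PER_1K)

print(f"embedded tokens: ~{embedded_tokens:,.0f}")
print(f"tokens per query: ~{tokens_per_query:,.0f}")
```

Under these assumed prices, the recurring per-query completion cost, not the one-time embedding cost, is what adds up with real usage.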
This hello-world project was a fun way to explore the world of LLMs, and get a feel for how LangChain works. In some ways, it feels like we're entering a new era of writing software, where the inputs and outputs are less deterministic and more probabilistic. I'm not sure where any of this leads, exactly, but I'm excited to see where it goes.
I for one welcome our new AI overlords.