Part 6: Useful ChatGPT Libraries: Productization and Hardening

by Natalia Kuzminykh , Associate Data Science Content Editor

You might be surprised to learn that developing large language model applications based on ChatGPT doesn’t require extensive coding of the model itself. Instead, a significant portion of the development process focuses on the application architecture and managing the operational environment that supports your LLM engine.

These supportive tools are intended to streamline developers’ work, helping with data organization, memory optimization, and task monitoring, among other things. In this article, we’ll take a closer look at some of the frameworks that support this work and discuss their role in production environments.

The LangChain Framework

We’ll start with LangChain, a framework that’s been gaining attention for its efficacy in building LLM-powered applications.

LangChain, like many LLM tools, operates on top of OpenAI’s GPT models by default, so you’ll need an API key from OpenAI to run it. First, install the Python library by running pip install langchain.
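Once you have a key, one simple way to expose it to LangChain is through an environment variable, as the later examples in this article also do. A minimal sketch (the key value is a placeholder to replace with your own):

import os

# Make the OpenAI key available to LangChain (placeholder value)
os.environ['OPENAI_API_KEY'] = 'sk-...'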

What makes LangChain so effective is its modular approach. It breaks down functionalities into distinct “chains” that developers can easily modify or exchange to construct a customized template that meets their specific requirements.

An infographic depicting LangChain's modular approach.

LangChain’s modular approach

LangChain includes modules such as:

  • Models: This module plays a central role in your LLM application, supporting integrations of different model types from providers like OpenAI, Hugging Face, Anthropic, Cohere, and more.
  • Prompt templates: These pre-defined recipes for creating prompts are structured in diverse ways to elicit varied responses.
  • Indexes: A newly released module that enables the integration of LLMs with your existing data and helps manage previously processed documents.
  • Chains: enable the creation of call sequences that incorporate multiple models or prompts.
  • Agents: analyze user requests, make decisions, and select the appropriate instrument to complete a task, operating in a loop until a solution is found.
  • Memory: This feature maintains the state between chain or agent calls.

Dynamic Prompts

Let’s bring LangChain to life with a simple script. Here, we’re pairing OpenAI and LangChain for a straightforward text completion task:

from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate, LLMChain

template = """Question: {input}
Analyzing the information provided step by step.
Answer: """

prompt = PromptTemplate(template=template, input_variables=["input"])

llm = ChatOpenAI(model_name="gpt-3.5-turbo")
llm_chain = LLMChain(prompt=prompt, llm=llm)

question = """Who discovered penicillin?"""

llm_chain.run(question)

The response will look something like:

Step 1: Identification of the Key Question
Answer: The main question is: "Who discovered penicillin?"

Step 2: Contextual Understanding
Answer: Penicillin is an antibiotic used in medicine.

Step 3: Historical Context
Answer: In 1928, scientist Alexander Fleming noticed a mold killing bacteria in a petri dish at St. Mary's Hospital in London.

Step 5: Identification of the Scientist
The discoverer of penicillin is Sir Alexander Fleming.

Answer: Sir Alexander Fleming discovered penicillin in 1928.

The PromptTemplate is the builder for model inputs: a template text string that can be customized through input_variables. In our demo, the prompt weaves a “Let’s think step by step” style instruction into the query, setting the stage for systematic reasoning.

Our LLM model comes to life using the ChatOpenAI() function. The LLMChain() function then establishes a connection between the prompt and the model, creating a symbiotic pair or ‘chain’.

The final step is calling the run() function, which presents our question to the system. When run() is invoked, the LLMChain formats the prompt template using the provided input key values (and memory key values, when present), passes the structured string to the LLM, and returns the LLM’s response.

Dynamic prompts, although simplistic, have enormous potential for improving complex applications and optimizing prompt management.

Agents and Tools

Other key elements that play a crucial role in LangChain workflow are agents and tools. They facilitate the resolution of complex issues by enabling LLMs to execute actions and integrate various functionalities.

A tool in this context is a functional abstraction designed to make LLM interactions more straightforward. It works through a simple interface that takes one text input and returns one text output. LangChain includes many predefined tools, such as Google search, a calculator, and a weather forecast API. Additionally, users can develop and add custom tools to their agents, improving the agents’ flexibility and effectiveness.

On the other hand, an agent is responsible for coordinating various steps and accessing multiple tools, choosing the most suitable ones to answer the user’s query. This process allows the agent more time to form a strategy, which helps it manage more complex tasks effectively.

The operation of an agent can be summarized as follows:

  1. The agent receives the user’s input.
  2. It determines which tool to use and the text to input.
  3. The chosen tool is activated with the input text, producing an output text.
  4. The agent then incorporates the tool’s output into its context.
  5. Steps 2-4 are repeated until the agent determines it has everything it needs to reply to the user directly.

The image below gives a visual representation of how an agent employs tools in LangChain, and the short code sketch after it shows the same loop in practice.

An infographic depicting interaction between an agent and tools in LangChain.

Interaction between an agent and tools in LangChain
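To make this loop concrete, here is a minimal sketch of a LangChain agent that pairs the built-in llm-math calculator tool with a zero-shot ReAct agent. The tool choice and question are illustrative, not part of the original example:

from langchain.chat_models import ChatOpenAI
from langchain.agents import load_tools, initialize_agent, AgentType

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# "llm-math" wraps the LLM in a calculator tool; other tools (e.g. search) can be added here
tools = load_tools(["llm-math"], llm=llm)

# The zero-shot ReAct agent decides at each step which tool to call and with what input
agent = initialize_agent(
  tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

agent.run("What is 3.14 raised to the power of 2.7?")

With verbose=True, you can watch the agent alternate between reasoning, tool calls, and observations until it settles on a final answer.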

Memory

Memory and context understanding are essential for Q&A applications, especially when it comes to chatbots. LangChain can add memory to your applications using states.

For example, with ConversationChain, you can turn your language model into an interactive chatbot with just a few lines of code.

from langchain import OpenAI, ConversationChain
chatbot_llm = OpenAI(model_name='gpt-3.5-turbo')
chatbot = ConversationChain(llm=chatbot_llm, verbose=True)
chatbot.predict(input='Hi!')

In the snippet above, the method predict(input='Hi!') prompts the chatbot to respond to the greeting Hi!. The model replies as follows:

> Initiating new ConversationChain sequence...
Prompt after processing:
Below is a friendly dialogue between a human and an AI. 
The AI is characterized by its eloquence and propensity for 
providing detailed information, drawing from its contextual 
understanding. Should the AI find a question beyond its 
knowledge, it will honestly admit its limitation.
Ongoing conversation:
Human: Hi!
AI:
> Finished chain.
' Hi! How can I assist you today?'

Setting the parameter verbose=True in ConversationChain gives us insight into the underlying prompt structure used by LangChain. When predict(input='Hi!') is invoked, the LLM is fed the full prompt, bounded by the markers > Initiating new ConversationChain sequence... and > Finished chain.

As the conversation continues, the chain keeps track of the entire dialogue, updating the prompt as needed. For example, suppose you ask the bot its name next.
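The follow-up turn is just another predict() call on the same chatbot object (a small sketch reusing the chain from above):

chatbot.predict(input="What's your name?")

The verbose output now shows the prompt with the first exchange already included: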

> Initiating new ConversationChain sequence...
Prompt following refinement:
The following [...] does not know.
Ongoing conversation:
Human: Hi!
AI: Hi! How can I assist you today?
Human: What's your name?
AI:
> Finished chain.
'\n\nI'm ChatGPT, an AI created by OpenAI.'

Thus, we can see that, through careful prompt design and memory management, the ConversationChain class effectively turns any text-completion LLM into an interactive chat tool.

But how can we customize an application specifically for our data?

The LlamaIndex Framework

Let’s now explore the integration of llama-index (previously GPT-index) into production environments with a vector database like Pinecone.

LlamaIndex is a dynamic library designed to optimize Retrieval-Augmented Generation (RAG) pipelines around your LLM. It’s particularly useful when you need to enrich your LLM with data from various sources to minimize model hallucinations.

Key Features of LlamaIndex:

  • Data Loaders simplify data extraction from multiple formats, whether you’re dealing with APIs, PDFs, databases, or CSVs (see the short sketch after this list).
  • Nodes help structure your data more intricately, establishing connections among different points (e.g., SOURCE, PREVIOUS, NEXT, PARENT, CHILD). They’re especially useful when you need to logically organize multiple text pieces from PDFs with clear labels like ‘previous’ and ’next’, simplifying data exploration.
  • Additional Functions: LlamaIndex also offers support for other processes, such as re-ranking, after data retrieval.
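As a small illustration of the Data Loaders point above, llama-index ships with SimpleDirectoryReader, which pulls files from a local folder into Document objects. A minimal sketch, assuming your files live in a folder named data:

from llama_index import SimpleDirectoryReader

# Load every supported file in the local 'data' folder into Document objects
documents = SimpleDirectoryReader('data').load_data()
print(len(documents))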

Getting Started With LlamaIndex

First, you should set up the required libraries with the pip install -q llama-index pinecone-client command.

With your environment ready, consider working with a sample dataset like SQuAD, which features columns such as id, title, and context. In LlamaIndex, the document objects built from such records are central to supplying context for your data.

An example table depicting data from the SQuAD dataset.

An example SQuAD dataset.
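The snippets below assume the dataset is already in a pandas DataFrame named data. One way to get there (an assumption on our part, using the Hugging Face datasets package) is:

from datasets import load_dataset

# Load the SQuAD training split and keep one row per unique context passage
data = load_dataset('squad', split='train').to_pandas()
data = data[['id', 'title', 'context']].drop_duplicates(subset='context')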

In our case, we’re attaching just the title within extra_info. However, you’re encouraged to add further fields as your project requires.

from llama_index import Document

docs = []

for i, row in data.iterrows():
  docs.append(Document(
    text=row['context'],
    doc_id=row['id'],
    extra_info={'title': row['title']}
  ))
docs[1] # Inspect one of the resulting Document objects

Thus, the above script effectively transforms your DataFrame into a list of Document objects, making them ready for indexing in llama-index.

Document(
  doc_id='5733bf84d058e614000b61be',
  embedding=None,
  extra_info={'title': 'University_of_Notre_Dame'},
  excluded_embed_metadata_keys=[],
  excluded_llm_metadata_keys=[],
  relationships={},
  doc_hash='4731d2eb1d86f2798922d48727e4a8e77a27afeecbcdc8c3cbb31d77f65ba5ec',
  text: "As at most other universities, Notre Dame\'s students run…",
  start_char_idx=None,
  end_char_idx=None,
  text_template='{metadata_str}\n\n{content}',
  metadata_template='{key}: {value}',
  metadata_seperator='\n')

Creating Embeddings With LlamaIndex and ChatGPT

The next step involves creating embeddings, which are crucial for searching through your dataset. To do so, set up your OpenAI API key and initialize a SimpleNodeParser.

import os
os.environ['OPENAI_API_KEY'] = 'OPENAI_API_KEY' # Retrieve this from platform.openai.com

This parser will turn our list of Document objects into nodes, which are the basic units that llama_index uses for indexing and querying.

from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser.from_defaults()

nodes = parser.get_nodes_from_documents(docs)
print(nodes[1]) # Inspect one of the resulting nodes

Think of a node as a document object but with additional information about its relationship to other documents in your database.

An infographic depicting the relationship between a document and a node.

The relationship between a document and a node in LlamaIndex

For example, if you have several chunks of text from a PDF, a node will contain information about the order and relationship of these chunks. It knows that “Chunk One” comes before “Chunk Two”, and so on, providing a relational context that a standard document doesn’t have.

Here’s what a node looks like:

TextNode(
  doc_id='ed04a675-a461-49ac-91c5-6eaf32bf72b5',
  embedding=None,
  extra_info={'title': 'University_of_Notre_Dame'},
  excluded_embed_metadata_keys=[],
  excluded_llm_metadata_keys=[],
  relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(
    node_id='5733bf84d058e614000b61be',
    node_type=None,
    extra_info={'title': 'University_of_Notre_Dame'},
    doc_hash='4731d2eb1d86f2798922d48727e4a8e77a27afeecbcdc8c3cbb31d77f65ba5ec')},
  doc_hash='4731d2eb1d86f2798922d48727e4a8e77a27afeecbcdc8c3cbb31d77f65ba5ec',
  text: "As at most other universities, Notre Dame\'s students run…",
  start_char_idx=None,
  end_char_idx=None,
  text_template='{metadata_str}\n\n{content}',
  metadata_template='{key}: {value}',
  metadata_seperator='\n')

While nodes contain similar information to documents, they’re distinct in that they form the basis of our vector database. They hold the relational information necessary for more complex operations and efficient searching within the database.

Indexing in Pinecone

Great progress so far! Now, we’re set to explore Pinecone, which is a managed vector database service perfect for machine learning applications. We’ll be storing our llama_index data in Pinecone, allowing us to efficiently manage embeddings from our LLM for semantic-based searches.

Initialize Pinecone with your API key and environment values, both of which are available for free in the Pinecone console.

import pinecone
import os

# Retrieve your API key and environment from the console at app.pinecone.io
os.environ['PINECONE_API_KEY'] = 'PINECONE_API_KEY' # Replace with your real API key!
os.environ['PINECONE_ENVIRONMENT'] = 'PINECONE_ENVIRONMENT' # and with environment

# Initialize Pinecone
pinecone.init(
  api_key=os.environ['PINECONE_API_KEY'],
  environment=os.environ['PINECONE_ENVIRONMENT']
)

# Create the index if it doesn't exist
index_name = 'index'
if index_name not in pinecone.list_indexes():
  pinecone.create_index(
    index_name,
    dimension=1536, # Match this with the text embedding model's dimension
    metric='cosine' # Cosine is usually efficient for text embeddings
  )

# Connect to the index
pinecone_index = pinecone.Index(index_name)

If this is your first time running the script, don’t worry: the index won’t exist yet, so it will be created. When creating the index, it’s important to match the dimensionality to that of the text-embedding-ada-002 model, which is 1536. You’re free to choose a different metric, but in our example we’re using cosine, as it’s typically the most efficient for similarity calculations on text embeddings.

After setting up the index, we’ll connect to it. This is where PineconeVectorStore comes into play. It’ll function as our interface for storing and retrieving document embeddings.

from llama_index.vector_stores import PineconeVectorStore
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

Next, we’ll use a ServiceContext to feed documents through our embedding pipeline and a StorageContext to point at our vector store. The result will be wrapped in a GPTVectorStoreIndex instance, which handles the indexing and querying process.

from llama_index import GPTVectorStoreIndex, StorageContext, ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding

# Preparing our storage venue (aka vector db)
storage_context = StorageContext.from_defaults(
  vector_store=vector_store
)
# setup the index/query process
embedding = OpenAIEmbedding(model='text-embedding-ada-002', embed_batch_size=100)
service_context = ServiceContext.from_defaults(embed_model=embedding)

# Voilà! Our index, born from documents and nurtured by contexts!
index = GPTVectorStoreIndex.from_documents(
  docs, storage_context=storage_context,
  service_context=service_context
)

One critical parameter here is the embedding batch size. By default, it processes data in batches, which means it sends text chunks to OpenAI, receives the embeddings, and then stores them in Pinecone. We’ve set the batch size to 100, allowing us to send larger batches to OpenAI and then to Pinecone. This approach reduces the number of required requests, effectively speeding up the process since it cuts down on network latency.

Querying with LlamaIndex

Now that we’ve built our index, we can start having some real fun: querying. Think of the query engine as your key to unlocking the information stored in that index. It’s not complicated: it’s essentially our index reformatted into a query-friendly form.

Take a look at how simple it is:

query_engine = index.as_query_engine()
response = query_engine.query("What are the various student-run media outlets at the University of Notre Dame?")
print(response)

And just like that: "The various student-run media outlets at the University of Notre Dame include three newspapers, a radio station, a television station, several magazines and journals, and a yearbook."
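If you want to see which chunks the answer was grounded in, the query engine can also expose its source nodes. A small sketch, where similarity_top_k controls how many chunks are retrieved:

# Retrieve the top three most similar chunks instead of the default
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query(
  "What are the various student-run media outlets at the University of Notre Dame?"
)

# Each source node carries the retrieved text and its similarity score
for source in response.source_nodes:
  print(source.score, source.node.get_text()[:100])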

Easy, right? That’s the beauty of LlamaIndex in action.

Conclusion

Orchestration frameworks like LangChain and LlamaIndex have significantly simplified the usage of ChatGPT and other LLMs. They’re capable of handling various tasks, from data extraction to maintaining memory across multiple LLM interactions.

LangChain stands out as a key library in the LLM space due to its extensive set of tools and modules. Its versatility in integrating different models, managing prompts, sequencing chains, running agents, and handling memory opens new doors for developers.

Meanwhile, LlamaIndex presents itself as a powerful competitor, carving out a niche with its distinctive approach to handling document structures and retrieval, which makes it an influential player in the field of language model orchestration.
