Creating a Knowledge-Based Chatbot with OpenAI Embedding API, Pinecone, and Langchain.js

4/9/2023 4 min read
ai
typescript
openai
tutorial

In this tutorial, we'll walk through the process of creating a knowledge-based chatbot using the OpenAI Embedding API, Pinecone as a vector database, and langchain.js as a large language model (LLM) framework. The chatbot accepts URLs, learns from their content, and answers questions based on that knowledge.

Introduction

To create our knowledge-based chatbot, we will use the following technologies:

  • Pinecone: A vector database that helps us store and query embeddings.
  • OpenAI Embedding API: An API that provides embeddings for text inputs.
  • langchain.js: A JavaScript framework for building LLM-powered applications that makes it easier to work with Pinecone and OpenAI.

What is a Vector Database?

Different types of databases (source: pinecone.io)

Vector databases are used to store and query vectors efficiently. They allow you to search for similar vectors based on their similarity in a high-dimensional space. In this tutorial, we will use Pinecone as our vector database.

Embeddings are dense vector representations of data, such as text, images, or audio. In our case, we'll be using text embeddings generated by the OpenAI Embedding API. These embeddings help us find semantically similar content in the vector space.
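To make "similar in a high-dimensional space" concrete, here is a minimal sketch of cosine similarity, the measure typically used to compare embeddings. The 3-dimensional vectors below are toy values, not real OpenAI embeddings (those have 1536 dimensions):

```typescript
// Cosine similarity: dot product of the vectors divided by the product of their norms.
// Returns a value in [-1, 1]; higher means more semantically similar.
function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors: "cat" and "kitten" point in similar directions, "car" does not.
const cat = [0.9, 0.1, 0.2];
const kitten = [0.8, 0.2, 0.3];
const car = [0.1, 0.9, 0.7];

console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, car)); // true
```

Searching a vector database amounts to computing this kind of similarity between a query vector and millions of stored vectors, using approximate nearest-neighbour indexes to avoid comparing against every vector.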

Prerequisites

Before we dive into the details, make sure you have the following:

  • Familiarity with JavaScript, TypeScript, and Svelte
  • Basic understanding of web scraping and natural language processing (NLP)
  • Node.js and NPM installed on your system
  • An OpenAI API key
  • A Pinecone API key

Overview

Our chatbot will consist of the following components:

  • Web scraper: to extract content from given URLs
  • Text splitter: to split the acquired text into chunks for processing
  • Embedding generator: to create embeddings from the text chunks using OpenAI's Embedding API
  • Pinecone vector store: to store and retrieve embeddings efficiently
  • Langchain LLM: to provide question-answering capabilities based on the embeddings

Architecture diagram

Let's dive into each component.

1. Web Scraper

To build the web scraper, we will use Puppeteer, a Node.js library that provides a high-level API to control headless Chrome or Chromium browsers.

import TurndownService from 'turndown';
import type { Browser } from 'puppeteer';

const turndownService = new TurndownService();

async function scrape_researchr_page(url: string, browser: Browser): Promise<string> {
	const page = await browser.newPage();
	await page.setJavaScriptEnabled(false);
	await page.goto(url);

	const element = await page.waitForSelector('#content > div.row > div', {
		timeout: 100
	});

	if (!element) {
		throw new Error('Could not find element');
	}

	// keep only content elements (like p, h1, h2, h3, h4, h5, h6, li, blockquote, pre, code, table, dl, div)
	await element.evaluate((element) => {
		const elements = element.querySelectorAll(
			'*:not(p, h1, h2, h3, h4, h5, h6, li, blockquote, pre, code, table, dl, div)'
		);
		for (let i = 0; i < elements.length; i++) {
			elements[i].parentNode?.removeChild(elements[i]);
		}
	});

	const html_of_element = await element.evaluate((element) => element.innerHTML);

	return turndownService.turndown(html_of_element);
}

Of course, you will need to adapt the scraper to your own pages.

At the end of the function, we use the turndown library, which converts HTML into Markdown. We added this step because Markdown is easier to split, and GPT models understand it better than HTML. Markdown is also a lighter syntax than HTML, meaning fewer tokens and thus cheaper API calls.

2. Text Splitter

The next step is to split the acquired text into smaller chunks for processing. We will use the MarkdownTextSplitter class from the langchain/text_splitter package. This class takes chunkSize and chunkOverlap parameters to control the size and overlap of the generated text chunks.

const textSplitter = new MarkdownTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 20
});
// `markdowns` is the array of Markdown documents produced by the scraper
const chunks = await textSplitter.splitText(markdowns.join('\n\n'));

Ideally, we want one piece of information per chunk. If you have very structured Markdown files, one chunk could correspond to one subsection. In our case, the Markdown comes from HTML and is poorly structured, so we have to rely on a fixed chunk size, which makes our knowledge base less reliable (a single piece of information could be split across two chunks).
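To illustrate what the splitter does, here is a simplified sketch of fixed-size chunking with overlap. The real MarkdownTextSplitter is smarter: it tries to split on Markdown headings and paragraph boundaries before falling back to fixed sizes.

```typescript
// Naive fixed-size splitter: each chunk is at most chunkSize characters,
// and consecutive chunks share chunkOverlap characters of context.
function splitText(text: string, chunkSize: number, chunkOverlap: number): string[] {
    const chunks: string[] = [];
    const step = chunkSize - chunkOverlap;
    for (let start = 0; start < text.length; start += step) {
        chunks.push(text.slice(start, start + chunkSize));
        if (start + chunkSize >= text.length) break;
    }
    return chunks;
}

const chunks = splitText('a'.repeat(25), 10, 2);
console.log(chunks.map((c) => c.length)); // [10, 10, 9]
```

The overlap is what preserves context across chunk boundaries: a sentence cut at the end of one chunk reappears at the start of the next.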

3. Embedding Generator

We will use OpenAI's Embedding API to generate embeddings for our text chunks. We create an instance of the OpenAIEmbeddings class from the langchain/embeddings package and pass it our OpenAI API key via the OPENAI_API_KEY environment variable.

const embeddingModel = new OpenAIEmbeddings({ maxConcurrency: 5 });

4. Pinecone Vector Store

Pinecone is a vector database designed for efficient storage and retrieval of high-dimensional vectors. We will use the PineconeStore class from the langchain/vectorstores package to store our generated embeddings.

We first initialize the client and connect to the index created in the Pinecone dashboard. The index must be configured with 1536 dimensions, the output dimension of OpenAI's text-embedding-ada-002 model.

import { PineconeClient } from "@pinecone-database/pinecone";

const client = new PineconeClient();
await client.init({
    apiKey: PINECONE_API_KEY,
    environment: PINECONE_ENVIRONMENT,
});

export const pineconeIndex = client.Index(PINECONE_INDEX);

With the client initialized, we can get embeddings from the OpenAI API for each chunk and store them in Pinecone.
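That storing step can be sketched with the PineconeStore class, combining the chunks from the text splitter with the embedding model from the previous section. The url metadata field is an assumption for illustration; attach whatever metadata you want to surface as a source:

import { PineconeStore } from 'langchain/vectorstores';

// Embed each chunk with the OpenAI API and upsert the vectors into the Pinecone index.
// `url` (hypothetical) is the page the chunks were scraped from.
const vectorStore = await PineconeStore.fromTexts(
    chunks,
    chunks.map(() => ({ url })), // metadata attached to each vector
    embeddingModel,
    { pineconeIndex }
);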

5. Langchain

To provide question-answering capabilities based on our embeddings, we will use the VectorDBQAChain class from the langchain/chains package. This class combines a Large Language Model (LLM) with a vector database to answer questions based on the content in the vector database.

In our example, we use the ChatOpenAI class from the langchain/chat_models package as our LLM, and the PineconeStore instance as our vector database.

const model = new ChatOpenAI({ temperature: 0.9, openAIApiKey: OPENAI_API_KEY, modelName: 'gpt-3.5-turbo' });

const chain = VectorDBQAChain.fromLLM(model, vectorStore, {
    k: 5,
    returnSourceDocuments: true
});
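Under the hood, the retrieval half of this chain boils down to embedding the question and returning the k stored chunks whose vectors are most similar to it. Here is a minimal in-memory sketch of that top-k lookup; Pinecone does the same thing at scale with approximate nearest-neighbour indexes:

```typescript
interface StoredChunk {
    text: string;
    vector: number[];
}

// Dot-product similarity; for normalized vectors (as OpenAI embeddings are),
// this produces the same ranking as cosine similarity.
const dot = (a: number[], b: number[]) => a.reduce((sum, x, i) => sum + x * b[i], 0);

// Return the k stored chunks whose vectors are most similar to the query vector.
function topK(query: number[], store: StoredChunk[], k: number): StoredChunk[] {
    return [...store]
        .sort((a, b) => dot(query, b.vector) - dot(query, a.vector))
        .slice(0, k);
}

// Toy 2-dimensional store (real embeddings have 1536 dimensions).
const store: StoredChunk[] = [
    { text: 'Pinecone is a vector database', vector: [0.9, 0.1] },
    { text: 'Svelte is a UI framework', vector: [0.1, 0.9] },
    { text: 'Embeddings are dense vectors', vector: [0.8, 0.3] }
];

console.log(topK([1, 0], store, 2).map((c) => c.text));
// ['Pinecone is a vector database', 'Embeddings are dense vectors']
```

The chain then stuffs those top-k chunks into the LLM prompt as context, which is why the quality of the chunking step directly affects answer quality.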

With these components in place, we can now create a SvelteKit endpoint to handle user requests. The POST request handler receives the user's query, calls the VectorDBQAChain, and returns the chatbot's response along with the source documents (if available).

import { error, json } from '@sveltejs/kit';
import type { RequestHandler } from './$types';

export const POST = (async ({ request }) => {
    const { text } = await request.json();
    if (!text) {
        throw error(400, 'Missing text');
    }
    if (text.length > 200) {
        throw error(400, 'Text too long');
    }

    try {
        const response = await chain.call({ query: text });
        const { text: responseText, sourceDocuments } = response;

        return json({
            text: responseText,
            sources: sourceDocuments
        });
    } catch (e) {
        console.error(e);
        throw error(500, 'Internal Server Error');
    }
}) satisfies RequestHandler;

Conclusion

We've demonstrated how to build a knowledge-based chatbot using the OpenAI Embedding API, Pinecone as a vector database, and Langchain. This chatbot can acquire knowledge from given URLs and answer user queries based on that knowledge.

By combining web scraping, text processing, embeddings, vector databases, and LLMs, you can create powerful chatbots that can learn and provide useful answers based on the content they consume.

Find the reference code on GitHub.