
From RAGs to Riches: Building a Generative AI feature

Pierce Healy
CEO & Founder

In the previous article in this series, we evaluated possible use cases for AI within your product and the tradeoffs to consider. Now we'll use a working example to walk through some of the decisions you'll face when building an AI feature.

Feature we will build: AI Co-Pilot for Hiring Managers on LinkedIn

LinkedIn owns millions of job postings and candidate profiles, and therefore also owns important data on things like salary ranges, language that attracts the top candidates, candidate suitability and much more. This opens up some high-value use cases for using AI to assist hiring managers.

  • Write the best job post to attract demographic X
  • Identify fair salary and benefit ranges for X role in Y location
  • Automate applicant screening

For this example, we will consider two simple use cases:

  1. An AI-generated job posting based on the title we're hiring for
  2. A chatbot-style Q&A tool to answer questions using LinkedIn data

The code repository for this prototype feature is here

The dataset we will use is a set of publicly available job postings, found here

Constraints we must consider

Data size

Analyzing thousands or even millions of job descriptions is not something third-party APIs like GPT can do out of the box. The largest model, GPT-4 Turbo, has a context window of 128,000 tokens, meaning it can analyze a maximum of roughly 1,000 job descriptions in one request. Additionally, performance degrades as the input grows longer.

Performance drop by context size on GPT-4 Turbo
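To get a feel for this limit, we can count tokens before sending a request. Here's a minimal sketch using OpenAI's tiktoken library (assuming df_clean is the cleaned dataframe of job postings we build later in this article) 👇

# Estimate how many job descriptions fit in one GPT-4 Turbo request
# (a sketch; df_clean is the cleaned postings dataframe used later on)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer family used by GPT-4 models

descriptions = df_clean['description'].dropna().tolist()
avg_tokens = sum(len(enc.encode(d)) for d in descriptions) / len(descriptions)

print(f"Average tokens per description: {avg_tokens:.0f}")
print(f"~{int(128_000 // avg_tokens)} descriptions fit in a 128k context window")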

Data types

Language models like GPT are good at dealing with… language. They are not suited to quantitative tasks such as computing an optimized salary range or ranking the performance of a post. Additionally, metadata such as company size and industry gives important context when deciding on a job posting, but will not be known by the model out of the box. We'll therefore need to hand these tasks off elsewhere, utilizing the LLM only for the tasks it does best.

Solution

To circumvent these issues we will use a technique called Retrieval-Augmented Generation (RAG).

The RAG method allows you to automatically feed an LLM with the external data it needs based on the task to be completed.

A simple RAG method might be as basic as interpolating a variable into your prompt by looking up a value in your database; more sophisticated RAG methods typically use vector search to find the data required to complete the action.
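As a toy illustration of the simple case, here's a hedged sketch in which a plain database lookup fills a prompt variable before the LLM is called (get_salary_range is a hypothetical helper) 👇

# Simple RAG: look a value up in your own database and interpolate it into
# the prompt (get_salary_range is a hypothetical helper, e.g. a SQL query)
def get_salary_range(job_title):
    return "$80,000 - $100,000"  # would come from your own postings table

job_title = "Junior Software Engineer"
prompt = (
    f"Draft a job posting for a {job_title}. "
    f"Typical salary range from our data: {get_salary_range(job_title)}."
)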

To illuminate how this method can be used, we will step through a simple example 👇

Steps to Implement

Step 1: Build a model to pass relevant data from LinkedIn's database to the LLM based on the task at hand.

Creating embeddings of our dataset of millions of job postings allows us to transform unstructured text into something that can be searched efficiently.

Embeddings in text analysis are a method of converting words or phrases into numerical vectors. These vectors represent the words in a way that captures their meanings and relationships to other words. For example, words with similar meanings will have similar numerical vectors. This technique helps computers process and understand text by translating it into a mathematical format that algorithms can work with.

More on embeddings here
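For intuition, here's a tiny sketch, reusing the get_embeddings helper defined in the code block below, showing that semantically similar job titles score a much higher cosine similarity than unrelated ones 👇

# Similar meanings -> similar vectors (uses get_embeddings from the block below)
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

titles = ["Senior Software Engineer", "Sr. Software Developer", "Head Chef"]
vectors = np.array(get_embeddings(titles))
sims = cosine_similarity(vectors)

print(sims[0][1])  # high: the two engineering titles are close in meaning
print(sims[0][2])  # much lower: "Head Chef" is semantically distant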

💡Choosing an embeddings model

Embeddings models have been around a long time and have become somewhat commoditized, but certain use-case-specific nuances exist. There are numerous open source and third-party API options (Google, OpenAI, Cohere etc.). We've found the Cohere model effective with use cases containing modern business language.

For this example, the data we want to embed is the title field attached to each job posting and the posting text itself. Each of these serves a separate use case.

Creating an embeddings model 👇

# Create embeddings model
!pip install cohere

import cohere
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

api_key = 'your-cohere-api-key'
co = cohere.Client(api_key)

# Get text embeddings
def get_embeddings(texts, model='embed-english-v2.0'):
    output = co.embed(model=model, texts=texts)
    return output.embeddings

def get_similarity(target, candidates):
    candidates = np.array(candidates)
    target = np.expand_dims(np.array(target), axis=0)
    # Calculate cosine similarity
    sim = cosine_similarity(target, candidates)
    sim = np.squeeze(sim).tolist()
    sort_index = np.argsort(sim)[::-1]
    sort_score = [sim[i] for i in sort_index]
    similarity_scores = zip(sort_index, sort_score)
    # Return (index, score) pairs sorted by descending similarity
    return similarity_scores

df_clean['title_embeds'] = get_embeddings(df_clean['title'].tolist())
embeds = np.array(df_clean['title_embeds'].tolist())
df_clean.head()

Step 2: Saving embeddings

After creating embeddings of our dataset of job postings, we will want to save these to be used in user queries. For our example, we can store the embeddings directly in a pandas dataframe; for a scalable production application, however, we will want to use a vector database.

💡Choosing a vector database

A vector database is a special type of database that stores data as numerical vectors. It's used to handle complex data like pictures, text, or sounds once they've been turned into numbers, which makes it good for searching and finding things that are similar, like matching images or finding texts that are alike. Common dedicated vector databases are Weaviate and Pinecone; there are also open source alternatives like Chroma and multi-product vendors like MongoDB. The decision to choose one over another comes down to a tradeoff between necessary performance and cost. For very high dimensional data, i.e. datasets where many attributes must be considered, dedicated vector stores such as Weaviate and Pinecone are best.
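To make this concrete, here's a minimal sketch of storing and querying our embeddings in Chroma, the open source option mentioned above (the collection name and metadata fields are illustrative) 👇

# Store embeddings in a Chroma collection instead of a pandas dataframe
# (a sketch; the collection name and metadata fields are illustrative)
import chromadb

client = chromadb.Client()
collection = client.create_collection("job_postings")

collection.add(
    ids=[str(i) for i in df_clean.index],
    embeddings=df_clean['title_embeds'].tolist(),
    documents=df_clean['description'].tolist(),
    metadatas=[{"title": t, "location": l}
               for t, l in zip(df_clean['title'], df_clean['location'])],
)

# At query time: the 50 postings closest to the user's input
results = collection.query(query_embeddings=[my_job_title_embeds], n_results=50)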

Step 3: Setting up our query

Based on some user input or action we will want to query our embeddings dataset to return data relevant to the task.

The user input should be designed in such a way as to ensure we have all of the information we need to run an effective search and crucially that it is in the correct format.

💡Parsing the input

Parsing of the input is required wherever it has some free-form nature, e.g. a chat window or open-ended text field. For example, to effectively answer a question like "What salary ranges are typical for senior software engineers in New York City?" we will want to extract A) the request type and B) the metadata to search on. In this case, request type = question, and the associated metadata would be key: salary, title: senior software engineer, location: New York City. We can use GPT to help us with this parsing by setting up a prompt to extract info from the input before running a search; this is illustrated in the Q&A example we walk through in step 5 below. Once we have this info, we can search efficiently for the required data and take users down a pre-defined path. In the case of our example, these two flows would be "user is asking a question" and "user wants to draft a job posting".

This technique can also be used for use cases like automatically applying filters to a dashboard or allowing text-to-SQL queries in your application, e.g. "Show me customers who have churned in the last 6 months?" In this case, we would want to extract metadata that aligns to fields in our database, e.g. customer_segment: churned and date: within last 6 months.
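Here's a hedged sketch of that extraction step, asking GPT to return the filters as JSON (the field names are illustrative, not a real schema) 👇

# Extract structured filters from a free-form question before querying
# (a sketch; field names like customer_segment are illustrative)
import json
import openai

extraction = openai.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
        {"role": "system",
         "content": "Extract search filters from the user's question. "
                    "Respond with JSON only, with keys: customer_segment, date_range."},
        {"role": "user",
         "content": "Show me customers who have churned in the last 6 months?"},
    ],
    temperature=0)

filters = json.loads(extraction.choices[0].message.content)
# e.g. {"customer_segment": "churned", "date_range": "last 6 months"}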

💡Choosing retrieval thresholds

We will then create embeddings of the parsed user input to match against embeddings from our dataset. Two important decisions then arise: choosing a similarity threshold and setting a top-k value for the retrieval logic.

The similarity threshold, usually measured by cosine similarity, refers to how close data should be to the search query to be deemed relevant. When the format of the input and the data being queried are similar, e.g. the input "Senior Procurement Manager" querying a dataset of titles, we expect a close match and can use a high cosine cutoff, e.g. 0.75+. For scenarios where an exact match is less likely, some more advanced techniques can help, which we will discuss later.

Choosing a top-k parameter refers to the number of documents from our query dataset we want to include in the prompt; in this case, the number of job postings. The number chosen here is again use case dependent: if the user has a specific question, let's say "What salary did we agree to pay Pierce?", they would want a minimal number of sources, possibly even a single document (e.g. my employment agreement), whereas if we want the answer to be based on an aggregate of a sample, e.g. "What salary should we offer an engineer in NYC?", we would consider multiple sources.
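To make both knobs explicit, here's a small sketch of a retrieval helper, assuming similarity is the descending-sorted list of (index, score) pairs our get_similarity function returns 👇

# Threshold and top-k as explicit, per-use-case parameters
# (a sketch; similarity is a list of (index, score) pairs, sorted descending)
def retrieve(similarity, df, threshold=0.75, top_k=50):
    results = []
    for idx, sim in similarity:
        if sim < threshold or len(results) >= top_k:
            break  # scores are sorted, so we can stop early
        results.append(df.iloc[idx])
    return results

# Specific question: one strong match; aggregate question: a wide sample
specific = retrieve(similarity, df_clean, threshold=0.85, top_k=1)
aggregate = retrieve(similarity, df_clean, threshold=0.75, top_k=50)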

Step 4: Passing relevant data to the LLM to complete the job

Once we have the relevant data from the embeddings query, we can complete our task with the LLM without hitting the context window / data input limit.

💡Choosing an AI model

In choosing an LLM, we must weigh up cost, latency, performance, data size and privacy requirements. We discussed these tradeoffs at length in part one of this series, here.

To recap, the capabilities of the most common LLM options are summarized here.

LLM capabilities

For our use cases, the features we will prioritize are a high input limit and high performance. Latency is not a huge issue: if it takes 20 seconds to receive a response, that is still faster than a human completing these tasks. So we will go with GPT-4 Turbo.

Step 5: Completing the above steps for our two use cases

Use Case 1: Generating a job posting based on the title we're hiring for

  • Input: The title we're hiring for
  • Output: Job Posting

Here the user inputs details for the job posting, and our get_embeddings function converts the input into its embedding representation.

Creating embeddings of the user input👇

# Add details for job posting (user input)
my_company_website_url = 'https://www.zelta.ai/'
my_HQ = "New York City"
my_job_title = "Junior Software Engineer"

# Get embeddings of the new query
my_job_title_embeds = get_embeddings([my_job_title])[0]

The next step is to search our vector database for job postings that match this title. We're using a cosine cutoff of 0.75 and k=50 (tracked as counter in the code below).

Vector search👇

# Find job postings from the LinkedIn database relevant to the title we are hiring for

# Get the similarity between the search query and existing embeddings
similarity = get_similarity(my_job_title_embeds, embeds[:15000])

# Convert the similarity zip object into a list of tuples (index, similarity_score)
similarity = list(similarity)

print('Search query:')
print(my_job_title, '\n')

def method(similarity, df_clean):
    results_method = []
    counter = 0
    for idx, sim in similarity:
        if sim > 0.75 and counter < 50:
            results_method.append(
                f'"Title": {df_clean.iloc[idx]["title"]}, '
                f'"Job Description": {df_clean.iloc[idx]["description"]}, '
                f'"Salary": {df_clean.iloc[idx]["max_salary"]}, '
                # Fixed: the original repeated max_salary here; we assume the
                # dataset has a location column
                f'"Location": {df_clean.iloc[idx]["location"]}'
            )
            counter = counter + 1
        if counter >= 50:
            break
    return results_method, counter

results, count = method(similarity, df_clean)
formatted_results = '\n'.join(results)
print(formatted_results)
print(count)

Finally, we pass the relevant data from our query and dataset to the LLM to draft a job posting.

LLM completes the task👇

# Pass context to the LLM from the job posting dataset to create a draft job posting
!pip install openai

import os
import openai

openai.api_key = "your-API-key"

# Note: these two variables were not defined in the original snippet;
# set them to your own company's details
company_name = "Zelta AI"
company_description = "an AI-powered customer insights platform for product teams"

analysis = openai.chat.completions.create(
    model="gpt-4-1106-preview",
    # We'll include a prompt to instruct the model what sort of description we're looking for
    messages=[
        {"role": "system",
         "content": "Imagine you are a recruiter creating a job posting for LinkedIn"},
        {"role": "user",
         "content": f"""Generate a job posting for this title '{my_job_title}' for this company: '{company_name}' who describes themselves as '{company_description}' based on these sample job descriptions '{formatted_results}'"""},
        {"role": "user",
         "content": f"""In your answer, make sure to cover responsibilities relevant to the role of '{my_job_title}' at '{company_name}' and include a salary range and benefits package. Format it nicely and remember to do a good job!"""},
    ],
    temperature=0.5)

job_description = analysis.choices[0].message.content
print(job_description)

And our output is a draft job posting for the title we are hiring for.

Output👇

**Job Title:** Junior Software Engineer
**Company:** Zelta AI
**Location:** [City, State or indicate if remote]
**Salary Range:** Competitive - [Specify Range Based on Research and Location]

**About Zelta AI:**
Zelta AI is the world's leading AI-powered customer insights platform, designed specifically for product teams within the SaaS industry. Our cutting-edge platform ingests, tags, organizes, and analyzes customer conversations and feedback data from various sources like Gong, Zoom, Zendesk, and Intercom. This enables product leaders to gain unparalleled insights into their customers' needs and preferences, driving informed decision-making and fostering product innovation.

**Job Description:**
As a Junior Software Engineer at Zelta AI, you will contribute to a team that's at the forefront of transforming customer insights through artificial intelligence. You'll be part of a dynamic group of engineers responsible for developing and maintaining the AI-driven platform that helps our clients understand their customers like never before.

**Responsibilities:**
- Collaborate with a team of experienced engineers to develop new features and improve existing ones on our AI platform.
- Write clean, maintainable, and efficient code in languages such as Python, Java, or JavaScript.
- Participate in the entire application lifecycle, focusing on coding, debugging, and testing.
- Contribute to the integration of user-facing elements with server-side logic and functionality.
- Assist in the development and maintenance of APIs to ensure seamless data exchange and integration across systems.
- Engage in peer code reviews and team discussions to continuously improve code quality and application performance.
- Stay updated with emerging trends in software engineering and AI technologies to suggest and implement improvements.
- Support the creation and maintenance of technical documentation.

**Qualifications:**
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 0-2 years of experience in software development, with a foundational understanding of programming languages and frameworks.
- Familiarity with cloud services such as AWS, Azure, or GCP is a plus.
- Strong problem-solving skills and an eagerness to learn and grow within a fast-paced environment.
- Excellent communication and teamwork abilities.

**Benefits:**
- Competitive Base Salary + Performance Bonus
- Health, Dental, and Vision Insurance
- 401(k) Plan with Company Match
- Generous Paid Time Off and Flexible Working Arrangements
- Professional Development and Career Growth Opportunities
- Collaborative and Innovative Work Environment

**Join Us:**
At Zelta AI, you'll have the chance to make a significant impact on the future of customer insights and SaaS products. If you're ready to embark on an exciting career journey with a company that's shaping the future of AI-driven analytics, we'd love to hear from you. Apply now to become a part of our visionary team!

---

**How to Apply:**
Please submit your resume, a cover letter, and any relevant work samples via the LinkedIn job posting or directly on our careers page at [Zelta AI careers link]. Zelta AI is an equal opportunity employer committed to diversity and inclusion in the workplace. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, or any other characteristic protected by law.

Use Case 2: Asking questions of the dataset to inform our hiring plans

  • User Input: A free-form question
  • Output: Answer to the free form question

Setting up our query👇

# Add new query
new_query = "What salary and benefits should we offer to a junior software engineer?"
For the Q&A use case, we must do some additional parsing of the input to ensure the query is in the right format given our dataset.
For this analysis, we're using the job title as the key; this can be expanded to be more flexible for more diverse question types.

Cleaning our query and extracting info to search on👇

# Extract the data required from the question to complete the search
!pip install openai

import os
import openai

openai.api_key = "your-api-key"

analysis = openai.chat.completions.create(
    model="gpt-4-1106-preview",
    # Prompt the model to pull out only the job title from the free-form question
    messages=[
        {"role": "system",
         "content": "Imagine you are a recruiter creating a job posting for LinkedIn"},
        {"role": "user",
         "content": f"""Extract only the job title from this query '{new_query}'"""},
        {"role": "user",
         "content": "Respond with job title only and nothing else"},
    ],
    temperature=0.5)

cleaned_query = analysis.choices[0].message.content
cleaned_query_embeds = get_embeddings([cleaned_query])[0]

Now, again, we will find rows from our dataset relevant to the task the user wants to complete, using the cleaned-up query.

Vector search👇

# Find data from the LinkedIn database relevant to the question asked

# Get the similarity between the search query and existing embeddings
similarity = get_similarity(cleaned_query_embeds, embeds[:15000])

# Convert the similarity zip object into a list of tuples (index, similarity_score)
similarity = list(similarity)

print('Search query:')
print(cleaned_query, '\n')

def method(similarity, df_clean):
    results_method = []
    counter = 0
    for idx, sim in similarity:
        if sim > 0.75 and counter < 50:
            results_method.append(
                f'"Title": {df_clean.iloc[idx]["title"]}, '
                f'"Job Description": {df_clean.iloc[idx]["description"]}, '
                f'"Salary": {df_clean.iloc[idx]["max_salary"]}, '
                # Fixed: the original repeated max_salary here; we assume the
                # dataset has a location column
                f'"Location": {df_clean.iloc[idx]["location"]}'
            )
            counter = counter + 1
        if counter >= 50:
            break
    return results_method, counter

results, count = method(similarity, df_clean)
formatted_results_QA = '\n'.join(results)
print(formatted_results_QA)
print(count)

And finally, we pass this data with our original query to GPT to complete the task.

LLM completes the task👇

# Answer the query with GPT and our relevant data
!pip install openai

import os
import openai

openai.api_key = "your-api-key"

analysis = openai.chat.completions.create(
    model="gpt-4-1106-preview",
    # Instruct the model to answer strictly from the retrieved job postings
    messages=[
        {"role": "system",
         "content": "Imagine you are a recruiter creating a job posting for LinkedIn"},
        {"role": "user",
         "content": f"""Answer this query: '{new_query}' to the best of your knowledge based on the data contained in the dataset below. In your response, be careful to answer specifically what the query: '{new_query}' asks and nothing else. Here is a dataset for context (some of this may not be directly relevant): '{formatted_results_QA}'"""},
        {"role": "user",
         "content": f"""For context, the company asking is: '{company_name}' who describes themselves as '{company_description}' and is based in {my_HQ}. List examples from the dataset provided that you have used to answer this, don't use other sources"""},
    ],
    temperature=0.1)

answer = analysis.choices[0].message.content
print(new_query, '\n')
print(answer)

And our output this time is a direct answer to the question, with referenced sources from our dataset.

Output👇

What salary and benefits should we offer to a junior software engineer?

Based on the dataset provided, to determine an appropriate salary and benefits package for a junior software engineer, we can look at the entries for positions that are either for junior roles or require a similar level of experience. Here are the relevant examples from the dataset:

1. "Title": Junior Project Developer, "Salary": 90000.0
   - Benefits: Competitive Base Salary + Bonus, Health, Dental, and Vision
2. "Title": Junior Web Developer, "Salary": 80000.0
   - Benefits: Not explicitly mentioned, but given the role and requirements, similar benefits such as health, dental, and vision could be inferred.
3. "Title": Software Engineer, "Salary": 80000.0
   - Benefits: Active Secret Clearance required (this is more of a requirement than a benefit, but it indicates a level of responsibility and trust)

Given that Zelta AI is based in New York City, which has a higher cost of living compared to other cities, and considering the provided dataset, a competitive salary for a junior software engineer could be in the range of $80,000 to $90,000. The benefits package should at least include health, dental, and vision insurance. Additionally, considering the innovative nature of Zelta AI, offering a bonus structure could also be attractive to potential candidates.

It's important to note that the dataset does not provide specific benefits for junior roles other than the Junior Project Developer position. However, it is common for tech companies, especially in competitive markets like New York City, to offer additional perks such as stock options, 401(k) matching, flexible work hours, remote work options, professional development opportunities, and a positive company culture that fosters growth and learning.

Now we have a simple working prototype for the two use cases we considered. With updates to the code and prompts, we can cater to further use cases, like tweaking the language based on target demographic, maximizing open rates, and anything else that may be relevant to the user.

This is a rudimentary implementation, and much more would be required to make this a production-grade feature. Further enhancements we could make are detailed below.

Advanced RAG techniques (more technical)

A drawback of this simple implementation of RAG is that the format of the input can be difficult to parse; e.g. our Q&A feature only works if the question contains a job title, and if data related to job titles is what's required to complete the task.

Vector search can be thought of as advanced keyword matching, where strings of words can be matched based on semantic meaning rather than just keywords. As such, the text in the query or semantically similar text needs to be present in the dataset to return a match.

For example, in searching the job description dataset, questions phrased as “Job descriptions talking about software engineering” will return some results, because the words “Job”, “Description”, and “Software Engineering” are likely to exist in the dataset, whereas a question like “Most frequently mentioned words” will not return good results.

Some advanced RAG techniques can be implemented to help us with these challenges.

HyDE

Hypothetical Document Embedding is a fancy name that essentially means asking GPT to come up with some potential answers to the query and performing the search on these answers instead of the raw input. This is a novel and highly effective technique for Q&A use cases within a defined context. More detail here
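A hedged sketch of HyDE, reusing the helpers from earlier in this article: generate a hypothetical answer, then search on its embedding rather than on the raw question 👇

# HyDE: search on a hypothetical answer instead of the raw question
hypothetical = openai.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user",
               "content": f"Write a short, plausible answer to: '{new_query}'. "
                          "It need not be correct, just realistic."}],
    temperature=0.7)

hyde_text = hypothetical.choices[0].message.content
hyde_embeds = get_embeddings([hyde_text])[0]

# Real documents tend to look more like this answer than like the question
similarity = list(get_similarity(hyde_embeds, embeds[:15000]))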

Decoupling Chunks Used for Retrieval vs. Chunks Used for Synthesis

A key technique for better retrieval is to decouple the chunks used for retrieval from those used for synthesis.

The optimal chunk representation for retrieval might be different from the optimal representation used for synthesis. For instance, a raw text chunk may contain the details the LLM needs to synthesize a more detailed answer to a query. However, it may contain filler words/info that bias the embedding representation, or it may lack global context and not be retrieved at all when a relevant query comes in.

There are two main ways to take advantage of this idea (a short sketch of the second follows the list):

1. Embed a document summary, which links to chunks associated with the document.

This can help retrieve relevant documents at a high level before retrieving chunks, vs. retrieving chunks directly (which might sit in irrelevant documents).

2. Embed a sentence, which then links to a window around the sentence.

This allows for finer-grained retrieval of relevant context (embedding giant chunks leads to “lost in the middle” problems), but also ensures enough context for LLM synthesis.
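A toy sketch of the second idea: retrieve on single sentences, but hand the LLM a window around the matching sentence (this assumes description holds one posting's text and query_embeds is the embedded query) 👇

# Sentence-window retrieval: match on a sentence, synthesize from its window
# (a sketch; `description` and `query_embeds` are assumed inputs)
sentences = description.split('. ')
sentence_embeds = get_embeddings(sentences)

def window_for(idx, window=2):
    lo, hi = max(0, idx - window), min(len(sentences), idx + window + 1)
    return '. '.join(sentences[lo:hi])

best_idx, best_score = list(get_similarity(query_embeds, sentence_embeds))[0]
context_for_llm = window_for(best_idx)  # what we actually pass to the LLM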

Structured Retrieval for Larger Document Sets

A big issue with the standard RAG stack (top-k retrieval + basic text splitting) is that it doesn't do well as the number of documents scales up, e.g. if you have 1000s of different documents, as in our example. In this setting, given a query, you may want to use structured information to help with more precise retrieval; for instance, if you ask a question that's only relevant to two job postings, structured information can ensure those two postings get returned, rather than relying on raw embedding similarity with chunks.

There are a few ways of performing more structured tagging/retrieval for production-quality RAG systems, each with its own pros/cons.

1. Metadata Filters + Auto Retrieval: Tag each document with metadata and then store it in a vector database. At inference time, use the LLM to infer the right metadata filters to query the vector db with, in addition to the semantic query string (a short sketch follows this list).

  • Pros ✅: Supported in major vector dbs. Can filter document via multiple dimensions.
  • Cons 🚫: Can be hard to define the right tags. Tags may not contain enough relevant information for more precise retrieval. Tags also represent keyword search at the document level, and don't allow for semantic lookups.

2. Store Document Hierarchies (summaries -> raw chunks) + Recursive Retrieval: Embed document summaries and map them to chunks per document. Fetch at the document level first, before the chunk level.

  • Pros ✅: Allows for semantic lookups at the document level.
  • Cons 🚫: Doesn't allow for keyword lookups by structured tags (which can be more precise than semantic search). Autogenerating summaries can also be expensive.
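As a sketch of approach 1, vector databases like Chroma let us combine the semantic query with an LLM-inferred metadata filter (this reuses the illustrative Chroma collection from the vector database sketch above; the filter value is made up) 👇

# Approach 1: semantic query + LLM-inferred metadata filter
# (reuses the illustrative Chroma collection from earlier; filter is made up)
results = collection.query(
    query_embeddings=[cleaned_query_embeds],
    n_results=50,
    where={"location": "New York City"},  # inferred by the LLM from the question
)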

Chat UX

The UX of our Q&A feature could be improved by using OpenAI's dedicated Assistants API, which is better suited to chat-style interactions, i.e. where the user is asked questions by the LLM and the answers inform the next questions asked. This can be used to refine the input beyond simply a title and take account of specific responsibilities within the company, years of experience, etc.
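Here's a hedged sketch of what that flow could look like with the Assistants API (in beta at the time of writing; the assistant's name and instructions are illustrative) 👇

# Chat-style refinement with OpenAI's Assistants API (beta; names illustrative)
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

assistant = client.beta.assistants.create(
    name="Hiring Co-Pilot",
    instructions="You help hiring managers refine job postings. Ask clarifying "
                 "questions about responsibilities and years of experience.",
    model="gpt-4-1106-preview",
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="I want to hire a Junior Software Engineer.",
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)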

Love to hear your feedback 👇

This is an exciting time of major disruption in product development. I'd love to hear about your experiences and ideas in the comments, or reach out directly.

Tune in for part three of this series, where we will discuss the GTM implications of AI features and what you should consider around pricing and monitoring usage.
