Another reason ChatGPT might not work well out of the box is that it is not up to date. In its current version (as of June 2023), it is trained on data up to 2021. That means ChatGPT has no knowledge of what happened in 2022 or 2023, so the answer to any question whose answer changes over time is possibly wrong.

This is particularly important because it means that whenever data recency is relevant, there is room to build a product on top of ChatGPT that potentially fixes this problem. Several products like that have already been built, and more are being built right now. For instance, one of the most common areas of development is integrating ChatGPT with news, so that if you ask it who won the last World Cup it correctly returns Argentina, while still relying on ChatGPT's own knowledge for older information (France won the World Cup in 2018). OpenAI plugins specifically try to address this issue, and Google's Bard is also largely focused on it.

The way you solve this problem is typically very similar to the previous lesson. If you are building an application of ChatGPT on a specific vertical, you need to find a data source with up-to-date information, scrape it, and pass it inside the prompt when needed. That way ChatGPT has its own knowledge up to 2021 plus the up-to-date information from the specific data source.
Note that even in the previous lesson there was a clear example of the importance of having up-to-date data. When we asked ChatGPT what employees dislike the most about working at FB, some of those reasons were unlikely to show up prior to 2022.
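
The "pass it inside the prompt" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `prompt_with_context` is a hypothetical helper name, the snippet list is hard-coded here where a real application would fill it from its scraper, and the wording of the instruction is just one possible choice.

```python
from datetime import date

def prompt_with_context(question, snippets, as_of):
    """Combine a question with recent text snippets into a single prompt string."""
    # One bullet per snippet, so the model can tell the sources apart
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        f"{question}\n\n"
        f"Use the following information, current as of {as_of.isoformat()}, "
        f"and prefer it over what you already know if they conflict:\n"
        f"{context}"
    )

# Hypothetical snippet; in a real application this would come from the scraped data source
snippets = ["Argentina won the 2022 World Cup."]
prompt_string = prompt_with_context("Who won the last World Cup?", snippets, date(2023, 6, 1))
print(prompt_string)
```

The resulting string is then sent as the user message, exactly like the review prompts built later in this lesson.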

When it comes to text data about people's sentiment and/or knowledge, data can change very quickly and suddenly. So not only do you need recent data, but you might also need to disregard older data, because in certain cases including it is counterproductive.



Importance of date


Let's do the exact same analysis as before on FB employee reviews, but using several different cut-off dates, and compare the results.





This includes only reviews starting Nov 2022, after the layoffs.

library(tidyverse)
library(openai)

#read the key from an environment variable, as previously explained, rather than hard-coding it
openai_api_key = Sys.getenv("OPENAI_API_KEY")

#read FB data scientist reviews previously scraped from glassdoor
data=read.csv("https://drive.google.com/uc?export=download&id=XXXXXXXXXX")

#make it into a date
data$Date = as.Date(data$Date, "%B %d, %Y")

#only keep data from Nov 2022 onward, after the layoffs (>= so that Nov 1 itself is included)
data_after = subset(data, Date>="2022-11-01")

#build the prompt
question = "Based on these negative employee reviews, give me the top 3 reasons employees working at Facebook dislike their job."
context = "Employee reviews:"
reviews_prompt = paste(data_after$Cons, collapse= "\n")
prompt_string = paste(question, context, reviews_prompt, sep="\n\n")

chat_example = create_chat_completion(model = "gpt-3.5-turbo",
                                      temperature=0,
                                      openai_api_key = openai_api_key,
                                      messages = list(list("role" = "system", "content" = "You help me summarize these reviews"),
                                                      list("role" = "user",  "content" = prompt_string)
                                      )
                                  )

#print the response. The gsub part is just cosmetic to make the output look better in the html
cat(gsub(pattern = "\n", replacement = "  \n", chat_example$choices$message.content))
Based on the negative employee reviews, the top 3 reasons employees working at Facebook dislike their job are:  
  
1. Poor work-life balance and high mental stress  
2. Unclear responsibilities, frequent reorgs, and toxic culture  
3. Bad management, biased performance ratings, and layoffs



As we can see, the results are pretty similar to what we saw before. That makes sense: in the previous lesson we just picked the first 100 reviews based on Glassdoor's sorting algorithm, and date obviously plays a big role in it. So we pretty much picked the most recent reviews even then.

Let's now check slightly older reviews (Jan 2022 - Nov 2022), excluding anything before or after that range. This is when the company was already feeling pressure from slower growth and a declining stock price, but before any major layoff (the Covid layoff was a completely different situation).



#only keep reviews within that time range (Jan 1, 2022 included, Nov 1, 2022 excluded)
data_before = subset(data, Date>="2022-01-01" & Date<"2022-11-01")

#build the prompt
question = "Based on these negative employee reviews, give me the top 3 reasons employees working at Facebook dislike their job."
context = "Employee reviews:"
reviews_prompt = paste(data_before$Cons, collapse= "\n")
prompt_string = paste(question, context, reviews_prompt, sep="\n\n")

chat_example = create_chat_completion(model = "gpt-3.5-turbo",
                                      temperature=0,
                                      openai_api_key = openai_api_key,
                                      messages = list(list("role" = "system", "content" = "You help me summarize these reviews"),
                                                      list("role" = "user",  "content" = prompt_string)
                                      )
                                  )

#print the response. The gsub part is just cosmetic to make the output look better in the html
cat(gsub(pattern = "\n", replacement = "  \n", chat_example$choices$message.content))
Based on the negative employee reviews, the top 3 reasons employees working at Facebook dislike their job are:  
  
1. Workload and work-life balance: Many employees complain about long working hours, frequent re-orgs, and lack of work-life balance.  
  
2. Uncertainty and lack of growth opportunities: Employees are unsure about the company's direction and future, and some feel that the company is becoming too corporate and slow. There are also complaints about lack of growth opportunities and promotions, and frequent changes in management.  
  
3. Pressure and negative culture: Employees feel pressure to perform and meet subjective performance reviews, and there are complaints about negative culture, politics, and lack of teamwork.



Now this is different. And if we go before 2022, we get yet another slightly different picture of the situation:

#only keep reviews before 2022
data_before = subset(data, Date<"2022-01-01")

#build the prompt
question = "Based on these negative employee reviews, give me the top 3 reasons employees working at Facebook dislike their job."
context = "Employee reviews:"
reviews_prompt = paste(data_before$Cons, collapse= "\n")
prompt_string = paste(question, context, reviews_prompt, sep="\n\n")

chat_example = create_chat_completion(model = "gpt-3.5-turbo",
                                      temperature=0,
                                      openai_api_key = openai_api_key,
                                      messages = list(list("role" = "system", "content" = "You help me summarize these reviews"),
                                                      list("role" = "user",  "content" = prompt_string)
                                      )
                                  )

#print the response. The gsub part is just cosmetic to make the output look better in the html
cat(gsub(pattern = "\n", replacement = "  \n", chat_example$choices$message.content))
Based on the negative employee reviews, the top 3 reasons employees working at Facebook dislike their job are:  
  
1. Poor work-life balance: Many employees feel overworked and under a lot of pressure to make an impact, which can lead to burnout and high turnover.  
  
2. Lack of clear direction and transparency: Some employees feel that the company is too big and bureaucratic, which can lead to a lack of internal transparency and difficulty in making an impact.  
  
3. Poor management: Many employees feel that middle management is poor, with unclear and constantly shifting performance expectations, leading to a lack of accountability and transparency. Additionally, some managers are inexperienced and refuse to listen to suggestions, leading to frustration and demoralization.



This is super interesting because, thanks to ChatGPT, you can actually see the entire story unfold. Before 2022, even with the stock price at an all-time high and everything seemingly great from the outside, data scientists were already questioning middle management in particular, as well as the company becoming too slow and too big. These are the exact issues that FB leadership identified as the main things to fix, except that they did so much later, at the end of 2022.

Then, during 2022, employees were mostly focused on their own performance and how to succeed inside the organization. Finally, from the end of 2022 into 2023, everything changes and the main concerns become layoffs, reorgs, job security, and criticism of upper management. It is obviously easy in hindsight to say "oh, FB should have listened to its employees more", but in this case it really is true. Regardless, the two main points here are:

1. How you filter by date makes a huge difference in the outcome, likely more than you are used to if you have worked in product data science at tech companies.

2. ChatGPT + Glassdoor reviews is an amazing tool to support HR. ChatGPT is pretty much spot-on in everything it says, and these kinds of products were just not possible a short time ago. The number of new products that are suddenly possible is mind-blowing.

There are ways to check whether anything major has happened, in which case it would be better to remove old data or, more generally, do some sort of date-related data processing. For instance, you could cluster reviews by month and ask ChatGPT to return keywords per cluster. A sudden change in keywords, such as "layoff" suddenly showing up, is a very good indicator that something has changed and that you need to take action. We will go through clustering later in this course.






This includes only reviews starting Nov 2022, after the layoffs.

import os
import sys
import numpy as np
import openai
import pandas
pandas.set_option('display.max_columns', 10)
pandas.set_option('display.width', 350)
#this can be important to print the entire text in the prompt
np.set_printoptions(threshold=sys.maxsize)

#read the key from an environment variable, as previously explained, rather than hard-coding it
openai.api_key = os.environ["OPENAI_API_KEY"]

#read FB data scientist reviews previously scraped from glassdoor
data=pandas.read_csv("https://drive.google.com/uc?export=download&id=XXXXXXXXXXX")

#make it into a date
data['Date'] = pandas.to_datetime(data['Date'], format="%b %d, %Y")

#only keep data from Nov 2022 onward, after the layoffs (>= so that Nov 1 itself is included)
data_after = data[data['Date']>="2022-11-01"]

#building the prompt
question = "Based on these negative employee reviews, give me the top 3 reasons employees working at facebook dislike their job."
context = "Employee reviews:"
reviews_prompt = '\n'.join(data_after.Cons)
prompt_string = f'{question}\n\n{context}\n\n{reviews_prompt}'

#chatgpt
chat_example = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    temperature = 0,
                    messages=[{"role": "system", "content": "You help me summarize these reviews"},
                              {"role": "user", "content": prompt_string}
                              ]
                 )

print(chat_example["choices"][0]["message"]["content"])
Based on the negative employee reviews, the top 3 reasons employees working at Facebook dislike their job are:

1. Poor work-life balance and high mental stress
2. Unclear responsibilities, frequent reorgs, and toxic culture
3. Bad management, biased performance ratings, and layoffs



As we can see, the results are pretty similar to what we saw before. That makes sense: in the previous lesson we just picked the first 100 reviews based on Glassdoor's sorting algorithm, and date obviously plays a big role in it. So we pretty much picked the most recent reviews even then.

Let's now check slightly older reviews (Jan 2022 - Nov 2022), excluding anything before or after that range. This is when the company was already feeling pressure from slower growth and a declining stock price, but before any major layoff (the Covid layoff was a completely different situation).



#only keep reviews within that time range (Jan 1, 2022 included, Nov 1, 2022 excluded)
data_before = data[(data['Date']>="2022-01-01") & (data['Date']<"2022-11-01")]

#building the prompt
question = "Based on these negative employee reviews, give me the top 3 reasons employees working at facebook dislike their job."
context = "Employee reviews:"
reviews_prompt = '\n'.join(data_before.Cons)
prompt_string = f'{question}\n\n{context}\n\n{reviews_prompt}'

#chatgpt
chat_example = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    temperature = 0,
                    messages=[{"role": "system", "content": "You help me summarize these reviews"},
                              {"role": "user", "content": prompt_string}
                              ]
                 )

print(chat_example["choices"][0]["message"]["content"])
Based on the negative employee reviews, the top 3 reasons employees working at Facebook dislike their job are:

1. Workload and work-life balance: Many employees complain about long working hours, frequent re-orgs, and lack of work-life balance. Some also mention that the company moves too slowly or too fast, depending on the team.

2. Lack of growth opportunities and management issues: Employees feel that there are limited growth opportunities, and managers quit frequently, leading to changing priorities and lack of direction. Some also mention that the performance review process is highly subjective, and there is pressure to perform.

3. Uncertainty about the company's future and negative publicity: Employees are unsure about the company's direction and future, and negative media narratives can weigh on them. Some also mention that the company is becoming too corporate and bureaucratic.



Now this is different. And if we go before 2022, we get yet another slightly different picture of the situation:

#only keep reviews before 2022
data_before = data[data['Date']<"2022-01-01"]

#building the prompt
question = "Based on these negative employee reviews, give me the top 3 reasons employees working at facebook dislike their job."
context = "Employee reviews:"
reviews_prompt = '\n'.join(data_before.Cons)
prompt_string = f'{question}\n\n{context}\n\n{reviews_prompt}'

#chatgpt
chat_example = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    temperature = 0,
                    messages=[{"role": "system", "content": "You help me summarize these reviews"},
                              {"role": "user", "content": prompt_string}
                              ]
                 )

print(chat_example["choices"][0]["message"]["content"])
Based on the negative employee reviews, the top 3 reasons employees working at Facebook dislike their job are:

1. Poor work-life balance: Many employees feel overworked and under a lot of pressure to make an impact, which can lead to burnout and high turnover.

2. Lack of clear direction and transparency: Some employees feel that the company is too big and bureaucratic, which can lead to a lack of internal transparency and difficulty in making an impact.

3. Poor management: Many employees feel that middle management is poor, with unclear and constantly shifting performance expectations, a lack of accountability, and a lack of support for professional development.
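
The three runs above differ only in the date window, so the repeated steps can be factored into two small helpers. The sketch below is one way to do it, not the course's own code: `reviews_in_window` and `build_prompt` are hypothetical names, the four rows are made-up data, and the windows are half-open (start included, end excluded) so that a review dated exactly on a cut-off lands in exactly one window. It stops at building the prompts; each one would then be sent with the same chat completion call used above.

```python
import pandas as pd

def reviews_in_window(data, start=None, end=None):
    """Return rows with start <= Date < end; None means no bound on that side."""
    mask = pd.Series(True, index=data.index)
    if start is not None:
        mask &= data["Date"] >= pd.Timestamp(start)
    if end is not None:
        mask &= data["Date"] < pd.Timestamp(end)
    return data[mask]

def build_prompt(reviews, question):
    """Join the Cons column into one prompt, skipping missing reviews."""
    reviews_prompt = "\n".join(reviews["Cons"].dropna().astype(str))
    return f"{question}\n\nEmployee reviews:\n\n{reviews_prompt}"

# Tiny made-up data set, only to show the mechanics
data = pd.DataFrame({
    "Date": pd.to_datetime(["2021-06-15", "2022-01-01", "2022-11-01", "2023-02-10"]),
    "Cons": ["Too bureaucratic", "Subjective reviews", "Layoffs", "Job security"],
})

# Half-open windows: every review falls in exactly one of them
windows = {
    "before 2022": (None, "2022-01-01"),
    "Jan-Oct 2022": ("2022-01-01", "2022-11-01"),
    "from Nov 2022": ("2022-11-01", None),
}

question = ("Based on these negative employee reviews, give me the top 3 reasons "
            "employees working at Facebook dislike their job.")

prompts = {}
sizes = {}
for name, (start, end) in windows.items():
    subset = reviews_in_window(data, start, end)
    sizes[name] = len(subset)
    prompts[name] = build_prompt(subset, question)

print(sizes)
```

Checking that the window sizes add up to the total number of reviews is a cheap way to catch gaps or overlaps at the cut-off dates.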



This is super interesting because, thanks to ChatGPT, you can actually see the entire story unfold. Before 2022, even with the stock price at an all-time high and everything seemingly great from the outside, data scientists were already questioning middle management in particular, as well as the company becoming too slow and too big. These are the exact issues that FB leadership identified as the main things to fix, except that they did so much later, at the end of 2022.

Then, during 2022, employees were mostly focused on their own performance and how to succeed inside the organization. Finally, from the end of 2022 into 2023, everything changes and the main concerns become layoffs, reorgs, job security, and criticism of upper management. It is obviously easy in hindsight to say "oh, FB should have listened to its employees more", but in this case it really is true. Regardless, the two main points here are:

1. How you filter by date makes a huge difference in the outcome, likely more than you are used to if you have worked in product data science at tech companies.

2. ChatGPT + Glassdoor reviews is an amazing tool to support HR. ChatGPT is pretty much spot-on in everything it says, and these kinds of products were just not possible a short time ago. The number of new products that are suddenly possible is mind-blowing.

There are ways to check whether anything major has happened, in which case it would be better to remove old data or, more generally, do some sort of date-related data processing. For instance, you could cluster reviews by month and ask ChatGPT to return keywords per cluster. A sudden change in keywords, such as "layoff" suddenly showing up, is a very good indicator that something has changed and that you need to take action. We will go through clustering later in this course.
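
As a rough illustration of this monitoring idea, the sketch below swaps in plain keyword counting for the ChatGPT-based keyword extraction described above: it groups reviews by calendar month and reports the share of reviews mentioning each word on a watch list. Everything here is an assumption made for the example: the helper names `monthly_keyword_share` and `first_month_seen`, the watch list, and the four toy reviews. It is a cheap first check, not a replacement for the clustering covered later.

```python
import pandas as pd

def monthly_keyword_share(data, keywords):
    """Share of reviews per calendar month whose Cons text mentions each keyword."""
    month = data["Date"].dt.to_period("M")
    text = data["Cons"].fillna("").str.lower()
    # One boolean column per keyword: does the review mention it?
    flags = pd.DataFrame({kw: text.str.contains(kw, regex=False) for kw in keywords})
    # The mean of a boolean column is the fraction of True values
    return flags.groupby(month).mean()

def first_month_seen(share):
    """First month in which each keyword appears at all (None if it never does)."""
    first = {}
    for kw in share.columns:
        months = share.index[(share[kw] > 0).to_numpy()]
        first[kw] = str(months[0]) if len(months) else None
    return first

# Made-up reviews, only to show the mechanics
data = pd.DataFrame({
    "Date": pd.to_datetime(["2022-09-03", "2022-09-20", "2022-11-12", "2022-11-25"]),
    "Cons": ["Long hours", "Slow promotions",
             "Layoff rumors everywhere", "Another layoff round"],
})

share = monthly_keyword_share(data, ["layoff", "promotion"])
first = first_month_seen(share)
print(share)
print(first)
```

A keyword whose share jumps from zero to a large fraction in a single month is a candidate cut-off date for the kind of date filtering done in this lesson.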
