Applied and Generative AI

With technological advancements and the increasing popularity of AI, many questions arise around the practical and ethical impact it will have across multiple industries. Because the landscape is ever-changing, it can be difficult to predict what's on the horizon and what the implications may be. In this article, we at Applied have endeavoured to cover the most common questions around this topic as it pertains to the hiring process.

What is ChatGPT?

Generative AI, of which ChatGPT is currently the most widely known example, is a type of programme trained on large amounts of data. It responds to natural language text prompts based on the information used in its training.

Generative AI (GAI), like the one that powers ChatGPT, works in a similar way to autocomplete. GAI predicts the best next word based on the contextual data it has, which makes it capable of writing text that is highly plausible and readable, but not necessarily accurate. For an in-depth look at what ChatGPT is and its boundaries, we recommend reading this New Yorker article from computer science professor Cal Newport.
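To make the autocomplete analogy concrete, here is a toy sketch in Python. It is not how ChatGPT works internally (real systems use large neural networks trained on vastly more text); it only illustrates the idea of picking the most likely next word from patterns in the training data. The corpus and all function names are made up for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model is trained on vastly more text.
CORPUS = (
    "the candidate wrote a clear answer . "
    "the reviewer read a clear answer . "
    "the candidate wrote a detailed plan ."
)

def build_bigram_counts(text):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.split()
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def autocomplete(counts, start, max_words=5):
    """Greedily extend `start` one predicted word at a time."""
    words = [start]
    for _ in range(max_words):
        next_word = predict_next(counts, words[-1])
        if next_word is None or next_word == ".":
            break
        words.append(next_word)
    return " ".join(words)

COUNTS = build_bigram_counts(CORPUS)
print(autocomplete(COUNTS, "the"))
```

The output is a fluent sentence chosen purely because those words frequently followed one another in the training text. Nothing in the process checks whether the sentence is true, which is why generated text can read well without being accurate.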

Will Generative AI change the way people work and/or the way recruitment is done?

Generative AI is set to change the ways we think, work and hire. It’s likely to change how people access large quantities of data, perform repeated tasks, and improve their work.

There are already examples of how ChatGPT can be used to write resumes, cover letters and even perform well on various standardised assessments. The strength of AI is in synthesising data, as well as identifying and reproducing patterns, to which those assessment methods are sensitive.

The impact on Work Samples and Sift Questions is smaller because performing exceptionally well on these often requires creativity, critical thinking and a candidate’s specific understanding of how a role would work.

Much like the use of Google to search for information, we expect this to become the new normal: using GPT to set up a draft or a template response to various tasks, and then working to add content and a more personal approach to it. This is a great use of technology to improve efficiency and generate a good structure for a response.

Can people use ChatGPT to answer Work Samples?

Yes, they can, and as GPT technology becomes more widespread and remains freely available, we see this manifest in a few ways. Instead of conducting a Google search and looking at multiple sources of data, candidates can use GPT as a more efficient way to search for an answer.

This tends to lead to more generalised content, and a replication of the obvious points across candidates who simply copy-paste an answer, rather than critically thinking and adapting the answer.

After an initial top-level analysis, we see this manifest in 3 main ways:

- Generalised answers with broad content: well written but overly generalised, where answers tend to achieve middling scores. *(This varies by the type and content of question, with knowledge questions showing higher scores for AI than behaviour questions.)*
- Candidates for whom English is a second language using GPT for assistance with English phrasing, grammar and structuring an answer.
- In a minority of cases, for Work Samples that are knowledge or research based, we can see some well-scoring answers with a score of 4+. This is due to the efficiency of AI models in scanning and synthesising large quantities of existing data, which in many cases would be superior to a single person’s ability.

Before ChatGPT became available to the wider public, candidates who were unable, or not confident in their ability, to respond to a Work Sample question tended to self-select out of the process rather than put effort into an application. These candidates can now copy-paste an auto-generated answer, which reduces barriers to entry but requires more time from reviewers to sift them out.

At Applied, we require candidates to confirm that they are personally answering the questions in their application, including a recent amendment to confirm that the answer is not directly lifted from generative software like ChatGPT.

Are Work Samples still effective, given ChatGPT?

While automated answers can provide general responses to Work Sample questions, they may lack specificity and fail to address the particular details of the scenario. In contrast, the best candidates will demonstrate their ability to reference the actual scenarios mentioned in the Work Sample, showcasing their attention to detail and problem-solving skills. Great sift answers will show a degree of critical thinking beyond what is showcased in an answer simply copied and pasted into an application.

Instead of asking general questions like 'how to run a marketing campaign', it's better to ask specific questions that reference a previous campaign. Take a marketing role as an example:

Good: asking candidates to describe the steps they would take to design a marketing campaign.
- “Design a marketing campaign for retail consumer goods”
Better: asking candidates to describe the steps they would take to design a marketing campaign for a specific product or purpose and how they would measure success. An excellent answer would take into account different audiences and product types, which AI may not do without specific prompts, while mediocre candidates may struggle to provide detailed and relevant answers.
- “Imagine we are setting up a marketing campaign for the launch of a new line of cat-themed backpacks for teens. Can you describe the steps you’d take to design this campaign, what information you might look for, and how you will measure success?”
Even better: look at a specific past campaign you’ve run and ask the candidate to design that one. That would also make the review guide easier to write, as you’ll know what could go wrong and what approach you’d like to see.

A good answer might reference general tenets of campaign creation, while an excellent answer would look into how marketing for different audiences might differ (teens vs. adults vs. B2B), and how the marketing of the specific product (consumer facing backpacks) would be done as opposed to another type of product (e.g. software or electronics).

Creating specific Work Samples that reference the challenges faced on the job is key to identifying outstanding candidates, as it allows them to showcase their knowledge and problem-solving skills. An excellent answer would go into detail and specificity, which AI and mediocre candidates may struggle to achieve. This also leads to increased candidate satisfaction and a positive candidate experience when they can engage with challenges they may face on the job.

Will people be able to 'cheat' their way into a job?

Cheating on job applications has always been possible, and generative AI makes it easier to create a basic application using both traditional (CV, cover letter) and newer assessment methods. With the introduction of resilient assessments such as Work Samples looking for specificity and uniquely human abilities, the risk of a mis-hire is fairly low.

So far, with questions testing for specific skills and tasks rather than broad knowledge, we’ve seen a trend in which the top scoring candidates tend to be those with their own answers. The focus on Work Samples also helps to circumvent the risks of hiring on the basis of auto-generated cover letters, and the known risks of over-indexing based on credentials rather than an assessment of skill.

With a multi-layered approach to hiring, based on multiple assessments (e.g. sift + interviews) and not a single one, the risk of candidates being hired without actually possessing the required skills is even lower, especially where candidates are interviewed and asked about work scenarios in live settings.

Highlight in the job ad that live interviews about work scenarios will be part of the process. This dissuades candidates from faking their way through an initial sift, in exactly the same way that people understand a fake CV will not get them through an interview where they are questioned on specific details.

Are there benefits to ChatGPT that can help us with recruitment?

As ChatGPT is really an advanced form of Googling and synthesising content that can then be moulded to your liking, you can use it in a few ways to improve your hiring, from generating a basic template for a job description to crafting candidate communications or creating broad scenarios for interviewing.

We would not recommend using the output verbatim, because it will often need your own expertise to shape it. You can, however, ask ChatGPT to help generate an outline of a job description with some skills related to a role, and then adapt that to your specific organisation’s needs. You can also ask ChatGPT for assistance with generating some skeleton work scenarios, by feeding it a job description and then trying the following process to see if it generates helpful inspiration as a starting point:

- List 8 skills in bullet point that are needed to be great at this job
- Can you rewrite those skills using fewer than 32 words?
- Write 4 interview questions that assess each of the above listed skills without using ‘tell me a time when’ phrasing.
- Rewrite the above questions without asking about past experience
- Rewrite the above questions into a scenario to give the questions more context
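If you want to reuse this process across roles, the prompts above can be kept as an ordered list and sent one per turn in a single conversation. The Python sketch below only assembles the text of each turn and does not call any AI service. The function name and the "Here is a job description:" framing are our own illustrative assumptions, so adapt them as needed.

```python
# The five prompts from the list above, in the order they should be sent.
PROMPTS = [
    "List 8 skills in bullet point that are needed to be great at this job",
    "Can you rewrite those skills using fewer than 32 words?",
    "Write 4 interview questions that assess each of the above listed skills "
    "without using 'tell me a time when' phrasing.",
    "Rewrite the above questions without asking about past experience",
    "Rewrite the above questions into a scenario to give the questions more context",
]

def build_prompt_sequence(job_description):
    """Return the messages to send, one per turn, in a single conversation.

    The job description is attached to the first prompt only; later prompts
    rely on the conversation history ("those skills", "the above questions").
    The framing sentence is an assumption, not a required wording.
    """
    job_description = job_description.strip()
    if not job_description:
        raise ValueError("A job description is needed as context.")
    first_turn = (
        f"Here is a job description:\n\n{job_description}\n\n{PROMPTS[0]}"
    )
    return [first_turn] + PROMPTS[1:]

# Example usage with a made-up job description.
turns = build_prompt_sequence("Marketing Manager for a retail backpack brand.")
for number, turn in enumerate(turns, start=1):
    print(f"Turn {number}: {turn}\n")
```

Each later prompt refers back to the previous answer, so the turns need to be sent in order within the same conversation rather than as separate, unrelated requests.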

While technology can provide helpful guidance for creating work sample questions, specific industry or role expertise is still needed to create the most predictive assessments. One effective way to generate good Work Sample questions is to use past projects or challenges your team has faced as scenarios for candidates to respond to.

Given the above, Applied has put a 3-step plan into action

Control to hiring team

We give you the tools needed to create robust Work Samples and review guidelines for the hiring team, so that stand-out candidates can be identified above others; tools to handle larger volumes of applications (which may result from less self-selection by candidates); and best-practice support for layering multiple types of assessments.

Detect

There's an ongoing technical arms race to detect AI-generated content, with varying levels of success. Applied has tested numerous detection models and is keeping on top of new findings and research in the field. Language models continue to evolve faster than the tools that detect their use, and most detection methods we've assessed are rife with false positives.

We expect this to be ongoing work as GAI evolves and the technology rapidly changes. As we believe that the future of work will involve more common use of GAI in everyday life, with many positive effects, detection is unlikely to be our core focus moving forward.

In light of this, we’ve released functionality in the review flow to allow hiring teams to flag any answers they wish to look at again, whether to compare different answers to GPT-generated ones, compare between different answers, or look at the content in more detail in a non-anonymised way post review.

To help with answer comparison, we have introduced Referent AI Answers to help give examples of what a generated answer would look like. You can read more about this in our article here.

The flagging functionality allows hiring teams to flag answers throughout the review process, and then see an overall view of candidate scores alongside the number and type of flags attached to each of their sift answers. In this way, you can flag and remove candidates from the hiring flow early on, or flag and consider how to proceed with candidates whose answers resemble generative AI responses.

Dissuade

We've been incorporating behavioural science methods into the app to discourage candidates from copying answers not written by them, clarifying normative behaviour and improving compliance. This was done based on research into behavioural science across different realms, such as improving income reporting for tax, or preventing academic plagiarism.

Our research has looked at different behavioural science frameworks assessing compliance, plagiarism and ways to encourage normative behaviour. We know that compliance is a function of:

- The expected benefit of a given ‘cheating’ behaviour
- The probability of being detected
- The cost/penalty for being detected
- Social and psychological factors (e.g. self-perception of being honest/moral, what others do, feeling of fairness and reciprocity)
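One simple way to read these four factors together is as an expected-value trade-off: the benefit of cheating, minus the expected penalty (the probability of detection multiplied by the cost), minus a social and psychological cost. The Python sketch below is a deliberately simplified illustration of that idea and is not the model used in Applied's research. The function name, the arbitrary units and the example numbers are all assumptions.

```python
def net_incentive_to_cheat(benefit, p_detected, penalty, social_cost=0.0):
    """Illustrative expected-value reading of the four factors above.

    benefit      -- perceived gain from cheating (arbitrary units)
    p_detected   -- perceived probability of being detected, from 0 to 1
    penalty      -- perceived cost if detected (same units as benefit)
    social_cost  -- stand-in for social and psychological factors, such as
                    discomfort at acting against one's honest self-image

    A positive result means cheating looks attractive under this toy model;
    a negative result means compliance looks like the better option.
    """
    if not 0.0 <= p_detected <= 1.0:
        raise ValueError("p_detected must be between 0 and 1")
    return benefit - p_detected * penalty - social_cost

# Made-up numbers: low perceived detection risk, no honesty prompt.
before = net_incentive_to_cheat(benefit=10, p_detected=0.25, penalty=20)

# Same candidate after messaging that raises perceived detection risk
# and makes the honesty norm more salient.
after = net_incentive_to_cheat(
    benefit=10, p_detected=0.5, penalty=20, social_cost=3
)

print(before, after)
```

In this reading, messaging in the application flow does not need to change the actual penalty. It can shift behaviour by raising the perceived probability of detection or by making the social and psychological cost more salient.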

In looking at incorporating messaging into the application flow, our focus is on prompting honesty at key moments while maintaining optimal candidate experience. This requires assessing and testing two main components:

The location of the message: optimised for relevance and action, so not too early (before starting to apply altogether) and not too late (after the candidate has already responded to sift questions).

→ Messaging is shown within the application flow, before starting to respond to sift questions.

The content of the message: clarify the desired behaviour, make it relevant and salient to candidates, require active engagement, and make it personal.

→ We’ve tested messaging from different angles for efficacy, for example whether to emphasise fairness or the potential costs of cheating, and whether to use negative or positive framing.

Live in the candidate flow: honesty statement at the commencement of an application.
