At Applied, we want to partner with you to build high-quality questions with the help of data.
That's why you'll notice different statistics about questions when you're building a role, exploring your library of questions, or checking out the reports section of a particular role.
Statistics on the Library of Questions
All the statistics that you see on the library of questions correspond to a 0-10 score. The closer a score is to 10, the better the quality of the question on that dimension.
We've created a traffic light system to make it easy for you to interpret the score given to a question on a particular dimension. Below is a summary of each dimension and the thresholds that make it a green, yellow or red label.
Dimension | What does it capture? | New or N/A | Red | Yellow | Green
--- | --- | --- | --- | --- | ---
Maturity | It captures how many jobs have used the question and how many answers the question has received. The more mature the question, the better. | The question hasn't received any answers, therefore we cannot show you any scores. | N/A | From 0.0 to 7.4 | From 7.5 to 10
Agreement | For each answer, it captures the similarity of the scores among reviewers. The overall agreement score aggregates these answer-level agreement levels. The more agreement, the less subjective the question is. | | From 0.0 to 4.0 | From 4.1 to 6.5 | From 6.6 to 10
Spread | It captures how much a question separates candidates through the average scores they get. The more spread in average scores, the better. | | From 0.0 to 5.9 | From 6.0 to 7.4 | From 7.5 to 10
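If you'd like to see how these thresholds translate a 0-10 score into a label, here's a minimal sketch. It's purely illustrative: Applied computes the labels for you, and the dictionary and function below exist only for this example.

```python
# Illustrative sketch only: maps a 0-10 dimension score to a traffic light
# label using the thresholds from the table above.

# Lower bounds for each band, taken from the table above.
THRESHOLDS = {
    "maturity":  {"yellow": 0.0, "green": 7.5},             # no red band for maturity
    "agreement": {"red": 0.0, "yellow": 4.1, "green": 6.6},
    "spread":    {"red": 0.0, "yellow": 6.0, "green": 7.5},
}

def traffic_light(dimension: str, score: float | None) -> str:
    """Return the traffic light label for a 0-10 score on a given dimension."""
    if score is None:
        return "new / N/A"          # e.g. a question that hasn't received any answers yet
    bands = THRESHOLDS[dimension]
    if score >= bands["green"]:
        return "green"
    if score >= bands["yellow"]:
        return "yellow"
    return "red" if "red" in bands else "N/A"

print(traffic_light("agreement", 5.2))   # -> "yellow"
print(traffic_light("maturity", None))   # -> "new / N/A"
```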
Statistics on each role's report section
Average score
After your hiring team has reviewed all the answers to a job, you'll see the overall average score that candidates got. The higher the average score, the less likely it is that the question has helped you separate the field.
Adverse impact
This metric captures whether the average scores of sociodemographic groups (e.g. women, men and non-binary candidates) are statistically different or not. It helps you check whether a question has inadvertently affected a particular group of candidates.
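Applied runs this check for you and doesn't publish the exact test it uses, but the idea is to ask whether group averages differ by more than chance would explain. The sketch below uses a one-way ANOVA as one common way to do that; the group names and scores are made up.

```python
# Hedged sketch: compare average scores across sociodemographic groups.
# This is NOT Applied's actual implementation, just an illustration of the concept.
from scipy import stats

scores_by_group = {
    "women":      [3.8, 4.1, 3.5, 4.0, 3.9],
    "men":        [3.7, 3.9, 4.2, 3.6, 4.0],
    "non-binary": [3.9, 4.0, 3.8, 4.1, 3.7],
}

f_stat, p_value = stats.f_oneway(*scores_by_group.values())
if p_value < 0.05:
    print("Group averages differ significantly - worth reviewing this question.")
else:
    print("No statistically significant difference between group averages.")
```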
Subjectivity
This metric captures the differences in the scores given by reviewers. It's the inverse of the agreement score shown in the library of questions. A very subjective question might indicate that you and your team need to agree on what good looks like. The good news is that the wisdom of the crowd helps average out these differences among reviewers.
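As a rough mental model (not Applied's actual formula), you can think of subjectivity as how far apart reviewers' scores sit on each answer, averaged across answers. The scores below are invented for illustration.

```python
# Illustrative only: one plausible way to quantify reviewer disagreement.
# For each answer, measure how spread out the reviewers' scores are
# (standard deviation), then average across answers.
from statistics import pstdev, mean

reviewer_scores_per_answer = [
    [4, 4, 5],   # reviewers broadly agree
    [1, 3, 5],   # reviewers disagree a lot
    [2, 2, 3],
]

disagreement = mean(pstdev(scores) for scores in reviewer_scores_per_answer)
print(f"Average per-answer disagreement: {disagreement:.2f}")

# The crowd average still smooths out individual differences:
print(mean([1, 3, 5]))  # -> 3
```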
Review distribution
The chart shows the overall distribution of scores (i.e. how many answers got a 1, 2, ..., 5), as well as the distribution of scores given by each reviewer. When a reviewer differs a lot from the rest, it might be worth asking what criteria they used to score the answers. With that information you can update the question's review guides.
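To make the chart concrete, here's a small sketch of the counts behind it: how many answers got each score overall, and per reviewer. The reviewer names and scores are invented for illustration.

```python
# Sketch of what the review distribution chart summarises:
# score counts overall and per reviewer.
from collections import Counter, defaultdict

reviews = [  # (reviewer, score) pairs, one per reviewed answer
    ("Alice", 3), ("Alice", 4), ("Alice", 4),
    ("Bob",   2), ("Bob",   3), ("Bob",   3),
    ("Cara",  5), ("Cara",  5), ("Cara",  4),
]

overall = Counter(score for _, score in reviews)
per_reviewer = defaultdict(Counter)
for reviewer, score in reviews:
    per_reviewer[reviewer][score] += 1

print("Overall:", dict(sorted(overall.items())))
for reviewer, counts in per_reviewer.items():
    print(f"{reviewer}: {dict(sorted(counts.items()))}")

# A reviewer whose counts sit far from the overall shape (e.g. Cara scoring
# mostly 5s) is the one worth talking to about their scoring criteria.
```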
We're always open to feedback! Building these statistics is an ongoing and fascinating process at Applied, so we'd love to hear about anything that could improve your experience when interpreting and using them.