How We Score a Reddit Thread for Pain and Willingness to Pay

01 Why score at all

Reading Reddit by hand gives you stories, and stories don’t rank — you remember the most dramatic thread, not the most common problem. To decide anything, you need every thread reduced to the same handful of comparable fields.

So every thread the pipeline keeps is sent to a language model with one job: read the post and its comments and return a fixed set of structured fields. No free-form essays — a schema.

02 The two numbers that carry the most weight

pain_signal is a 0–100 score for how acute the frustration is. A casual “would be nice” sits low; “I waste an hour on this every single day” sits high.

wtp_tier buckets willingness to pay into high, medium, low, or none — based on whether people describe paying for a fix, asking for one, or merely grumbling. Those two fields are what the report sorts on.
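As a rough illustration of sorting on those two fields, the sketch below ranks threads by tier first and pain_signal second. The source does not say which field takes precedence or how tiers map to numbers, so the WTP_RANK mapping, the tie-break order, and the sample rows are all assumptions made for the example.

```python
# Assumed numeric order for the four tiers; the article names the tiers
# but does not define a ranking, so this mapping is hypothetical.
WTP_RANK = {"high": 3, "medium": 2, "low": 1, "none": 0}


def rank_threads(threads):
    """Sort by willingness-to-pay tier, then by pain_signal, both descending."""
    return sorted(
        threads,
        key=lambda t: (WTP_RANK[t["wtp_tier"]], t["pain_signal"]),
        reverse=True,
    )


# Illustrative rows only, not real scoring output.
threads = [
    {"id": "a", "pain_signal": 85, "wtp_tier": "low"},
    {"id": "b", "pain_signal": 40, "wtp_tier": "high"},
    {"id": "c", "pain_signal": 70, "wtp_tier": "high"},
    {"id": "d", "pain_signal": 10, "wtp_tier": "none"},
]

ranked_ids = [t["id"] for t in rank_threads(threads)]
print(ranked_ids)  # ['c', 'b', 'a', 'd']
```

Putting the tier first means a loud complaint with no buying intent (thread "a") still ranks below a quieter thread where people describe paying for a fix. Swapping the tuple order would favour raw frustration instead.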

03 The full field set

Beyond those two, each thread also carries:

tools_mentioned[] — the products named in the discussion
sentiment_toward_tools — positive / negative / mixed / neutral, per tool
primary_use_case — market research, lead gen, brand monitoring, content ideation, or other
relevance_score — 0–10 match between the thread and your claim
key_quotes, best_quote_from_OP, best_quote_from_top_reply — verbatim, with source links
summary — a one-line plain-English gist
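One way to picture the field set is as a typed record that rejects values outside the allowed ranges and vocabularies. The sketch below is an assumption about shape, not the pipeline's actual schema: the ThreadScore class name, the Python types, and the default values are hypothetical, and source links for quotes are left out to keep it short.

```python
from dataclasses import dataclass, field

# Vocabularies taken from the field list above.
WTP_TIERS = ("high", "medium", "low", "none")
SENTIMENTS = ("positive", "negative", "mixed", "neutral")
USE_CASES = ("market research", "lead gen", "brand monitoring",
             "content ideation", "other")


@dataclass
class ThreadScore:
    """Hypothetical container for one scored thread."""
    pain_signal: int
    wtp_tier: str
    relevance_score: float
    primary_use_case: str
    summary: str
    tools_mentioned: list = field(default_factory=list)
    sentiment_toward_tools: dict = field(default_factory=dict)  # tool -> sentiment
    key_quotes: list = field(default_factory=list)
    best_quote_from_OP: str = ""
    best_quote_from_top_reply: str = ""

    def __post_init__(self):
        # Reject anything outside the fixed ranges and enums.
        if not 0 <= self.pain_signal <= 100:
            raise ValueError("pain_signal must be between 0 and 100")
        if not 0 <= self.relevance_score <= 10:
            raise ValueError("relevance_score must be between 0 and 10")
        if self.wtp_tier not in WTP_TIERS:
            raise ValueError(f"unknown wtp_tier: {self.wtp_tier!r}")
        if self.primary_use_case not in USE_CASES:
            raise ValueError(f"unknown primary_use_case: {self.primary_use_case!r}")
        for tool, sentiment in self.sentiment_toward_tools.items():
            if sentiment not in SENTIMENTS:
                raise ValueError(f"unknown sentiment for {tool}: {sentiment!r}")


# Illustrative values only.
example = ThreadScore(
    pain_signal=72,
    wtp_tier="medium",
    relevance_score=8,
    primary_use_case="brand monitoring",
    summary="Users want keyword alerts with less noise.",
    tools_mentioned=["F5Bot"],
    sentiment_toward_tools={"F5Bot": "mixed"},
)
```

Validating at construction time means a model response that drifts outside the vocabulary fails loudly instead of quietly adding a fifth tier to the report.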

04 Why a fixed vocabulary beats free text

The temptation is to let the model write whatever it notices. The problem shows up at aggregation: a free-form “top advice” field comes back as a unique sentence every time, so nothing tallies.

Constraining fields to a fixed enum — high/medium/low/none, positive/negative/mixed/neutral — is what lets the report say “34% medium willingness to pay” instead of listing 300 one-off opinions. Even tool names get case-normalised before counting, so “F5Bot”, “F5bot”, and “f5bot” don’t split into three.
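The aggregation this enables can be sketched in a few lines. The function names tier_shares and tool_counts are hypothetical, the sample rows are invented for illustration, and counting each tool at most once per thread is an assumption the source does not confirm.

```python
from collections import Counter


def tier_shares(threads):
    """Percentage of threads in each wtp_tier, rounded to whole numbers."""
    total = len(threads)
    if total == 0:
        return {}
    counts = Counter(t["wtp_tier"] for t in threads)
    return {tier: round(100 * n / total) for tier, n in counts.items()}


def tool_counts(threads):
    """Count threads mentioning each tool, after case-normalising the names."""
    counts = Counter()
    for t in threads:
        # A set keeps one thread from counting the same tool twice.
        counts.update({name.strip().lower() for name in t["tools_mentioned"]})
    return counts


# Illustrative rows only.
threads = [
    {"wtp_tier": "medium", "tools_mentioned": ["F5Bot"]},
    {"wtp_tier": "medium", "tools_mentioned": ["F5bot"]},
    {"wtp_tier": "high", "tools_mentioned": ["f5bot", "F5Bot"]},
    {"wtp_tier": "none", "tools_mentioned": []},
]

shares = tier_shares(threads)
tools = tool_counts(threads)
print(shares)        # {'medium': 50, 'high': 25, 'none': 25}
print(dict(tools))   # {'f5bot': 3}
```

With a free-text field in place of wtp_tier, the Counter would hold one entry per thread and the percentages would be meaningless.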

05 The cost lever is depth, not breadth

Scoring is the only paid step in a run, and it stays inexpensive. The main lever is depth: pulling more comments per thread gives the model richer quotes to work with, at a slightly higher cost per run.

A top-N cap lets you decide exactly how many threads get the full treatment, so cost is a dial you set rather than a surprise you get.
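The two dials can be sketched as a selection step that runs before any paid scoring. Everything named here is hypothetical: the source does not say how the top N threads are chosen, so the prefilter_score field, the select_for_scoring and scoring_workload functions, and the sample data are assumptions. The sketch reports workload (threads and comments sent to the model), not prices, since the article gives no cost figures.

```python
def select_for_scoring(threads, top_n, max_comments):
    """Keep the top_n threads by an assumed pre-score, trimmed to max_comments each."""
    ranked = sorted(threads, key=lambda t: t["prefilter_score"], reverse=True)
    return [
        {**t, "comments": t["comments"][:max_comments]}
        for t in ranked[:top_n]
    ]


def scoring_workload(selected):
    """Summarise what would be sent to the model: thread count and comment count."""
    return {
        "threads": len(selected),
        "comments": sum(len(t["comments"]) for t in selected),
    }


# Illustrative rows only.
threads = [
    {"id": "x", "prefilter_score": 0.9, "comments": ["c1", "c2", "c3", "c4", "c5"]},
    {"id": "y", "prefilter_score": 0.2, "comments": ["c1", "c2", "c3"]},
    {"id": "z", "prefilter_score": 0.7, "comments": ["c1"]},
]

selected = select_for_scoring(threads, top_n=2, max_comments=2)
workload = scoring_workload(selected)
print([t["id"] for t in selected])  # ['x', 'z']
print(workload)                     # {'threads': 2, 'comments': 3}
```

Raising max_comments increases the comment total without changing the thread count, which is the depth dial. Raising top_n adds whole threads.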

See it end to end

Scoring is one stage of a six-step run, from hypothesis to report.


Frequently asked questions

A language model reads the post and its comments and sorts the thread into one wtp_tier — high, medium, low, or none. The signal is what people actually say: describing paying for a fix, asking someone to build one, or just grumbling. It is a judgement call, so every tier links back to the source thread for you to check.

pain_signal is a 0–100 score for how acute the frustration in a thread is. A casual “would be nice” sits low; “I waste an hour on this every single day” sits high. Together with the willingness-to-pay tier it is what the report sorts on, turning a pile of anecdotes into rows you can rank and compare.

Treat the scoring as a fast first pass, not a verdict. A language model does the scoring, and it can misread tone, sarcasm, or context and land a thread in the wrong tier. That is exactly why every field (pain_signal, the tier, the quotes) ships with a link to the original thread, so you can read it yourself before you act on it.

Each kept thread is reduced to the same fixed fields — a 0–100 pain signal, a willingness-to-pay tier, a 0–10 relevance score, and a few enum tags. Because the vocabulary is fixed rather than free text, the numbers aggregate and the report can sort threads against each other instead of leaving you with 300 one-off opinions.
