# AI Recruiting Assistant — Guide Book (Updated)

## 0) Overview

### What this tool does

This AI Recruiting Assistant is a **decision-support** system that helps recruiters and hiring managers:

* Extract **job requirements** from a job description (JD)
* Evaluate resumes against **verified requirements** using **evidence-based** matching
* Assess job-relevant **culture/working-style signals** using retrieved company documents
* Run **factuality checks** to detect ungrounded claims
* Run a **bias & fairness audit** across the JD, analyses, and the model’s final recommendation

### The problem it addresses

Recruiting teams often face three recurring issues when using AI:

1. **Hallucinated requirements**: LLMs may “invent” skills that are not explicitly required.
2. **Opaque scoring**: Many tools produce fit scores without clearly showing evidence.
3. **Bias risks**: Hiring language and reasoning can leak pedigree/class proxies or subjective criteria.

This tool addresses those issues by enforcing:

* **Deterministic verification gates** (requirements are verified before scoring)
* **Evidence-backed scoring** (only verified requirements are scored; each match includes a quote)
* **Self-verification and self-correction** (factuality checks can trigger automatic revision)
* **Bias auditing** (flags risky language and inconsistent standards)

### How it differentiates from typical recruiting tools

Compared with “black-box” resume screeners or generic LLM chatbots, this system emphasizes:

* **Transparency**: Outputs include *what was required*, *what was verified*, *what was dropped*, and *why*.
* **Auditability**: The scoring math is deterministic and traceable to inputs.
* **Self-verifying behavior**: Claims are checked against source text; unverified claims can be removed.
* **Bias checks by design**: Bias-sensitive content is audited explicitly instead of implicitly influencing scores.
* **Culture check that’s job-performance aligned**: Culture attributes are framed as job-relevant behaviors, not background proxies.

---

## 1) Inputs and Document Handling

### 1.1 What the user uploads

The tool operates on three inputs:

1. **Company culture / values documents** (PDF/DOCX)
2. **Resumes** (PDF/DOCX)
3. **Job description** (pasted text)

### 1.2 Resume anonymization

Before resumes are stored or analyzed, the tool applies heuristic redaction:

* Emails, phone numbers, URLs
* Addresses / location identifiers
* Explicit demographic fields
* Likely name header (first line)

This reduces exposure of personal identifiers and keeps analysis focused on job evidence.

### 1.3 Vector stores (retrieval)

The tool maintains two separate Chroma collections:

* **Resumes** (anonymized + chunked)
* **Culture docs** (chunked)

Chunking uses a recursive splitter with overlap to preserve context.

---

## 2) End-to-End Logic Flow (Step-by-Step)

Below is the stepwise flow executed when a recruiter clicks **Analyze Candidates**.

### Step 0 — Prerequisite: Documents exist in storage

* Culture docs and resumes must be stored first.
* If not stored, retrieval will be empty or low-signal.

### Step 1 — Extract required skills from the Job Description (LLM-driven)

**Goal:** Identify only skills that are explicitly required.

* The tool prompts the LLM to return **JSON only**:
  * `required_skills: [{skill, evidence_quote}]`
* The LLM is instructed to:
  * include only **MUST HAVE** / explicitly required skills
  * exclude “nice-to-haves” and implied skills
  * copy a short **verbatim quote** as evidence

**LLM role:** structured extraction.
**Failure behavior:** If JSON parsing fails, the tool stops and prints the raw output.

### Step 2 — Verify extracted skills against the JD (deterministic, Python)

**Goal:** Block hallucinated requirements from entering scoring.
Each extracted item is classified:

* **Quote-verified (strong):** the evidence quote appears verbatim in the JD
* **Name-only (weak):** the skill name appears in the JD, but the quote doesn’t match
* **Unverified (dropped):** neither quote nor name appears

**Deterministic gate:**

* Only **quote-verified** skills are used as the final required list for scoring.
* Name-only and dropped skills are reported for transparency.

**Output:** The “Requirements Verification” section shows:

* extracted count
* quote-verified vs name-only vs dropped
* list of skills used for scoring
* list of retracted/dropped items (with reason)

### Step 3 — Retrieve the most relevant culture chunks (deterministic retrieval)

**Goal:** Ground culture evaluation in actual company documents.

* The tool runs similarity search over culture docs using the JD as query.
* It selects the top **k** chunks (e.g., k=3).

**Deterministic component:** vector retrieval parameters.
**Output artifact:** `culture_context` is the concatenated text of retrieved culture chunks.

### Step 4 — Generate job-performance culture attributes (LLM-driven)

**Goal:** Create a small set of job-relevant behavioral attributes to evaluate consistently.

* The tool prompts the LLM to return JSON:
  * `cultural_attributes: ["...", "..."]` (4–6 items)

**Attribute rules:**

* Must be job-performance aligned behaviors (e.g., “evidence-based decision making”).
* Must avoid pedigree / class / prestige language.
* Must avoid non-performance preferences (e.g., remote-first, time zone).

**LLM role:** label generation from retrieved culture context.

### Step 5 — Retrieve top resume chunks for the JD (deterministic retrieval)

**Goal:** Identify the most relevant candidates and their relevant resume text.

* The tool runs similarity search over resumes using the JD.
* It retrieves top **k** chunks (e.g., k=10) and groups them by `resume_id`.

**Note:** Only retrieved chunks are analyzed. If relevant evidence isn’t retrieved, it may be missed.
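The Step 2 verification gate reduces to plain substring checks over the JD text. A minimal sketch of that classification, assuming the `required_skills` JSON contract from Step 1 (function and variable names here are illustrative, not the tool’s actual code):

```python
def verify_extracted_skills(extracted, jd_text):
    """Classify LLM-extracted requirements against the raw JD text.

    `extracted` is a list of {"skill": ..., "evidence_quote": ...} dicts,
    mirroring the required_skills JSON contract described in Step 1.
    """
    jd_lower = jd_text.lower()
    quote_verified, name_only, dropped = [], [], []
    for item in extracted:
        skill = item.get("skill", "").strip()
        quote = item.get("evidence_quote", "").strip()
        if quote and quote.lower() in jd_lower:
            quote_verified.append(item)   # strong: quote appears verbatim in the JD
        elif skill and skill.lower() in jd_lower:
            name_only.append(item)        # weak: name matches, quote does not
        else:
            dropped.append(item)          # unverified: blocked from scoring
    # Only the quote-verified list feeds the scoring steps (Step 7).
    return quote_verified, name_only, dropped
```

Only the first returned list enters scoring; the other two are surfaced in the verification report for transparency.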
### Step 6 — Culture evidence matching per candidate (LLM + deterministic cleanup + deterministic scoring)

**Goal:** Determine which culture attributes are supported by resume evidence.

**LLM-driven matching:**

* For each attribute, the LLM may return a match with:
  * `evidence_type`: `direct` or `inferred`
  * `evidence_quotes`: 1–2 verbatim resume quotes
  * `inference`: required for inferred matches
  * `confidence`: 1–5

**Deterministic cleanup rules (Python):** A match is kept only if:

* the attribute is present
* `evidence_type` is `direct` or `inferred`
* at least one non-trivial quote exists
* `confidence` is an integer from 1–5
* inferred matches include an inference sentence
* inferred matches can be required to meet a minimum confidence

**Deterministic culture scoring (Python):**

* Direct evidence weight: **1.0**
* Inferred evidence weight: **0.5**

The culture score is computed as:

* `(sum(weights for matched attributes) / number_of_required_attributes) * 100`

### Step 7 — Skills matching per candidate (LLM + deterministic scoring)

**Goal:** Match only the verified required skills to resume evidence.

**Inputs:**

* Candidate resume text (retrieved chunks)
* Verified required skills list (quote-verified only)

**LLM output (JSON):**

* `matched: [{skill, evidence_snippet}]`
* `missing: [skill]` (treated as advisory; missing is recomputed deterministically)

**Deterministic missing calculation (Python):**

* `missing = required_set − matched_set`

**Deterministic skills scoring (Python):**

* `(number_of_matched_required_skills / number_of_required_skills) * 100`

### Step 8 — Implied competencies (NOT SCORED) for phone-screen guidance (LLM-driven, advisory)

**Goal:** When a required skill is missing explicitly, suggest whether it may be **implied** by adjacent evidence.

* This step is **not scored** and does not affect proceed/do-not-proceed.
* The LLM may suggest implied competencies only if it:
  * uses conservative language (“may be implied”)
  * includes **verbatim resume quotes**
  * provides a **phone-screen validation question**

**Hard guardrail:** Tool-specific skills (e.g., R/SAS/MATLAB) must be explicitly present in the resume to be suggested.

### Step 9 — Factuality verification (LLM-driven verifier)

**Goal:** Detect ungrounded evidence claims.

* The verifier checks evidence-backed match lines (e.g., `- Skill: snippet`).
* It ignores:
  * numeric score lines
  * missing lists
  * policy text

**Outputs:**

* verified claims (✓)
* unverified claims (✗)
* factuality score

### Step 10 — Final recommendation (LLM, policy-constrained)

**Goal:** Produce a structured recommendation without changing scores.

* The model is given:
  * skills analysis
  * culture analysis
  * fixed computed scores
  * deterministic decision policy

**Decision policy:**

* If skills_score ≥ 70 → PROCEED
* If skills_score < 60 → DO NOT PROCEED
* If 60 ≤ skills_score < 70 → PROCEED only if culture_score ≥ 70, else DO NOT PROCEED

**Non-negotiables:**

* The LLM must not re-score.
* The LLM must not introduce new claims.

### Step 11 — Self-correction (triggered by verification issues)

**Goal:** Remove or correct any unverified claims while preserving scores and policy.

* If any unverified claims exist:
  * The tool asks the LLM to revise the recommendation.
  * Only the flagged claims may be removed or corrected.
  * Scores and policy must remain unchanged.

### Step 12 — Bias audit (LLM-driven audit across docs + reasoning)

**Goal:** Flag biased reasoning, biased JD language, or inconsistent standards.
**Audit scope includes:**

* Job description
* Skills analysis
* Culture analysis
* Final recommendation text
* Culture context

**What it flags (examples):**

* Prestige/pedigree signals (elite employers/education as proxy)
* Vague “polish/executive presence” language not tied to job requirements
* Non-job-related culture screening
* Inconsistent standards (penalizing requirements not in the JD)
* Overclaiming certainty

**Outputs:**

* structured list of bias indicators (category, severity, trigger text, why it matters, recommended fix)
* recruiter guidance

---

## 3) Scoring and Decision Rules (Deterministic)

### 3.1 Skills score

* Only quote-verified required skills count.
* Score = matches / required.

### 3.2 Culture score

* Score = weighted matches / attributes.
* Direct = 1.0; inferred = 0.5.

### 3.3 Labels

* ≥70: Strong fit
* 50–69: Moderate fit
* <50: Not a fit

### 3.4 Recommendation

The recommendation follows the fixed policy described in Step 10.

---

## 4) System Flow Diagram (Textual)

Below is a simplified, end-to-end flow of how data moves through the system.
```
[User Uploads]
  |
  |-- Culture Documents -----> [Culture Vector DB]
  |-- Resume Documents ------> [Resume Vector DB]
  |-- Job Description
         |
         v
  [Skill Extraction]             (LLM, JSON output)
         |
         v
  [Requirement Verification]     (deterministic)
         |
         v
  Verified Required Skills
         |
         v
  Similarity Search over Resume Vector DB (k=10)
         |
         v
  Resume Chunks (grouped by resume_id)
         |
         v
  [Culture Attribute Generator]  (LLM, JSON output; uses Culture Vector DB)
         |
         v
  [Culture Evidence Matching]    (LLM + rules + weights)
         |
         v
  Culture Score (deterministic)
         |
         v
  [Technical Skill Matching]     (LLM + deterministic scoring)
         |
         v
  Skills Score (deterministic)
         |
         v
  [Implied Competencies]         (LLM, not scored, advisory)
         |
         v
  [Factuality Verification]      (LLM verifier)
         |
         v
  [Recommendation Generator]     (policy-constrained LLM)
         |
         v
  [Bias & Fairness Audit]        (LLM audit)
         |
         v
  Final Recruiter Report
```

---

## 5) Audit Artifacts and Traceability

For every analysis run, the system produces and retains multiple audit artifacts that enable post-hoc review, regulatory defensibility, and debugging.

### 5.1 Input Artifacts

1. **Original Job Description**
   * Full pasted JD text
2. **Sanitized Resume Text**
   * Redacted resume content
   * Redaction summary (internal)
3. **Retrieved Culture Chunks**
   * Top-k (default: 3) culture document segments
   * Vector similarity scores (internal)
4. **Retrieved Resume Chunks**
   * Top-k (default: 10) resume segments
   * Resume ID metadata

---

### 5.2 Requirement Verification Artifacts

1. **Raw LLM Skill Extraction Output**
2. **Parsed Required Skills JSON**
3. **Verification Classification Table**
   * Quote-verified
   * Name-only
   * Dropped
4. **Dropped-Skill Justifications**

---

### 5.3 Culture Analysis Artifacts

1. **Generated Culture Attribute List**
2. **LLM Raw Matching Output**
3. **Cleaned Match Records**
   * Evidence type
   * Quotes
   * Inference
   * Confidence
4. **Weighted Match Table**
5. **Computed Culture Score**

---

### 5.4 Skills Analysis Artifacts

1. **Verified Required Skill List**
2. **LLM Raw Matching Output**
3. **Accepted Matched Skills**
4. **Deterministic Missing-Skill Set**
5. **Computed Skills Score**

---

### 5.5 Implied Competency Artifacts (Advisory)

1. **Missing Skill List**
2. **LLM Implied Output (JSON)**
3. **Accepted Implied Records**
   * Resume quotes
   * Explanation
   * Phone-screen questions
4. **Rejected Inferences (internal)**

---

### 5.6 Verification and Correction Artifacts

1. **Verifier Prompt and Output**
2. **Verified / Unverified Claim Lists**
3. **Factuality Scores**
4. **Self-Correction Prompts and Revisions (if triggered)**

---

### 5.7 Recommendation and Policy Artifacts

1. **Final Recommendation Prompt**
2. **Policy Threshold Snapshot**
3. **Immutable Score Values**
4. **Generated Recommendation Text**

---

### 5.8 Bias Audit Artifacts

1. **Bias Audit Prompt**
2. **Audit Input Bundle (JD + Analyses + Recommendation)**
3. **Structured Bias Indicator List**
4. **Severity and Mitigation Suggestions**
5. **Recruiter Guidance Text**

---

### 5.9 System Metadata

1. Timestamp of run
2. Model version
3. Prompt versions
4. Chunking parameters
5. Retrieval k-values
6. Scoring parameters

---

## 6) Known Limitations

1. **Retrieval scope**: evaluation depends on retrieved chunks; some evidence may be missed.
2. **Attribute generation variance**: culture attributes can vary per run unless cached or cataloged.
3. **LLM evidence overreach**: mitigated by verification and cleanup, but not eliminated.
4. **Bias audit is advisory**: it flags issues; it does not enforce policy changes unless you add an auto-rewrite step.

---

## 7) Governance and Change Control

* Prompt changes must preserve JSON contracts.
* Any change that affects scoring or policy should be versioned.
* Audit outputs should be retained for traceability.

---

## 8) Intended Use

This tool is built for:

* faster, evidence-based screening
* transparent reasoning
* safer use of LLMs via verification and audits

It is not a substitute for:

* human judgment
* legal review
* formal HR policy compliance

---

### High-level pipeline (inputs → outputs)

**Inputs uploaded by recruiter**

1. Company culture/values docs (PDF/DOCX)
2. Resumes (PDF/DOCX)
3. Job description (text)

⬇️ **Indexing (deterministic, Python)**

* Culture docs → chunk + embed → `culture_store`
* Resumes → anonymize → chunk + embed → `resume_store`

⬇️ **Candidate assessment (per JD run)**

1. **Extract required skills (LLM)** → JSON `required_skills[{skill,evidence_quote}]`
2. **Verify extracted skills (Python)** → quote-verified / name-only / dropped → *quote-only list used for scoring*
3. **Retrieve relevant culture context (deterministic retrieval)**
   * Query: JD
   * Retrieve: top-k culture chunks (**current: k=3**)
   * Output: `culture_context`
4. **Generate job-relevant culture attributes (LLM)** → JSON `cultural_attributes[4–6]`
5. **Retrieve relevant resume chunks (deterministic retrieval)**
   * Query: JD
   * Retrieve: top-k resume chunks (**current: k=10**)
   * Group by `resume_id`
6. **Per candidate: culture matching (LLM → cleanup → deterministic score)**
   * LLM proposes matches (direct/inferred) + quotes
   * Python enforces validity gates
   * Deterministic weighted culture score (direct=1.0, inferred=0.5)
7. **Per candidate: skills matching (LLM → deterministic score)**
   * LLM proposes matched skills + evidence snippets
   * Python recomputes missing list deterministically
   * Deterministic skills score using quote-verified requirements only
8. **Per candidate: implied competencies (LLM, NOT SCORED)**
   * Inputs: missing skills + matched skills + resume + JD
   * Output: implied items with quotes + phone-screen questions
   * Guardrail: tool-like skills (R/SAS/MATLAB) require explicit mention
9. **Factuality verification (LLM verifier)** → ✓/✗ for evidence-backed match lines + factuality score
10. **Recommendation (LLM, policy constrained)** → uses fixed scores + fixed decision policy
11. **Self-correction (conditional)** → triggered if any unverified claims exist
12. **Bias audit (LLM)** → audits JD + analyses + recommendation → structured bias indicators + guidance

⬇️ **Outputs per candidate**

* Requirements verification summary (global)
* Culture analysis + score
* Skills analysis + score
* Implied (not scored) follow-ups
* Fact-check results
* Final recommendation (+ revision note if corrected)
* Bias audit

---

### Component map (LLM vs deterministic)

**LLM-driven components**

* Required skill extraction (JSON)
* Culture attribute generation (JSON)
* Culture match proposals (JSON)
* Skills match proposals (JSON)
* Implied (not scored) follow-ups (JSON)
* Factuality verification (✓/✗)
* Final recommendation (policy constrained)
* Bias audit (structured)

**Deterministic / Python-enforced components**

* Resume anonymization
* Chunking + embedding + storage
* Retrieval parameters (top-k)
* Required-skill verification (quote/name-only/dropped)
* Deduplication of requirements
* Culture match cleanup rules (validity gates)
* Skills missing list recomputation
* Skills score computation
* Culture score computation with weights
* Decision thresholds (proceed / do not proceed)
* Self-correction trigger (presence of unverified claims)

---

## Audit Artifacts

This section lists the primary artifacts produced (or recommended to persist) to make runs reviewable and defensible.

### Inputs (source-of-truth)

* Job description text (as provided)
* Culture documents (original files)
* Resumes (original files)

### Pre-processing

* Sanitized resume text (post-anonymization)
* Redaction notes (what was removed/masked)
* Chunking configuration (chunk_size, chunk_overlap)
* Embedding configuration (embedding model + settings)

### Retrieval

* Culture retrieval query: JD text
* Culture retrieved chunks: top-k (**current: k=3**)
* Resume retrieval query: JD text
* Resume retrieved chunks: top-k (**current: k=10**)
* Candidate grouping: chunks grouped by `resume_id`

### Requirements verification

* LLM `required_skills` JSON (raw)
* Normalized required skill list (deduped)
* Verification output:
  * quote-verified list
  * name-only list
  * dropped/unverified list
  * counts and verification summary
* Final scoring-required list: quote-verified only

### Per-candidate analyses

**Culture analysis**

* Raw LLM culture-match JSON
* Post-cleanup matched culture list
* Missing culture attributes list
* Culture score + label
* Culture evidence lines shown to recruiters

**Skills analysis**

* Raw LLM skills-match JSON
* Matched skills list (with evidence snippets)
* Deterministically computed missing skills list
* Skills score + label

**Implied (NOT SCORED)**

* Raw LLM implied JSON
* Filtered implied list (must include resume quotes + phone-screen questions)

### Verification & correction

* Verifier raw output (✓/✗ lines)
* Verified claims list
* Unverified claims list
* Factuality score
* Self-correction trigger status (yes/no)
* Corrected recommendation (if triggered) + revision note

### Bias audit

* Bias audit raw output (structured)
* Bias indicators list (category, severity, trigger_text, why_it_matters, recommended_fix)
* Overall assessment
* Recruiter guidance

### Run-level trace (recommended)

For reproducibility/governance, also persist:

* Timestamp, model name, temperature, seed
* Prompt versions (hash or version ID)
* Retrieval parameters (k values)
* Score thresholds and policy version
* Any configuration overrides used during the run

## End-to-End Pipeline (Swim-Lane View)

| Step | Recruiter / Input | Python / Deterministic Logic | LLM (Groq) | Storage / Output |
|------|-------------------|------------------------------|------------|------------------|
| 1 | Upload culture documents | Chunk + embed | — | `culture_store` (indexed) |
| 2 | Upload resumes | Anonymize → chunk → embed | — | `resume_store` (indexed) |
| 3 | Paste JD + Run | Send JD to LLM | Extract required skills + evidence quotes | `required_skills` JSON |
| 4 | — | Verify requirements (quote / name-only / dropped) | — | Verified list + debug report |
| 5 | — | Retrieve culture context (k=3) | — | `culture_context` |
| 6 | — | — | Generate culture attributes (job-performance aligned) | `cultural_attributes` JSON |
| 7 | — | Retrieve resume chunks (k=10), group by `resume_id` | — | Candidate chunks |
| 8 | — | — | Propose culture matches (direct/inferred + quotes) | Raw culture-match JSON |
| 9 | — | Cleanup + weighted scoring (direct=1.0, inferred=0.5) | — | Culture score + evidence |
| 10 | — | — | Propose skill matches + evidence snippets | Raw skills-match JSON |
| 11 | — | Compute missing list + skills score (verified reqs only) | — | Skills score + missing list |
| 12 | — | — | Infer implied skills (NOT SCORED) + phone questions | Implied follow-ups |
| 13 | — | — | Verify evidence (✓/✗) | Factuality report |
| 14 | — | — | Generate recommendation (policy constrained) | Final recommendation |
| 15 | — | Trigger self-correction (if needed) | Revise flagged claims only | Corrected recommendation |
| 16 | — | — | Run bias audit (JD + analyses + decision) | Bias indicators + guidance |
| 17 | Review output | Assemble final report | — | Full candidate report |

### Current Retrieval Parameters

- Culture store: `k = 3` chunks (JD query)
- Resume store: `k = 10` chunks (JD query)
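The deterministic scoring formulas (Sections 3.1–3.2) and the fixed decision policy (Step 10) fit in a few lines of Python. A minimal sketch, with illustrative names rather than the tool’s actual code:

```python
DIRECT_WEIGHT = 1.0      # direct culture evidence
INFERRED_WEIGHT = 0.5    # inferred culture evidence

def skills_score(matched, required):
    """Matched verified skills / total verified required skills, as a percentage."""
    if not required:
        return 0.0
    matched_required = set(matched) & set(required)   # missing = required − matched
    return 100.0 * len(matched_required) / len(required)

def culture_score(matches, num_attributes):
    """Weighted culture matches / required attributes, as a percentage.

    `matches` maps attribute -> evidence_type ("direct" or "inferred").
    """
    if not num_attributes:
        return 0.0
    weights = {"direct": DIRECT_WEIGHT, "inferred": INFERRED_WEIGHT}
    total = sum(weights[etype] for etype in matches.values())
    return 100.0 * total / num_attributes

def decide(skills, culture):
    """Fixed decision policy from Step 10; the LLM may not override this."""
    if skills >= 70:
        return "PROCEED"
    if skills < 60:
        return "DO NOT PROCEED"
    return "PROCEED" if culture >= 70 else "DO NOT PROCEED"
```

Because these values are computed outside the LLM, the recommendation step can be handed fixed numbers it is not permitted to change.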