MIDS Capstone Project Spring 2025

MIRRA: Matching Intelligence for Resume to Role Alignment

Problem & Motivation

Current online job platforms such as LinkedIn and Indeed present job seekers with an overwhelming number of listings, often in the thousands for a single search, without providing meaningful insight into which roles are most relevant to their background. The process remains largely manual, requiring candidates to read through lengthy job descriptions whose requirements may not align with their qualifications. This inefficiency contributes to the prolonged timeline many face when searching for a new role, with the average job search in the United States taking approximately six months. According to the U.S. Bureau of Labor Statistics’ (BLS) Employment Situation Summary published in March 2025, this affects an estimated 13 million job seekers, both employed and unemployed.

While artificial intelligence has been integrated into many of these platforms, its current use is limited. Most systems provide case-by-case assessments without offering a clear or consistent metric that quantifies how well a candidate matches a given role. We propose a more scalable and interpretable solution that uses natural language processing, along with structured resume and job description information, to produce a quantifiable match score. This would enable job seekers to quickly assess their fit for a role and streamline the path to meaningful employment. Figure 1 below shows LinkedIn’s interface, which presents a single detailed job description alongside a long and potentially overwhelming list of job postings. The interface exemplifies the common challenge of sifting through a vast number of opportunities, one job at a time, with no clear indication of which roles best match the candidate’s skills and background. MIRRA addresses these issues head-on by automating the matching process and assigning each position a quantifiable score across multiple relevant dimensions, thereby helping candidates quickly prioritize the opportunities that truly fit their qualifications.

Figure 1: Screenshot of LinkedIn’s search interface for “Top job picks for you”, highlighting a detailed job listing among a long list of postings.

Our Solution

MIRRA leverages advanced feature extraction and semantic matching techniques to align candidate qualifications with the detailed requirements of job listings. Our system performs comprehensive feature extraction of both resumes and job descriptions, enabling precise comparisons across dimensions such as education, credentials, and professional experience. We provide a user-facing interface powered by Streamlit, where candidates can upload a PDF of their resume and receive personalized job matches based on how well their background aligns with specific role requirements.

Data Source & Data Science Approach

Data Source

Our dataset comprises 10,000 randomly selected job descriptions and resumes sourced from Dice.com, with a focus on roles in technology-related fields. Each job listing includes unstructured text from the body of the posting, along with structured metadata such as the job title, posting date (year, month, and day), location, and a URL linking to the original listing.

The resume data consists solely of unstructured text extracted from candidate resumes. To ensure privacy and compliance with data protection standards, all resumes are anonymized to remove personally identifiable information (PII) prior to any processing or analysis. These datasets form the basis for developing and evaluating our resume-to-job matching algorithms using natural language processing techniques.

Feature Extraction Model

At the core of MIRRA is a unified feature extraction module designed to enable structured comparison between unstructured job descriptions and resumes. This system leverages prompt-engineered large language models to convert free-form text into structured JSON representations, capturing semantic attributes such as requirements for education, credentials, and the candidate’s domain-specific professional experience. To operationalize this capability at scale, we employ a teacher–student distillation framework that fine-tunes a parameter-efficient LLaMA 3.1 model using QLoRA. The student model is trained to simulate the extraction behavior of a high-capacity teacher model (GPT-4o), preserving extraction quality while enabling low-latency inference and reduced computational overhead. This architecture provides a solid foundation for scalable, high-resolution resume–job matching in real-world settings.

Figure 2: MIRRA’s distillation pipeline
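
To make the distillation step concrete, here is a minimal QLoRA setup sketch, assuming the Hugging Face transformers, peft, and bitsandbytes stack; the base model name and hyperparameters are illustrative choices, not MIRRA’s exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 4-bit quantized weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # illustrative base model
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters; only these small matrices are trained.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# The student is then trained on (raw text, teacher JSON) pairs produced
# by GPT-4o, so it learns to reproduce the teacher's extraction behavior.
```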

Prompt Engineering to Create Labeled Dataset

Since we did not have a labeled dataset of extracted requirements, we used GPT-4o as our teacher model to create the labeled dataset for the distilled model. For both job descriptions and resumes, the extraction logic was divided into three sequential prompts, with the final output being structured JSON. An overview of the pipeline logic is outlined below:

Figure 3: Prompt Engineering Pipeline
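
To illustrate one stage of this pipeline, the sketch below shows a single extraction call to the teacher model using the OpenAI Python SDK; the prompt text is a condensed stand-in for the actual three sequential prompts.

```python
from openai import OpenAI

client = OpenAI()

# Condensed stand-in for one of the three sequential extraction prompts.
EXTRACT_PROMPT = """Extract the qualifications from the job description below.
Return JSON with top-level keys "mandatory", "preferred", "responsibility",
and "details". Within qualifications, group items into "hard_skills",
"professional_background", "education", and "credentials".

Job description:
{job_text}
"""

def extract_requirements(job_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # force valid JSON output
        messages=[{"role": "user",
                   "content": EXTRACT_PROMPT.format(job_text=job_text)}],
    )
    return response.choices[0].message.content
```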

A high-level overview of the key features of the structured JSON output schemas is outlined below, followed by an illustrative example:

Top-level objects (Job Description Only):

1. mandatory: Job qualifications that are mandatory or required.

2. preferred: Job qualifications that are preferred or nice-to-have.

3. responsibility: The listed responsibilities and duties that contain hard skills.

4. details: Details about the job including (but not limited to) wage, locations, benefits, work-from-home policy, and employment type.

Qualification Categories:

- hard_skills: Specific, measurable, and technical abilities that are acquired through education, training, or hands-on experience. Includes years of experience.

- professional_background: Job titles, roles, and industry experience. Includes years of experience.

- education: Formal degrees and fields of study.

- credentials: Professional certifications, licenses, and security clearances.
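
As an illustration, a hypothetical extraction for a job description might look like the following; the field values are invented, and the nested lists encode the AND/OR grouping described in the matching section below.

```json
{
  "mandatory": {
    "hard_skills": [["Python"], ["Java"], ["R"]],
    "professional_background": [["data engineering", "5 years"]],
    "education": [["Bachelor's degree", "Computer Science"]],
    "credentials": [["AWS Certified Solutions Architect"]]
  },
  "preferred": {
    "hard_skills": [["Spark", "3 years"]]
  },
  "responsibility": [["implement", "interactive dashboards"]],
  "details": {
    "employment_type": "full-time",
    "location": "Remote (US)"
  }
}
```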

Multidimensional Semantic Matching for Candidate-Job Alignment

MIRRA’s matching algorithm performs a fine-grained, multi-dimensional alignment between candidate resumes and job descriptions by computing structured similarity scores across key dimensions: skills, education, professional background, credentials, and responsibilities. The system begins by extracting relevant free-text elements from the structured JSON input, preserving their nesting and logical grouping: elements that appear together within a list are treated conjunctively (logical AND), while elements in separate lists are treated disjunctively (logical OR). These extracted strings are transformed into dense vector embeddings using a pre-trained sentence embedding model, multilingual-e5-large-instruct. Since our application is limited to English-language content, cosine similarity values are rescaled to mitigate the variance introduced by multilingual training.
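
A minimal sketch of the embedding and rescaling step, assuming the sentence-transformers library; the instruction text and the rescaling bounds are illustrative assumptions, not MIRRA’s exact values.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/multilingual-e5-large-instruct")

def embed(texts, task="Match a resume qualification to a job requirement"):
    # e5-instruct models expect queries to carry a task instruction prefix.
    prompts = [f"Instruct: {task}\nQuery: {t}" for t in texts]
    return model.encode(prompts, normalize_embeddings=True)

def rescale(cos_sim: float, lo: float = 0.7, hi: float = 1.0) -> float:
    # Multilingual e5 models concentrate cosine similarity in a narrow high
    # band; a linear rescale spreads English-only scores across [0, 1].
    return max(0.0, min(1.0, (cos_sim - lo) / (hi - lo)))

req = embed(["5+ years of experience with AWS"])
cand = embed(["six years building infrastructure on Amazon Web Services"])
score = rescale(float(util.cos_sim(req, cand)[0][0]))
```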

A central feature of MIRRA’s architecture is its logic-aware matching pipeline, which semantically and structurally interprets natural language requirements. For example, a phrase like “implementation of interactive dashboards” is encoded conjunctively as [["implement", "interactive dashboards"]], while an OR condition such as “experience with Python, Java, or R” is encoded as multiple disjoint options: [["Python"], ["Java"], ["R"]]. The scoring logic distinguishes between these structures, averaging similarity scores within conjunctive groups and taking the maximum across disjunctive alternatives—thus mimicking the compositional logic embedded in human language.

Figure 4: Parsing Structure
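
The scoring logic itself reduces to a few lines. Below is a minimal sketch under the assumption that each requirement arrives as a list of lists (inner lists are AND groups, the outer list holds OR alternatives), with similarity() standing in for the embedding comparison above.

```python
def score_requirement(requirement, candidate_items, similarity):
    """Score one requirement, e.g. [["Python"], ["Java"], ["R"]] (OR) or
    [["implement", "interactive dashboards"]] (AND)."""
    option_scores = []
    for and_group in requirement:
        # Average within a conjunctive group: every element should align.
        group_score = sum(
            max(similarity(term, item) for item in candidate_items)
            for term in and_group
        ) / len(and_group)
        option_scores.append(group_score)
    # Take the maximum across disjunctive alternatives: any one suffices.
    return max(option_scores)
```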

In contrast to conventional similarity methods, MIRRA provides interpretability by explicitly identifying both met and unmet requirements. For instance, if a job requires both Python and Java, and a candidate has experience with Python and C, the system surfaces the match with Python while clearly indicating the absence of Java.

Education matching in MIRRA combines semantic similarity with hierarchical level reasoning and an intelligent experience-based fallback. First, the algorithm maps education levels (e.g., Bachelor’s, Master’s, PhD) to ordinal ranks using a custom education level ranking dictionary. When a job specifies a minimum education level, the candidate’s degree is first evaluated against this threshold. If the degree level meets or exceeds the requirement, the algorithm evaluates semantic similarity between the candidate’s declared major and the required fields of study (e.g., “Finance” vs. “Statistical Computing”).

However, job descriptions frequently include conditional requirements like “Master’s degree or equivalent experience.” In such cases, MIRRA activates an experience-based proxy. As part of the feature extraction step, we infer a field of study for each job role listed in the candidate’s resume to gauge domain relevance (e.g., a role titled Data Engineer might imply a background in Computer Science or Information Systems). These inferred academic domains are used to semantically compare a candidate’s actual work history against the required academic field, weighted by the number of years of experience in each role. If sufficient alignment is found, this experiential path contributes to the education score, reflecting the candidate’s practical equivalence to a formal degree.
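
A hedged sketch of both paths, assuming a hand-built ranking dictionary and a roles list of (inferred field, years) pairs produced by the extraction step; names and thresholds are illustrative.

```python
EDUCATION_RANK = {"high school": 0, "associate": 1, "bachelor": 2,
                  "master": 3, "phd": 4}  # illustrative ranking dictionary

def education_score(required_level, required_fields,
                    candidate_level, candidate_major, similarity):
    # Gate on the minimum education level before comparing fields of study.
    if EDUCATION_RANK[candidate_level] < EDUCATION_RANK[required_level]:
        return 0.0  # in MIRRA, the experience-based fallback applies here
    # Semantic similarity between the declared major and any accepted field.
    return max(similarity(candidate_major, field) for field in required_fields)

def experience_fallback(required_fields, roles, similarity):
    # roles: [(inferred_field_of_study, years_in_role), ...]
    total_years = sum(years for _, years in roles)
    # Years-weighted similarity of inferred domains to the required fields.
    return sum(
        max(similarity(field, rf) for rf in required_fields) * years
        for field, years in roles
    ) / total_years
```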

Skill matching incorporates similar logic, with an emphasis on experience relevance. For skill requirements with minimum experience thresholds (e.g., 5+ years of AWS), MIRRA sorts the candidate’s relevant skills by similarity and accumulates years until the year requirement is met, assigning greater weight to highly similar experiences.
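
A sketch of this accumulation logic, with candidate skills represented as (text, years) pairs; the weighting scheme is an illustrative reading of the description above.

```python
def skill_score(required_skill, required_years, candidate_skills, similarity):
    # candidate_skills: [(skill_text, years_of_experience), ...]
    ranked = sorted(candidate_skills,
                    key=lambda s: similarity(required_skill, s[0]),
                    reverse=True)
    accumulated, weighted = 0.0, 0.0
    for skill, years in ranked:
        if accumulated >= required_years:
            break
        # Count only the years still needed, weighted by match quality.
        usable = min(years, required_years - accumulated)
        weighted += usable * similarity(required_skill, skill)
        accumulated += usable
    # Fraction of the requirement covered, discounted by similarity.
    return weighted / required_years
```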

Finally, scores across dimensions are aggregated using safe averaging, which excludes undefined criteria from the job descriptions. This ensures candidates are not penalized for missing criteria when the job does not specify a requirement in that category. Through this modular, logic-aware, and semantically rich scoring architecture, MIRRA provides robust, interpretable match scores aligned with real-world hiring expectations.
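
Safe averaging itself is a small step; a minimal sketch, assuming unspecified dimensions are represented as None:

```python
def safe_average(dimension_scores):
    # Dimensions the job never specified drop out of the average entirely,
    # so candidates are not penalized for them.
    defined = [s for s in dimension_scores.values() if s is not None]
    return sum(defined) / len(defined) if defined else 0.0

overall = safe_average({"skills": 0.82, "education": 0.90,
                        "credentials": None,  # job listed no credentials
                        "background": 0.75, "responsibilities": 0.68})
```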

Evaluation

For evaluation, we used the LLM-as-a-judge approach with human review, using OpenAI’s o1 reasoning model as the evaluator. The LLM is given a set of five criteria categories with instructions on how to judge them, using a relevance score as the evaluation metric. The five categories are: Hard Skills, Professional Experience, Education, Credentials, and Job Fit (relevance of seniority level and responsibilities). If a candidate’s resume meets at least 70% of the qualifications in a category, one point is added, for a maximum total score of 5 points. Due to the lack of formal research in this area, the 70% threshold was based on the recommendations of various online articles.
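
For concreteness, the rubric reduces to a simple tally; the sketch below assumes the judge returns the fraction of qualifications met per category (the field names are hypothetical).

```python
CATEGORIES = ["hard_skills", "professional_experience",
              "education", "credentials", "job_fit"]

def rubric_score(fractions_met, threshold=0.70):
    # One point per category where at least 70% of qualifications are met,
    # for a maximum total of 5 points.
    return sum(1 for c in CATEGORIES
               if fractions_met.get(c, 0.0) >= threshold)
```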

For our evaluation data, we pulled all job postings from the last 7 days (April 1-7, 2025), totaling roughly 37,000. We then had the LLM judge evaluate the top 10 results (no filtering) from both Dice and MIRRA for 10 candidate resumes (5 randomly selected, 5 member-provided) across 10 different job titles. This evaluation process aligns with our goal of streamlining job selection by reducing the clutter of irrelevant jobs.

As an additional qualitative analysis, we used the LinkedIn profiles and resumes of 4 team members to assess the relevance score of the top 10 results for the same positions, using the same methodology. Below is an overview of the results of both evaluations:

Figure 5: MIRRA vs. Dice Evaluation Results

Figure 6: MIRRA vs. LinkedIn Evaluation Results

As shown in the charts, we see an average improvement in relevance score of 55% over Dice and 17% over LinkedIn. When analyzing the errors, we found that in the majority of evaluated cases MIRRA was actually correct and o1 mistakenly penalized the results. In many such cases, o1 would consider responsibilities and duties when requirements were sparse, resulting in a penalty for qualifications unrelated to the stated requirements. We also saw that Job Fit was one of the lowest performers, reinforcing that filters are needed to reduce overqualification.

In future evaluations, we would want to use more robust evaluation prompts, potentially with one prompt per criterion. Another addition would be using multiple LLMs as judges to help reduce any biases contained in a specific model. We would also like to use a larger sample size, as only 10 candidates were evaluated due to the lengthy and manual evaluation process.

Architecture & Deployment

Our solution integrates cutting-edge technology to streamline resume extraction and job search matching. It combines Streamlit for the user interface, AWS SageMaker for hosting the embedding model (intfloat/multilingual-e5-large-instruct), and Pinecone as the vector database for efficient retrieval and matching. The MIRRA distilled model (distilled from GPT-4o) runs separately, not hosted on AWS, to extract structured information from resumes and job postings.

Figure 7: System Architecture

Here’s the workflow:

  1. Streamlit Frontend: Users upload resumes and job postings through an intuitive interface.
  2. Resume and Job Posting Extraction: MIRRA distilled model processes the uploaded data externally, extracting structured fields like skills and experiences from unstructured text.
  3. SageMaker Embedding Generation: The embedding model hosted on SageMaker (intfloat/multilingual-e5-large-instruct) generates vector representations of the extracted data (see the sketch after this list).
  4. Custom Matching Algorithm: A proprietary matching algorithm leverages the semantic similarity of embeddings to match resumes with job postings.
  5. Pinecone Database: Pinecone is used for efficient vector storage and retrieval, complementing the custom matching process.
  6. Results Display: Matches are ranked and displayed on Streamlit for actionable insights.
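
Here is a hedged sketch of steps 3 and 5: invoking the SageMaker-hosted embedding model and querying Pinecone for nearby job postings. The endpoint name, index name, and response payload shape are hypothetical placeholders.

```python
import json
import boto3
from pinecone import Pinecone

runtime = boto3.client("sagemaker-runtime")

def embed_resume(text: str) -> list[float]:
    # Step 3: generate an embedding via the SageMaker endpoint.
    response = runtime.invoke_endpoint(
        EndpointName="e5-large-instruct-endpoint",  # hypothetical name
        ContentType="application/json",
        Body=json.dumps({"inputs": text}),
    )
    return json.loads(response["Body"].read())["embedding"]  # assumed shape

pc = Pinecone(api_key="...")       # credentials elided
index = pc.Index("job-postings")   # hypothetical index name

def top_matches(resume_text: str, k: int = 10):
    # Step 5: retrieve candidate postings; the custom matching algorithm
    # then re-scores these results dimension by dimension.
    vector = embed_resume(resume_text)
    return index.query(vector=vector, top_k=k, include_metadata=True)
```
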
This modular design ensures scalability, precision, and seamless integration, providing users with a powerful tool for job matching.

Key Learning and Impact

One critical finding from developing MIRRA is understanding the limitations of semantic similarity and explicit matching criteria in job recommendations. While our current system effectively identifies roles for which candidates clearly meet the specific qualifications, it may also surface positions that lack explicit requirements, which then appear on our results page as highly viable opportunities. For example, an experienced data scientist might see entry-level job postings ranked highly if the role’s description does not explicitly state experience requirements. MIRRA has been built to emphasize qualification matching rather than implicit fit or seniority level, resulting in cases where candidates may be overqualified. However, to address this gap, candidates can use the provided filtering options, such as salary or experience level.

Additionally, our experience suggests the importance of using embedding models that are fine-tuned on job-related language and skill taxonomies. General-purpose large language models often struggle with the nuanced distinctions in domain-specific terminology. By further training an embedding model on large amounts of text rich in job skills and career-specific semantics, we believe we can significantly improve the precision of similarity scores and the overall quality of the matching process.

We also find that although the three prompts used to extract and create the training data worked sufficiently well, splitting the logic across additional prompts would likely improve the quality of the labels and therefore yield more accurate extractions.

As we continue to refine MIRRA, our goal is to empower job seekers with smarter, more transparent artificial intelligence matching tools while focusing on bridging the gap between opportunities and fit, and helping candidates focus their time on roles that truly align with their aspirations.

Last updated: April 15, 2025