Introducing Microsoft Dynamics 365 Sales Research Bench: The Standard for AI-Driven Sales Research

Microsoft has taken inspiration from innovative AI benchmarks such as TBFact and RadFact to create the Sales Research Bench, a tool designed to evaluate how effectively AI solutions tackle the business research questions that sales leaders have about their data.

For an in-depth look at the Sales Research Bench methodology and the architecture of the Sales Research Agent, be sure to check out this blog post.

This benchmark reflects the genuine experiences and priorities of customers. Through collaborations with sales teams across various industries worldwide, Microsoft has developed 200 authentic business questions that resonate with sales leaders. Additionally, they have pinpointed eight essential quality dimensions, including accuracy, relevance, clarity, and explainability. The evaluation data schema is tailored to mirror the complexities of customers’ enterprise environments, encompassing their intricate business logic and nuanced operational contexts.

To give you a clearer picture, here are three examples from their pool of 200 evaluation questions, shaped by inquiries from sales leaders:

  1. Which sellers have the most significant gap between Total Actual Sales and Estimated Value for the First Year in the ‘Corporate Offices’ Business Segment when examining closed opportunities? (See the query sketch after this list.)
  2. Are sales efforts directed toward particular industries, or are they distributed evenly across the board?
  3. Considering that headcount on paper is 30, how many individuals are genuinely in their roles and contributing to the pipeline?
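
To make concrete what these questions demand of an agent, here is a minimal sketch of the kind of query the first question implies. It is an illustration only: the table and column names (opportunities, actual_value, estimated_value_first_year, and so on) are hypothetical and do not reflect the actual Dynamics 365 Sales or benchmark schema.

```python
# Hypothetical sketch of the query behind question 1; schema names are illustrative.
import sqlite3  # stand-in for whatever data store the agent actually queries

QUERY = """
SELECT
    owner_name AS seller,
    SUM(actual_value) - SUM(estimated_value_first_year) AS gap
FROM opportunities
WHERE business_segment = 'Corporate Offices'
  AND status = 'Closed'
GROUP BY owner_name
ORDER BY ABS(SUM(actual_value) - SUM(estimated_value_first_year)) DESC
LIMIT 5;
"""

def top_gap_sellers(conn: sqlite3.Connection) -> list[tuple]:
    """Return sellers with the largest gap between actuals and first-year estimates."""
    return conn.execute(QUERY).fetchall()
```

Generating a query like this is only part of the task; the benchmark also scores how well the agent grounds, visualizes, and explains the result.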

The evaluation process uses large language model (LLM) judges that assess each AI solution’s responses, including both text and data visualizations, across the quality dimensions on a 100-point scale. Scores follow specific guidelines: for instance, a score of 100 might be given for a chart that is clear and well labeled, while a score of 20 could apply if the chart is difficult to read or misleading. These dimension-specific scores are then weighted to create a composite quality score, with the weights derived from qualitative customer feedback about what they prioritize most. The result is a thorough benchmark that provides a composite score alongside dimension-specific scores, highlighting areas where agents excel or require improvement.[1]
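
As a rough illustration of how a single dimension score might be produced, here is a minimal sketch of a custom LLM-judge call for a dimension such as chart clarity. The prompt wording, helper names, and text-only input are assumptions; the description above confirms only the 20–100 scale, the rubric idea, and the use of GPT-4.1 for the custom evaluators, and the real judges also score the charts themselves.

```python
# Minimal sketch of a custom LLM-judge evaluator for one quality dimension.
# Rubric wording and helper names are assumptions, not the published evaluators.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CHART_CLARITY_RUBRIC = (
    "Score the chart described in the answer for clarity on a 20-100 scale. "
    "100 = clear, well-labeled chart; 20 = hard to read or misleading. "
    "Reply with the integer score only."
)

def score_dimension(question: str, answer: str, rubric: str = CHART_CLARITY_RUBRIC) -> int:
    """Ask the judge model for a single dimension score between 20 and 100."""
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": rubric},
            {"role": "user", "content": f"Question:\n{question}\n\nAgent answer:\n{answer}"},
        ],
    )
    raw = response.choices[0].message.content.strip()
    return max(20, min(100, int(raw)))  # clamp to the benchmark's score range
```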

[1] Sales Research Bench employs Azure Foundry’s pre-built LLM evaluators for the Text Groundedness and Text Relevance dimensions, while the remaining six dimensions use custom LLM evaluators built on OpenAI’s GPT-4.1 model. Scores range from 100 (highest) to 20 (lowest). More detailed information on the benchmark methodology can be found here.

Running Sales Research Bench on AI Solutions

In the evaluation of AI solutions, Microsoft utilized the Sales Research Bench to assess the performance of the Sales Research Agent, ChatGPT by OpenAI, and Claude by Anthropic. Here is a summary of the approach:

  • License Selection: Microsoft tested ChatGPT using a Pro license with GPT-5 in Auto mode and Claude using a Max license. These licenses were selected to ensure optimal response quality. ChatGPT’s Pro plan is described as providing “full access to the best of ChatGPT,” while Claude’s Max plan is recommended for those looking to “get the most out of Claude.”[2] For ChatGPT, they ran the evaluation in Auto mode to allow the system to choose the most suitable model variant for each prompt.
  • Question Set: All agents were tasked with responding to the same set of 200 business questions (a minimal harness sketch follows this list).
  • Instructional Guidelines: Both ChatGPT and Claude received clear instructions to generate charts and articulate their reasoning for the answers they provided. These instructions align with the setup included in the Sales Research Agent by default.
  • Data Access: ChatGPT and Claude accessed a sample dataset hosted on an Azure SQL instance via the MCP SQL connector. In contrast, the Sales Research Agent connects to a sample dataset within Microsoft Dynamics 365 Sales without additional configuration.
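
Putting those pieces together, an evaluation harness could loop every agent over the shared question set and hand the collected answers to the LLM judges afterwards. The sketch below is hypothetical: the ResearchAgent interface and run_benchmark helper are invented names, and how each product was actually driven (for example, through the MCP SQL connector) is not detailed beyond the description above.

```python
# Hypothetical harness loop: same 200 questions for every agent, answers collected for judging.
from typing import Protocol


class ResearchAgent(Protocol):
    name: str

    def ask(self, question: str) -> str:
        """Return the agent's answer (text plus any chart description) for one question."""
        ...


def run_benchmark(agents: list[ResearchAgent], questions: list[str]) -> dict[str, list[str]]:
    """Collect every agent's answers to the full question set for later scoring."""
    return {agent.name: [agent.ask(q) for q in questions] for agent in agents}
```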

[2] For further details, including pricing information, please refer to the respective pricing pages of ChatGPT and Claude, accessed in October 2025.

Sales Research Agent Versus Competition

In head-to-head evaluations conducted in October 2025 using the Sales Research Bench framework, the Sales Research Agent came out on top, surpassing Claude Sonnet 4.5 by 13 points and ChatGPT (GPT-5) by 24.1 points on a 100-point scale.

Results: Testing was conducted in October 2025, using the Sales Research Bench methodology. The evaluation encompassed three key tools: Microsoft’s Sales Research Agent, which is part of Dynamics 365 Sales; ChatGPT by OpenAI, utilizing a ChatGPT Pro license with GPT-5 in Auto mode; and Claude Sonnet 4.5 from Anthropic, accessed through a Claude Max license.

Methodology and Evaluation Dimensions: The Sales Research Bench consists of 200 targeted business research questions tailored for sales leaders, all evaluated against a customized data schema. Each AI solution accessed a sample dataset through the mechanism suited to its architecture. The solutions were then assessed by LLM judges on their responses to each question, covering both textual answers and data visualizations.

To evaluate quality, Microsoft focused on eight key dimensions, assigning weights based on customer feedback about what they prioritize in AI tools for sales research: Text Groundedness (25%), Chart Groundedness (25%), Text Relevance (13%), Explainability (12%), Schema Accuracy (10%), Chart Relevance (5%), Chart Fit (5%), and Chart Clarity (5%). Each dimension was scored by LLM judges on a scale from 20 (poor) to 100 (excellent). For instance, a score of 100 for chart clarity indicates a well-designed, clearly labeled chart, while a score of 20 suggests the chart is difficult to read or misleading. For Text Groundedness and Text Relevance, they used Azure Foundry’s standard LLM evaluators, while the other six dimensions relied on OpenAI’s GPT-4.1 model with tailored guidance. The final composite score was derived as a weighted average of the eight dimension scores. For further details on this methodology, please refer to this blog.
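
For concreteness, here is a small sketch of that weighted-average computation using the published weights. The per-dimension scores passed in at the end are placeholders, not actual benchmark results.

```python
# Composite quality score as a weighted average of the eight dimension scores.
# Weights are the published ones; the example scores below are placeholders.
WEIGHTS = {
    "text_groundedness": 0.25,
    "chart_groundedness": 0.25,
    "text_relevance": 0.13,
    "explainability": 0.12,
    "schema_accuracy": 0.10,
    "chart_relevance": 0.05,
    "chart_fit": 0.05,
    "chart_clarity": 0.05,
}

def composite_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of dimension scores, each on the 20-100 scale."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must total 100%
    return sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS)

# Placeholder example, not real benchmark numbers: every dimension at 80 gives 80.0.
print(composite_score({d: 80.0 for d in WEIGHTS}))
```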

The Sales Research Agent Exceeded Expectations Across All Eight Quality Dimensions

The Future of Sales Research: Exploring Benchmark Investment

Looking ahead, we are excited about Microsoft’s plans for the Sales Research Bench. Their focus will be on using the benchmark for ongoing improvement of the Sales Research Agent, comparing against a broader set of competitive solutions, and, in the coming months, publishing the complete evaluation package. That package will include all 200 questions and a sample dataset, allowing others to replicate the results and assess their own agents against the benchmark. Microsoft treats evaluation as an ongoing process: by tracking scores across releases, domains, and datasets, they can drive meaningful quality improvements and ensure that AI evolves alongside your business needs.

Sales Research Bench is just the starting point, however. Microsoft plans to create evaluation frameworks and benchmarks for additional business functions and agent solutions, including areas like customer service, with the aim of setting a new standard for trust and transparency in enterprise AI.

Travis South – Marketing Specialist

Working with New Dynamic

New Dynamic is a Microsoft Solutions Partner focused on Dynamics 365 Customer Engagement and the Power Platform. Our team of dedicated professionals strives to provide first-class experiences built on integrity, teamwork, and a relentless commitment to our clients’ success.

Contact Us today to transform your sales productivity and customer buying experiences.

