Investing on the Edge: Navigating the Promise and Peril of AI Reasoning Models
Reasoning Models in Action: The Illusion of Thinking AI
Welcome to Silicon Sands News—the go-to newsletter for investors, senior executives, and founders navigating the intersection of AI, deep tech, and innovation. Join ~35,000 industry leaders across all 50 U.S. states and 113 countries—including top VCs from Sequoia Capital, Andreessen Horowitz (a16z), Accel, NEA, Bessemer Venture Partners, Khosla Ventures, and Kleiner Perkins.
Our readership also includes decision-makers from Apple, Amazon, NVIDIA, and OpenAI, some of the most innovative companies shaping the future of technology. Subscribe to stay ahead of the trends defining the next wave of disruption in AI, enterprise software, and beyond.
This week, we will examine reasoning models, including their challenges, applications, and use cases.
Let's Dive Into It...
Last week, Apple researchers dropped a bombshell that sent ripples through the AI community and tech markets. Their study, titled "The Illusion of Thinking," revealed fundamental limitations in the latest generation of AI systems known as reasoning models. These models—including OpenAI's o1 and o3, Claude 3.7 Sonnet with "Thinking" mode, and IBM Granite 3.2—have been heralded as a significant leap forward in artificial intelligence, capable of "thinking through" complex problems step by step before providing answers.
The findings were stark: these reasoning models experience "complete accuracy collapse" beyond certain complexity thresholds. More troublingly, they exhibit a counterintuitive scaling limit—their reasoning effort increases with problem complexity up to a point, then declines despite having adequate computational resources. In other words, they think less when problems become more challenging.
This research comes at a critical juncture for the market. Organizations across various industries are rapidly deploying reasoning models for high-stakes decisions, often without a thorough understanding of their limitations. Venture capital continues to flow into reasoning AI startups, with over $12 billion invested in the past 18 months. The promise is compelling: AI systems that not only generate outputs but also explain their thinking, evaluate alternatives, and solve complex problems with human-like reasoning. But the reality is more nuanced for investors and business leaders alike.
As you navigate this landscape, understanding both the capabilities and fundamental limitations of reasoning models isn't just academic—it's essential for investment decisions, product strategy, and responsible deployment. This article examines the market reality behind the hype, explores real-world business implications across critical industries, and provides strategic guidance for capitalizing on this technology while mitigating its risks.
Key Takeaways
For Investors:
The reasoning model market represents a significant but nuanced investment opportunity, with clear winners emerging among companies that focus on medium-complexity use cases rather than general-purpose reasoning.
Companies like Manus, Wand, and Writer that have developed specialized approaches to overcome reasoning limitations represent more promising investment targets than those pursuing general-purpose reasoning capabilities.
Early movers in specialized reasoning domains are showing strong returns, with specialized AI implementations demonstrating measurable improvements in accuracy and efficiency across multiple industries.
The most valuable investment opportunities combine domain-specific expertise with innovative reasoning architectures, creating defensible market positions with sustainable unit economics.
For Founders:
Building reasoning capabilities into your product requires a strategic focus on specific complexity tiers, rather than a general-purpose reasoning approach that attempts to address all use cases.
Successful reasoning AI companies are pursuing three distinct innovation paths: context-aware architectures, hybrid symbolic-neural integration, and domain-specific frameworks, each addressing different aspects of the fundamental limitations.
Open-source strategies offer significant advantages in terms of community adoption, talent acquisition, and ecosystem development, with improved consistency in reasoning across similar cases.
Energy efficiency is emerging as a key competitive differentiator, with the most successful startups optimizing for computational efficiency rather than raw reasoning power.
For Senior Executives:
A tiered implementation strategy matching model type to task complexity offers the most effective deployment approach, with standard LLMs for simple tasks and human oversight for critical decisions.
Implementation strategies from market innovators demonstrate that reasoning consistency and reliability in specific domains deliver more business value than impressive but inconsistent performance on general benchmarks.
Healthcare organizations implementing domain-specific reasoning models report significant accuracy improvements in complex diagnostic scenarios, while maintaining appropriate human oversight.
Strategic competitive advantage comes from knowing when and how to use reasoning models rather than wholesale adoption across all business functions.
The Market Reality Behind the Hype
Reasoning models represent a specialized class of Large Language Models (LLMs) designed to generate detailed "thinking processes" before providing answers. While standard LLMs, such as earlier versions of GPT or Claude, focus on generating outputs based on patterns in their training data, reasoning models are fine-tuned to produce extended chains of thought that simulate human-like reasoning.
The primary innovation lies in allocating more compute at inference time (when generating outputs) rather than only at training time. This approach, known as inference scaling, trains models to produce longer, more elaborate "thought processes" before arriving at conclusions.
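To make inference scaling concrete, here is a minimal, self-contained sketch of the allocation decision. All names and budget figures are invented for illustration; this is not a real vendor API:

```python
# A sketch of inference scaling: spend more "thinking" tokens on harder
# problems. Budgets and names are hypothetical illustrations.

THINKING_BUDGETS = {"low": 0, "medium": 2048, "high": 8192}  # max thinking tokens per tier

def answer(prompt: str, complexity: str) -> dict:
    """Record how much inference-time compute we would allocate to a request."""
    budget = THINKING_BUDGETS[complexity]
    # In a real system, `budget` would cap the model's chain-of-thought tokens
    # before it must commit to a final answer; here we just record the decision.
    return {"prompt": prompt, "thinking_token_cap": budget}

print(answer("Summarize this memo.", "low"))            # no extended thinking
print(answer("Plan a multi-stage rollout.", "medium"))  # moderate thinking budget
```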
However, Apple's research identified three distinct performance regimes that challenge the value proposition of these models and have significant implications for both investors and business leaders. First, for low-complexity tasks, standard LLMs surprisingly outperform reasoning models, showing both higher accuracy and lower token consumption. The additional "thinking" not only fails to improve outcomes but actively wastes computational resources—a critical consideration for startups and enterprises managing AI costs.
Second, for medium-complexity tasks, reasoning models do shine, demonstrating advantages over standard LLMs, but at significantly higher computational costs. The question becomes whether the marginal improvements justify the substantial increase in resource usage, a calculation that varies dramatically by industry and use case.
Third, for high-complexity tasks, both model types experience a complete accuracy collapse; however, reasoning models paradoxically reduce their reasoning effort despite having adequate token budgets. They essentially "give up" on complex problems, cutting their reasoning short—a limitation that creates significant risks for investors backing companies targeting complex reasoning applications.
This pattern reveals a fundamental limitation for the market: current reasoning models do not develop general strategies for problem-solving. Even with mechanisms like self-reflection and extended thought paths, they fail to maintain performance as tasks grow more complex. This creates a natural segmentation in the market, with different companies likely to emerge as leaders in each complexity tier.
More concerning for investors and founders is the finding that these models don't engage in actual formal reasoning but instead rely on advanced pattern matching. Their performance drops sharply when presented with irrelevant information or minor changes to the problem structure—precisely the kind of variations that are abundant in real-world business scenarios.
Overthinking Healthcare Outcomes
Healthcare organizations have been early adopters of reasoning models, creating a rapidly growing market segment with both established players and startups competing for market share. The promise is compelling: reducing administrative burdens for overworked clinicians while improving diagnostic accuracy and patient outcomes, which in turn translates to significant cost savings and revenue opportunities.
Several major healthcare systems have implemented reasoning models to assist clinicians with complex diagnostic decisions, particularly in radiology and pathology. These systems analyze patient data, medical images, and clinical notes to suggest potential diagnoses, ostensibly with transparent reasoning processes that clinicians can review.
The market reality reveals both significant challenges and promising breakthroughs. As highlighted in Canada's 2025 Watch List on AI in Healthcare, conventional reasoning models demonstrate inconsistent performance across different patient populations and disease presentations. The "complete accuracy collapse" identified by Apple researchers manifests in healthcare as diagnostic systems that fail when faced with complex, multi-factor conditions, precisely when sophisticated reasoning is most needed.
A particularly troubling pattern emerges in diagnostic applications: overthinking simple cases while underestimating the complexity of others. For routine presentations, models often find the correct diagnosis early but continue generating unnecessary alternatives, potentially introducing confusion. Conversely, for complex cases with multiple interacting factors, models paradoxically reduce their reasoning depth, cutting short the analytical process when it's most critical.
Despite these challenges, healthcare organizations are finding success with more specialized approaches. Mayo Clinic researchers have developed an AI model that demonstrates 92% mean accuracy in detecting pancreatic cancer in CT scans, with the ability to identify cancer approximately 475 days before clinical diagnosis. Their approach combines specialized AI models with human oversight, creating a complementary system that leverages the strengths of AI while maintaining clinical judgment.
Similarly, the Cleveland Clinic has reported significant efficiency gains through its AI-powered virtual triage system, which has achieved 94% accuracy in diagnosing patient needs. Their system connects emergency doctors with patients across multiple facilities and has achieved 83% patient satisfaction rates. This approach enables patients to receive the proper care promptly while optimizing healthcare resources and enhancing the overall patient experience.
The regulatory implications remain challenging for business leaders and investors. The 2025 Watch List highlights significant questions about liability when healthcare providers rely on AI reasoning judgments. Who bears responsibility when a reasoning model provides logically sound but clinically incorrect guidance? The human-like "chains of thought" create an illusion of understanding that may lead to inappropriate trust in the system's recommendations, potentially exposing companies to significant liability for deploying these systems.
The Cost of Overthinking
Financial institutions have deployed reasoning models across various applications, creating a substantial market opportunity for both startups and established AI providers. The sector's data-rich environment and quantitative focus make it an ideal setting for AI reasoning capabilities.
Upstart, an AI-powered lending platform, reports approving 44% more borrowers than traditional models while maintaining lower annual percentage rates. BlackRock leverages AI to analyze thousands of earnings call transcripts and broker reports daily. These success stories highlight the potential for significant returns on investment in financial AI.
However, the limitations of reasoning models create unique challenges in financial contexts that investors and founders must consider. The "overthinking" phenomenon identified by Apple researchers has significant implications for time-sensitive financial decisions. During critical market events, the extended reasoning time creates dangerous delays that can eliminate potential advantages, a crucial consideration in fintech startups targeting trading applications.
Federico Dominguez, Founding Partner of MyStockDNA, notes that "no human being can keep up with the pace of change of modern markets." This speed requirement creates a fundamental tension with the extended processing time of reasoning models. High-frequency trading firms have largely abandoned reasoning models due to latency issues, finding that the computational overhead of reasoning traces renders any analytical advantage moot. This market reality has forced several well-funded reasoning AI startups to pivot away from trading applications.
More concerning for investors is the performance degradation during market volatility, precisely when robust reasoning is most valuable. Financial analysts report that reasoning models trained on historical market data struggle with novel economic conditions, demonstrating the pattern-matching problem identified in Apple's research. They apply different reasoning approaches to similar financial scenarios, creating unpredictable outcomes that undermine their reliability for critical decisions.
The economic trade-offs are equally challenging for business leaders evaluating these technologies. Financial institutions report that reasoning models for credit decisions cost four to seven times more to operate than traditional scoring models. This additional cost is justified only for borderline cases that fall between explicit approval and clear rejection thresholds—a relatively small percentage of total applications. For routine credit decisions, the additional "thinking" provides no measurable improvement in default rates, creating a narrow market opportunity that may not support the current valuation of many reasoning AI startups.
When Complexity Collapses Reasoning
Municipal governments and planning agencies have begun using reasoning models to assist with complex infrastructure planning decisions, creating a growing market for AI solutions in this sector. These applications are well-suited to AI reasoning capabilities, given the multiple variables and constraints involved in infrastructure planning.
However, the "complete accuracy collapse" identified by Apple researchers manifests dramatically in these contexts, creating significant risks for investors and business leaders. When faced with highly complex urban planning scenarios involving numerous interdependencies, reasoning models often exhibit catastrophic failure modes, rendering their outputs unreliable.
Infrastructure planners report significant difficulties in verifying the correctness of reasoning models' outputs, creating liability concerns that business leaders must address. Unlike mathematical problems with clear, correct answers, infrastructure planning involves trade-offs and value judgments that are difficult to validate algorithmically. The reasoning may appear logical, but it can lead to suboptimal or even dangerous recommendations when implemented in the physical world—a risk that creates significant barriers to market adoption.
A particularly concerning pattern for investors is the inconsistent application of constraints. Models apply different constraints inconsistently across similar planning scenarios, creating unpredictable outcomes. In one documented case, a municipal planning department found that their reasoning model recommended entirely different utility placements for two nearly identical neighborhood developments, with no apparent justification for the divergence. This inconsistency has led several agencies to revert to traditional planning methods after initial AI deployments.
The computational resources required for reasoning models often exceed the benefits compared to traditional planning tools, creating challenging unit economics for startups targeting this sector. Municipal governments report that reasoning models for infrastructure planning cost five to eight times more than conventional approaches, with the additional costs justified only for highly interdependent systems where traditional methods struggle. This narrow use case may not support the current market valuations of companies focused on infrastructure AI.
The Economics of Artificial Thinking
Reasoning models present organizations with a fundamental economic paradox that investors, founders, and executives must understand: higher computational costs do not necessarily translate to better decision quality. This creates complex trade-offs that impact both investment decisions and deployment strategies.
The token economy sits at the center of this challenge. Reasoning models generate significantly more tokens than standard LLMs, with each token incurring computational cost. A single complex reasoning trace can produce thousands of tokens, and organizations pay for all "thinking" tokens regardless of whether they contribute to better outcomes. Context window limitations mean reasoning tokens displace other potentially valuable information, creating opportunity costs beyond direct expenses—a critical consideration for startups managing burn rates and enterprises evaluating ROI.
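A back-of-the-envelope calculation shows why this matters. The per-token price and token counts below are assumptions for illustration; actual rates vary by provider and model:

```python
# Illustrative token economics for one query. All prices and token
# counts are assumed placeholders, not real provider rates.

PRICE_PER_1K_OUTPUT_TOKENS = 0.015  # USD, assumed for illustration

def query_cost(answer_tokens: int, thinking_tokens: int = 0) -> float:
    # Providers bill "thinking" tokens like any other output tokens,
    # whether or not they improved the final answer.
    total = answer_tokens + thinking_tokens
    return total / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

standard = query_cost(answer_tokens=300)                         # plain LLM answer
reasoning = query_cost(answer_tokens=300, thinking_tokens=6000)  # long reasoning trace

print(f"standard:  ${standard:.4f}")   # $0.0045
print(f"reasoning: ${reasoning:.4f}")  # $0.0945 -- roughly 21x the cost per query
```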
Apple's research on performance regimes has direct economic implications that should inform investment strategies. For low-complexity tasks, standard LLMs outperform reasoning models while using fewer tokens, making reasoning capabilities a negative-ROI proposition. For medium-complexity tasks, reasoning models show advantages but at significantly higher token costs, yielding only marginal ROI. For high-complexity tasks, both model types fail, yet reasoning models consume more resources, again producing negative ROI.
The overthinking costs are particularly problematic for business models built around reasoning AI. Models frequently find correct answers early but continue generating alternatives, creating unnecessary expenses without improving outcomes. In financial applications, this can delay time-sensitive decisions during critical market events, resulting in opportunity costs that exceed direct computational expenditures. This pattern has forced several reasoning AI startups to redesign their pricing models to remain competitive.
Equally concerning for investors is the underthinking risk. Models cut reasoning short precisely when extended thinking is most needed. In healthcare diagnostics, this can lead to missed considerations for complex cases. In infrastructure planning, critical interdependencies may be overlooked, resulting in safety risks that extend beyond financial implications. These limitations create significant barriers to market adoption that investors must consider when valuing companies.
Organizations must develop sophisticated approaches to model selection and deployment that account for task complexity, time sensitivity, and specific performance requirements. Strategic approaches include hybrid deployment models that use standard LLMs for routine tasks while reserving reasoning models for medium-complexity tasks where their advantages justify the additional costs. IBM Granite 3.2's toggleable "thinking" mode enables organizations to activate reasoning only when needed, while Claude 3.7 Sonnet's ability to control reasoning duration allows for fine-tuned cost management. These capabilities are creating market differentiation among providers and should inform investment decisions.
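A minimal sketch of this hybrid routing pattern appears below. The thresholds and tier names are assumptions, and in practice the complexity score would come from a domain-specific estimator rather than a hard-coded value:

```python
# A sketch of the hybrid deployment pattern: route each task to the
# cheapest adequate tier. Thresholds and names are illustrative assumptions.

from enum import Enum

class Tier(Enum):
    STANDARD_LLM = "standard-llm"        # low complexity: cheaper and often more accurate
    REASONING_MODEL = "reasoning-model"  # medium complexity: advantages justify the cost
    HUMAN_REVIEW = "human-review"        # high complexity: accuracy collapses; escalate

def route(complexity_score: float) -> Tier:
    """complexity_score in [0, 1], produced by a domain-specific estimator."""
    if complexity_score < 0.3:
        return Tier.STANDARD_LLM
    if complexity_score < 0.7:
        return Tier.REASONING_MODEL
    return Tier.HUMAN_REVIEW

for score in (0.1, 0.5, 0.9):
    print(score, "->", route(score).value)
```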
Strategic Guidance for Market Participants
For investors, founders, and executives navigating this complex landscape, a structured approach to reasoning model evaluation is essential for maximizing returns and minimizing risks.
Investors should focus on companies targeting the medium-complexity tier, where reasoning models demonstrate clear advantages over standard LLMs. Companies developing hybrid approaches that match model type to task complexity represent more attractive investments than those pursuing universal reasoning capabilities. Energy efficiency metrics should be a key evaluation criterion, as computational costs will increasingly drive competitive differentiation. The most promising investment targets combine domain-specific expertise with reasoning capabilities rather than offering general-purpose reasoning tools.
Founders building reasoning AI companies should focus on specific complexity tiers rather than attempting to address all use cases. Open-source strategies offer significant advantages in terms of community adoption, talent acquisition, and ecosystem development compared to closed approaches. Energy efficiency should be a core design principle rather than an afterthought, as computational costs will increasingly drive purchasing decisions. The most successful startups will integrate multiple AI architectures to address different aspects of their problem domain rather than relying solely on reasoning capabilities.
Senior executives implementing reasoning models should adopt a tiered implementation strategy that matches model type to task complexity. Standard LLMs should be deployed for simple tasks, reasoning models considered for medium-complexity tasks with careful cost monitoring, and human-AI collaborative frameworks implemented for high-complexity tasks rather than relying solely on reasoning models. For critical decisions, human oversight should be maintained regardless of AI capabilities.
This tiered approach should be tailored to specific industry contexts and business objectives. In healthcare, reasoning models are most effective for providing preliminary diagnostic support, but not for making a final diagnosis. In finance, they're more suitable for scenario analysis but not for autonomous trading decisions. In infrastructure, they can help generate planning alternatives, but shouldn't be used for final approval.
Technical risk mitigation should include complexity assessment tools to identify tasks beyond model capabilities, performance thresholds that trigger human review, monitoring systems to detect reasoning inconsistencies, and fallback mechanisms when reasoning quality degrades. These safeguards are essential for protecting both business performance and company reputation.
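As a sketch of what such safeguards might look like in code, the guardrail below encodes the escalation triggers just described. Every threshold and metric is an assumption for illustration, not a validated operating point:

```python
# Illustrative guardrail: check one model output against the safeguards
# described above and decide whether to escalate to human review.

def reasoning_guardrail(task_complexity: float,
                        thinking_tokens: int,
                        self_consistency: float) -> str:
    """Return a disposition for one model output.

    self_consistency: agreement rate across repeated samples (0..1),
    used here as an assumed proxy for reasoning reliability.
    """
    MAX_COMPLEXITY = 0.7         # beyond this, accuracy collapse is likely
    MIN_CONSISTENCY = 0.8        # below this, answers disagree too often
    MIN_THINKING_AT_HIGH = 1000  # a "giving up" signal: short traces on hard tasks

    if task_complexity > MAX_COMPLEXITY:
        return "escalate: task beyond validated complexity threshold"
    if self_consistency < MIN_CONSISTENCY:
        return "escalate: inconsistent reasoning across samples"
    if task_complexity > 0.5 and thinking_tokens < MIN_THINKING_AT_HIGH:
        return "escalate: reasoning effort declined on a hard task"
    return "accept"

print(reasoning_guardrail(0.6, 450, 0.9))  # escalate: effort declined
```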
Organizational risk mitigation requires clear policies on appropriate use cases and limitations, established liability frameworks and decision accountability, documentation requirements for AI-assisted decisions, and regular auditing of reasoning model outputs. These governance structures are increasingly important as regulatory scrutiny of AI decision-making intensifies.
A comprehensive economic evaluation should calculate fully-loaded costs, including direct inference costs, integration and maintenance costs, monitoring and oversight costs, and potential liability costs. These should be compared against quantifiable benefits, such as labor efficiency improvements, enhanced decision quality, risk reduction metrics, and competitive advantages. This detailed analysis often reveals that reasoning models are economically viable for a narrower set of use cases than initially assumed.
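A worked example of this fully-loaded calculation follows. Every figure is an assumed placeholder, not benchmark data, but the structure shows why inference-only math overstates returns:

```python
# Worked example of a fully-loaded cost/benefit evaluation.
# All dollar figures are illustrative assumptions.

costs = {
    "inference": 120_000,        # annual direct inference spend (USD)
    "integration": 80_000,       # build and maintenance
    "oversight": 60_000,         # monitoring and human review
    "liability_reserve": 40_000, # provision for AI-assisted decision risk
}
benefits = {
    "labor_efficiency": 180_000,
    "decision_quality": 90_000,
    "risk_reduction": 50_000,
}

total_cost = sum(costs.values())        # 300,000
total_benefit = sum(benefits.values())  # 320,000
roi = (total_benefit - total_cost) / total_cost

print(f"fully-loaded cost: ${total_cost:,}")
print(f"benefit:           ${total_benefit:,}")
print(f"ROI: {roi:.1%}")  # 6.7% -- far thinner than inference-only math suggests
```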
Innovators Breaking Through
While Apple's research revealed fundamental limitations in reasoning models, several innovative companies have developed approaches that address these challenges. Their strategies offer valuable lessons for investors seeking promising opportunities and executives looking to implement more effective AI solutions.
Manus: Context-Aware Reasoning Architecture
Manus, founded by Red Xiao Hong and Yichao Ji, has developed an AI agent system designed to help people "build at the speed of thought." The company has achieved significant market traction, reaching a valuation of approximately $500 million, with strong international performance: it earns roughly seven times more revenue in USD markets than in yuan markets.
Manus has focused on developing autonomous AI agent technology that can adapt to varying levels of complexity. Their approach to reasoning addresses some of the fundamental limitations identified in Apple's research by dynamically allocating computational resources based on task complexity.
The company's technology has shown promising results in enterprise deployments, with measurable improvements in both efficiency and accuracy across various business applications. For investors, Manus represents an interesting case study in how architectural innovation can overcome limitations in AI reasoning capabilities.
Wand: Hybrid Symbolic-Neural Integration
Wand, led by co-founders CEO Rotem Alaluf, President Philippe Chambadal, and CPO Yogev Shifman, has developed an enterprise AI platform featuring a "multi-agent cognitive layer" that serves as an intermediary between users and AI models. The company has been particularly focused on addressing the compounding error effects in large language models—a critical issue for reasoning applications.
Wand's approach integrates multiple reasoning methodologies rather than relying on a single approach. This hybrid strategy addresses the pattern-matching limitations identified in Apple's research by providing alternative reasoning pathways when one approach fails.
The company has focused primarily on enterprise applications, with a particular strength in scenarios that require consistent reasoning across similar cases. Their platform's architecture suggests a recognition of the fundamental limitations in pure neural approaches to reasoning and an attempt to mitigate these through architectural innovation.
Writer: Domain-Specific Reasoning Frameworks
Writer, led by co-founders CEO May Habib and CTO Waseem Alshikh, has developed a full-stack generative AI platform specifically built for enterprises. The company has raised $326 million (as of March 2025) to pursue its vision of domain-specific AI.
Writer's approach focuses on training models using customers' data, creating specialized reasoning capabilities tailored to specific industries and use cases. This strategy directly addresses the performance regime challenges identified by Apple researchers by avoiding the pursuit of general-purpose reasoning capabilities.
The company's focus on domain-specific models rather than general-purpose reasoning aligns with Apple's finding that reasoning models perform best within narrowly defined domains where pattern recognition can be highly optimized. Writer's success in enterprise deployments suggests that this specialized approach may be more commercially viable than attempts to create universal reasoning capabilities.
Implications for Market Participants
These innovative approaches share several common elements that investors, founders, and executives should note:
First, all three companies have abandoned the pursuit of general-purpose reasoning in favor of more specialized approaches. This suggests that the future market leaders in AI reasoning will be those that focus on specific problem domains or reasoning types rather than universal capabilities.
Second, each company has developed proprietary methods for addressing the fundamental limitations identified in Apple's research rather than simply scaling existing approaches. This focus on architectural innovation rather than computational brute force has created more defensible market positions and better unit economics.
Third, all three companies have prioritized reasoning consistency and reliability over impressive but inconsistent performance on benchmarks. This emphasis on real-world reliability rather than laboratory performance has accelerated enterprise adoption and reduced the number of implementation failures.
For investors, these companies represent promising models for evaluating other players in the reasoning AI market. For founders, they demonstrate viable paths to overcoming the limitations that have constrained the adoption of reasoning models. And for executives, they offer templates for more effective AI implementation strategies that deliver reliable business value rather than impressive but unreliable capabilities.
The Future of the Reasoning AI Market
Despite their current limitations, reasoning models represent a significant step in AI development and a substantial market opportunity for well-positioned companies. Research continues to address fundamental limitations, with several promising directions that investors should monitor.
Alternative reasoning architectures that move beyond fine-tuning LLMs represent a promising investment area. Hybrid systems combining neural approaches with symbolic reasoning could overcome current limitations. Complexity-aware models that can allocate reasoning resources appropriately may address the counterintuitive scaling behavior identified by Apple. Domain-specific reasoning frameworks tailored to particular industries offer more defensible market positions than general-purpose approaches.
Investors, founders, and executives should monitor these developments while maintaining realistic expectations about near-term capabilities. The most successful market participants will be those that view reasoning AI as one component of a broader decision-making ecosystem rather than a standalone solution.
Let's Wrap This Up
The "illusion of thinking" identified by Apple researchers has profound implications for the reasoning AI market and the strategic decisions of investors, founders, and executives. These models represent significant advancements in AI capabilities; however, their fundamental limitations necessitate thoughtful approaches to investment, product development, and deployment.
The gap between theoretical capabilities and practical reliability remains substantial for most reasoning models. They do not engage in actual formal reasoning but instead rely on sophisticated pattern matching that breaks down in novel or complex scenarios. Their counterintuitive scaling behavior—thinking less when problems become more challenging—creates particular risks in high-stakes applications and challenges current market valuations.
Yet as we've seen with innovators like Manus, Wand, and Writer, these limitations are not insurmountable. Companies that have developed specialized approaches targeting specific complexity tiers, integrated symbolic and neural methods, or focused on domain-specific reasoning frameworks are showing promising results that overcome the fundamental limitations of conventional reasoning models.
The most successful market approach views reasoning AI not as a replacement for human judgment, but as a complementary tool that handles routine analytical tasks while freeing human experts to focus on complex decisions that require genuine understanding, creativity, and ethical consideration. This complementary relationship—rather than an illusory vision of fully autonomous AI reasoning—represents the most promising path forward for investors, founders, and executives.
As you evaluate reasoning models for your investment portfolio, product roadmap, or organizational deployment, focus not on the impressive benchmarks or human-like reasoning traces, but on the specific business problems where these tools can deliver measurable value within their known limitations. With clear-eyed assessment and strategic positioning, reasoning models can generate significant returns—provided you never forget that for most systems, the "thinking" remains, for now, an illusion.
The journey towards truly open, responsible AI is ongoing. We will realize AI's full potential to benefit society through informed decision-making and collaborative effort. As we explore and invest in this exciting field, let's remain committed to fostering an AI ecosystem that is innovative, ethical, and accessible to all.
If you have questions, you can contact me via the chat in Substack.
RECENT PODCASTS:
🔊 NEW PODCAST: Build to Last Podcast with Ethan Kho & Dr. Seth Dobrin.
YouTube: https://lnkd.in/ebXdKfKs
Spotify: https://lnkd.in/eUZvGZiX
Apple Podcasts: https://lnkd.in/eiW4zqne
🔊SAP LeanX: AI governance is a complex and multi-faceted undertaking that requires foresight on how AI will develop in the future. 🎙️https://hubs.ly/Q02ZSdRP0
🔊Channel Insights Podcast, host Dinara Bakirova https://lnkd.in/dXdQXeYR
🔊 BetterTech, hosted by Jocelyn Houle. December 4, 2024
🔊 AI and the Future of Work published November 4, 2024
🔊 Humain Podcast published September 19, 2024
🔊 Geeks Of The Valley. published September 15, 2024
🔊 HC Group published September 11, 2024
🔊 American Banker published September 10, 2024