How Agents Fail Investors

👋 Hi, welcome to Signal Lab where the team at Innate Language Processing digs deep into language & behavior, the limits of technologies, and the connection between what people say and do. 100% human authored, never with AI. For venture investors and founders.

AI agents can do a lot,¹ if you let them: ingest reams of data, manage calendars and comms, source companies, process pitch decks, run market research, draft deal memos, perform investor relations. The efficiency they offer is significant, and many investors already can’t imagine life without them.

In mid-2023 James Currier (GP, NFX) said, “Let’s be honest: much of what a typical venture capitalist does — reading, summarizing, and ranking — is what large language models already do extremely well.” It was true when he said it, and even more true now, with agentic advances. A recent academic paper found that agents produce similar quality results at top-of-funnel screening, 537x faster than an associate,² especially when applying a clear investment thesis.

But agents are not all upside.

The 5 Ways Agents Fail

If you’re using AI agents across your stack, you need to know that agents fail. These are the ways, big and small, that it will happen.

Bias & Fairness

The fundamental problem is that bias in LLMs is a feature, not a bug. Agents operate on top of LLMs. LLMs are LLMs; they cannot get rid of bias.³ Neither can agents.

This is why the #1 concern among data-savvy investors is algorithmic bias. It deserves its own post. The whole point of leveraging algorithms is to filter and prioritize based on patterns, but when those patterns are biased, the output is too.

For instance, VCs argue that agents can better uncover first-time founders in underserved areas because the data is not being filtered through referrals and networks. However, the traits that venture capital has long overindexed on are just as overrepresented in its discourse and decisions, and that record is the training data for LLMs, which perpetuates the pattern-matching problem. Indeed, early studies have shown that LLMs exhibit the same cognitive biases as human investors (or even more severe bias), even when debiasing measures are in place (e.g., prompting to ignore certain data or nudging the LLM to search with priorities like women-led, regional diversity, etc.).

If you thought that we could just train the bias out of the model using synthetic data, you are wrong (that road leads to model collapse). And if you thought we could just untrain the models or pluck out problems and overinfluence, it’s not that simple. Look into the work of the nonprofit DAIR for more along these lines. And unless you’re building your own model, you have no control over this.

Data Security & Privacy

In the era of widespread adoption of unapproved generative AI tools at work (aka a new branch of longstanding “shadow IT”), most agents are deployed without proper security measures, leaving investors’ infrastructure vulnerable to data leakage. Not coincidentally, regulation is the area of least concern among data-focused VCs, surveys show. Generally, agents do not handle data privacy, copyright, and security in ways that optimize for protection.

If your agents’ use of APIs involves your sensitive data leaving your environment, it is much harder to assess your security risk. For VCs dealing with companies in regulated industries like fintech and healthcare, this is huge. Self-hosted open source models are the most secure choice because data never leaves a controlled environment, but performance lags behind most foundation models and managing them is no small task. At bare minimum, most investors aren’t even using available methods of securely bridging their data to an LLM.⁴ Making agents secure is a very big task, which most VCs approach with a slapdash solution or an opt-out.
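To make the tool-control idea in footnote 4 concrete, here is a minimal sketch of a gateway that allowlists tools and audits every call. It is not the MCP SDK or any real product's API: `ToolGateway`, `search_deals`, and the sample deals are hypothetical names invented for illustration.

```python
import time
from typing import Any, Callable, Dict, List


class ToolGateway:
    """Hypothetical gateway between an agent and its tools: allowlist plus audit log."""

    def __init__(self, allowed_tools: Dict[str, Callable[..., Any]]) -> None:
        self._allowed_tools = dict(allowed_tools)
        self.audit_log: List[Dict[str, Any]] = []

    def call(self, tool_name: str, **arguments: Any) -> Any:
        entry: Dict[str, Any] = {
            "time": time.time(),
            "tool": tool_name,
            "arguments": arguments,
        }
        # Log first, so blocked and failed calls are recorded too.
        self.audit_log.append(entry)
        if tool_name not in self._allowed_tools:
            entry["outcome"] = "blocked"
            raise PermissionError(f"tool '{tool_name}' is not on the allowlist")
        try:
            result = self._allowed_tools[tool_name](**arguments)
        except Exception as exc:
            entry["outcome"] = f"error: {exc}"
            raise
        entry["outcome"] = "ok"
        return result


# Hypothetical read-only tool. A destructive tool is simply never registered.
def search_deals(query: str) -> List[str]:
    deals = ["Acme Robotics seed", "Borealis Health Series A"]
    return [deal for deal in deals if query.lower() in deal.lower()]


gateway = ToolGateway({"search_deals": search_deals})
```

With this setup, a request for an unregistered tool such as `gateway.call("delete_record", record_id=7)` raises `PermissionError` and still leaves a "blocked" entry in `gateway.audit_log` for later review.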

Reliability & Trust

LLMs are effectively next-token predictors. While their output is impressive, they are not “thinking” machines.⁵ Agents, sitting atop the LLM, are also not thinking machines. They are orchestrations, making typical agent mistakes: compounding errors across steps (a bad retrieval in step 2 corrupts everything downstream), not knowing when to stop or escalate, and demonstrating unpredictable behavior when the environment doesn't match what the agent was designed for. Plus, they still fail at basic LLM tasks because they are built on LLMs and inherit their shortcomings.
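The compounding-error point can be made concrete with a little arithmetic. The sketch below assumes, as a simplification, that every step in an agent workflow succeeds independently with the same probability. The 95% figure and the step counts are illustrative, not measured.

```python
def pipeline_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step succeeds, assuming independent steps (a simplification)."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Illustrative numbers only: 95% per-step accuracy over longer and longer workflows.
    for steps in (1, 5, 10, 20):
        rate = pipeline_success_rate(0.95, steps)
        print(f"{steps:>2} steps at 95% each: {rate:.0%} end-to-end")
```

Under these assumptions, a ten-step workflow finishes without a single error only about 60% of the time, even though each individual step looks reliable.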

RETRIEVAL: Agents struggle to accurately identify similar companies, for example, because much more goes into similarity than a company's description. Determining Uber and Lyft's similarity is pretty simple, but finding analogues in cybersecurity is much more complex because of subsectors and specializations, business model, customer, and technology differences.

TAGGING: Dependence on agents means that the human is often outside the loop for investment-critical outputs. And outside the loop, you don’t know what you don’t know. VCs using an agent to tag from source data they never see have no accountability for the conclusions drawn or omissions that stem from that original data.

SUMMARIZING: In 2023 SignalFire, an AI leader in VC, advocated “to use LLMs to optimize internal data gathering rather than decision-making and communication” and disparaged using LLMs/agents to summarize pitch decks or memos so partners don’t have to read them in full because “there are so many nuances to a pitch and business plan that could easily get lost or misinterpreted…. once you’re evaluating their vision, human experience, and expertise are critical to avoiding costly mistakes.”
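The RETRIEVAL point above, that similarity depends on far more than a company's description, can be illustrated with a toy comparison. This is a sketch under stated assumptions: the two vendors are made up, the facet names simply mirror the ones listed in that paragraph, and word overlap stands in for whatever matching a real agent uses.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def description_similarity(desc_a: str, desc_b: str) -> float:
    """Compare two companies using only the words in their descriptions."""
    return jaccard(set(desc_a.lower().split()), set(desc_b.lower().split()))


def facet_similarity(facets_a: dict, facets_b: dict) -> float:
    """Average overlap across every facet present in either profile."""
    names = facets_a.keys() | facets_b.keys()
    if not names:
        return 0.0
    scores = [jaccard(facets_a.get(n, set()), facets_b.get(n, set())) for n in names]
    return sum(scores) / len(scores)


# Two hypothetical cybersecurity vendors with nearly identical descriptions.
vendor_a = {
    "description": "AI-powered cybersecurity platform for enterprises",
    "facets": {
        "subsector": {"endpoint detection"},
        "business_model": {"per-seat subscription"},
        "customer": {"enterprise security teams"},
        "technology": {"on-device agent"},
    },
}
vendor_b = {
    "description": "AI-powered cybersecurity platform for large enterprises",
    "facets": {
        "subsector": {"cloud identity"},
        "business_model": {"usage-based pricing"},
        "customer": {"enterprise security teams"},
        "technology": {"api gateway"},
    },
}

by_description = description_similarity(vendor_a["description"], vendor_b["description"])
by_facets = facet_similarity(vendor_a["facets"], vendor_b["facets"])
```

In this toy case the descriptions overlap heavily while only one of the four facets matches, which is the kind of gap a description-only match would hide.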

Agent reliability can fluctuate over time. Hallucinations and AI slop can occur. When an agent is used on the front line, it may claim it took an action it never did, or it may fabricate details. It’s such a big problem that you can now find patches on GitHub to address investor-agent hallucinations specifically.
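One guard against an agent claiming an action it never took is to compare what it says it did with what the tool layer actually recorded. Here is a minimal sketch, assuming a hypothetical setup in which the agent reports its claimed actions as structured data and every real tool call is logged. The action names and targets are invented for illustration.

```python
from typing import List, Tuple

# (tool name, target), for example ("send_email", "founder@example.com")
Action = Tuple[str, str]


def unverified_claims(claimed: List[Action], executed: List[Action]) -> List[Action]:
    """Return the actions the agent reported that have no matching logged tool call."""
    remaining = list(executed)
    missing: List[Action] = []
    for action in claimed:
        if action in remaining:
            remaining.remove(action)  # each logged call can back only one claim
        else:
            missing.append(action)
    return missing


# Hypothetical run: the agent says it did two things, the log shows only one.
claimed_actions = [
    ("send_email", "founder@example.com"),
    ("update_crm", "Acme Robotics"),
]
logged_calls = [("update_crm", "Acme Robotics")]

flagged = unverified_claims(claimed_actions, logged_calls)
```

Anything left in `flagged` is a claim a human should check before trusting the agent's summary of its own work.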

This is the layer VCs are relying on to separate the companies worth seeing from those to ignore?

Systemic & Structural

Typical agents are built on off-the-shelf models. Off-the-shelf models aren’t differentiated (i.e., they aren’t going to give you an edge since anyone can use them). Have we considered this could lead to homogenized decision-making?

Or perhaps prompt writing is the new differentiation?

In the words of the AI Realist (also a computational linguist), “I do not argue that a prompt has no impact on the output - on the contrary, it has too much impact. An impact that is inconsistent and difficult to control. It is a useful skill to be able to word your query so that it returns a good answer, but it is equally important to know when to stop tuning that query and to understand that no amount of prompting can ensure consistency.”

“Prompt engineering” is not just a productivity trap. It is already outmoded: AI-generated prompts have been shown to outperform expert-human prompts. Humans are optimizing prompts for machines that are better at writing prompts than the human doing the optimizing. “Prompt Engineer” job posts went extinct for a reason.

Cost

The rise of solo GPs was fueled by LLM support. Carta’s 2025 year-end data showed over 3,200 solo GP funds in the US, up from 1,100 in 2020. But agents are much, much more expensive than plain LLM use. Token cost is a massive investor opex line item, where spinning up from four-figure to six-figure spend within a couple of quarters is not unheard of.⁶
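Footnote 6's point that self-hosting can come out cheaper at scale is, at bottom, a break-even calculation. Here is a back-of-the-envelope sketch in which every number is a placeholder assumption rather than real pricing, and one-time setup cost is ignored.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Pay-as-you-go cost, which scales linearly with token volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def break_even_tokens(
    fixed_monthly_hosting: float,
    price_per_million_tokens: float,
    self_host_cost_per_million: float = 0.0,
) -> float:
    """Monthly token volume above which self-hosting costs less than API calls."""
    saving_per_million = price_per_million_tokens - self_host_cost_per_million
    if saving_per_million <= 0:
        raise ValueError("self-hosting never breaks even at these prices")
    return fixed_monthly_hosting / saving_per_million * 1_000_000


# Placeholder assumptions: $4,500/month for hosting and upkeep, $10 per million
# tokens via API, $1 per million tokens of marginal cost when self-hosted.
threshold = break_even_tokens(4_500, 10, 1)
api_cost_at_threshold = monthly_api_cost(threshold, 10)
```

With these made-up inputs the break-even sits at 500 million tokens a month. Below that volume the API is cheaper, which is one way to see why smaller funds rarely self-host.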

The gap between the agentic architectures staged by the biggest VC firms and the smaller (or solo) firms is very wide (Exhibit A). It’s easy to imagine that, just as in hedge funds, the top VC firms will increase their lead by creating operating advantages through developer-led processes that are simply too expensive for smaller funds to emulate.

One Last Failure

The more things change, the more they stay the same: investors have always been obsessed with not missing out on a potential unicorn. That obsession is what drove agentic workflow adoption in VC in the first place. Two sides of the same FOMO coin. But agents, despite their real benefits of scaling insights and reach, speeding and synthesizing data collection, and systematizing deal review, do fail. They fail to recognize unicorns. They fail to return accurate information, and sometimes they even falsify it. In VCs’ eagerness to create “bionic firms”, they have pulled the human out of the loop prematurely, embracing time savings won at the cost of countless inaccuracies and vulnerabilities left open.

One last way agents fail?

They can’t even comprehend the most critical factor in the investment decision. Founder risk. Agent filters miss it completely.

Person-based risk, although inherent in linguistic artifacts, is not interpretable by agents or LLMs. LLMs are notoriously limited in perceiving impression-based, intuition-led factors of human judgment. This is a line of demarcation, and agents cannot cross it.

But you can, with a little help.

See you next month.


1 AI agents are systems that use LLMs as a reasoning layer, then connect that reasoning to actions, memory, tools, and workflows. LLMs gained traction in 2023 and AI agents in 2025, facilitated by advances in no-code implementation.

2 As you might expect, job displacement is already visible at lower levels. VC intern job posts are down by nearly two-thirds as a share of all roles compared against previous years.

3 I won’t introduce a lot of jargon or programming speak here. However, if you don’t understand how LLMs work, you might benefit from reading this short article explaining LLMs, which is so clear that I’ll forgive the authors for replacing “tokens” with “words” :/

4 One example is MCP (Model Context Protocol), which allows control over what information gets exposed and how it’s formatted. If you run MCP servers, though, move them all to remote, monitor and audit all tool calls and responses, set fine-grained tool controls to prevent the AI from taking destructive actions, and set gateways. MCP servers are not safe without these measures. Alternatively, investors can use RAG (Retrieval-Augmented Generation), which can support encryption and prevent an LLM from learning your data. That is not the biggest data security risk out there: even if the investor has no enterprise-grade settings preventing the LLM from logging and training on data, exact input data will be nearly impossible to prompt from the LLM. However, it needs to be considered in the context of organizational policies on data security and confidentiality in general, especially in the case of PII (Personally Identifiable Information).

5 For more on this, see Melanie Mitchell’s work, like this book.

6 Few investors have moved to self-hosting open source models locally or in their own VPC. But, if you can believe it, the cost is actually lower at scale than API calls after setup. It’s still not a common choice because, although it offers maximum control and differentiation, you’re managing hosting, scaling, and updates, which is a job in itself, and it also requires fine-tuning.