👋 Hi, I’m Melissa and welcome to my biweekly Note, Language Processor. Every other Tuesday, I dig deep into language & behavior, the limits of technologies, and the connection between what people say and do. Once a month, Signal Lab takes over and covers these topics for startups and early-stage investors.

📈 The Prevailing Data Paradigm

Private markets intelligence is powered by platforms that track easily quantifiable data: team information, funding rounds, growth metrics. These are useful data points, to be sure, but it is widely understood that they do not offer a complete view. While VCs build agents and stacks to filter deals and predict success from metrics such as market size, CAC, and burn rate, they underutilize unstructured language data as signal.

Language is a data set that conveys much more than the sum of its tokens, and much more than the categorical and quantitative information VCs extract from it. Overlooking that is a blind spot that limits how investors evaluate opportunities.

👃 How VCs Analyze Language

How a founder says what they’re saying is obviously important. Why is it that gut and intuition are the only tools most VCs use to analyze this data?

Meanwhile, according to the Harvard Business Review, 95% of VC firms weigh the founder as one of their most important factors in an investment decision, ahead of business model (74%), market (68%), and industry (31%). Most VCs agree that the key factor in startup success is innate rather than external. In 2013, Paul Graham (YC) noticed that the differentiator between all his best and worst founders was… how they talked. Other investors name linguistically measurable factors as the #1 predictor of success: Garry Tan (Initialized Capital) says personal responsibility, Anand Sanwal (CB Insights) says topicality, and Charlie Munger (Berkshire Hathaway) says trustworthy presentation.

But converting these impressions into hard data — taking linguistic measurements — has eluded investors. For instance, what Paul Graham noticed can actually be measured, quantitatively, as linguistic uptake.
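To make "taking a linguistic measurement" concrete, here is a deliberately crude sketch. It assumes, for illustration only, that uptake can be approximated by how many content words from a question reappear in the answer. The function names and the tiny stopword list are hypothetical, and a real uptake measure would be considerably more sophisticated.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {
    "the", "a", "an", "is", "are", "to", "of", "and", "do", "you",
    "your", "we", "our", "what", "how", "in", "it", "that",
}

def content_words(utterance):
    """Return the set of lowercased words in an utterance, minus stopwords."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return {w for w in words if w not in STOPWORDS}

def uptake_score(prompt, reply):
    """Share of the prompt's content words that the reply picks up.

    Assumption: lexical overlap is a rough stand-in for uptake, not a
    validated linguistic measure.
    """
    asked = content_words(prompt)
    if not asked:
        return 0.0
    return len(asked & content_words(reply)) / len(asked)

question = "How do you plan to reduce churn?"
on_topic = uptake_score(question, "We reduce churn by onboarding faster.")
off_topic = uptake_score(question, "Our team is great.")
```

Under this toy definition, the first answer scores higher than the second because it takes up the question's own terms. The point is only that an impression like "they answered the question" can be turned into a number.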

Instead, VCs assess founders’ language in one of two ways.

First, as a delivery vehicle for raw numerical and categorical data to feed into the funnel. Mining metrics from speech (e.g., “exit”, “TAM”, “co-founder”, “valuation”) gives a high-level overview of company position and trajectory. However, this approach treats the structure of the language itself as having little to no inherent value.
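A minimal sketch of this first approach, assuming a plain-text transcript and a hypothetical watchlist built from the example terms above. It counts term mentions and nothing else, which is exactly the limitation described: everything about how the founder speaks is discarded.

```python
import re
from collections import Counter

# Hypothetical watchlist; the terms are the examples named in the text.
METRIC_TERMS = ("exit", "tam", "co-founder", "valuation")

def mine_metric_terms(transcript):
    """Count whole-word, case-insensitive mentions of each watchlist term."""
    text = transcript.lower()
    counts = Counter()
    for term in METRIC_TERMS:
        # Lookarounds keep "exit" from matching inside "exiting".
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        counts[term] = len(re.findall(pattern, text))
    return counts

sample = (
    "Our TAM is large. My co-founder and I expect an exit; "
    "the valuation followed the TAM."
)
term_counts = mine_metric_terms(sample)
```

The result is a bag of counts per term. Two founders who mention the same terms the same number of times look identical here, however differently they spoke.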

Alternatively, if investors want to “know how the engine runs” and assess the language itself, they rely on an in-the-moment gut read of the founder. This is one of the essential tasks of the VC: to “read” a founder for signal.² And at the earliest stages, when there is little else to analyze, a gut read weighs all the more heavily. VCs are paid by LPs to do it, but the facts speak for themselves: they are pretty bad at differentiating signal from noise in human interaction.

Language is largely made to… confuse people, delude them, charm them, seduce them, scare them, and exploit them.

- Nassim Taleb, The Bed of Procrustes

Language is confusing! There are 3000+ venture firms. If you remove the top ~20, venture capital underperforms the index. Those are some bad guts.

Bottom line: VCs overindex on language in funding decisions while performing little to no language processing, even though they will perform thorough product, market, and competitor analysis. This is an expensive mistake.

🚥 Language is Signal

Despite the majority view that the founder is the best predictor of startup success, most founder-centric evaluative measures are actually noise. The many tools that analyze founder personality, psychology, and motivation have their uses, but they uncover psychometric idiosyncrasies rather than success metrics (or metrics of investment relationship success, which also matters to VCs).

Even linguists admit linguistic data “is not inherently quantified, nor does it possess a uniform structure with which to convert it into useable metrics”,¹ so it’s understandable that VCs have not devised their own system of measuring linguistic data for these purposes.

🟢 What No One is Doing

Processing the founder’s own statements and speech with behavioral linguistic intelligence surfaces investment risk and founder signal. This converts an intuition-led process to a data-driven one, allowing insights across funnel and portfolio.

Signal capture is only possible with data, metadata, and annotations optimized for performance outcomes. (No existing text analysis or conversation intelligence platform offers this.)

This is what Innate Language Processing does.

The more unstructured language data goes into the models, the more sophisticated and nuanced their understanding of the signal becomes, creating a powerful data flywheel.

See you in two weeks.

1 Boyd & Pennebaker, 2015

2 After meeting with a founder, Sequoia partners rate them and keep that data in the CRM permanently. That rating can be argued among partners, but is itself inarguably subjective.
