Can artificial intelligence tools help find the needle in the haystack of corporate filings?


Artificial intelligence is rapidly transforming the investment landscape in ways that extend beyond algorithmic trading and robo-advisors. One of the most promising applications of AI is its ability to process and extract meaning from large amounts of unstructured text, something that even the most diligent human investors struggle to do at scale. While a skilled analyst can work through a handful of company filings in a day, AI can analyze thousands of documents at once, identifying patterns and connections that humans would find virtually impossible to discern. This ability is especially valuable because much of the information that moves stock prices is buried in narrative disclosures: the sea of text that companies release through regulatory filings.

With the average 10-K report containing over 60,000 words, the challenge is to identify which sentences actually matter. What is genuinely new and significant enough to move stock prices? Finding this information can be like looking for a “needle in a haystack.” Anna Costello, Bradford Levy and Valeri Nikolaev, authors of the November 2025 study “Representations of Investor Trusts,” addressed this question using artificial intelligence.

What Researchers Examined

Costello, Levy, and Nikolaev developed a new approach to identifying “surprising” information in corporate filings. Their solution combines information theory with large language models (LLMs), the same technology behind ChatGPT. They trained AI models specifically on financial disclosures to capture what investors already know about a company, then used these models to identify which information in subsequent filings is actually new. Their study involved:

  • Pre-training an LLM from scratch on a cross-section of firms’ narrative disclosures.
  • Further training that LLM on the time series of each firm’s own disclosures to yield a firm-specific model for every firm in the sample.
  • Iteratively updating each firm-specific model through further pretraining as new disclosures arrive.
  • Testing out of sample to measure the information in new narrative disclosures.
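To make the information-theory idea concrete: a disclosure’s information content can be measured as its surprisal, the negative log-probability that a model trained on earlier disclosures assigns to it. The sketch below is not the authors’ method. It swaps a tiny add-one-smoothed bigram model in for the LLM, and the `BigramModel` class and its methods are hypothetical names chosen for illustration; only the scoring idea (average bits per token, where higher means less expected) carries over.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer: a crude stand-in for an LLM tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BigramModel:
    """Hypothetical firm-specific prior: an add-one-smoothed bigram model, not an LLM."""

    def __init__(self) -> None:
        self.prev_counts = Counter()  # how often each token precedes another token
        self.pair_counts = Counter()  # counts of (previous, current) token pairs
        self.vocab = set()

    def update(self, texts: list[str]) -> None:
        """Continue 'pretraining' on more disclosures; counts simply accumulate."""
        for text in texts:
            tokens = ["<s>"] + tokenize(text)
            self.vocab.update(tokens)
            self.prev_counts.update(tokens[:-1])
            self.pair_counts.update(zip(tokens[:-1], tokens[1:]))

    def surprisal(self, text: str) -> float:
        """Mean information content in bits per token, -log2 p(w_i | w_{i-1})."""
        tokens = ["<s>"] + tokenize(text)
        if len(tokens) < 2:
            return 0.0
        v = len(self.vocab | set(tokens))  # smooth over known plus unseen words
        bits = 0.0
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            p = (self.pair_counts[(prev, cur)] + 1) / (self.prev_counts[prev] + v)
            bits -= math.log2(p)
        return bits / (len(tokens) - 1)


# Usage: train on a firm's earlier disclosures, then score new sentences.
prior = BigramModel()
prior.update([
    "the company reported stable revenue and stable margins",
    "the company reported stable revenue growth",
])
repeated = prior.surprisal("the company reported stable revenue")
novel = prior.surprisal("auditor resigned citing material weakness")
print(f"repeated: {repeated:.2f} bits/token, novel: {novel:.2f} bits/token")
```

A sentence that echoes earlier filings scores low; one built from words and word pairs the firm has never used scores high. An LLM plays the same role with far richer context.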

Their study analyzed all SEC EDGAR disclosures filed by 500 companies from 1996 to 2023, covering nearly 278,000 filings and approximately 1.7 billion words. By training from scratch with a fixed knowledge cutoff of 2007 and iteratively updating each firm-specific LLM, they addressed concerns about look-ahead bias.
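The out-of-sample design matters because a model that has already seen a filing cannot be surprised by it. Here is a minimal walk-forward sketch, assuming a toy “model” that is simply the set of words each firm has disclosed so far (a stand-in for iterative LLM pretraining, not the paper’s implementation): each year’s filings are scored using only earlier years and only afterwards folded into the firm’s history. The `Filing` record, the `walk_forward_novelty` function and the `first_test_year` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Filing:
    firm: str
    year: int
    text: str


def walk_forward_novelty(filings: list[Filing], first_test_year: int) -> list[tuple[str, int, float]]:
    """Score filings out of sample, year by year, with no look-ahead.

    Toy stand-in for iterative pretraining: a firm's "model" is the set of words it
    has disclosed in earlier years; novelty is the share of words not seen before.
    """
    seen: dict[str, set[str]] = {}
    results = []
    for year in sorted({f.year for f in filings}):
        this_year = [f for f in filings if f.year == year]
        # 1) Score this year's filings using only knowledge from prior years.
        if year >= first_test_year:
            for f in this_year:
                words = f.text.lower().split()
                known = seen.get(f.firm, set())
                new = sum(w not in known for w in words)
                results.append((f.firm, year, new / len(words) if words else 0.0))
        # 2) Only then fold this year's filings into each firm's history.
        for f in this_year:
            seen.setdefault(f.firm, set()).update(f.text.lower().split())
    return results


history = [
    Filing("ACME", 2005, "revenue grew steadily"),
    Filing("ACME", 2006, "revenue grew again"),
    Filing("ACME", 2007, "revenue grew steadily"),
    Filing("ACME", 2007, "auditor resigned"),
]
for firm, year, score in walk_forward_novelty(history, first_test_year=2007):
    print(firm, year, round(score, 2))
```

Filings within the same year do not inform one another here, which is a simplification; a finer-grained version would order filings by date.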

Key Findings

1. Most news doesn’t come from where you think

While investors and researchers traditionally focus on earnings announcements and quarterly reports, the study found that most new information actually arrives through current reports (Form 8-K) and exhibits attached to filings, rather than the main body of annual and quarterly reports.

The exhibits attached to filings contained approximately 150% more high-information content than the main bodies of the filings. Even more striking, while earnings announcements receive significant attention, other 8-K items, such as changes in the certifying accountant, bankruptcy filings and non-reliance on previously issued financial statements, contained 60% or more high-information content.

2. Information arrives continuously, not just quarterly

The research found that 55.2% of high-information content arrives more or less continuously through current reports other than earnings announcements and through other filings, while only 10.2% comes from earnings announcements, 14.3% from quarterly reports and 20.3% from annual reports. This challenges the common practice of checking in on companies only during earnings season.
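Shares like these come from tallying high-information content by the type of filing it appeared in. The sketch below assumes each item has already been given an information score (for example, surprisal) and labeled with its source; the `high_info_shares` helper, the threshold and the toy inputs are illustrative assumptions, not the study’s data. It also checks that the four percentages reported above add up to 100%.

```python
from collections import Counter


def high_info_shares(items: list[tuple[str, float]], threshold: float) -> dict[str, float]:
    """Share of high-information items (score >= threshold) contributed by each source."""
    counts = Counter(source for source, score in items if score >= threshold)
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()} if total else {}


# Toy, made-up scores: (filing type, information score in bits).
toy = [("8-K", 3.1), ("8-K", 2.9), ("10-K", 3.5), ("10-Q", 0.4), ("earnings", 1.0)]
print(high_info_shares(toy, threshold=2.5))

# Sanity check on the percentages reported in the study.
reported = {
    "current reports and other filings": 55.2,
    "earnings announcements": 10.2,
    "quarterly reports": 14.3,
    "annual reports": 20.3,
}
print(round(sum(reported.values()), 1))
```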

3. The AI-based measure explains market reactions

The researchers validated their approach by showing that it explains actual market behavior: companies with filings in the highest information decile saw a 106% increase in absolute returns on the disclosure date, compared with just 24.2% for those in the lowest decile.
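A decile comparison of this kind can be sketched in a few lines: rank filings by their information score, split them into ten equal-count groups, and average the absolute disclosure-date return in each group. The function name and the synthetic inputs are hypothetical, and the exact return measure used in the paper may differ.

```python
import math


def decile_mean_abs_returns(info_scores: list[float], returns: list[float], n_bins: int = 10) -> list[float]:
    """Rank filings by information score, split them into n_bins equal-count groups
    (lowest information first), and return each group's mean absolute return."""
    if len(info_scores) != len(returns):
        raise ValueError("info_scores and returns must have the same length")
    order = sorted(range(len(info_scores)), key=lambda i: info_scores[i])
    n = len(order)
    bins: list[list[float]] = [[] for _ in range(n_bins)]
    for rank, i in enumerate(order):
        bins[rank * n_bins // n].append(abs(returns[i]))
    return [sum(b) / len(b) if b else math.nan for b in bins]


# Synthetic example: 20 filings whose return magnitude rises with information.
scores = [float(i) for i in range(20)]
rets = [(-1) ** i * 0.001 * (i + 1) for i in range(20)]
means = decile_mean_abs_returns(scores, rets)
print(f"lowest decile: {means[0]:.4f}, highest decile: {means[-1]:.4f}")
```

Absolute returns are used because new information can push prices either way; what matters here is the size of the reaction, not its direction.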

4. Sentiment only matters when it is informative

While traditional sentiment analysis found the difference in returns between the most negative and most positive disclosures to be about 53 basis points, weighting sentiment by information content widened that difference to 422 basis points. In other words, what matters is not just whether the language is positive or negative, but whether that language tells investors something they didn’t already know.
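One simple way to weight sentiment by information content is to average each sentence’s tone using its surprisal as the weight, so that language repeated from earlier filings barely counts. This is a sketch under assumed inputs (tone scores in [-1, 1] from any sentiment model, surprisal in bits), not the authors’ exact weighting scheme; both function names are hypothetical.

```python
def plain_sentiment(sentences: list[tuple[float, float]]) -> float:
    """Unweighted average tone; each item is (tone in [-1, 1], surprisal in bits)."""
    return sum(tone for tone, _ in sentences) / len(sentences) if sentences else 0.0


def info_weighted_sentiment(sentences: list[tuple[float, float]]) -> float:
    """Average tone weighted by each sentence's information content (surprisal)."""
    total_info = sum(info for _, info in sentences)
    if total_info == 0:
        return 0.0
    return sum(tone * info for tone, info in sentences) / total_info


# Toy filing: two negative sentences repeated from earlier filings (low surprisal)
# and one genuinely new positive sentence (high surprisal).
filing = [(-1.0, 0.2), (-1.0, 0.2), (1.0, 3.0)]
print(plain_sentiment(filing), info_weighted_sentiment(filing))
```

In the toy filing, the two routine negative sentences dominate the plain average, while the single surprising positive sentence dominates the information-weighted one.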

5. Limited attention has consequences

The study examined what happens when investors process only certain types of disclosures. It found that investors who read only annual reports, or only annual and quarterly reports, would perceive “under-reactions” to what they consider news, while investors who relied only on current reports generally saw market reactions that matched their beliefs, though somewhat muted.

Their findings led Costello, Levy, and Nikolaev to conclude, “LLMs can be used to form preferences for narrative content, which can then be used to identify information in new content.”

Key Investor Takeaways

1. Don’t just focus on earnings day

If you’re only paying attention to quarterly earnings announcements, you’re missing most of the important information. Set alerts for all Form 8-K filings from companies in your portfolio, not just earnings releases.
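For readers comfortable with a little scripting, EDGAR publishes each company’s filing index as JSON at data.sec.gov, and the SEC asks automated clients to send a descriptive User-Agent header. The sketch below assumes the index’s `filings.recent` section holds parallel lists named `filingDate`, `form` and `accessionNumber`, which matches our understanding of the format but should be checked against the SEC’s documentation; the function names are hypothetical.

```python
import json
import urllib.request

SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def fetch_submissions(cik: int, user_agent: str) -> dict:
    """Download a company's filing index from EDGAR (needs network access)."""
    req = urllib.request.Request(
        SUBMISSIONS_URL.format(cik=cik), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def new_8k_filings(submissions: dict, since: str) -> list[tuple[str, str, str]]:
    """Return (filingDate, form, accessionNumber) for every 8-K or 8-K/A filed on
    or after `since` (YYYY-MM-DD); ISO dates compare correctly as strings."""
    recent = submissions["filings"]["recent"]
    rows = zip(recent["filingDate"], recent["form"], recent["accessionNumber"])
    return [row for row in rows if row[1] in ("8-K", "8-K/A") and row[0] >= since]


# Usage (replace the contact details with your own):
# subs = fetch_submissions(320193, "Jane Doe jane@example.com")
# for row in new_8k_filings(subs, since="2025-01-01"):
#     print(row)
```

Run it on a schedule (say, daily) with `since` set to the last date you checked. The filtering step is kept separate from the download so it can be tested offline.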

2. Read the Exhibits

Those long attachments in SEC filings that most investors overlook? They often contain the most valuable information – new contracts, debt agreements and material business developments.

3. Context is everything

A negative-sounding filing isn’t necessarily bad news if it’s simply repeating information the company has already disclosed. Similarly, positive language only matters if it represents genuinely new information. This is where research suggests AI tools can help individual investors level the playing field.

4. The Challenge of Continuous Monitoring

Unlike earnings, which arrive on a predictable quarterly schedule, important information can drop at any time. This creates challenges for individual investors, but also opportunities for those willing to stay engaged throughout the year.

5. The Information Advantage is real

The fact that this AI-based information measure can predict returns up to 12 months ahead suggests that careful processing of narrative disclosures provides a real informational edge. The market does eventually incorporate this information, but not immediately.

Conclusion

This research provides scientific validation for something many experienced investors know intuitively: reading and understanding company disclosures matters. However, it also shows that in our data-saturated world, what you read and HOW you process it are just as important as whether you read at all.

As AI tools become more accessible, individual investors may soon have powerful allies in sorting through the haystack of corporate disclosures to find the needles that really matter. Until then, the key lesson is clear: broaden your reading beyond quarterly earnings, pay attention to all material disclosures, and remember that novelty, not just sentiment, is what moves markets.

AI and the Future of Market Efficiency

The fact that AI can identify information that predicts returns up to a year ahead suggests that markets may not be as efficient at processing narrative information as they are at processing numerical data such as earnings surprises. The sheer volume and complexity of textual disclosures, with important information scattered across different filing types, arriving at unpredictable times and buried in lengthy exhibits, create natural barriers to information processing that even sophisticated investors struggle to overcome.

As AI tools become more accessible and widely adopted, we may see markets become more efficient at incorporating narrative information. When more investors can quickly identify and act on genuinely new information, regardless of where and when it appears, mispricing caused by limited attention or incomplete processing should decline. That could narrow the window of opportunity for generating alpha from textual analysis.

However, this also raises an interesting paradox: if everyone has access to similar AI tools, will the advantage disappear? Not necessarily. The key will lie in how these tools are implemented, what questions investors ask of them, and how their insights are integrated with other forms of analysis and judgment. Artificial intelligence can process information at a superhuman scale, but investment success will still require human wisdom in interpreting that information and making decisions under uncertainty.

The future of investing isn’t about AI replacing human judgment—it’s about augmenting human abilities to navigate an ever-expanding universe of information. Those who learn to use this partnership effectively may find that true alpha lies not in having the most information, but in knowing what information matters.

Larry Swedroe is the author or co-author of 18 books on investing, including his latest, Enrich Your Future. He is also a consultant to RIAs as an educator on investment strategies.

Important Disclosures

For informational and educational purposes only; should not be construed as specific investment, accounting, legal or tax advice. Certain information is deemed to be reliable, but its accuracy and completeness cannot be guaranteed. Third-party information may become outdated or be superseded without notice. Neither the Securities and Exchange Commission (SEC) nor any other federal or state agency has approved, determined the accuracy of, or confirmed the adequacy of this article.

The views and opinions expressed herein are those of the author and do not necessarily reflect the views of Alpha Architect, its affiliates, or its employees. Our full disclosures are available here. Definitions of common statistics used in our analysis are available here (towards the bottom).
