SEC EDGAR Alternative Data Signals: 8 Unconventional Datasets Small Quant Pods Can Afford
The unconventional alternative data signals hiding in SEC EDGAR: segment operating margins, Compensation Actually Paid, activist 13D filings, fund flows.

A two-person quant pod cannot write a six-figure check for a credit-card panel, a satellite-imagery feed, or a Bloomberg terminal. So the conventional wisdom is that real alternative data is out of reach, and the budget option is one of the cheap alt-data APIs: congressional trades, Reddit sentiment, web-traffic estimates. Useful, sometimes. Auditable, never. None of it traces back to a primary source you can defend in a drawdown post-mortem.
There is a better source sitting in plain sight, and it is free to download: SEC EDGAR. The catch is that the edge was never the filings themselves. It is what you extract from them. This post is a list of the eight most unconventional datasets StockFit pulls out of EDGAR, and the signal each one uncovers, every value carrying the accession number it came from so a backtest stays reproducible and lookahead-free.
Why SEC EDGAR is the most underrated alternative data source
Everyone has EDGAR. It is public, free, and a wget away. That is exactly why it is underrated: people assume that because the raw data is commodity, the signal must be too. It is not. A raw 10-K is three hundred pages of inconsistent HTML wrapped around a wall of XBRL tags. A raw Form N-PORT is a machine-hostile XML dump. A DEF 14A buries the most interesting governance numbers in a table that did not even exist before 2022. The filings are open; parsing them correctly, at scale, across every filer and every quirk of every filing agent, is the work.
The differentiator is extraction depth. Most data vendors stop at normalizing the income statement and balance sheet, which is why their fundamentals all look identical and none of them can tell you anything the next vendor cannot. StockFit keeps going: into the dimensional segment axis of the XBRL, into the pay-versus-performance table, into the Schedule 13D and 13G amendment trail, into N-PORT internals, and into an AI economic model whose every claim is verified against the source text before it is stored. The signal was never the filing. It is what you pull out of it, and almost nobody pulls this much. Each of the eight datasets below is a proof of that.
One more thing makes EDGAR-derived data a genuine alternative data source rather than a commodity feed: it is point-in-time by construction. Every value carries the accession number and filing date of the document it came from, so you can reconstruct exactly what was knowable on any date and never leak a restated number backward into a backtest. For the full argument, see point-in-time fundamental data for backtesting. The free /api/filings endpoint is the on-ramp: it returns the raw filing index for any ticker so you can see the documents every signal below is extracted from.
1. The audit-grade economic model: AI fundamentals you can verify
Start with the deepest extraction in the catalog, because it is the most alternative thing here. Every other dataset in this post is structured raw disclosure. The economic model is a derived signal layer: an AI reads a company's latest 10-K, 10-Q, and 8-Ks and returns a structured business model. No competitor produces this, and the reason most builders distrust AI fundamentals is the obvious one: you cannot tell whether the model read the filing or hallucinated it.
What the model returns, as structured fields:
- Offerings, each with its revenue role, monetization model, and the KPIs that move it
- Cost structure and unit economics
- Reinvestment, cash conversion, and capital allocation
- Operating levers and the metrics each one maps to
- Structural advantages, flywheels, and failure modes
- Strategic initiatives and management style
The distrust comes from one gap: you cannot see whether a claim was read from the filing or invented. StockFit closes that gap at generation time. For every claim, the model carries a clickable EDGAR URL, the exact section it came from, and a verbatim quote copied word for word from that section. Before the model is stored, the pipeline fetches the filing, strips the HTML, and confirms the quote literally appears in the document. A fabricated or paraphrased quote is rejected and the model is regenerated. The output below is a trimmed slice of the real /api/company/economic-model response for Apple.
GET /api/company/economic-model?symbol=AAPL
{
  "offerings": [
    {
      "name": "iPhone (smartphones)",
      "revenueRole": "core",
      "monetization": { "model": "one-time", "pricingUnit": "per device" },
      "kpiMappings": { "primaryMetrics": ["iPhone net sales"] },
      "sources": [
        {
          "source": "10-K",
          "section": "Item 7 — MD&A (Products and Services Performance — iPhone)",
          "quote": "iPhone net sales increased during 2025 compared to 2024 due to higher net sales of Pro models.",
          "url": "https://www.sec.gov/Archives/edgar/data/320193/000032019325000079/aapl-20250927.htm"
        }
      ]
    }
  ],
  "structuralAdvantages": [
    {
      "advantage": "Economics of Services mix (structurally higher gross margin)",
      "type": "scale-economy",
      "sources": [
        {
          "source": "10-K",
          "section": "Item 7 — MD&A (Gross Margin percentage)",
          "quote": "Services 75.4 % 73.9 % 70.8 %"
        }
      ]
    }
  ]
}
The signal it uncovers: a machine-readable thesis you can feed to a research agent or a screening pipeline without it inventing the underlying facts. An analyst clicks the URL, jumps to Item 7, and finds the quote in ten seconds. This is the difference between an AI fundamentals toy and infrastructure, and it is the same accession-trail wedge that runs through every dataset below, just applied to generated prose instead of a number.
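The verification step described above (strip the HTML, then confirm the quote literally appears) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not StockFit's pipeline: the names `quote_appears` and `normalize` are hypothetical, the filing HTML is assumed to be already downloaded, and whitespace normalization is an assumption about how a "literal" match would be made robust to line wrapping and non-breaking spaces.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def normalize(text):
    # Collapse every whitespace run (newlines, tabs, non-breaking spaces) to one space.
    return re.sub(r"\s+", " ", text).strip()


def quote_appears(filing_html, quote):
    """Return True only if the quote occurs verbatim in the filing's visible text."""
    parser = TextExtractor()
    parser.feed(filing_html)
    parser.close()
    # Join text nodes with a space so adjacent table cells do not run together.
    document_text = normalize(" ".join(parser.parts))
    return normalize(quote) in document_text
```

A paraphrase fails this check by construction, because only an exact substring passes. One known limitation of joining text nodes with a space: punctuation wrapped in its own inline tag would gain a stray space and break the match, so a production version would need to treat inline and block elements differently.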
2. Business segment operating income: find the real profit engine
Consolidated numbers hide where a company actually makes money. The segment footnote does not, and US filers tag it on the XBRL business-segments axis: revenue, operating income, assets, goodwill, depreciation, and capex, per segment, per period. StockFit reads that axis directly. The /api/financials/business-segmentation response for Amazon makes the point louder than any commentary.
GET /api/financials/business-segmentation?symbol=AMZN&period=annual
[
  {
    "fiscalYear": 2025,
    "segments": [
      { "name": "North America Segment", "metrics": { "revenue": 426305000000, "operatingIncome": 29619000000, "capex": 35919000000 } },
      { "name": "International Segment", "metrics": { "revenue": 161894000000, "operatingIncome": 4750000000, "capex": 7617000000 } },
      { "name": "Amazon Web Services Segment", "metrics": { "revenue": 128725000000, "operatingIncome": 45606000000, "capex": 96496000000 } }
    ]
  }
]
The signal it uncovers: AWS is about 18% of Amazon's segment revenue but 57% of its segment operating income. The retail business is the revenue; the cloud business is the profit. And the capex line tells the next chapter: AWS absorbed roughly 96.5 billion dollars of capital expenditure in fiscal 2025, more than the other two segments combined, the signature of an AI-infrastructure build-out that will land on the income statement as depreciation for years. A pod tracking segment-level operating margin and capex intensity sees the profit engine and the spending cycle quarters before either fully shows up in the consolidated print. Pair this with revenue segmentation for the demand side.
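The 18% and 57% figures above fall straight out of the response. The sketch below recomputes them from the trimmed Amazon payload shown earlier; the function name `segment_profile` and the derived field names are hypothetical, and capex intensity is taken here to mean capex divided by segment revenue, which is an assumption rather than a definition from the API.

```python
response = [
    {
        "fiscalYear": 2025,
        "segments": [
            {"name": "North America Segment",
             "metrics": {"revenue": 426305000000, "operatingIncome": 29619000000, "capex": 35919000000}},
            {"name": "International Segment",
             "metrics": {"revenue": 161894000000, "operatingIncome": 4750000000, "capex": 7617000000}},
            {"name": "Amazon Web Services Segment",
             "metrics": {"revenue": 128725000000, "operatingIncome": 45606000000, "capex": 96496000000}},
        ],
    }
]


def segment_profile(segments):
    """Per-segment share of revenue and operating income, plus margin and capex intensity."""
    total_revenue = sum(s["metrics"]["revenue"] for s in segments)
    total_operating_income = sum(s["metrics"]["operatingIncome"] for s in segments)
    rows = []
    for s in segments:
        m = s["metrics"]
        rows.append({
            "name": s["name"],
            "revenue_share": m["revenue"] / total_revenue,
            "operating_income_share": m["operatingIncome"] / total_operating_income,
            "operating_margin": m["operatingIncome"] / m["revenue"],
            "capex_intensity": m["capex"] / m["revenue"],  # assumed definition
        })
    return rows


profile = {row["name"]: row for row in segment_profile(response[0]["segments"])}
for name, row in profile.items():
    print(f"{name}: {row['revenue_share']:.1%} of revenue, "
          f"{row['operating_income_share']:.1%} of operating income")
```

Operating-income shares are only meaningful when every segment is profitable; with a loss-making segment in the mix the shares can exceed 100% or go negative, so a screen would want to handle that case explicitly.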
3. Revenue segmentation: geographic and product momentum
The same dimensional XBRL machinery splits revenue by geography and by product in a single call. The /api/financials/revenue-segmentation response partitions geography into country leaves, US states, filer-tagged regions, and residual catch-alls, and returns the product breakdown as a flat list. Apple, trimmed to the headline lines:
GET /api/financials/revenue-segmentation?symbol=AAPL&period=annual
{
  "fiscalYear": 2025,
  "geography": {
    "countries": [
      { "name": "United States", "value": 151790000000 },
      { "name": "China", "value": 64377000000 }
    ],
    "residuals": [{ "name": "Other Countries", "value": 199994000000 }]
  },
  "product": [
    { "name": "iPhone", "value": 209586000000 },
    { "name": "Services", "value": 109158000000 },
    { "name": "Wearables, Home and Accessories", "value": 35686000000 },
    { "name": "Mac", "value": 33708000000 },
    { "name": "iPad", "value": 28023000000 }
  ]
}
The signal it uncovers: product and geographic concentration and momentum that a single consolidated revenue line erases. iPhone is still roughly half of Apple's total revenue, Services is the high-margin growth engine, and China at 64 billion dollars is a discrete, trackable exposure rather than a footnote. Diff this dataset across periods and you get segment-level growth divergence, the leading indicator a top-line number averages away. Because every figure is tied to the dimensional fact it was tagged on, you can trace any number back to the exact 10-K, which is why the data behaves like a research-grade signal and not a scraped estimate. For the parsing details, see the revenue segmentation API teardown.
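A quick way to use the payload above is to turn it into concentration shares and to check that the two cuts reconcile. The sketch below works only on the trimmed Apple sample; the helper name `shares` is hypothetical, and summing just the `countries` and `residuals` buckets is an assumption that holds for this sample but may double count for filers that also report states or regions.

```python
response = {
    "fiscalYear": 2025,
    "geography": {
        "countries": [
            {"name": "United States", "value": 151790000000},
            {"name": "China", "value": 64377000000},
        ],
        "residuals": [{"name": "Other Countries", "value": 199994000000}],
    },
    "product": [
        {"name": "iPhone", "value": 209586000000},
        {"name": "Services", "value": 109158000000},
        {"name": "Wearables, Home and Accessories", "value": 35686000000},
        {"name": "Mac", "value": 33708000000},
        {"name": "iPad", "value": 28023000000},
    ],
}


def shares(items):
    """Map each line item to its fraction of the list total."""
    total = sum(item["value"] for item in items)
    return {item["name"]: item["value"] / total for item in items}


product_total = sum(item["value"] for item in response["product"])
geo_items = response["geography"]["countries"] + response["geography"]["residuals"]
geo_total = sum(item["value"] for item in geo_items)

product_shares = shares(response["product"])
geo_shares = shares(geo_items)

# The product cut and the geographic cut should describe the same revenue.
reconciles = product_total == geo_total
print(f"iPhone share: {product_shares['iPhone']:.1%}, "
      f"China share: {geo_shares['China']:.1%}, reconciles: {reconciles}")
```

Running the same computation on two fiscal years and differencing the shares gives the growth-divergence view the paragraph above describes.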
4. Compensation Actually Paid: what the board really pays for
Since fiscal years ending on or after December 16, 2022, the SEC has required a pay-versus-performance table in the proxy statement, and it is one of the least-mined disclosures in public markets. It introduced a metric called Compensation Actually Paid (CAP), which re-marks equity awards to their actual value change during the year instead of the grant-date fair value in the Summary Compensation Table. The gap between the two is a real alignment signal. StockFit extracts the whole table through /api/executives/compensation.
GET /api/executives/compensation?symbol=AAPL
[
  { "fiscalYear": 2025, "ceoTotalComp": 74294811, "ceoActuallyPaidComp": 108423733, "companyTsr": 233.88, "peerGroupTsr": 279.51 },
  { "fiscalYear": 2024, "ceoTotalComp": 74609802, "ceoActuallyPaidComp": 168980568, "companyTsr": 207.59, "peerGroupTsr": 198.69 }
]
The signal it uncovers: in fiscal 2024 Apple reported 74.6 million dollars of CEO total compensation, but Compensation Actually Paid was 169.0 million, more than double, because the equity granted in prior years appreciated. The reported number understated what the CEO actually earned by 94 million dollars. CAP moves with the stock, so it is a cleaner read on pay-for-performance alignment than the headline figure, and the company TSR and self-selected peer-group TSR ride alongside it for context. A companion call to /api/executives/performance-measures returns the single metric the board chose as most important; for Apple in fiscal 2025 that was Net Sales of 416.2 billion dollars, telling you exactly what management is held accountable for. This pairs naturally with the officer roster in the executive officers API.
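The gap between reported pay and Compensation Actually Paid is simple arithmetic on the rows above. The sketch below derives it for both fiscal years; `cap_gap` and the output field names are hypothetical, and the TSR spread is just the difference between the two TSR columns as returned.

```python
rows = [
    {"fiscalYear": 2025, "ceoTotalComp": 74294811, "ceoActuallyPaidComp": 108423733,
     "companyTsr": 233.88, "peerGroupTsr": 279.51},
    {"fiscalYear": 2024, "ceoTotalComp": 74609802, "ceoActuallyPaidComp": 168980568,
     "companyTsr": 207.59, "peerGroupTsr": 198.69},
]


def cap_gap(row):
    """Compare Compensation Actually Paid with the Summary Compensation Table total."""
    gap = row["ceoActuallyPaidComp"] - row["ceoTotalComp"]
    return {
        "fiscalYear": row["fiscalYear"],
        "gap": gap,                                                  # dollars
        "cap_to_reported": row["ceoActuallyPaidComp"] / row["ceoTotalComp"],
        "tsr_spread": round(row["companyTsr"] - row["peerGroupTsr"], 2),
    }


gaps = {row["fiscalYear"]: cap_gap(row) for row in rows}
for year, g in sorted(gaps.items()):
    print(f"FY{year}: gap ${g['gap']:,}  CAP/reported {g['cap_to_reported']:.2f}x  "
          f"TSR spread {g['tsr_spread']:+.2f}")
```

In this sample the CAP-to-reported ratio falls from about 2.3x to about 1.5x between fiscal 2024 and 2025 while the company's TSR moves from ahead of the peer group to behind it, which is the co-movement the alignment argument relies on.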
5. The 13D filing: catching an activist taking a position
When an investor acquires beneficial ownership of more than 5% of a voting class, they must file a Schedule 13D or 13G, and the form choice encodes intent. A 13G is for passive holders: index funds, ETFs, quiet institutions. A 13D is for control intent: activists, bidders, proxy contests. The /api/ownership/beneficial-owners/history endpoint returns the full trail, every filing per reporting person over time, not just the latest, with the filingType on each row. Southwest Airlines makes the contrast obvious:
GET /api/ownership/beneficial-owners/history?symbol=LUV
[
  { "reportingPersonName": "Elliott Investment Management L.P.", "filingType": "13D", "percentOfClass": 4.9, "reportDate": "2026-04-07", "accessionNumber": "0000902664-26-001872" },
  { "reportingPersonName": "PRIMECAP MANAGEMENT CO/CA/", "filingType": "13G", "percentOfClass": 9.58, "reportDate": "2026-05-13", "accessionNumber": "0000763212-26-000018" },
  { "reportingPersonName": "Franklin Resources, Inc.", "filingType": "13G", "percentOfClass": 9.0, "reportDate": "2026-04-29", "accessionNumber": "0000038777-26-000116" }
]
The signal it uncovers: Primecap and Franklin hold larger stakes than Elliott, but they filed 13G, so they are along for the ride. Elliott filed a 13D, the activist form, and Elliott's campaign reshaped the Southwest board. The form type flagged the difference, not the position size. Because the history endpoint keeps every filing instead of overwriting the latest, you can timestamp the exact moment a holder's form flips from 13G to 13D, the passive-to-activist switch a snapshot vendor can never reconstruct because it only ever stores the current row. Every entry carries its accession number for the audit trail. The full deep dive lives in the 13D and 13G beneficial ownership API post.
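Detecting the 13G-to-13D switch is a matter of ordering each holder's filings and comparing neighbours. The sketch below assumes rows shaped like the history response above; `find_form_flips` and `base_form` are hypothetical names, the holder "Example Capital LP" and its accession numbers are invented purely to exercise the logic, and mapping amendment labels such as "13D/A" back to the base form is an assumption about values the field might take.

```python
from collections import defaultdict


def base_form(filing_type):
    # Assumption: amendments may be labelled like "13D/A"; reduce them to the base form.
    if "13D" in filing_type:
        return "13D"
    if "13G" in filing_type:
        return "13G"
    return filing_type


def find_form_flips(history):
    """Return one record per holder transition from a 13G filing to a 13D filing."""
    by_holder = defaultdict(list)
    for row in history:
        by_holder[row["reportingPersonName"]].append(row)

    flips = []
    for holder, rows in by_holder.items():
        ordered = sorted(rows, key=lambda r: r["reportDate"])  # ISO dates sort as text
        for previous, current in zip(ordered, ordered[1:]):
            if base_form(previous["filingType"]) == "13G" and base_form(current["filingType"]) == "13D":
                flips.append({
                    "holder": holder,
                    "flipDate": current["reportDate"],
                    "accessionNumber": current["accessionNumber"],
                })
    return flips


# Hypothetical history for illustration only; not real filings.
toy_history = [
    {"reportingPersonName": "Example Capital LP", "filingType": "13G",
     "reportDate": "2025-01-10", "accessionNumber": "0000000000-25-000001"},
    {"reportingPersonName": "Example Capital LP", "filingType": "13D",
     "reportDate": "2025-09-15", "accessionNumber": "0000000000-25-000003"},
    {"reportingPersonName": "Example Capital LP", "filingType": "13G/A",
     "reportDate": "2025-06-02", "accessionNumber": "0000000000-25-000002"},
]

print(find_form_flips(toy_history))
```

The check only works on a full history: with a latest-row snapshot there is no previous filing to compare against, which is the point the paragraph above makes.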
6. Insider transaction codes: conviction vs compensation accounting
Every officer and director reports trades in their own stock on Form 4, and each transaction carries a one-letter code. Most consumers of insider data ignore it and end up treating an option exercise like a conviction buy. The code is the entire signal: P is an open-market purchase with personal cash, S is an open-market sale, M is an option or RSU exercise, F is shares withheld to pay tax, A is a grant, G is a gift. StockFit returns the code on every deduplicated row through /api/insider-transactions. Look at Apple's most recent insider activity:
GET /api/insider-transactions?symbol=AAPL&filter=all
[
  { "ownerName": "Newstead Jennifer", "officerTitle": "SVP, GC and Secretary", "transactionCode": "F", "shares": 16238, "transactionValue": 4813267.96, "transactionDate": "2026-06-15" },
  { "ownerName": "Borders Ben", "officerTitle": "Principal Accounting Officer", "transactionCode": "S", "shares": 116, "pricePerShare": 295.14, "transactionDate": "2026-06-16" },
  { "ownerName": "Borders Ben", "officerTitle": "Principal Accounting Officer", "transactionCode": "M", "shares": 240, "transactionDate": "2026-06-15" },
  { "ownerName": "LEVINSON ARTHUR D", "isDirector": true, "transactionCode": "G", "shares": 65000, "transactionDate": "2026-05-27" }
]
The signal it uncovers: the absence of one. Across Apple's recent filings the codes are F, S, M, and G: tax withholding, a small sale, an option or RSU exercise, and a gift. Not a single P. Without the code field you would log all of that as insider activity; with it you see there is zero open-market conviction buying, which is exactly the normal pattern for a mega-cap that pays its executives in equity. Filter to code P and you isolate the only transactions where an insider spent their own money, which is the foundation of the cluster-buy study in do insider cluster buys beat the S&P 500. The /api/insider-transactions/summary endpoint rolls the same codes into 3, 6, and 12-month aggregates per ticker.
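Filtering on the transaction code is a one-liner once the rows are in hand. The sketch below uses the four Apple rows shown above (trimmed to the fields it needs); the label table follows the code meanings given in this section, and the names `CODE_LABELS` and `open_market_buys` are hypothetical.

```python
from collections import Counter

# Labels follow the code descriptions given in this section.
CODE_LABELS = {
    "P": "open-market purchase",
    "S": "open-market sale",
    "M": "option or RSU exercise",
    "F": "shares withheld for tax",
    "A": "grant",
    "G": "gift",
}

rows = [
    {"ownerName": "Newstead Jennifer", "transactionCode": "F", "shares": 16238, "transactionDate": "2026-06-15"},
    {"ownerName": "Borders Ben", "transactionCode": "S", "shares": 116, "transactionDate": "2026-06-16"},
    {"ownerName": "Borders Ben", "transactionCode": "M", "shares": 240, "transactionDate": "2026-06-15"},
    {"ownerName": "LEVINSON ARTHUR D", "transactionCode": "G", "shares": 65000, "transactionDate": "2026-05-27"},
]


def open_market_buys(transactions):
    """Keep only code P rows: the insider paid cash in the open market."""
    return [t for t in transactions if t["transactionCode"] == "P"]


code_counts = Counter(t["transactionCode"] for t in rows)
buys = open_market_buys(rows)

for code, count in sorted(code_counts.items()):
    print(f"{code} ({CODE_LABELS.get(code, 'other')}): {count}")
print(f"open-market purchases: {len(buys)}")
```

A naive count would report four insider events here; the code-aware count reports zero purchases, which is the distinction a cluster-buy screen depends on.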
7. Fund flows and overlap from N-PORT: real flows, not estimates
Most fund-flow data you can buy is modeled: a vendor infers flows from price and shares-outstanding changes. StockFit reads the actual sales, redemptions, and reinvestments a fund discloses on Form N-PORT, the monthly portfolio report that registered funds (other than money market funds) file. The /api/fund/flows endpoint returns them straight from the filing. The companion /api/fund/overlap endpoint CUSIP-matches the latest N-PORT holdings of two funds to measure how much they actually share. Compare the Nasdaq-100 ETF against a tech-sector ETF:
GET /api/fund/overlap?symbol1=QQQ&symbol2=VGT
{
  "overlapCount": 37,
  "onlyIn1Count": 63,
  "onlyIn2Count": 285,
  "overlapWeight1": 48.79,
  "overlapWeight2": 70.80
}
The signal it uncovers: QQQ and VGT look like different products, but the 37 names they share are about 49% of QQQ's weight and roughly 71% of VGT's. Hold both and you are far less diversified than the two tickers suggest, a concentration and crowding read that matters for any portfolio built from ETF sleeves. Flows from the same N-PORT source add the demand side: real, filing-sourced sales and redemptions per month rather than a price-derived guess. Both are extracted from a filing format that is, in raw form, close to unusable, which is the whole point of this list. For a wider tour of fund disclosures, see the ETF deep-lens overview.
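The overlap fields above can be reproduced from two holdings lists keyed by CUSIP. The sketch below is one plausible way to compute them, not a description of the endpoint's internals: `fund_overlap` is a hypothetical name, holdings are assumed to be dictionaries mapping CUSIP to portfolio weight in percent, and the CUSIPs and weights are made-up placeholders.

```python
def fund_overlap(holdings_1, holdings_2):
    """Compare two funds' holdings, each a dict of CUSIP -> weight in percent."""
    shared = holdings_1.keys() & holdings_2.keys()
    return {
        "overlapCount": len(shared),
        "onlyIn1Count": len(holdings_1.keys() - shared),
        "onlyIn2Count": len(holdings_2.keys() - shared),
        # Weight of the shared names, measured inside each fund separately.
        "overlapWeight1": round(sum(holdings_1[c] for c in shared), 2),
        "overlapWeight2": round(sum(holdings_2[c] for c in shared), 2),
    }


# Placeholder portfolios for illustration only; not real CUSIPs or weights.
fund_a = {"AAA111111": 50.0, "BBB222222": 30.0, "CCC333333": 20.0}
fund_b = {"AAA111111": 40.0, "BBB222222": 35.0, "DDD444444": 25.0}

print(fund_overlap(fund_a, fund_b))
```

Reporting the shared weight from each fund's side is what produces the asymmetry in the QQQ and VGT example: the same 37 names are a much larger slice of the narrower fund.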
8. Sector-aware financial health scores from XBRL
The Altman Z-Score and Piotroski F-Score are textbook quality and distress factors, and every vendor computes a version. The difference is the inputs. StockFit computes them from SEC-sourced XBRL with the right line items per company, point-in-time, so a backtested score is never silently built on numbers that were later restated. The /api/financials/scores response also returns the nine Piotroski sub-criteria so you can see how the score was built, not just the total.
GET /api/financials/scores?symbol=AAPL
{
  "fiscalYear": 2025,
  "piotroskiFScore": 8,
  "altmanZScore": 2.42,
  "altmanZone": "grey"
}
The signal it uncovers: a reproducible quality and distress factor, with an honest caveat baked in. Apple scores 8 of 9 on Piotroski, strong, but its Altman Z lands at 2.42, the grey zone, not the safe zone, because the model uses book value of equity as the equity input and a company that has bought back enormous amounts of stock carries thin book equity. That is not a bug; it is the formula behaving correctly on its declared inputs, and seeing the sub-criteria and the input convention is what lets you trust the factor in a backtest instead of treating it as a black-box number. For the broader fundamentals workflow, see fundamental stock analysis with the API.
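Mapping a Z-Score to a zone is a pair of threshold comparisons. The sketch below uses the cutoffs commonly quoted for the original Altman model (1.81 and 2.99) as defaults; that choice is an assumption, since the post does not state which cutoffs StockFit applies, and other variants of the model use different thresholds, which is why they are parameters here. The function name `altman_zone` is hypothetical.

```python
def altman_zone(z_score, distress_cutoff=1.81, safe_cutoff=2.99):
    """Classify a Z-Score; the default cutoffs are an assumption, not the API's documented values."""
    if z_score < distress_cutoff:
        return "distress"
    if z_score > safe_cutoff:
        return "safe"
    return "grey"


print(altman_zone(2.42))
```

With these defaults, Apple's 2.42 lands in the grey zone, consistent with the `altmanZone` field in the response above.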
Wiring these signals into a lookahead-free backtest
Eight datasets, one common property: each value is tied to the accession number and filing date of the document it was extracted from. That is what makes them usable as alternative data instead of curiosities. To build an honest backtest, key every signal to its filing date, not the period it describes. A fiscal-2025 segment number is not tradable on the last day of fiscal 2025; it is tradable when the 10-K hits EDGAR weeks later, and the dateFiled field on the segment response tells you exactly when. The same rule applies to a 13D (use the report date), a Form 4 (the filing date), and a pay-versus-performance table (the proxy filing date).
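Keying on the filing date amounts to an as-of lookup: at any backtest date, use only records already filed, and among those take the most recently filed value for each fiscal period. The sketch below shows that rule on toy data; `known_as_of` is a hypothetical helper, the record shape (fiscalYear, dateFiled, value) is an assumption modelled on the dateFiled field mentioned above, and the numbers are placeholders, including a later restatement of the same fiscal year.

```python
from datetime import date


def known_as_of(records, as_of):
    """Return the latest-filed record per fiscal year among those filed on or before as_of."""
    visible = {}
    for record in sorted(records, key=lambda r: r["dateFiled"]):  # ISO dates sort as text
        if date.fromisoformat(record["dateFiled"]) <= as_of:
            # Later filings overwrite earlier ones for the same fiscal year.
            visible[record["fiscalYear"]] = record
    return visible


# Placeholder records for illustration only: an original figure and a later restatement.
records = [
    {"fiscalYear": 2024, "dateFiled": "2025-02-01", "value": 100.0},
    {"fiscalYear": 2024, "dateFiled": "2026-02-01", "value": 90.0},  # restated a year later
    {"fiscalYear": 2025, "dateFiled": "2026-02-01", "value": 120.0},
]

before_filing = known_as_of(records, date(2025, 1, 15))
mid_2025 = known_as_of(records, date(2025, 6, 30))
mid_2026 = known_as_of(records, date(2026, 6, 30))
print(before_filing, mid_2025[2024]["value"], mid_2026[2024]["value"])
```

A backtest dated mid-2025 sees the original 100.0 and nothing for fiscal 2025; the restated 90.0 only becomes visible after its own filing date, which is what keeps restatements from leaking backward.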
Cost matters for a small pod, so be clear about it: the free tier covers /api/filings (the filing index every signal is extracted from) and /api/financials/as-reported for raw traceability. The signals in this list sit on the Stock plan and the ETF plan, well below the price of a single conventional alt-data subscription. The point is not that SEC data is the only alternative data worth having; it is that for an emerging manager it is the highest signal per dollar available, and the only one that comes with a built-in audit trail. For how the same point-in-time discipline protects a fundamentals backtest, see the holy grail of stock backtesting.
This post describes data and tooling, not investment advice. Nothing here is a recommendation to buy or sell any security.
FAQ
What is alternative data from SEC filings?
Alternative data is any non-standard dataset that adds signal beyond price and headline fundamentals. SEC filings qualify when you extract the structured, less-mined disclosures most vendors skip: dimensional revenue and business segments, the pay-versus-performance table, the Schedule 13D and 13G ownership trail, Form 4 transaction codes, and Form N-PORT fund internals. The filings are public, but parsing them correctly at scale is the value, and StockFit returns each of these as clean JSON with the source accession number attached.
Is SEC EDGAR data free for quant research?
The raw filings on EDGAR are free to download. The cost is engineering: a 10-K is inconsistent HTML over a wall of XBRL tags, and N-PORT and DEF 14A are machine-hostile in raw form. StockFit's free tier exposes the filing index via /api/filings and raw as-reported financials, and the extracted alternative-data signals (segments, ownership, compensation, scores, fund flows) sit on low-cost paid plans, far below a conventional alt-data subscription.
What signals can you extract from SEC filings?
Segment-level revenue and operating income to find the real profit engine, geographic and product revenue momentum, Compensation Actually Paid versus reported pay as an alignment signal, the 13G-to-13D passive-to-activist switch, Form 4 open-market buying isolated by transaction code, N-PORT fund flows and portfolio overlap for crowding, and sector-aware Altman and Piotroski health scores. An AI economic model adds a verified, machine-readable business thesis on top.
What is Compensation Actually Paid (CAP)?
CAP is an SEC-mandated metric introduced in the 2022 pay-versus-performance rule. It re-marks an executive's equity awards to their actual value change during the year, rather than using the grant-date fair value reported in the Summary Compensation Table. Because CAP moves with the stock, it is a cleaner read on pay-for-performance alignment. The gap can be large: Apple reported 74.6 million dollars of CEO total compensation in fiscal 2024 but Compensation Actually Paid was 169.0 million.
What is the difference between a Schedule 13G and a 13D?
Both disclose beneficial ownership above 5% of a voting class, but the form encodes intent. A Schedule 13G is for passive holders with no control intent: index funds, ETFs, qualified institutions. A Schedule 13D is for investors who intend to influence or control the company: activists, bidders, proxy contests. In the StockFit API the filingType field classifies each filing, and the history endpoint preserves every filing so you can detect a holder switching from 13G to 13D.
How do you avoid lookahead bias with SEC filing data?
Key every signal to the filing date of the document it came from, not the fiscal period it describes. A fiscal-2025 number becomes tradable when the filing reaches EDGAR, often weeks after period end. StockFit attaches the accession number and filing date to every value, so you can reconstruct exactly what was knowable on any historical date and never leak a restated figure backward into a backtest.
Can a small quant fund afford alternative data?
Conventional alternative data (credit-card panels, satellite, Bloomberg) runs into five and six figures, out of reach for a small pod. SEC-derived signals are the highest signal-per-dollar alternative: the source is free and public, StockFit's extracted datasets sit on low-cost plans, and every value carries an audit trail the expensive panels do not. For an emerging manager it is the most defensible place to start.
Which StockFit endpoints expose these alternative data signals?
The economic model is /api/company/economic-model; segments are /api/financials/business-segmentation and /api/financials/revenue-segmentation; compensation is /api/executives/compensation and /api/executives/performance-measures; ownership is /api/ownership/beneficial-owners/history; insider data is /api/insider-transactions; fund signals are /api/fund/flows and /api/fund/overlap; and health scores are /api/financials/scores. The free /api/filings endpoint returns the underlying filing index.
Ready to build?
Free API key, no credit card. The free tier covers /api/filings and /api/financials/as-reported; the extracted signals in this post sit on the Stock and ETF plans.