
Point-in-Time Fundamental Data for Backtesting (and Why Restated Numbers Wreck It)

How lookahead bias from restated financials, survivorship bias, and reporting lag corrupt backtests, and how point-in-time SEC fundamentals fix them.

Published June 16, 2026 · 11 min read · StockFit Engineering

A backtest is a claim about the past: if I had run this strategy with the information available at the time, here is what would have happened. The entire claim rests on seven words: with the information available at the time. The moment a single input leaks a fact that nobody could have known on the simulated date, the backtest stops measuring a strategy and starts measuring your data pipeline's bugs.

Fundamental data is where this leaks most quietly. Prices are easy: the close on a given day is the close on that day, forever. A company's financial statements are not. They are filed weeks after the period they describe, they get restated months or years later, and the companies that go bankrupt vanish from most vendors' universes entirely. Point-in-time fundamental data is the discipline of pinning every value to the date it actually became public, so a backtest can only ever see what a trader could have seen. This post shows, with live SEC data, the three ways non-point-in-time fundamentals corrupt a backtest, and what the fix looks like.

Why point-in-time fundamental data is non-negotiable for backtesting

Most fundamentals APIs return one number per line item per period: the latest, best, most-corrected value they have. That is exactly the right answer for a screener showing you a company today, and exactly the wrong answer for a simulation set in 2019. A backtest needs the value as it stood on the test date, not the value as it stands now.

Three distinct problems break that, and they are easy to confuse:

  • Restatements (lookahead bias). The number you have today is not the number that was filed. A company reported one figure, then revised it. Using the revised figure before the revision was public is lookahead bias in its purest form.
  • Reporting lag. Even the first, original number did not exist on the last day of the fiscal period. It arrived weeks later when the 10-K was filed. Joining the annual figure to the period-end date hands your strategy a report that had not been written yet.
  • Survivorship bias. If your universe only contains companies that still trade, every test you run is conditioned on survival. The names that went to zero are the ones a strategy most needs to see.

None of these is an exotic edge case. They show up on Apple, on JPMorgan, on every name in the S&P 500. Let us walk through each one with real filings.

Lookahead bias: when restated financials leak the future

Companies restate. Sometimes it is a small reclassification, sometimes it is a full earnings restatement after a material weakness. Either way, the value that lands in most databases is the corrected one, and the correction is stamped with today, not with the date it was actually filed.

Plug Power (PLUG) is a clean, well-documented example. In its original fiscal-2018 Form 10-K, filed in March 2019, the company reported a net loss to common stockholders of about $78.1 million, or a diluted loss of $0.36 per share. In 2021, as part of a broader restatement, those figures were revised: the same fiscal-2018 period became a net loss of about $85.7 million, a diluted loss of $0.39 per share. That revised number did not become public until the filing dated May 14, 2021, roughly two years after the original 10-K.

Plug Power FY2018 net loss: the number changed two years after the fact
Net loss to common stockholders, as originally filed (Mar 2019) vs as restated (filed May 2021). Diluted EPS moved from -$0.36 to -$0.39. Source: StockFit /api/financials/income-statement audit trail, as-of 2026-06-16.

Now picture a value strategy backtested over 2019 and 2020. On any simulated date in that window, the only fiscal-2018 loss that existed was $0.36 per share. A dataset that silently serves today's restated $0.39 hands your 2019 simulation a number that would not be filed for another two years. The strategy looks like it knew something. It did not. That is lookahead bias, and it is the worst kind of error: unlike most biases it does not just add noise. It makes the results wrong in a direction you cannot detect from inside the backtest.

The fix is not to throw away the restatement. The fix is to keep both values, each tagged with the filing that carried it, so the backtest can ask: what was the value as of this date? StockFit does this on every fundamentals response. The /api/financials/income-statement endpoint returns the resolved value plus a sources map keyed by SEC accession number, and when a later filing changed a fact, that filing's entry carries a before delta holding the prior value. Here is the trimmed, real response for Plug Power's fiscal-2018 net loss and EPS:

json
{
  "period": "2018-12-31",
  "fiscalPeriod": "FY",
  "dateFiled": "2019-03-13",
  "facts": {
    "netIncomeCommonStockholders": -85660000,
    "epsDiluted": -0.39
  },
  "sources": {
    "0001558370-20-002267": {
      "type": "10-K", "dateFiled": "2020-03-10", "amendment": false,
      "facts": { "netIncomeCommonStockholders": {}, "epsDiluted": {} }
    },
    "0001558370-21-007147": {
      "type": "10-K", "dateFiled": "2021-05-14", "amendment": false,
      "facts": {
        "netIncomeCommonStockholders": { "before": -78115000 },
        "epsDiluted": { "before": -0.36 }
      }
    }
  }
}

Read it from the bottom up. The filing dated 2021-05-14 changed both facts, and the before values record what they were beforehand: a $78.1 million loss and a $0.36 diluted EPS, the originally reported numbers. To reconstruct what was knowable on any past date, you walk the sources by their filing dates and roll back any before whose filing postdates your test date. The full algorithm, and the rest of the audit-trail data model, is the subject of our deep dive on the point-in-time fundamentals API with a full amendment audit trail. If you want the raw, unmapped fact tree instead of the curated statement, the same provenance is on /api/financials/as-reported. You can confirm both accession numbers yourself on SEC EDGAR.
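A minimal sketch of that rollback, assuming only the response shape shown above. The function name value_as_of is ours, for illustration; it is not part of the API and not necessarily the exact algorithm from the deep dive.

```python
from datetime import date

def value_as_of(period: dict, fact: str, as_of: date):
    """Return the value of `fact` that was public on `as_of`, or None."""
    # Nothing is knowable before the original filing date.
    if date.fromisoformat(period["dateFiled"]) > as_of:
        return None
    value = period["facts"][fact]  # latest, fully restated value
    # Visit filings newest first so later changes are undone first.
    filings = sorted(period["sources"].values(),
                     key=lambda s: s["dateFiled"], reverse=True)
    for filing in filings:
        if date.fromisoformat(filing["dateFiled"]) <= as_of:
            break  # this filing, and every older one, was already public
        delta = filing["facts"].get(fact, {})
        if "before" in delta:
            value = delta["before"]  # undo a change filed after as_of
    return value

# Trimmed Plug Power FY2018 response, copied from the example above.
plug_fy2018 = {
    "period": "2018-12-31",
    "dateFiled": "2019-03-13",
    "facts": {"netIncomeCommonStockholders": -85660000, "epsDiluted": -0.39},
    "sources": {
        "0001558370-20-002267": {
            "type": "10-K", "dateFiled": "2020-03-10", "amendment": False,
            "facts": {"netIncomeCommonStockholders": {}, "epsDiluted": {}},
        },
        "0001558370-21-007147": {
            "type": "10-K", "dateFiled": "2021-05-14", "amendment": False,
            "facts": {
                "netIncomeCommonStockholders": {"before": -78115000},
                "epsDiluted": {"before": -0.36},
            },
        },
    },
}

print(value_as_of(plug_fy2018, "epsDiluted", date(2020, 6, 30)))  # original figure
print(value_as_of(plug_fy2018, "epsDiluted", date(2021, 5, 14)))  # restated figure
```

A simulated date in mid-2020 sees the originally filed -0.36, the restatement's own filing date sees -0.39, and any date before 2019-03-13 sees nothing at all.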

The reporting lag every backtest has to respect

Restatements are the dramatic failure. The mundane one is more common and just as damaging: the original number did not exist on the period-end date either. A fiscal year ends, and the 10-K describing it is filed weeks later. If your join key is the period-end date, you have given the strategy a document that had not been written.

This is not a small-cap problem. Here are nine of the largest, fastest-filing companies in the US market, with the gap between each one's fiscal year-end and the day its 10-K actually hit SEC EDGAR. Every value is live from the API.

Even the fastest mega-cap filers leave a 25 to 49 day blackout
Days between fiscal year-end and the 10-K hitting SEC EDGAR. Source: StockFit /api/financials/income-statement, as-of 2026-06-16.

The gap ranges from 25 days (NVIDIA) to 49 days (Exxon Mobil), averaging about 38 days. And these are the disciplined filers. A large accelerated filer has up to 60 days after fiscal year-end to file a 10-K and 40 days after a quarter-end for a 10-Q, per the SEC's reporting deadlines; smaller companies get longer. The practical rule is simple: a fundamental fact is unusable in a simulation until its filing date, full stop.

That is why every period StockFit returns carries a top-level dateFiled, always the original filing date of the 10-K, 10-Q, 20-F, or 40-F, never an amendment date. The lag is not metadata you have to go find; it is the first thing the response tells you. Pair it with the restatement trail above and you have both halves of the point-in-time problem: when the original number arrived, and when each correction arrived.
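As a rough sketch of the filing-date rule, the helper below selects the most recent fiscal period a simulation is allowed to see on a given date. It assumes each period is a dict with the period and dateFiled fields shown earlier. The function names and the FY2017 row are hypothetical, included only to make the example runnable.

```python
from datetime import date

def reporting_lag_days(period: dict) -> int:
    """Days between the fiscal period end and the original filing."""
    filed = date.fromisoformat(period["dateFiled"])
    ended = date.fromisoformat(period["period"])
    return (filed - ended).days

def latest_visible(periods: list[dict], as_of: date):
    """Most recent period whose original filing was public on `as_of`."""
    visible = [p for p in periods
               if date.fromisoformat(p["dateFiled"]) <= as_of]
    # ISO dates sort correctly as strings; None if nothing was filed yet.
    return max(visible, key=lambda p: p["period"], default=None)

periods = [
    {"period": "2017-12-31", "dateFiled": "2018-03-01"},  # hypothetical row
    {"period": "2018-12-31", "dateFiled": "2019-03-13"},  # dates from the Plug Power example
]

print(reporting_lag_days(periods[1]))
print(latest_visible(periods, date(2019, 1, 15))["period"])
```

On 2019-01-15 the fiscal year has already ended, but the helper still returns the 2017 period, because the fiscal-2018 10-K would not be filed for roughly two more months.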

Survivorship bias: the dead companies your universe drops

The third corruptor is the universe itself. If you build a backtest on the current S&P 500, or on whatever tickers your vendor still serves, you have quietly conditioned the whole study on survival. The companies that cratered, got acquired in distress, or filed for bankruptcy are simply gone, and they are exactly the observations a strategy most needs in order to be honest about downside.

2,190 fully-delisted companies are still served by the API: names with no active listing on any exchange, including OTC. Source: /api/company/delisted, as-of 2026-06-16.

Take WeWork. It came public through the BowX SPAC, traded as WE on the NYSE, and went dark in 2023 after filing for bankruptcy. In a survivorship-biased universe it never existed, so a strategy that would have bought it in 2021 and ridden it to zero shows no loss at all. The API still resolves it: the entity is flagged delisted with its last venue and date preserved, retrievable from /api/company/delisted, so you can build a universe that includes the casualties rather than pretending they never listed. For the mechanics of how a delisting is detected from the filing record, our guide to SEC forms covers the Form 25-NSE removal notice we treat as authoritative.
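One way to use that list is to rebuild the tradable universe for each simulated date. The sketch below is an assumption-heavy illustration: the record shape (ticker, listed_on, delisted_on) and both sample rows are hypothetical, not the documented response of /api/company/delisted, so map your own fields onto it.

```python
from datetime import date

def universe_as_of(companies: list[dict], as_of: date) -> list[str]:
    """Tickers that had an active listing on `as_of`, dead names included."""
    members = []
    for company in companies:
        listed = date.fromisoformat(company["listed_on"])
        delisted = company.get("delisted_on")  # None while still listed
        still_trading = delisted is None or date.fromisoformat(delisted) > as_of
        if listed <= as_of and still_trading:
            members.append(company["ticker"])
    return sorted(members)

# Hypothetical records: one survivor, one name that later delisted.
companies = [
    {"ticker": "LIVE", "listed_on": "2015-01-02", "delisted_on": None},
    {"ticker": "DEAD", "listed_on": "2021-10-21", "delisted_on": "2023-11-07"},
]

print(universe_as_of(companies, date(2022, 6, 30)))
print(universe_as_of(companies, date(2024, 1, 2)))
```

The delisted name is in the 2022 universe and absent from the 2024 one, which is what lets a simulation buy it and then carry the loss.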

Survivorship is also why honest event studies state their universe up front. When we tested whether insider cluster buys beat the S&P 500, the universe was a current-constituents snapshot, and we said so, because pretending otherwise would have inflated the result.

Normalized fundamentals: fiscal years, Q4, and cross-company comparability

Point-in-time correctness gets you data that is honest about time. Normalization gets you data that is comparable across companies. Raw SEC XBRL is neither out of the box, and a backtest that skips this step inherits a different set of silent errors.

Three traps stand out. First, concept proliferation: the same economic line item is tagged with thousands of overlapping us-gaap concepts plus each issuer's own extension tags, so a naive join misses values it should have matched. Second, fiscal-calendar drift: Apple's year ends in late September, Microsoft's in June, Costco's in late August, and lining those up by label rather than by resolved period date silently compares mismatched windows. Third, the missing fourth quarter: companies do not file a standalone Q4. It has to be reconstructed from the full year minus the nine-month figure, on the correct canonical field, or your quarterly series is wrong by construction.
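The Q4 arithmetic itself is simple. The sketch below shows it with made-up numbers, assuming you hold the full-year and nine-month cumulative figures for the same canonical fields. The function name and values are illustrative, not StockFit's implementation.

```python
def derive_q4(full_year: dict, nine_months: dict, flow_fields: list[str]) -> dict:
    """Q4 = full year minus the nine-month cumulative figure, per flow field."""
    return {
        field: full_year[field] - nine_months[field]
        for field in flow_fields
        if field in full_year and field in nine_months
    }

# Illustrative figures only, in thousands of dollars.
full_year = {"revenue": 1000, "netIncome": -90}
nine_months = {"revenue": 720, "netIncome": -60}

q4 = derive_q4(full_year, nine_months, ["revenue", "netIncome"])
print(q4)
```

This applies only to flow items such as revenue and net income. Balance-sheet values are point-in-time and need no subtraction, and per-share figures generally cannot be derived this way because the weighted share count differs between periods. For point-in-time purposes, a derived Q4 becomes visible only on the 10-K's filing date, since the full-year figure did not exist before then.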

StockFit serves both layers so you can pick per use case. The /api/financials/as-reported endpoint returns the raw fact tree exactly as filed, including the company-specific extension concepts, for audit and validation. The curated statements like /api/financials/income-statement map every issuer onto a stable canonical schema, classify each fact into the right fiscal period, and reconstruct Q4 at read time, so the same query shape works identically across thousands of names. The full walkthrough of turning messy XBRL into a clean backtest-ready table is in our post on converting XBRL financials to JSON for backtesting. Crucially, normalization and point-in-time are not a tradeoff here: the curated statements carry the same dateFiled and before audit trail as the raw facts, so you get comparability without giving up the timeline.

Building a reproducible point-in-time snapshot

Put the three pieces together and the recipe for a bias-free fundamentals snapshot is short:

  1. Never use a fact before its original dateFiled. That eliminates the reporting-lag leak. The annual number is invisible to your simulation until the day the 10-K was filed.
  2. Never use a restated value before the restatement's filing date. Walk the sources map; if a before delta was filed after your test date, roll the value back to it. That eliminates lookahead bias from restatements.
  3. Include the dead names. Build the universe from active and delisted companies, using /api/company/delisted so survivors and casualties are both in the sample.

Because the filing dates and the before deltas ship inside the same response, the unwind logic is a few lines of code rather than a research project, and it is fully reproducible: the same query against the same audit trail yields the same point-in-time snapshot every time. That reproducibility is the whole point. A backtest you cannot reproduce is an anecdote.

All of this is the foundation under StockFit's fundamental data API: standardized statements parsed directly from SEC EDGAR, with the original filing date and the full amendment trail attached to every value, on the free tier. The implementation details of the audit trail, with copy-pasteable unwind code, live in the point-in-time fundamentals deep dive.

FAQ

What is point-in-time data in backtesting?

Point-in-time (PIT) data is financial data tagged with the date it actually became public, not just the date it describes. A fiscal-2018 income statement is point-in-time correct only if the response also tells you the original 10-K filing date and flags any later restatement with its own filing date. PIT data lets a backtest see only what a trader could have seen on the simulated date, which is the requirement for the result to mean anything.

How does lookahead bias from restated financials affect a backtest?

When a company restates a figure, most databases overwrite the original with the corrected value and stamp it as current. A backtest set before the restatement's filing date then uses a number that did not exist yet. Plug Power's fiscal-2018 diluted loss was originally $0.36 per share and was restated to $0.39 in a filing dated May 14, 2021. Any 2019 or 2020 simulation that uses $0.39 is using information from up to two years in its own future. The fix is to keep both values, each with its filing date, and roll back to the original for dates before the restatement.

What is survivorship bias in financial data?

Survivorship bias is the error of building a study only on companies that still exist. If your backtest universe is today's tickers, every bankrupt, acquired, or delisted name has been removed, so the test never sees the worst outcomes and overstates returns. Including delisted companies, like WeWork after its 2023 bankruptcy, is what keeps a universe honest. StockFit retains delisted entities and exposes them on the /api/company/delisted endpoint.

Why do I need timestamped financial statement data for backtesting?

Because a financial statement describes a period that already ended but does not become public until the filing is submitted, often 25 to 60 days later for annual reports. Without the filing timestamp, you cannot know when the number was actually available, and any join to prices on the period-end date introduces lookahead bias. Timestamped data attaches the original filing date to every value so the simulation respects the real availability of each fact.

Does StockFit provide as-originally-reported (unrestated) financials?

Yes. The resolved value is the latest, but every restated fact carries a before delta in the sources map holding the value the prior filing reported, each tagged with its SEC accession number and filing date. Walking that chain lets you recover the figure as originally filed for any past date. The raw fact tree is on /api/financials/as-reported and the curated statements carry the same audit trail.

What is the difference between as-reported and normalized fundamentals?

As-reported preserves the original XBRL concept names and values exactly as the issuer filed them, including company-specific extension tags, which is ideal for audit. Normalized maps those issuer-specific concepts onto a stable canonical schema (revenue, netIncome, epsDiluted, and so on), classifies each fact into the correct fiscal period, and reconstructs the missing fourth quarter, so queries work identically across thousands of companies. StockFit returns both, and both carry the point-in-time filing dates and amendment trail.

Ready to build?

Free API key, no credit card. Every endpoint mentioned in this post is available on the free tier.
