
Why Attestd

A technical walkthrough of the design decisions behind Attestd: why querying NVD directly is not sufficient, how the two-signal architecture works, and what the confidence score and determinism guarantee actually mean in practice.

01 / design philosophy

Sensor, not a gate

Attestd is read-only. It returns a structured signal. It does not modify pipelines, enforce policies, or carry an accuracy SLA. What you do with the signal is your decision. That is intentional.

Tools like Snyk, Grype, and Trivy are gates: they scan a target and output a pass/fail verdict with built-in policy enforcement. The gate model requires a scanner agent deployed per environment, per repo, and per language ecosystem. It produces output tuned for human review: long HTML reports, suppression lists, false-positive noise.

Attestd is a sensor. A single HTTP call returns a machine-readable risk state for a specific product and version. It is injectable into any existing pipeline: a CI step, an AI agent tool, or a pre-deployment script, without requiring a scanner binary or a policy configuration file. The caller owns the policy. Attestd owns only the data layer.

Property               | Gate (Snyk / Grype / Trivy)       | Sensor (Attestd)
Deployment model       | Scanner binary per environment    | Single API call
Output format          | Report / diff / suppression list  | Deterministic JSON
Policy enforcement     | Built-in, config-driven           | Caller-owned
Integration surface    | CI plugin or sidecar              | HTTP GET, Python SDK
AI agent compatibility | Not designed for tool loops       | Direct tool use
Scope                  | Full dependency tree scan         | Single product x version

The two approaches are not mutually exclusive. Grype scans a lockfile; Attestd answers whether a specific version is safe to deploy right now, in a structured form that an LLM or CI script can act on without parsing.
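The split in responsibilities can be made concrete: one GET returns the data, and the caller expresses policy as ordinary code over the response. A minimal sketch in Python; the base URL is a placeholder, and the gating thresholds in `policy` are illustrative caller choices, not anything Attestd prescribes.

```python
import json
import urllib.request

def check(product: str, version: str,
          base: str = "https://api.attestd.example") -> dict:
    """Single HTTP GET: the entire integration surface of the sensor model.
    Base URL and query parameter names are assumptions for illustration."""
    url = f"{base}/v1/check?product={product}&version={version}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def policy(result: dict) -> str:
    """Caller-owned policy over the returned signal. The thresholds here
    are one team's choice; Attestd owns only the data layer."""
    if result["risk_state"] in ("critical", "high"):
        return "block"
    if result["risk_state"] == "elevated":
        return "warn"
    return "allow"
```

Because the sensor returns data rather than a verdict, two teams can run different `policy` functions against identical responses without any suppression-list machinery.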

02 / data quality

Why NVD alone is insufficient

NVD is the correct data source for CVE information. Querying it directly, without preprocessing, produces unreliable version-level assessments for three independent reasons.

Sentinel records (missing version ranges)

A large fraction of NVD entries name a product as affected but omit the version range fields (versionStartIncluding, versionEndIncluding). These are sentinel records: they confirm a CVE exists but provide no information about which versions are affected. If you query NVD for nginx and include sentinel records in your count, you cannot determine whether your specific version is affected. You only know that some version of nginx was affected at some point.

Attestd measures the sentinel rate for every product before adding it to coverage. Products where more than 50% of CVE records are sentinels are excluded from the supported set because the data is too sparse to produce reliable version-level output. The threshold and reasoning are documented on the products page.
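The sentinel check reduces to a filter over a product's CPE match records. A sketch: the field names follow the NVD API's version-range keys, and the 50% threshold is the one stated above; the exact record shape Attestd ingests is an assumption.

```python
# NVD version-range fields; a record carrying none of them is a sentinel.
RANGE_FIELDS = (
    "versionStartIncluding", "versionStartExcluding",
    "versionEndIncluding", "versionEndExcluding",
)

def is_sentinel(cpe_match: dict) -> bool:
    """True when the record names the product but gives no version range."""
    return not any(cpe_match.get(f) for f in RANGE_FIELDS)

def sentinel_rate(records: list) -> float:
    if not records:
        return 1.0  # no records at all: maximally sparse
    return sum(is_sentinel(r) for r in records) / len(records)

def supported(records: list, threshold: float = 0.5) -> bool:
    """Exclude products where more than 50% of records are sentinels."""
    return sentinel_rate(records) <= threshold
```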

CPE namespace fragmentation

NVD organises vulnerability data under CPE (Common Platform Enumeration) vendor namespaces. When a vendor is acquired or renames itself, NVD does not merge the old and new namespaces. It maintains both independently. Querying only the current namespace silently omits all CVEs published under the previous one.

nginx (F5 acquisition, 2019)
nginx:nginx + f5:nginx

MySQL (Oracle acquisition, 2010)
mysql:mysql + oracle:mysql

Redis (vendor rename, 2021)
redislabs:redis + redis:redis

Log4j (pre/post Apache formalisation)
apache:log4j + apache:log4j2

Attestd queries both namespaces for affected products and deduplicates on CVE ID before synthesis. Failing to do this produces a coverage gap for any vulnerability published under the legacy namespace.
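The merge itself is a union over namespaces with deduplication on CVE ID. A sketch, with `fetch_cves` standing in for whatever NVD client is in use and the alias map seeded from the pairs listed above:

```python
# Namespace pairs from the examples above; which side is "current" vs
# "legacy" is immaterial to the merge -- both get queried.
NAMESPACE_ALIASES = {
    "nginx:nginx": ["f5:nginx"],
    "mysql:mysql": ["oracle:mysql"],
    "redis:redis": ["redislabs:redis"],
}

def merged_cves(product_ns: str, fetch_cves) -> list:
    """Query the current and legacy CPE namespaces, dedup on CVE ID.
    fetch_cves(namespace) -> list of {"id": ...} dicts (illustrative shape)."""
    namespaces = [product_ns] + NAMESPACE_ALIASES.get(product_ns, [])
    seen, merged = set(), []
    for ns in namespaces:
        for cve in fetch_cves(ns):
            if cve["id"] not in seen:
                seen.add(cve["id"])
                merged.append(cve)
    return merged
```

Skipping the alias lookup is exactly the silent coverage gap described above: a CVE filed only under `f5:nginx` never appears in a query scoped to `nginx:nginx`.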

NIST enrichment degradation (April 2026)

NIST publicly confirmed in April 2026 that NVD enrichment processing fell critically behind schedule. The enrichment phase is what adds CVSS scores, CWE classification, and CPE applicability statements to newly published CVEs. Thousands of CVEs entered the NVD API without CVSS base scores, without CWE identifiers, and without any CPE data. A direct NVD API consumer receives these incomplete records with no in-band signal that enrichment is missing.

Attestd's ingestion pipeline validates enrichment completeness on every record. CVEs without a CVSS score or without at least one CPE applicability statement are held back from synthesis until the record is enriched. This means Attestd's coverage may lag NVD publication by hours or days for recently published CVEs, but every record that appears in a response has passed the enrichment threshold.
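The enrichment gate is a predicate over each ingested record: hold back anything lacking a CVSS base score or at least one CPE applicability statement. A sketch; the flat field names here are assumptions, not the NVD API's exact nesting.

```python
def enriched(record: dict) -> bool:
    """True once the record has passed NVD's enrichment phase:
    a CVSS score and at least one CPE applicability statement."""
    return bool(record.get("cvss_base_score")) and bool(record.get("cpe_matches"))

def partition(records: list) -> tuple:
    """Split a batch into synthesis-ready records and held-back ones.
    Held-back records re-enter the pipeline once NVD enriches them."""
    ready = [r for r in records if enriched(r)]
    held = [r for r in records if not enriched(r)]
    return ready, held
```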

See also: Attestd vs. NVD direct access.

03 / architecture

Two-signal architecture

A single GET /v1/check call returns two orthogonal security signals. They are independent: neither implies nor contradicts the other.

Signal 1: risk_state

CVE-derived severity for the queried version. Source: NVD + CISA KEV. Possible values: critical high elevated low none. Present for all supported products. For monitored PyPI packages where no version-range CVEs match, risk_state will be "none". That result does not mean the package is safe.

Signal 2: supply_chain.compromised

Malicious publish detection for PyPI packages. Source: OSV MAL- advisories and PyPI registry security yanks. Present only when the queried product is a monitored PyPI package. This signal is independent of CVE history. A package can be supply-chain compromised without any CVE being filed, and a package can have CVEs without any supply-chain compromise.

risk_state: "none" and supply_chain.compromised: true can co-exist. A version with zero CVEs is still dangerous if it contains a known backdoor. Both signals must be checked independently. Blocking only on risk_state misses supply chain attacks entirely.
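A gate that honours the orthogonality must fail closed on either signal independently. A minimal sketch over the response fields; requiring `risk_state` to be exactly "none" is a deliberately strict caller choice, not an Attestd rule.

```python
def safe_to_deploy(result: dict) -> bool:
    """Check both orthogonal signals independently; either one can block."""
    if result["risk_state"] != "none":
        return False
    # supply_chain is present only for monitored PyPI packages.
    sc = result.get("supply_chain")
    if sc is not None and sc["compromised"]:
        return False
    return True
```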

The following is a real API response for litellm 1.82.7: a version with no CVE risk but a confirmed supply chain compromise.

json
{
  "product": "litellm",
  "version": "1.82.7",
  "supported": true,
  "risk_state": "none",
  "risk_factors": [],
  "actively_exploited": false,
  "remote_exploitable": false,
  "authentication_required": false,
  "patch_available": false,
  "fixed_version": null,
  "confidence": 1.0,
  "cve_ids": [],
  "last_updated": "2026-04-27T16:07:47.644177Z",
  "supply_chain": {
    "compromised": true,
    "sources": ["osv", "registry"],
    "malware_type": "backdoor",
    "description": "TeamPCP supply chain attack: credential stealer in proxy_server.py",
    "advisory_url": "https://docs.litellm.ai/blog/security-update-march-2026",
    "compromised_at": "2026-03-24T10:39:00Z",
    "removed_at": "2026-03-24T16:00:00Z"
  }
}

The TeamPCP attack injected a credential stealer into proxy_server.py. The version was yanked from PyPI within hours but remained installable from cached mirrors for days. An agent checking only risk_state would have approved this version as safe.

Full list of monitored packages and OSV source details: Supply chain integrity.

04 / data quality signal

Confidence score semantics

The confidence field is a float between 0 and 1. It does not indicate the probability that the product is vulnerable. risk_state already reflects that. Confidence indicates how reliable the supporting detail fields are: remote_exploitable, authentication_required, risk_factors.

Range       | Source              | Meaning
>= 0.85     | LLM-extracted       | Detail fields extracted from CVE prose and cross-checked against CVSS vectors. High corroboration between text and metadata.
0.5 to 0.84 | DB-derived fallback | risk_state is correct (worst-case from CVSS). Detail fields were inferred from CVSS AV/AC/PR vectors rather than extracted from CVE text. LLM extraction was unavailable or returned invalid output.
< 0.5       | Thin data           | Fewer usable version ranges than the reliability threshold. Risk state is conservative but the dataset is sparse. Treat with extra caution.

The key guarantee: risk_state is always the worst-case from available data, regardless of confidence tier. A low confidence score means the supporting detail fields are less reliable. It does not mean the risk classification is optimistic. When in doubt, act on risk_state directly.

When multiple CVE version ranges match the queried version, the minimum confidence across all matching rows is returned. Full aggregation semantics are documented in the Response Field Reference.
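The tier boundaries and the aggregation rule can both be stated in a few lines. A sketch, assuming each matching synthesis row carries its own per-row confidence; tier names are shorthand for the table above:

```python
def confidence_tier(c: float) -> str:
    """Map a confidence value to the tier described in the table above."""
    if c >= 0.85:
        return "llm-extracted"
    if c >= 0.5:
        return "db-derived"
    return "thin-data"

def response_confidence(matching_rows: list) -> float:
    """When several CVE version ranges match the queried version,
    the response carries the minimum confidence across matching rows."""
    return min(row["confidence"] for row in matching_rows)
```

The minimum is the conservative choice: one weakly-supported matching row is enough to flag the whole response's detail fields as less reliable.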

05 / query-time behaviour

Determinism guarantee

The same product@version pair always returns the same risk_state within the same ingestion snapshot. There is no probabilistic element in API responses.

Attestd uses LLM extraction internally, but the LLM runs at ingestion time, not at query time. When NVD data is ingested, CVE records are processed, cross-checked, and stored as structured synthesis results. A query against the API is a lookup against those pre-computed results. There is no generative step at query time.

This matters for two use cases in particular:

CI gates

A deployment pipeline that calls Attestd twice for the same version will receive the same risk_state both times. The gate does not flap.

AI agent tool loops

An LLM agent that calls the check tool multiple times for the same dependency, due to reasoning loops or retries, will not see inconsistent security verdicts that cause confusion in the agent's decision-making.

The response will change when a new ingestion run completes, for example when a new CVE is published for the queried product or when CISA adds a CVE to the KEV catalog. The last_updated field in the response shows the timestamp of the most recent synthesis run for that product. The X-Attestd-Knowledge-Age response header shows elapsed time since then. NVD and CISA data is re-ingested every 6 hours.
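Callers that care about snapshot freshness can compare the knowledge age against their own staleness budget. A sketch; the 6-hour cadence is from this page, but treating the header value as a number of seconds is an assumption about its format:

```python
SIX_HOURS = 6 * 3600  # documented re-ingestion cadence for NVD and CISA data

def snapshot_is_fresh(knowledge_age_s: float, budget_s: float = SIX_HOURS) -> bool:
    """An age within one ingestion cycle is expected; beyond the budget,
    the caller may want to warn rather than trust the snapshot blindly."""
    return knowledge_age_s <= budget_s
```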