How NutriMatch Calculates Food Triggers

Last updated: 26 April 2026

Why this page exists. Every figure NutriMatch shows you, from grams of lactose to milligrams of histamine, is an estimate based on average data drawn from public food-composition databases, peer-reviewed research, and EU/US regulatory thresholds. We publish our methodology in full so you can verify (or challenge) any number we display.

Our approach

NutriMatch is a lifestyle app, not a substitute for a medical diagnosis. We don't measure the specific batch of yoghurt sitting on your shelf; nobody can do that without a lab. What we can do is tell you, transparently, where each number came from.

This page exists because we think you deserve to know that. Most apps in the food-intolerance category quote numbers without saying where they came from. Some make those numbers up entirely. We've made every choice traceable: every constant in our codebase has an inline comment citing its source, our test suite asserts every fixture against a published reference range, and the analysis you're reading right now lists the 35+ studies and regulations we anchor our calculations to.

We recognise the limits of this approach. Crowd-sourced product data (Open Food Facts) varies in quality. Heuristic estimators built on average composition data can be wrong by 2–3× for unusual products. AI-assisted ingredient parsing is faster than humans but makes mistakes. We document all of this. We test for the most common failure modes and we ship regression tests on every code change so an estimate that worked yesterday still works tomorrow. If you suspect a food intolerance, consult a doctor or registered nutritionist — NutriMatch helps you spot patterns; the people who help you act on those patterns are still humans.

How a scan works

When you scan a barcode or photograph a meal, four things happen:

  1. Product lookup. The barcode is sent to Open Food Facts, a community-maintained database of 3+ million products covering most EU and US groceries. We pull the ingredient list, allergen tags, and per-100 g nutrient values.
  2. Keyword detection. A lookup table maps each tracked intolerance to its known ingredient terms — in English, German, and French (milk, Vollmilch, lait demi-écrémé all match for lactose).
  3. Quantification. For each detected intolerance, an estimator function reads the nutrient declaration and ingredient position to produce a per-100 g estimate. Heuristics are documented inline, with citations.
  4. Refinement. A separate "resolver" stage takes the estimator's value, looks up a category anchor (e.g. "whole milk → 4.8 g lactose / 100 g") from a clinical reference table, and either trusts the estimator (if it falls within a plausible range), snaps to the anchor (if the estimator is far off), or clamps to a soft ceiling (if the estimator is implausibly high).
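
The keyword-detection step can be sketched as a small multilingual lookup. This is a minimal illustration, not NutriMatch's actual table: only the three lactose terms are taken from the text above, and the names TRIGGER_KEYWORDS and detect_triggers are hypothetical.

```python
import re

# Hypothetical lookup table. Only these three lactose terms appear in the text;
# a real table would hold many more terms per trigger.
TRIGGER_KEYWORDS = {
    "lactose": ["milk", "vollmilch", "lait demi-écrémé"],
}

def detect_triggers(ingredient_text, table=TRIGGER_KEYWORDS):
    """Return the sorted list of triggers whose keywords occur in the text."""
    text = ingredient_text.casefold()
    found = set()
    for trigger, terms in table.items():
        for term in terms:
            # Require non-word characters on both sides so that 'milk'
            # does not fire inside a longer word such as 'milkweed'.
            pattern = r"(?<!\w)" + re.escape(term.casefold()) + r"(?!\w)"
            if re.search(pattern, text):
                found.add(trigger)
                break
    return sorted(found)

print(detect_triggers("Zucker, Vollmilch, Kakaobutter"))
```

Matching on whole terms is one possible design choice; it trades some recall (plural or compound forms need their own entries) for fewer accidental hits.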

The numbers you see on the scan result are the resolver's output, with inline ranges (e.g. "~8 g lactose · typical 6–10 g") so you can see the inherent variance.
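
A rough sketch of the resolver's three outcomes (trust, snap, clamp) follows. The 4.8 g whole-milk anchor comes from the text; the plausible range, the ceiling, and the rule that anything outside the plausible range counts as "far off" are assumptions made for illustration, as are the names Anchor and resolve.

```python
from dataclasses import dataclass

@dataclass
class Anchor:
    value: float    # category anchor, g per 100 g
    low: float      # lower edge of the plausible range (assumed)
    high: float     # upper edge of the plausible range (assumed)
    ceiling: float  # soft ceiling for implausibly high estimates (assumed)

# Only value=4.8 is from the text; the other three numbers are hypothetical.
WHOLE_MILK_LACTOSE = Anchor(value=4.8, low=4.0, high=5.5, ceiling=6.0)

def resolve(estimate, anchor):
    """Return (value, decision) for one trigger estimate."""
    if anchor.low <= estimate <= anchor.high:
        return estimate, "trust"        # estimator is plausible, keep it
    if estimate > anchor.ceiling:
        return anchor.ceiling, "clamp"  # implausibly high, cap at the ceiling
    return anchor.value, "snap"         # otherwise fall back to the anchor

for raw in (4.6, 0.5, 20.0):
    print(raw, resolve(raw, WHOLE_MILK_LACTOSE))
```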

Trigger-by-trigger sources

Lactose

Reference values:

Why these values: Pure dairy products' declared sugar IS the lactose — milk has essentially no other sugars. As products move toward sweetened or processed forms, added sucrose dominates the sugar declaration, so we apply category-specific multipliers (1.0 for pure dairy, 0.9 for cheese with cultures, 0.2 for sweetened condensed milk, 0.15 for milk chocolate). For aged cheeses, lactic-acid fermentation converts most lactose to lactic acid, which is why aged cheddar is functionally lactose-free even though milk is its first ingredient.
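
As an illustration of the category multipliers, the sketch below scales a declared sugar value by the four factors quoted above. The category keys, the function name, and the example sugar values are hypothetical; only the multipliers come from the text.

```python
# Multipliers as quoted in the text; the dictionary keys are illustrative names.
LACTOSE_MULTIPLIERS = {
    "pure_dairy": 1.0,
    "cheese_with_cultures": 0.9,
    "sweetened_condensed_milk": 0.2,
    "milk_chocolate": 0.15,
}

def estimate_lactose(declared_sugar_g_per_100g, category):
    """Estimate lactose (g per 100 g) from the declared sugar and product category."""
    multiplier = LACTOSE_MULTIPLIERS[category]
    return round(declared_sugar_g_per_100g * multiplier, 2)

# Example inputs are made up for the sketch, not taken from a real label.
print(estimate_lactose(4.8, "pure_dairy"))
print(estimate_lactose(50.0, "milk_chocolate"))
```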

Limitations: Concentrated dairy products (skim milk powder up to 52 g lactose / 100 g; whey protein concentrate up to 70 g) can fall outside our category bands when the product name doesn't trigger a specific anchor. Our resolver caps at 60 g / 100 g as a sanity ceiling, which still flags any sensitivity correctly even if the displayed number undershoots reality.

Histamine

Reference values (mg per 100 g or 100 ml):

Why these values: Histamine accumulates during fermentation and ripening. The Sánchez-Pérez 2018 European fermented-food survey is the gold-standard reference for commercial product distributions. Wine values follow Konakovsky 2011's HPLC-validated survey of European reds and whites. The midpoints we display are arithmetic means within typical commercial ranges; long-aged wines, or wines that have gone through malolactic fermentation (MLF), can legitimately reach the upper end.

Limitations: Histamine is heavily influenced by storage time, cold-chain integrity, and individual batch variability. A "fresh" tuna held 24 hours warm can develop ten times the histamine of a properly handled one. We can't see those conditions from a barcode, so the values we show are pre-distribution typical ranges.

Gluten

Reference values:

Why these values: Bread flour is approximately 10 % gluten by weight (Wieser 2007). Products dilute that fraction depending on flour content — pasta is mostly flour (so close to 11 g), pizza adds toppings (so closer to 4 g), beer is fermented and filtered (so 50× lower than its grain origin). Our category anchors cover the most common product types directly; for unusual products we fall back to a coarse carbs-based ladder that the resolver clamps to plausible ranges.

Limitations: Sourdough fermentation can reduce gluten content by up to 90 % compared to commercial bread (Gobbetti 2014, Annu Rev Food Sci Technol 5:69), but our heuristic doesn't detect "long fermented" labels. For users with coeliac disease, always look for the certified gluten-free mark rather than relying on our estimate.

A1 β-casein

Reference values:

Why these values: Casein composition follows a well-defined chain: total casein ≈ 80 % of milk protein; β-casein ≈ 35 % of casein; A1 ≈ 45 % of β-casein in conventional Holstein-Friesian herds (Kamiński 2007; Caroli 2009, J Dairy Sci 92:5335). Net factor: 0.80 × 0.35 × 0.45 ≈ 0.13 of total milk protein. For 3.4 g protein per 100 ml whole milk, that's roughly 0.43–0.44 g A1 β-casein.

Limitations: A1:A2 ratios vary by herd genetics. Jersey cattle have a different distribution than Holstein-Friesian, and intentionally A2-bred herds have zero A1. We default to the conventional Holstein-Friesian average because that's what the European and US milk supply is dominated by; products labelled "A2 milk" / "A2 protein" are correctly flagged as zero.
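
The casein chain is simple enough to write out. In this sketch the three fractions are the ones quoted above; the function name and the a2_labelled flag are hypothetical. The unrounded product of the fractions is 0.126, so 3.4 g of protein gives about 0.43 g per 100 ml, close to the figure quoted in the text.

```python
# Fractions quoted in the text (conventional Holstein-Friesian average).
CASEIN_SHARE_OF_PROTEIN = 0.80
BETA_SHARE_OF_CASEIN = 0.35
A1_SHARE_OF_BETA_CASEIN = 0.45

def a1_beta_casein(milk_protein_g, a2_labelled=False):
    """Estimate A1 beta-casein (g) from total milk protein (g)."""
    if a2_labelled:
        # Products labelled 'A2 milk' / 'A2 protein' are treated as zero A1.
        return 0.0
    return (milk_protein_g
            * CASEIN_SHARE_OF_PROTEIN
            * BETA_SHARE_OF_CASEIN
            * A1_SHARE_OF_BETA_CASEIN)

print(round(a1_beta_casein(3.4), 2))
```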

Fructans (FODMAP "F")

Reference values:

Why these values: Fructan concentration varies by 60× across common dietary sources, so a single per-product fallback is impossible — we look up by ingredient. The Monash University FODMAP database is the gold-standard reference for IBS dietary management; most clinical FODMAP elimination protocols cite it directly.

Limitations: Cooking degrades fructans by up to 40 % (Muir 2009, J Agric Food Chem 57:554), which our heuristic doesn't detect. Sourdough fermentation reduces wheat fructans by up to 90 %. The values we display lean toward the raw / unprocessed end of the range, which is the safer direction for IBS users (false-positives over false-negatives).

Fructose

Reference values:

Why these values: Free fructose is what causes symptoms in fructose malabsorption — sucrose (table sugar) splits 1:1 into glucose + fructose during digestion (Skoog 2008, Am J Gastroenterol 103:2354) and is well tolerated. We only flag fructose when ingredients explicitly list free-fructose sources (HFCS, agave, honey, fruit juice) or when the product is a recognised fruit. The 0.55 ratio for "free fructose declared" matches HFCS-55's by-definition composition (White 2008); the 0.45 ratio for fruit reflects USDA FDC's median across pomes, stone fruits, berries, and melons.

Limitations: Fructose:glucose ratio matters more than absolute fructose for malabsorption symptoms. We display this ratio when both can be estimated — values > 1.0 (fructose exceeds glucose) flag the canonical "fructose malabsorption risk" condition (Born 2007).
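
A minimal sketch of the two fructose rules follows. It assumes the 0.55 and 0.45 ratios are applied to the declared sugar per 100 g, which the text implies but does not state outright; the source labels and function names are hypothetical.

```python
# Ratios quoted in the text; keys are illustrative labels for the two cases.
FREE_FRUCTOSE_RATIOS = {
    "declared_free_fructose": 0.55,  # HFCS, agave, honey, fruit juice listed
    "fruit": 0.45,                   # product is a recognised fruit
}

def estimate_free_fructose(sugar_g_per_100g, source=None):
    """Return estimated free fructose; 0.0 when no free-fructose source is found."""
    ratio = FREE_FRUCTOSE_RATIOS.get(source)
    if ratio is None:
        return 0.0  # e.g. sucrose only: not flagged
    return sugar_g_per_100g * ratio

def malabsorption_risk(fructose_g, glucose_g):
    """True when fructose exceeds glucose (ratio > 1.0); None if either is unknown."""
    if fructose_g is None or glucose_g is None:
        return None
    return fructose_g > glucose_g

print(estimate_free_fructose(10.0, "fruit"), malabsorption_risk(5.5, 4.5))
```

Comparing the two quantities directly is equivalent to testing the ratio against 1.0 and avoids a division by zero when no glucose is present.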

Sorbitol

Reference values:

Why these values: Stone fruits and pome fruits naturally contain sorbitol; Wrolstad 1981's HPLC survey is the foundational reference. For products with sorbitol explicitly listed but no quantity declared, we default to 5 g / 100 g, the median Awad 1991 measured directly across 27 commercial diet jams. This is intentionally conservative: the Monash FODMAP symptom threshold is 0.4 g per serving, so at our 5 g / 100 g estimate a 30 g serving (1.5 g consumed) flags AVOID, and the default would have to be almost 4× lower (1.5 ÷ 0.4 ≈ 3.75) before the classification flipped to a false negative.
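
The serving arithmetic can be checked with a few lines. The 5 g / 100 g default and the 0.4 g threshold are from the text; the function names and the "OK" label for the non-flagged case are assumptions.

```python
DEFAULT_SORBITOL_G_PER_100G = 5.0   # default when sorbitol is listed without a quantity
THRESHOLD_G_PER_SERVING = 0.4       # Monash FODMAP threshold quoted in the text

def sorbitol_per_serving(serving_g, per_100g=DEFAULT_SORBITOL_G_PER_100G):
    """Sorbitol consumed (g) in one serving."""
    return per_100g * serving_g / 100.0

def classify(serving_g, per_100g=DEFAULT_SORBITOL_G_PER_100G):
    """'AVOID' when the serving dose exceeds the threshold, else 'OK' (assumed label)."""
    dose = sorbitol_per_serving(serving_g, per_100g)
    return "AVOID" if dose > THRESHOLD_G_PER_SERVING else "OK"

def safety_margin(serving_g, per_100g=DEFAULT_SORBITOL_G_PER_100G):
    """How many times the serving dose exceeds the threshold."""
    return sorbitol_per_serving(serving_g, per_100g) / THRESHOLD_G_PER_SERVING

print(sorbitol_per_serving(30), classify(30), round(safety_margin(30), 2))
```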

Polyols (FODMAP "P" — total)

Reference values:

Why these values: Sentko 2012's category survey is the most comprehensive published reference for polyol use levels across food categories. When a product declares polyols but doesn't disclose the quantity, we use 25 g / 100 g for products below the EU "low sugars" threshold (≤ 5 g sugar / 100 g per Regulation 1924/2006) and 8 g / 100 g for products above it. The 25 g number is the geometric midpoint of the "truly sugar-free" mix weighted toward gum/candy as the dominant scan target.
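
The fallback rule for undeclared polyol quantities reduces to one comparison. The sketch below uses the 5 g sugar threshold and the 25 g / 8 g defaults from the text; the function and parameter names are hypothetical.

```python
LOW_SUGARS_THRESHOLD_G = 5.0    # EU 'low sugars' threshold, g sugar per 100 g
DEFAULT_LOW_SUGAR_POLYOLS = 25.0
DEFAULT_OTHER_POLYOLS = 8.0

def estimate_polyols(sugar_g_per_100g, declared_polyols_g=None):
    """Return polyols (g per 100 g), preferring a declared quantity when present."""
    if declared_polyols_g is not None:
        return declared_polyols_g
    if sugar_g_per_100g <= LOW_SUGARS_THRESHOLD_G:
        return DEFAULT_LOW_SUGAR_POLYOLS
    return DEFAULT_OTHER_POLYOLS

print(estimate_polyols(0.5), estimate_polyols(12.0))
```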

Caffeine

Reference values (mg per 100 ml):

Why these values: Caffeine content is one of the better-documented food-composition values because it's regulatory-relevant. We prefer label-declared values when Open Food Facts surfaces them, falling back to category midpoints. EFSA's 2015 caffeine safety review is the canonical European reference; USDA FoodData Central provides matching US-side numbers.

Tyramine

Reference values (mg per 100 g):

Why these values: Tyramine matters for users on MAO inhibitors and for some migraine patients. McCabe-Sellers 2006 is the most-cited compositional survey; the FDA dietary guidance for MAOI patients references it directly.

Sulfites

Reference values (mg per 100 g or 100 ml):

Why these values: Sulfite content is regulatory-capped, so commercial products cluster near the legal upper bounds. We use typical commercial midpoints, with the resolver clamping to the regulatory maximum as the ceiling.

Composite and processed foods — how we estimate

Composite foods (a Big Mac, a Margherita pizza, a frozen lasagne, a Snickers bar) are harder to quantify than whole foods. The challenge: a public food database tells us "Big Mac contains wheat flour, american cheese, beef, sauce, pickles…" but it does not tell us the exact weight of each component. Without component weights, a single per-100 g number has to come from heuristics.

Our approach combines four layers, each independently safe:

  1. Local estimator — runs in the first ~50 ms of a scan, on-device. For each trigger, a small function reads the food's nutriment values and ingredient text and emits a per-100 g estimate. The lactose and casein estimators detect dairy markers across English / German / French (milk, cheese, butter, mozzarella, vollmilch, käse, lait, fromage and others) and route through one of six branches based on whether dairy is the lead ingredient, an early ingredient, mid-list, or a trace mention.
  2. Category anchor + AI cross-check — runs after ~2–5 s, server-side. The AI assistant returns a per-100 g estimate for each detected trigger. A resolver then matches the food against ~30 published category anchors per trigger (whole milk, sweetened condensed milk, milk chocolate, dry pasta, sourdough bread, beer, soy sauce, etc.) and either trusts the AI value if it falls in the literature plausibility window, snaps to the anchor if the AI is far off, or clamps to a soft ceiling if the AI looks implausibly high.
  3. Known-product safety floor — for the most-scanned branded products (Big Mac, Whopper, Quarter Pounder, McNuggets, Subway, pizza slice, Coca-Cola, Red Bull, espresso), we publish a recalibrated per-100 g floor based on the chain's nutrition disclosure + USDA FoodData Central + per-component arithmetic (e.g. "Big Mac bun set ~85 g × 6 % gluten = 5.1 g per sandwich = 2.4 g/100 g"). The floor is a minimum — the AI can refine upward if it finds reason to, but the displayed value will never drop below the floor for these products.
  4. The error-direction rule — the most important rule. For severe-allergy / safety-critical triggers (gluten in coeliac, true allergens like peanut / tree-nut / fish / shellfish, sulfites for asthmatics), our estimators, AI prompts, and safety floors all err on the high side. Underestimating these triggers can cause real harm; overestimating only inconveniences the user. For dose-dependent FODMAPs (lactose, fructose, polyols), we aim for the mid-range — the Monash FODMAP serving thresholds give a built-in safety buffer that absorbs ~30 % under-counting.
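
The per-component arithmetic behind a known-product floor, and the rule that the displayed value never drops below it, can be sketched as follows. The 85 g bun weight and 6 % gluten fraction are from the Big Mac example above; the 212 g sandwich weight is an assumption back-calculated from the quoted 5.1 g and 2.4 g / 100 g figures, and the function names are hypothetical.

```python
def component_floor(component_g, trigger_fraction, product_g):
    """Return (g per item, g per 100 g) for one trigger-bearing component."""
    per_item = component_g * trigger_fraction
    per_100g = per_item / product_g * 100.0
    return per_item, per_100g

def apply_floor(ai_value, floor):
    """The AI may refine upward, but the result never drops below the floor."""
    return max(ai_value, floor)

# 212 g total weight is assumed, not stated in the text.
per_item, per_100g = component_floor(85.0, 0.06, 212.0)
floor = round(per_100g, 1)
print(round(per_item, 1), floor, apply_floor(1.8, floor), apply_floor(3.0, floor))
```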

Acceptable-error bands per trigger (the safety contract)

These bands are the operational definition of "correct" that our regression test suite enforces. Every test fixture asserts a literature-grounded range; the bands here set how far a displayed value may fall outside that range. The test suite runs on every code change in continuous integration, and a change that takes any fixture's displayed value below its safety threshold blocks the merge.

Trigger category | Triggers | Underestimate tolerance | Overestimate tolerance
Coeliac / severe allergy | Gluten (coeliac), Peanut, Tree-nut, Fish, Shellfish, Sesame, Mustard, Celery, Lupin, Egg, Soy, Milk-protein (casein/whey for allergic users), Alpha-gal, Sulfites | 0 % (never under) | up to
Dose-dependent FODMAP | Lactose (intolerance), Fructose, Fructans, Galactans, Sorbitol, Mannitol, Polyols | up to 30 % under | up to
Biogenic amines / additives | Histamine, Tyramine, Caffeine, Nitrates, Benzoates, MSG | up to 20 % under | up to
Functional / quality-of-life | Salicylates, Oxalates, Tannins | up to 30 % under | up to
Genetic / lineage-dependent | A1 β-casein | up to 50 % under | up to
True allergens (presence-only) | Peanut, Tree-nut, Fish, Shellfish, Sesame, Mustard, Celery, Lupin, Egg, Soy, Alpha-gal | PRESENCE only: dose immaterial; if any keyword/anchor fires, classification = AVOID regardless of number

Operational definition: for every test fixture, the displayed per-100 g value must satisfy displayed ≥ literature_low × (1 − under_tolerance) AND displayed ≤ literature_high × (1 + over_tolerance). For true allergens, we assert the classification ("AVOID") rather than a numeric range.
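
The operational definition translates directly into a fixture check. The inequality below is the one stated above; the function name and the example numbers (including the 50 % over-tolerance, which is a placeholder rather than a published band) are assumptions for the sketch.

```python
def within_band(displayed, literature_low, literature_high,
                under_tolerance, over_tolerance):
    """True when the displayed per-100 g value satisfies both band inequalities."""
    lower_bound = literature_low * (1 - under_tolerance)
    upper_bound = literature_high * (1 + over_tolerance)
    return lower_bound <= displayed <= upper_bound

def allergen_ok(classification):
    """True allergens are asserted on classification, not on a numeric range."""
    return classification == "AVOID"

# Hypothetical fixture: literature range 4.5-5.2 g, 30 % under, 50 % over (placeholder).
print(within_band(3.5, 4.5, 5.2, 0.30, 0.50))   # inside: lower bound is 3.15
print(within_band(3.0, 4.5, 5.2, 0.30, 0.50))   # outside: below 3.15
```

With an under-tolerance of 0, as for coeliac and severe-allergy triggers, the lower bound equals the literature minimum, so any value below it fails the check.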

What we do not claim to do

Citation index

The full primary-source list (alphabetical by first author):

NutriMatch is a lifestyle app, not a substitute for a medical diagnosis. Values are estimates based on average data. If you suspect food intolerances, consult a doctor or nutritionist.