# Rulis's number, reproduced — and Frawley's table cross-referenced to the CPDB

*Built 2026-06-14. Scripts: `cpdb_rulis_reproduction.py`, `frawley_cpdb_crossref.py` (run with the project venv
`.venv/`; deps in `../requirements.txt`). Data: `../papers/cpdb/CPDBChemical.xls` (NLM Carcinogenic Potency
Database, sheet "Rats and Mice"); `../papers/f1967.txt` (Frawley 1967 appendix); Gold et al. 1984 paper at
`../papers/lateral/Gold-etal_1984_Carcinogenic-Potency-Database_EHP58_9-319.pdf`.*

## 1. Reproducing Rulis's threshold

Method (Rulis 1987 / Flamm-Rulis ch. 8): for each carcinogen, low-dose risk = potency × dose, where potency is the
slope of the straight line from the TD50 point (50% tumour response) to zero dose, i.e. 0.5/TD50 (per mg/kg/day).
Diet→dose by the FDA convention (60 kg person, 3 kg food+water per day): 1 ppb in the diet is 1 µg/kg, so 3 kg/day
delivers 3 µg/day to 60 kg of body weight, i.e. **1 ppb dietary = 5×10⁻⁵ mg/kg/day** (so 1 ppt ≈ 5×10⁻⁸ mg/kg/day,
within Flamm-Rulis's stated 10⁻⁷–10⁻⁸ range). Risk line = **10⁻⁶ lifetime** (the methylene-chloride-decaf de minimis
Rulis borrowed).
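A minimal Python sketch of these conventions, assuming exactly the numbers stated above (60 kg, 3 kg/day, slope 0.5/TD50, target 10⁻⁶). The function names are hypothetical and are not taken from `cpdb_rulis_reproduction.py`.

```python
# Hypothetical helpers restating the conventions above; not the project scripts' code.
BODY_WEIGHT_KG = 60.0   # FDA convention: 60 kg person
INTAKE_KG_DAY = 3.0     # 3 kg of food + water per day
TARGET_RISK = 1e-6      # de minimis lifetime risk line


def ppb_to_dose(ppb: float) -> float:
    """Dietary level in ppb (micrograms per kg of diet) -> dose in mg/kg body weight/day."""
    mg_per_kg_diet = ppb * 1e-3  # 1 ppb = 1 ug/kg = 1e-3 mg/kg
    return mg_per_kg_diet * INTAKE_KG_DAY / BODY_WEIGHT_KG


def lifetime_risk(td50: float, ppb: float) -> float:
    """Upper-bound risk on the straight line from (TD50, 50% response) to the origin."""
    slope = 0.5 / td50  # risk per mg/kg/day
    return slope * ppb_to_dose(ppb)


def td50_cutoff(ppb: float, target: float = TARGET_RISK) -> float:
    """TD50 below which a carcinogen's risk at this dietary level exceeds `target`."""
    return 0.5 * ppb_to_dose(ppb) / target


print(ppb_to_dose(1.0))          # about 5e-5 mg/kg/day
print(ppb_to_dose(0.001))        # 1 ppt: about 5e-8 mg/kg/day
print(td50_cutoff(1.0))          # about 25 mg/kg/day
print(lifetime_risk(19.3, 1.0))  # a TD50 of 19.3 at 1 ppb: about 1.3e-6
```

Read as a rule: at 1 ppb any carcinogen with a TD50 below about 25 mg/kg/day lands above 10⁻⁶, and the cutoff scales linearly with the dietary level (about 125 at 5 ppb, 12.5 at 0.5 ppb, 1.25 at 50 ppt).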

Input: **785 positive carcinogens** in the CPDB "Rats and Mice" sheet (most-potent of rat/mouse TD50);
**median TD50 = 19.3 mg/kg/day** (geometric mean 16.5).
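The "most-potent of rat/mouse" rule and the two summary statistics, sketched in Python. This assumes each chemical has already been reduced to a (rat TD50, mouse TD50) pair with `None` for a species without a TD50; the function names and the toy values are hypothetical, not CPDB rows.

```python
import statistics


def most_potent_td50(rat_td50, mouse_td50):
    """Lower TD50 = more potent; a species with no TD50 (None) is ignored."""
    values = [v for v in (rat_td50, mouse_td50) if v is not None]
    return min(values) if values else None


def summarize(pairs):
    """Median and geometric mean of the per-chemical most-potent TD50s (mg/kg/day)."""
    td50s = [t for t in (most_potent_td50(r, m) for r, m in pairs) if t is not None]
    return statistics.median(td50s), statistics.geometric_mean(td50s)


# Toy (rat, mouse) TD50 pairs: hypothetical values chosen to make the arithmetic easy.
toy = [(10.0, 40.0), (None, 100.0), (1.0, None), (None, None)]
median, geomean = summarize(toy)
print(median, round(geomean, 6))  # 10.0 10.0
```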

Result — fraction of carcinogens whose lifetime risk would exceed 10⁻⁶ at a given dietary level:

| dietary level | % of carcinogens exceeding 10⁻⁶ | Rulis 1987 said |
|---|---|---|
| 5 ppb | 72.7% | "about 60%" |
| **1 ppb** | **52.7%** | "about half … on either side of the 1 ppb line" |
| 0.5 ppb (1995 ToR) | 45.4% | — |
| 0.05 ppb (= 50 ppt diet, his worked example) | 20.3% (i.e. ~80% excluded) | "exclude about 85%" |
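Each row of that table is a count of TD50 values below a cutoff. A minimal sketch, assuming `td50s` holds the per-chemical most-potent TD50 values; the real run feeds in the 785 CPDB carcinogens, while the five numbers here are invented only to show the mechanics.

```python
def fraction_exceeding(td50s, ppb, target=1e-6, body_kg=60.0, intake_kg=3.0):
    """Share of carcinogens whose linear upper-bound lifetime risk exceeds `target`."""
    dose = ppb * 1e-3 * intake_kg / body_kg  # mg/kg/day
    over = sum(1 for td50 in td50s if 0.5 / td50 * dose > target)
    return over / len(td50s)


toy_td50s = [1.0, 10.0, 20.0, 100.0, 1000.0]  # hypothetical, mg/kg/day
for level in (5.0, 1.0, 0.5, 0.05):
    print(f"{level} ppb: {fraction_exceeding(toy_td50s, level):.0%}")  # 80%, 60%, 40%, 20%
```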

And his bookkeeping: at the 50 ppt example, assuming 1 unknown migrant in 5 is a carcinogen, 0.2 × 20.3% ≈ 4% of
threshold-of-regulation decisions would exceed 10⁻⁶ — i.e. **~96 of 100 ≤ 10⁻⁶**, reproducing his "95 out of
every 100." At the 1995 rule's 0.5 ppb the same bookkeeping gives ~91 of 100.

**The reproduction holds:** off the raw potency distribution, 1 ppb splits the carcinogens roughly 50/50 at 10⁻⁶, and
50 ppt, once the 1-in-5 prior is folded in, keeps ~96 of 100 decisions at or below 10⁻⁶: Rulis's two anchor figures.
(Caveats: this NLM CPDB is the full ~2000s version, not his exact 1984 subset of 343; the diet→dose factor and the
0.5/TD50 slope are order-of-magnitude conventions. The *shape and the resulting level* are what reproduce.)

## 2. Frawley's "safe" compounds, looked up in the CPDB by CAS

Eleven of the thirteen compounds named in the essay are carcinogens on record in the CPDB, with a TD50 (mg/kg/day):

| Frawley compound | his "no-effect" level (ppm) | CPDB TD50 (mg/kg/day) | Salmonella mutagenicity |
|---|---|---|---|
| Vinyl chloride (his *safest* entry) | 120,000 | **6.11** | + |
| Acrylamide (his named exception) | 40 | **3.75** | – |
| DEHP (plasticizer — his packaging case) | 1,300 | 700 | – |
| BHA (antioxidant) | 5,000 | 405 | – |
| Catechol (he flagged "T = tumours") | 1,250 | 71.5 | – |
| Hydroquinone (he flagged "T = tumours") | 10,000 | 82.8 | – |
| Thiourea | — | 98.5 | – |
| Sodium cyclamate | 10,000 | 667 | . |
| DDT (his "toxic" benchmark) | 1 | 12.8 | – |
| Dieldrin (benchmark) | 0.5 | 0.912 | – |
| Aldrin (benchmark) | <0.5 | 1.27 | – |
| Citrus Red No. 2 | 500 | not in this CPDB version | — |
| Ponceau 3R | 5,000 | not in this CPDB version | — |

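The point lands: the compound Frawley rated *most* inert (vinyl chloride, 120,000 ppm) is among the *more potent*
carcinogens in the database that succeeded his (TD50 6.11, about a third of the 785-carcinogen median of 19.3; among his
entries only acrylamide at 3.75 and the benchmarks dieldrin and aldrin are more potent). Only vinyl chloride is
Salmonella-positive, the classic genotoxic, no-threshold case. A negative Salmonella result does not by itself make a
compound non-genotoxic, though: acrylamide is Salmonella-negative yet genotoxic in mammalian systems through its
metabolite glycidamide. So the others are "more dangerous than he logged," but for vinyl chloride and acrylamide the
"no-effect level" is the wrong *kind* of number.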

Automated name matching across all 220 appendix entries is unreliable (the appendix is two-column OCR text; only
"toxaphene" parsed to a clean CPDB hit). The CAS lookup above is the verified cross-reference. CSV:
`frawley_cpdb_keycompounds.csv`.
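Keying the join on CAS numbers sidesteps the OCR'd names entirely. A minimal sketch, assuming the CPDB rows have been loaded into a dict keyed by CAS; the dict, the helper names, and the zero-padding rule are hypothetical, only three TD50 values from the table are included, and the CAS numbers are the standard registry numbers added for illustration.

```python
# Hypothetical extract: TD50 (mg/kg/day) from the table above, keyed by CAS number.
CPDB_TD50_BY_CAS = {
    "75-01-4": 6.11,    # vinyl chloride
    "79-06-1": 3.75,    # acrylamide
    "117-81-7": 700.0,  # DEHP
}


def normalize_cas(raw: str) -> str:
    """Trim whitespace and leading zeros so '000075-01-4' matches '75-01-4'."""
    head, mid, check = raw.strip().split("-")
    return f"{int(head)}-{mid}-{check}"


def lookup_td50(cas: str):
    """Return the TD50 for a CAS number, or the table's 'not found' marker."""
    return CPDB_TD50_BY_CAS.get(normalize_cas(cas), "not in this CPDB version")


print(lookup_td50("75-01-4"))    # 6.11
print(lookup_td50("6358-53-8"))  # Citrus Red No. 2: not in this CPDB version
```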

## 3. Restricting to the 1984 database (Rulis's actual source)

Rulis drew his curve from the **original 1984** Gold database, not the combined NLM file we hold. His own words
(Rulis 1992, ACS Symposium Series 484, ch. 14, p. 135): two curves, "one corresponding to **343 carcinogens**
selected from the original data base compiled by Gold et al. … and the other … **477 carcinogens** chosen from an
updated Gold et al. database." So his published threshold rests on **343** carcinogens (the 1987 paper), with **477**
behind the updated curve that the 1995 rule's preamble describes.

`cpdb_rulis_1984.py` rebuilds the 1984 roster from the Gold 1984 paper's own **Appendix 1** (chemical names +
synonyms) and **Appendix 2** (names by CAS): OCR-extract every CAS token and keep only those that pass the **CAS
check-digit test** (728 valid CAS numbers; OCR-mangled tokens fail the check and drop out, so the CAS roster errs
toward under-inclusion), plus the normalized names. An NLM carcinogen counts as "in the 1984 edition" if its CAS
number or normalized name is in that roster or, failing both, if its normalized name matches a roster name as a
substring. **499** of the 785 NLM carcinogens match (by CAS 377, by name 31, by substring 91).
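The CAS check-digit filter and the three-tier match are easy to state in code. A minimal sketch, assuming CAS tokens appear in the OCR text as hyphenated digit groups; the check-digit rule is the standard CAS one, but the regex, the name normalization, and especially the substring rule with its length guard are illustrative assumptions, not the logic of `cpdb_rulis_1984.py`, whose details the text does not give.

```python
import re

# Assumed shape of a CAS token in the OCR text: 2-7 digits, 2 digits, 1 check digit.
CAS_TOKEN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def cas_check_ok(body: str, check: str) -> bool:
    """CAS rule: weight body digits 1, 2, 3, ... from the right; sum mod 10 = check digit."""
    total = sum(w * int(d) for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(check)


def extract_valid_cas(text: str) -> set:
    """All CAS-shaped tokens in `text` that pass the check digit, leading zeros dropped."""
    valid = set()
    for head, mid, check in CAS_TOKEN.findall(text):
        if cas_check_ok(head + mid, check):
            valid.add(f"{int(head)}-{mid}-{check}")
    return valid


def normalize_name(name: str) -> str:
    """Hypothetical normalization: lowercase, letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_1984(cas, name, roster_cas, roster_names, min_len=6):
    """Classify a match as 'cas', 'name', 'substring', or None (tried in that order)."""
    if cas in roster_cas:
        return "cas"
    key = normalize_name(name)
    if not key:
        return None
    if key in roster_names:
        return "name"
    # Assumed substring rule; the length guard keeps short names from matching everything.
    if len(key) >= min_len and any(
        len(r) >= min_len and (key in r or r in key) for r in roster_names
    ):
        return "substring"
    return None


ocr = "vinyl chloride 75-01-4; acrylamide 79-06-1; mangled 75-01-5"
print(sorted(extract_valid_cas(ocr)))  # ['75-01-4', '79-06-1']
```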

| | full NLM combined (N=785) | restricted to 1984 roster (N=499) |
|---|---|---|
| median TD50 (mg/kg/day) | 19.3 | **13.8** (more potent) |
| % exceeding 10⁻⁶ at 5 ppb | 72.7% | 77.4% |
| **% exceeding 10⁻⁶ at 1 ppb** | **52.7%** | **58.1%** |
| % exceeding 10⁻⁶ at 0.5 ppb (1995 rule) | 45.4% | 48.5% |
| % exceeding 10⁻⁶ at 50 ppt (his example) | 20.3% (≈80% excluded) | 19.8% (≈80% excluded) |
| 1-in-5 bookkeeping at 50 ppt → decisions ≤10⁻⁶ | ~96/100 | ~96/100 |

**The reproduction is robust to the restriction.** The 1984-scope set is slightly *more* potent (median TD50
13.8 vs 19.3, reflecting the classic, heavily studied carcinogens), so at 1 ppb a few more carcinogens cross 10⁻⁶ (58%
vs 53%), but the 50 ppt anchor is unchanged (~80% of carcinogens individually below the line; ~96 of 100 decisions
≤10⁻⁶ once you fold in his 1-in-5-is-a-carcinogen assumption). Rulis's two headline figures hold on either database.

The match count (499) overstates the true 1984 carcinogen set (Rulis's 343) for two reasons, both pushing the
same way: (a) the NLM "carcinogen" flag is the *modern* harmonized verdict, so compounds tested-and-negative in
1984 but found positive in later bioassays now carry a TD50 and are counted; (b) Rulis "**selected**" 343 — a
narrower, criteria-filtered set. We restrict the chemical **set** to 1984's scope but use the NLM harmonized TD50
**values** (the 1984 per-chemical TD50 values survive only in the OCR-mangled Part IV plot and are not machine-readable).
CSV of the restricted set: `cpdb_1984_restricted_carcinogens.csv`.

## 4. How Rulis uses 10⁻⁶ (the mechanism, from Rulis 1992 verbatim)

10⁻⁶ is not computed — it is **imported as the yardstick** ("Target Risk"). Rulis: the Target Risk is "that
upper-bound level of presumptive lifetime risk deemed commensurate with negligible or de minimis risk, and is
typically chosen to be 1 × 10⁻⁶ (refs 8–10)" — i.e. borrowed from FDA's own Sensitivity-of-the-Method / DES
precedent (the methylene-chloride-decaf line; see `10_DE_MINIMIS_LEGAL_LINEAGE.md` §3.5 and the *One in a
Million* essay). It is the carcinogen-side twin of Frawley's 0.1 ppm: a chosen line, not a derived one.

The calculation around it:
1. Each carcinogen's **upper-bound lifetime risk** at a dietary level comes from a "highly conservative linear"
   low-dose extrapolation — a straight line from the (TD50, 50% response) point to the origin, slope = 0.5/TD50
   per mg/kg/day. (This is exactly the slope our scripts use.) Risk = slope × dose; dose = diet level × the FDA
   60 kg / 3 kg-food convention (1 ppb ≈ 5×10⁻⁵ mg/kg/day).
2. Carcinogen potencies are **lognormally distributed** (his Figure 1). A chosen threshold (T/R) level therefore
   "excludes a probabilistically defined proportion of the area under the lognormal curve … from producing
   dietary risk above … the Target Risk."
3. That excluded area is his **"Target Risk Avoidance Probability"**: the probability that, *if* an exempted
   substance turned out to be a carcinogen, its upper-bound lifetime risk at the threshold would **not** exceed
   10⁻⁶. At 50 ppt he reports ~**85%** per exemption; at 1 ppb the avoidance probability "begins to exceed 50
   percent."
4. He then folds in an **assumed 1-in-5 probability that an untested migrant is a carcinogen**: at 50 ppt,
   1 − 0.2×(1−0.85) = **~97%** of exemptions stay ≤10⁻⁶. Our restricted run gives 1 − 0.2×(1−0.80) ≈ **96%** —
   same structure, same answer.
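Steps 2 to 4 can be sketched under an explicitly lognormal model. This is a minimal sketch, not Rulis's computation: the spread `SIGMA_LN` is a hypothetical placeholder rather than a value fitted to either database, the median reuses the NLM figure of 19.3 mg/kg/day, and our scripts count the empirical TD50 distribution directly instead of fitting a curve.

```python
import math
from statistics import NormalDist


def avoidance_probability(ppb, median_td50, sigma_ln, target=1e-6):
    """P(risk at `ppb` stays <= target) if ln(TD50) ~ Normal(ln(median_td50), sigma_ln)."""
    dose = ppb * 1e-3 * 3.0 / 60.0    # mg/kg/day, 60 kg / 3 kg-food convention
    cutoff = 0.5 * dose / target      # carcinogens with TD50 < cutoff exceed target
    z = (math.log(cutoff) - math.log(median_td50)) / sigma_ln
    return 1.0 - NormalDist().cdf(z)  # area of the curve above the cutoff


def exemptions_within_target(p_carcinogen, avoidance):
    """Step 4: only a carcinogen that is not 'avoided' breaks the target."""
    return 1.0 - p_carcinogen * (1.0 - avoidance)


SIGMA_LN = 2.0  # hypothetical spread of ln(TD50); not fitted to any database
for level in (1.0, 0.05):
    print(level, round(avoidance_probability(level, 19.3, SIGMA_LN), 2))
print(round(exemptions_within_target(0.2, 0.85), 2))  # Rulis's 50 ppt figure: 0.97
print(round(exemptions_within_target(0.2, 0.80), 2))  # our restricted run: 0.96
```

Lowering the threshold moves the cutoff TD50 down into the potent tail, so the avoidance probability rises; the last two lines repeat the step-4 arithmetic.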

So the result rests on two *choices*: the borrowed 10⁻⁶ Target Risk line and the 1-in-5 prior. It is a
distribution-of-potencies argument built on two numbers picked, not measured. The thing it leaves out is the
same thing Frawley's table left out: the genotoxic, no-threshold carcinogen, for which a "linear upper bound"
is not a conservative cushion but the actual shape of the dose-response (Munro's caveat, quoted in the same
chapter, concedes the higher thresholds hold only "if adequate data are available to preclude the genotoxicity
of the chemical").