"Safety" Testing of Carcinogenic Agents 1 NATHAN MANTEL and W. RAY BRYAN, Biometry Branch and Laboratory of Viral Oncology, National Cancer Institute,2 Bethesda, Maryland SUMMARY The problem of determining what dose 9 X 10- 8 mg per mouse when a statisti- levels of an agent are safe, e.g., non- cal assurance level of 99 percent and a carcinogenic, cannot be resolved unless conservative probit slope of 1 normal one first defines some level of permis- deviate per log for extrapolation are sible risk, no matter how small, rather used. The principles given are of gen- than insisting on absolute safety. Both eral applicability in other safety-testing because of practical considerations and problems, the point of emphasis being statistical variation, the determination that since direct observation cannot be oflow-risk dose levels, for example 1/100 made that the risk at some dose level is million, cannot be made directly but clearly low, indirect conservative pro- must be by extrapolation from observed cedures for the determination of low data. A conservative approach for risk levels must be made. The arbi- doing 80 is given. In addition to an trary risks and definitions for so doing arbitrary definition of "virtual safety," may change with circumstances. The it is necessary to define an arbitrarily procedure does not require specification high statistical assurance level and a of an experimental protocol; the "safe" rule for extrapolation by use of an arbi- dose is determined on the basis of w ha t- trarily shallow slope. Illustrative data ever data are available. Minimum pro- by Bryan and Shimkin (J. Nat. Cancer tocols may, however, be desirable, since Inst. 3: 503-531, 1943) on the carcino- greater amounts of data will ordinarily genic action of methylcholanthrene permit specifying large "safe" levels.- yield a "safe," 1/100 million, dose of J. Nat. Cancer Inst. 27: 455-470,1961. 
1 Received for publication March 29, 1961.
2 National Institutes of Health, Public Health Service, U.S. Department of Health, Education, and Welfare.

ALTHOUGH IT is not definitely known whether all chemical compounds that induce cancer in experimental animals will also cause cancer in man, it has been fairly well established that, with the possible exception of arsenical compounds, every chemical or physical agent known to produce cancer in man will likewise do so in one or more species of lower animals (1). Of necessity, the potential deleterious effects of chemical compounds must be tested in laboratory animals. The reaction of a particular species of animal does not constitute proof that humans will react similarly, but the only recourse the investigator has in implementing test programs for control of the human environment with respect to injurious chemical or physical agents is to proceed under the assumption that any substance harmful to animals is potentially harmful to man.

Also in carrying out control tests in animals one must proceed, initially, as if protection of the animal population were the actual problem. When reliable estimates of the doses tolerated (with specified probability) are achieved for the animal population, the problem then becomes one of judgment, based on accumulative experience, in transferring the implications of the results to man. Results of testing with a variety of animals could suggest how extrapolation could be made properly to mammalian species of higher order or larger size.

Many chemical compounds may be harmful at certain concentrations, though beneficial at others. The control of "toxic" or "harmful" substances therefore does not imply the necessity of their complete or absolute elimination, which in some cases would be either impossible or economically infeasible, but their reduction to concentrations that can be tolerated by essentially all individuals of the population at risk.
In rapidly acting toxic substances that are either quickly eliminated from the body or are readily transformed to less harmful compounds through metabolic processes, the estimation of tolerated levels is not too difficult. With a reasonable allowance for an extra margin of safety introduced in the form of an arbitrary "safety factor," the results obtained in laboratory animals can be successfully projected to humans. For the most part, modern pharmacology is based on just such usage of laboratory animals.

There are other compounds, however, for which the results obtained in laboratory animals cannot be so confidently projected to man. These substances are not readily excreted or metabolized, and because of their weak solubility in aqueous solution, may remain in the body or on body surfaces for very long periods. Other substances may be metabolized to some extent, but because of their selective affinity for cells of certain types they may accumulate selectively within these cells to yield harmfully high concentrations if supplied on a continuing basis. Most chemical compounds that cause cancer, e.g., polycyclic aromatic hydrocarbons, naphthylamines, azo dyes, and some steroids, have properties falling into one or the other of these categories.

Still other compounds have an immediate initial effect, which is not considered harmful, and are rapidly eliminated or metabolized; yet, during their brief sojourn in the body they produce some critical intracellular damage or change, which does not manifest itself until later. Urethan, which interferes with nucleic acid metabolism, is a classical example of a compound of this type. Injected parenterally, urethan acts immediately to induce transient anesthesia, but single anesthetic doses may cause lung tumors in mice many months later. Fortunately, this drug was never approved for use as an anesthetic for man.
Biometric methodology has been highly developed for the study of acute toxicity and rapidly developing biological reactions of other types (see 2-4 for review). Some progress has been made in the development of methods for the analysis of responses that are greatly prolonged in time, such as cancer (see 5 for review); however, the problems associated with the estimation of tolerated levels of carcinogenic agents and other chronically acting compounds have not yet received adequate study.

The purpose of this communication is to discuss the major problems in the development of methods for estimating limits of tolerance, or "safety levels," of carcinogenic compounds and to describe certain biometric procedures, based on available data, that are applicable to presently known carcinogenic compounds and experimental animal systems. Factors which one must consider in transferring the inferences derived from results in animals to man are also discussed. The suggested procedures are not restricted to carcinogenic agents, but are applicable, in principle, to compounds which cause various other types of harmful reactions or disease.

SOME PROBLEMS IN PLANNING AND ANALYSIS OF SAFETY STUDIES

In a safety-testing program both the design of the experimental protocols and the method of analysis or interpretation of resulting data must be determined. The experimental design cannot be properly determined apart from the plan for analysis of the data.

Certain issues in a safety-testing program must be resolved in advance. These relate to what we mean by safety and what kind of feasible results we are willing to accept as proof of safety. Settling of these issues, if necessary on some arbitrary but conservative basis, may permit answers to problems that would otherwise be insoluble. These problems are:

1) How safe is safe? Absolute safety can never be unquestionably demonstrated experimentally.
Rather, experimental results can be used only to establish limits on the risk involved. With the specification of some level of risk, no matter how small, the possibility of determining whether or not that risk is exceeded opens. We may, for example, assume that a risk of 1/100 million is so low as to constitute "virtual safety." Other arbitrary definitions of "virtual safety" may be employed as conditions require.

Incidentally, an inflexible requirement for absolute safety may lead to acceptance of high levels of hazard. The impossibility of really demonstrating absolute safety leads to the acceptance, as a satisfactory demonstration, that no hazard was observed in an experimental protocol of moderately large size, 100 or even 1,000 animals. Such evidence, however, only provides assurance, at the 99 percent probability level, that the true risk is under 4.5 percent in the 100 animals or 0.46 percent in the 1,000 animals.

VOL. 27, NO. 2, AUGUST 1961

2) What constitutes proof of safety? In principle, one could use an experimental protocol sufficiently large to demonstrate that "virtual safety" obtained. For this purpose it must be realized that an observed outcome of no tumors among 100 million treated mice does not necessarily demonstrate clearly that treatment was either absolutely or even virtually safe. This outcome could arise with a probability of 1 percent, even if the risk involved were as high as 4.6/100 million. It would in fact require a total of some 460 million tumor-free mice to demonstrate at the 99 percent assurance level that "virtual safety" obtained. Similarly, tumor-free results for 10,000 mice would only indicate that the risk was less than 1/2,200 and it would require tumor-free results in a total of some 450 mice to establish with high probability that the risk was under 1 percent.

Studies of feasible size can be used to establish directly only risks of the order of 1/100 or higher.
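The risk limits quoted in this passage follow from a single relation: if n animals are all tumor-free, the largest risk P still consistent with that outcome at the 99 percent assurance level satisfies (1 - P)^n = 0.01. The following is a minimal Python sketch of that arithmetic, offered only as a check on the figures in the text; the function names are illustrative and do not come from the paper.

```python
import math

ALPHA = 0.01  # 1 minus the 99 percent assurance level used in the text


def upper_risk_limit(n_tumor_free, alpha=ALPHA):
    """Largest risk P for which n tumor-free animals still has probability alpha.

    Solves (1 - P)**n = alpha for P.
    """
    return -math.expm1(math.log(alpha) / n_tumor_free)


def animals_needed(target_risk, alpha=ALPHA):
    """Smallest number of tumor-free animals that bounds the risk at target_risk."""
    return math.ceil(math.log(alpha) / math.log1p(-target_risk))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000_000):
        print(f"{n:>11,d} tumor-free animals: risk below {upper_risk_limit(n):.3g}")
    print(f"animals needed for 1/100 million: {animals_needed(1e-8):,d}")
```

Under these assumptions the sketch gives limits close to the 4.5 percent, 0.46 percent, 1/2,200, and 4.6/100 million quoted above, and a requirement of roughly 460 million tumor-free animals for "virtual safety."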
Data from such studies can be used to ascertain the treatment level consonant with a prescribed risk or to establish limits on the risk for a particular treatment level. The determination of "safe" levels can be made only by indirect methods extrapolating from the data obtained in a feasible study.

The use of extremely large studies to establish safety may well be self-defeating. The almost certain occurrence of unusual syndromes in one or more of a large number of test animals, albeit these may have arisen spontaneously, will require admitting the possibility that they may be attributable to drug treatment.

3) How can protocol data be extrapolated safely? Since it is only feasible to use experimental protocols for the direct determination of relatively high-risk dose levels, e.g., 1 percent, which makes extrapolation methods necessary, one must consider that such extrapolation methods might yield misleading results. Procedures exist which permit extrapolating the results obtained at a number of test-agent levels to determine the dose level corresponding to any desired degree of risk and to establish, with a high level of assurance, a minimum bound on this dose level. These methods, however, are based on the assumption that the relationship observed between tumor occurrence and dose at the levels tested will continue to apply in the regions to which extrapolation is being made. The validity of such an assumption cannot be tested and, if it is false, may lead to a serious overestimate of the "safe" level. Such overestimation would arise if the relationship of response to dose is less pronounced at the dose levels to which extrapolation is made than at the levels at which tests were performed.

Another related source of difficulty is that tests are performed on relatively pure inbred strains of laboratory animals.
Characteristically, such pure strains will show steep dose-response relationships, while the heterogeneous population to which it is intended to apply the results of testing may exhibit a shallow relationship.

To avoid the risk of overestimation of "safe" levels which may result from extrapolating with too steep a slope, it is suggested here that a conservative result may be obtained by extrapolation with an arbitrarily low slope from the data at hand. For example, quantal-response data, that is, all-or-none response, frequently exhibit a somewhat linear relationship when plotted on probability paper with a normal or probit scaling for the percent responding and a logarithmic scaling on dose. According to the kind of system being investigated the dose-response slopes observed may vary widely. With systemic poisons, rather steep dose-response slopes are generally obtained, reflecting the narrow logarithmic range between the lowest dose levels at which any toxic deaths occur and the levels which are lethal for all animals; such response slopes are on the order of 10 to 50 or more probits per common logarithmic or tenfold dose increase (the technical definition of this slope is not given here; it permits one to perform any necessary extrapolations). The all-or-none response also arises in the study of the therapeutic effects of antibiotics. The response slopes in these instances are generally shallower, on the order of 3 probits per common logarithm. Lower slopes on the order of 2 do arise in virus-assay work, but in these instances it may be that use of a different response curve, the single-hit or one-particle curve, would be more appropriate. From such experience it would appear that the use for purposes of extrapolation of a slope as low as one probit per common logarithm is likely to be conservative.
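To make concrete why a shallow slope is the conservative choice, the sketch below computes how far the dose must be reduced to move along a straight probit-log dose line from a 1 percent risk down to 1/100 million, for the slope values mentioned above. This is an illustration only: the 1 percent starting point is borrowed from the earlier example of a directly observable risk, and the function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def dose_reduction_factor(risk_from, risk_to, slope):
    """Factor by which dose must be cut to move from risk_from down to risk_to.

    Assumes a straight line on the probit versus log dose scale, with the
    slope given in normal deviates (probits) per tenfold change in dose.
    """
    deviates = STD_NORMAL.inv_cdf(risk_from) - STD_NORMAL.inv_cdf(risk_to)
    return 10 ** (deviates / slope)


if __name__ == "__main__":
    for slope in (10, 3, 2, 1):
        factor = dose_reduction_factor(0.01, 1e-8, slope)
        print(f"slope {slope:>2}: reduce dose by a factor of about {factor:,.1f}")
```

The shallower the assumed slope, the larger the dose reduction demanded for the same drop in risk, which is the sense in which a slope of one is conservative relative to the steeper slopes seen in the observable range.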
While slopes in the regions to which we wish to extrapolate cannot be established, the suggested slope of one is rather low compared with that ordinarily obtained in the observable region.

The indicated low slope is a key feature in the method to be suggested as conservative for the establishment of "safe" levels. For this reason it may be well to make clear just how weak or strong are the assumptions being made in its use. In fact the only assumption being made for the procedure to be conservative is that whatever form the true response curve may take over the region of extrapolation, the average slope is not less than the assumed one. There is no requirement that the true response curve be linear or even that the true slope should nowhere be less than assumed.

The use of the indicated conservative slope is, of course, arbitrary. Other values may be specified and other scales for extrapolation may be employed.

Once answers to the three questions are provided (a defined level of "virtual safety," a prescribed level of statistical assurance, and a conservative rule for extrapolation), it becomes possible to determine, from protocol data, "safe" dose levels and to undertake the planning of any necessary experimental protocols. This is considered in the succeeding sections.

ANALYSIS OF RESULTS AT A SINGLE DOSE LEVEL

In what follows we will take the defined level of "virtual safety" to be 1/100 million, the statistical assurance level to be 99 percent. Extrapolation will be on the basis of 1 normal deviate or probit per common log or tenfold change in dose.

To illustrate how definitive "safe" levels are obtained, consider that a prescribed dose of an agent has elicited no tumors in a group of 100 experimental animals.
While the observed rate of tumor occurrence is 0 percent, we will take, as an upper limit on the true rate, that risk for which the probability for occurrence of as few as zero tumors is 1 percent (100% less the assurance level of 99%). This is given by the solution for P to the equation $(1 - P)^{100} = 0.01$. Solving, we have $100 \log (1 - P) = \log 0.01 = -2$; $\log (1 - P) = -0.02 = 9.98 - 10$; $1 - P = 0.955$; $P = 0.045$ or 4.5 percent.

We now know that the observed outcome of no tumors in 100 animals is consistent with the possibility that the true risk was, in fact, 4.5 percent. From tables of the normal probability function (6) we determine that the normal deviate, $Y$, such that the integral from $-\infty$ to $Y$ equals 0.045 is -1.695, for a probit value of 3.305 = 5 - 1.695. However, the normal deviate, $Y_0$, corresponding to a risk of 1/100 million can similarly be determined as -5.612, the probit being -0.612. The upper limit on the risk for the dose employed is 3.917 = -1.695 - (-5.612) normal deviates above the desired safe risk, and, at a slope of one normal deviate per common log, it is necessary to reduce the log dose by 3.917 logs to attain a "safe" level. The antilog of 3.917 being about 8,300, it is determined that the "safe" dose is 1/8,300 times that which had been tested.

Table 1 shows the preceding results together with those for several other hypothetical experiment sizes in which no tumors were observed to occur in a group of treated mice. For each such group the table shows the greatest risk consistent, at the 99 percent assurance level, with the observed outcome. Also shown are the corresponding conservative estimates, with the slope of one normal deviate per tenfold dose increase, of the "safe" (1/100 million) dose level expressed as a fraction of the dosage tested. The larger the experimental group among which no tumors occurred, the greater is the value determined as the "safe" dose.
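The steps just described (upper limit on risk, conversion to normal deviates, extrapolation at a slope of one) can be retraced numerically. The sketch below is one possible rendering in Python, assuming the standard library's normal distribution in place of the printed tables (6); the constant and function names are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
ASSURANCE = 0.99        # statistical assurance level taken in the text
VIRTUAL_SAFETY = 1e-8   # risk of 1/100 million
SLOPE = 1.0             # normal deviates per tenfold change in dose


def safe_dose_divisor(n_tumor_free):
    """Divisor applied to the tested dose when n animals showed no tumors."""
    # Upper limit on risk: solve (1 - P)**n = 1 - ASSURANCE for P.
    max_risk = 1.0 - (1.0 - ASSURANCE) ** (1.0 / n_tumor_free)
    # Distance, in normal deviates, between that risk and "virtual safety".
    deviates = STD_NORMAL.inv_cdf(max_risk) - STD_NORMAL.inv_cdf(VIRTUAL_SAFETY)
    # At the assumed slope, each normal deviate costs 1/SLOPE logs of dose.
    return 10 ** (deviates / SLOPE)


if __name__ == "__main__":
    for n in (10, 50, 100, 500, 1_000):
        print(f"0/{n}: 'safe' dose is about 1/{safe_dose_divisor(n):,.0f} of the dose tested")
```

For 100 tumor-free animals this yields a divisor near the 8,300 derived above; small differences from the published figures are to be expected from rounding in the original table lookups.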
TABLE 1.-Illustration of "safe" doses determined when no risk was observed in single groups

Number of mice with     Upper limit on tumor risk at     Estimated "safe" dose
tumors/No. tested       level employed, 99 percent       (1/100 million),
                        assurance (percent)              dose employed = 1

0/10                    37                               1/190,000
0/50                    8.8                              1/18,000
0/100                   4.5                              1/8,300
0/500                   0.92                             1/1,800
0/1,000                 0.45                             1/1,000

Study of this table indicates that a control system can be established without the need for specifying a design protocol, though there might be some merit in specifying a minimum size. For, whether the amount of evidence adduced to show that an agent is safe is great or small, it can be properly weighted to determine conservative safe limits. If the promulgator of a drug wishes to have high tolerances established for his compound, it would be worth while for him to produce results of experiments with a large number of animals, which would show that the agent is not especially dangerous at certain dose levels. Where the lack of danger is demonstrated with a small number of animals or as a result of testing at rather low doses, the dose levels determined as "safe" will be lower. A control system can be constructed about the possibility for interpreting the data submitted, with no specification needed as to how much data should be obtained.

THE CASE OF SOME OBSERVED RISK

The method indicated for determining "safe" levels is not restricted to the case in which no observable danger was noted. That case was used for illustration because of the simpler mathematical solution involved. In general it will be that an experiment testing n animals will yield r unfavorable results (tumors). The upper limit on risk at the 99 percent assurance level is then the solution for P to

$$\sum_{i=0}^{r} {}_nC_i \, P^i (1 - P)^{n-i} = 0.01$$

or

$$\sum_{i=r+1}^{n} {}_nC_i \, P^i (1 - P)^{n-i} = 0.99$$

The solution for P is a value such that the chance of observing as few or fewer than r tumors is 1 percent.
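Where binomial tables are not at hand, the upper limit defined by the first equation can be found numerically. The following is a minimal sketch, assuming simple bisection on P and direct summation of the binomial terms (adequate for groups of a few hundred animals); the function names are hypothetical and the approach is a substitute for, not a description of, the table lookup used in the paper.

```python
from math import comb
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_SAFE = STD_NORMAL.inv_cdf(1e-8)  # normal deviate for a risk of 1/100 million


def chance_of_r_or_fewer(p, r, n):
    """Binomial probability of observing r or fewer tumors among n animals."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(r + 1))


def upper_risk_limit(r, n, alpha=0.01):
    """Risk P at which r or fewer tumors among n animals has probability alpha."""
    if not 0 <= r < n:
        raise ValueError("need 0 <= r < n")
    low, high = 0.0, 1.0
    for _ in range(60):  # the tail probability falls steadily as P rises
        mid = (low + high) / 2.0
        if chance_of_r_or_fewer(mid, r, n) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def safe_dose_divisor(r, n, slope=1.0):
    """Divisor applied to the tested dose, extrapolating at the given slope."""
    z_limit = STD_NORMAL.inv_cdf(upper_risk_limit(r, n))
    return 10 ** ((z_limit - Z_SAFE) / slope)


if __name__ == "__main__":
    p = upper_risk_limit(10, 100)
    print(f"10 tumors in 100 mice: upper limit on risk is about {p:.3f}")
    print(f"'safe' dose is about 1/{safe_dose_divisor(10, 100):,.0f} of the dose tested")
```

With r = 0 the same routine reduces to the tumor-free case treated in the preceding section.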
At values for P in excess of the solution, the chance for such an outcome is less than 1 percent.

The equation for P can be solved approximately by reference to tables of cumulative binomial probabilities. Such tables, as for example those of the Ordnance Corps (7), show values of quantities such as

$$\sum_{i=r+1}^{n} {}_nC_i \, P^i (1 - P)^{n-i}.$$

Let us consider as a hypothetical outcome that, of n = 100 mice, r = 10 have developed tumors, for an observed rate of 10 percent. Referring to these tables we see that for P = 0.19

$$\sum_{i=11}^{100} {}_nC_i \, P^i (1 - P)^{n-i} = 0.9891$$

and that for P = 0.20

$$\sum_{i=11}^{100} {}_nC_i \, P^i (1 - P)^{n-i} = 0.9943.$$

We may take the solution for P as approximately 0.192. The calculating procedure follows as before. The normal deviate corresponding to a 19.2 percent probability is -0.871 and, at the slope assumed, it will require a reduction in log dose of 4.741 = -0.871 - (-5.612) to obtain a risk of 1/100 million. The "safe" dose is then determined as 1/55,000 of the dose which had been employed, 55,000 being the antilog of 4.741.

USE OF CONTROL DATA

In the methodology shown, it was assumed that the response of interest, appearance of tumors, did not occur spontaneously. In general, however, it will be desirable to use controls to check this and to allow data obtained for such controls to modify the determination of "safe" dose made. However, with the method already shown, failure to use controls or to take control data into account will result in more conservative determinations of the "safe" dose. If, in fact, spontaneous rates are rather low, they will have little effect on the determination made. For this reason we may adopt a procedure which is somewhat more conservative than necessary for taking control data into account.
This we can do by taking, as before, the upper limit for the risk in the treated group as the solution for $P_t$ to

$$\sum_{i=0}^{r_t} {}_{n_t}C_i \, P_t^i (1 - P_t)^{n_t-i} = 0.01$$

while the lower limit on the control group risk is the solution for $P_c$ to

$$\sum_{i=r_c}^{n_c} {}_{n_c}C_i \, P_c^i (1 - P_c)^{n_c-i} = 0.01$$

where $r_t$ of $n_t$ treated animals and $r_c$ of $n_c$ control animals, respectively, showed positive response. These equations can be solved through the use of binomial tables. At this point Abbott's formula (8) can be used to obtain a modified value for the treated-group risk, this being computed as $P_t' = (P_t - P_c)/(1 - P_c)$. The computation follows as before, the normal deviate being obtained corresponding to $P_t'$.3 Since $P_t'$ cannot exceed $P_t$, the use of control data cannot result in decreased values for the calculated "safe" dose.

3 The fastidious statistician may object to the moderately conservative procedure described here for determining $P_t'$. A more rigorous procedure would require setting limits on the ratio of binomial parameters. $P_t'$ would be given by reducing by unity the upper limit on the ratio of control-to-treatment nonresponse probabilities.

ANALYSIS OF RESULTS OBTAINED AT SEVERAL DOSE LEVELS

When an agent is tested, it is sometimes desirable to do so over a number, perhaps even a wide range, of dose levels. In such instances, all the available data should be considered in the determination of "safe" levels.

Parametric Procedures

As already indicated, it may be unwise to extrapolate the data with the observed response slope. Procedures for taking into account the statistical variation of the fitted slope will not suffice to make such extrapolation methods conservative. To see this, one need only consider the use of quite large study sizes. In this case, statistical variation will be negligible with the result that extrapolation to low-risk levels will be substantially with the slope obtaining in the observable range.
(However, with small study sizes or studies, the designs of which are inefficient for estimation of the slope, taking account of the statistical variation of the slope determination may result in extremely conservative estimates of "safe" levels. The lower confidence limit on the "safe" dose in such instances may be less than that which would be obtained when an arbitrarily low slope value is used for extrapolation, as suggested.)

An alternative device could be to employ parametric procedures, for example, fitting the maximum likelihood probit line to the data (9), to determine the lower confidence limit on the dose corresponding to some moderate percentage risk, e.g., 1 percent, to which risk-level extrapolation with the observed slope is considered reliable. Then, using an arbitrarily conservative value for the slope, and anchoring at the lower limit on the 1 percent dosage, one could extrapolate to the desired "safe" level.

While the procedure just indicated is straightforward, its employment is based on the validity of the parametric function employed. Quite useful results have derived from such parametric assumptions in bioassay work. But for bioassay purposes it is not essential that a parametric function be exactly appropriate; it is sufficient that the function assumed do a reasonably good job of graduation. (Even somewhat inappropriate curve forms can yield reasonably good relative potency estimates as long as the preparations being compared are tested over the same regions of response.)

The use of an invalid parametric function can lead to inappropriate estimates of dose levels corresponding even to moderate risks. In many situations the estimates may be only moderately inappropriate; in others, seriously so. With parametric procedures, the use of rather large experimental groups in one region of the response curve can be reflected in narrow-range confidence intervals for dose levels corresponding to risks in other ranges.
This can produce a false sense of security in one's estimate of the moderate-risk dose level when the parametric model is violated.

Accordingly, while agreeing that parametric procedures may be useful, we will consider the possibility for extending the method described for the single dose-level case without the need for assuming any particular model. The only assumption is that the arbitrary low slope assumed is conservative and it is, of course, implicit that the response curve is monotone. In instances in which it can be recognized that use of a parametric procedure is not misleading, workers may prefer this procedure rather than the more generally appropriate nonparametric procedures described in the next section.

Nonparametric Procedures

In the preceding section it was suggested that the use of parametric methods, while straightforward, could lead to nonconservative results. There are no simple fixed rules for conservative estimates of the "safe" dose when several dose levels are employed and one is unwilling to make assumptions about the dose-response curve in the region of observation. How estimates can be made in these circumstances can best be demonstrated by illustration.4

We will begin with some simple ideas. Suppose investigators at two laboratories independently test an agent at a level of 100 mg/kg. At the first, with 500 mice tested, no tumors are observed, and with the methods described previously the "safe" dose is estimated as (100 mg/kg)/1,800 = 0.056 mg/kg (cf. table 1). A somewhat lower "safe" dose of 0.012 mg/kg is obtained at the second laboratory, based on the observation of no tumors among only 100 mice. It can readily be recognized here that it would be inappropriate to reject the high "safe" level of the first laboratory just because of the low estimate obtained at the second laboratory. The two sets of data are consistent and in fact confirm each other.
If any modification is to be made, it should be to consider that, with results combined, no tumors have occurred among 600 mice, which would lead to a "safe" dose of about 0.065 mg/kg.

Suppose that at still a third laboratory, tests are made at a dose of 50 mg/kg and, with no tumors occurring among 500 mice, the calculated "safe" dose at that laboratory is 0.028 mg/kg. Here again we can see that the "safe" dose obtained at the first laboratory should not be modified downward just because a consistent result at the third laboratory yielded a lower "safe" dose. If anything, it should be considered that the 500 mice not responding at the higher dose at laboratory 1 would not have responded at the lower dose employed at laboratory 3. With these 500 mice treated as nonresponders at the low dose, there is then, including those at laboratory 3, a total of 1,000 mice not responding at the low dose. (Laboratory 2 results are being ignored for this illustration.) This yields as a calculated "safe" dose (50 mg/kg)/1,000 = 0.050 mg/kg. In the present case the "safe" dose based on the combined calculation is less than that for the data of laboratory 1 alone of 0.056 mg/kg and so the higher figure is retained. Had the combined calculation led to a higher "safe" dose it would have been correct to take that as the estimate.

The point of these illustrations is that, when the data obtained from a series of doses are consistent with each other, it is appropriate to take as the calculated "safe" level the highest one pertaining to the results at any one dose. Even a higher "safe" level may be taken when it can be obtained through a justifiable combination of the results at the various doses used. What is meant here by "justifiable" combinations can be seen from the following example in which hypothetical results at four dose levels, in order from low to high, are considered.

4 An alternative method to the one about to be described is given as an appendix.
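The laboratory figures quoted above can be checked with a few lines of Python. This is a sketch only, restricted to the tumor-free case and using the same conventions as the rest of the paper (99 percent assurance, 1/100 million risk, slope of one); the helper name `safe_dose` and the variable names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_SAFE = STD_NORMAL.inv_cdf(1e-8)  # normal deviate for a risk of 1/100 million


def safe_dose(dose, n_tumor_free, alpha=0.01):
    """'Safe' dose when n animals at the given dose all remained tumor-free."""
    max_risk = 1.0 - alpha ** (1.0 / n_tumor_free)
    return dose / 10 ** (STD_NORMAL.inv_cdf(max_risk) - Z_SAFE)


lab1 = safe_dose(100.0, 500)          # laboratory 1 alone, 100 mg/kg
lab2 = safe_dose(100.0, 100)          # laboratory 2 alone, 100 mg/kg
labs_1_and_2 = safe_dose(100.0, 600)  # same dose, results pooled
lab3 = safe_dose(50.0, 500)           # laboratory 3 alone, 50 mg/kg
# Mice tumor-free at 100 mg/kg are counted as tumor-free at 50 mg/kg.
labs_1_and_3 = safe_dose(50.0, 1_000)

# The highest value among the consistent determinations is retained.
retained = max(lab1, lab3, labs_1_and_3)

if __name__ == "__main__":
    for label, value in [("lab 1", lab1), ("lab 2", lab2), ("labs 1+2", labs_1_and_2),
                         ("lab 3", lab3), ("labs 1+3 at 50 mg/kg", labs_1_and_3)]:
        print(f"{label}: {value:.3f} mg/kg")
```

Pooling at the lower dose gains animals but starts the extrapolation from a smaller dose, which is why the combined figure for laboratories 1 and 3 does not exceed that of laboratory 1 alone.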
Dose in       Observed results:        "Justifiable" combined results:
size order    number of tumors/        combined number of tumors/
              number of mice           combined number of mice

1             0/100                    (0/100);a 0/200; 1/300; 5/400
2             0/100                    0/100; 1/200; 5/300
3             1/100                    1/100; 5/200
4             4/100                    4/100

a Parentheses indicate that this result need not be considered, as the next must yield a higher value for the "safe" dose.

In the absence of inversions in the data, we can determine the "justifiable" combined results at a dose by adding to the results at that dose, in succession, the results at still higher doses. A calculated "safe" dose can be determined for each dose used and for each of the various "justifiable" combined results corresponding to each dose. The over-all "safe" dose would be the highest of the various determinations.

Where data show an inversion the procedure is altered. Consider a simple example:

Dose in       Observed results:        "Justifiable" combined results:
size order    number of tumors/        combined No. of tumors/
              number of mice           combined No. of mice

1             1/100                    (1/100); 1/200
2             0/100                    [(0/100); 1/100]

At the first dose level it is clear that the calculated "safe" dose would be larger if based on the combined results for both levels than if based on the results observed at this level alone; accordingly, as noted in one instance in the preceding example, the result at the lower level alone is shown in parentheses. At the higher dose level an inversion occurs; there is a lower incidence of tumors even though the dose level is higher. In view of the inversion, one would be less willing to accept as "safe" the calculated value obtained on the basis of results for this dose level alone. The two alternative results shown in brackets at this dose level are the results at this dose level and the contradictory results at the lower dose level.
The significance of the use of brackets here is that the calculated "safe" value is now to be taken as the lesser of the values suggested by the alternative results. In the present instance, the result at the lower dose, 1/100, would yield the lower "safe" dose and so the alternative result is shown in parentheses. (It will not always be necessarily true, when an inversion occurs, that the retained result will correspond to the higher tumor incidence at the lower dose level.) In the present example the calculated "safe" dose will be that corresponding to the first dose with combined result 1/200 or that corresponding to the second dose with retained result 1/100, whichever is the greater.

In practice the application of the methods just indicated is much simpler than the explanation would suggest. Ordinarily only one or perhaps two of the combined results at a dose level will need to be considered. The results at some dose levels may immediately permit us to drop them from consideration. After only a limited amount of experience it should be possible to do this rather rapidly. The calculations are actually simpler than those for the maximum likelihood probit method. [In fact, the confidence limit procedures ordinarily employed in connection with the probit method are not fully satisfactory. A more appropriate method is described by Mantel and Patwary (10), but it could require a somewhat extravagant level of computational effort.] And, while for completeness, we have indicated the need for considering the possibility of inversions, this will ordinarily not pose a problem.

An Illustrative Example

An example from the literature shows how the procedure just discussed can be applied. The data, from Bryan and Shimkin (11), are the results obtained after a single injection of methylcholanthrene into mice, 12 different dose levels being used in the study. The reader may refer to the original article for details.
No peculiarities arise in this example. There are no inversions. At the four lowest levels no tumors occurred, and the appropriate combined result is readily recognized in these instances. At the middle four levels it can be recognized that there is no point in combining results and, finally, the four highest levels can be disregarded, as these all yielded 100 percent tumor occurrence.

The procedure is illustrated in table 2. The first three columns show, respectively, the dose, log dose, and the observed result. Column 4 shows each combined result considered, and there should be a separate line for each such result. In the present instance only one combined result needed to be considered at each dose. For each such result, column 5 shows the calculated maximum risk at the 99 percent assurance level. These were obtained from binomial tables or calculated directly for the case of no tumors occurring. The normal deviate corresponding to the maximum risk is obtained from tables of the normal distribution and is shown in column 6. Finally, column 7 shows the calculated "safe" (1/100 million) log dose. The maximum for this, 2.962-10, appears in the second line, and the over-all calculated "safe" dose is 9 × 10⁻⁸ mg per mouse.

One might remark that this "safe" dose is so low as to make impractical any use of it which may result in its ingestion by humans. But we are dealing here with a rather potent carcinogen, and if any compounds are to be assigned tolerated levels which are virtually zero, this is one of them.

Text-figure 1 shows graphically the results and analysis of the experiment just considered. Normal deviates are shown on the vertical scale, while on the horizontal scale the dose employed is shown as descending negative powers of 2. The points shown represent the outcomes at each dose level; 0 and 100 percent outcomes are shown by arrows.
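The column-by-column calculation of table 2 described above may be sketched as follows. Several points here are assumptions rather than the authors' stated method: the helper names `max_risk` and `safe_log_dose` are hypothetical; the maximum risk for results with tumors is taken as the exact one-sided binomial upper limit, found by bisection in place of binomial tables; and Python's `statistics.NormalDist` stands in for tables of the normal distribution. Log doses are taken from column 2 and combined results from column 4.

```python
from math import comb
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, replaces printed tables


def max_risk(tumors, mice, assurance=0.99):
    """Largest true tumor rate not contradicted at the given assurance.

    Assumption: exact one-sided binomial upper limit. Intended for
    tumors < mice; 100 percent outcomes are disregarded, as in the text.
    """
    alpha = 1.0 - assurance
    if tumors == 0:
        # Direct calculation: (1 - P)**mice = alpha
        return 1.0 - alpha ** (1.0 / mice)

    def prob_at_most(p):
        # Probability of observing `tumors` or fewer among `mice` animals.
        return sum(comb(mice, k) * p**k * (1 - p) ** (mice - k)
                   for k in range(tumors + 1))

    lo, hi = tumors / mice, 1.0
    for _ in range(100):  # bisection; prob_at_most decreases as p rises
        mid = (lo + hi) / 2
        if prob_at_most(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def safe_log_dose(log_dose, tumors, mice, target_risk=1e-8, slope=1.0):
    """Extrapolate from the maximum risk down to the target risk.

    With slope 1 this is column (2) - column (6) - 5.612, since the normal
    deviate for a risk of 1/100 million is about -5.612.
    """
    z_max = STD_NORMAL.inv_cdf(max_risk(tumors, mice))
    z_target = STD_NORMAL.inv_cdf(target_risk)
    return log_dose - (z_max - z_target) / slope


# (log10 dose, combined tumors, combined mice) for the first 8 dose levels.
rows = [(-3.612, 0, 158), (-3.010, 0, 79), (-2.709, 0, 38), (-2.408, 0, 19),
        (-2.107, 3, 17), (-1.806, 6, 18), (-1.505, 13, 20), (-1.204, 17, 21)]

candidates = [safe_log_dose(ld, t, m) for ld, t, m in rows]
overall = max(candidates)  # the over-all "safe" log dose is the highest
print(round(overall, 3), 10 ** overall)
```

Under these assumptions the highest value arises from the second line (0/79) and is close to -7.038, that is 2.962-10, or roughly 9 × 10⁻⁸ mg per mouse. Small differences in the third decimal from the printed table can arise from rounding in the tabulated values.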
TABLE 2.-Illustration of methodology for determining the "safe" dose from results at several dose levels; data from Bryan and Shimkin (11)

Columns: (1) dose, mg/mouse; (2) log dose; (3) result, No. of tumors/No. of mice; (4) combined result, No. of tumors/No. of mice; (5) maximum P, 99% assurance; (6) corresponding normal deviate; (7) calculated "safe" (1/100 million) log dose, (2) - (6) - 5.612.

(1)         (2)          (3)      (4)      (5)       (6)       (7)
0.000244    6.388-10     0/79     0/158    0.0288    -1.899    2.675-10
0.000975    6.990-10     0/41     0/79     0.0566    -1.584    2.962-10
0.00195     7.291-10     0/19     0/38     0.1141    -1.205    2.884-10
0.0039      7.592-10     0/19     0/19     0.2152    -0.789    2.769-10
0.0078      7.893-10     3/17     3/17     0.480     -0.050    2.331-10
0.0156      8.194-10     6/18     6/18     0.729     +0.610    1.972-10
0.0312      8.495-10     13/20    13/20    0.871     +1.131    1.752-10
0.0625      8.796-10     17/21    17/21    0.958     +1.728    1.456-10
0.125       9.097-10     21/21
0.25        9.398-10     21/21
0.50        9.699-10     21/21
1.0         10.000-10    20/20

[Text-figure 1. Legend: observed outcomes; maximum P values, 99% assurance; percentage outcomes and maximum P values shown in parentheses; conservative extrapolation.]

The solid line on the figure is the maximum likelihood probit line fitted to the data. Above the first 8 data points the triangles shown correspond to the maximum P values of table 2. Extrapolation, with the slope of one normal deviate per common log, to the over-all calculated "safe" value is indicated by a broken line. All triangles, other than the one from which