a Department of Pathology, Institute of Gerontology, and Geriatrics Center, University of Michigan, Ann Arbor
b Department of Physiology, Southern Illinois University, Carbondale
c Ann Arbor DVA Medical Center, Michigan
Decision Editor: John A. Faulkner, PhD
| Abstract |
|---|
MICE homozygous for the Ames dwarf mutation, Prop1df, show a 40% to 70% increase in mean and maximum longevity compared with control mice (+/+ or df/+) in a specific pathogen-free colony (1). The df/df genotype impairs development of the embryonic anterior pituitary, leading, in postnatal animals, to primary deficiencies in pituitary production of growth hormone (GH), prolactin, and thyroid-stimulating hormone, and thus to secondary deficiencies in insulin-like growth factor 1 (IGF-1) and the thyroid hormones (2). Mice of the Snell dwarf genotype, homozygous for the Pit1dw mutation, show a similar degree of lifespan extension (3), consistent with the known role of Prop1 as an inducer of Pit1 production in the embryonic pituitary (4)(5). It is not known how alteration of hormonal levels (or perhaps other, as yet undocumented, effects of the Pit1 and Prop1 mutations) might lead to life-span extension, but further study of this model seems likely to produce useful insights into the mechanism of physiological decline and vulnerability to illness in late life.
In principle, the availability of new methods for parallel assessment of multiple mRNA levels in cell and tissue samples could be used to produce a high-resolution image of gene expression patterns as they change with age and genotype. This could identify mechanisms by which longevity mutants lead to decelerated aging. Initial attempts to use array-based screening methods to develop catalogs of genes whose expression is affected by aging, by caloric restriction, or by mutations that lead to early death (6)(7), while providing provocative clues about the possible role of specific gene families in the aging process, have not included formal tests of statistical significance, making it impossible to determine which of the observed differences in gene expression are plausibly attributed to chance variation rather than to reproducible effects of age, diet, or genotype.
To provide an initial assessment of the effects of aging and of the df/df genotype on gene expression, we used 588-target nylon membrane cDNA arrays to study liver mRNA levels in df/df mice at ages 5, 13, and 22 months (n = 3 to 4 mice/group), comparing them to an equal number of nonmutant control mice. We found that although the levels of expression of many of these genes appeared to show effects of age or genotype when assessed by ratio statistics alone, application of formal significance testing greatly reduced the number of genes for which the effect was unlikely to represent merely chance variation. Based on an arbitrary criterion of p < .01, 22 of the tested genes seemed likely to show reproducible effects of age or genotype in replicate sample populations. We also used the entire collection of expressed genes to test the hypothesis that global patterns of expression change more slowly in df/df than in control mice.
| Methods |
|---|
RNA Preparation and Labeling
Liver tissue, stored frozen in liquid nitrogen, was homogenized using a mortar and pestle. Total RNA was then extracted using the Atlas Pure Total RNA Isolation Kit (Clontech, Palo Alto, CA) following the vendor's protocol. Genomic DNA contamination was reduced by treatment with RNase-free DNase I. RNA integrity was confirmed by electrophoresis on agarose gel. Reverse transcription, 32P-labeling, and hybridization were conducted using the Atlas cDNA Expression Array Kit (Clontech) following the recommended protocol in all steps. Briefly, in each case, 2 to 5 µg of total RNA was converted into 32P-labeled first-strand cDNA by means of Moloney murine leukemia virus reverse transcriptase. The labeled cDNA was purified from unincorporated 32P-labeled nucleotides by Chroma Spin-200 (Clontech) column chromatography. cDNA fractions of highest activity were pooled and hybridized to one of the mouse Atlas membranes containing 588 mouse cDNA fragments and 9 housekeeping control cDNAs.
Hybridization
After prehybridization (30 min at 68°C in ExpressHyb [Clontech], supplemented with 100 µg/ml sheared salmon testes DNA [Sigma, St. Louis, MO]), the heat-denatured probe was added. Hybridization occurred overnight at 68°C. Membranes were washed 4 × 30 minutes in 2× sodium chloride/sodium citrate (SSC)/1% SDS at 68°C, followed by two washes in 0.1× SSC/0.5% SDS (30 min at 68°C). All prehybridization, hybridization, and washing steps were carried out with continuous agitation in a hybridization incubator with rotating bottles (FisherBiotech, Fisher Scientific, Pittsburgh, PA). Membranes were sealed in sample bags (Wallac, Finland), exposed to a storage phosphor screen for 1 to 3 days, and evaluated with a phosphorimager (Molecular Dynamics, Sunnyvale, CA).
Data Reduction
Each hybridized nylon array was used to produce two digital images, typically after different exposure times (1 to 3 d). Each digital image was converted into a table of pixel volumes using Molecular Dynamics ImageQuant software (Molecular Dynamics, Sunnyvale, CA). Two such tables were produced from each image, for a total of four tables from each mRNA sample; this redundancy was used to detect errors in digitization by comparison among the four replicate data sets, and rare errors were corrected by redigitization of the original images. Average values from these quadruplicate digitizations were then used for further calculations.
Normalization for differences among experiments in exposure time, probe specific activity, probe quality, and other technical factors was conducted using a procedure described in detail elsewhere (I. Dozmorov and R. Miller, unpublished data, 2000). In brief, the procedure assumes that spots corresponding to mRNAs not expressed by the liver will be normally distributed and computes the mean and SD of these nonexpressed genes (typically 350 of the 588 tested genes on each array). The level of each expressed mRNA is then calculated as the "S" score, corresponding to the number of SDs above the mean value of the nonexpressed genes on the same array. Linear regression is then used to compare the distribution of S scores from each experimental array to that of a standard control (a sample from a normal young mouse), and the regression coefficients are then used to adjust each experimental sample to a common baseline.
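The S-score calculation and the regression-based adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the full procedure is unpublished, so the function names (`s_scores`, `adjust_to_reference`), the use of a pre-selected list of background spot indices, and the direction of the regression (reference regressed on sample) are hypothetical choices rather than the authors' implementation.

```python
from statistics import mean, stdev

def s_scores(intensities, background_idx):
    """Express each spot as the number of SDs above the background mean.

    `background_idx` lists the spots assumed to be nonexpressed; how those
    spots are chosen is not specified here and is left to the caller.
    """
    background = [intensities[i] for i in background_idx]
    bg_mean = mean(background)
    bg_sd = stdev(background)  # sample SD of the nonexpressed spots
    return [(x - bg_mean) / bg_sd for x in intensities]

def adjust_to_reference(sample_s, reference_s):
    """Rescale one array's S scores onto a reference array's scale.

    Fits reference = intercept + slope * sample by ordinary least squares
    (an assumed choice) and returns the fitted values for the sample.
    """
    mx, my = mean(sample_s), mean(reference_s)
    sxx = sum((x - mx) ** 2 for x in sample_s)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sample_s, reference_s))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * x for x in sample_s]

# Toy array: four background spots and one clearly expressed spot.
toy = s_scores([10, 12, 8, 10, 40], background_idx=[0, 1, 2, 3])
```

In this toy array the last spot lies far more than 3 SDs above background, so it would count as expressed under the S ≥ 3 rule used in this report.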
Among the 588 genes in the array, 300 were designated "nonexpressed" at all ages and in both genotypes on the grounds that the mean value of S was less than 3 for each combination of age (5, 13, and 22 mo) and genotype, normal and dwarf (i.e., that the gene was never expressed at a level 3 SD above background). Although it is possible that more sensitive methods might have documented expression of some of these genes, their level of expression was judged to be too low for reliable detection by the Clontech array system. An additional 23 genes were considered not evaluable because of their proximity on the array to cDNA targets expressed at very high levels in the liver. The remaining 265 genes were the subject of this report.
Simulation Study
A simulation study is described in more detail elsewhere (8). In brief, a random-number generator was used to create sets of normally distributed values with mean = 100 expression units and preselected SDs. Sets of 10,000 such numbers (corresponding to 10,000 gene expression measurements) were assigned to each of four young and four old mice. The ratio of mean "young" level to mean "old" level was then calculated for each of the 10,000 simulated genes, and the proportion of these ratios exceeding thresholds of interest (e.g., twofold, threefold, etc.) was tabulated for each SD level. To simplify interpretation, expression levels were limited to the range of 10 to 300; this adjustment led to a slight underestimation of the number of extreme ratios in the tables for a higher SD.
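A minimal sketch of a null simulation of this kind is given below. It is not the code used in reference (8): the function name `false_positive_rate`, the choice to clip out-of-range values to 10 or 300 (rather than redraw them), and the decision to count ratios in both directions (at or above the threshold, or at or below its reciprocal) are assumptions made for illustration.

```python
import random

def false_positive_rate(cv, n_mice=4, n_genes=10_000, fold=2.0, seed=0):
    """Fraction of null genes whose young:old mean ratio reaches `fold`.

    Every simulated value is drawn from a normal distribution with mean 100
    and SD = cv * 100, so there is no true age effect for any gene.
    """
    rng = random.Random(seed)
    sd = cv * 100.0

    def group_mean():
        # Clip each simulated expression level to the range 10 to 300
        # (assumed handling of out-of-range values).
        values = [min(300.0, max(10.0, rng.gauss(100.0, sd)))
                  for _ in range(n_mice)]
        return sum(values) / n_mice

    hits = 0
    for _ in range(n_genes):
        ratio = group_mean() / group_mean()  # simulated young / old
        if ratio >= fold or ratio <= 1.0 / fold:
            hits += 1
    return hits / n_genes

# Compare a low-variance and a high-variance scenario.
for cv in (0.1, 0.3, 0.5):
    print(cv, false_positive_rate(cv))
```

Running the loop at several CV levels shows how the proportion of apparent twofold changes rises with intermouse variation even though no gene has a true age effect.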
| Results |
|---|
To see if the level of variation in gene expression differed with age or genotype, we calculated the mean CV value and key percentiles for each of the six groups of mice in this study. Fig. 2 shows these values. The median variation in gene expression among individual normal mice increased significantly between 5 and 13 months of age (p = .01; Mann-Whitney test), and increased still further between 13 and 22 months (p = .003). In contrast, dwarf mice showed less intermouse variation at 13 months (p = .001 compared with 5-month-old animals), but variation then increased between 13 and 22 months of age (p = .001).
We also examined differences between Ames dwarf mice at 5 and 22 months of age. Twenty genes met a criterion of p < .05; the two where p < .01 are tabulated in Table 1 C. Table 1 D shows the four genes with p(t) < .01 comparing dwarf mice at 5 and 13 months of age; in addition, four others not shown met p < .05.
Effects of the Prop1df Mutation on Gene Expression at Various Ages
Table 2 A collects the four examples of genes where normal and dwarf mice differed at 5 months of age with p < .01; seven others, not shown, met p < .05. Among these, expression of the interleukin-11 (IL-11) gene approached the Bonferroni-adjusted significance criterion. Seven genes, shown in Table 2 B, differed at p < .01 at 13 months of age, in addition to another 12 genes where p < .05. One of these, the IGF-1A gene, was significantly lower (p = .00002) in dwarf mice, reflecting the low levels of GH-stimulated IGF-1 production in these mutants. Table 2 C shows the effect of the mutation at 22 months of age: there were 12 genes where p < .05, among which the three shown in Table 2 C achieved p < .01.
Distribution of Young-to-Old and Normal-to-Dwarf Ratios Among Tested Genes
Because studies using gene-array approaches to document age, diet, or mutation effects on gene expression (6)(7) have typically featured compilations of genes selected solely on the basis of ratio calculations (i.e., without formal significance testing), we also compiled lists of genes ranked on this criterion. Fig. 3 shows histograms of the young-to-old ratio (comparing 5- with 22-month-old mice) for normal (Fig. 3) and dwarf mice (Fig. 3). When all the genes were considered, the data set appeared to include a large number of genes whose expression changed more than twofold over the age range. When genes with CV > 50% were excluded, however, the number of genes showing these relatively extreme apparent age effects became much smaller. For the dwarf mice, for example, all of the 69 genes with a young-to-old ratio above 3 or below 0.33 were within the set with CV > 50%; it is likely that nearly all of these were false positives whose high observed ratio reflected the effects of sampling variation for genes whose expression varied greatly from mouse to mouse. Similarly, of the 44 genes that in normal mice produced a young-to-old ratio above 3 or below 0.33, only two (cyclin G and IGF-1; see Table 1 A) met a significance criterion of p = .01. In the same way, the effect of the df mutation on levels of gene expression cannot be convincingly examined using ratio calculations alone: Fig. 4 shows that most of the genes with a normal-to-dwarf ratio greater than 2 were within the high CV subset most likely to generate false-positive results. These observations are consistent with the idea that most of the genes yielding high ratios do so because of the well-known effects of sampling error for traits with high intersubject variance. 
The observations are consistent with a simulation analysis (8), which shows that data sets conforming to the null hypothesis (e.g., no true age effect) yield high numbers of false-positive high ratios when the simulated experiment uses small numbers of mice and at least a fraction of the genes exhibit relatively high CVs. Fig. 5 shows an excerpt from the simulation results, giving the number of false-positive results, using ratios from 1.5- to 4-fold, at various levels of CV, for n = 4 mice per group. It is clear from the simulation that genes with CV > 50% frequently produce ratios of twofold or more by chance alone in a study using n = 4 animals per group.
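The contrast drawn in this section between ratio-based and significance-based selection can be illustrated with a small Python sketch. The expression values below are invented for illustration and are not data from this study; the helper names (`cv`, `student_t`) are hypothetical. The critical value 3.707 is the two-sided p = .01 threshold of the t distribution with 6 degrees of freedom (4 + 4 mice).

```python
from math import sqrt
from statistics import mean, stdev

def cv(values):
    """Coefficient of variation: sample SD divided by the mean."""
    return stdev(values) / mean(values)

def student_t(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

T_CRIT_P01_DF6 = 3.707  # two-sided p = .01, 6 degrees of freedom

# Hypothetical genes: both have a young:old ratio of exactly 2.
steady_young, steady_old = [200, 210, 190, 200], [100, 105, 95, 100]
noisy_young, noisy_old = [50, 350, 120, 280], [20, 180, 60, 140]

for name, young, old in [("steady", steady_young, steady_old),
                         ("noisy", noisy_young, noisy_old)]:
    ratio = mean(young) / mean(old)
    t = student_t(young, old)
    print(name, ratio, round(cv(young), 2), abs(t) > T_CRIT_P01_DF6)
```

Both hypothetical genes would be selected by a twofold ratio cutoff, but only the low-CV gene exceeds the t threshold; the high-CV gene is the kind of candidate most likely to be a false positive.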
| Discussion |
|---|
The scientific community has yet to arrive at a consensus on the best way to evaluate the results of quantifying gene expression using high throughput technologies so as to minimize both false-positive and false-negative inferences. Previous attempts to compile catalogs of age-sensitive genes (6)(7) have relied on simple ratio calculations (e.g., the mean young:old ratio) without explicit consideration of the likelihood that high ratios could reflect merely chance variation in the level of expression of genes whose expression differs greatly among individual mice. Simulations (8) show that false-positive results of this kind are surprisingly frequent in experiments with small numbers of mice per test group; in tests with three mice per group, for example, as many as 6% of the genes tested will show a twofold change for genes with CV = 50%; this number increases to 15% of the genes tested using the criterion adopted by Lee and colleagues (6). Indeed, in our own data set the large majority of genes with high ratios (either young:old or normal:dwarf) came from the subset in which CVs were quite high; none of these highly variable genes met conventional standards of statistical significance, even using a significance criterion (p = .05) that was not adjusted to reduce the effects of making multiple comparisons.
Inferences based on a formal significance test, such as the Student t statistic, which is based on a ratio of effect size (such as young:old) divided by a variance measure, are less likely to produce type I error (i.e., false assertions that a gene varies with age or genotype when in fact it does not). The criterion for making such assertions must be more stringent for an experimental design that tests many effects simultaneously, because conventional thresholds, such as p(t) < .05, do not protect adequately against type I error in a series of multiple comparisons. The accepted procedure in such a case is to employ Bonferroni-adjusted significance levels, in this case, p = .05/265 = .0002 for our tests of 265 expressed genes. We found that 12 (5%) of the 265 genes expressed in a normal young liver varied sufficiently from their levels in young df/df mice to achieve p(t) < .05. Similarly, 14 genes (5%) met p < .05 in comparisons between 5- and 22-month-old normal mice. Among these 26 genes, however, only one met the Bonferroni-adjusted significance threshold of p = .0002 (i.e., IGF-1A, with normal:dwarf = 34 at 13 mo of age). Two other genes in our test battery approached this significance threshold: the c-ErbA thyroid hormone receptor (p = .0003, young:old = 3.0), and IL-11, an inhibitor of adipogenesis (p = .0003, normal:dwarf = 6.4 at 5 mo of age).
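The Bonferroni arithmetic in this paragraph can be checked with a short Python sketch. The p values are those quoted in the text; the function name `bonferroni_threshold` and the dictionary layout are illustrative assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error near alpha."""
    return alpha / n_tests

# p values quoted in the text for the three strongest candidates.
p_values = {"IGF-1A": 0.00002, "c-ErbA": 0.0003, "IL-11": 0.0003}

threshold = bonferroni_threshold(0.05, 265)  # 265 expressed genes tested
significant = [gene for gene, p in p_values.items() if p < threshold]

print(round(threshold, 4))  # .05/265, about .0002
print(significant)          # genes passing the adjusted threshold
```

With 265 tests the adjusted threshold is roughly .0002, so IGF-1A passes while c-ErbA and IL-11, both at p = .0003, fall just short.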
An efficient and cost-effective approach to compiling a reproducible list of age- and genotype-sensitive genes involves two steps: first, an initial survey, such as the one summarized in Table 1 and Table 2, to identify genes that seem likely to show the effects of interest, followed by tests of those leading candidates in a second group of test animals. A follow-up analysis of the four genes suggested in Table 2, to distinguish young normal from young dwarf mice, would only need to achieve p = .05/4 = .0125 to provide good evidence for a genotype effect. The replicate samples could be examined by employing the same array method used for the initial survey, or by any other convenient quantitative method, as long as the work was conducted on samples prepared from animals not involved in the initial survey.
We note that our data set, like any based on analysis of small numbers of animals or test samples, has very low statistical power, making it very likely that genes that do indeed vary with age or genotype have been omitted from our summary tables. Even when only a single gene is being tested (i.e., using p = .05), 22 mice per group would be needed to give 80% power for detecting a 50% increase in expression level for genes whose CV was 58% (i.e., the average level seen for young mice in our study); six to seven mice per group would be needed for genes whose expression changed by twofold or more. Protocols in which five genes are tested together would need to meet p = .01, and would require nine to ten mice per group for detection of a twofold change at CV = 58%. Genes with lower interanimal variance require fewer animals per group: for CV = 20%, for example, as few as five mice per group should be sufficient to detect 50% changes of expression for tests of five candidate genes (i.e., using Bonferroni-adjusted p = .01), but genes with this low level of interanimal variation make up only a small fraction of those expressed in the liver.
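The sample sizes quoted above can be approximated with the standard normal-approximation formula for a two-group comparison, n = 2((z_{1-α/2} + z_{power})/d)², where d is the fractional change divided by the CV. The sketch below is an assumption-laden illustration, not the calculation used for this report: it assumes a two-sided test and equal SDs in both groups, and the function name `mice_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def mice_per_group(fractional_change, cv, alpha=0.05, power=0.80):
    """Approximate mice per group for a two-sided, two-sample comparison.

    `fractional_change` is the change as a fraction of the baseline mean
    (0.5 for a 50% increase, 1.0 for a twofold change); `cv` is SD/mean.
    Uses the normal approximation, which runs slightly low for small n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    d = fractional_change / cv  # standardized effect size
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(mice_per_group(0.5, 0.58))              # 50% increase, CV = 58%
print(mice_per_group(1.0, 0.58))              # twofold change, CV = 58%
print(mice_per_group(1.0, 0.58, alpha=0.01))  # five genes tested together
print(mice_per_group(0.5, 0.20, alpha=0.01))  # low-variance genes
```

The first two calls give 22 and 6 mice per group, in line with the figures in the text. The two α = .01 cases give 8 and 4, a little below the values quoted above, as expected for an approximation that ignores the small-sample correction of the t distribution.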
Two other groups have published reports on array-based detection of mRNA levels in tissues from aging mice: one study (6) compared skeletal muscle tissue in 5- and 30-month-old mice, and the other study (9) compared liver tissue in 3- and 24-month-old mice. Each study included a list of genes purported to be altered by the aging process, but neither used a significance test that was based on variation among the individual mice tested. The report from Lee and colleagues (6) used an oligonucleotide array to analyze expression of 6347 mRNAs, and selected genes for tabulation based on ratio statistics alone, without any tests of statistical significance. Han and colleagues (9), as in our study, used Clontech 588-gene arrays on nylon membranes and studied liver mRNA. Data analysis in the Han article made use of a principal-components method to produce a list of genes that were overexpressed or underexpressed in each of four separate comparisons of a young mouse to an old mouse. A fifth experiment, comparing a pool of RNA from the four tested young mice to a pool from the four tested old mice, was analyzed in the same way. These authors then tabulated all genes (n = 7) that showed significant over- or underexpression in at least three of these five comparisons. Although the principal-components approach provides a rigorous test of the hypothesis that, for a specific young/old pair, the gene in question is different between the two tested animals, it does not, unfortunately, provide a significance test of the hypothesis that mean expression level in young mice differs from that of old mice. Furthermore, only one of the seven genes proposed to be age-sensitive achieved the selected significance level (p < .05) in more than two of the four independent young versus old comparisons. It would be of interest to see, through simulation studies, how often genes with high CVs (but no real age effect) would produce false-positive results using the principal-components approach.
It is noteworthy that the set of 13 genes (Table 1 A–D) found to be altered by age in either normal or dwarf mice in our own studies does not overlap at all either with the 113 genes reported to be age-sensitive in muscle by Lee and colleagues (6), or with the set of seven genes reported by Han and colleagues (9) to be age-sensitive in the liver. Although it is possible that some of these discrepancies reflect differences in background stocks, tissues, or age groups examined, we suspect that many of the apparent age effects in all three studies will prove, on further analysis, not to be reproducible in additional groups of replicate mice, because, at this point, none of the data sets has produced evidence for statistical significance of the age effects reported.
Table 1 and Table 2 present suggestive evidence for effects of age and/or genotype on expression of 22 genes in the mouse liver. The effect of the df/df genotype on IGF-1A expression was expected, because hepatic IGF-1 synthesis is known to depend on GH, which is absent in Ames dwarf mice. Responses to IGF-1 levels are regulated on a tissue-specific basis by at least two IGF-1 receptors and by a family of six IGF binding proteins (IGF-BPs), and it is thus of interest to note suggestive evidence for effects of aging, in normal mice, on both IGF-BP1 and IGF-receptor 2, the decline in middle age of IGF-1A and IGF-BP3 expression in the dwarf mice, and the apparent effects of the df/df genotype on IGF-BP2 and insulin receptor expression at different ages. Low IGF-1 levels have also been noted in breeds of dogs that show exceptional longevity (10)(11)(12), and further analysis of Ames dwarf mice and other mutant mice with alterations in GH-dependent pathways will help to sort out this complex network of interacting proteins.
The data did not support the simple hypothesis that the df/df genotype merely slows down age-related changes in gene expression. Of the four genes (Table 1 A) that showed the strongest evidence for age effects in normal mice, only one (cyclin G) clearly fit this predicted pattern of a clear age trend muted in the df/df mice. Nor were the effects of df/df on gene expression in young mice necessarily maintained at later ages. Of the four genes that showed the strongest evidence for a df/df effect at 5 months of age, three showed a peculiar and unexpected similarity (i.e., very low levels at 5 months of age [five- to sixfold below normal mice] followed by very high levels, three- to fivefold above normal mice, at 22 months of age). The three genes showing this pattern, IL-11, dioxin-inducible cytochrome P450, and homoeobox protein 4.2, do not share any obvious metabolic connections and have not been implicated by other models of decelerated aging.
In addition to their potential for documenting the effects of age and genotype on expression of specific genes, array-based methods can shed light on global patterns of gene expression. The data in Fig. 2, for example, indicated that the variation among individual mice in the expression level of specific genes increased with age in normal mice, and that this age-dependent change was delayed in Ames dwarf mice, whose mean and median CV for gene expression declined significantly between 5 and 13 months of age before increasing at 22 months. The data also shed light on, and in this case provide no support for, the hypothesis that gene expression patterns change more slowly in dwarf than in normal mice. As illustrated in Fig. 6, the distribution of change scores (mean expression at 5 mo divided by mean expression at 13 mo for each gene) did not differ significantly between dwarf and control mice, and, between 13 and 22 months, change scores were actually slightly (and significantly) higher for df/df than for control mice. The compilation in Fig. 7 shows that although there was a large (R² = .56) correlation between change scores in dwarf and in normal mice, there are many individual genes that seemed to change more dramatically in dwarf than in control animals, or vice versa. The specific genes included in the Clontech panel were selected for their relevance to biological processes in a wide range of cell and tissue types, and so it is hard to assess the degree to which the global properties of the expression patterns of genes in this selected set give an accurate picture of liver gene expression as a whole.
Initial explorations of array-based methods have been hampered by the relatively high cost of arrays, making it difficult to acquire data on sufficiently large numbers of individual samples (e.g., n = 10/group) needed to produce convincing evidence for real effects of age, diet, or genotype on expression levels of specific genes. Even though technical improvements are rapidly decreasing the cost of arrays per se, the cost per assay will depend on other factors, including the effort needed to produce samples from rare or aged mice. For this reason, the problems of reducing type I errors in high throughput screens may require two-stage designs, like the one we are employing in our ongoing follow-up studies, in which initial surveys are used to generate a list of candidate genes, with suggestively low p values, to be reexamined in hypothesis-testing replication studies.
The power of such high-throughput approaches lies in their ability to generate new hypotheses; in this case, new ideas about which specific genes may mediate the dramatic effects of the df allele on life span and late-life disease resistance. The initial work reported here has yielded lists of candidate genes whose expression was altered in Ames dwarf mice at early ages (i.e., ages at which their effects could contribute to delayed aging). We also generated data on specific genes whose expression late in life differs in the Ames and control stocks, and could potentially serve as biomarkers of important age-sensitive processes. Once replicate studies in additional animals have confirmed some of these observations, it will become possible to compare results from the Ames model to parallel data emerging from studies of mice whose life span is extended by dietary restriction or by other genes with an effect on subsets of the endocrine pathways altered by the Prop1 mutation.
| Acknowledgments |
|---|
Address correspondence to Richard A. Miller, MD, PhD, Geriatrics Center, 5316 CCGCB, University of Michigan, Ann Arbor, MI 48109-0940. E-mail:
Received July 10, 2000
Accepted August 22, 2000