Wednesday, August 28, 2013

Cystic fibrosis, genetic variation and gene function

We wrote last week about the difficulty in assessing gene function, including determining whether a given genetic variant is in fact responsible for a disease or trait. The ever-decreasing cost of DNA sequencing will add to the rapidly growing set of variants of undetermined function: some will be associated with a disease, but some will have no effect.  This is not only a challenge to human geneticists trying to characterize gene function, but also to clinicians trying to explain, predict or treat disease. Last week's post was in the context of X-linked intellectual disorders, but the problem is ubiquitous.  Today we write about cystic fibrosis.

More than 2000 variants in the cystic fibrosis transmembrane conductance regulator gene (CFTR) have been linked to the disease.  About 70% of Europeans with cystic fibrosis have 2 copies of a 3 base deletion, called ΔF508 because it removes the amino acid phenylalanine (F) at position 508 of the CFTR protein.  The protein is therefore synthesized with that one amino acid missing, and this leads to an abnormal protein, and disease. That there can be one predominant allele in this 'recessive' disease is consistent with population genetics theory and also with the early discovery of the gene: because a high fraction of cases have two copies of the allele (or at least one copy, along with some other variant), the gene was findable with the techniques of a generation ago.

Why this deletion causes the particular symptoms of cystic fibrosis is well understood, but why most of the remaining 2000 variants cause disease has not been demonstrated. Indeed, most of them are quite rare and, since this disease is "recessive", have little if any effect on their own.  Now, a new paper in Nature Genetics ("Defining the disease liability of variants in the cystic fibrosis transmembrane conductance regulator gene," Sosnay et al.) addresses this question.

Data collected for the study; Sosnay et al., Nature Genetics, 8/25/13
Sosnay et al. collected phenotype and genotype data from nearly 40,000 people with cystic fibrosis in Europe and North America.  Phenotypes vary in how severe they are and whether, for example, they involve just lung and breathing issues or also obstruct pancreatic ducts causing other problems, and so on.

In this sample, 159 CFTR variants were at a frequency greater than 0.01%, accounting for 96% of the cases--that is, of people known from symptoms to be affected, even if the severity is not uniform.  Of these, they confirmed that 80% met the clinical and functional definition of cystic fibrosis.  Information from about 2000 fathers of people with CF allowed the investigators to determine that a handful of the remaining variants seemed to have no effect, and the effect of another handful is still undetermined.  That the investigators looked for effect in cases' fathers is somewhat curious if the disease is recessive, because one would expect no effect--but see below.
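
To get a feel for what a 0.01% cutoff means in a sample this size, here is a rough back-of-the-envelope sketch. It assumes a round figure of 40,000 patients (the post says "nearly 40,000") and assumes the frequency is counted per gene copy, two per person; neither detail is confirmed here, and the function name is just for illustration.

```python
def threshold_in_copies(n_patients, min_frequency, copies_per_person=2):
    """Translate a frequency cutoff into a count of gene copies.

    Assumes (not stated in the post) that frequency is measured per
    gene copy, with two CFTR copies per patient.
    """
    total_copies = n_patients * copies_per_person
    return total_copies, total_copies * min_frequency


# 40,000 is a rounded stand-in for "nearly 40,000" patients.
total, cutoff = threshold_in_copies(40_000, 0.01 / 100)
print(f"{total} gene copies; 0.01% corresponds to about {cutoff:.0f} copies")
```

Under those assumptions a variant clears the cutoff once it has been seen on roughly eight gene copies in the whole sample, which gives a sense of how little data there is for the variants near or below it.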

This study contributes new and useful information to knowledge of the genetics of the disease.  Understanding whether, or better, how, a variant causes disease is important for determining carrier status in prospective parents, and can be an important part of understanding how to treat the disease. But the task is never-ending, as new variants arise with every meiosis -- most will be benign, but some will not.

Some 'meta' explanation of this approach
Once the ΔF508 allele was found, it was shown to have a strong effect in carriers of two copies.  This was consistent with the idea that CF is "recessive", and in a kind of circular strategy this is what was found because the variant is so common.

But then other patients were found who had only one copy of that allele, plus a not-normal sequence in their other copy.  Because the trait was assumed to be "recessive", this other variant was entered into the database as if it were causal.  This is a kind of circular reasoning gone another step: if the disease is assumed to be recessive, then the variants in both the patient's copies of the gene must be causal!

In some cases, the nature of both mutations in the gene, in terms of where in the protein structure each variant occurred, was able to confirm a likely causal effect.  This is consistent with the current paper.  However, it was obvious that the phenotypes were not all the same, and many if not most individuals had two different variants.  That is not what "recessive" is classically supposed to mean, which is two copies of the same 'bad' variant.  Instead, we have a quantitative relationship between the diploid genotype (that is, both copies of the gene) and the severity of the trait.
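
Why patients with two different variants are to be expected can be sketched with simple arithmetic. This is only an illustration: it assumes each patient's two disease alleles are drawn independently from the pool of disease alleles (random pairing), and the values of p, the share of that pool made up by the predominant allele, are hypothetical rather than taken from the paper.

```python
def genotype_shares(p):
    """Shares of patients carrying 2, 1, or 0 copies of the predominant allele.

    p is the (hypothetical) share of disease alleles that are the
    predominant one; random pairing of disease alleles is assumed.
    """
    two_copies = p * p
    one_copy = 2 * p * (1 - p)
    no_copies = (1 - p) ** 2
    return two_copies, one_copy, no_copies


# Hypothetical values of p, chosen only to show the pattern.
for p in (0.6, 0.8, 0.9):
    two, one, none = genotype_shares(p)
    print(f"p={p:.1f}: two copies {two:.2f}, one copy {one:.2f}, none {none:.2f}")
```

Even when one allele dominates the pool, a sizable share of patients pair it with something else, and each of those 'something else' variants is the kind whose causal role is hard to establish.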

If the assumption of recessiveness is wrong, it could be that one or even both variants the patient has are not themselves causal, but only causal in the context of some other site(s) in the genome that interact with the CFTR gene, or even that have similar effects on their own.  So to continue to use the term we essentially get from Mendel and his peas ("recessive") when what we really have is a quantitative relationship between genotype and phenotype, one that may not even always involve the same gene, is an example of a theory lasting beyond the evidence, because investigators 'want' the trait to fit the simple model.  The persistence of the simple model shows our addiction to it and the historical legacy by which terms and concepts cling on when they should be modified, or abandoned.  This is a common issue that applies to many purportedly single-gene traits.  It's why we put the word in quotes in this post.

Of course when a variant is very rare, and is generally seen in patients who also carry one of the strong-effect alleles, we have almost no way to test whether that variant is causal or not.  If it hits a known major part of the gene, some severity correlations can be identified, and this has been known since about 1990.  The current paper provides some larger-scale documentation, but really is confirming what has long been well known.  Naturally, it is no surprise that the authors could not attribute a specific mechanism to each of the almost 2000 variants, many of which will be singletons, which is why they will be so hard to confirm.  Showing that a specific variant is involved requires either enough samples to provide statistical evidence or some clear form of experimental evidence, which is difficult to obtain for many different alleles of various types.

So here we see an example of the interplay between sampling, theory, inference, and the challenging problem of understanding genetic causation--even in one of the classical 'simple' diseases.
