Genes, race and research ethics: who’s minding the store?
L M Hunt1,2, M S Megyesi2

1 Department of Anthropology, Michigan State University, East Lansing, Michigan, USA
2 Center for Ethics and the Humanities in the Life Sciences, Michigan State University, East Lansing, Michigan, USA

L M Hunt, 354 Baker Hall, Michigan State University, East Lansing, MI 48824, USA; huntli{at}


Background: The search for genetic variants between racial/ethnic groups to explain differential disease susceptibility and drug response has provoked sharp criticisms, challenging the appropriateness of using race/ethnicity as a variable in genetics research, because such categories are social constructs and not biological classifications.

Objectives: To gain insight into how a group of genetic scientists conceptualise and use racial/ethnic variables in their work and their strategies for managing the ethical issues and consequences of this practice.

Methods: In-depth semi-structured interviews were conducted with a purposive sample of 30 genetic researchers who use racial/ethnic variables in their research. Standard qualitative methods of content analysis were used.

Results: Most of the genetic researchers viewed racial/ethnic variables as arbitrary and very poorly defined, and in turn as scientifically inadequate. However, most defended their use, describing them as useful proxy variables on a road to “imminent medical progress”. None had developed overt strategies for addressing these inadequacies, with many instead asserting that science will inevitably correct itself and saying that meanwhile researchers should “be careful” in the language chosen for reporting findings.

Conclusions: While the legitimacy and consequences of using racial/ethnic variables in genetics research have been widely criticised, ethical oversight is left to genetic researchers themselves. Given the general vagueness and imprecision we found amongst these researchers regarding their use of these variables, they do not seem well equipped for such an undertaking. It would seem imperative that research ethicists move forward to develop specific policies and practices to assure the scientific integrity of genetic research on biological differences between population groups.

  • race
  • ethnic groups
  • genetics
  • health disparities
  • research ethics

The recent surge of interest in human genetic research has included a variety of international collaborative projects, designed to identify genetic variants between population groups which may affect differential disease susceptibility and drug response. Genetic data are being gathered from geographically diverse groups of people, meant to capture variation amongst major continental groups. While the term “race” is most often avoided in these projects, the selected populations are unmistakably coterminous with widely shared notions of major racial groupings, such as European, Asian and African.1 This trend has brought new life to the dated notion that there are important biological differences between racial/ethnic groups and in turn provoked sharp criticism challenging the appropriateness of using racial/ethnic variables in genetics research.

To date, efforts to address potential ethical issues accompanying this line of research have been rather limited in scope, primarily concerned with assuring individual and community informed consent. Still, the potential for ratifying specious negative images of members of the so-called “population groups”, and for fomenting the inaccurate and dangerous notion that racial labels denote biologically distinct groups, remains unresolved. Researchers are encouraged to “be careful” in the language they choose for reporting their findings, to avoid over-generalisations,1–3 while more concrete principles and policies for the management of these complex issues have yet to be developed and implemented.4 5

Important but neglected ethical considerations include questions of the scientific integrity of research using ill-defined racial/ethnic categories as key variables and of the capacity of genetic researchers to avoid promoting false notions of biologised racial difference when reporting findings from such research. In this paper we present a small exploratory study of how racial/ethnic variables are conceptualised, operationalised and interpreted in the work of a group of genetic scientists. We examine a number of contradictions between their understandings of the limitations of racial/ethnic classification and their routine use of these categories in their work. We encourage research ethicists to consider ways that they might productively contribute to the development of an expanded notion of ethically responsible research, one which forestalls inappropriately elevating the scientific legitimacy of popular concepts of inherent racial/ethnic difference.


Requirements for the collection of racial/ethnic data

Since 1993, US federal law has required that women and minorities be included in federally funded biomedical studies, in the interest of assuring equity to all in the potential health benefits. In implementing these requirements, the National Institutes of Health (NIH) has followed the bureaucratically expeditious course of using the racial/ethnic categories of the Office of Management and Budget’s (OMB) Race and Ethnic Standards for Federal Statistics and Administrative Reporting, the same categories that are used in the US census.6 7 While the goal of promoting consistency in data gathering may be served by using the OMB categories in this way, the data are routinely put to other purposes. These include describing population health statistics, risk modelling, health services assessment and being used as proxy markers for unexamined social and biological factors.8 It is the latter use that is of particular concern. The OMB categories, despite their ubiquitous use, were never intended to serve as scientific variables, but instead were honed for bureaucratic and political purposes.9 10 When health indicators are observed to vary by these racial/ethnic categories, researchers seem compelled to offer explanations for those differences, presuming inherent group differences, either cultural or biological, depending on their disciplinary background.11 12 Despite their otherwise rigorous approach to research, genetic scientists, who are also regularly using these categories, are likewise apt to offer such interpretations, tending toward biological determinism or genetic reductionism.13–15

Race as a variable in human genetics research

There is much discussion in current health and medical literature mulling the appropriateness of using racial categories in genetics research, questioning the use of not only the OMB categories per se, but of racial/ethnic labels more generally. At least four major journals have recently devoted special issues to the debate (Nature Genetics, November 2004; American Psychologist, January 2005; American Journal of Public Health, December 2005; The Journal of Law, Medicine & Ethics, Fall 2006). A strong critique of the use of racial/ethnic classifications in genetic research has emerged, with many authors arguing that these classifications are at their core cultural in nature, rather than biological: racial groups do not correspond to genetic variation, the distribution of variation is not concordant with these groupings, only a very small percentage (5–10%) of genetic variation can be accounted for by racial classifications and admixture between presumed ancestral populations is neither rare nor recent.10 16–26 They emphasise that human genetic variation is continuous, without regard for political boundaries, language or religion.

Many argue that using racial classifications in genetics research is not only of questionable scientific utility, but reinforces the erroneous belief that there are in fact biological differences between races and is unlikely to lead to advances in understanding the genetic causes of disease.27–32 Critics further argue that the current emphasis on discovering the genetic causes of racial/ethnic health disparities directs resources away from a focus on social and environmental causes, factors that are well known to underlie much of the unequal distribution of common chronic diseases.10 33–35

Research ethics and human genetics research

Much of this debate has been framed in response to the controversial and widely criticised Human Genome Diversity Project, which set out to collect DNA samples from “unique” populations around the world, intending to document and preserve human genetic diversity.36 The project was met with intense criticism of its goals and methods, particularly for disregarding the rights of target populations. Eventually federal support was withdrawn, which all but ended the project.37–39

In the interest of identifying and addressing such ethical problems before they emerge, the recent wave of research associated with the high-profile Human Genome Project has included federal support for development of a programme specifically designed to identify the complex ethical, legal and social issues of human genetics research as the research is being generated.40 This undertaking has resulted in a robust discussion of the ethical considerations in conducting genetic research on specific racial/ethnic populations, focusing primarily on issues of informed consent, employment and insurance discrimination, privacy and misuse of personal genetic information.1 25 41–45

In addition, there has been much discussion of the potential dangers of inappropriately promoting the notion that racial/ethnic categories have biological reality. Genetic scientists are being encouraged to avoid misinterpretation by using specific local terminology to describe their samples, rather than using common racial/ethnic labels,2 22 46 47 and to include analysis of the socio-economic and other environmental factors underlying race-correlated health differences and thereby expose the very real inequalities that affect the well being of racial group members.10 32 48 49

There has been some limited effort to establish explicit policies to address these issues. In 2000, the journal Nature Genetics published an editorial policy against using racial/ethnic variables unless there is a compelling argument for their inclusion.16 However, looking through articles published in that journal over the past three years, one finds frequent use of common racial terms such as “Caucasian” or “African American”, indicating that the journal may not be consistently enforcing that policy. Sankar et al50 have reported similar findings.

Similarly, the International HapMap project has published explicit instructions for labelling its samples in research publications, providing very specific terminology such as “Yoruba in Ibadan, Nigeria” and “Han Chinese in Beijing, China”.2 However, perusing articles published in the past year reporting analyses of HapMap data, one finds that, while the recommended terminology is usually used in the methods sections, many studies revert to common racial labels such as “HapMap Asians” or “the European population” throughout the rest of the article.

Thus, we see that, while ethical questions abound regarding the use of racial/ethnic variables in genetics research, resolutions are few. Toward the development of more effective policies and practices to address these issues, a more thorough understanding of how a group of genetic scientists think about these controversies and how they respond to them in their work would be valuable.


We conducted interviews with a purposive snowball sample of a cross-section of 30 human genetics researchers who were currently conducting research that included racial/ethnic variables as part of their research design. All were principal investigators, with PhD and/or MD training and were working on a variety of types of studies ranging from population modelling to linkage studies and a variety of diseases including rare inherited diseases and common chronic diseases. Table 1 summarises some characteristics of the sample.

Table 1 Selected characteristics of 30 scientists interviewed

Interviews followed a standardised set of open-ended questions, averaged about two hours and were tape recorded and transcribed. All study participants gave their informed consent to be interviewed, following IRB-approved protocols.

We developed an SPSS database of demographic and open-coded variables, to facilitate simple descriptive and correlational analysis51 and also coded interviews into a text-based data analysis program, Atlas-ti (Scientific Software Development GmbH, Berlin, Germany).52 We performed content analysis on the transcripts, identifying main topical areas and themes covered in the interviews, which were then further refined, in an iterative process, into emergent thematic categories.53 54 All phases of data processing and analysis were cross-checked in conference sessions wherein the research team discussed each case, reviewed emerging findings, honed analysis strategies and reached consensus about the application of coding categories.


In examining the types of racial/ethnic classifications these researchers report using and the criteria and methods they use for classifying their samples, we were struck by how impressively ambiguous and arbitrary these variables are and how little procedural care is taken in applying them. The familiar racial/ethnic labels of the OMB and the US Census (African American, Asian, etc) are also used by the NIH in their reporting requirements. These were by far the categories most commonly used by these researchers. It is worth noting the strikingly diverse criteria these categories combine, mixing sets of unrelated characteristics such as skin colour, language, or geographic location, as outlined in table 2.

Table 2 Racial/ethnic classification terms from the NIH inclusion enrolment report and types of classification they appear to represent*

Because the labels draw on multiple criteria, they are not mutually exclusive. Decisions need to be made about which characteristics to prioritise in classifying any given case.

However, definitions of these variables were consistently vague or almost entirely absent from the researchers’ responses and they described virtually no explicit principles or criteria for classifying individual cases. Instead, in most cases, these important classificatory decisions were left to the inherently idiosyncratic practice of “self-identification”, adding yet another layer of mystery to these already confounded categories.

We have examined these issues in some detail elsewhere and have concluded that the constructs and procedures behind the racial/ethnic variables commonly used by these genetic scientists plainly do not exhibit the level of rigour one would expect for a major variable in scientific research.55

Utilitarian step in medical progress

It is interesting to note that, although all of these scientists routinely use racial/ethnic labels as a key variable in their own research, none viewed the categories as fully adequate or unambiguously satisfactory. Some called race an “invalid”, “flawed” or “illegitimate” concept, having “no scientific basis”. Several noted that there is “no reliable way to group people” and that “in reality there are no clear-cut groups, but rather the classifications are overlapping”.

Despite this general agreement that racial/ethnic classifications are inadequate, nearly all of the researchers defended them as serving an important purpose that is of too much value to be set aside. This reasoning was commonly couched in terms of an oft-told tale of “imminent medical progress”, employing a rather circular logic which may be summarised as follows: Racial/ethnic categories are only rough markers for true, underlying biological differences. They are useful in the meantime, until the actual genetic variation itself is identified. Diagnosis and treatment will then be appropriately tailored to fit the genetic variations on an individual basis. A genetic epidemiologist explained:

What we would ultimately like to be able to do is to just move to the genetic factor itself, forget the race and ethnicity as that middle group. So,… ultimately if you find that gene “Y” is an important predictor of hypertension in say, African Americans, you can go back to the Caucasian population and tell all of them who have got that gene variation that they should have this drug or this intervention. Race goes away as an intermediate. But we’re just not there yet and so we use these race and ethnicity sub groups as a way of discovering what might be potentially meaningful.

This utilitarian stance was justified in two distinct ways. In the first, there was a simple proclamation that “race” is a way to classify genetic ancestry. The scientists voicing this perspective matter-of-factly declared that racial groups are “markedly different in their genetic background”. As a biological geneticist put it: “It’s the race bit that you really want to get at when you’re talking about genetics… in terms of what genes have predisposed you to, then race is important”.

The second form of the utilitarian view, while less definitive, is perhaps a bit more cynical. Race, while flawed and imprecise, is useful as a heuristic tool: racial groups are easily identifiable and due to higher concentrations of certain illnesses, they are good groups to use for conducting research on the genetic basis of those diseases. An epidemiologist explained it this way:

People are easily identified as a race or an ethnic group. You can’t really go out and genotype everybody to find who you’re gonna study because the expense would be outrageous.

While acknowledging that this approach is flawed, several researchers rationalised it by drawing on the story of imminent medical progress. A human geneticist expressed this resigned cynicism in these words:

I think it will balance out in the end… in the long run, you won’t care what the person looks like when they come in the door. You would treat them based on their genotype. That’s the goal, but it's going to be a while before that happens. In the mean time decisions are made by the colour of the skin of the person who walks in the door. It’s a very cheap genetic test. It’s a bad test.

Proxy variable

Most of the researchers indicated that race and ethnicity are best understood as surrogate markers or as proxy variables for some other factors that are difficult to identify and measure. In the words of a genetic epidemiologist:

We see differences in incidence and survival by race… I don’t think we want to just ignore it… regardless of what that “race” means. I’m sure it’s a surrogate for lots of things. A surrogate for what, I don’t know. […] You know if we knew what they were surrogates for, we could avoid this whole issue and just measure those things, but I don’t think we’re at that point yet.

The list of factors which the researchers said race/ethnicity likely stands for includes an impressively wide array of non-genetic items. Put crudely, one might say they subsume just about anything and everything aside from biology: access to healthcare, socio-economic status, diet, education, marriage patterns, cultural background, racism, psychological stress, lifestyle, risk factors and so on.

In the parlance of genetic science, such variables are glossed as “environmental factors”, theoretically of considerable interest because they provide the context necessary for gene expression. With the exception of simple, monogenic diseases, genes do not cause disease, but rather affect the susceptibility of an organism to environmental influences. “Genes and environment are inseparable”, said one genetic epidemiologist. Indeed, nearly all those we interviewed discussed at length the importance of considering environmental factors. At the same time, however, we found little indication that these factors were being addressed in a rigorous way. Only a handful mentioned including environmental factors at all in their own research projects and those that did described doing so in extremely simplistic terms. For example: “Everything from poverty, stress, neighbourhoods—whatever” and “We look at environmental stressors at the zip code level, to try to build an integrated model”.

Thus we come to a somewhat frustrating juncture. There is wide agreement amongst these scientists that environmental factors are crucial to understanding how genes might impact the racial/ethnic distribution of disease. However, in place of an earnest effort to untangle the effect of such factors, their research clearly prioritises biological explanations over socio-cultural ones. In practice, even the most thorough of research designs could be characterised as, in the words of one epidemiologist: “Genes with a capital ‘G’ and environment with a small ‘e’”.

Dangers and safeguards

When asked what they thought the implications of this line of research might be for the racial/ethnic groups in question, most of the researchers first invoked the idea of imminent medical progress, listing the intended medical benefits: improved prevention, diagnosis and treatment for the diseases that affect these groups. At the same time, most also named a number of serious potential negative consequences. Many expressed concern that public interpretation of their findings might result in group members being “marked” as genetically inferior. They raised issues of racial profiling, group stereotypes and social labelling. Most also discussed the potential that racialised genetics research could lead to increased discrimination against group members, in employment and health insurance, due to presumptions about their future health. Some suggested that this might open the door to selective sterilisation or other eugenic practices. In the words of a genetic biologist:

We have a very bad history of misusing genetic information and misusing it in a way that has really harmed people. Being involuntarily sterilised is a major thing. Those people do want to have descendants. So there’s a fear that genetic research may promote new ways for people to categorise, label and marginalise other people.

In light of the seriousness of these concerns, the strategies the researchers suggest for minimising potential negative outcomes seem rather weak. For the most part, they defended the status quo, framing their suggestions in terms that presumed the good intentions of research scientists and prioritised preserving their autonomy. For example, one medical oncologist said, “You know we don’t discriminate. I don’t see any downside. I don’t see that there is an enormous risk”.

Still, most of those interviewed acknowledged the potential for “misuse” or “misinterpretation” of genetic studies of racial/ethnic groups, but thought that these potential problems could be managed simply by researchers themselves “being careful” in how they report findings. Nearly all mentioned “education” as key to assuring research findings do not negatively impact group members. However, they were rather vague about who should be educating whom about what. They framed the problem as one of finding ways to communicate in “clear and understandable terms”, to various target groups, including “the community”, “the public”, “politicians”, “clinicians” and “the media”. Particularly interesting were those concerned with educating “the community”. They focused on promoting better understanding of basic genetic concepts, such as mutation and susceptibility, not only to discourage discrimination, but also to show community members the potential benefits of genetic research and thereby increase racial/ethnic groups’ willingness to participate in research and testing.

In their discussions of strategies for managing potential problems of racial-genetics research, only two researchers acknowledged the problematic nature of the variables themselves. (Incidentally, both were themselves minority group members.) One said that comparisons between racial groups should be avoided, in order to promote appreciation of the heterogeneity within groups. The other raised a lone voice questioning whether genetic scientists have the appropriate background and training to understand and manage the implications of using racial variables in their work:

We need better training of scientists… to have a deeper anthropological and linguistic and historical understanding of group formation and group identity and then a better understanding of the social experiences of people. […] So that when they are designing a study and when they are interpreting the results they know the social implications of what they are doing. Most of us say we are just doing science and “hands off” the possible consequences of our findings. […] But the long term consequence for the group could be stigmatisation, or the denial of what society really needs to do to resolve the very, very serious health issues of the people.

Thus, this comment raises an important question: is it reasonable to assume that genetic scientists have the necessary expertise in important issues of group identity and social history, to understand and manage the implications of their work for people carrying racialised identities in our society?


While this is a small, exploratory study, not designed to allow generalisation beyond those we interviewed, our findings raise a number of important issues regarding the routine use of race/ethnicity as a variable in genetics research and about ethical oversight of this practice. In analysing these interviews, we were struck by how arbitrary and unsystematic the racial/ethnic categories routinely used in health research are and how little procedural care these researchers report in applying them. Furthermore, we have seen that, while the researchers were well aware of the limitations of racial/ethnic classifications as scientific variables, like most health researchers in the US, they routinely use and interpret them in their research. While acknowledging that this practice may have significant negative implications for racial/ethnic group members, most defended their use in pragmatic terms. Framing it as an interim step in a march toward imminent medical progress, they dismiss the scientific inadequacy of such poorly defined variables and minimise the potential social costs of this line of research as manageable, temporary bumps on a road to the anticipated medical breakthroughs. Reliance on racial classification was defended by most as a convenient and efficient proxy variable, useful until we discern the underlying genetic diversity it stands for, en route to developing truly personalised medicine. In place of overt efforts to address the inadequacies and problems of using racial/ethnic variables, many simply expressed confidence that it is in the nature of science for truth to be revealed.28 56 Despite temporarily relying on flawed techniques, they hold that science will inevitably correct itself.

Such tolerance of the routine use of variables widely recognised to be scientifically inadequate raises important issues regarding the management of error and negligence in science. In addition to the serious potential impact of essentialising racial/ethnic groups as biologically distinct, there are questions of whether use of such poorly defined variables represents a breach of scientific principles such as reproducibility, comparability and external and internal validity.28

Despite the vociferous critique that has been raging in professional journals for the past several years, challenging the legitimacy and consequences of using racial/ethnic variables in genetics research, ethical oversight of these endeavours has been left almost completely to genetic researchers themselves, or to the editors of the medical and genetics journals who publish their findings. Given the general vagueness and imprecision we found amongst the researchers we interviewed concerning the meaning, nature and employment of racial/ethnic variables, they do not seem to be well equipped for such an undertaking.

Indeed, the ineffectiveness of this approach has been documented by several recent studies. For example, one recent literature review found that articles reporting associations between race, genotype and health outcomes most often include no explanation of criteria used to assign the race/ethnicity of subjects.57 Another study, based on interviews with the editorial staff of a number of prominent genetic journals, found that the editors have not become engaged with critiques regarding the inappropriateness of racial/ethnic variables for genetics research and instead view their use in genetic science as separate from social science concerns.15 58 Similarly, a discourse analysis of recent publications discussing the debate about use of race in the Human Genome Project, concludes that the concept of race as a biological entity remains prevalent, bolstered by rhetorical claims that scientific truth will eradicate any racist interpretations of these notions.56

Policy responses to questions of the ethical conduct of research in this domain have so far been concerned primarily with privacy issues, insurance and employment discrimination, and community informed consent.1 44 The larger issues of the scientific integrity of the body of work produced when using poorly conceptualised variables remain essentially unaddressed by current regulations. Some have called for more restrictive legislation to prohibit the use of race as a biological variable and/or to require that the social correlates of race be examined in order to address underlying inequality.4 5 28 59 Shields et al10 pose this question:

If race variables in fact function as sponge variables that reflect a host of unmeasured factors that do affect one’s health but do not provide the information needed to address health disparities, is there not an ethical obligation to attempt to identify and measure these factors directly?

Science, no matter how well intentioned or well regulated, is a product of the dominant assumptions of the era in which it is produced: histories of racially based inequality have long been accompanied by histories of racialised science. To unravel these intertwined histories requires rigorous thinking and will not be accomplished by treating racial/ethnic classifications as a shortcut to biological and social factors.15 32 60 61

The potential negative consequences of tolerating the haphazard use and reporting of racial/ethnic variables in genetic research are serious and range from production of unreliable scientific findings to reification of the basest notions of inherent racial difference. To leave the management of these consequences to simply “being careful” in the terminology selected for reporting findings seems clearly inadequate.62 Race is indeed a proxy. It signifies all kinds of things in our racialised society. It is a complex, multifaceted construct that is saturated with meanings, meanings which have real consequences for the wellbeing of real people. It would seem imperative that we seek effective ways to reframe the questions ethicists are asking about the appropriateness of genetic studies using race/ethnicity variables, in order to directly address the question of the scientific integrity of our emerging genetic science.


This research was supported by the National Institutes of Health National Center for Human Genome Research through grant #HG2299-05. We wish to thank the researchers we interviewed, whose kind cooperation made this research possible. J Bielo, N Truesdell and D Vacanti provided invaluable assistance with a variety of data analysis and literature review tasks. We also thank J Davis and H Brody for many thoughtful conversations which were seminal to the development of this argument.



  • Competing interests: None.

  • Ethics approval: All study participants gave their informed consent to be interviewed following protocols approved by the Human Subject Protection Programs at Michigan State University.