Immune Repertoire

Nigel R. A. Beeley, Ph.D.

The immune systems of most vertebrates have both an innate component and an adaptive component (1). The innate component, often thought of as hereditary, consists of a complex mix of anatomical barriers to pathogens and cellular responses, and is typically ready for immediate action (2). The adaptive component (3), which has a relatively long response time and is usually referred to as the immune repertoire, is produced by B-cells and T-cells to help recognize and respond to external insults such as viruses, bacteria, worms, parasites and related micro-organisms, unwelcome chemicals and toxins, as well as modified “self” cells such as cancer cells. There are two key components of the immune repertoire. The first is the immunoglobulins (also known as antibodies), constructed from two B-cell-derived proteins, the so-called heavy and light chains. The second is the T-cell receptors, each constructed from a pair of T-cell-derived proteins: either an alpha and a beta chain, or a gamma and a delta chain (4).

Some years ago, a major question in immunology was how the immune system generates such a large repertoire of immunoglobulins from B-cells and of T-cell receptors on T-cells, the T-cell receptor being the topological equivalent of an antibody's antigen-binding fragment (5). This was a particularly daunting question because, at the time, researchers believed that DNA sequences were fixed and struggled to envision how they could be changed to produce an array of up to an estimated 10¹¹ immunoglobulin and 10¹¹ T-cell receptor sequences. An interesting aside is that this belief, or dogma, should not be confused with the “Central Dogma of Molecular Biology”, a lasting example of a poor choice of English by an esteemed British researcher, Francis Crick (6). The answer was what is now known as V(D)J recombination (7), the underlying genetic principle by which antibody diversity is generated, for which Susumu Tonegawa was awarded the Nobel Prize in Physiology or Medicine in 1987 (8). The heavy chain genes contain multiple copies of three types of gene segment, variable (V), diversity (D) and joining (J), which together encode the variable region of the antibody, including its complementarity-determining regions (CDRs). The human immunoglobulin heavy chain region contains 2 constant (C) and 44 V gene segments, as well as 27 D and 6 J gene segments. The light chains possess a similar set of C, V and J segments but lack D segments. DNA rearrangement allows the generation of roughly 3×10¹¹ combinations, one per B-cell, although B-cells carrying self-reactive combinations are removed. T-cell receptor genes have a similar set of C, V, D and J gene segments that undergo rearrangement in a similar way to generate a library of T-cell receptors (1), (7).
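The heavy-chain segment counts above lend themselves to a quick back-of-the-envelope check. Choosing one V, one D and one J segment per rearrangement gives only a few thousand combinations; the gap to the ~3×10¹¹ figure is closed by mechanisms this minimal sketch does not model, such as imprecise joining at the segment junctions (junctional diversity) and the pairing of each heavy chain with a light chain. Only the counts quoted in the text are used.

```python
from math import prod

# Human heavy-chain gene segment counts as quoted in the text
HEAVY_SEGMENTS = {"V": 44, "D": 27, "J": 6}

# Each rearrangement joins one V, one D and one J segment,
# so the combinatorial count is the product of the segment counts.
heavy_vdj = prod(HEAVY_SEGMENTS.values())
print(f"Heavy-chain V-D-J combinations: {heavy_vdj:,}")

# Repertoire size quoted in the text (~3 x 10^11)
REPERTOIRE_QUOTED = 3e11

# Factor that must come from mechanisms not modelled here
# (junctional diversity, heavy/light chain pairing)
shortfall = REPERTOIRE_QUOTED / heavy_vdj
print(f"Shortfall factor: ~{shortfall:.1e}")
```

The segment arithmetic alone accounts for only a tiny fraction of the quoted repertoire, which is why the joining process itself contributes so much of the diversity.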

When the end-point of this process is examined, the number of observable circulating antibodies is typically much smaller, in the thousands, as is the number of observable T-cell receptors. That is because somatic (sometimes called affinity) maturation occurs (9): the maturation pathways can pass through several thousand antibody variants before reaching an end-point of a handful of antibodies with the affinity and selectivity needed to combat the original insult that triggered the immune response. Next-generation sequencing methods have recently become invaluable for investigating the immune repertoire, largely because of increases in the read length of DNA sequences obtained in a typical shotgun sequencing protocol. The longer the read length, the easier it is to tease out the maturation pathway. With read lengths below 250 bases, it is almost impossible to work out where a handful of amino acid changes are occurring against an essentially identical background of constant regions.
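The read-length argument can be illustrated with a toy calculation. In the simplest single-read case, two mutations can only be assigned to the same antibody variant if one read covers both, which for a read of length L means they must lie fewer than L bases apart. The mutation positions below are invented for illustration, along an assumed ~400-nucleotide variable region; paired-end reads and other tricks are ignored.

```python
from itertools import combinations


def linkable_pairs(positions, read_length):
    """Return mutation pairs close enough to be observed together on one read.

    A read of length L starting at position s covers s .. s+L-1, so two
    positions a < b can share a read only if b - a < L.
    """
    return [(a, b) for a, b in combinations(sorted(positions), 2)
            if b - a < read_length]


def fully_phased(positions, read_length):
    """True if a single read could span every mutation at once."""
    return max(positions) - min(positions) < read_length


# Hypothetical mutation positions (nt) along an assumed ~400-nt variable region
mutations = [30, 95, 180, 260, 350]
total_pairs = len(mutations) * (len(mutations) - 1) // 2

for read_length in (100, 250, 400):
    n = len(linkable_pairs(mutations, read_length))
    print(f"{read_length:>3} nt reads: {n}/{total_pairs} pairs linkable, "
          f"all mutations on one read: {fully_phased(mutations, read_length)}")
```

With 250-nucleotide reads most pairs can be linked, but no single read spans the 320 bases between the first and last mutation, so the full combination carried by one antibody has to be inferred computationally rather than read directly.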

A proof-of-concept illustration is some of the anti-HIV antibody work performed by the NIH Vaccine Research Center a few years ago. Among people infected with HIV there is a small group, around 100 worldwide, called long-term non-progressors (10): people who have been HIV positive for as long as 30 years or more but have never progressed to full-blown AIDS and, furthermore, have never been treated for HIV infection. Blood and tissue samples from these long-term non-progressors are of great interest to the medical community, since they may lead to novel and improved treatments for HIV as well as contributing to vaccine design. Classical methods of isolating immunoglobulin fractions from blood and then identifying individual antibodies are notoriously difficult, requiring multiple large samples, and for 20 years progress was slow. More recently at the NIH, three broadly neutralizing antibodies (VRC01, VRC02 and VRC03) were isolated from non-progressors using an elaborate trapping experiment with antigenically resurfaced HIV-1 envelope glycoproteins, followed by immunoglobulin gene expression from the captured individual B-cells (11). The researchers then set out to achieve a similar end result “de novo”, using first-generation Roche 454 pyrosequencing, pushed to the limits of its read length, in conjunction with a custom bioinformatics suite. This allowed them to elucidate heavy and light chain maturation pathways and then stitch the heavy and light chains together, resulting in a handful of immunoglobulin sequences (in the computer) that might explain the non-progressor status of the donor blood samples (12). Those sequences were then used to prepare monoclonal antibodies in the laboratory, which were tested against different strains of HIV.
Several antibodies with broadly neutralizing profiles against the various HIV strains now known to be prevalent were identified, and the original VRC01 and several VRC01 relatives are being examined in the clinic.
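Analyses of this kind often begin by measuring how far each read has diverged from its inferred germline V gene. The sketch below shows only that first step, on invented toy sequences: it computes percent divergence for reads that are assumed to be already aligned to the germline, then orders them from least to most mutated as a crude proxy for maturation order. The sequences and names are hypothetical, and the published pipeline (12) is considerably more elaborate, using phylogenetic analysis to reconstruct lineages and to pair heavy with light chains.

```python
def percent_divergence(read: str, germline: str) -> float:
    """Percent of aligned positions where a read differs from its germline.

    Assumes the read is already aligned to the germline (equal length, no indels).
    """
    if len(read) != len(germline):
        raise ValueError("read and germline must be pre-aligned to equal length")
    mismatches = sum(1 for r, g in zip(read, germline) if r != g)
    return 100.0 * mismatches / len(germline)


# Hypothetical germline segment and reads (toy 20-nt sequences, not real data)
germline = "CAGGTGCAGCTGGTGCAGTC"
reads = {
    "read_A": "CAGGTGCAGCTGGTGCAGTC",  # unmutated
    "read_B": "CAGGTGCAGCTGGTACAGTC",  # 1 mutation
    "read_C": "CAGATGCAGCTGGTACAGTC",  # 2 mutations (includes read_B's)
    "read_D": "CAGATGCAGCTAGTACAGTA",  # 4 mutations (includes read_C's)
}

# Order reads from least to most mutated: a crude proxy for maturation order
pathway = sorted(reads, key=lambda name: percent_divergence(reads[name], germline))
for name in pathway:
    print(name, percent_divergence(reads[name], germline))
```

Because each toy read carries all the mutations of the one before it, the ordering looks like a single maturation path; real repertoires branch, which is why tree-building rather than simple sorting is used in practice.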

This new approach to investigating the immune repertoire is now beginning to help with a range of healthcare issues and questions. Recent progress in immuno-oncology is one such area, where a small number of cancer patients are being successfully treated with chimeric antigen receptors, which give transferred T-cells the target specificity of an immunoglobulin (13). This kind of approach can be investigated more thoroughly by looking at the details of the immune repertoire. The basic observation here, even if denied by the originators of the therapies, is that the clinical results far exceed expectations based on the known biochemical mechanisms alone. In other words, a lot more is going on than a simple one product/one target process. Another example is new approaches to the treatment of metastatic melanoma (14). Eight new drugs have been FDA approved for the treatment of melanoma, including four immunotherapies and four targeted therapies. The immunotherapies are the checkpoint inhibitors ipilimumab (Yervoy®), pembrolizumab (Keytruda®) and nivolumab (Opdivo®), which “take the brakes off” the immune system and enable it to fight cancer, and the oncolytic virus therapy talimogene laherparepvec (T-VEC, Imlygic™) (15). The targeted therapies are the BRAF inhibitors vemurafenib (Zelboraf®) and dabrafenib (Tafinlar®) and the MEK inhibitors trametinib (Mekinist®) and cobimetinib (Cotellic®) (15). These targeted drugs are used against tumors carrying common genetic mutations, such as the BRAF V600 mutation, found in a subset of melanoma patients. The observed successes of immunotherapy in metastatic melanoma compared with small-molecule drugs suggest that more is going on than the original mechanistic hypothesis (16), and that new approaches based on investigating the immune repertoire of these patients will reveal ways to improve existing treatments and generate better ones.

It should also be noted that second- and third-generation sequencing technologies are now available that read longer and longer sequences, ranging from 1,000 to over 11,000 bases per read, thus simplifying some of the bioinformatics required (17).

How much data collection, collation, analysis and interpretation will be involved in examining the immune repertoire response for a given project (18)? Well, start with a human genome, then multiply per treatment regime, per patient, and per sample over time. In other words, an extremely large number: the sort of number that only a fully annotated relational database, such as CDD Vault, can elegantly manage.
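To make the “per treatment regime, per patient, per sample over time” structure concrete, here is a minimal relational sketch using Python's built-in sqlite3 module. The table and column names are hypothetical and purely illustrative; this is not CDD Vault's data model, and a real repertoire project would likely store much richer annotation (germline assignments, lineage membership, binding data).

```python
import sqlite3

# Hypothetical minimal schema: one row per sequenced read, linked to the sample,
# patient and treatment regime it came from (names are illustrative only).
SCHEMA = """
CREATE TABLE patient (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE regime  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sample  (id INTEGER PRIMARY KEY,
                      patient_id INTEGER NOT NULL REFERENCES patient(id),
                      regime_id  INTEGER NOT NULL REFERENCES regime(id),
                      day        INTEGER NOT NULL);  -- time point of collection
CREATE TABLE seq_read (id INTEGER PRIMARY KEY,
                       sample_id INTEGER NOT NULL REFERENCES sample(id),
                       chain     TEXT CHECK (chain IN ('heavy', 'light')),
                       sequence  TEXT NOT NULL);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)

# Toy data: one patient on one regime, sampled at day 0 and day 28
conn.execute("INSERT INTO patient VALUES (1, 'P001')")
conn.execute("INSERT INTO regime VALUES (1, 'regime_A')")
conn.executemany("INSERT INTO sample VALUES (?, 1, 1, ?)", [(1, 0), (2, 28)])
conn.executemany(
    "INSERT INTO seq_read (sample_id, chain, sequence) VALUES (?, ?, ?)",
    [(1, "heavy", "CAGGTG"), (1, "light", "GACATC"), (2, "heavy", "CAGATG")],
)

# Count reads per patient, regime and time point
rows = conn.execute("""
    SELECT p.label, r.name, s.day, COUNT(*) AS n_reads
    FROM seq_read
    JOIN sample  s ON s.id = seq_read.sample_id
    JOIN patient p ON p.id = s.patient_id
    JOIN regime  r ON r.id = s.regime_id
    GROUP BY p.label, r.name, s.day
    ORDER BY s.day
""").fetchall()
print(rows)
```

Each extra patient, regime or time point adds rows rather than new tables, which is what lets a relational design absorb the multiplicative growth described above.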


1) Owen J, Punt J, Stranford S (2013) “Kuby Immunology 7th Edition” pub. W. H. Freeman; ISBN-10: 142921919X, ISBN-13: 978-1429219198
5) Williamson AR. (1979) “Roy Cameron lecture. Control of antibody formation: certain uncertainties” J Clin Pathol Suppl (R Coll Pathol). 13:76-84. PMID: 391829
8) Brack C, Hirama M, Lenhard-Schuller R, Tonegawa S. (1978) “A complete immunoglobulin gene is created by somatic recombination” Cell 15 (1):1-14.
Tonegawa S, Brack C, Hozumi N, Pirrotta V. (1978) “Organization of immunoglobulin genes” Cold Spring Harb Symp Quant Biol. 42 Pt 2:921-31. PMID: 98276
11) Wu X, Yang ZY, Li Y, Hogerkorp CM, Schief WR, Seaman MS, Zhou T, Schmidt SD, Wu L, Xu L, Longo NS, McKee K, O’Dell S, Louder MK, Wycuff DL, Feng Y, Nason M, Doria-Rose N, Connors M, Kwong PD, Roederer M, Wyatt RT, Nabel GJ, Mascola JR. (2010) “Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1” Science 329 (5993):856-861. PMID: 20616233
12) Zhu J, Ofek G, Yang YP, Zhang BS, Louder MK, Lu GL, McKee K, Pancera M, Skinner J, Zhang ZH, Park R, Eudailey J, Lloyd KE, Blinn J, Alam SM, Haynes BF, Simek M, Burton DR, Koff WC, Mullikin JC, Mascola JR, Shapiro L, and Kwong PD. (2013) “Mining the antibodyome for HIV-1-neutralizing antibodies with next-generation sequencing and phylogenetic pairing of heavy/light chains” Proc Natl Acad Sci USA 110 (16):6470-6475.
Zhu J, Wu X, Zhang B, McKee K, O’Dell S, Soto C, Zhou T, Casazza JP; NISC Comparative Sequencing Program, Mullikin JC, Kwong PD, Mascola JR, Shapiro L. (2013) “De Novo identification of VRC01 class HIV-1-neutralizing antibodies by next-generation sequencing of B-cell transcripts” Proc Natl Acad Sci USA 110 (43):E4088-97.
He L, Sok D, Azadnia P, Hsueh J, Landais E, Simek M, Koff WC, Poignard P, Burton DR, and Zhu J. (2014). “Toward a more accurate view of human B-cell repertoire by next-generation sequencing, unbiased repertoire capture and single-molecule barcoding” Sci. Rep. 4:6778.
16) For an interesting TED talk on new thinking in cancer see:
18) Kwong PD, Chuang GY, DeKosky BJ, Gindin T, Georgiev IS, Lemmin T, Schramm CA, Sheng Z, Soto C, Yang AS, Mascola JR, Shapiro L. (2017) “Antibodyomics: bioinformatics technologies for understanding B-cell immunity to HIV-1” Immunol Rev. 275 (1):108-128. PMID: 28133812

This blog is authored by members of the CDD Vault community. CDD Vault is a hosted drug discovery informatics platform that securely manages both private and external biological and chemical data. It provides core functionality including chemical registration, structure activity relationship, chemical inventory, and electronic lab notebook capabilities.

CDD Vault: Drug Discovery Informatics your whole project team will embrace