Loss of function (LOF) prioritisation
The LOF algorithm is designed to identify variants with the potential to cause LOF in a gene in which LOF is a known mechanism for pathogenicity. Copy number variants are prioritised only through the LOF algorithm.
Small variants
SNVs or small indels need to satisfy all of the following criteria:
- Annotated with a high impact consequence type in the MANE Select or MANE Clinical transcript, or in a predefined transcript where MANE v1.0 transcripts are unavailable. In the absence of MANE Select or MANE Clinical, the predefined transcript is annotated in the Newborn Panel based on Ensembl’s “canonical” status and evaluated using published literature, with gene-specific documentation (Genomics England employees only). The sequence ontology terms classified as high impact are listed in the table below:
Sequence Ontology Term | Sequence Ontology ID |
---|---|
Transcript ablation | SO:0001893 |
Splice acceptor variant | SO:0001574 |
Splice donor variant | SO:0001575 |
Stop gained | SO:0001587 |
Frameshift variant | SO:0001589 |
Stop lost | SO:0001578 |
Initiator codon variant | SO:0001582 |
Start lost | SO:0002012 |
Table 2: Sequence ontology terms included in LOF prioritisation
- Allele frequencies in all population datasets are below the thresholds specified in Table 3. For MNVs, the allele frequency of at least one of the decomposed variants must be below all thresholds.
- In a gene marked in PanelApp as a gene for which loss-of-function prioritisation needs to be applied.
Dataset | Population | Dataset size (individuals) | Monoallelic (dominant) | Biallelic (recessive) |
---|---|---|---|---|
Internal | Mixed | 5,855 | 0.001 | 0.01 |
GNOMAD_GENOMES | African/African American | 20,744 | 0.0005 | 0.01 |
GNOMAD_GENOMES | Latino/Admixed American | 7,647 | 0.001 | 0.01 |
GNOMAD_GENOMES | Ashkenazi Jewish | 1,736 | 0.003 | 0.01 |
GNOMAD_GENOMES | East Asian | 2,604 | 0.002 | 0.01 |
GNOMAD_GENOMES | European (Finnish) | 5,316 | 0.001 | 0.01 |
GNOMAD_GENOMES | Middle Eastern | 158 | 0.1 | 0.1 |
GNOMAD_GENOMES | European (non-Finnish) | 34,029 | 0.0005 | 0.01 |
GNOMAD_GENOMES | South Asian | 2,419 | 0.002 | 0.01 |
GNOMAD_EXOMES | Latino/Admixed American | 17,296 | 0.0005 | 0.01 |
GNOMAD_EXOMES | Ashkenazi Jewish | 5,040 | 0.001 | 0.01 |
GNOMAD_EXOMES | East Asian | 9,197 | 0.001 | 0.01 |
GNOMAD_EXOMES | European (Finnish) | 10,824 | 0.001 | 0.01 |
GNOMAD_EXOMES | European (non-Finnish) | 56,885 | 0.0005 | 0.01 |
GNOMAD_EXOMES | South Asian | 15,308 | 0.001 | 0.01 |
Table 3: Allele frequency thresholds for gnomAD populations in LOF prioritisation (gnomAD Genomes v3.1.2, gnomAD Exomes v2.1.1).
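The three small-variant criteria can be read as a single boolean check. The following is a minimal sketch, not the pipeline's implementation: the function names, the data layout, and the choice to treat a missing population frequency as 0 are all assumptions made for illustration, and only a subset of the Table 3 thresholds is included.

```python
# Sequence ontology IDs from Table 2.
HIGH_IMPACT_SO_IDS = {
    "SO:0001893",  # transcript ablation
    "SO:0001574",  # splice acceptor variant
    "SO:0001575",  # splice donor variant
    "SO:0001587",  # stop gained
    "SO:0001589",  # frameshift variant
    "SO:0001578",  # stop lost
    "SO:0001582",  # initiator codon variant
    "SO:0002012",  # start lost
}

# (dataset, population) -> (monoallelic, biallelic) thresholds.
# Illustrative subset of Table 3 only.
AF_THRESHOLDS = {
    ("GNOMAD_GENOMES", "European (non-Finnish)"): (0.0005, 0.01),
    ("GNOMAD_GENOMES", "Middle Eastern"): (0.1, 0.1),
    ("GNOMAD_EXOMES", "South Asian"): (0.001, 0.01),
}


def below_all_thresholds(frequencies, biallelic):
    """True if every population frequency is below its threshold.

    Assumption: a population with no recorded frequency counts as 0.
    """
    column = 1 if biallelic else 0
    return all(
        frequencies.get(key, 0.0) < limits[column]
        for key, limits in AF_THRESHOLDS.items()
    )


def passes_small_variant_lof(so_ids, component_frequencies, gene_is_lof, biallelic):
    """Hypothetical check combining the three small-variant criteria.

    component_frequencies holds one frequency dict per decomposed variant:
    a single dict for an SNV or indel, several for an MNV.
    """
    has_high_impact = any(so_id in HIGH_IMPACT_SO_IDS for so_id in so_ids)
    # For MNVs, one decomposed variant below all thresholds is sufficient.
    is_rare = any(below_all_thresholds(f, biallelic) for f in component_frequencies)
    return has_high_impact and is_rare and gene_is_lof
```

For example, an MNV whose decomposed variants have non-Finnish European genome frequencies of 0.02 and 0.0001 would pass the frequency criterion under this sketch, because the second component is below every threshold.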
Copy number variants
Copy number variants (CNVs) detected using the DRAGEN CNV workflow with self-normalisation and the Shifting Level Models (SLM) segmentation mode can be prioritised through the LOF algorithm. High quality CNVs >10 kb in size are defined as those detected in the DRAGEN CNV workflow with filter status PASS. CNVs between 2 and 10 kb in size are identified by combining the results of the DRAGEN CNV workflow and the DRAGEN SV caller. CNVs in this range detected by both with a minimum reciprocal overlap of 50% are deemed to be high quality and filter status is set to PASS in the subsequent "enhanced" CNV files.
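The size-dependent quality rule can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's code: the function name and arguments are hypothetical, the handling of the exact 2 kb and 10 kb boundaries is a guess, and calls smaller than 2 kb are treated as not high quality because the text does not describe them.

```python
def is_high_quality_cnv(size_bp, cnv_filter, sv_reciprocal_overlap=None):
    """Hypothetical quality check for a DRAGEN CNV call.

    size_bp: CNV length in base pairs.
    cnv_filter: filter status from the DRAGEN CNV workflow.
    sv_reciprocal_overlap: best reciprocal overlap (0 to 1) with a DRAGEN SV
        caller call, or None if no matching SV call exists.
    """
    if size_bp > 10_000:
        # Large calls rely on the CNV workflow's own filter status.
        return cnv_filter == "PASS"
    if 2_000 <= size_bp <= 10_000:
        # Mid-sized calls need support from both callers (assumed inclusive 50%).
        return sv_reciprocal_overlap is not None and sv_reciprocal_overlap >= 0.5
    # Assumption: calls under 2 kb are outside the described scope.
    return False
```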
Several factors complicate the assessment of allele frequencies for CNVs:
- The breakpoints of CNV calls based on sequence coverage are imprecise, and therefore the same variant can have different breakpoint coordinates in different individuals.
- Large CNVs can be reported as several separate calls (i.e., fragmented calls). This is often due to a copy number change within the region of a large CNV, for example, due to a smaller nested CNV or a complex structural rearrangement.
- Distinguishing between different combinations of alleles that can give rise to the same diploid copy number is challenging. For example, a copy number of 3 could be the result of a tandem duplication with 2 copies on one allele and a single copy on the other allele, or a tandem duplication with 3 copies on one allele and a deletion on the other allele, or two single copy alleles with an additional copy elsewhere in the genome.
- The accuracy of the copy number inference for gain variants with more than 3 copies is not validated.
In the Newborn Screening Pipeline, CNV frequencies are calculated using 5,757 germline samples from unrelated individuals (participants in the Cancer program of the 100,000 Genomes Project and the COVID-19 research project).
The reciprocal overlap method uses an 80% reciprocal overlap threshold between an identified CNV and any CNVs identified in the reference frequency set. Any variants with this reciprocal overlap are regarded as the same variant for frequency calculations. A limitation of this method is that the frequency may be inaccurate in the event of CNV fragmentation, i.e., fragmented calls can inappropriately appear to be rare.
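Reciprocal overlap is the smaller of the two fractions obtained by dividing the shared length by each CNV's own length. A minimal sketch follows; the function names are hypothetical, the coordinates are assumed to be half-open intervals, and treating exactly 80% as a match is an assumption.

```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Return the reciprocal overlap (0 to 1) of two half-open intervals."""
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    # The overlap must cover the required fraction of BOTH intervals,
    # so the smaller of the two fractions is the one that matters.
    return min(shared / (end_a - start_a), shared / (end_b - start_b))


def is_same_variant(cnv, reference_cnv, threshold=0.8):
    """Treat two (start, end) CNVs as the same variant for frequency counting."""
    return reciprocal_overlap(*cnv, *reference_cnv) >= threshold
```

With these definitions, a call at (1000, 11000) matches a reference CNV at (2000, 12000), since the overlap covers 90% of each. A 3 kb fragment at (1000, 4000) of a 10 kb reference CNV at (1000, 11000) does not match, since it covers only 30% of the reference, which illustrates how fragmented calls can appear rare.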
For LOSS variants, allele frequencies are calculated and reported. For GAIN variants, due to difficulties in determining the exact copy number and defining the alleles in all individuals, the proportion of individuals with any GAIN call is calculated and reported, not taking copy number into account.
A CNV needs to satisfy all of the following criteria:
- The CNV +/- 2kb must overlap the coding region of the gene if the gene with LOF pathogenicity is coding, or for non-coding genes, must overlap the transcript.
- For CNV gains, both exact breakpoints of the variant called by DRAGEN must occur within the same transcript.
- The CNV must be rare in the internal reference frequency data, calculated using reciprocal overlap frequency. Exact thresholds:
  - CNV losses in dominant mode of inheritance genes must have frequency < 0.001
  - CNV losses in recessive mode of inheritance genes must have frequency < 0.005
  - CNV gains in dominant mode of inheritance genes must have frequency < 0.002
  - CNV gains in recessive mode of inheritance genes must have frequency < 0.01
- The CNV matches the condition-associated mode of inheritance for LOF mode of pathogenicity.
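The frequency thresholds and the breakpoint rule for gains lend themselves to a small lookup. The sketch below is illustrative only: the names, the string labels for CNV type and mode of inheritance, and the inclusive treatment of transcript boundaries are assumptions, and the mode of inheritance matching criterion is not modelled.

```python
# (CNV type, mode of inheritance) -> maximum frequency (exclusive).
CNV_FREQUENCY_LIMITS = {
    ("LOSS", "dominant"): 0.001,
    ("LOSS", "recessive"): 0.005,
    ("GAIN", "dominant"): 0.002,
    ("GAIN", "recessive"): 0.01,
}


def cnv_is_rare(cnv_type, inheritance, frequency):
    """True if the reciprocal overlap frequency is below the relevant limit."""
    return frequency < CNV_FREQUENCY_LIMITS[(cnv_type, inheritance)]


def gain_within_transcript(cnv_start, cnv_end, tx_start, tx_end):
    """True if both called breakpoints of a gain fall inside one transcript."""
    return tx_start <= cnv_start and cnv_end <= tx_end
```

For instance, a loss with frequency 0.003 would be rare enough for a recessive gene but not for a dominant one.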
CNV breakpoints
Flanking regions of 2 kb are added on either side of the called CNV coordinates to reduce the chance of erroneously missing overlaps with relevant transcript regions due to breakpoint calling inaccuracies. As a result, in some cases a CNV may be prioritised even if the called breakpoints do not overlap an appropriate region. More information on reviewing CNVs can be found in the Variant Review SOP.
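The effect of the padding can be shown with a simple interval test. This is a sketch assuming half-open coordinates; the function name and the clamping of the padded start at zero are illustrative choices, not details taken from the pipeline.

```python
FLANK_BP = 2_000  # flanking bases added on each side of the called CNV


def padded_cnv_overlaps(cnv_start, cnv_end, region_start, region_end, flank=FLANK_BP):
    """True if the CNV, extended by `flank` on both sides, overlaps the region."""
    padded_start = max(0, cnv_start - flank)
    padded_end = cnv_end + flank
    # Standard overlap test for half-open intervals.
    return padded_start < region_end and region_start < padded_end
```

A CNV called at (10000, 20000) does not itself overlap a coding region at (21000, 22000), but the padded interval (8000, 22000) does, so the CNV would still be considered. A region starting at 23000 remains out of reach.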