Loss of function (LOF) prioritisation

The LOF algorithm is designed to identify variants with the potential to cause LOF in a gene in which LOF is a known mechanism for pathogenicity. Copy number variants are prioritised only through the LOF algorithm.

Small variants

SNVs or small indels need to satisfy all of the following criteria:

Annotated with a high impact consequence type in the MANE Select or MANE Clinical transcript, or in a predefined transcript where MANE v1.0 transcripts are unavailable. The predefined transcript, in the absence of MANE Select or MANE Clinical, will be annotated in the Newborn Panel based on Ensembl’s “canonical” status and evaluated using published literature, with gene-specific documentation (Genomics England employees only). The Sequence Ontology terms classified as high impact are listed in the table below:

Sequence Ontology Term	Sequence Ontology ID
Transcript ablation	SO:0001893
Splice acceptor variant	SO:0001574
Splice donor variant	SO:0001575
Stop gained	SO:0001587
Frameshift variant	SO:0001589
Stop lost	SO:0001578
Initiator codon variant	SO:0001582
Start lost	SO:0002012

Table 2: Sequence ontology terms included in LOF prioritisation

Allele frequencies in all population datasets are below the thresholds specified in Table 3. For MNVs, the allele frequency of at least one of the decomposed variants is below all thresholds.
Located in a gene marked in PanelApp as a gene to which loss-of-function prioritisation needs to be applied.

Dataset	Population	Dataset size (individuals)	Monoallelic (dominant)	Biallelic (recessive)
Internal	Mixed	5,855	0.001	0.01
GNOMAD_GENOMES	African/African American	20,744	0.0005	0.01
GNOMAD_GENOMES	Latino/Admixed American	7,647	0.001	0.01
GNOMAD_GENOMES	Ashkenazi Jewish	1,736	0.003	0.01
GNOMAD_GENOMES	East Asian	2,604	0.002	0.01
GNOMAD_GENOMES	European (Finnish)	5,316	0.001	0.01
GNOMAD_GENOMES	Middle Eastern	158	0.1	0.1
GNOMAD_GENOMES	European (non-Finnish)	34,029	0.0005	0.01
GNOMAD_GENOMES	South Asian	2,419	0.002	0.01
GNOMAD_EXOMES	Latino/Admixed American	17,296	0.0005	0.01
GNOMAD_EXOMES	Ashkenazi Jewish	5,040	0.001	0.01
GNOMAD_EXOMES	East Asian	9,197	0.001	0.01
GNOMAD_EXOMES	European (Finnish)	10,824	0.001	0.01
GNOMAD_EXOMES	European (non-Finnish)	56,885	0.0005	0.01
GNOMAD_EXOMES	South Asian	15,308	0.001	0.01

Table 3: Allele frequency thresholds for the internal and gnomAD population datasets in LOF prioritisation (gnomAD Genomes v3.1.2, gnomAD Exomes v2.1.1).
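The three small-variant criteria can be sketched in Python as follows. This is an illustrative sketch only, not the pipeline's implementation: the function names, the data layout, the choice of threshold column by a biallelic flag, and the treatment of a missing frequency as 0 are all assumptions. Only a subset of the Table 3 rows is included to keep the example short.

```python
# Hypothetical sketch; names and data layout are assumptions, not the pipeline's API.
HIGH_IMPACT_SO_IDS = {
    "SO:0001893",  # transcript ablation
    "SO:0001574",  # splice acceptor variant
    "SO:0001575",  # splice donor variant
    "SO:0001587",  # stop gained
    "SO:0001589",  # frameshift variant
    "SO:0001578",  # stop lost
    "SO:0001582",  # initiator codon variant
    "SO:0002012",  # start lost
}

# (dataset, population) -> (monoallelic threshold, biallelic threshold).
# Subset of Table 3 for brevity.
AF_THRESHOLDS = {
    ("Internal", "Mixed"): (0.001, 0.01),
    ("GNOMAD_GENOMES", "African/African American"): (0.0005, 0.01),
    ("GNOMAD_GENOMES", "Middle Eastern"): (0.1, 0.1),
    ("GNOMAD_EXOMES", "European (non-Finnish)"): (0.0005, 0.01),
}


def below_all_thresholds(frequencies, biallelic):
    """True if every dataset frequency is below its threshold.

    A dataset with no recorded frequency is treated as 0.0 (assumption).
    """
    column = 1 if biallelic else 0
    return all(
        frequencies.get(key, 0.0) < limits[column]
        for key, limits in AF_THRESHOLDS.items()
    )


def passes_small_variant_lof(so_ids, decomposed_frequencies, biallelic, gene_is_lof):
    """Apply the three criteria to one variant.

    decomposed_frequencies holds one frequency dict per decomposed variant;
    a plain SNV or indel has a single entry, an MNV has several.
    """
    has_high_impact = any(so_id in HIGH_IMPACT_SO_IDS for so_id in so_ids)
    # For MNVs, one decomposed variant below all thresholds is sufficient.
    rare = any(below_all_thresholds(f, biallelic) for f in decomposed_frequencies)
    return has_high_impact and rare and gene_is_lof
```

For example, a stop-gained variant with an internal frequency of 0.005 fails the monoallelic threshold (0.001) but passes the biallelic threshold (0.01).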

Copy number variants

Copy number variants (CNVs) detected using the DRAGEN CNV workflow with self-normalisation and the Shifting Level Models (SLM) segmentation mode can be prioritised through the LOF algorithm. High quality CNVs >10 kb in size are defined as those detected in the DRAGEN CNV workflow with filter status PASS. CNVs between 2 and 10 kb in size are identified by combining the results of the DRAGEN CNV workflow and the DRAGEN SV caller. CNVs in this size range that are detected by both callers with a minimum reciprocal overlap of 50% are deemed to be high quality, and their filter status is set to PASS in the subsequent "enhanced" CNV files.
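The size-dependent quality rule can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the DRAGEN or pipeline logic: intervals are (start, end) tuples on the same chromosome with half-open coordinates, the function names are hypothetical, and matching of variant type between the two callers is not modelled.

```python
# Hypothetical sketch; function names and interval layout are assumptions.
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Smaller of the two overlap fractions between intervals A and B."""
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0
    return min(overlap / (end_a - start_a), overlap / (end_b - start_b))


def is_high_quality(cnv, sv_calls, cnv_filter):
    """cnv and each entry of sv_calls are (start, end) tuples."""
    length = cnv[1] - cnv[0]
    if length > 10_000:
        # Large CNVs rely on the DRAGEN CNV filter status alone.
        return cnv_filter == "PASS"
    if 2_000 <= length <= 10_000:
        # Mid-size CNVs need a supporting SV call with >= 50% reciprocal overlap.
        return any(reciprocal_overlap(*cnv, *sv) >= 0.5 for sv in sv_calls)
    return False
```

Reciprocal overlap takes the minimum of the two fractions, so a small call nested inside a much larger one does not count as a match even though it is fully covered.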

Several factors complicate the assessment of allele frequencies for CNVs:

The breakpoints of CNV calls based on sequence coverage are imprecise, and therefore the same variant can have different breakpoint coordinates in different individuals.
Large CNVs can be reported as several separate calls (i.e., fragmented calls). This is often due to a copy number change within the region of a large CNV, for example, due to a smaller nested CNV or a complex structural rearrangement.
Distinguishing between different combinations of alleles that can give rise to the same diploid copy number is challenging. For example, a copy number of 3 could be the result of a tandem duplication with 2 copies on one allele and a single copy on the other allele, or a tandem duplication with 3 copies on one allele and a deletion on the other allele, or two single copy alleles with an additional copy elsewhere in the genome.
The accuracy of the copy number inference for gain variants with more than 3 copies is not validated.

In the Newborn Screening Pipeline, CNV frequencies are calculated using 5,757 germline samples from unrelated individuals (participants in the Cancer program of the 100,000 Genomes Project and the COVID-19 research project).

The reciprocal overlap method uses an 80% reciprocal overlap threshold between an identified CNV and any CNVs identified in the reference frequency set. Any reference variants that meet this threshold are regarded as the same variant for frequency calculations. A limitation of this method is that the frequency may be inaccurate in the event of CNV fragmentation, i.e., fragmented calls can inappropriately appear to be rare.

For LOSS variants, allele frequencies are calculated and reported. For GAIN variants, due to difficulties in determining the exact copy number and defining the alleles in all individuals, the proportion of individuals with any GAIN call is calculated and reported, not taking copy number into account.
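The difference between the two reported values can be sketched in Python. This is a hedged illustration rather than the pipeline's method: the function names and record layouts are hypothetical, all calls are assumed to lie on the same autosome, and counting deleted alleles as 2 minus the called copy number is an assumption not stated in this document.

```python
# Hypothetical sketch; names, record layouts and allele counting are assumptions.
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between (start, end) intervals."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def loss_allele_frequency(query, reference_losses, n_samples, threshold=0.8):
    """reference_losses: iterable of (start, end, copy_number), copy_number 0 or 1.

    Each matching call contributes (2 - copy_number) deleted alleles (assumption).
    """
    alleles = sum(
        2 - copy_number
        for start, end, copy_number in reference_losses
        if reciprocal_overlap(query, (start, end)) >= threshold
    )
    return alleles / (2 * n_samples)


def gain_carrier_proportion(query, reference_gains, n_samples, threshold=0.8):
    """reference_gains: iterable of (sample_id, start, end).

    Copy number is ignored; each individual is counted at most once.
    """
    carriers = {
        sample_id
        for sample_id, start, end in reference_gains
        if reciprocal_overlap(query, (start, end)) >= threshold
    }
    return len(carriers) / n_samples
```

Note that a reference call covering only a fragment of the query falls below the 80% threshold and is not counted, which is how fragmented calls can make a variant appear rarer than it is.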

A CNV needs to satisfy all of the following criteria:

The CNV, extended by 2 kb on each side, must overlap the coding region of the gene if the gene with LOF pathogenicity is coding; for non-coding genes, it must overlap the transcript.
For CNV gains, both exact breakpoints of the variant called by DRAGEN must occur within the same transcript.
The CNV must be rare in the internal reference frequency data, calculated using reciprocal overlap frequency. Exact thresholds:
- CNV losses in dominant mode of inheritance genes must have frequency < 0.001
- CNV losses in recessive mode of inheritance genes must have frequency < 0.005
- CNV gains in dominant mode of inheritance genes must have frequency < 0.002
- CNV gains in recessive mode of inheritance genes must have frequency < 0.01
The CNV must match the condition-associated mode of inheritance for the LOF mode of pathogenicity.
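The frequency thresholds and the breakpoint rule for gains can be expressed as a small lookup and two checks, sketched here in Python. The names are hypothetical and the sketch covers only these two criteria; the overlap and mode of inheritance criteria are not modelled.

```python
# Hypothetical sketch; names are assumptions, thresholds are those listed above.
CNV_FREQUENCY_THRESHOLDS = {
    ("LOSS", "dominant"): 0.001,
    ("LOSS", "recessive"): 0.005,
    ("GAIN", "dominant"): 0.002,
    ("GAIN", "recessive"): 0.01,
}


def is_rare_cnv(cnv_type, inheritance, frequency):
    """True if the reference frequency is strictly below the threshold."""
    return frequency < CNV_FREQUENCY_THRESHOLDS[(cnv_type, inheritance)]


def gain_within_transcript(cnv_start, cnv_end, tx_start, tx_end):
    """True if both called breakpoints of a gain fall inside one transcript."""
    return tx_start <= cnv_start and cnv_end <= tx_end
```

For instance, a loss in a dominant gene with a reference frequency of exactly 0.001 is not prioritised, because the comparison is strict.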

CNV breakpoints

Flanking regions of 2 kb are added to each side of the called CNV coordinates to reduce the chance of missing overlaps with relevant transcript regions because of breakpoint calling inaccuracies. As a result, a CNV may in some cases be prioritised even if the called breakpoints do not overlap an appropriate region. More information on reviewing CNVs can be found in the Variant Review SOP.
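The effect of the flanking bases can be shown with a minimal Python sketch. The function names and the half-open coordinate convention are assumptions made for illustration.

```python
# Hypothetical sketch; names and half-open coordinates are assumptions.
FLANK = 2_000


def overlaps(start_a, end_a, start_b, end_b):
    """True if two half-open intervals share at least one base."""
    return start_a < end_b and start_b < end_a


def overlaps_with_flank(cnv_start, cnv_end, region_start, region_end, flank=FLANK):
    """Extend the CNV by `flank` bases on each side before testing overlap."""
    padded_start = max(0, cnv_start - flank)
    padded_end = cnv_end + flank
    return overlaps(padded_start, padded_end, region_start, region_end)
```

With these definitions, a CNV called at 10,000 to 15,000 does not overlap a coding region at 16,000 to 17,000 on its called coordinates, but does once the 2 kb flank is applied, so it would still be considered for prioritisation.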