Analytical validity - sensitivity and specificity

The term analytical validity refers to how well a test predicts the presence or absence of a particular genetic change or group of changes. To assess the analytical validity of the genes that will be included in the Newborn Screening Pipeline, we aimed to estimate both sensitivity and specificity. The acceptance criteria, analysis and results are documented in the pipeline analytical validity reports page, accessible to Genomics England employees only.

Sensitivity

80.0% estimate

The rarity of the conditions included in the Generation study means that sensitivity cannot be assessed directly for each gene, because there are not enough true positives to calculate it. We therefore assessed sensitivity in two ways:

  • We used data quality metrics such as coverage and CNV callability to reflect how well clinically relevant variants are expected to be detected in a particular gene. Coverage was assessed in 3,047 female samples, aligned with DRAGEN v4.0.4, that are not expected to be enriched for rare genetic disease. Coverage was calculated across samples in the relevant transcripts (as described in the LOF algorithm section: MANE clinical/select, otherwise Ensembl canonical), requiring Mapping Quality > 10 and Base Quality > 30, with soft-clipped reads removed. The mean, the median and the proportion of the gene with >15X coverage were calculated per transcript and per exon. CNV callability was assessed as the proportion of the gene that falls into regions of increased sequence homology, which are excluded by the DRAGEN 4 CNV caller. Genes with compromised coverage and CNV callability across most of the gene were removed from the Newborn gene list.
  • We estimated sensitivity across 691 samples in the NHS GMS and 100k datasets that have a reported diagnostic variant in one of the genes in the Newborn Superpanel. We only included samples where the diagnostic variants align with the MOIs included in the panel. These samples were run through the NSP analysis workflow and the diagnostic variants were found to be prioritised in 551/691 samples, yielding an overall sensitivity estimate of 80.0%. The variants that were not prioritised by the pipeline were primarily missense variants that were not present in ClinVar and QIAGEN. CVA inclusion was not considered for prioritisation in this analysis, because CVA was the source of the data. There are some limitations to this analysis:

      • Special caller outputs were not generated for this test dataset.
      • CNVs were not included in this analysis as they are not supported in CVA.
      • We did not examine whether the diagnostic conditions in the NHS GMS dataset fit the criteria for newborn screening (early onset, treatability, strong penetrance of the condition).
      • There is ascertainment bias in which genes are included in the NHS GMS service, so we may not be capturing the sensitivity across the whole set of genes.
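
The two assessments above can be sketched in a few lines of Python. This is a minimal illustration only, not the pipeline's own code: the function names, the input format (a plain list of per-base depths that have already been filtered on mapping and base quality) and the toy depth values are all assumptions. Only the >15X threshold and the 551/691 counts come from the text. Note that the counts as quoted evaluate to roughly 79.7%, slightly below the reported 80.0%, and the sketch does not attempt to reconcile the two.

```python
from statistics import mean, median

MIN_DEPTH = 15  # the ">15X" threshold quoted in the text


def coverage_summary(depths, min_depth=MIN_DEPTH):
    """Summarise per-base depths for one transcript or exon.

    `depths` is a hypothetical input: one depth value per position,
    assumed to be already filtered on mapping and base quality.
    """
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for d in depths if d > min_depth)
    return {
        "mean": mean(depths),
        "median": median(depths),
        "prop_above_min_depth": covered / len(depths),
    }


def sensitivity(n_prioritised, n_total):
    """Fraction of known diagnostic variants that were prioritised."""
    if n_total <= 0 or not 0 <= n_prioritised <= n_total:
        raise ValueError("counts out of range")
    return n_prioritised / n_total


# Toy depths for illustration only, not real data.
toy_exon = [30, 32, 14, 16, 40, 15]
summary = coverage_summary(toy_exon)

# Counts as quoted in the text.
estimate = sensitivity(551, 691)

print(summary)
print(f"{estimate:.1%}")
```

The same `coverage_summary` call would be applied once per transcript and once per exon, with results then aggregated across samples.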

Specificity

97.3% estimate

  • Variant calling specificity was estimated in an analysis of 5,855 genomes, aligned with DRAGEN v4.0.4, from individuals not expected to be enriched for rare genetic disease. The samples cover a mix of ancestries expected to reflect the British population. The set comprises the female subset described in the data quality analysis for the sensitivity assessment, complemented with a subset of males.
  • These samples were run through variant prioritisation in the Newborn Screening Pipeline with panel info from the Newborn Superpanel, and specificity was estimated to be 97.3% under the assumption that all samples with prioritised variants are false positives.
  • Gene and variant specificity were also calculated, and these analyses informed decisions on which variants should be added to the exclusion list to reduce false positives (e.g. sequencing artefacts, non-reportable variants) as well as which modes of inheritance should be included for certain genes.
  • As mentioned in the limitations of variant prioritisation, genes with multiple modes of inheritance can lead to prioritisation of carriers of variants that are only relevant when biallelic.
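
The specificity calculation described above reduces to a simple proportion once every flagged sample is treated as a false positive. The sketch below assumes a hypothetical function name and, importantly, a hypothetical flagged count: 158 is back-calculated from the reported 97.3% purely for illustration and is not a figure taken from the validity report. Only the cohort size of 5,855 comes from the text.

```python
def specificity_all_flagged_false(n_samples, n_flagged):
    """Specificity when every sample with a prioritised variant is
    assumed to be a false positive: TN / (TN + FP), with
    FP = n_flagged and TN = n_samples - n_flagged."""
    if n_samples <= 0 or not 0 <= n_flagged <= n_samples:
        raise ValueError("counts out of range")
    return (n_samples - n_flagged) / n_samples


N_SAMPLES = 5855            # cohort size quoted in the text
ILLUSTRATIVE_FLAGGED = 158  # hypothetical, back-calculated for illustration

spec = specificity_all_flagged_false(N_SAMPLES, ILLUSTRATIVE_FLAGGED)
print(f"{spec:.1%}")
```

Because the cohort is assumed to contain no true positives, this is a conservative estimate: any flagged sample that in fact carries a genuine reportable variant would raise the true specificity above the computed value.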