Extract the coordinate-specific columns onto a separate sheet (Cleanup).

The challenge with this data is that the alt base is reported using non-ACGT characters. To be certain of using the correct base, our process is to reverse-annotate each variant from its protein notation and ensure that the resulting base coordinate matches the one in the sheet. If there are ambiguities, use the cDNA notation to resolve them. This is typically only needed for synonymous (silent) mutations, where a single protein change can correspond to several possible base changes.

Filter out anything that has no AA change; these variants were handled individually:

- Splice variants' Ref/Alt can be deduced directly from UCSC and the "New Codon" field, which appears to report the splice site rather than an actual codon.
- The variant on row 65 (SCP2 *207+TG) appears to be an insertion of TG where the splice site would start, so its ref/alt were set to ""/"TG".
- It is unclear why chr17:18140835 has no amino-acid change reported, but the change itself appears trivial (chr17:g.18140835C>T).

The remaining variants were processed by reverse annotation from protein notation (GENE:p.RefPosAlt) in TransVar. For the following genes the query was changed to the gene symbol and cDNA notation shown:

- Changed C18ORF26 to DYNAP:c.424G>T
- Changed DOM3Z to DXO:c.881G>A
- Changed KIAA0317 to AREL1:c.962C>T
- Changed TMEM85 to EMC4:c.458C>G
- Changed KIAA0182 to GSE1:c.1196G>A
- Changed EIF2C1 to AGO1:c.1064C>T
- Changed LASS6 to CERS6:c.406A>G

We checked the results for inconsistencies with the reference coordinate to make sure the input values were correct and consistent.

- GON4L:c.3253C>T and TJP3:1977C>T were queried in cDNA notation because the protein notation allowed multiple possible synonymous mutations.
- LTBP1, GREB1L and DOM3Z each had multiple transcripts with different coordinates.
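The coordinate check described in these notes can be sketched in Python. This is a minimal sketch under several assumptions, not the exact procedure used: each row is assumed to be parsed into a hypothetical `Variant` record, protein changes are assumed to use one-letter codes without the `p.` prefix (e.g. `R123C`), and the reverse-annotation candidates (from TransVar, whose `panno`/`canno` subcommands handle protein and cDNA input respectively, or any other tool) are assumed to arrive as `(chrom, pos, ref, alt)` tuples. The names `has_aa_change`, `build_query`, `resolve` and the `cdna_change` override field are illustrative only; the symbol map reproduces the renames listed in the notes.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Outdated symbols in the source sheet mapped to the symbols queried instead
# (taken from the renames listed in these notes).
SYMBOL_UPDATES = {
    "C18ORF26": "DYNAP",
    "DOM3Z": "DXO",
    "KIAA0317": "AREL1",
    "TMEM85": "EMC4",
    "KIAA0182": "GSE1",
    "EIF2C1": "AGO1",
    "LASS6": "CERS6",
}

# Assumed format: one-letter residue codes without "p.", e.g. "R123C" or "G45G".
AA_CHANGE = re.compile(r"^([A-Z*])(\d+)([A-Z*])$")

# (chrom, pos, ref, alt) as returned by a reverse-annotation step.
Candidate = Tuple[str, int, str, str]


@dataclass
class Variant:
    gene: str
    aa_change: Optional[str]  # empty/None when the sheet reports no AA change
    chrom: str
    pos: int
    cdna_change: Optional[str] = None  # hypothetical override, e.g. "c.424G>T"


def has_aa_change(v: Variant) -> bool:
    """Rows failing this test are set aside for manual handling."""
    return bool(v.aa_change) and AA_CHANGE.match(v.aa_change) is not None


def is_synonymous(aa_change: str) -> bool:
    """True when the reference and alternate residues are identical."""
    m = AA_CHANGE.match(aa_change)
    return m is not None and m.group(1) == m.group(3)


def build_query(v: Variant) -> Tuple[str, str]:
    """Return ("c", query) for cDNA input or ("p", query) for protein input.

    Call only on rows that pass has_aa_change or carry a cdna_change.
    """
    gene = SYMBOL_UPDATES.get(v.gene.upper(), v.gene)
    if v.cdna_change:
        return "c", f"{gene}:{v.cdna_change}"
    return "p", f"{gene}:p.{v.aa_change}"


def _norm_chrom(chrom: str) -> str:
    # Treat "chr17" and "17" as the same chromosome.
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def resolve(v: Variant, candidates: List[Candidate]) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Keep candidates at the sheet's coordinate and report the outcome.

    "ok":           exactly one ref/alt pair sits at the expected coordinate
    "ambiguous":    several pairs do (e.g. synonymous codons); requery with cDNA
    "inconsistent": none do; check the input values or the transcript
    """
    target = (_norm_chrom(v.chrom), v.pos)
    matching = {
        (ref, alt)
        for chrom, pos, ref, alt in candidates
        if (_norm_chrom(chrom), pos) == target
    }
    if not matching:
        return "inconsistent", None
    if len(matching) > 1:
        return "ambiguous", None
    return "ok", matching.pop()
```

Filtering candidates by the sheet's coordinate also covers the case of genes with several transcripts: candidates derived from other transcripts land at different positions and are simply discarded, while an "ambiguous" result flags the synonymous cases that need a cDNA query instead.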