In S11, the rows at the bottom of the table that have values in the SampleID 2 column were copied and pasted. SampleID 1 and SampleID 2 were combined to create VARICARTA_SampleID_Combined and VARICARTA_Diagnosis_Combined.

The following columns were also merged: Feature and Accession into VARICARTA_feature_accession; VARICARTA_SampleID_Combined and Individuals into VARICARTA_SampleID_Individual(s); VARICARTA_Diagnosis_Combined and Child into VARICARTA_Diagnosis_Child; Consequence and functionGVS into VARICARTA_Consequence_functionGVS; and scaledCADD and scoreCADD into VARICARTA_scaledCADD_scoreCADD.

Variants from Gecz_13_24696 and Gecz_11_10322 have no inheritance information (the supplementary data references a VLOOKUP function whose source file is missing). Variants whose inheritance was set to false positive (false+ failed) were excluded. Variants with inheritance set to heterozygous or homozygous were relabeled as inherited. Two variants (“1 151378489 A” and “10 28879673 G”) appear to come from SSC with de novo information, but because we cannot confirm this, we do not label them as de novo. Variants with inheritance set to Sanger validation were kept.

For samples whose cohort was reported as SAGE but which had both SAGE and SSC sample IDs, the SAGE sample ID was used. Samples with two types of AGRE ID were assigned the AUxxxx ID, namely 03C16890 / AGRE_AU038204 and 03C16243 / AGRE_AU035204. We removed variants labeled with non-ASD phenotypes (intellectual disability, developmental disability, and NQA, "not quite autism"). Based on the cohort information in Supplementary Table 8, variants from cohorts without an ASD diagnosis were also removed. The cohort label was removed from the sample ID.

N.B. For a small number of variants, the validation status reads "Validation failed" or similar wording indicating failure, with varying terminology. We are unclear whether the authors meant that validation failed to reproduce the variant (e.g.
a false positive, which would have been labeled as such and therefore excluded) or that validation failed for technical reasons, in which case the variant may still be real. These cases were set to "unknown" for the validation status.

To obtain the reference allele, we ran the HGVS mutations through TransVar. Twenty variants failed with the provided HGVS notation, so they were edited (first by removing the gene name; if that still failed, by removing the transcript and using the gene instead) and then checked to confirm that each variant mapped successfully. The HGVS variants used are listed in HGVSIDsToReprocess. One variant had no data at all (chr4, position 114302686, SampleID 212-21046-1). Nine splice/intronic variants were rejected as incorrectly formatted (e.g. NM_012302.2:c.-101+2T>A returns "[list_parse_mutation] [parse_pos] exception: invalid position string -43-2."). For these, we looked up the reference base manually to avoid misinterpreting the variant syntax. InDel positions were converted from 0-based to 1-based coordinates (VARICARTA_Start = Position + 1 for InDels).
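The column-combining steps described above (e.g. SampleID 1 and SampleID 2 into VARICARTA_SampleID_Combined) can be sketched as a simple coalesce. This is a minimal illustration, not the procedure actually used: the helper name `combine_columns`, the `;` separator, and the rule that two distinct non-empty values are joined rather than one overriding the other are all assumptions.

```python
from typing import Optional


def combine_columns(first: Optional[str], second: Optional[str], sep: str = ";") -> Optional[str]:
    """Merge two partially filled columns into one value (hypothetical helper).

    Empty strings and None are treated as missing. If both columns hold
    distinct values, they are joined with `sep` (an assumed convention).
    """
    values = [v.strip() for v in (first, second) if v is not None and v.strip()]
    # Drop duplicate values while preserving their original order
    unique = list(dict.fromkeys(values))
    return sep.join(unique) if unique else None


print(combine_columns("AU038204", ""))  # AU038204
```

Joining instead of overwriting keeps any conflicting values visible for manual review.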
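The inheritance, validation, and phenotype rules in these notes can be expressed as small filter functions. This is a hedged sketch: the label spellings in the sets below and the function names `curate_inheritance` and `is_asd_phenotype` are hypothetical, and the real supplementary tables may use different wording.

```python
from typing import Optional, Tuple

# Hypothetical label spellings; check them against the source tables.
EXCLUDED_INHERITANCE = {"false+", "false+ failed", "false positive"}
INHERITED_LABELS = {"heterozygous", "homozygous"}
NON_ASD_PHENOTYPES = {"intellectual disability", "developmental disability", "nqa"}


def curate_inheritance(raw: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map a raw inheritance label to (inheritance, validation_status).

    Returns None when the variant should be excluded.
    """
    label = raw.strip().lower()
    if label in EXCLUDED_INHERITANCE:
        return None  # labeled false positives are dropped
    if label in INHERITED_LABELS:
        return ("inherited", None)
    if "fail" in label:
        # Ambiguous failure wording: keep the variant, flag validation as unknown
        return (raw.strip(), "unknown")
    return (raw.strip(), None)  # e.g. "Sanger validation" rows are kept unchanged


def is_asd_phenotype(phenotype: str) -> bool:
    """Return False for the non-ASD phenotypes listed in the notes."""
    return phenotype.strip().lower() not in NON_ASD_PHENOTYPES
```

Checking the exclusion set before the generic "fail" test ensures that explicitly labeled false positives are dropped rather than marked "unknown".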
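The sample-ID clean-up (removing the cohort label and preferring the AUxxxx-style AGRE ID) might look like the sketch below. The cohort prefixes in `COHORT_PREFIX` and the assumption that AUxxxx IDs are "AU" followed only by digits are inferred from the two examples in the notes; both helper names are hypothetical.

```python
import re
from typing import List

# Assumed cohort prefixes of the form "AGRE_"; the real list may differ.
COHORT_PREFIX = re.compile(r"^(?:AGRE|SSC|SAGE)_")
AU_ID = re.compile(r"AU\d+")


def strip_cohort(sample_id: str) -> str:
    """Remove a leading cohort label such as 'AGRE_' from a sample ID."""
    return COHORT_PREFIX.sub("", sample_id.strip())


def pick_agre_id(candidates: List[str]) -> str:
    """Prefer the AUxxxx-style ID when a sample carries two AGRE IDs."""
    stripped = [strip_cohort(c) for c in candidates]
    for sid in stripped:
        if AU_ID.fullmatch(sid):
            return sid
    return stripped[0]  # fall back to the first ID if no AUxxxx form exists


print(pick_agre_id(["03C16890", "AGRE_AU038204"]))  # AU038204
```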
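The fallback used for the twenty HGVS strings that TransVar rejected (first drop the gene name, then drop the transcript and keep the gene) can be sketched as generating candidate strings in order. The sketch assumes inputs of the form `TRANSCRIPT(GENE):c.change`, which is an assumption rather than something the notes state. It only builds the strings and does not call TransVar; `GENE1` is a placeholder.

```python
import re
from typing import List

# Assumed input shape: "NM_xxxxxx.v(GENE):c.change" (hypothetical pattern).
HGVS_WITH_GENE = re.compile(r"^(?P<tx>[^():]+)\((?P<gene>[^()]+)\):(?P<change>.+)$")


def hgvs_fallbacks(hgvs: str) -> List[str]:
    """Return HGVS strings to try in order: original, without gene, gene-only."""
    candidates = [hgvs]
    m = HGVS_WITH_GENE.match(hgvs)
    if m:
        candidates.append(f"{m['tx']}:{m['change']}")    # drop the gene name
        candidates.append(f"{m['gene']}:{m['change']}")  # drop transcript, keep gene
    return candidates


print(hgvs_fallbacks("NM_000000.1(GENE1):c.100A>G"))
```

Each candidate would be submitted in turn, and the first one that maps successfully would be kept and recorded, as was done for HGVSIDsToReprocess.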
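The coordinate rule VARICARTA_Start = Position + 1 for InDels can be written as a one-line branch. Two assumptions are made in this sketch: a variant counts as an InDel when its REF and ALT alleles differ in length, and "-" or "." denotes an empty allele. The function name `varicarta_start` is hypothetical.

```python
def varicarta_start(position: int, ref: str, alt: str) -> int:
    """Compute VARICARTA_Start from the source Position (hypothetical helper).

    Assumes an InDel is any variant whose REF and ALT lengths differ,
    with "-" or "." standing for an empty allele.
    """
    ref_seq = "" if ref in ("-", ".") else ref
    alt_seq = "" if alt in ("-", ".") else alt
    if len(ref_seq) != len(alt_seq):
        return position + 1  # InDel: shift 0-based position to 1-based
    return position  # SNVs are left unchanged


print(varicarta_start(1000, "AT", "A"))  # 1001
```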