In mid-February 2021, the Berlin-based company YSEQ presented its new product, the WGS400. I was very pleased about this, because I have had very good experiences with YSEQ in recent years, and for me the company offers some advantages over its competitors. With the WGS400, YSEQ now also offers NGS (Next Generation Sequencing) that is not only high quality but also competitively priced, and that includes everything that is important to me as a DNA genealogy researcher.
Advantages of YSEQ:
- Low shipping costs and short shipping times, because the company and its lab are in Berlin, Germany.
- The sample is stored for later upgrades.
- You can buy an empty DNA sample kit and order the product of your choice as soon as the kit arrives at the lab.
- Large and flexible product range: Y-DNA (Y-STR panels, single-STR, SNP packs, single-SNPs, custom panels), mtDNA (HVR1, HVR2, full sequence), Whole Genome Sequencing.
- Almost any SNP can be ordered via “Wish a SNP”, and custom panels can be created.
- Dedicated customer service
Advantages of WGS400:
- An existing sample stored at YSEQ can be used to upgrade to WGS400.
- Low price
- Buy an empty DNA sample kit, check with a single SNP whether the tester belongs to the correct Y haplogroup, and only then extend the test to WGS400. (This is the most important point for me.)
- Free BAM file (raw data) in hg38 format, suitable for Y-DNA genealogists. (No conversion from hg19 to hg38 is necessary for YFull.)
- Raw data (BAM) includes all Y-DNA, all mt-DNA and autosomal DNA.
- FASTA file for mt-DNA (raw data file for mt-DNA)
- 400-base long reads (advantages are described by YSEQ in the appendix below).
- More STR markers can be read.
- Very good length and depth coverage (SNPs)
- Direct access to transfer NGS data (Y-, mt-DNA) to the YFull.com database.
Here follows YSEQ’s promotional text for the WGS400 product:
This is YSEQ’s new Whole Genome NGS Specifically for Genealogy Researchers with 400 Base Long Reads
The whole genome contains over 200 times as much data as the Y chromosome alone! Sequencing has improved dramatically over the last seven years, which makes it not very reasonable to pay a high price for a type of sequence that only covers 70% of the Y chromosome. At a substantially lower price per base, you can get nearly 100% of the Y chromosome with YSEQ’s Whole Genome test.
You’ll be able to get a complete and accurate sequence of the Y chromosome, the mitochondrial DNA, plus all the other chromosomes from 1 to 22 and of course the X chromosome.
Why are 400-Base Reads Important?
The genome sequence you’ll receive from most other companies will be composed of very short individual reads, of 100 or 150 bases each. This greatly limits what can be learned from the data. Many important kinds of variation in the genome are impossible to pick up with these short reads – deletions, insertions, inversions, and other complex rearrangements of your genome, that can have a much greater impact than simple one-base changes (SNPs). Not to mention, many STRs cannot be read with such short reads. Below, we give an example of how long reads will give us the ability to read STRs that are otherwise missed.
The greatest enhancement of the long read sequences is that they will allow us to create de novo assemblies. That is, instead of relying on a standard, one-size-fits-all reference sequence, we can decipher the real, unique Y chromosome sequence as it is found in each haplogroup.
In our pre-launch WGS400 test runs, we’ve collected some astounding sequencing data that shows the new possibilities.
FTP Link: https://genomes.yseq.net/WGS/400SE/
We’ve asked independent experts to have a look at the data and we present their results here:
400 Base Reads improve the readability of Y-STR markers
Normally, WGS testing is done with 2 x 100 base or 2 x 150 base paired end reads. This is good enough if you only want to look at short SNP markers, but longer repeating elements, such as STR markers, can’t be covered with short reads.
For example, the Y-STR marker DYS684 (also known as DYS1005) is approximately 250 bases long and looks like this:
Different persons may have different numbers of CTTT or CCTTT repeat units. Obviously, 150-base reads can be mapped to various positions on this sequence, and the ends can’t anchor to both sides of the repeat section at the same time. This way, it’s impossible to identify the correct repeat count, and find other persons who match you at this marker. 400-base reads can cover the whole stretch of long STR markers, and with this technology we can resolve almost every STR marker that has been tested in existing Y-STR databases.
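The anchoring argument can be made concrete with a little arithmetic. The sketch below is only an illustration and not YSEQ's analysis pipeline: the function name `spanning_start_positions` and the 20-base flank requirement are assumptions made for this example, and the 250-base marker length is the approximate figure quoted above for DYS684.

```python
def spanning_start_positions(read_len: int, str_len: int, min_flank: int = 20) -> int:
    """Count the read start offsets at which one read covers the whole STR
    plus at least `min_flank` bases of unique sequence on each side.

    `min_flank` is an assumed value chosen for illustration only.
    """
    # A read must be at least this long to anchor on both sides of the repeat.
    needed = str_len + 2 * min_flank
    # Every extra base beyond `needed` allows one more start offset.
    return max(0, read_len - needed + 1)


for read_len in (100, 150, 400):
    n = spanning_start_positions(read_len, str_len=250)
    print(f"{read_len}-base read: {n} spanning start positions")
```

Under these assumptions, 100-base and 150-base reads have no position at all from which they can reach both flanks of a 250-base marker, while a 400-base read has 111 such positions.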
More STR examples
Long STR markers can’t be covered with 150-base paired end reads, but 400-base single reads can reach across both ends of the repeat section. This is why longer reads are important for measuring STR markers.
YFull STR Details.
Long STRs with many repeats are significantly better covered by a WGS400.
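To illustrate why a read has to reach both ends of the repeat section before a repeat count can be reported, here is a minimal sketch. Everything in it is hypothetical: the flank sequences, the pure CTTT repeat, and the function name `count_repeats` are made up for the example. Real markers such as DYS684 mix several repeat units, and real STR callers work on aligned reads rather than on plain strings.

```python
import re


def count_repeats(read: str, left_flank: str, right_flank: str, unit: str):
    """Return the number of consecutive `unit` copies found between the two
    flanks, or None if the read does not contain both flanks around the repeat."""
    pattern = (
        re.escape(left_flank)
        + "((?:" + re.escape(unit) + ")+)"
        + re.escape(right_flank)
    )
    match = re.search(pattern, read)
    if match is None:
        return None
    return len(match.group(1)) // len(unit)


# Toy locus: invented flanks around 12 copies of CTTT (not a real marker).
LEFT, RIGHT, UNIT = "GATTACAGG", "TGCATGCAA", "CTTT"
locus = LEFT + UNIT * 12 + RIGHT

long_read = "AC" + locus + "GT"   # reaches both flanks
short_read = long_read[:40]       # ends inside the repeat section

print(count_repeats(long_read, LEFT, RIGHT, UNIT))
print(count_repeats(short_read, LEFT, RIGHT, UNIT))
```

The read that covers both flanks yields a repeat count of 12, while the read that ends inside the repeat yields no count at all, which is the situation short reads are in at long STR markers.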