Y-Haplogroups and migration, using the example of I-L38 > BY14026

Introduction

In April 2017, I took my first Y-DNA test and was surprised by the result. My parents come from Almopia in northern Greece, but the Y-Haplogroup I-L38 to which I belong originated about 4500 years ago in northwestern Europe, between the mouth of the Rhine and the Baltic Sea. Since then I have been studying how this haplogroup came to Almopia, and I have already motivated some relatives and carriers of my subgroup Y125026 to take Y-DNA tests to get a clearer picture.

The aim of this article is to present the information available so far in simple words and pictures, and to bring the topic closer to people from Almopia as well as to carriers of the I-L38 subgroup BY14026. In addition, it is intended to motivate these people to take NGS (Next Generation Sequencing) tests, such as the Big Y from Family Tree DNA, as this will give a clearer picture of where our sub-branches originated and which path they took.
See the blog post “NGS – Take part at the Y-tree”.

Y-DNA and Y-Haplogroup, some basics.

Y-DNA

Fig. 1: Overview of DNA

Y-DNA is passed on only from father to son. In simplified terms, you can think of it as a book with about 23 million characters (sequences of A, C, G and T). Ideally, the father gives his sons a copy of this book without changes or errors, so brothers or close cousins usually have the same Y-DNA. However, copying errors occur every few generations.
Let us look at the following example.

SNP (Single Nucleotide Polymorphism)

In the sentence
IPASSONMYYDNATOMYVERYVERYVERYVERYVERYVERYVERYBEAUTIFULSON
the father makes a mistake and turns it into
IPASSONMYYDNSTOMYVERYVERYVERYVERYVERYVERYVERYBEAUTIFULSON
At one position, the father used a wrong letter (A became S). Such an error is called an SNP (Single Nucleotide Polymorphism). SNPs are very “stable”: once an error has occurred, it is passed on to all following generations and does not change back, so two lines can be reliably distinguished by their SNPs. SNPs are also used to name Y-haplogroups (e.g. L38, Y125026). In Y-DNA, a new SNP appears in a paternal line on average about every 100 years. This allows us to estimate the TMRCA (Time to the Most Recent Common Ancestor), i.e. the time in “years before present” (ybp) that has passed since the common ancestor lived.
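For readers who like to tinker, the book analogy and the rough age rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how the testing companies work: it assumes the rule of thumb of about 100 years per SNP from the text, the function names are made up for this example, and the SNP counts in the usage line are hypothetical. Real age estimates, such as YFull's [3], use a more careful method.

```python
# Minimal sketch: SNPs as single-letter differences, plus a rough age rule.
# Assumption: about 100 years per new SNP per line (the article's rule of thumb).

YEARS_PER_SNP = 100


def find_snps(ancestral: str, derived: str) -> list[tuple[int, str, str]]:
    """Return (position, ancestral letter, derived letter) for every mismatch."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must have the same length")
    return [
        (pos, old, new)
        for pos, (old, new) in enumerate(zip(ancestral, derived))
        if old != new
    ]


def rough_tmrca(snps_line_a: int, snps_line_b: int) -> float:
    """Years since two lines split, from the private SNPs each line gained."""
    # Each line accumulates its own SNPs after the split, so average them.
    return (snps_line_a + snps_line_b) / 2 * YEARS_PER_SNP


# The sentence from the example: the father turns "DNATO" into "DNSTO".
father = "IPASSONMYYDNATOMY" + "VERY" * 7 + "BEAUTIFULSON"
son = father.replace("DNATO", "DNSTO")

print(find_snps(father, son))  # [(12, 'A', 'S')]
print(rough_tmrca(15, 13))     # 1400.0 (hypothetical SNP counts)
```

Averaging the two lines matters because each line keeps collecting its own SNPs after the split; using only one line would make the estimate depend on which tester happened to be chosen.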

STR (Short Tandem Repeat)

The same sentence
IPASSONMYYDNATOMYVERYVERYVERYVERYVERYVERYVERYBEAUTIFULSON
becomes
IPASSONMYYDNATOMYVERYVERYVERYVERYVERYVERYVERYVERYBEAUTIFULSON
Here a father found his son particularly beautiful and, repeating the word “very”, added one too many. Such mistakes happen at different places in the Y-DNA with different frequencies. Some STRs have a low mutation rate and are only very rarely passed on incorrectly. STRs with a high mutation rate change often, so they can randomly mutate to the same value (number of repeats) in different subgroups.
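An STR value is simply a count of repeats. The sketch below counts how often the word “VERY” repeats in the two example sentences. It is only an illustration: the motif “VERY” stands in for a short DNA motif at a real marker, the function name is hypothetical, and real sequencing software also has to deal with read errors that this ignores.

```python
# Minimal sketch: an STR value is the number of back-to-back copies of a motif.
# The motif "VERY" stands in for a short DNA motif at a real STR marker.

def count_tandem_repeats(sequence: str, motif: str, start: int) -> int:
    """Count consecutive copies of `motif` starting at index `start`."""
    count = 0
    pos = start
    while sequence.startswith(motif, pos):
        count += 1
        pos += len(motif)
    return count


prefix = "IPASSONMYYDNATOMY"
father = prefix + "VERY" * 7 + "BEAUTIFULSON"
son = prefix + "VERY" * 8 + "BEAUTIFULSON"  # one "VERY" too many

start = len(prefix)  # the repeat region begins right after the prefix
print(count_tandem_repeats(father, "VERY", start))  # 7
print(count_tandem_repeats(son, "VERY", start))     # 8
```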

Y-Haplogroup

A Y-Haplogroup is a group of men who all descend from one man in the purely paternal line. In the past, Y-haplogroups were often named with the so-called “long form”, which starts with a capital letter and then alternates between numbers and lowercase letters. Nowadays, one mostly uses an SNP for which all group members have tested positive. My own Y-haplogroup is known as I2a1b2a (long form, valid for 2019 [1]) or I-L38 (SNP). Each character of the long form stands for one level of the tree, which can also be named by an SNP:

Long form           | I     | 2     | a       | 1     | b     | 2      | a    | 2
SNP                 | M170  | M438  | CTS2257 | L460  | M436  | Y10720 | L38  | S2606
TMRCA (YFull, ybp)  | 27500 | 21500 | n/a     | 21000 | 17000 | 12200  | 4400 | 4400

The subgroup that is the subject of this article is listed at FTDNA under the SNP BY14026; YFull uses the SNP Y31038 for it. You can think of the SNP as a stand-in for the name of the common ancestor, which we do not know. BY14026 is a descendant of L38, and L38 in turn is a descendant of I-M170.
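The relation between the long form and the SNP names in the table above can be sketched as a simple list, where each level adds one character. This is a minimal sketch, assuming that every level adds exactly one character, which holds for I2a1b2a2 but not for long forms with two-digit numbers. The names `SNP_CHAIN`, `long_form` and `ancestors` are made up for this example.

```python
# Minimal sketch: the SNP chain and long form from the table.
# Assumption: every level adds exactly one character, which holds for
# "I2a1b2a2" but not for long forms with two-digit numbers.

SNP_CHAIN = ["M170", "M438", "CTS2257", "L460", "M436", "Y10720", "L38", "S2606"]
LONG_FORM = "I2a1b2a2"
TMRCA_YBP = {  # YFull values from the table; CTS2257 is n/a
    "M170": 27500, "M438": 21500, "L460": 21000, "M436": 17000,
    "Y10720": 12200, "L38": 4400, "S2606": 4400,
}


def long_form(snp: str) -> str:
    """Long-form name of an SNP in the chain, e.g. L38 -> I2a1b2a."""
    return LONG_FORM[: SNP_CHAIN.index(snp) + 1]


def ancestors(snp: str) -> list[str]:
    """SNPs above `snp` in the chain, nearest first."""
    return SNP_CHAIN[: SNP_CHAIN.index(snp)][::-1]


print(long_form("L38"))          # I2a1b2a
print(ancestors("L38")[:2])      # ['Y10720', 'M436']
print(TMRCA_YBP.get("CTS2257"))  # None (no estimate in the table)
```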

Y-DNA Tests

In Y-DNA tests, these “mistakes” are specifically looked for in order to assign testers to Y-haplogroups, but also to determine relationships in the purely paternal line.

Simple SNP Tests

With simple SNP tests such as 23andme or Living DNA, only very old SNPs are found, and therefore only common ancestors who lived several millennia ago. In my case, the 23andme test gave me the Y-Haplogroup S2606, whose TMRCA is 4400 ybp. That may be enough for some people, but it is definitely not enough for me.

NGS Tests

The NGS (Next Generation Sequencing) test, such as the Big Y from Family Tree DNA, can be seen as an extended SNP test with an integrated STR test. Other providers of NGS tests are YSEQ.net, Fullgenomes.com, Nebula.org, dantelabs.com and all others that offer WGS (Whole Genome Sequencing). Almost the entire Y-DNA is read, so almost all “SNP mistakes” relative to our “ancestor” are found, even those that have never been found in anyone else before. In addition, a large number of STRs are read, which can be used for comparison with pure STR tests.

This is the test used to build Y-SNP trees such as those of FTDNA or YFull.com. The FigUre tree on this blog is a visual summary of these two Y-trees. Since this is the only test that can clearly separate sub-branches that formed after the beginning of the Common Era, it is my preferred test.

STR Tests

In these tests, selected STR markers are read. You can compare the STR markers of two or more testers and estimate how closely they are related. When SNP tests were still too expensive, STRs were used to determine Y-haplogroups, and they can be used to distinguish lines. But beware! STRs with a high mutation rate can randomly mutate to the same value, even in different subgroups. It is therefore recommended that you confirm the predicted subgroup to which you might belong with SNP testing, or upgrade to an NGS test such as the Big Y.
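Comparing two STR results usually comes down to counting differences. The sketch below uses a simple step-wise “genetic distance”: the sum of repeat differences over the markers both testers have. This is an assumption for illustration; testing companies apply additional rules (for example for multi-copy markers) that are not modelled here. The marker values are those of the violet and red blocks of Y125026 described later in this article.

```python
# Minimal sketch: a simple step-wise genetic distance between two STR results.
# Assumption: sum of absolute repeat differences over shared markers; real
# services use extra rules (e.g. for multi-copy markers) not modelled here.

def genetic_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Sum of repeat-count differences over the markers both testers have."""
    shared = a.keys() & b.keys()
    return sum(abs(a[marker] - b[marker]) for marker in shared)


violet = {"DYR221": 14, "DYS448": 21}  # violet block of Y125026
red = {"DYR221": 13, "DYS448": 22}     # red block of Y125026
print(genetic_distance(violet, red))   # 2
```

A small distance fits a close relationship, but because fast-mutating markers can reach the same value independently, it does not prove membership in a subgroup; only an SNP or NGS test can do that.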

Y-Haplogroups and migration, using the example of I-L38>BY14026.

One occasionally reads that Y-haplogroups are associated with peoples. For example, I2a is often associated with the Slavs in general. As the previous table shows, I2a is an ancient haplogroup that originated more than 20000 years ago, whereas the Slavs are only mentioned as an ethnic group from the sixth century onward [2]. It therefore makes no sense to speak of this connection in a generalized way, without further differentiation.
To connect Y-haplogroups with peoples is possible, but – as we will see in the example of I-BY14026 – difficult.

First of all, we need to know the approximate time when an SNP formed, and thus when the common ancestor of a Y-haplogroup lived. With the help of NGS tests, one can find “young” SNPs and estimate when they formed. At YFull.com, the ages of the branches are estimated mathematically [3] and are part of the YFull tree. One must not forget that this is an estimate: a 95% confidence interval (CI 95%) is given in addition to the age. Estimating ages is only one of many advantages of YFull, so I recommend submitting your raw data to YFull.
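To get a feeling for why an age always comes with a confidence interval, here is a deliberately simplified sketch. It is NOT YFull's method [3]: it assumes about 100 years per SNP, treats the SNP counts of the descendant lines as independent Poisson counts and uses a normal approximation for a rough 95% range. The SNP counts in the example are hypothetical.

```python
import math

# Minimal sketch, NOT YFull's method: average the SNP counts of several
# descendant lines and attach a rough 95% range, assuming Poisson-distributed
# counts, independent lines and ~100 years per SNP.

YEARS_PER_SNP = 100


def estimate_age(snp_counts: list[int]) -> tuple[float, float, float]:
    """Return (age, lower, upper) in years before present."""
    n = len(snp_counts)
    total = sum(snp_counts)
    mean = total / n
    # Poisson: variance of the total equals its mean, so the SD of the
    # average is sqrt(total) / n; 1.96 SD gives an approximate 95% range.
    half_width = 1.96 * math.sqrt(total) / n
    lower = max(mean - half_width, 0.0)
    return mean * YEARS_PER_SNP, lower * YEARS_PER_SNP, (mean + half_width) * YEARS_PER_SNP


age, low, high = estimate_age([14, 16, 15])  # hypothetical SNP counts
print(round(age), round(low), round(high))  # 1500 1062 1938
```

With more descendant lines (that is, more NGS testers), the range becomes narrower, which is one more reason why every additional test helps.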

In addition, you need a way to determine the place where an SNP formed, and thus where the common ancestor of a Y-Haplogroup lived. Hans de Beule describes such a method in “Ancestral Modal Windows of I-L38” [4].
One takes a Y-haplogroup and, using the Y-DNA testers, looks at how its descendants, i.e. the individual sub-branches, are distributed on the map. The common ancestor most likely lived where the areas of the sub-branches overlap. This assumes, however, that the branches contain people whose ancestors stayed in the same place and did not migrate. The more information one has about the sub-branches, the more clearly one can see where the individual SNPs originated and thus trace migration paths. Without enough testers, it is often impossible to tell whether a line went from A to B, from B to A, or even from an unknown place C to both A and B.
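The overlap idea can be illustrated with a very crude geometric sketch. This is only my simplified reading of the principle described above, not Hans de Beule's actual procedure: each sub-branch is reduced to a latitude/longitude rectangle around its ancestral locations, and the origin is sought where all rectangles overlap. The coordinates are hypothetical, and wrap-around at 180 degrees longitude is ignored.

```python
from dataclasses import dataclass
from typing import Optional

# Crude sketch of the overlap idea: one latitude/longitude rectangle per
# sub-branch; the origin is sought where all rectangles overlap.
# Assumptions: rectangles instead of real distributions, no wrap-around
# at 180 degrees longitude, hypothetical coordinates.


@dataclass(frozen=True)
class Box:
    south: float
    west: float
    north: float
    east: float


def bounding_box(points: list[tuple[float, float]]) -> Box:
    """Smallest rectangle containing all (latitude, longitude) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return Box(min(lats), min(lons), max(lats), max(lons))


def overlap(boxes: list[Box]) -> Optional[Box]:
    """Intersection of all rectangles, or None if they do not overlap."""
    south = max(b.south for b in boxes)
    west = max(b.west for b in boxes)
    north = min(b.north for b in boxes)
    east = min(b.east for b in boxes)
    if south > north or west > east:
        return None
    return Box(south, west, north, east)


branch_a = bounding_box([(50.0, 5.0), (54.0, 10.0)])  # hypothetical testers
branch_b = bounding_box([(52.0, 8.0), (56.0, 14.0)])  # hypothetical testers
print(overlap([branch_a, branch_b]))
```

The result is only as good as the number of testers with known locations in each sub-branch; a branch with a single tester shrinks to a point and says little about where its ancestor lived.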
For this reason, I want to encourage people to take an NGS test or to upgrade their STR test to an NGS test.

Depending on the information available, parts of migration paths can be traced quite well. Where information is missing, however, you have to form hypotheses and sometimes end up with several versions. Knowledge of historical migrations helps here. Mine is somewhat limited, and I am grateful for advice from people who know more. In my opinion, it is best to first evaluate the information from the Y-DNA tests and only then consider which historical migrations fit.

“Sons” of the Y-Haplogroup I-BY14026, an inventory.

The subgroup BY14026 of the Y-Haplogroup I-L38 now has more than 50 testers. However, this inventory only includes testers who have given a location for their earliest known ancestor and can be clearly assigned to BY14026 or one of its subgroups. The sources for this data are:

  • Haplogroup I-L38 Project at FTDNA [5]: NGS (Big Y) as well as Y37, Y67 and Y111 STR tests
  • YFull [6]: NGS from other providers
  • Serbian DNA Project [7]: PowerPlex 23 STR
Fig. 2: Pedigree of the Y-Haplogroup I-BY14026

For the division into subgroups, not only SNPs were used but also some relatively informative STRs (rated with three or more stars for mutation rate at YFull). Placement on the timeline uses YFull's age estimates [3] where YFull has estimated an age.
BY14026 formed about 2300 years ago. Its currently known sub-branches are presented below in this order: BY25359, BY161442, BY14026* and BY14044.

BY25359 and subclades

Y125026 (DYR221=14, DYS448=21)

The subgroup Y125026 is mainly found in southeastern Europe. It was formed about 1500 years ago and is roughly divided into two branches.

The testers of the violet block are the only testers within Y125026 with unchanged values for the STRs DYR221=14 and DYS448=21. A01 and Y128714 do not share a common SNP below Y125026.

The ancestor of A01 comes from Almopia, just like the common ancestor of A02 to A04 (Y128714). The places of their ancestors are only a few kilometers apart. This means that the common ancestor of A01 to A04 came to Almopia (Enotia) shortly after the “birth” of SNP Y125026, about 1500 years ago.
A04 is the test of my own Y-line.
A03 is a man from our village whom I tested to verify a family legend, according to which our common ancestor in the purely paternal line came from Mani, in the south of Greece, about 300 years ago. However, since the Y-Haplogroup has been in Almopia for about 1500 years, this family legend is disproved.
A02 is a nephew of my mother. My mother's father belongs to the same Y-Haplogroup Y128714 (TMRCA 600 ybp) as my father. It is said that the grandfather of my mother’s grandfather “came from the Vlachs”. Since there are Megleno-Romanian villages north of Theodorakeion, I assume that he came from one of these villages. Notia, the place of origin of A0, is one of these villages. “In the early Byzantine period, the area was renamed to Enotia (Greek: Ενωτία) after a nearby fortress, probably in the vicinity of modern Notia.” [8]

Y125026 (DYR221=13, DYS448=22)

The testers in the red block also belong to haplogroup Y125026, but have a common ancestor of their own that the violet group does not share. Two STR markers have changed: DYS448 (21>22), part of the Y37 STR test at FTDNA, and DYR221 (14>13), extracted by YFull from the NGS data. Because this region has the highest number of subgroups (verified by NGS and predicted by STR), I assume that this block formed in the Carpathian Basin between Serbia (Vojvodina) and Hungary.

The red block has two sub-branches defined by SNPs (Y177573, BY111953) and several participants who have not yet formed a subgroup or have only taken Y-STR tests. I recommend that the testers who have only taken Y-STR tests so far upgrade to the Big Y 700, so that we get more sub-branches here and a better picture of Y125026. B08, B09 and B10 may also form a sub-branch, based on DYS437=14, and B03, B05, B13 and B23 another one, based on DYS391=10. I would be especially happy to see a second NGS test from Serbia, or one from Hungary.
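Spotting such possible sub-branches amounts to grouping testers by a shared STR value. The sketch below does exactly that, using only the marker values named above; the other markers of these testers are not listed in this article and are left out. The function name is made up for this example, and any group it finds is only a candidate that a Big Y test would have to confirm.

```python
from collections import defaultdict

# Minimal sketch: list testers who share the same value at one STR marker as
# candidate sub-branches. Only values stated in this article are entered;
# the other markers of these testers are not listed here.


def candidate_subgroups(
    testers: dict[str, dict[str, int]], marker: str, min_size: int = 2
) -> dict[int, list[str]]:
    """Map each value of `marker` shared by >= min_size testers to their IDs."""
    groups: dict[int, list[str]] = defaultdict(list)
    for tester_id, markers in sorted(testers.items()):
        if marker in markers:
            groups[markers[marker]].append(tester_id)
    return {value: ids for value, ids in groups.items() if len(ids) >= min_size}


red_block = {
    "B08": {"DYS437": 14}, "B09": {"DYS437": 14}, "B10": {"DYS437": 14},
    "B03": {"DYS391": 10}, "B05": {"DYS391": 10},
    "B13": {"DYS391": 10}, "B23": {"DYS391": 10},
}
print(candidate_subgroups(red_block, "DYS437"))  # {14: ['B08', 'B09', 'B10']}
print(candidate_subgroups(red_block, "DYS391"))  # {10: ['B03', 'B05', 'B13', 'B23']}
```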

Y177573 is an SNP-verified subgroup of this block and has so far only been found in Russia. The testers B21, B22 and B23, together with Y177573, could form a parent subgroup based on the STRs DYS514=19 and DYR123=16 extracted by YFull. B21, from Romania, could indicate that this sub-branch also originated in the Carpathian Basin.

The sub-branch BY111953 formed only recently and is very interesting because of the Swedish tester D03, who does not fit the southeastern European pattern. Either this suggests that Y125026 originated in the north, or the Swede's ancestor moved back north, where the Y-haplogroup BY14026 originally came from, after Y125026 DYS448=22 had formed in the Carpathian Basin. I lean strongly toward the second explanation.

FT53143

BY25359, the ancestor of Y125026, has another descendant: the subgroup FT53143, with three testers and a TMRCA of 1300 ybp.
One of them is from southern Sweden.
The two others, who form the subgroup FT52991, are from Great Britain.
I assume that BY25359 originated in Sweden, that FT53143 moved from there to the UK less than 1300 years ago, and that Y125026 moved south a few centuries earlier.
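Which branch two testers have in common can be read off the tree by walking upward from both of them. The sketch below stores the BY14026 branches named in this article as child-to-parent links (only the links stated in the text) and finds the youngest shared branch. The ages are the approximate values given in the text; all names in the code are made up for this example.

```python
from typing import Optional

# Minimal sketch: the BY14026 branches named in this article as child -> parent
# links, and a search for the youngest branch two lines share.
# Ages are the approximate values given in the text (ybp); others are unknown here.

PARENT = {
    "BY25359": "BY14026", "BY161442": "BY14026", "BY14044": "BY14026",
    "Y125026": "BY25359", "FT53143": "BY25359", "FT52991": "FT53143",
    "Y128714": "Y125026", "Y177573": "Y125026", "BY111953": "Y125026",
    "BY25362": "BY14044",
}
AGE_YBP = {"BY14026": 2300, "Y125026": 1500, "FT53143": 1300, "Y128714": 600}


def lineage(snp: str) -> list[str]:
    """Path from `snp` up to BY14026, youngest first."""
    path = [snp]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def common_branch(a: str, b: str) -> Optional[str]:
    """Youngest branch on both lineages (the shared ancestor's SNP)."""
    above_a = set(lineage(a))
    for snp in lineage(b):
        if snp in above_a:
            return snp
    return None


branch = common_branch("Y128714", "BY111953")
print(branch, AGE_YBP.get(branch))           # Y125026 1500
print(common_branch("Y128714", "FT52991"))   # BY25359
print(common_branch("FT52991", "BY25362"))   # BY14026
```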

Fig. 3: Map of the Y-Haplogroup I-BY14026 (mapswire.com)

BY161442

The SNP Y32046 is not stable in NGS tests and was removed from the YFull tree. Nevertheless, I would like to mention it and use it to roughly divide the groups. Besides BY25359, there is a small group with positive results for this SNP. Testers from a surname project with members from Yorkshire belong to BY161442. There is also a man from Sicily, Italy, who, judging by his Y111 values, shares a common ancestor with the men from Yorkshire. The exact relationship between this group and the Italian would be interesting to know.

BY14026* and BY14026 predicted with STR markers

Y125026 is not the only subgroup of BY14026 in the Carpathian Basin. A Hungarian man from Vojvodina, Serbia (G01), is the only NGS tester in the subgroup BY14026*, meaning he does not belong to any of the other known subgroups of BY14026. Whether his ancestor came south together with Y125026 or independently of it is pure speculation. Interestingly, he, like the testers of BY25359 and BY161442 (NGS at YFull), has a modified value for the STR marker DYR8.

The testers H01 to H03 were only predicted to be BY14026 from their STR markers. Whether they belong to another, previously unknown subgroup or to one of the known subgroups cannot be determined from their STR markers. To find out for certain, these testers would have to upgrade to the Big Y.

BY14044, pre-BY14044 and subgroups.

BY14044 is the second major subgroup of BY14026, roughly divided into BY14044 and pre-BY14044.

The verified part is BY14044, which consists of two sub-branches: BY14044*, with one tester from Denmark, and BY25362, with a whole series of testers from England. Some Y-STR testers can be placed in this group very well. BY25368 is formed by two Big Y testers from the Firth surname project. According to “Descendants of haplogroup IJ-M429” [9], the actor Colin Firth belongs to this haplogroup.

I have called the unverified part pre-BY14044 because its members have the STR values typical of BY14044 (DYS607=13, and DYS617=14, which is only available in Y67 and Y111 tests). Some members with these values had not given a country of origin for their ancestor, so they could not be considered in this article. The most frequently given places are Prussia and Scotland, but there is even a tester from Spain in this group. I recommend that these testers upgrade to the Big Y 700, so that we can identify sub-branches for Prussia and Scotland and get a better picture of how BY14026 came to Scotland and Spain.
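Checking an STR result against these typical values can be sketched as follows. This is a minimal illustration, assuming the two values named above (DYS607=13, DYS617=14) as the signature; since DYS617 is only available in Y67 and Y111 tests, a Y37 result gets no prediction at all. The function name and the non-matching example are hypothetical, and a match is only a prediction that a Big Y test would have to confirm.

```python
from typing import Optional

# Minimal sketch: check an STR result against the pre-BY14044 values
# (DYS607=13, DYS617=14). A result without DYS617 (e.g. Y37) gets no
# prediction. A match is only a prediction, not a verified placement.

PRE_BY14044 = {"DYS607": 13, "DYS617": 14}


def matches_signature(
    markers: dict[str, int], signature: dict[str, int] = PRE_BY14044
) -> Optional[bool]:
    """True/False if every signature marker was tested, otherwise None."""
    if any(marker not in markers for marker in signature):
        return None
    return all(markers[marker] == value for marker, value in signature.items())


print(matches_signature({"DYS607": 13, "DYS617": 14}))  # True
print(matches_signature({"DYS607": 13}))                # None (e.g. Y37 only)
print(matches_signature({"DYS607": 15, "DYS617": 14}))  # False (hypothetical)
```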

Example, Y125026 and the Heruls

In the last three years, a lot has happened in the Y-Haplogroup BY14026. Several NGS testers have been added, and the picture of how BY14026 evolved and how Y125026 migrated to southeastern Europe keeps getting clearer. One of the first hypotheses about how I-L38 came to Greece involved the Varangian Guard. “The Varangian Guard (Greek: Τάγμα τῶν Βαραγγῶν, Tágma tōn Varángōn) was an elite unit of the Byzantine Army from the tenth to the fourteenth century. The members served as personal bodyguards to the Byzantine Emperors. The Varangian Guard was known for being primarily composed of recruits from northern Europe, including Norsemen from Scandinavia and Anglo-Saxons from England.” [10]

Later, as the Carpathian Basin came into focus, the Gepids became my favorite explanation for the expansion of Y125026. They were “an East Germanic tribe who lived in the area of modern Romania, Hungary and Serbia, roughly between the Tisza, Sava and Carpathian mountains. They were closely related to, or a subdivision of, the Goths.” [11]

With the appearance of the tester from Sweden (D03), the question arose whether Y125026 DYS448=22 originated in Scandinavia or in the Carpathian Basin. In the second case, the ancestor of D03 must have “gone back” to Scandinavia. The Heruls could explain this: “an early Germanic people. Possibly originating in Scandinavia, the Heruli are first mentioned by Roman authors as one of several ‘Scythian’ groups raiding Roman provinces in the Balkans and Aegean, attacking by land, and notably also by sea.” [12] However, I am not yet convinced of this.

Fig. 4: The Heruls in Scandinavia, by Troels Brandt

The map (Fig. 4) by Troels Brandt, The Heruls in Scandinavia [13], served as the basis for the animation of the migration of the Heruls. Migration paths, trade routes and dates were taken from this map. Other movements are only hinted at, based on the information from the Y-trees.

Animation: Migration of the Y-Haplogroup I-BY14026 – Version “Heruls”

Unfortunately, there is not yet enough information about the BY14044 and pre-BY14044 subclades. There are only four NGS tests here, and only one of them was submitted to YFull, so no age estimates are available. BY161442 is also somewhat weakly covered, with only two NGS tests. For this reason, the migrations of these groups were not included in the animation.

Conclusion:

When I did my first Y-DNA test in 2017 (I started with a Y37 at FTDNA), the result confused me completely, and at first I assumed the samples had been mixed up. I soon upgraded to a Big Y, hoping that the Y-trees would grow quickly and it would become clearer how I-L38 came to Greece. Whether Varangians, Gepids or Heruls were involved in the migration, or it happened in a completely different way, will occupy me for a while, and that is how it should be, because I enjoy this topic. With every additional tester, the picture becomes clearer and earlier hypotheses can be thrown overboard.

This is true not only for my sub-branch, but for the whole haplogroup I-L38 and for one or another neighboring branch that is just waiting to be discovered. Y-DNA genealogy is a hobby that only works if there are enough motivated “players” on the field who actively take part in the game. Not every player will be fully committed, take an NGS test and submit it to all Y-trees. It is also possible to take part with a small effort and support the game.
For this reason I would like to ask you for the following:

  • If you have already done a test with FamilyTreeDNA, please,
    • Go to “Account Settings > Genealogy” and enter the “Earliest Known Ancestors” under “Country of Origin” and “Location”, if known.
  • Regardless of whether you have done a test at FTDNA, please
    • Donate a small amount to the Haplogroup I-L38 Project at FTDNA. This will finance tests that are important for research on the I-L38 haplogroup. Thus, very rare sub-branches can be found and positioned on the tree, even if the tester cannot afford further upgrades.

Sources:

[1] ISOGG Y-DNA Haplogroup Tree 2019-2020: https://isogg.org/tree/
[2] Wikipedia – Slavs: https://en.wikipedia.org/wiki/Slavs
[3] YFull’s age estimation methodology: https://www.yfull.com/faq/what-yfulls-age-estimation-methodology/
[4] Ancestral Modal Windows of I-L38, by Hans de Beule: https://sites.google.com/site/haplogroupil38/-2019-ancestral-modal-windows-of-i-l38
[5] Haplogroup I-L38 Project at Family Tree DNA: https://www.familytreedna.com/groups/i-2b-2
[6] YFull tree: https://www.yfull.com/tree/I-S2555/
[7] Serbian DNA Project: https://dnk.poreklo.rs/DNK-projekat/
[8] Wikipedia – Almopia: https://en.wikipedia.org/wiki/Almopia
[9] Descendants of haplogroup IJ-M429: https://haplogroupijm429.wordpress.com/2020/05/22/i2-colin-firth/
[10] Wikipedia – Varangian Guard: https://en.wikipedia.org/wiki/Varangian_Guard
[11] Wikipedia – Gepids: https://en.wikipedia.org/wiki/Gepids
[12] Wikipedia – Heruli: https://en.wikipedia.org/wiki/Heruli
[13] The Heruls in Scandinavia, by Troels Brandt: http://www.gedevasen.dk/heruler.html
