With about 3.2 billion speakers, the Indo-European language family is the most widely spoken in the world. Proto-Indo-European is believed to have been spoken on the Pontic-Caspian steppe approximately five thousand years ago, largely by nomadic pastoralists (often linked to the Yamnaya culture), before its speakers dispersed across Europe and western and southern Asia. The modern family is conventionally divided into eight primary branches; with Indo-Iranian split into its Indic, Iranic and Nuristani sub-branches and Balto-Slavic into Baltic and Slavic, they are, roughly from east to west:
- Indic languages
- Iranic languages
- Nuristani languages
- Armenian: a branch centred on the South Caucasus, consisting of a single pluricentric language
- Baltic languages
- Slavic languages
- Hellenic languages: spoken primarily (as Greek) in Greece and Cyprus, but also by minorities in Turkey and Italy (as well as by a diaspora around the world)
- Albanian: a Balkan branch with a somewhat mysterious history, consisting of a single language
- Germanic languages: spoken primarily in northwestern Europe and in South Africa; English, the most widespread member, is also spoken widely in North America, Oceania, and parts of Asia and Africa
- Italic languages: the only surviving sub-branch is Romance, spoken by hundreds of millions, primarily in Europe and the Americas
- Celtic languages
There are also two known branches which are now extinct:
- Anatolian, thought to have been the first Indo-European branch to break away from the rest. It was spoken in what’s now Turkey and is attested from the 17th century BCE, but became extinct during the Hellenisation of Anatolia.
- Tocharian, which was spoken in what’s now western China and is attested between the 5th and 8th centuries CE. It is thought to have been supplanted by Uyghur.
On top of this, there are other ancient languages whose attestation is sufficient to show that they were Indo-European, but too sparse for them to be analysed or classified much more deeply than that. Many of these are Paleo-Balkan languages, with some others attested from the Italian and Iberian peninsulas.
One isogloss that divides the Indo-European languages from a fairly early stage is the centum-satem split, named after the words for “hundred” in Latin (centum) and Avestan (satem). Originally posited as a west-east split, it has been recast by later scholarship (in particular after the discovery of the Tocharian branch, a centum language in the far east) as a centre-periphery split. The branches which remained close to the centre of the Indo-European sphere (Indo-Iranian, Armenian, Balto-Slavic and Albanian) became satem languages, while the languages of the periphery (Celtic, Italic, Germanic, Hellenic and Tocharian) were centum. Anatolian is thought to have branched off from the rest of PIE before the centum-satem split occurred.