An Introduction to Pidgins and Creoles

Sabrina Lowney

In the Storycraft Character and Scene class, there were some questions about pidgins and creoles, so I thought I’d write an intro to the subject here, for the class and for anyone else who may be interested.

Pidgins and creoles are two groups of what are called “contact languages.” Most contact languages spoken today are creoles, and there’s a lot more research on creoles than on pidgins, for reasons that will become clear when I talk about function. There are a few things I want to cover with regard to contact languages: myths, distinctions, form, and origin. (Scroll down to the bottom if you just want the shorty-short tl;dr version.)

But first, some resources and terminology.



1. WALS, or the World Atlas of Language Structures (https://wals.info), is a free online database of various structural properties (aka stuff like grammar, sound system, vocabulary) of the world’s languages.

They have fun article-like pages, like Zero Copula for Predicate Nominals (https://wals.info/chapter/120), which talks about whether a language does or doesn’t require the equivalent of the ‘to be’ verb in statements like “the dog is happy.” And this feature, like a gazillion others, can be plotted on a map (https://wals.info/feature/120A#2/14.1/146.0), showing the world distribution of languages that do and don’t allow you to drop the ‘to be’ verb.

You can also map different language families. Here (https://wals.info/languoid/genus/creolesandpidgins#2/16.5/146.0) is a map of some of the pidgins and creoles in the world (that are known and at least partially documented). There aren’t a lot, because WALS mainly deals with non-pidgin/creole languages, but there are some.


2. Even better than WALS (for contact languages, at least) is APiCS, the Atlas of Pidgin and Creole Language Structures (https://apics-online.info). The website’s look is very similar to WALS, but it focuses on pidgins and creoles and has a lot more information on them.

Here’s the map (https://apics-online.info/contributions#2/30.3/10.0), color-coded by lexifier.

If you click on “Surveys,” you can get an intro to all of the languages represented on the site. Here’s the page for Bislama, a creole from Vanuatu (southeast of Papua New Guinea) (https://apics-online.info/surveys/23). It gives background on the language’s origin, as well as info about its sound system and grammatical structure.

Unlike WALS, they also have lots of example sentences (https://apics-online.info/sentences).


3. For class I used Viveka Velupillai’s “Pidgins, Creoles and Mixed Languages,” which I really like, but I only have the physical copy and it costs about $50, even as an ebook. However, I do have a PDF copy of “Pidgins and Creoles: An Introduction,” edited by Jacques Arends and co.; here it is: PIDGINS AND CREOLES_ An introduction - Edited by Jacques Arends.pdf.



For simplicity’s sake, let’s pretend all language contact situations are composed of two languages. Based on this, language contact situations come in two flavors: those where both languages have equal prestige and those where one language has prestige and the other does not.

If both languages have prestige, they are called “adstrates”: two or more languages in a contact situation with roughly equal prestige.

If one language has prestige and the other doesn’t, the former is the “superstrate” and the latter is the “substrate.” Generally, a superstrate provides the new contact language with its lexicon, while the substrate provides more of the grammatical infrastructure.

The language in a contact situation that provides the lexicon to the new contact language is called the “lexifier.” Thus, the superstrate and the lexifier are generally the same thing, but even a contact situation composed of adstrates will have a lexifier.

Creoles generally exist on a “creole continuum” of “’lects,” which are sort of like dialects, except that they’re spoken not by different people in different geographical locations but by the same people in different social situations.

-> an “acrolect” is the form of the language that most closely resembles the lexifier/superstrate. So for Jamaican Creole, the acrolect is the form of the creole that is closest to English. Acrolects are spoken in more formal situations.

-> a “basilect” is the form of the language that either most closely resembles the substrate or is simply the most divergent from the superstrate. It’s the most “creole-y” form of the language. Basilects are spoken in more informal situations.

-> in between the acrolect and the basilect there is generally a range of “mesolects”; they are what make the continuum a continuum and not a see-saw.

These continuums are pretty stable, in that creoles do not change over time to be more acrolectal or basilectal.

The last term is “jargon.” In linguistics, jargons are ad hoc solutions to situations of mutual unintelligibility (where the individuals involved speak languages different enough that they can’t understand each other). They are highly variable and unstable, and often involve a lot of gestures and other non-verbal communication. They aren’t real languages.



There’s a lot of misinformation out there in the common understanding. For one thing, “pidgin” and “creole” tend to be used interchangeably; they really only have distinct meanings within the specialized language of linguistics.

More pernicious are the myths that pidgins and creoles are “not real languages”: that they are “broken” versions of “real” languages, that they are spoken by people who are illiterate or uneducated, that they are just a dialect or accent of their lexifier language, that they lack grammar, and so on.

Obviously, none of this is true. Contact languages are real languages, they are distinct languages, they have their own grammar and vocabulary, and they are spoken by just as wide a range of people, in terms of literacy and education, as any other language.

Another assumption is that contact languages are restricted in what they can say, that they’re inadequate for the full range of expression that English or Persian or any other language is capable of. With pidgins, this is generally true. Not so with creoles.



There are actually three kinds of contact languages (that are actual languages and not ad hoc jargons): pidgins, creoles, and mixed languages. Hence the title of my textbook, “Pidgins, Creoles and Mixed Languages.”

(Mixed languages are quite a bit rarer and quite a bit weirder, so I’m not really going to talk about them. But, in short, you know how lazy language inventors in books might just slap nonsense words onto what is grammatically English? Like, you take an English sentence, go through it, and swap out each word for something made up? Well, mixed languages are like that, except they take the grammar of one language and swap in the vocabulary of another. Or, even weirder, they take the verbs from one language and the nouns from another and use them together like a single language.)

Pidgins and creoles are more common, and they’re really what I’m talking about here. Contact languages, unlike other languages, have more than one ancestor and a fairly abrupt beginning (on the linguistic timeline, anyway).

The difference between them is that pidgins have no native speakers and are typically restricted in their use and vocabulary to specific social domains. This is why the “inadequate for full expression” thing is sort of true for pidgins.

Creoles, on the other hand, have native speakers; they are pidgins that have been acquired by children as a mother tongue, a process called nativization. During this process, the grammar and vocabulary expand, and the language loses its domain restrictions and can be used in any social setting. At this point they are full languages, and, as Velupillai says in my textbook, “whatever can be talked about, thought about, and conducted in, for example, Japanese, Italian, Russian or any other language of the world spoken as a mother tongue by an entire community, can also be talked about, thought about and conducted in any given creole language.”

The cut-off between a pidgin and a creole is not very well defined. They exist on a sort of continuum of their own, like a creole continuum. On one side you have a very rudimentary pidgin, a “stabilized pidgin,” which is like a jargon that’s been standardized enough to have its own structure that newcomers must learn (whereas in a true jargon it’s more ‘make it up as you go’), but which is still domain-restricted. On the other side you have a creole proper.

In between are all the various levels of “expanded pidgins” you might have, with their more expanded grammars and potentially unlimited speech domains, but still no native speakers. Lingua francas tend to fall into this intermediate category.


To sum all this up:

Pidgins:
  • no native speakers.
  • generally have simpler grammars and smaller vocabularies than creoles, but do have internal structure and rules that must be learned, just like any other language.
  • start out domain-restricted, but may become expanded pidgins capable of being used in any speech domain.

Creoles:
  • native speakers.
  • generally have more complex grammars and larger vocabularies than pidgins.
  • no domain restrictions.



This is pretty interconnected with the above section, since I had to talk a little bit about the features to explain what makes pidgins and creoles different, but anyway.

There are a small number of linguistic features that are associated with pidgins and/or creoles. This isn’t to say they only exist in these languages, but rather that these languages tend to have these features.



Pidgins tend to be analytical in terms of how they form words (their morphology).

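Basically, European languages tend to form words by taking the word root and adding on little bits (affixes), without which the word would either not be a real word or simply mean something else. For example, in English, “walked” is the root “walk” plus the affix “-ed,” which marks past tense. In Spanish, “quiero” (“I want”) is “quier-” plus “-o,” but unlike “walk,” “quier-” is not a word on its own, and the single ending “-o” marks first person, singular, and present tense all at once. That bundling of several grammatical meanings into one affix is what makes languages like Spanish fusional, and European languages generally lean this way.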

Mandarin Chinese, if you know anything about it, is analytical. The English future tense, “I will walk,” is an analytical construction: the future is marked by a separate word (“will”) rather than by an affix. An analytical language uses this strategy, with different grammatical bits represented by different words, much more ubiquitously.

Take the various ‘Eskimo pidgins,’ aka Inuit and Yup’ik (and probably Aleut) trade languages, that sprang up due to the fishing and whaling industries. The Eskimo-Aleut language family is known for having very complex words that make German compounds look like baby talk. For instance, Central Alaskan Yup’ik is the proud owner of “angyaliciqsugnarquqllu,” which means “also, he will probably make a boat.” However, in Eskimo Pidgin, “kapi suli picuktu awoña” means “I want some more coffee.” Much more analytical.

Related to this, pidgins tend to use bare stems for all words. They have few or no derivational processes (like how “subordinate” can turn into “subordination” and then into “insubordination”).

Pidgins also have limited vocabularies and lack synonyms.

In terms of their phonologies (their sound systems), pidgins are influenced largely by the substrate language.



Creoles also tend towards being analytical, but they also develop derivational processes.

A lot has been written about verbal systems in creoles. In linguistics, there are grammatical categories called tense, aspect, and mood (TAM) that are typically marked on verbs or on particles adjacent to verbs. It’s considered something of a creole universal that creoles have up to three preverbal particles marking tense, mood, and aspect, in that order. This sets them apart from pidgins, which generally lack TAM marking altogether. This ‘universal’ has been challenged, but it does still describe the TAM marking of most creoles.

Creoles have vocabularies comparable to those of non-contact languages, so they can say anything they like (they aren’t domain-restricted) and have synonyms.

In terms of phonology, creoles are much more influenced by their superstrate language, the opposite of what happens with pidgins.



In short, extended, intense or frequent contact between speakers of mutually unintelligible languages who need to communicate is what gives rise to a contact language. If you only occasionally need to communicate, you can jargon your way through the situation when it arises. The contact has to be extended for a pidgin to become useful, and it has to remain useful in order to gain complexity and become nativized (and thus become a creole).

A multiethnic situation is important to contact language formation because a multiethnic group contains the necessary ingredients (multiple, mutually unintelligible native languages and extended contact provoking a need for reliable communication) for contact language formation.

An interesting example from Hawai’i: “[O]n the islands of Hawai’i intense contact [existed] initially due to the Pacific trade and the whaling industry, but very soon also due to missionary work and the agricultural industry on the islands, led to a Hawaiian-lexified pidgin (Pidgin Hawaiian). The contact situation remained and intensified, leading to import of labourers. At the same time English-language schools were set up on the islands. With this development English gained in prestige and a new, English-lexified, pidgin emerged (Hawai’i Pidgin English) and replaced the first pidgin that had been used on the islands” (Velupillai).



There are trade and maritime pidgins, workforce pidgins, military pidgins, and urban pidgins. They have their origins in trade (often along coasts and ports, hence all the maritime pidgins, but also in other non-maritime trade environments), workforces such as domestic workers and plantations, military camps and ventures, and urban areas.

My guess is that trade and maritime situations gave rise to most pidgins historically, but most contact languages still spoken today (the majority of which are creoles rather than pidgins) have their origins in workforce pidgins, which are, perhaps without exception, products of migration and colonization.

Here are some examples from my textbook:

  • Lingua Franca (also called Sabir) was a maritime pidgin spoken along the Mediterranean coast roughly from the time of the Crusades to the 19th century (Velupillai).
  • Chinese Pidgin Russian was a trade pidgin spoken from the 18th to 20th centuries and was used along the Sino-Russian border (Velupillai).
  • Butler English is a domestic workforce pidgin in India which developed after Britain colonized the subcontinent and was used between local staff and their colonial superiors.
  • Plantation pidgins are probably the most common kind of workforce pidgin, and the origin of most of the world’s creoles. Tok Pisin, which has become a creole, was first used on German-controlled New Guinea plantations among the workers there.
  • Shaba Swahili, which developed in the mid-1900s, was a mine pidgin used in the Belgian copper mines in the south-east of the Democratic Republic of the Congo. As of the late 1990s, it had become an expanded pidgin.
  • Juba Arabic is a military pidgin from South Sudan that developed in Egyptian military camps that had recruited soldiers from a variety of ethnic Sudanese backgrounds. Like Tok Pisin, it has acquired native speakers.
  • Hawai’i Pidgin English was developed by non-native English speakers in primarily English-speaking urban centers in Hawai’i. It eventually became the main pidgin of the plantations, but it originated in the urban environment.

Here's a (non-exhaustive) map of pidgins scanned from Velupillai's book:




There are two broad categories of creole: exogenous and endogenous.

Exogenous creoles develop in a contact situation where none of the input languages are native to the area where the creole develops. Plantation creoles, like those that developed among African slaves taken to European plantations in the Caribbean, are like this. In addition to plantation creoles, maroon creoles are also exogenous. A maroon creole is a creole spoken by ex-slaves who successfully escaped plantations and formed settlements, usually in locations that were difficult to access and thus somewhat protected. Saramaccan (English- and Portuguese-lexified) is a maroon creole from Suriname, and Palenquero is a Spanish-lexified maroon creole from northern Colombia.

Endogenous creoles develop in a contact situation where one or more of the input languages were native to the area where the creole develops, generally the substrate languages. Fort creoles, such as those that developed along the Gold Coast in Africa, are an example; in West Africa they’re usually Portuguese-lexified, like Guinea-Bissau Kriyol.

And here's a map of creoles, also non-exhaustive and also scanned from Velupillai:




The short version is, pidgins and creoles are real languages with their own grammatical rules and processes. They emerge out of intense extended or repeated contact situations between speakers of different languages. They typically ‘sound like’ their lexifier, the language from which they derive the bulk of their vocabulary, and are often stigmatized as being a “broken” version of that lexifier.

Pidgins have no native speakers and small vocabularies that are typically constrained to a few semantic fields/speech domains. Creoles have native speakers and large vocabularies, and are capable of expressing anything a speaker of, say, English or French can express.

Because of the stigmatization, there isn’t a lot of research on contact languages as compared to other languages, and it can be difficult to get speakers to speak openly in their pidgin/creole.

Thanks so much for this! I've had some small scenes playing with languages in my stuff, but this gives me such a better frame of reference (it was regarding a creole language) and some actual language (lol) to think about. I'm super grateful you took the time to write this all up. And with references 😍

