Warning: this post is VERY long so I've condensed it into a series of spoilers. There's a lot of history in here so if you want to just get to the debate without taking into account ancient linguistic theory feel free to skip to sections relevant to your interests.
Basic explanation if you're not aware of how a positional number system works:
For those unfamiliar with positional numeral systems: a base, also called a radix, is the quantity upon which a positional number system is built. The vast majority of the human population uses a base expressed by the number of dots in the following parentheses (:::::). We call it "ten" in English, but using such a word to describe it may be less than useful, as I will describe later. In our base, we count 0123456789 and then create a new digit in the "tens" place and roll the units back down to 0, creating "10". Other bases have a different number of units. In base (::::), which we call "base 8" or "octal", only 01234567 exist, and instead of rolling over after 9, a digit rolls over after 7, so 10 comes after 7, 20 comes after 17, and 100 comes after 77. There are also bases with a larger number of units, such as base (:::: ::::) with 0123456789ABCDEF, where 10 comes after F, 20 comes after 1F, and 100 comes after FF.
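If it helps to see the rollover mechanically, here is a minimal Python sketch of writing a quantity in an arbitrary base. The function name `to_base` and the 0-9 then A-Z digit alphabet are just my choices for illustration.

```python
# Digit alphabet: 0-9, then A-Z, which covers bases 2 through 36.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base(n, base):
    """Write the non-negative integer n in positional notation in `base`."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)  # peel off the lowest digit
        out.append(DIGITS[remainder])
    return "".join(reversed(out))

# Count past the rollover point in octal and in hex.
print([to_base(n, 8) for n in range(6, 10)])     # the octal digits around 7 -> 10
print([to_base(n, 16) for n in range(14, 18)])   # the hex digits around F -> 10
```

Repeatedly dividing by the base and collecting remainders is the usual way to do this; the remainders come out lowest digit first, which is why the list is reversed at the end.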
Prehistory of the development of base ::::: and why we call it "ten":
While most people accept that we count in base ::::: because we have that many fingers, I have never heard an argument for nor been able to find literature explaining how one arises from the other. However, I do have a hypothesis:
Counting in quantities of ::::: almost certainly arose before positional number systems, as fingers were the most convenient way to represent numbers before written systems came about. Since we have ::::: fingers, and early humans usually didn't need to count very far above this, most languages assigned a name to each possible quantity of fingers. In other words, they came up with ::::: terms for quantities (though at this stage most considered ( ) dots not a quantity but merely the absence of quantity). There is evidence for the influence of such a counting system on the names for numbers: Proto-Indo-European (from which all major modern European languages except Finnish, Estonian, and Hungarian are descended; its descendants also include Hittite, Latin, Sanskrit, and Hindi) reconstructions indicate that the name for "five" was probably the word for "fist" and the name for "ten" was probably the two words for "two hands".
Presumably, early humans interpreted larger numbers than these not as the sum of fingers on one person but as the sum of pairs of hands, and this is why there are no units after (:::::). This means that the number of fingers contained on two full hands is both a unit AND a radix: it can be expressed with either one or two digits, and this is probably why the need for a zero was ignored for so long, as what we think of as a multiple of "10" was expressed as a multiple of the units digit "ten". This created a mixed base10-base1 system where both the units and the digit after it go up to "both hands" and the only way to express larger numbers is by reduplicating hands. I.e. any digits to the left of the "hands" digit are in unary, and what we think of as 200 would probably be expressed in this ancient language as "hundred-hundred". 20 would have originally been literally "two tens" rather than two times ten. Etymology for the number 20 in English agrees with this analysis of twenty as "two-tens". Numbers from 11 to 99 would have been expressed as the sum of full pairs of hands (tens) plus non-full pairs of hands (units).
Numbers on the scale of 100 typically aren't needed for specification by early stone age societies (who might have used a term for a large number akin to "zillion" but with a meaning corresponding to what we think of as hundreds or thousands). In fact, the word that came to mean "100" may have originally been this "zillion"-like term before becoming fixed as "ten tens". There is some practical indication of this: although we don't know the original usage of the PIE term (kmtom) or even if it was used as a definite number, its daughter language Proto-Germanic began calling ten tens "kmtom count" in their own language, indicating that the original kmtom was not a definite count to begin with. Etymologically, the PIE word for "100" appears to be some sort of conjugation of "ten", indicating it might have literally meant something similar to "many tens".
My hunts for the etymology of "thousand" were inconclusive: there are reconstructed Proto-Germanic words for "thousand" but their origins are unknown, and the complete incoherence between the oldest attested forms of "thousand" in various languages younger than PIE indicates there likely was no word for "thousand" in that language. This makes more sense if we assume we were right about PIE having no definite word for hundred and instead having a general "large number" term which became a definite term only in its daughter languages. Thousand need not derive from terms for smaller numbers, as the oldest attested ancestor of the Latin word for thousand appears to have been an abstraction derived from the grinding of grain.
The earlier nonpositional systems:
Latin: the long story with the history of each symbol:
Earlier languages tended to use a combination of base-10 and other bases. As described in the prehistory section, the earliest of these was probably a mixed unary-base ::., where counting started with each number below a fist (of ::. fingers) being represented by that actual number rather than an abstract digit, and a second "digit" indicating whether there were one of such hands or two. Latin's smaller numbers exhibit this system: the numbers before a full hand are given in unary (|, ||, |||, and |||| in the initial stages of its development). These strokes originated as tally marks and may either be simplifications to ease notching in rock (straight lines only), representations of the shapes of fingers, or both. ::. was represented with two crossed notches in a ^ shape, probably originally representing a hand, and x was two such double notches, probably originating as a simplified drawing of two hands together. In the original system, the entire system was unary and the extra notches on the ^ and x served merely as placeholders. For example, ::. was marked ||||^. However, since ^ is never marked before notching ||||, ^ eventually came to imply all marks before it, and ::. became just ^. The same happened with the rather copious ||||^||||x becoming just x. In a stroke of genius, the writers of the tally systems apparently realized it would be quicker to write |||| as |^, expressing it as the | before ^, and so |||| became |^ and ^|||| became |x. Around this time, the tallies began to be used in the same texts as literary writings, and were quickly either confused or identified with the similar I, V, and X, and were replaced by them. (In the same way, a confused stone-age Arabic-numeral society adopting the Latin alphabet might have started to use the numerals "IZ34S67B9".) The tenth V in the counting system received an extra stroke and became ᗐ.
Attempts to make this more quickly identifiable transformed it into ⊥, and then the "we want it to look like our letters" impulse occurred again and it became L. The tenth X also received an extra stroke to become various symbols with the extra stroke in different places (but all vertical), but Ж was most popular. Yet another "we want it to be made of letters" occurred and it became ƆIC, which was abbreviated variously to Ɔ or C, with C of course winning because that was a Latin letter. The hundredth V was circled Ⓥ but became Ð when people attempted to make it faster to write and from there, if you had been paying attention to what happened to the previous letters, obviously became D. Similarly, the hundredth X Ⓧ became ∞ to ease writing, then ⋈ to resemble a letter, and then M. Larger numbers didn't become necessary until after the fall of the western Roman Empire, and so are less important here.
Latin: the short story:
Latin started out counting in tallies but put extra marks on multiples of five and multiples of multiples of five, which eventually evolved into only the multiples and individual tallies being used for numbers. The tallies were corrupted after the people insisted on writing them using only letters of their alphabet and the system gradually evolved from unary to a mixed base2-base5 system with characters for 1, 5, 2 5s, 50, 2 50s etc.
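To make the end state of that evolution concrete, here is a small Python sketch of the mixed base2-base5 symbol set (I, V, X, L, C, D, M), with and without the "| before ^" subtractive shortcut. The function name `to_roman` and the `subtractive` flag are mine, and the cutoff at 3999 is only the usual textbook convention, not something from the history above.

```python
# Symbols alternate between "five of the previous" and "two of the previous".
ADDITIVE = [(1000, "M"), (500, "D"), (100, "C"), (50, "L"),
            (10, "X"), (5, "V"), (1, "I")]

# The same table with the subtractive pairs (one-before-five, one-before-ten) added.
SUBTRACTIVE = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
               (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
               (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n, subtractive=True):
    """Write n (1..3999) in Roman numerals, greedily taking the largest symbol."""
    if not 0 < n < 4000:
        raise ValueError("n must be between 1 and 3999")
    table = SUBTRACTIVE if subtractive else ADDITIVE
    parts = []
    for value, symbol in table:
        count, n = divmod(n, value)   # how many of this symbol fit
        parts.append(symbol * count)
    return "".join(parts)

print(to_roman(4, subtractive=False), to_roman(4))     # tally-like form vs shortcut
print(to_roman(2013))
```

The additive table is the closer match to the older tally-style usage; the subtractive one is what most people are taught today.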
Arabic numerals: origins:
This system can be traced back to the Brahmi numerals. Unlike Roman Numerals, only the digits -, =, and ≡ are easily recognizable in Brahmi, and +, which comes after ≡, has only a small number of possible origins. However, due to the cursive script of the language it is nearly impossible to determine anything of the origins of the other symbols without extensive comparisons between languages and large numbers of original texts which are not known to still exist, as several strokes often became a single cursive stroke. This number system may have started as a tally system, but by the time of our first records of it it had already become a hybrid exponential base-10-100 script. That is, it had 20 numerals, the first equivalent to 123456789 and the second equivalent to 10 20 30 40 50 60 70 80 90. A number like 78 for example, is written with the unit "70" and then the unit "8". For powers larger than these, each power of 10 had a separate symbol. For example, 4348 is written with the units "4" "1000" "3" "100" "40" "8". The first nine of these symbols were later adopted into a then-new positional system which added a zero and a negative sign (which was originally +, funnily enough) as well as a decimal point, creating a positional system identical to the modern one except for the use of different symbols and the lack of a bar to indicate repeating numbers. This bar was introduced many centuries later by Muslim scholars. The symbols used for the units evolved over time, shown in the preceding image from the second-to-bottom row to the top. When introduced to Europe in the 12th century, Arabic numerals were adopted in a text-like vertical positioning system that gave the numbers "tails" and "necks" like modern English's g's and h's respectively. Centuries later, as typesetting gained speed, attempts to create a uniform system of numerals with all the numerals at the same height and depth became popular and largely replaced these old "tailed and necked" numbers. 
Originally, Arabic numerals were restricted to the intellectual elite, who learned them alongside Roman numerals and were able to convert between the two, but as more difficult mathematics involving multiplication and division became common in European life, the Arabic numerals spread to different classes, making Roman numerals obsolete for regular mathematics by the 16th century.
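Going back to the older Brahmi-style notation for a moment, the way a number like 4348 breaks into units can be sketched in a few lines of Python. This only models the structure described above (units, tens, and a multiplier followed by a power symbol for hundreds and thousands); the function name is hypothetical, and writing an explicit "1" before "100" or "1000" is my assumption, since I haven't described how a single hundred or thousand was written.

```python
def brahmi_style_units(n):
    """Split n (1..9999) into the sequence of units it would be written with.

    Each returned string stands for one symbol: a digit 1-9, a ten 10-90,
    or one of the power symbols "100" and "1000".
    """
    if not 0 < n < 10000:
        raise ValueError("this sketch only handles 1..9999")
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, ones = divmod(rest, 10)

    units = []
    if thousands:
        units += [str(thousands), "1000"]   # multiplier, then power symbol
    if hundreds:
        units += [str(hundreds), "100"]
    if tens:
        units.append(str(tens * 10))        # tens have their own symbols
    if ones:
        units.append(str(ones))
    return units

print(brahmi_style_units(4348))
print(brahmi_style_units(78))
```

Note that no zero is needed anywhere: an empty place is simply not written, which is exactly what the later positional system could not get away with.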
The language of numbers and its relevance to the base-debate:
English is currently mostly based in base ::::: (ignoring words like dozen, gross, great gross, and score). It could in its current form be used to express numbers in lower bases by simply skipping certain words, like skipping from eight to ten and from eighteen to twenty if using base (::::.). However, the fact that it has been based in base ::::: for so long means that there are now ambiguities as to the relation between the word, its meaning as a quantity, and its meaning as a representation. For example, I've heard arguments that "10" in hex is not "ten" but "sixteen", and this seems to be the most popular opinion because people seem to insist that a representation is only meaningful in base 10. However, communicating what is written in another base is more efficient if you speak in the same base as you write, as this eliminates the need for the speaker to translate from the "written base" to the "spoken base". Under such a system, the number after 1F in hex is called "twenty", for example. Although the English words are ultimately based etymologically in base :::::, most people are unaware of these histories and are therefore unlikely to be bothered that "ten" in hex does not match its origin as a butchered PIE "two hands". People do, however, still associate the name with a fundamental quantity in unary: they think of "ten" as not 10 but :::::, an association that makes it difficult to adopt a language in a different base using the same number words. Even so, the simplest language for communicating in a base above ::::: would recycle all existing words so that only a small number of new words need to be learned. For example, in a base 12 system with the new units "ah" and "be", pronunciation after nine would go "ah, be, ten, eleven...nineteen, ahteen, beteen, twenty...ninety-be, ahty...bety...bety-be, one-hundred".
In this case, "ah" is used instead of the English pronunciation of "A" because "1A" and "18" are homophones or near-homophones under the latter pronunciation. Also "4D" sounds like "40" in systems containing the unit D.
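Here is a rough Python sketch of that base 12 naming scheme, just to show that the recycled words plus "ah" and "be" cover everything up to "one-hundred". The function name and the exact spellings of the in-between words (e.g. "twenty-ah") are my own extrapolation from the sequence above.

```python
UNITS = ["zero", "one", "two", "three", "four", "five", "six",
         "seven", "eight", "nine", "ah", "be"]
# Names for written 10..1B, i.e. one dozen plus 0..11.
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen", "ahteen", "beteen"]
# Names for written 20, 30, ..., B0 (index = leading digit).
TENS = [None, None, "twenty", "thirty", "forty", "fifty", "sixty",
        "seventy", "eighty", "ninety", "ahty", "bety"]

def dozenal_name(quantity):
    """Name a quantity (0 up to a gross) by reading its base 12 digits aloud."""
    if not 0 <= quantity <= 144:
        raise ValueError("this sketch only covers zero through one gross")
    if quantity == 144:
        return "one-hundred"              # written 100 in base 12
    high, low = divmod(quantity, 12)      # the two base 12 digits
    if high == 0:
        return UNITS[low]
    if high == 1:
        return TEENS[low]
    if low == 0:
        return TENS[high]
    return TENS[high] + "-" + UNITS[low]

# The quantity a decimal speaker calls "twelve" is written 10 and spoken "ten".
print(dozenal_name(12), dozenal_name(22), dozenal_name(143))
```

The argument is a plain Python integer (a quantity), and the output is how it would be spoken by someone reading its base 12 digits.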
The possible proposals and their merits. In this section, I have expressed those numbers for which base-::::: and the given-base representation refer to different quantities using dots. For example, decimal 6 is represented as ::: in base 4 but 6 in base 7.
Base (.) (Unary): This base has only one unit, and the value of the number is determined by the number of digits present. As such, unary is a "degenerate" positional system as each successive position is "increased" by a multiple of (.). Because of this, any shapes and positions of digits can be used arbitrarily without problems (:::.. is the same as .:.:.. is the same as 0;258= , determined only by the number of marks), and the numbers can even be two-dimensional as positions are irrelevant. Because numbers take so long to write in this system (it's identical to a tally system), this system is only useful as a last-resort backup to represent numbers between two parties that use different bases but are unable to understand each other's bases or number language. This base is also largely useless for multiplication and division, as these operations require physically dividing or multiplying groups of digits like on an abacus (::::/: means splitting :::: into the two groups "::" and "::", so the answer is ::).
Base : (Binary): This base allows for all the mathematics of base-:::::, but because of its much lower number of unique units, numbers are much longer than in base :::::. For example, the age of the universe in binary is about 1.101*10^100001 years (I have written everything, even the exponent, in binary), where even the exponent used for scientific notation (which is 33 in base :::::) takes a lot of space to write. The number itself, only 14 billion in decimal, is roughly 1.101 decillion in binary (a billion trillion trillions). Base : is also prime, meaning that if you divide orders of magnitude in this language by anything other than a power of :, you will end up with a repeating decimal. The only really useful purpose of this number language is to converse with today's computers, but this use is rendered mostly obsolete by communication in base 8 or 16, for which there is an exact 1-3 or 1-4 correspondence in digits, respectively.
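For anyone who wants to check the "age of the universe" figures in this and the following entries, here is a minimal Python sketch of how I'd reproduce them. It assumes a round 14,000,000,000 years, truncates the mantissa rather than rounding it, and writes the exponent in the same base; the names `to_base` and `sci_notation` are just illustrative.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base(n, base):
    """Write the non-negative integer n in positional notation in `base`."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))

def sci_notation(n, base, frac_digits):
    """Scientific notation for a positive integer, with every part in `base`.

    The mantissa is truncated (not rounded) to `frac_digits` places.
    """
    digits = to_base(n, base)
    exponent = len(digits) - 1            # how far the radix point moves
    mantissa = digits[0] + "." + digits[1:1 + frac_digits]
    return mantissa + "*10^" + to_base(exponent, base)

AGE_OF_UNIVERSE = 14_000_000_000  # years; rounded figure, an assumption

for base, places in [(2, 3), (8, 1), (10, 1), (12, 2), (16, 3)]:
    print(base, sci_notation(AGE_OF_UNIVERSE, base, places))
```

Reading the result as if its digits were decimal gives the "decillion", "billion", etc. names used in the list.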
Base :. (Ternary): This base has more compact numbers than binary, but the radix is still a prime number so division is difficult. However, due to the nature of data storage (the information cost of a number is the product of the number of digits it has in a base and the number of possible states each digit has), :. is theoretically the *most efficient* base for integer storage, even more so than : (See here for how these calculations work out.) However, because this number is not divisible by : without a repeating decimal, and its computing purposes would be more quickly represented in base (::::.) or (::::: ::::: :::.), this base is not very useful for human expression. The age of the universe in ternary is about 1.1*10^210 years, or roughly 1.1 sextillion (a billion trillions).
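The storage-cost argument can be sketched numerically. Below, the cost of a number is taken to be (number of digits) times (states per digit), as in the entry above, and the large-number limit of that cost is proportional to b / ln(b). This is a rough Python illustration under that cost model; the function names are mine.

```python
import math

def digit_count(n, base):
    """Number of digits needed to write the positive integer n in `base`."""
    count = 0
    while n > 0:
        n //= base
        count += 1
    return max(count, 1)

def storage_cost(n, base):
    """Cost model from the text: digits needed times states per digit."""
    return base * digit_count(n, base)

def asymptotic_cost(base):
    """Cost per unit of ln(n) for large n, which works out to b / ln(b)."""
    return base / math.log(base)

# Which whole-number base minimises the large-number cost?
best = min(range(2, 37), key=asymptotic_cost)
print(best)

# The same comparison for one concrete number (14 billion).
for base in (2, 3, 4, 10):
    print(base, storage_cost(14_000_000_000, base))
```

For any single number the winner can wobble depending on where the digit count happens to jump, so the b / ln(b) form is the fairer comparison; its minimum over all real numbers sits at e, and 3 is the closest whole number.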
Base :: (Quaternary): This base allows for more divisibility than binary, but is still a poor semiprime (it only has three unique divisors). It has been found in a language native to the Americas but is a relatively poor choice for a base because the base is still rather small and is poorly divisible. The age of the universe in this base is about 3.1*10^100 years, or roughly 31 quadrillion (a thousand trillions).
Base ::. (Quinary): This base has very few uses (it's prime) except to express the number of full hands required to express a number using said hands as a radix. It's been found in a few still living languages and has historically been used in compound base :-base ::. systems by the Romans and ancient Chinese, where the digits were organized in groups of seven (two to indicate the presence or lack of a five, and five to indicate the number of ones, up to four and including zero). The age of the universe in this base is about 2.12*10^24 years, or roughly 212 trillion.
Base ::: (Senary): This is perhaps the first base useful for division, as it's divisible by both two and three. This base has been found in a living language and has been hypothesized for a few extinct languages. The age of the universe in this base is about 1.02*10^21 years, or roughly 10.2 trillion. This base might arguably be more convenient than base :::::, as numbers in this base are easily divisible by 2 or 3 and the additional length of numbers is not very significant.
Base :::. (Septenary): Another unremarkable prime base, and therefore a poor choice for a radix. The age of the universe in this base is about 1*10^15 years, or about 1 trillion.
Base :::: (Octal): This base is mostly useful for more quickly expressing binary numbers by grouping them in groups of 3. This is the largest such grouping possible without introducing more unique units digits to the base-::::: system. This base is perhaps as useful as senary, although it is divisible by four instead of three. The age of the universe in this base is about 1.5*10^13 years, or about 150 billion.
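The grouping trick is mechanical enough to sketch in Python: pad the binary string on the left, cut it into groups of 2, 3 or 4 bits, and replace each group with one digit of base 4, 8 or 16. The function name `regroup_bits` is hypothetical.

```python
DIGITS = "0123456789ABCDEF"

def regroup_bits(bits, group_size):
    """Rewrite a binary string in base 2**group_size, one digit per group."""
    if group_size not in (1, 2, 3, 4):
        raise ValueError("group_size must be 1, 2, 3 or 4")
    # Pad on the left so the length is a whole number of groups.
    padding = (-len(bits)) % group_size
    padded = "0" * padding + bits
    groups = [padded[i:i + group_size]
              for i in range(0, len(padded), group_size)]
    return "".join(DIGITS[int(group, 2)] for group in groups)

bits = format(14_000_000_000, "b")   # 14 billion, written in binary
print(regroup_bits(bits, 3))         # octal: groups of 3
print(regroup_bits(bits, 4))         # hex: groups of 4
```

No arithmetic on the whole number is needed, which is why these bases are so convenient as shorthand for binary.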
Base ::::. (Nonary): This base is analogous to the base 4 system but with the prime being 3 instead of 2. Because numbers in this base are not always easily divisible by 2, this base is probably about as useful as base-4, with shorter numbers but more complicated division. The age of the universe in this base is about 4.01*10^11 years, or about 40 billion.
Base ::::: (Decimal): This base is analogous to base ::: but with 5 as a divisor instead of 3. Unfortunately, dividing by 5 is a rarer occurrence than dividing by 3 in trade agreements, so although this base takes less space to write numbers than base :::, it has the well-known problem that dividing by three often produces non-terminating decimal numbers. Counting on fingers is *not* a decimal system; it is mixed unary-binary, as the positions of the fingers have no importance and each digit has only two meaningful states. Treating both hands as a single digit in representation, however, creates a mixed unary-decimal system. The age of the universe in this base is about 1.4*10^10 years, or about 14 billion.
Base ፧፧፧: (Undecimal): This radix is also prime. Its only differences from base-7 are the specific prime and the fact that computer storage in this base is more costly and written storage is less space-filling. Therefore it makes a very poor choice as a radix. The age of the universe in this base is about 5.a3*10^9 years, or about 5.a3 billion. After base 10, the general format for the units after 9 is A, B, C, etc. all the way to Z at base ⁞⁞⁞ ⁞⁞⁞ ⁞⁞⁞.
Base ፧፧፧፧ (Duodecimal): This is apparently the most popular suggestion for the adoption of a new base. It is divisible by the first four positive integers, allows for numbers to be written in slightly shorter forms than base A, and requires only two additional unique units. Although these new digits are called A and B as part of a system to allow expansion up to base-⁞⁞⁞ ⁞⁞⁞ ⁞⁞⁞, those suggesting the adoption of duodecimal have coined two new "number-looking" symbols by rotating the first two units that have minimal rotational symmetry. These digits are ᘔ and ᘍ. Arguably, fractions used in daily life terminate more often in this system because 1/3 is more often encountered than 1/5. The age of the universe in this base is about 2.86*10^9 years, or 2.86 billion.
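The "terminates more often" claim can be checked with a standard fact: a fraction in lowest terms terminates in a base exactly when every prime factor of its denominator also divides the base. Here is a short Python sketch of that test; the function name `terminates` is mine, and the choice to compare denominators 2 through 12 is arbitrary.

```python
from math import gcd

def terminates(denominator, base):
    """True if 1/denominator has a finite expansion in `base`."""
    d = denominator
    g = gcd(d, base)
    while g > 1:
        # Strip out every factor the denominator shares with the base.
        while d % g == 0:
            d //= g
        g = gcd(d, base)
    return d == 1    # anything left over is a prime the base lacks

for base in (6, 10, 12):
    finite = [d for d in range(2, 13) if terminates(d, base)]
    print(base, finite)
```

Whether that counts as "more useful" still depends on which fractions you actually meet in daily life, which is the arguable part.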
Base ⁞⁞⁞. is another prime, and displays the same patterns in usefulness as the previous primes; that is to say, little usefulness.
Base ⁞⁞⁞: is another semi-prime. Its advantages and disadvantages relative to base A are largely identical to A's advantages and disadvantages over base 6.
Base ⁞⁞⁞፧ is another semi-prime, but isn't even divisible by 2 so its usefulness is very limited.
Base ⁞⁞⁞⁞ (hexadecimal) is highly useful as the last base allowing grouping of binary digits that humans can easily calculate in. It also allows for 1-2 correspondences to digits in base 4. This base's usefulness and accessibility for humans grouping binary computer digits has made it a staple of computing, where two hex digits correspond exactly to a byte. The age of the universe in this base is about 3.427*10^8 years (which rounds to 3.42 in this base since 7 is less than 8, which is half of 10) or about 342 million. This rounding brings up an interesting question: rounding in hex produces a maximum error of 8 in the selected rounding area, while in decimal rounding produces a maximum error of 5 in the selected rounding area. With such large errors, it might be useful to "semi-round", where the units 01234 round to 0, 56789AB round to 8, and CDEF round to 10.
Larger bases progressively become harder and harder to use as they involve using powers of larger and larger numbers. By the time you get to base ⁞⁞⁞⁞⁞: for example, you have numbers that take up three digits which would take up five in base A. Counting to 1000 at a rate of one number per second in this base takes nearly 3 hours while the same takes less than 1/9 the time in base A. However, the age of the universe expressed in this base is a more concise 5.DA*10^7 years, or about 5.DA million.
So, having read all this, what do you guys think? Questions and discussions on the language of numbers and the theoretical adoption of other bases are welcome in this thread.
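The counting-time comparison is simple arithmetic, sketched here in Python. It assumes one number per second and takes "1000" to mean the written digits 1000 in each base, i.e. the base cubed; the function name is illustrative.

```python
def seconds_to_count_to_1000(base):
    """Seconds needed to count up to the numeral written 1000 in `base`."""
    return base ** 3          # "1000" in any base is the base cubed

big = seconds_to_count_to_1000(22)     # the large base from the text
small = seconds_to_count_to_1000(10)   # base A

print(big / 3600)      # hours needed in the large base
print(small / big)     # fraction of that time needed in base A
```
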
Edited by Gondor2222 - Wednesday, 14.08.2013, 00:51
Some have argued that Hittite is a sister language of IE, i.e. that IE and Hittite share a common ancestor. This is to explain some features of IE that Hittite lacks, such as the three genders. It's a finer point of classification in my opinion, and the gender system in Hittite just serves as an excellent explanation for how genders arose in IE.
Quote (Gondor2222)
reconstructions indicate that the name for "five" was probably the word for "fist"
I don't think that is well supported. I think it's lost to obscurity. It has been proposed that IE penkʷe is the postfix "and" (-kʷe) and a word for thumb, then meaning "and the thumb". A colourful explanation, suggesting that counting arose from naming the individual fingers and last and fifth the thumb. But I doubt it.
Quote (Gondor2222)
the name for "ten" was probably the two words for "two hands"
dekṃ(t) = de-kṃt ("two hands")? It's really tempting to read the de- as "two". And "kṃt" looks very similar to Germanic "hand" which is otherwise unexplained. But this I think belongs to pure speculation. It looks more likely that dekṃt "ten" and kṃtom "hundred" are related (not necessarily excluding the "two hands" hypothesis), but how or even which way doesn't seem entirely clear.
What is pretty certain is that our counting system arose from the number of fingers (otherwise a more practical base would have been chosen), and that the names for the numbers are very, very old and have changed relatively little. They're certainly recognisable if we go back more than 5000 years.
Note that there is another non-IE language in Europe not related to anything else, Basque. It's probably the last surviving remnant of a language group that was widespread in Europe before the IE invasion. It uses a vigesimal counting system, and it has been suggested that Basque has influenced French, Danish and Celtic languages which have a semi-vigesimal system.
Quote (Gondor2222)
its daughter language Proto-Germanic began calling ten tens "kmtom count" in their own language, indicating that the original kmtom was not a definite count to begin with.
I'm not sure what you're saying here, but it's worth noting that "hundred" in Germanic languages meant "120" well into the middle ages. It may be an early Germanic invention.
Quote (Gondor2222)
then M. Larger numbers didn't become necessary until after the fall of the western Roman Empire, and so are less important here.
There was a need for larger numerals (than 1000) even during the republic. Such as in economy. There exist a couple of notations: adding lines around the letters or stacking up like this: CCCCIƆƆƆƆ (deciens centena milia, 1,000,000), but I don't have my books here to check what was used in the different periods. I think, though, such extra notation frequently was omitted, assuming it was understood whether the counting was in singles, thousands or 100,000's.
A discussion of the Babylonian numerals, which were sexagesimal (well, more precisely decimal-sexagesimal perhaps), would fit in here as well. The system survives in our minutes, seconds and angles.
Added (14.08.2013, 20:37) --------------------------------------------- In my copy of my favourite encyclopedia, Natural History by Pliny, there are many big numbers, which are either written in full (such as "ā turbidō ad lūnam uīciēns centum mīlia stadiōrum", "from the windy air to the moon two million stades" (=370,000 km)) or, in other places, given as regular numerals with strokes above and/or to the sides to multiply by 1000 and 100,000. But the manuscript has been copied many times and the notation could have been updated in medieval times. So we must look at inscriptions to find certain examples of ancient use. In republic inscriptions I find symbols for 5000, 10,000 and 100,000. 5000 and 10,000 look like an extra D or M stacked outside an inner one, not unlike CCIƆƆ which I already mentioned. So such symbols did exist. The 100,000 symbol (circle with a vertical bar and two V's inside) was on an inscription dated as early as c. 260 BC.
NIL DIFFICILE VOLENTI
Edited by midtskogen - Wednesday, 14.08.2013, 19:44