Wednesday, April 13, 2005

You be the judge...

Ok, so it has been nearly a month since my last post. Despite my reassurances, I bet some of you were wondering if I had forgotten, or simply lost interest. I assure you that neither of those is true. What, then, have I been doing? Allow me to explain.

You may or may not believe this, but I love making work for myself. I know, I know, who doesn't? But by this I don't mean that I am not very wise and usually make decisions that delay or completely unglue processes in progress. Well, I may be guilty of that sometimes, but when I say that I love making work for myself, what I mean is that I love creating things for me to do. This blog was one such thing. So was and is my book. So is the program I have been writing for the past three weeks.

Yes, for those of you who are even the slightest bit alert, you know that I have a degree in computer science. I acquired this degree with the intention of becoming a successful IT professional. I found out toward the end that my perceived love of programming was really rooted in my love of linguistics. I also found out that it is fairly difficult for a fresh computer science graduate to jump right into being successful.

So, one year away from graduating I thought, "Well, I've come too far to stop now." I've seen people quit college that close, and even closer, to graduating and wondered how they could possibly throw away all that hard work. And while I was convincing myself to trudge stoically through my senior level CS classes, I was voluntarily checking out volumes from the school library on such gripping subjects as "the origin of language" and "applied linguistics". While I was learning ultra-cool phrases like "labiodental fricatives" (which is not as suggestive as it sounds), my GPA was hovering somewhere around "help me"! See what I mean about making work for myself?

So, you would think that now that I'm out of school and working a mildly professional IT-type job, despite the fact it is not what I am passionate about, I would never willingly write code in my spare time. Well, you would think...but you'd be wrong. See, I do like programming, but only when it's my own project. Programming is fun when you don't have deadlines or strict corporate procedures to follow. Granted, those things can make for clean, efficient code, but they can be inhibitive to the creative process I believe is necessary to write truly useful and dynamic programs.

I believe a programmer is a kind of writer, especially when it is his passion. Programming is certainly useful to me in expressing my passion, but it in itself is something I am not passionate about. I shall elaborate.

As a writer I use language to express my creativity, but it is not my only outlet; I also create language. I am what is known in some circles as a "conlanger". A conlanger is a person who engages in the creation of a nonexistent language. The reasons anyone would do this are many and varied, but ultimately it comes down to entertainment. Even L.L. Zamenhof, who created Esperanto, was, I'm sure, entertained by the hope of everyone speaking his "international language". The reason I do it? Well, my aforementioned love of linguistics and writing figure heavily into it. It also helps to give depth to the races in my book. This idea is not unlike how J.R.R. Tolkien wrote The Lord of the Rings to create a backdrop where his Elvish languages flourished.

So, what does this have to do with programming? Well, as a conlanger, one of the most frustrating and tedious exercises is creating a lexicon (that's dictionary, for the laypeople). It has been said that, in order to have a language where one can communicate fully and comfortably, that language must have a lexicon of about 2,000 words. Yes, folks, coming up with a grammar is the easy part. You can flesh out the syntax of a constructed language in a couple of hours, but a 2,000-word vocabulary?

a..
abacus..
abase..
abash..
abbess..
abbey..
abbot..
abdicate...


Yeah...abdicate. I abdicate from writing this whole damn dictionary. Fortunately for those with short attention spans, like me, there are programs out there that will randomly generate a 2,000+ vocabulary for you in about two seconds. It's actually pretty easy to write a program that will string together a few random characters in mass quantities. What is not easy is making those random characters sound like they come from the same language.

The individual sounds we make when speaking are called phonemes. Within a specific language, speakers use a specific set of phonemes. The alveolar flap r in Spanish is a different phoneme from the retroflexed alveolar approximant r in English. Aside from this distinction, there are also syllabic patterns to be taken into consideration. Many languages follow some kind of alternating vowel/consonant pattern. There are some spelling rules that come into play as well. For example, hobzgodjh is not a likely English word, despite the fact it is not too phonemically different from the word hopscotch.
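
To make that a little more concrete, here is a minimal sketch of a generator that draws from a fixed phoneme set and fills in syllable patterns. Everything in it is an assumption for illustration: the consonant and vowel inventories, the three syllable shapes, and the function names are made up, and it is not how Langmaker or any other existing tool works.

```python
import random

# Hypothetical phoneme inventory for an invented language (an assumption,
# not taken from the post or from any real language).
CONSONANTS = ["p", "t", "k", "m", "n", "l", "h"]
VOWELS = ["a", "e", "i", "o", "u"]

# Allowed syllable shapes: C = consonant slot, V = vowel slot.
SYLLABLE_PATTERNS = ["CV", "V", "CVC"]


def make_syllable(rng):
    """Build one syllable by filling a randomly chosen pattern."""
    pattern = rng.choice(SYLLABLE_PATTERNS)
    return "".join(
        rng.choice(CONSONANTS if slot == "C" else VOWELS) for slot in pattern
    )


def make_word(rng, min_syllables=1, max_syllables=3):
    """Concatenate a random number of syllables into one word."""
    count = rng.randint(min_syllables, max_syllables)
    return "".join(make_syllable(rng) for _ in range(count))


def make_vocabulary(size, seed=None):
    """Return `size` distinct words, sorted alphabetically."""
    rng = random.Random(seed)
    words = set()
    while len(words) < size:
        words.add(make_word(rng))
    return sorted(words)
```

Calling make_vocabulary(2000) would produce a 2,000-word list whose entries at least all draw on the same sounds and syllable shapes. Spelling rules would need another layer on top of this.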

While this is a lot of stuff to think about, it still isn't too terribly difficult to write a program that will do it. Jeffrey Henning's Langmaker software is one such program. So, problem solved, right? Wrong. You really didn't think it would be that easy, did you? Here's the next problem, and this one's a doozy. Aside from phonemic consistency, when creating a lexicon you have to worry about morphemic consistency.

Ok...wow. Should I slow down?

A morpheme is the smallest unit in a language that still has meaning. Paragraphs are made up of sentences, sentences are made up of words, and words are made up of morphemes. Sometimes a morpheme is a whole word by itself, but other times it is an affix or inflection of some sort. Take the word portability. Its morphemes are: the root word port (as in to take from one place to another), the suffix -abil (a variation on -able), and the suffix -ity (indicating that the word is describing a quality). Morphemic consistency is a common property of language. There are few languages in which all words are roots that stand alone without modification by other morphemes.

It's pretty tough to write a program that will build morphemic consistency into a randomly generated lexicon. Let me show you what I mean.

What I would theoretically want a program to do:

English:        Fictional Language:
able            ila
ability         paula
port            aun
portable        la'aun
portability     paula'aun


What available programs will give me:

English:        Fictional Language:
able            anu
ability         hino
port            pa
portable        inefu
portability     nohanina


Now, I am making some assumptions about my fictional language for the sake of simplicity, but it helps illustrate the point. You might notice from the first example that the translations have some consistency. In the second example, the words might seem like they are based on a realistic phonology, but they are clearly completely random.
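
The consistency in the first example can be sketched in a few lines. The three base words are copied from that table; the compounding rule (modifier first, joined with an apostrophe, with a leading vowel dropped from the modifier) is only a guess that happens to fit those five words, not a stated rule of the language, and the function name is hypothetical.

```python
# Base morphemes copied from the first example table in the post.
MORPHEMES = {"able": "ila", "ability": "paula", "port": "aun"}

VOWELS = "aeiou"


def compound(modifier, root, morphemes=MORPHEMES):
    """Join modifier + root with an apostrophe, modifier first.

    Assumed rule (a guess that fits the table, not a documented grammar):
    a modifier that starts with a vowel loses that vowel in a compound.
    """
    prefix = morphemes[modifier]
    if prefix[0] in VOWELS:
        prefix = prefix[1:]
    return prefix + "'" + morphemes[root]
```

Here compound("able", "port") gives la'aun and compound("ability", "port") gives paula'aun, matching the table. The hard part is not applying a rule like this but getting a program to invent a sensible set of such rules on its own.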

So, am I trying to write a program that will randomly generate a morphemically consistent lexicon? Well, I might consider it as a doctoral thesis when I go back to get my linguistics degree, but for now it's way more than I have the stamina for. Instead I am trying to develop a middle ground: a lexical management solution, if you will. It is a program that will help you generate your lexicon by hand and easily look up roots, morphemes, and related words while you do. It will tell you if the word you've just created is already in the lexicon and will even have the ability to randomly generate a word when you are out of inspiration.
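
For the curious, here is a rough sketch of what the core of such a tool might look like. It is an illustration under assumptions, not the actual program: the Lexicon class, its method names, and the way entries are stored are all hypothetical, and the random-word feature is left out.

```python
from collections import defaultdict


class Lexicon:
    """Hand-built lexicon with duplicate checks and morpheme lookup."""

    def __init__(self):
        self._entries = {}                    # word -> (gloss, morphemes)
        self._by_morpheme = defaultdict(set)  # morpheme -> words using it

    def __contains__(self, word):
        return word in self._entries

    def add(self, word, gloss, morphemes=()):
        """Store a new word; return False if it is already present."""
        if word in self._entries:
            return False
        self._entries[word] = (gloss, tuple(morphemes))
        for morpheme in morphemes:
            self._by_morpheme[morpheme].add(word)
        return True

    def gloss(self, word):
        """Return the English gloss recorded for a word."""
        return self._entries[word][0]

    def related(self, morpheme):
        """Return every word built from the given morpheme, sorted."""
        return sorted(self._by_morpheme.get(morpheme, ()))
```

Adding aun ("port") and la'aun ("portable") with their morphemes and then asking for related("aun") would return both words, while a second attempt to add aun would be refused.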

Again, I say...I love making work for myself. I actually have all the data structures coded; I just need to work on the user interface. In the meantime I will keep plugging away. And for those of you who are afraid this distracts me from creative expression that is more accessible to those who appreciate literature instead of 1it3r4tur3, fear not. I am always writing...
