Language information and sources (being revised 1/21/10 after all these decades) in no particular order.

  1. Database of language structures
  2. "An encyclopedic reference work cataloging all of the world’s 6,909 known living languages." (This is the work of SIL.)
  3. The Irvine Phonotactic Online Dictionary (IPhOD) is a large collection of English words and pseudowords developed at UC Irvine for research on phonological processes in speech perception and production. It may be used to select items for experiments according to sublexical and lexical phonological measures: for example, how many words sound like "cat", or how often the "KA" or "AT" sound sequences occur in English. IPhOD is freely available online to search or download, so other researchers can use it in their studies. Vaden, K.I., Hickok, G.S., & Halpin, H.R. (2009). Irvine Phonotactic Online Dictionary, Version 2.0. [Data file]. Available from
  4. The worst sounds in the world. (Contribute your data!) Funny ones too!
  5. "Talking Brain" blog; views on the neural organization of language.
  6. Cognitive tests online, including auditory and other reaction times
  7. CHILDES language acquisition database
  8. Access CHILDES for word frequency counts by age and MLU (mean length of utterance).
  9. Child language vocabulary norms online (CLEX).
  10. International Phonetic Association (IPA) -- home of the International Phonetic Alphabet, a system for representing all possible speech sounds. See their phonetic alphabet page.
  11. Latent Semantic Analysis: could a computer grade your essay exams? Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. The underlying idea is that the totality of information about all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. Even if it can grade your essays, does that have much to do with the way our minds work? LSA people say "yes."
  12. The MRC Psycholinguistic Database is a machine-usable dictionary containing 150,837 words with up to 26 linguistic and psycholinguistic attributes for each; psychological measures are recorded for only about 2,500 words. The dictionary may be of use to researchers in psychology or linguistics who want to develop sets of experimental stimuli, or to those in artificial intelligence and computer science who require psychological and linguistic descriptions of words.
  13. WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Check out its cousin, "ImageNet."
  14. Written sign language (ASL)?
  15. Fourier analysis tutorial
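
For item 15, here is a rough sketch of what Fourier analysis does to a sampled signal: a naive discrete Fourier transform in plain Python. The function name `dft` and the 8-sample test tone are made up for illustration and are not taken from the linked tutorial; real speech analysis would use an FFT library rather than this slow version.

```python
import cmath
import math

def dft(samples):
    """Naive O(N^2) discrete Fourier transform of a list of samples."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

# Hypothetical test signal: a sine wave completing 2 cycles over 8 samples.
N = 8
tone = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
magnitudes = [abs(c) for c in dft(tone)]

# The energy should land in bin 2 (and its mirror image, bin N - 2).
peak_bin = max(range(N // 2 + 1), key=lambda k: magnitudes[k])
print(peak_bin, round(magnitudes[peak_bin], 3))
```

The bin number where the magnitude peaks tells you how many cycles of that frequency fit in the analysis window, which is the basic idea behind a spectrogram slice.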

16. Anagrams created online.

17. The Lord's Prayer spoken in Old English. Beowulf prologue in Old English. (Several examples on YouTube and elsewhere.)

18. Eliza (not the best I've seen).

19. Michigan language corpus (lots of transcribed language)
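
Going back to item 3: the two questions mentioned there (how many words sound like "cat", how often a sound sequence occurs) can be sketched in a few lines. This is a toy illustration, not IPhOD's method or data. The six-word lexicon, its rough phoneme transcriptions, and the function names are all made up; "sounds like" is assumed to mean differing by one substituted, added, or deleted phoneme; and the counts are not weighted by word frequency.

```python
from collections import Counter

# Hypothetical mini-lexicon: word -> tuple of phoneme symbols (rough, ARPAbet-like).
LEXICON = {
    "cat": ("K", "AE", "T"),
    "bat": ("B", "AE", "T"),
    "cap": ("K", "AE", "P"),
    "cut": ("K", "AH", "T"),
    "at": ("AE", "T"),
    "dog": ("D", "AO", "G"),
}

def is_neighbor(a, b):
    """True if a and b differ by exactly one phoneme substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = sorted((a, b), key=len)
    # Deleting one phoneme from the longer word must give the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))

def neighbors(word):
    """Words in the toy lexicon that are one phoneme away from `word`."""
    target = LEXICON[word]
    return sorted(w for w, p in LEXICON.items() if is_neighbor(target, p))

def biphone_counts():
    """Count each adjacent phoneme pair across every word in the toy lexicon."""
    counts = Counter()
    for phones in LEXICON.values():
        counts.update(zip(phones, phones[1:]))
    return counts

print(neighbors("cat"))
print(biphone_counts()[("AE", "T")])
```

With a real pronunciation dictionary in place of the toy lexicon, the same two functions would give rough neighborhood-density and biphone-frequency figures for picking experimental items.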