- Winter 2011. Lecturer, Ling 20001: Introduction to Linguistics: course page on chalk.
- Fall 2010. Lecturer, Ling 20101: Introduction to Phonetics and Phonology: course page on chalk.
- Winter 2010. TA for Ling 20001: Introduction to Linguistics, Prof. John Goldsmith: course page on chalk.
- Fall 2009. TA for Ling 20101: Introduction to Phonetics and Phonology, Prof. Alan Yu: course page on chalk.
- Fall 2008. TA for Phonetics, Prof. Jason Riggle: course page.
- Winter 2008. TA for Programming for Linguists, Prof. Jason Riggle: course page.
- Computational and corpus linguistics; phonological theory; quantitative typology and language change; variation; models of grammar and learning; complexity.
- Jun 18, 2011: cascadilla.cls now has its own project page, hosted at github.com.
- Jan 24, 2011: Posted preprint: Global properties of the phonological networks in child and child-directed speech.
- Jan 5, 2011: Posted some MRI images of my own vocal tract.
- Dec 28, 2010: Posted preprint: Longitudinal phonetic variation in a closed system.
- Dec 8, 2010: Updated cascadilla.cls to version 1.6.
- May 11, 2010: Updated cascadilla.cls to version 1.5.
- Apr 14, 2010: A concise bibliography of work related to information entropy and phonological systems.
Presented at the LSA, Pittsburgh, 2011:
- How many language types are there? [slides]
Methods of estimating the number of unseen types in a sampled population are often used in biological ecology and studies of lexical richness, but have yet to be applied to linguistic typology. I apply a Bayesian method of estimating the number of unseen types in a population (Zhang and Stern 2009) to a typological sample of quantity-insensitive stress systems, giving a principled estimate of lower and upper bounds on the plausible number of possible language types, i.e., distinct stress systems. These bounds compare favorably with the typological predictions of Gordon's (2002) OT model of quantity-insensitive stress. (Friday, January 7, 4:00pm)
- Phonetic convergence among reality television contestants, with Morgan Sonderegger
Previous work has shown that in short-term laboratory settings, aspects of one's speech can change under exposure to the speech of others, and that this change is mediated by social variables. The implicit hypothesis is that phonetic convergence can help explain dialect formation and social stratification of speech. A link between laboratory results and community-level change is needed to show that convergence is a possible source of socially stratified change. We address this question using data from reality television. Our results show significant longitudinal change in voice onset time for four speakers over 13 weeks, mediated by social interaction. (Friday, January 7, 9:00am)
- Submitted. Bane, Max, and Jason Riggle. When more choice means less freedom: a note on candidate sets and typologies.
- In press. Carlson, Matthew T., Morgan Sonderegger, and Max Bane. Global properties of the phonological networks in child and child-directed speech. In BUCLD 35: Proceedings of the 35th annual Boston University Conference on Language Development. [preprint]
This paper uses the perspective of network theory (Vitevitch, 2008) to compare phonological neighborhood structure in child speech (CS), child-directed speech (CDS), and adult-directed speech (ADS) lexicons. Previous work on the phonological lexicon has focused on neighborhood density, a property of individual words. The network approach focuses on global properties of the "phonological networks" of entire lexicons as a novel approach to the phonological structure of children's input and early lexicons. The networks of CS, CDS, and ADS were constructed using all orthographic word types as nodes, with edges between phonological neighbors and homophones. A number of metrics that have been associated with network stability and searchability were evaluated (Arbesman, et al., 2010): average shortest path, clustering coefficient, transitivity, and assortative mixing by degree. The results suggest that phonological networks in CS and CDS may be more stable and searchable than in ADS, and that CS and CDS are remarkably similar in their structural properties, despite a large difference in the size of the CS and CDS networks. This tentatively supports the hypothesis that the global neighborhood structure of the CDS lexicon presents favorable conditions for language development. Considering the challenges of using local neighborhood density to compare child and adult lexicons (Charles-Luce & Luce, 1990; 1995; Coady & Aslin, 2003), comparison by global network properties can shed new light on the role of neighborhood structure in child vocabulary development.
- To appear. Bane, Max. A grammar sampling model of variation in the English dative alternation. In Proceedings of the 36th annual meeting of the Berkeley Linguistics Society.
- In press. Bane, Max. Deriving the structure of variation from the structure of non-variation in the English dative. In Proceedings of the 28th annual meeting of the West Coast Conference on Formal Linguistics. [preprint]
We can distinguish two general approaches to modeling probabilistic, variable linguistic behavior. The first, which we can call the gradient grammar approach, incorporates gradient, real-valued parameters directly into the grammar as objects whose values must be fitted or learned. In contrast, a second approach, grammar sampling, posits random sampling from a set of parameter values for some categorical grammar, with membership in the set determined by learned categorical restrictions. The primary thrust of this paper is to demonstrate that in a grammar sampling model of the English dative alternation, the categorical facts contain the seed of the variable facts. That is, the grammatical implications of what doesn't vary restrict the set of possible grammars to one that, when sampled from randomly, corresponds closely to the frequency structure of what does vary.
- In press. Bane, Max, Peter Graff, and Morgan Sonderegger. Longitudinal phonetic variation in a closed system. In Proceedings of the 46th annual meeting of the Chicago Linguistic Society. [preprint]
- 2010. Bane, Max, Jason Riggle, and Morgan Sonderegger. The VC dimension of constraint
Lingua, 120:5, pp. 1194-1208. [preprint | paper]
We analyze the complexity of Harmonic Grammar (HG), a linguistic model in which licit underlying-to-surface-form mappings are determined by optimization over weighted constraints. We show that the Vapnik-Chervonenkis Dimension of HG grammars with k constraints is k-1. This establishes a fundamental bound on the complexity of HG in terms of its capacity to classify sets of linguistic data that has significant ramifications for learnability. The VC dimension of HG is the same as that of Optimality Theory (OT), which is similar to HG, but uses ranked rather than weighted constraints in optimization. The parity of the VC dimension in these two models is somewhat surprising because OT defines finite classes of grammars---there are at most k! ways to rank k constraints---while HG can define infinite classes of grammars because the weights associated with constraints are real-valued. The parity is also surprising because HG permits groups of constraints that interact through so-called `gang effects' to generate languages that cannot be generated in OT. The fact that the VC dimension grows linearly with the number of constraints in both models means that, even in the worst case, the number of randomly chosen training samples needed to weight/rank a known set of constraints is a linear function of k. We conclude that though there may be factors that favor one model or the other, the complexity of learning weightings/rankings is not one of them.
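The contrast between ranked and weighted optimization described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the tableau, the candidate names, and the helper functions `hg_winner` and `ot_winner` are hypothetical, and the example only assumes the standard definitions (HG picks the candidate with the lowest weighted violation sum; OT compares violation profiles lexicographically under a ranking).

```python
from itertools import permutations


def hg_winner(candidates, weights):
    """Return the candidate with the lowest weighted violation sum (highest harmony)."""
    return min(
        candidates,
        key=lambda name: sum(w * v for w, v in zip(weights, candidates[name])),
    )


def ot_winner(candidates, ranking):
    """Return the candidate whose violations are lexicographically smallest
    when constraints are read in ranking order (strict domination)."""
    return min(
        candidates,
        key=lambda name: tuple(candidates[name][i] for i in ranking),
    )


# Hypothetical tableau: violation counts for two constraints per candidate.
tableau = {"A": (3, 0), "B": (0, 3), "C": (1, 1)}

# With k = 2 constraints there are only k! = 2 rankings to try.
ot_winners = {ot_winner(tableau, r) for r in permutations(range(2))}
print(sorted(ot_winners))              # ['A', 'B']

# Equal weights let the two mild violations of C beat the severe ones of A and B.
print(hg_winner(tableau, (1.0, 1.0)))  # C
```

In this toy tableau no ranking selects C, while a weighting does, which is the kind of additional expressive power the abstract contrasts with the equal VC dimension of the two models.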
- 2009/in press. Bane, Max and Jason Riggle. The typological consequences of weighted constraints. Proceedings of the 45th Annual Meeting of the Chicago Linguistic Society. [ROA
A common "typological criterion" on linguistic models is that they should predict (almost) all observed patterns while minimizing overgeneration. For optimization-based models, it has been argued that constraints should be ranked rather than weighted to minimize overgeneration. Recently, however, weighting has been shown to elegantly capture patterns that ranking misses. To evaluate the issue, we provide software that builds ranked/weighted-typologies. We find that some independently motivated restrictions eliminate much overgeneration but that, in general, weighting leads to numerous novel (and odd) constraint interactions.
- 2009/in press. Bane, Max and Ed King. Local predictability in the lexicon. Proceedings of the 45th Annual Meeting of the Chicago Linguistic Society. [preprint]
We present a general methodology of assessing how locally driven the lexical phonotactics of a language are, by comparing its attested lexical material to some hypotheses of what else that material ``could have been'' (here, permutations and edits), in terms of its local, bigram predictability (pointwise mutual information). In other words, we ask how ``optimized'' the lexicon is for local predictability within some space of possible lexica. We apply the method to seven languages and find that they exhibit interesting variation in the degree to which they are optimized in this way, but that every language is significantly more locally optimal than random; and furthermore, that the languages which appear least optimized in this sense are precisely those known to obey important nonlocal lexical tendencies (vowel harmony).
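A rough sketch of the kind of measurement described in this abstract follows. It is an illustration under stated assumptions rather than the paper's procedure: the function names `mean_bigram_pmi` and `shuffled`, the toy lexicon, and the choice of within-word shuffling as the only "could have been" baseline are all hypothetical.

```python
import math
import random
from collections import Counter


def mean_bigram_pmi(lexicon):
    """Average pointwise mutual information (in bits) over all adjacent
    segment pairs in the lexicon, weighted by how often each pair occurs."""
    unigrams = Counter()
    bigrams = Counter()
    for word in lexicon:
        unigrams.update(word)
        bigrams.update(zip(word, word[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    if n_bi == 0:
        return 0.0
    total = 0.0
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        total += count * math.log2(p_ab / (p_a * p_b))
    return total / n_bi


def shuffled(lexicon, rng):
    """Permute the segments inside each word: one hypothetical alternative lexicon."""
    result = []
    for word in lexicon:
        segments = list(word)
        rng.shuffle(segments)
        result.append("".join(segments))
    return result


# Toy lexicon (assumption): orthographic strings stand in for segment sequences.
toy_lexicon = ["banana", "bandana", "cabana", "nab", "ban"]
rng = random.Random(0)
attested = mean_bigram_pmi(toy_lexicon)
baseline = [mean_bigram_pmi(shuffled(toy_lexicon, rng)) for _ in range(100)]
print(attested, sum(baseline) / len(baseline))
```

Comparing the attested score against the distribution of scores from permuted lexica gives a simple sense of how "optimized" the attested lexicon is for local predictability.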
- 2009/in press. Bane, Max, Jason Riggle, James Kirby and John Sylak. Multilingual learning with parameter co-occurrence clustering. Proceedings of the 39th Meeting of the North East Linguistic Society. [preprint]
The computational task of language learning has long been a central issue in theoretical linguistics, and most work has focused on its monolingual formulation, in which the learner's sample is drawn from a single target language. This paper considers a minimal extension of the usual monolingual formulation to accommodate the multilingual setting, and presents a novel strategy for discriminating and learning languages within it by clustering grammatical properties according to their co-occurrence in the sample. The heuristic that we propose is generic in the sense that it is applicable within any parameterized linguistic theory for which it is feasible to compute the possible parameter-settings implied by observing a single input-output mapping; for purposes of concreteness and evaluation, we present the algorithm within the framework of Optimality Theory, using syllable structure grammars as a case study.
- 2008. Bane, Max and Jason Riggle. Three correlates of the typological frequency of quantity-insensitive stress systems. Proceedings of the Tenth Workshop of the Association for Computational Linguistics' Special Interest Group in Morphology and Phonology. Rutgers Optimality Archive #966. [preprint]
We examine the typology of quantity-insensitive (QI) stress systems and ask to what extent an existing optimality theoretic model of QI stress (due to Gordon, 2002) can predict the observed typological frequencies of stress patterns. We find three significant correlates of pattern attestation and frequency: the trigram entropy of a pattern, the degree to which it is ``confusable'' with other patterns predicted by the model, and the number of constraint rankings that specify the pattern.
- 2008. Bane, Max. Quantifying and Measuring Morphological Complexity. Proceedings of the 26th West Coast Conference on Formal Linguistics, 67-76. [paper]
It is a standard assumption in linguistics that all human languages are equally (and enormously) complex; when looked at as a whole, no language can be called "simpler" than another. Certainly, languages can differ in the distribution of their complexity, so that one might employ a richer inflectional system, or entertain a more complicated gamut of syllable shapes than another, but it is generally supposed that these differences must "even out" as one considers entire linguistic systems. A number of researchers have recently begun to approach this equal complexity hypothesis as an empirical claim to be tested under particular definitions of complexity. Perhaps the most famous recent example is McWhorter's (2001) controversial claim that "creole grammars are the world's simplest grammars," but see also Juola (1998), Shosted (2006), Nichols (2007), and Pellegrino (2007). This paper argues for an information theoretic approach to defining linguistic complexity and offers preliminary results for a novel method of using the mathematical notion of Kolmogorov complexity together with an automatic lemmatizer to construct a numerical metric of morphological complexity.
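Kolmogorov complexity is not computable, so in practice it is usually approximated from above by the size of a compressed file. The sketch below shows that general idea only; it is not the metric defined in the paper. The use of zlib, the function names `compressed_size` and `morphological_share`, and the tiny hand-written lemma table standing in for an automatic lemmatizer are all assumptions made for illustration.

```python
import zlib


def compressed_size(text):
    """Upper-bound proxy for Kolmogorov complexity: byte length after zlib compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def morphological_share(tokens, lemma_of):
    """Fraction of the compressed size that disappears when every token is
    replaced by its lemma (tokens missing from lemma_of are left unchanged)."""
    original = " ".join(tokens)
    lemmatized = " ".join(lemma_of.get(token, token) for token in tokens)
    c_original = compressed_size(original)
    c_lemmatized = compressed_size(lemmatized)
    return (c_original - c_lemmatized) / c_original


# Hypothetical toy data: a real study would use a large corpus and a real lemmatizer.
tokens = "the dogs walked and the cats were walking while a dog walks".split()
lemma_of = {"dogs": "dog", "cats": "cat", "walked": "walk",
            "walking": "walk", "walks": "walk", "were": "be"}
print(morphological_share(tokens, lemma_of))
```

On a text this short the compressor's fixed overhead dominates, so the printed number is only meaningful as a demonstration of the computation, not as a measurement.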
- 2007. Jason Riggle, Edward King, James Kirby, Max Bane, Heather Rivers, Evelyn Rosas, John Sylak. Erculator: A Web Application for Constraint-Based Phonology. In University of Massachusetts Occasional Papers in Optimality Theory 36: Papers in Theoretical and Computational Phonology. [preprint]
- 2009. Bane, Max. Predicting usage with a competence grammar: Variation in the English dative alternation. Qualifying Paper, Dept. of Linguistics, University of Chicago. Advised by Jason Riggle and Karlos Arregi.
- 2008. Bane, Max. Modeling the typology of quantity-insensitive stress systems. Qualifying Paper, Dept. of Linguistics, University of Chicago. Advised by Jason Riggle and Alan Yu.
- 2009/in press. Bane, Max, Juan Bueno, Tommy Grano, April Grotberg, Yaron McNabb (eds.). Proceedings of the Chicago Linguistic Society 44, vols. 1 and 2. Chicago: The Chicago Linguistic Society.
- 2010. Bane, Max. Deriving the structure of variation from the structure of non-variation in the English dative. 28th West Coast Conference on Formal Linguistics. [slides].
- 2010. Bane, Max. A combinatoric model of variation in the English dative alternation. Berkeley Linguistics Society 36. [slides].
- 2009. Bane, Max. Longitudinal Phonetic Variation in a Closed System. U. Chicago Council on Advanced Studies Workshop on Language, Cognition, and Computation.
- 2009. Bane, Max and Jason Riggle. Evaluating Strict Domination: The Typological Consequences of Weighted Constraints. 45th Annual Meeting of the Chicago Linguistic Society. (Delivered by Jason Riggle). [handout]
- 2009. Bane, Max and Ed King. Local Predictability in the Lexicon. 45th Annual Meeting of the Chicago Linguistic Society. [slides]
- 2009. Bane, Max. Grammatical Correlates of Cross-linguistic Frequency: Quantity-Insensitive Stress. U. Chicago Council on Advanced Studies Workshop on Language and Cognition. [slides]
- 2008. Bane, Max, Jason Riggle, James Kirby and John Sylak. Multilingual learning with parameter co-occurrence clustering. 39th Meeting of the North East Linguistic Society. [slides, handout]
- 2008. Bane, Max and Jason Riggle. Three correlates of the typological frequency of quantity-insensitive stress systems. The Tenth Workshop of the Association for Computational Linguistics' Special Interest Group in Morphology and Phonology. [slides]
- 2008. Riggle, Jason, Max Bane, James Kirby, John Sylak. Distinguishing Grammars in Multilingual Learning Using Parameter Co-occurrence. 82nd Annual Meeting of the Linguistic Society of America.
- 2007. Multilingual Learning as Parameter Co-occurrence Clustering. U. Chicago Council on Advanced Studies Workshop on Language and Cognition. [handout]
- 2007. Quantifying and Measuring Morphological Complexity. 26th West Coast Conference on Formal Linguistics. [slides]
- 2007. Riggle, Jason, Max Bane, James Kirby, Jeremy O'Brien. Efficiently Computing OT Typologies. 81st Annual Meeting of the Linguistic Society of America.
- Python module implementing Gale and Sampson's (1995/2001) "Simple Good Turing" method of frequency estimation/smoothing: Github project page.
- UK Big Brother, season 9 sociophonetics: cleaned-up minute-by-minute log of the season (2.7 MB), as written by some poor Sun reporters; a video (42 MB, mpeg4-encoded avi) of weighted pointwise mutual information between contestants' being mentioned together over time in the log; daily graphs of contestants' co-occurrence in the log.
- Syllable structure typological data: tds-syllprops.xml and tds-syllstructure.xml, some XML-structured results extracted from the Typological Database System at Utrecht.
- Quantity-insensitive stress stats: qistress-allStats.csv, CSV table of statistics used in Bane and Riggle (2008), Three correlates of the typological frequency of quantity-insensitive stress systems.
- Bible corpora: bible-corpora.tar.bz2 (49 MB). Plaintext translations of the bible into 24 languages, namely: Albanian, Croatian, Czech, Danish, Dutch, English, French, German, Greek, Haitian Creole, Hiligaynon, Hungarian, Icelandic, Italian, Latin, Maori, Norwegian, Portuguese, Romanian, Russian, Slovak, Spanish, Swahili, Swedish. Automatically collected, cleaned, and combined from www.biblegateway.com. For some languages only the New Testament was available.
- MRI images of my vocal tract!
- Praat-Py for OS X
Praat's built-in scripting language is idiosyncratic. Fortunately, Joshua Tauberer maintains Praat-Py, a patched version of Praat that supports Python, a much more widely known and used scripting language. Joshua only maintains built binaries for Windows and Linux, so I've taken some time to create a native Mac OS X build, which you can download here:
This version is based on Praat version 5.1.19 (21 October 2009), and version 0.7 of Joshua's patch. Just download, unzip, and copy Praat-Py.app to your Applications folder to install.
I should mention that I've only tested it on Intel-based Macs, since that's what I have available, but in theory it's a "universal binary" that should also work on any reasonably recent PowerPC-based Mac.
This LaTeX class provides an extension of the "article" document class that can be used to typeset papers conforming to the style sheet of the Cascadilla Proceedings Project, which is used by a number of linguistics conference proceedings (e.g., WCCFL). Suggestions, questions, and bug reports should be directed to Max Bane.
As of June 18, 2011, cascadilla.cls is now hosted on github.com, with its own project page, where you can download the most recent version, submit and read bug reports, follow changes to the source code, and even submit your own source code improvements.
cascadilla.cls is now also available on CTAN, the Comprehensive TeX Archive Network!
- TDS: the Typological Database System. Excellent, excellent resource for typological data.
- Jeff Heinz's beautiful stress pattern database.
- Linguistica, the automatic morphology induction system.
- The Chicago Phonology Laboratory.
- The Chicago Linguistic Society. Oldest, and still best, student-run linguistic society in the US.
- Workshop on Language, Cognition, and Computation. A recurring workshop at the U of C that I have occasionally helped organize and presented at.
- Alice Lemieux recently (Spring 2008) organized a very nice series of talks by local professors on "getting started in linguistic research". Copies of slides/handouts are available here.
- Computer Science Pizza Seminar. A good place to meet nerds.
- A map of Chicago coffeeshops, thanks to Mr. Sonderegger.