I am a PhD student in Linguistics and a research assistant in the Chicago Language Modeling Lab.

Teaching

Curriculum Vitae

Research Interests

  • Computational linguistics; quantitative typology and language change; variation; models of grammar and learning; complexity.

What's New

Papers

  • 2009/in press. Bane, Max, Jason Riggle, and Morgan Sonderegger. The VC dimension of constraint based grammars. Lingua. [preprint | paper]
    We analyze the complexity of Harmonic Grammar (HG), a linguistic model in which licit underlying-to-surface-form mappings are determined by optimization over weighted constraints. We show that the Vapnik-Chervonenkis Dimension of HG grammars with k constraints is k-1. This establishes a fundamental bound on the complexity of HG in terms of its capacity to classify sets of linguistic data that has significant ramifications for learnability. The VC dimension of HG is the same as that of Optimality Theory (OT), which is similar to HG, but uses ranked rather than weighted constraints in optimization. The parity of the VC dimension in these two models is somewhat surprising because OT defines finite classes of grammars (there are at most k! ways to rank k constraints) while HG can define infinite classes of grammars because the weights associated with constraints are real-valued. The parity is also surprising because HG permits groups of constraints that interact through so-called "gang effects" to generate languages that cannot be generated in OT. The fact that the VC dimension grows linearly with the number of constraints in both models means that, even in the worst case, the number of randomly chosen training samples needed to weight/rank a known set of constraints is a linear function of k. We conclude that though there may be factors that favor one model or the other, the complexity of learning weightings/rankings is not one of them.
  • 2009/in press. Bane, Max and Ed King. Local predictability in the lexicon. Proceedings of the 45th Annual Meeting of the Chicago Linguistic Society. [preprint]
    We present a general methodology of assessing how locally driven the lexical phonotactics of a language are, by comparing its attested lexical material to some hypotheses of what else that material "could have been" (here, permutations and edits), in terms of its local, bigram predictability (pointwise mutual information). In other words, we ask how "optimized" the lexicon is for local predictability within some space of possible lexica. We apply the method to seven languages and find that they exhibit interesting variation in the degree to which they are optimized in this way, but that every language is significantly more locally optimal than random; and furthermore, that the languages which appear least optimized in this sense are precisely those known to obey important nonlocal lexical tendencies (vowel harmony).
  • 2009/to appear. Bane, Max and Jason Riggle. Evaluating strict domination: The typological consequences of weighted constraints. Proceedings of the 45th Annual Meeting of the Chicago Linguistic Society. [preprint available soon]
  • 2009/in press. Bane, Max, Jason Riggle, James Kirby and John Sylak. Multilingual learning with parameter co-occurrence clustering. Proceedings of the 39th Meeting of the North East Linguistic Society. [preprint]
    The computational task of language learning has long been a central issue in theoretical linguistics, and most work has focused on its monolingual formulation, in which the learner's sample is drawn from a single target language. This paper considers a minimal extension of the usual monolingual formulation to accommodate the multilingual setting, and presents a novel strategy for discriminating and learning languages within it by clustering grammatical properties according to their co-occurrence in the sample. The heuristic that we propose is generic in the sense that it is applicable within any parameterized linguistic theory for which it is feasible to compute the possible parameter-settings implied by observing a single input-output mapping; for purposes of concreteness and evaluation, we present the algorithm within the framework of Optimality Theory, using syllable structure grammars as a case study.
  • 2008. Bane, Max and Jason Riggle. Three correlates of the typological frequency of quantity-insensitive stress systems. Proceedings of the Tenth Workshop of the Association for Computational Linguistics' Special Interest Group in Morphology and Phonology. Rutgers Optimality Archive #966. [preprint]
    We examine the typology of quantity-insensitive (QI) stress systems and ask to what extent an existing optimality theoretic model of QI stress (due to Gordon, 2002) can predict the observed typological frequencies of stress patterns. We find three significant correlates of pattern attestation and frequency: the trigram entropy of a pattern, the degree to which it is "confusable" with other patterns predicted by the model, and the number of constraint rankings that specify the pattern.
  • 2008. Bane, Max. Quantifying and Measuring Morphological Complexity. Proceedings of the 26th West Coast Conference on Formal Linguistics, 67-76. [paper]
    It is a standard assumption in linguistics that all human languages are equally (and enormously) complex; when looked at as a whole, no language can be called "simpler" than another. Certainly, languages can differ in the distribution of their complexity, so that one might employ a richer inflectional system, or entertain a more complicated gamut of syllable shapes than another, but it is generally supposed that these differences must "even out" as one considers entire linguistic systems. A number of researchers have recently begun to approach this equal complexity hypothesis as an empirical claim to be tested under particular definitions of complexity. Perhaps the most famous recent example is McWhorter's (2001) controversial claim that "creole grammars are the world's simplest grammars," but see also Juola (1998), Shosted (2006), Nichols (2007), and Pellegrino (2007). This paper argues for an information theoretic approach to defining linguistic complexity and offers preliminary results for a novel method of using the mathematical notion of Kolmogorov complexity together with an automatic lemmatizer to construct a numerical metric of morphological complexity.
  • 2007. Jason Riggle, Edward King, James Kirby, Max Bane, Heather Rivers, Evelyn Rosas, John Sylak. Erculator: A Web Application for Constraint-Based Phonology. In University of Massachusetts Occasional Papers in Optimality Theory 36: Papers in Theoretical and Computational Phonology. [preprint]
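
As a rough illustration of the bigram predictability measure described in the "Local predictability in the lexicon" abstract above, the following Python sketch computes the average pointwise mutual information of adjacent segments in a word list and builds a permuted comparison lexicon. It is not the code used in the paper: treating each character as one segment, padding words with a "#" boundary symbol, and the function names mean_bigram_pmi and shuffled_lexicon are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def mean_bigram_pmi(words):
    """Average PMI (in bits) over all adjacent segment pairs in the lexicon.

    Assumption: each character is one segment, and "#" marks word edges.
    """
    pairs = Counter()
    for word in words:
        segments = ["#"] + list(word) + ["#"]
        pairs.update(zip(segments, segments[1:]))

    total = sum(pairs.values())
    first = Counter()   # how often a segment occurs as the left member
    second = Counter()  # how often a segment occurs as the right member
    for (a, b), n in pairs.items():
        first[a] += n
        second[b] += n

    # PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ), weighted by p(a, b).
    return sum(
        (n / total) * math.log2(n * total / (first[a] * second[b]))
        for (a, b), n in pairs.items()
    )


def shuffled_lexicon(words, rng):
    """One hypothetical 'could have been' lexicon: permute segments within each word."""
    shuffled = []
    for word in words:
        segments = list(word)
        rng.shuffle(segments)
        shuffled.append("".join(segments))
    return shuffled


if __name__ == "__main__":
    lexicon = ["banana", "bandana", "cabana", "nab", "ban"]  # toy data
    rng = random.Random(0)
    print("attested:", mean_bigram_pmi(lexicon))
    print("permuted:", mean_bigram_pmi(shuffled_lexicon(lexicon, rng)))
```

Comparing the attested score against the scores of many permuted lexica gives a sense of how far the attested lexicon sits from chance under this particular measure.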

Theses

  • 2009. Bane, Max. Predicting usage with a competence grammar: Variation in the English dative alternation. Qualifying Paper, Dept. of Linguistics, University of Chicago. Advised by Jason Riggle and Karlos Arregi.
  • 2008. Bane, Max. Modeling the typology of quantity-insensitive stress systems. Qualifying Paper, Dept. of Linguistics, University of Chicago. Advised by Jason Riggle and Alan Yu.

Edited Volumes

  • 2009/in press. Bane, Max, Juan Bueno, Tommy Grano, April Grotberg, Yaron McNabb (eds.). Proceedings of the Chicago Linguistic Society 44, vols. 1 and 2. Chicago: The Chicago Linguistic Society.

Presentations

  • 2009. Bane, Max. Longitudinal Phonetic Variation in a Closed System. U. Chicago Council on Advanced Studies Workshop on Language, Cognition, and Computation. [slides]
  • 2009. Bane, Max and Jason Riggle. Evaluating Strict Domination: The Typological Consequences of Weighted Constraints. 45th Annual Meeting of the Chicago Linguistic Society. (Delivered by Jason Riggle). [handout]
  • 2009. Bane, Max and Ed King. Local Predictability in the Lexicon. 45th Annual Meeting of the Chicago Linguistic Society. [slides]
  • 2009. Bane, Max. Grammatical Correlates of Cross-linguistic Frequency: Quantity-Insensitive Stress. U. Chicago Council on Advanced Studies Workshop on Language and Cognition. [slides]
  • 2008. Bane, Max, Jason Riggle, James Kirby and John Sylak. Multilingual learning with parameter co-occurrence clustering. 39th Meeting of the North East Linguistic Society. [slides, handout]
  • 2008. Bane, Max and Jason Riggle. Three correlates of the typological frequency of quantity-insensitive stress systems. The Tenth Workshop of the Association for Computational Linguistics' Special Interest Group in Morphology and Phonology. [slides]
  • 2008. Riggle, Jason, Max Bane, James Kirby, John Sylak. Distinguishing Grammars in Multilingual Learning Using Parameter Co-occurrence. 82nd Annual Meeting of the Linguistic Society of America.
  • 2007. Multilingual Learning as Parameter Co-occurrence Clustering. U. Chicago Council on Advanced Studies Workshop on Language and Cognition. [handout]
  • 2007. Quantifying and Measuring Morphological Complexity. 26th West Coast Conference on Formal Linguistics. [slides]
  • 2007. Riggle, Jason, Max Bane, James Kirby, Jeremy O'Brien. Efficiently Computing OT Typologies. 81st Annual Meeting of the Linguistic Society of America.

Data and Source Code

  • Python module implementing Gale and Sampson's (1995/2001) "Simple Good Turing" method of frequency estimation/smoothing: sgt.txt. Download and rename to "sgt.py".
  • UK Big Brother, season 9 sociophonetics: cleaned-up minute-by-minute log of the season (2.7 MB), as written by some poor Sun reporters; a video (42 MB, mpeg4-encoded avi) of weighted pointwise mutual information between contestants' being mentioned together over time in the log; daily graphs of contestants' co-occurrence in the log.
  • Syllable structure typological data: tds-syllprops.xml and tds-syllstructure.xml, some XML-structured results extracted from the Typological Database System at Utrecht.
  • Quantity-insensitive stress stats: qistress-allStats.csv, CSV table of statistics used in Bane and Riggle (2008), Three correlates of the typological frequency of quantity-insensitive stress systems.
  • Bible corpora: bible-corpora.tar.bz2 (49 MB). Plaintext translations of the bible into 24 languages, namely: Albanian, Croatian, Czech, Danish, Dutch, English, French, German, Greek, Haitian Creole, Hiligaynon, Hungarian, Icelandic, Italian, Latin, Maori, Norwegian, Portuguese, Romanian, Russian, Slovak, Spanish, Swahili, Swedish. Automatically collected, cleaned, and combined from www.biblegateway.com. For some languages only the New Testament was available.
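
For orientation, the basic (unsmoothed) Turing estimate that the Simple Good-Turing method builds on can be sketched in a few lines of Python. This is not an excerpt from sgt.py, and the function name turing_estimates is hypothetical. The full method additionally smooths the frequency-of-frequency counts with a log-linear regression, which is omitted here.

```python
from collections import Counter


def turing_estimates(counts):
    """Unsmoothed Turing estimates from a mapping of item -> observed count.

    Returns (p_unseen, adjusted), where p_unseen = N_1 / N is the total
    probability mass reserved for unseen items, and adjusted[r] is the
    adjusted count r* = (r + 1) * N_{r+1} / N_r for each observed count r
    whose successor count r + 1 was also observed.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one observation")

    freq_of_freq = Counter(counts.values())  # N_r: number of items seen r times
    p_unseen = freq_of_freq[1] / total

    adjusted = {}
    for r in sorted(freq_of_freq):
        n_next = freq_of_freq.get(r + 1, 0)
        if n_next:  # the raw estimate is unusable when N_{r+1} = 0
            adjusted[r] = (r + 1) * n_next / freq_of_freq[r]
    return p_unseen, adjusted


if __name__ == "__main__":
    toy_counts = Counter("the cat and the dog and the bird".split())
    print(turing_estimates(toy_counts))
```

The gaps left where N_{r+1} = 0 are the reason the raw estimate is rarely used on its own, and they are what the regression step in Simple Good-Turing is meant to address.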

Other

  • Praat-Py for OS X

    Praat's built-in scripting language is funky. Thankfully, Joshua Tauberer maintains Praat-Py, a patched version of Praat that supports Python, a much more widely known and used scripting language. Joshua only maintains built binaries for Windows and Linux, so I've taken some time to create a native Mac OS X build, which you can download here:

    Download Praat-Py-5119-OSX.zip.

    This version is based on Praat version 5.1.19 (21 October 2009), and version 0.7 of Joshua's patch. Just download, unzip, and copy Praat-Py.app to your Applications folder to install.

    I should mention that I've only tested it on Intel-based Macs, since that's what I have available. In theory, though, it's a "universal binary" that should also work on any reasonably recent PowerPC-based Mac.

  • cascadilla.cls:

    This LaTeX class provides an extension of the "article" document class that can be used to typeset papers conforming to the style sheet of the Cascadilla Proceedings Project, which is used by a number of linguistics conference proceedings (e.g., WCCFL). Suggestions, questions, and bug reports should be directed to Max Bane.

    The current version is 1.4, and is distributed as a zip file containing the class file (cascadilla.cls), a BibTeX style file (cascadilla.bst), and an inline-documented example paper (example.tex, exampleref.bib, example.pdf), which demonstrates the class's use and output.

    Download cascadilla-14.zip here.

    Thanks to Jonathan Brindle for pointing out, and helping to resolve, some glitches in previous versions.

    Version History:

    • Version 1.4
      • Updated the appearance of citations and references to be more in line with the Cascadilla stylesheet.
    • Version 1.3
      • Put in some missing \selectfont's
      • Adjusted \abovecaptionskip and \belowcaptionskip for use with \centering rather than the center environment.
    • Version 1.2
      • Restored blank space between title and author
    • Version 1.1
      • Made title matter optional
      • Added notimes option
      • Added additional blank line after title
      • Section labels end with a space rather than a quad
      • Made figure/table captions bold
      • Added "immediate" subsection commands for proper spacing of sub-headings that immediately follow super-headings.
    • Version 1.0
      • Initial release


Links