Programmatically pluralizing English words is tricky. There are lots of special cases and rules, and exceptions to those rules. Many have tried before, and many will try after, but English will remain organic and intractable. Nevertheless, I think this attempt is valuable for several reasons:
- It's in JavaScript, where there are few good inflectors.
- It's better researched than the others I've looked at.
- It comes with a unit test suite documenting the edge cases.
- It's extensible.
- It has an LGPL license, so you can use it anywhere.
Download
- pluralize.js
- pluralize.zip (includes license and tests)
Or "try before you buy" (not that there's anything to buy) on this live test page:
Documentation
The library provides one function: pluralize(). Here's its signature in pseudo-code:
String pluralize(String noun, optional Integer count, optional String plural);
It exists in the owl namespace, and you usually use it like so:
owl.pluralize("baby"); // returns "babies"
If you're not sure if you want to pluralize or not, you can also pass in a count. If the count is one, the singular form will be returned, otherwise the plural is returned.
function deleteMessage(n) {
    return "deleted " + n + " " + owl.pluralize("record", n);
}
deleteMessage(0); // "deleted 0 records"
deleteMessage(1); // "deleted 1 record"
deleteMessage(42); // "deleted 42 records"
The library detects and preserves the case of the word passed in:
owl.pluralize("alumnus") == "alumni";
owl.pluralize("Alumnus") == "Alumni";
However, all-uppercase words get a lowercase "s", because this is correct for initials:
owl.pluralize("IBM") == "IBMs"
Basically, it assumes that you're using normal English casing rules. If you're trying to build an all-uppercase message, you should call toUpperCase() after pluralize().
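The casing behavior described above can be thought of as a post-processing step. Here's a minimal sketch, assuming the plural is first computed from the lowercased word and then re-cased to match the input; restoreCase is a hypothetical helper for illustration, not a function the owl library exports.

```javascript
// Hypothetical helper, not part of the owl library. Assumes `plural` was
// computed from word.toLowerCase() and re-applies the casing of `word`.
function restoreCase(word, plural) {
  if (!word) return plural;
  const isUpper = (s) => s === s.toUpperCase() && s !== s.toLowerCase();

  if (isUpper(word)) {
    // All-uppercase input, e.g. initials like "IBM": uppercase the letters
    // the plural shares with the input and keep the added ending lowercase.
    let i = 0;
    while (i < word.length && i < plural.length &&
           plural[i] === word[i].toLowerCase()) {
      i++;
    }
    return plural.slice(0, i).toUpperCase() + plural.slice(i);
  }

  if (isUpper(word[0])) {
    // Capitalized input such as "Alumnus": capitalize the plural's first letter.
    return plural[0].toUpperCase() + plural.slice(1);
  }

  // Any other input (e.g. all lowercase): return the plural as computed.
  return plural;
}

console.log(restoreCase("IBM", "ibms"));       // "IBMs"
console.log(restoreCase("Alumnus", "alumni")); // "Alumni"
console.log(restoreCase("alumnus", "alumni")); // "alumni"
```

One consequence of this sketch is that an all-uppercase "ALUMNUS" comes out as "ALUMNi", which is exactly why you'd call toUpperCase() afterwards when you really want shouting output.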
Sometimes the library also uses casing as a hint. Nationalities and languages ending in "ese" (such as "Chinese") should always be capitalized, so a capitalized "-ese" word is left unchanged, while a lowercase one is treated as an ordinary noun:
owl.pluralize("chinese") == "chineses";
// this is wrong, but so was the input. GIGO.
owl.pluralize("Chinese") == "Chinese";
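One way to implement this hint is to let an "-ese" rule fire only on capitalized words. The sketch below assumes a plain "+s" fallback for everything else; pluralizeEse is a hypothetical name, and the library's real rule set is much larger.

```javascript
// Hypothetical rule, not the owl library's source: a capitalized word ending
// in "ese" is treated as a nationality or language and left unchanged.
function pluralizeEse(word) {
  if (/^[A-Z].*ese$/.test(word)) {
    return word;       // "Chinese" -> "Chinese", "Japanese" -> "Japanese"
  }
  return word + "s";   // fallback assumed here: "chinese" -> "chineses"
}

console.log(pluralizeEse("Chinese")); // "Chinese"
console.log(pluralizeEse("cheese"));  // "cheeses"
```

Keying the rule on case also keeps an ordinary noun like "cheese" regular. The trade-off, in this sketch at least, is that a sentence-initial "Cheese" would be left unchanged.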
Not every word will be pluralized correctly by default, of course, and there are two ways to deal with that. The first is simply to pass the plural into pluralize() as the third argument. Obviously, this is only useful if you're also using the count feature.
owl.pluralize("emacs", 1, "emacsen"); // "emacs"
owl.pluralize("emacs", 42, "emacsen"); // "emacsen"
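Here's a minimal sketch of how the count and the explicit plural might interact, assuming the default plural comes from a separate rule engine. The names pluralizeWithCount and defaultPlural are hypothetical, and defaultPlural is a deliberately naive stand-in, not the owl library's rules.

```javascript
// Naive stand-in for the library's rule engine (assumption for this sketch).
function defaultPlural(noun) {
  return /(s|x|z|ch|sh)$/.test(noun) ? noun + "es" : noun + "s";
}

// Hypothetical sketch of the pluralize(noun, count, plural) contract:
// a count of exactly 1 keeps the singular; any other count, or no count
// at all, yields the plural, preferring a caller-supplied one.
function pluralizeWithCount(noun, count, plural) {
  if (count === 1) return noun;
  return plural !== undefined ? plural : defaultPlural(noun);
}

console.log(pluralizeWithCount("emacs", 1, "emacsen"));  // "emacs"
console.log(pluralizeWithCount("emacs", 42, "emacsen")); // "emacsen"
console.log(pluralizeWithCount("emacs", 42));            // "emacses"
console.log(pluralizeWithCount("record", 0));            // "records"
```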
The second follows the open/closed principle: the library provides a way to extend its behavior without modifying the code, by letting you define your own custom plurals.
owl.pluralize("emacs") == "emacses"; // lacks geek cred.
owl.pluralize.define("emacs", "emacsen");
owl.pluralize("emacs") == "emacsen"; // now it's geek-awesome.
From then on, that word will always be pluralized as defined. There's no way to undefine a plural.
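A custom-plural registry like this can be modeled as a lookup table that is consulted before any rule. The sketch below assumes that design; definePlural, pluralizeCustom, and the naive fallback rule are hypothetical names, not the owl library's internals. Like the library, it offers no way to undefine an entry.

```javascript
// Hypothetical registry of user-defined plurals, keyed by lowercase singular.
const customPlurals = new Map();

function definePlural(singular, plural) {
  customPlurals.set(singular.toLowerCase(), plural);
}

// Naive fallback rule, assumed for this sketch only.
function rulePlural(noun) {
  return /(s|x|z|ch|sh)$/.test(noun) ? noun + "es" : noun + "s";
}

function pluralizeCustom(noun) {
  const custom = customPlurals.get(noun.toLowerCase());
  // A registered plural wins over every built-in rule.
  return custom !== undefined ? custom : rulePlural(noun);
}

console.log(pluralizeCustom("emacs")); // "emacses"
definePlural("emacs", "emacsen");
console.log(pluralizeCustom("emacs")); // "emacsen"
```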
Octopuses and Dwarfs
A couple of choices made by the library are the targets of popular misconception and urban myth and require justification.
The first is "octopus:"
owl.pluralize("octopus") == "octopuses";
No, not "octopi." This young lady explains it rather well:
What's true of "octopus" is true for many other words ending in "us" that do not come from Latin originally. That means I can't add a general rule, and since the anglicized plural of octopus in particular is preferred to fake Latin, there's no reason to add a special case either.
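To see why a blanket rule fails, compare a naive "-us" to "-i" rule with an explicit list of the classical plurals worth keeping. This is an illustrative sketch only; the function names and the one-entry list are hypothetical.

```javascript
// Naive rule: treat every "-us" word as Latin. Shown only to illustrate the problem.
function naiveUsPlural(noun) {
  return noun.replace(/us$/, "i");
}

// Safer approach for nouns ending in "us": default to the anglicized "-uses"
// plural and keep the classical "-i" form only for listed words.
const classicalUs = new Set(["alumnus"]);

function usPlural(noun) {
  return classicalUs.has(noun) ? noun.replace(/us$/, "i") : noun + "es";
}

for (const word of ["alumnus", "octopus", "bus", "virus"]) {
  console.log(word, naiveUsPlural(word), usPlural(word));
}
// alumnus alumni alumni
// octopus octopi octopuses
// bus bi buses
// virus viri viruses
```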
A rather more sensitive example is "dwarf." Dwarfs are people having the medical condition of dwarfism. Dwarves are fantasy creatures that live under mountains... at least according to Tolkien. The Disney movie, on the other hand, is called "Snow White and the Seven Dwarfs." So this library takes the high road and uses "dwarfs."
Edge cases like this forced me to define a philosophy of pluralization. This library attempts to maximize the likelihood of "acceptable" plurals, which means it uses regular, anglicized plurals whenever possible. If the anglicized plural is common and acceptable, no special case is added (e.g. "octopuses"). Only when the anglicized plural is considered wrong, ugly, or ignorant does it use an irregular form; for example, "criterion" is pluralized as "criteria." The idea is to minimize the chance of a really bad plural showing up in your content.
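This philosophy implies an order of precedence: a short table of irregular forms that override everything, then a few regular suffix rules, then the plain "+s" default. The sketch below is a deliberately tiny illustration of that ordering; the tables and pluralizeSketch are hypothetical and far smaller than anything the library actually ships.

```javascript
// Irregular forms, used only where the anglicized plural would look wrong.
const irregular = new Map([
  ["criterion", "criteria"],
  ["alumnus", "alumni"],
]);

// Regular suffix rules, tried in order; the first match wins.
const suffixRules = [
  [/([^aeiou])y$/, "$1ies"],   // baby -> babies (but day -> days)
  [/(s|x|z|ch|sh)$/, "$1es"],  // octopus -> octopuses, box -> boxes
];

// Case handling is omitted here: the input is simply lowercased.
function pluralizeSketch(noun) {
  const word = noun.toLowerCase();
  if (irregular.has(word)) return irregular.get(word);
  for (const [pattern, replacement] of suffixRules) {
    if (pattern.test(word)) return word.replace(pattern, replacement);
  }
  return word + "s";           // default: dwarf -> dwarfs
}

console.log(pluralizeSketch("octopus"));   // "octopuses"
console.log(pluralizeSketch("criterion")); // "criteria"
```

Because "octopus" is absent from the irregular table, it falls through to a suffix rule and gets the anglicized "octopuses", which is the outcome the philosophy asks for.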
And of course, if you're building a website about the sea life of Middle Earth, go ahead and register "octopi" and "dwarves" as custom plurals.
Prior Art
This library is largely based on Damian Conway's pluralization algorithm. However, I only handle nouns, including pronouns; I don't handle verbs, adjectives, or phrases. I've also added a large number of special cases beyond the tables he provides, and I've used anglicized plurals in most places, keeping classical plurals only for well-known examples such as "alumni".
I looked at Ruby on Rails' Inflector first, but didn't get much from it. It's a surprisingly sophomoric effort considering how central it is to the framework (by default, Rails uses the inflector to map singular model class names to plural table names). Damian's algorithm, even implemented halfway as I did, is much better.
- Oran Looney July 31st 2010