Analyzing “I Do”
How many of you have ever checked out Google’s Ngram Viewer? Here’s a spin-off for any armchair demographers out there: WeddingCrunchers.com, a tool for analyzing the Weddings/Celebrations section of The New York Times. (Its creators have a similar tool for rap lyrics, called Rap Stats.)
If you’ve never heard of an n-gram before, it’s the name for a simple concept that packs a punch. A word n-gram is a sequence of n consecutive words from a text. Take the phrase “Speak truth to power.” This four-word statement has three 2-grams (Speak truth, truth to, to power) and two 3-grams (Speak truth to, truth to power). You can also form n-grams of letters or of other units of text. The math behind n-grams is really cool, drawing on combinatorics, and even the simple counting works out neatly: a phrase of m words contains m − n + 1 word n-grams, which is why four words yield 4 − 2 + 1 = 3 two-grams and 4 − 3 + 1 = 2 three-grams.
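To make the definition concrete, here is a minimal Python sketch that lists the word and letter n-grams of a phrase. The function names word_ngrams and char_ngrams are hypothetical, and the tokenization (treating runs of letters, digits, and apostrophes as words) is a simplifying assumption, not how Google or Wedding Crunchers actually split text.

```python
import re

def word_ngrams(text, n):
    """Return the word n-grams of text in order; runs of letters, digits, and apostrophes count as words."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Slide a window of n words across the text, one word at a time
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def char_ngrams(word, n):
    """Return the letter n-grams of a single word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

phrase = "Speak truth to power."
print(word_ngrams(phrase, 2))   # ['Speak truth', 'truth to', 'to power']
print(word_ngrams(phrase, 3))   # ['Speak truth to', 'truth to power']
print(char_ngrams("truth", 3))  # ['tru', 'rut', 'uth']
```

A phrase shorter than n words simply has no n-grams, so word_ngrams returns an empty list in that case.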
The tools highlighted above show the frequency of word n-grams (phrases one or several words long) by year in specific collections of text. Google lets you search for phrases across a wide variety of books going back centuries, Rap Stats pulls its data from rap lyrics starting in 1990, and Wedding Crunchers analyzes the Weddings/Celebrations section beginning in 1981. Because Weddings/Celebrations announcements are pretty formulaic, the Rap Genius Engineering Team explains, n-gram searches on Wedding Crunchers can reveal trends in what couples care to share. The site lets you search all Weddings/Celebrations announcements since 1981 for phrases you’re curious about, such as words indicating age, degrees, or hometown. You can search for several phrases at once and see how their use has changed over time.
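Under the hood, a tool like this has to count how often each searched phrase shows up in each year’s texts. The sketch below is a guess at the basic idea, not Wedding Crunchers’ actual code: the announcements are invented, and the helper names, lowercasing, and word-splitting rules are all assumptions. It splits a comma-separated query into phrases and tallies each phrase by year.

```python
import re
from collections import Counter, defaultdict

def word_ngrams(text, n):
    """Word n-grams of text, lowercased; runs of letters, digits, and apostrophes count as words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def counts_by_year(announcements, query):
    """For each comma-separated phrase in query, count its occurrences per year.

    announcements is a list of (year, text) pairs. N-grams are built within each
    announcement and may span sentence boundaries (a simplification).
    """
    # Normalize each phrase the same way the texts are tokenized, then drop blanks and duplicates
    phrases = [" ".join(re.findall(r"[a-z0-9']+", p.lower())) for p in query.split(",")]
    phrases = list(dict.fromkeys(p for p in phrases if p))
    result = {p: defaultdict(int) for p in phrases}
    for year, text in announcements:
        grams_by_n = {}  # n-gram counts for this announcement, cached by length n
        for phrase in phrases:
            n = len(phrase.split())
            if n not in grams_by_n:
                grams_by_n[n] = Counter(word_ngrams(text, n))
            result[phrase][year] += grams_by_n[n][phrase]
    return result

# Invented announcements, for illustration only
announcements = [
    (1981, "The bride is a lawyer. The bridegroom is a banker."),
    (1981, "The bride is a teacher."),
    (2010, "The bride is a lawyer. The bridegroom is a lawyer too."),
]
counts = counts_by_year(announcements, "lawyer, banker, the bride")
for phrase, by_year in counts.items():
    print(phrase, dict(by_year))
```

Caching the counts by n means an announcement is tokenized once per phrase length rather than once per phrase, which matters when a query mixes several one-word and multiword terms.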
Want to know how often your alma mater is mentioned compared to some rivals? Try entering college names, separated by commas, into the search box and compare their trend lines.
You could also search for multiword phrases (n-grams), such as job titles, to track occupations.
Searches such as “cotillion, debutante, ball, party” or “internet, social media” show changes in social customs and the arrival of new technology. The site also lets you browse searches other people have done through the “or just show me a good search” button. It does plenty of other great things, too.
Two questions you might have:
Why are these frequencies so low?
Remember, the chart shows the ratio of the n-gram you searched for to all n-grams of the same length in that year’s announcements. It doesn’t tell you what percentage of announcements contained the phrase; think of it instead as the share of all same-length phrases taken up by your phrase.
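Here is a rough Python sketch of that normalization, assuming the denominator is every n-gram of the same length in one year’s announcements. The announcements and function names are invented for illustration; the site’s exact tokenization and denominator may differ.

```python
import re
from collections import Counter

def word_ngrams(text, n):
    """Word n-grams of text, lowercased; runs of letters, digits, and apostrophes count as words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def relative_frequency(texts, phrase):
    """Occurrences of phrase divided by all n-grams of the same length in texts."""
    key = " ".join(re.findall(r"[a-z0-9']+", phrase.lower()))
    n = len(key.split())
    if n == 0:
        return 0.0
    grams = Counter()
    for text in texts:  # each announcement is split separately, so n-grams never span two of them
        grams.update(word_ngrams(text, n))
    total = sum(grams.values())
    return grams[key] / total if total else 0.0

# One invented year of announcements, for illustration only
year_1981 = ["The bride is a lawyer.", "The bridegroom is a banker."]
print(relative_frequency(year_1981, "lawyer"))     # 1 of 10 one-word n-grams: 0.1
print(relative_frequency(year_1981, "the bride"))  # 1 of 8 two-word n-grams: 0.125
```

In this toy year, “lawyer” appears in half of the announcements, yet its frequency is only 0.1, because the denominator counts every one-word n-gram rather than every announcement.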
What is “smoothing”?
Smoothing averages estimates over time to reduce “noise,” the peaks and valleys from year to year. A smoothing factor of 2 averages a given year’s result with the results for the 2 years before it and the 2 years after it, a five-year window in all. This keeps the trend line more stable, which can help you spot changes over longer periods. If you want to see the unsmoothed data, just choose a smoothing factor of 0.
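A minimal sketch of the smoothing described above, written as a centered moving average in Python. How the site handles the first and last years of a series isn’t stated, so this version simply averages over whichever neighboring years exist; that edge behavior, the function name, and the sample series are assumptions.

```python
def smooth(values, k):
    """Centered moving average: each point averaged with up to k neighbors on each side.

    Near the ends of the series the window is truncated to the years that exist
    (an assumption; the site may handle edges differently).
    """
    if k < 0:
        raise ValueError("smoothing factor must be non-negative")
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - k): i + k + 1]  # up to 2k + 1 years centered on year i
        smoothed.append(sum(window) / len(window))
    return smoothed

yearly = [0.0, 3.0, 0.0, 3.0, 0.0]  # a noisy, made-up series
print(smooth(yearly, 0))  # [0.0, 3.0, 0.0, 3.0, 0.0]
print(smooth(yearly, 2))  # [1.0, 1.5, 1.2, 1.5, 1.0]
```

With a factor of 0 the window is just the year itself, so the raw series comes back unchanged; with a factor of 2 the alternating spikes flatten into values between 1.0 and 1.5.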
Of course, this tool is subject to a lot of bias: apart from the “just show me a good search” searches, it lets users find only what they type into the search box, and it examines a fairly unrepresentative subset of couples. According to Rap Genius writer ATodd, this section of The Newspaper of Record “is a perfect natural experiment designed to answer the question: What do the world’s most self-important people think is important?” Self-important, social climbing, deliriously happy, a mirror of our times? Have fun with the searches and decide for yourself.