Monday, 26 June 2017

Naughty boys and sexy girls

21st Century = equality, right? Cathrine Norberg decided to investigate, using the New Model Corpus, a 100-million-word corpus of current English drawn from the web, to see if there were any differences in the usage of the words GIRL and BOY.

Norberg started by examining verbs and found that boys are frequently associated with physical activities. Jump, run and kick, for example, were mainly found with BOY. GIRL was principally used with ‘non-movement’ verbs, like sit and wait. Play was strongly linked to BOY, and the thinking verbs discover and understand were exclusive to BOY, suggesting that boys are represented as more curious and mentally active.

Girls were the objects of verbs far more often than boys. The violent verbs rape, traffick, abduct, assault, attack and steal were all exclusively found with GIRL, whereas the only similar verbs with BOY as the object were beat and drown, suggesting that girls are more frequently represented as victims.  GIRL was much more frequently the object of marry, indicating that the idea of girls being ‘given’ in marriage still persists. GIRL was often used as the object of date, love, and fuck, showing that females are seen as passive objects of male sexuality.

Norberg also analysed adjectives, dividing them into three groups:

      In physical appearance, female identity was closely connected to appearance, often with sexual undertones.  Attractive and sexy are more frequent with GIRL and the use of naked suggests that there is a stronger connection between nakedness and sexuality when it is a girl who is without clothes (‘I spot a naked young boy chasing a cow’ vs. ‘...viewing naked girls was a matter of sexual delight’).  Boys were discussed in terms of their physical size, with big also used in the sense of importance.  Girls were more often described as little but in the sense of sweetness and innocence (‘...that little girl is so cute’). Young was also used more with GIRL, although freshness, in terms of girls’ sexuality, was a more common meaning than age (‘I could...enjoy this young girl’s body for pleasure…’).  Old was used more with BOY, although rarely with reference to age; more commonly it had the sense of belonging to a ‘club’.  Like big, old indicates male importance (‘Study finds corporate old boys have positive impact on governance reform.’).

      In personal characteristics, naughty was used for both BOY and GIRL but with a difference, often including sexual aspects for girls (‘Can you make your skirt slit any higher, you naughty girl?’) and only conduct for boys (‘We’ve got a naughty boy in school.’). Norberg found a set of adjectives used more with GIRL that diverged from the traditional image of girls as agreeable and passive: words like tough, crazy and popular. However, once again, many of them alluded to sexuality (‘Edith, a very pretty blonde, was the popular girl, who received lots of valentines.’).

      In roles and social identity, Norberg found that boys were associated with activity (delivery boy) whereas girls were associated with sexuality (call girl).  Only GIRL occurred with single and unmarried, indicating that females are more likely to be referred to in terms of their marital status. This also happened with nationality words like Swiss and religious terms like Christian, suggesting that females are seen as ‘others’, becoming Muslim girls, whereas males were just Muslims.
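For readers curious about how this kind of comparison can be made, here is a minimal sketch in Python that counts the word immediately before BOY or GIRL in a piece of text. It is not Norberg's method: the function name `premodifiers` and the sample sentences are made up for illustration, and a real corpus study would use part-of-speech tagging to keep only adjectives rather than taking whatever word comes first.

```python
import re
from collections import Counter, defaultdict

# Map surface forms to the node word they belong to (singular and plural only).
NODE_FORMS = {"boy": "boy", "boys": "boy", "girl": "girl", "girls": "girl"}


def premodifiers(text):
    """Count the word directly preceding each occurrence of BOY or GIRL.

    Hypothetical helper: a crude stand-in for a proper collocation query,
    with no part-of-speech filtering.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    table = defaultdict(Counter)
    for previous, word in zip(tokens, tokens[1:]):
        node = NODE_FORMS.get(word)
        if node is not None:
            table[node][previous] += 1
    return table


sample = (
    "We've got a naughty boy in school. That little girl is so cute. "
    "Muslim girls and a delivery boy."
)
counts = premodifiers(sample)
for node in ("boy", "girl"):
    print(node.upper(), dict(counts[node]))
```

Run over a large corpus, tables like these are what make it possible to say that a word such as little or single turns up more with one noun than the other.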

So, our language reveals that we still think in gender-specific terms, sometimes to an alarming extent. Definitely not 21st century equality…


Norberg, Cathrine (2016). Naughty boys and sexy girls: The representation of young individuals in a web-based corpus of English. Journal of English Linguistics 44: 291-317.

doi: 10.1177/0075424216665672  


This summary was written by Gemma Stoyle

Saturday, 6 May 2017

OMG! Is texting wrecking our language?

How many times have you heard someone say that texting is ruining your spelling? Or, perhaps even more dramatically, the whole English language? It’s a point of debate that has only intensified with the advent of smartphones and permanent connection to the Internet wherever you go, but – as you may not be surprised to hear – linguists find that the situation is actually more complicated than that.

- See you l8er?

Sali Tagliamonte, along with various colleagues and students at the University of Toronto, carried out a study between 2009 and 2010, in which they collected almost 200,000 words of data. The data came from emails, texts and instant messenger conversations, which mostly came from Facebook – all forms of computer mediated communication, or CMC for short. Along with these samples, the students were also asked to provide a piece of formal written work, as a point of comparison. Tagliamonte chose to analyse three different linguistic features across the corpus: acronyms and other short forms, such as lol; intensifiers, such as really or literally; and future temporal reference, or more specifically, the use of go in the future tense. After checking their frequency across the four formats, Tagliamonte found some interesting results.
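To give a rough idea of what checking frequency across formats involves, here is a small Python sketch that counts a few features per 1,000 words in each format. Everything in it is an assumption made for illustration: the word lists, the sample texts and the function name `feature_rates` are not taken from the study, and simple word matching cannot tell intensifier so from other uses of so, which a real analysis would have to check by hand.

```python
import re
from collections import Counter

# Illustrative word lists only; not the coding scheme used in the study.
FEATURES = {
    "acronym": {"lol", "omg", "brb"},
    "intensifier": {"really", "so", "very", "literally"},
    "future": {"will", "gonna"},
}


def feature_rates(text, per=1000):
    """Return the rate of each feature per `per` words of `text`."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    rates = {}
    for name, words in FEATURES.items():
        hits = sum(counts[word] for word in words)
        rates[name] = per * hits / total if total else 0.0
    return rates


# Made-up samples standing in for two of the four formats.
samples = {
    "sms": "omg that was so good lol",
    "essay": "The play will be very long",
}
rates = {name: feature_rates(text) for name, text in samples.items()}
for name, result in rates.items():
    print(name, result)
```

Normalising by text length matters here because the formats differ a lot in size: an essay has far more words than a text message, so raw counts alone would be misleading.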

Unsurprisingly, the formal written work had no acronyms, and very few intensifiers; only very, out of all of them, showed up. As for the future tense, the only attested form was will – bog-standard formal English. So far, so boring. The interesting divisions appear when looking at the different types of CMC.

Emails were consistently the most formal, with writing coming in much larger chunks than in texts or messages, and very few acronyms or intensifiers. Then there was a tie between SMS (texts) and IM (instant messages), with more rapid exchange of turns, and far more acronyms and intensifiers. In particular, the intensifier so was very common in the SMS corpus, probably because this was back in the dark ages of character limits on text messages. On the other hand, going to as a temporal marker was actually less common than its future tense counterparts. This suggests that, contrary to Tagliamonte’s original prediction, the use of going to is actually more conservative than shortened forms such as i’ll and ima. The upshot of these observations, though, is that the students had a stable register for each form of writing, with the grammar remaining consistent within each medium.

Another interesting observation was the trajectory of the acronym lol. Evidence in this corpus shows it is still the most popular of all CMC acronyms, but its use has been found to have decreased before increasing again. By examining its placement in the sentence (whether it was used in the middle of a phrase or at the end), Tagliamonte found that it was usually used at the end of sentences and standing alone, suggesting it is now being used as filler.
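The placement check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lol_positions` and the four labels are invented here, and each message is treated as a single unit, whereas the study looked at position within the sentence or phrase.

```python
import re


def lol_positions(message):
    """Label each occurrence of lol in a message by where it sits.

    Hypothetical helper: returns a list with one label per occurrence,
    chosen from "alone", "initial", "medial" and "final".
    """
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    labels = []
    for index, token in enumerate(tokens):
        if token != "lol":
            continue
        if len(tokens) == 1:
            labels.append("alone")
        elif index == 0:
            labels.append("initial")
        elif index == len(tokens) - 1:
            labels.append("final")
        else:
            labels.append("medial")
    return labels


for message in ["lol", "omg Macbeth was so crazy lol", "that was lol so bad"]:
    print(repr(message), lol_positions(message))
```

Tallying these labels over a whole corpus would show whether lol clusters at the edges of messages, which is the pattern the filler interpretation rests on.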

Of course, the study still has some drawbacks. For one, as Tagliamonte notes, there was still a strong distinction between texts and instant messages back in 2010, as not everyone had smartphones with full access to the Internet. Now that everyone uses WhatsApp, iMessage, or whatever other messenger app you care to use, the distinction between texting and instant messaging has been blurred. If the study were to be repeated now, therefore, there would be almost no point in separating the two as different categories. There is also the issue of emojis; is it possible that the popular lol has now been overthrown by 😃? Regardless, however, it is fairly safe to say you won’t be writing “omg Macbeth was so crazy lol” in an English essay any time soon!


Tagliamonte, S. A. in collaboration with Dylan Uscher, Lawrence Kwok, and students from HUM199Y 2009 and 2010 (2016). So sick or so cool? The language of youth on the internet. Language in Society 45: 1-32.

doi: 10.1017/S0047404515000780


This summary was written by Marina Merryweather