Gender can be guessed from a Tweet

Algorithms can do a lot these days. A group of researchers claims to have developed one that allows them to guess the gender of someone using Twitter.

According to Digital Trends, researchers with the MITRE Corporation have established a way to correctly guess gender by separating certain words in a Tweet. Twitter doesn’t gather gender information in its profiles, which makes it a good medium for testing the algorithm.


First, the team had to collect information on the location, description, profile name and real name of all Twitter users in the test group. The majority of the Twitter users in the sample have only posted once on the microblogging service. A first test was used to discover whether the algorithm could detect a person’s gender from the name, and the computer guessed correctly 89 percent of the time.


When the algorithm looked only at the content of an individual tweet, it was able to predict gender correctly about 66 percent of the time. By analyzing all of the tweets that make up a user’s account, the researchers were able to increase the accuracy of the algorithm to over 75 percent. Additional tests reached about 71 percent accuracy using just the description and 77 percent accuracy using just the screen name.


When all four fields were combined with the tweets, the computer had a 92 percent success rate.


The researchers also said that punctuation popped up frequently as an indicator of gender. The use of a smiley face or an exclamation point usually indicated that the user was female. Women were also more likely to use words like “love”, “cute”, “happy”, “mommy”, “sleep”, “school”, “baby”, “bed”, “chocolate” and “hate”, on top of Internet terms like “LOL” and “OMG”.


Interestingly, men were only linked to a few terms like “http” and “google”.


The analysis also showed gender lines for “possessive bigrams”, two-word phrases that start with “my” or “our”. Expressions credited to men were “my wife”, “my gf” and “my beer”. Women frequently said “my yogurt” and “my husband”.
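
For the curious, here is roughly what pulling possessive bigrams out of tweets could look like in Python. This is only a minimal sketch, not the researchers' actual code: the function names, the simple word-splitting rule and the sample tweets are all made up for illustration.

```python
import re
from collections import Counter

# Assumption: a "possessive bigram" is any word pair whose first word is "my" or "our".
POSSESSIVES = {"my", "our"}


def possessive_bigrams(text):
    """Return (possessive, next_word) pairs found in one tweet."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair each token with the one that follows it and keep the possessive pairs.
    return [(a, b) for a, b in zip(tokens, tokens[1:]) if a in POSSESSIVES]


def count_possessive_bigrams(tweets):
    """Count possessive bigrams across a collection of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(possessive_bigrams(tweet))
    return counts


# Hypothetical sample tweets, invented for this example.
sample_tweets = [
    "Out with my wife tonight",
    "Finished my yogurt, now time for bed",
    "Our team won and my beer is cold",
]
counts = count_possessive_bigrams(sample_tweets)
```

Counts like these, gathered separately for users of known gender, are the kind of signal a classifier could then weigh. How the researchers actually weighted them is not described here.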


The researchers also looked at phrases as indicators of political affiliation. Tweets involving yoga, vegetarians and the Los Angeles Lakers (huh?) most likely came from Democrats, while tweets about Wal-Mart, weapons and LSU probably came from Republicans.


In case you were wondering, there is a reason that all of this seemingly unimportant information was collected. The research group thinks that the algorithm could be used to target a specific group on Twitter. The project might help brands and businesses trying to market their services to the Twitter community.


The research paper that this project is based on is available online. Bet you never thought that Twitter could be taken so seriously.