Psycholinguistic Databases & Corpora

ARC Nonword Database

Working on a lexical decision task and need some weird nonword foils? And what if you need to impose specific orthotactic constraints on those nonword stimuli? Never fear. The bulk of the work has been done for you with the ARC Nonword Database. Link here.

ASL-LEX

This visually stunning and beautifully crafted database is from Professors Naomi Caselli, Zed Sevcikova Sehyr, Ariel Cohen-Goldberg, and Karen Emmorey. ASL-LEX provides lexical and phonological properties for about 1,000 signs of American Sign Language, including iconicity, frequency, and many other variables.

Calgary Semantic Decision Project and embodied cognition ratings

These category decision norms and embodiment ratings are from the awesome Penny Pexman's group. Link here. I'm a big fan of the group's work on both abstract words and sound symbolism.

concreteness ratings for 40k English Lemmas (Brysbaert et al., 2014)

Here's a mammoth set of word concreteness ratings from the great Marc Brysbaert and colleagues.  Visit here.

Corpus of Contemporary American English

The Corpus of Contemporary American English (COCA) is the largest freely available corpus of English. Visit here.

English Lexicon Project

Here's another bread-and-butter psycholinguistic database, from Professor David Balota at Washington University in St. Louis. This monster has trial-level naming and lexical decision data for zillions of English words. Visit the ELP here.

English noun imageability & phonology dataset  

This is a psycholinguistic database of phonological, lexical, and semantic attributes for a large set of English nouns (N = 2,877). I created it as part of my doctoral dissertation in 2005 [download database here] [article here].

Glasgow Psycholinguistic Norms (imageability, valence, etc.)

Normative ratings for 5,553 English words on nine psycholinguistic dimensions: arousal, valence, dominance, concreteness, imageability, familiarity, age of acquisition, semantic size, and gender association.  Link here.

MRC Psycholinguistic Database

Here's the queen mother of all psycholinguistic databases, from the MRC/CBU (Cambridge). Many of the measures are a bit dated at this point (e.g., the Kucera and Francis frequency norms), but the filtering features and the concreteness, familiarity, and other ratings make this site tough to beat. Click here to visit MRC.

perceptual and affective ratings for 750 abstract and concrete English nouns

Here's a spreadsheet with Mechanical Turk ratings (N > 350 people) on 15 different cognitive dimensions for 750 abstract and concrete English nouns. We recently published details of the scaling procedures in Frontiers in Human Neuroscience (see Troche et al., 2014; Crutch et al., 2013).

SubtLex American Word Frequency Database

Need frequency values for a list of your stimuli, based on a corpus of 50 million words? Dump your list into Professor Marc Brysbaert's database, and voilà. Link to this awesome psycholinguistic database here.
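If you'd rather do the lookup offline, here's a minimal Python sketch of the same idea: match a stimulus list against a downloaded frequency table. Everything specific in it is an assumption for illustration, including the function name lookup_frequencies, the column names "Word" and "Lg10WF" (check the header row of whatever file you actually download), and the tiny made-up table standing in for the real thing.

```python
import csv
import io


def lookup_frequencies(stimuli, table_file, word_col="Word", freq_col="Lg10WF"):
    """Return {stimulus: frequency or None} from a CSV frequency table.

    word_col and freq_col are assumed column names; adjust them to match
    the header row of the file you downloaded.
    """
    reader = csv.DictReader(table_file)
    # Index the table by lowercased word so matching ignores case.
    table = {row[word_col].lower(): float(row[freq_col]) for row in reader}
    # Stimuli missing from the table (e.g., nonwords) come back as None.
    return {word: table.get(word.lower()) for word in stimuli}


# Tiny invented table standing in for the real download (values are made up).
demo_table = io.StringIO("Word,Lg10WF\nbucket,2.5\ndog,4.0\n")
result = lookup_frequencies(["Bucket", "dog", "blick"], demo_table)
print(result)
```

With a real file, you would pass an open file handle (opened with newline="") instead of the in-memory demo table. Items that come back as None are worth eyeballing by hand, since they may be nonwords, misspellings, or inflected forms the table lists differently.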

taboo + common noun corpus

Here's one for the scientific annals (and yes, I did say annal). This is a corpus of 480 common English nouns (e.g., bucket) judged on how well they combine with extant profanity (e.g., cunt) to form novel profane compound words (e.g., cuntbucket). For each word, you will find its rating (judged by 21 people) as well as coding across a range of psycholinguistic variables. Download the file in CSV format here.

taboo single word prediction database

Here’s a database of 1,205 high-frequency English words coded across 22 psycholinguistic variables. The DV in our analyses was tabooness, but you can use this for whatever the hell you want. Download the CSV file here.

Urban Dictionary

Today’s featured entry is “back burner bitch,” or Triple B. It’s a friend who is your last resort for hanging out (but doesn’t know it). Urban Dictionary has zillions of these entries. We use the database extensively in our work on taboo word usage.