a little hint of jitter


I'm fond of this simple little facet_wrap scatterplot. I think it's much more informative than a bar chart or boxplot.  The y-axes are free to scale independently in each panel (scales="free_y"). I also passed a custom color vector (coltry) to ggplot via scale_fill_manual.

ggplot(scat.melt, aes(variable, value, shape = variable, fill = variable)) +
  # horizontal jitter only (height = 0), semi-transparent filled circles
  geom_point(shape = 21, color = "black", size = 2.3, alpha = 0.6,
             position = position_jitter(width = 0.17, height = 0)) +
  scale_fill_manual(values = coltry) +
  facet_wrap(~Condition, scales = "free_y") +
  theme_bw() +
  theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank(),
        panel.background = element_blank()) +
  ylab("Pupil Dilation (mm)")



A simple, straightforward dendrogram


Scrunch the branches of a cluster dendrogram:
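library(dendextend)  # provides set(), the %>% pipe, and cutree() for dendrograms
t1.kp <- as.dendrogram(t1.clust)
cutree(t1.kp, k = 7)  # print cluster membership for a 7-cluster cut
# cust.r.7 holds the custom colors used for the 7 clusters
t1.test <- t1.kp %>%
  set("labels_cex", 0.75) %>%
  set("labels_col", k = 7, value = cust.r.7) %>%
  set("branches_lwd", 1) %>%
  set("branches_k_color", k = 7, value = cust.r.7)
par(mar = c(12, 0.5, 0.5, 1))  # wide bottom margin so the labels fit
plot(t1.test, horiz = FALSE, axes = FALSE)  # plot() returns nothing worth assigning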


Scatterplot with a gam smoothing function


Here's a clean scatterplot of naming ability vs. global cognition in a group of neurotypical older adults.  The R Script is here.  This was relatively straightforward using the 'pretty' function (which comes from base R rather than ggplot2 itself) to automagically scale the x-axis (Montreal Cognitive Assessment Score).  I manually scaled the y-axis and changed the default colors of the points, fill, and trendline.
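For anyone who wants the gist without opening the script, here's a minimal sketch of that kind of plot. The data are simulated and the column names, colors, and y-axis limits are my own placeholders, not the values from the real script. The gam smoother in geom_smooth relies on the mgcv package.

```r
library(ggplot2)  # geom_smooth(method = "gam") also needs mgcv installed

# Simulated stand-in data (hypothetical; not the study data)
set.seed(1)
moca   <- sample(18:30, 60, replace = TRUE)     # made-up MoCA scores
naming <- 20 + 1.2 * moca + rnorm(60, sd = 4)   # made-up naming scores
dat    <- data.frame(moca, naming)

p <- ggplot(dat, aes(moca, naming)) +
  geom_point(shape = 21, fill = "steelblue", color = "black", size = 2.5) +
  # gam trendline; k = 5 keeps the smooth simple for a narrow range of x values
  geom_smooth(method = "gam", formula = y ~ s(x, k = 5),
              color = "firebrick", fill = "grey80") +
  scale_x_continuous(breaks = pretty(dat$moca)) +                    # automatic ticks
  scale_y_continuous(limits = c(20, 80), breaks = seq(20, 80, 10)) + # manual ticks
  labs(x = "Montreal Cognitive Assessment Score", y = "Naming Score") +
  theme_bw()
print(p)
```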





Building a multiplot correlation matrix

Sometimes I take the easy way out and export plots out of R to do finer-grained aesthetics in Adobe Illustrator.  Mock me if you must.  I just built up a multi-panel correlation matrix using four separate 'corrplot' functions.  Step 1: create raw correlation matrices.  They look ugly right out of R -- like the matrix on the bottom left.  I exported this into Illustrator and there are 2 key steps (embed and ungroup).  These will allow you to edit the correlation matrix, change parameters, etc.    Eventually, you can build the plot up to a nice(r) looking matrix like the one below.


facet multiplot of semantic scale ratings for bilingual spanish-english vs monolingual english speakers using a custom theme in ggplot2

Here are plots for color, sound, etc for words rated by two groups (bilingual/monolingual). This uses a custom theme I just finagled. It's sort of pretty in how minimal it is.  Here's the script.

The theme part  is:  theme(legend.title=element_blank(), axis.line = element_line(colour = "black"), panel.background = element_blank(), panel.grid.major = element_line(colour = "gray91", size=0.1))


Interpolating & smoothing a continuous time series of pupillary dilation with switch events annotated


Here's one minute of continuous recording of pupillary dilation as the participant hears tones that shift in frequency at the markers. This nicely illustrates pupil spikes when change occurs.  Courtesy of Ally Dworetsky - our summer intern phenom. Script here -- includes interpolation, smoothing, and plotting parameters.


Adding just the faintest bit of jitter to a scatterplot

Boxplots can really misrepresent data. I'm working on using scatterplots like this one more often. The trick with scattering a categorical variable (see x-axis) versus a continuous variable (y-axis difference score) is that the points often have so much overlap that it's very difficult to tell what's going on.  Here's a plot where I added just the faintest whiff of jitter in the horizontal plane and changed the point opacity. Here's how.


behold a tanglegram

This is called a tanglegram. It contrasts two hierarchical cluster dendrograms. In this case, the dendrograms represent English and Spanish clusters generated for translation equivalents among bilinguals (N=20) when rating the same set of words on color, size, emotion, distance, sound.  The 'tangles' show how meaning "remaps" when switching between languages.

I generated the clusters using K-means partitioning and colored the branches of the dendrogram by clusters. Here's the script.  Here are the data.

I'm amazed at how smart the people who developed the dendextend package are.   The R community in general is so incredibly helpful.  This plot took a LONG time to figure out, but it was worth it.



Human pupillary response functions to "dirty" words

Here's the result of a time series analysis reflecting dilation of the pupil for a sample of 21 adults as they heard neutral words, technical terms for body parts, or profanity. There was some slight funkiness with this in terms of plotting a range and getting ggplot to recognize custom colors.

R-script here


Correlogram of Ratings and Reaction Times to English Profanity

Here's a correlogram. This is simply a visual depiction of a correlation matrix. These are Pearson correlations. The variables are ordered by similarity using the hclust ordering option (order="hclust") of the corrplot package. This was a little funky because I didn't like the built-in color scale (1 was blue), so I reversed it by manually passing a new color palette.  Here's the R script.
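A rough sketch of the idea, assuming the corrplot package is installed. It uses R's built-in mtcars data as a stand-in for the ratings, and the palette is just one I picked for illustration. corrplot spreads the colors across the -1 to 1 range in order, so the first color lands on -1 and the last on +1.

```r
library(corrplot)

# Stand-in data: built-in mtcars, not the profanity ratings
M <- cor(mtcars[, 1:7], method = "pearson")

# Hypothetical palette running from blue (-1) through white (0) to red (+1)
pal <- colorRampPalette(c("blue", "white", "red"))(200)

# order = "hclust" reorders variables so similar ones sit next to each other
corrplot(M, method = "color", order = "hclust", col = pal, tl.col = "black")
```

To flip the direction of any palette, wrap it in rev().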


Histogram of Common Noun Ratings as Candidates for Novel English Profanity

Here's a fun little histogram.  This reflects counts for the distribution of Likert-scale ratings (x-axis) for 21 adults who judged whether a common noun combines well with existing English profanity to form a novel emergent profane term.  I changed the bin width here and specified counts on the Y-axis. Here are the data and the script
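Here's roughly what the bin width and count settings look like in ggplot2. This is a sketch on simulated ratings; the 7-point scale, colors, and labels are assumptions of mine rather than details from the real data.

```r
library(ggplot2)

# Simulated ratings on a hypothetical 7-point Likert scale
set.seed(2)
ratings <- data.frame(rating = sample(1:7, 200, replace = TRUE))

p <- ggplot(ratings, aes(rating)) +
  # binwidth = 1 gives one bar per scale point; the y-axis shows counts by default
  geom_histogram(binwidth = 1, fill = "grey70", color = "black") +
  scale_x_continuous(breaks = 1:7) +
  labs(x = "Likert rating", y = "Count") +
  theme_bw()
print(p)
```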


Simple X-Y Scatter

Noun Imageability and Concreteness values from the MRC Psycholinguistic Database

Annotated R code here


3d Scatterplot using the rgl package in R

Here's a 3d plot representing how the meanings of abstract and concrete nouns cluster in a semantic space bounded by three dimensions.  

I used the rgl package in R.  It's pretty neat.  Once the plot is generated it allows the user to rotate to an optimal plane. 


Annotated R code here


The fancy scatterplot above gets even fancier


I messed with the aesthetics of the fancy(ish) scatterplot above.  GGPlot uses themes to alter elements of the plot.  Here's the same plot in a half box (only X-Y axes appear) with the major gridlines resurrected.  

Annotated R code here


Using the facet wrap function for multiple plots


Here we have multiple plots.  ggplot2 uses the facet_wrap function to arrange plots this way. It breaks the data into subplots based on the factor a user specifies (in this case language).  These data are from a study we are on the verge of submitting. People made forced-choice guesses about whether aurally presented words in unfamiliar languages (e.g., Arabic, Dutch, Hebrew, Hindi, Korean, Russian) represented abstract or concrete concepts.  Most people were remarkably above chance even after we eliminated cognates from the mix. The shaded rectangle represents a range of approximate chance responding.
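The faceting plus the shaded chance band can be sketched like this. The accuracies are simulated, and the band limits (0.45 to 0.55) are placeholders I chose for illustration, not the range used in the study.

```r
library(ggplot2)

# Simulated accuracy data (hypothetical values)
set.seed(3)
langs <- c("Arabic", "Dutch", "Hebrew", "Hindi", "Korean", "Russian")
acc <- data.frame(
  language    = rep(langs, each = 20),
  participant = rep(1:20, times = 6),
  accuracy    = pmin(1, pmax(0, rnorm(120, mean = 0.62, sd = 0.08)))
)

p <- ggplot(acc, aes(participant, accuracy)) +
  # shaded band drawn in every panel; -Inf/Inf stretch it across the full x range
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = 0.45, ymax = 0.55,
           fill = "grey60", alpha = 0.3) +
  geom_point() +
  facet_wrap(~ language) +   # one panel per language
  theme_bw()
print(p)
```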

Annotated R code here 


Lonely old bar graph

Here's one from an eyetracking study we just completed plotting average response latencies for the word and picture versions of the Pyramids and Palm Trees Test (objects) relative to the Kissing and Dancing Test (actions).    We eliminated the x and y top and right borders and scaled the y-axis minimum to .75.  R code here,  Dataframe here


Changing points by color in a 3D Scatterplot


Oh man... this was annoying to create. Using the RGL package we just wanted to create a 3d scatterplot varying the point colors by a categorical variable (concrete or abstract word).  This saga took a zillion more hours than I have to produce the following plot...
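In hindsight the core trick is small: build a vector with one color per point by indexing a named palette with the category, then hand that vector to rgl. Here's a sketch with simulated coordinates; the colors and axis labels are made up, and the rgl call only runs in an interactive session with rgl installed.

```r
# Simulated coordinates for 40 words (hypothetical data)
set.seed(4)
words <- data.frame(
  x = rnorm(40), y = rnorm(40), z = rnorm(40),
  type = factor(rep(c("abstract", "concrete"), each = 20))
)

# Named palette (colors are placeholders), then one color per row
pal <- c(abstract = "darkorange", concrete = "steelblue")
point_cols <- unname(pal[as.character(words$type)])

# Open the rotatable 3d window only when someone is there to rotate it
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(words$x, words$y, words$z, col = point_cols, type = "p", size = 6,
              xlab = "Dim 1", ylab = "Dim 2", zlab = "Dim 3")
}
```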

R Code Here


Time series plot of continuous sampling of pupil diameter during a visual symbol cancellation task in a person with post concussive syndrome

R Code Here     It's sort of a bear to get R to recognize a column of numbers as a time series when it wants them to be a factor. I struggled to get GGPlot to plot the time series. Instead, I reverted to R's plotting function after recoding the data as a time series. This graph represents very rapid fluctuations in the diameter of a pupil (the black part of your eye, not a student) for a person who is experiencing post concussive symptoms during a symbol cancellation task (i.e., many visually similar distractor symbols).
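The factor problem usually comes down to one conversion step. Here's a small sketch with made-up diameters; the 120 samples per second is an assumption about the recording rate. Calling as.numeric directly on a factor returns the internal level codes, so you have to go through as.character first.

```r
# A column that was read in as a factor (made-up values)
raw <- factor(c("3.10", "3.42", "2.95", "3.05"))

wrong <- as.numeric(raw)                 # level codes, NOT the diameters
diam  <- as.numeric(as.character(raw))   # the actual numbers

# Recode as a time series; frequency = 120 assumes 120 samples per second
pupil_ts <- ts(diam, start = 0, frequency = 120)
plot(pupil_ts, xlab = "Time (s)", ylab = "Pupil diameter (mm)")
```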


Multiple time series overlaid on the same plot. These data reflect continuous sampling of pupil dilation during the same visual symbol cancellation task for two people. 

R Code Here      Dataset Here

Elizabeth Brophy spent the greater part of today learning how to overlay two discrete time series. Was it worth it in the grand scheme of her limited time on this earth?  You'd have to ask her, but my feeling is that the plots look great.

This reflects pupillary fluctuations measured at 120Hz. The nice thing about this graph is the axis cutting and the comparison of two people's time series.  She needs to rescale the y-axis a bit, and we should also add in the event markers. That's for next time. 


Blink and you'll miss it: Linear interpolation

During continuous plotting of pupil diameter, something pesky happens. People blink. The time series for the blink events contains chunks of zeros (i.e., the pupil diameter is measured at 0mm during a blink event).  It's necessary to treat those pesky zeros as missing data and interpolate across the blink events.  Luckily we don't have to do a whole lot of interpolation, but here's an example of what it looks like using R's zoo package. Here's how this works... The blue line shows continuously sampled pupil diameter including blinks and other weird drift artifacts.  See the breaks and weird noncontinuous parts of the time series? The red line represents a continuous time series interpolated across those missing events.  When you overlay the original time series with the interpolated time series you can get a fairly good picture of the trend.  Too much interpolation is a no-no, but this looks ok.   R-Code here  
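In miniature, the blink fix looks something like this. It's a sketch on a made-up vector, assuming the zoo package is installed: recode the zeros as NA, then let na.approx draw a straight line across each gap.

```r
library(zoo)

# Made-up pupil diameters in mm; zeros mark blink samples
pupil <- c(3.0, 3.2, 0, 0, 3.8, 4.0, 0, 4.4)

pupil[pupil == 0] <- NA                            # blinks become missing data
pupil_interp <- na.approx(pupil, na.rm = FALSE)    # linear interpolation over gaps

# Interpolated series in red, original (with breaks at the blinks) in blue
plot(pupil_interp, type = "l", col = "red",
     xlab = "Sample", ylab = "Pupil diameter (mm)")
lines(pupil, col = "blue")
```

With na.rm = FALSE the output keeps the same length as the input, so any blink at the very start or end of a recording stays NA instead of being dropped.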


Heatmap of our abstract word topography data

Here's a heatmap plot that reflects a hypothetical semantic space wherein 400 highly abstract and concrete English nouns are situated.  The R-code is here. The dimensions across the bottom reflect domains where >350 participants rated the 400 English nouns on their emotional valence, visual salience, etc. The vertical axis reflects increasing word concreteness beginning with abstract words such as justice increasing to concrete words such as dog. The database and all associated word ratings are here. These are the data we reported recently in our Frontiers in Human Neuroscience article.   Hotter areas of white indicate "higher" ratings on a particular domain. This plot is interesting because it shows some nice latent structure of abstract and concrete words in terms of emotion, polarity, and sensory salience.


Much fancier version of the last heatmap

This $#*@ took me about 40 hours to nail down. This heatmap reflects the same data as the previous plot but with many more bells and whistles.  I used the gplots package in R and its heatmap.2 plotting function, of which I had no working knowledge until about 40 hours ago. These things are really obsessive little puzzles. This involved restructuring my original spreadsheet, coercing R into handling the data table as a matrix with column 1 as row names and then moving stuff around in Illustrator.  Here's the annotated R-code
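The matrix coercion was the step that tripped me up most, so here's a stripped-down sketch of it. The words, ratings, and column names are made up for illustration, and the heatmap.2 call (from the gplots package) only runs if gplots is installed.

```r
# Toy ratings table: first column holds the words (all values hypothetical)
ratings <- data.frame(
  word    = c("justice", "idea", "chair", "dog"),
  emotion = c(5.1, 3.2, 1.4, 4.0),
  vision  = c(1.2, 1.5, 6.3, 6.6),
  sound   = c(1.1, 1.0, 2.2, 5.8)
)

# Drop the label column, convert to a numeric matrix, reattach words as row names
m <- as.matrix(ratings[, -1])
rownames(m) <- ratings$word

if (requireNamespace("gplots", quietly = TRUE)) {
  # Keep the row and column order as given: no clustering, no dendrograms
  gplots::heatmap.2(m, Rowv = FALSE, Colv = FALSE, dendrogram = "none",
                    trace = "none", scale = "none", col = heat.colors(50))
}
```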


Facet wrap scatterplots

Here are scatterplots for 14 dimensions arrayed using R's facet wrap function.  Here's the spreadsheet (in long form). Here's the R-code for plotting in ggplot2.


Correlogram reflecting bivariate correlations between odor, motion, visual form, space, emotional valence, and other variables for 750 English Nouns

Here's a correlogram for an article we're writing up now. For those unfamiliar with this format, it is simply a visual depiction of a standard bivariate correlation matrix. The color map is scaled to Pearson R values. I created this using the corrplot package in R.  Here's the spreadsheet and the code.


Interpolation and application of a moving average smoothing algorithm to pupil dilation data

So here's an interesting little time series plot. This reflects the dilation of a single person's pupil over the course of a few seconds when a monitor rapidly flashes from white to black (the flash point is the orange dotted line). Here are the data and the R-script.  Our eyetracker samples at 120Hz, so there are blink trials that need to be interpolated across.  The data off the tracker are jolty and noisy, so we applied a moving average smoothing algorithm with a window of 8 samples. This illustrates the time course of the pupil dilation nicely.  
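For the curious, an 8-sample moving average can be sketched in base R like this. The helper name moving_average and the simulated signal are mine, and the linked script may well do the smoothing differently (for example with a dedicated package).

```r
# Hypothetical helper: centered moving average over k samples
moving_average <- function(x, k = 8) {
  # Equal weights of 1/k; the ends come back as NA where the window does not fit
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# Simulated noisy pupil trace: 2 seconds at 120 samples per second
set.seed(6)
pupil    <- 4 + cumsum(rnorm(240, sd = 0.02)) + rnorm(240, sd = 0.1)
smoothed <- moving_average(pupil, k = 8)

plot(pupil, type = "l", col = "grey60",
     xlab = "Sample", ylab = "Pupil diameter (mm)")
lines(smoothed, col = "black", lwd = 2)
```

A wider window gives a smoother line but blurs fast changes, so it's worth eyeballing the raw trace underneath.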


Bar graph of parameter estimates from an fMRI ROI analysis


Here is a bar graph that was a bit challenging to put together. It reflects a very simple design, but GGplot was up to its old tricks.  Here's the annotated code



The last plot with some Adobe Illustrator clean-ups.

This came out pretty nicely. This was the figure that ultimately made its way into this article in Brain and Language:

Reilly J, *Garcia A, & Binney RJ (2016). Does the sound of a barking dog activate its corresponding visual form? An fmri investigation of modality-specific semantic access. Brain and Language, 159, 45-59. doi: 10.1016/j.bandl.2016.05.006








Time series:  Pupil dilation for imagining a sunny day in response to Yes versus looking into a dark room in response to NO.

Here are two time series snaking within one another with error bars created using the pointrange function in ggplot.  Here's the R script.  Loving the 2-4 second window.


Ribbon plot

Bonnie Zuckerman created this nice little ribbon plot demonstrating changes in pupil diameter as participants produced different semantic clusters over a one minute period in a verbal fluency task (i.e., Name as many animals as you can in one minute). She cleverly color-coded the time series by cluster (e.g., sea animals, house pets, etc). R Code here.


Manually passing a vector of standard errors to a simple bar chart... with some lazy Photoshopping

These are some graphs of contrast estimates for an fMRI paper we now have under review. I needed to add standard error bars to a series of a few bars. Here are the data and as you will note these are simply means. That is, I am not asking ggplot2 to generate SE bars based on a stat summary. For this reason, you must first create a vector of standard errors that you will pass to geom_errorbar.  On the bottom left is what the raw plot from R looks like.  I got lazy and did some Photoshopping rather than playing with manually annotating the plot so that it eventually looked like the plot on the right. The main challenge with this one was getting the error bar width adjusted and centering the error bars on the bars: you need to use the width argument of geom_errorbar to specify the whisker length (here it is .3), and position=position_dodge(0.5) to get the error bars centered. The trick with position_dodge for error bars is that its value must match the width of the bars specified in the geom_bar call (in this case .5). 
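Put together, the pattern looks roughly like this. The means, standard errors, and condition names are invented for the sketch; only the .3 whisker width and the matching .5 bar width and dodge come from the description above.

```r
library(ggplot2)

# Precomputed means (hypothetical values), one per condition
est <- data.frame(condition = c("A", "B", "C"),
                  mean      = c(0.8, 1.4, 0.5))

# Vector of standard errors supplied by hand (also hypothetical)
est$se <- c(0.10, 0.15, 0.08)

p <- ggplot(est, aes(condition, mean)) +
  # stat = "identity" plots the means as given; bar width is 0.5
  geom_bar(stat = "identity", width = 0.5, position = position_dodge(0.5),
           fill = "grey60", color = "black") +
  # whisker width 0.3; dodge value matches the bar width so the bars line up
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = 0.3, position = position_dodge(0.5)) +
  theme_bw()
print(p)
```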

Here's the R-script for making this happen. 


Pupillary dilation/constriction for two time series alternating Dark-Bright

So impressive... Ally Dworetsky after one week in the lab has produced this beautiful plot using ggplot. She measured her own pupil dilation dynamics over a minute as she viewed a black screen that switched at the 30 second point to yellow (causing a pupillary constriction). She also overlaid another of the summer intern's (Rena) time series for yellow to black with a switch at the 30s point (causing a pupillary dilation). The result is this really nice time series plot. Great work, Ally!  Download the script here


forcing ggplot2 not to re-order factors when faceting a string variable 

Bully to Bonnie Zuckerman for figuring out this R conundrum.  When faceting a plot, R tends to want to re-order string variables, alphabetizing them. This can get you in trouble when order is meaningful.  Here's R code to coerce ggplot into faceting by a trial number but labeling by trial name. This respects the original order of the data (data here). These plots reflect Helen Felker's pupil response data for recall of a word list. Each plot is a different word in an ordered list.
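A common alternative to the trial-number approach (so not necessarily what the linked code does) is to turn the string into a factor whose levels follow the order in which the words first appear. A sketch with made-up words and data:

```r
library(ggplot2)

# Made-up word list, deliberately not in alphabetical order
set.seed(5)
words <- c("zebra", "apple", "mango")
df <- data.frame(word  = rep(words, each = 10),
                 time  = rep(1:10, times = 3),
                 pupil = rnorm(30, mean = 4, sd = 0.2))

# Levels in order of first appearance, so facets keep the list order
df$word <- factor(df$word, levels = unique(df$word))

p <- ggplot(df, aes(time, pupil)) +
  geom_line() +
  facet_wrap(~ word)
print(p)
```

Without the factor() line the panels would come out as apple, mango, zebra.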


3d scatterplot of profanity versus taboo words in a semantic space constrained by valence, physiological arousal, and social acceptability

Here's a fun little 3d scatterplot using the scatterplot3d package.  This plot represents subjective ratings of emotional valence, social acceptability, and physiological arousal for a series of profane words relative to matched "taboo" body part words. There were a few tricky parts to executing this block of R-code (download here) and here are the data (download csv here). Still to come is a plot reflecting peak pupil amplitudes when hearing profane vs. taboo but not profane words.