This folder contains seven tab-separated text files with data from the self-paced reading and eye-tracking studies:

stimuli.txt               : sentence stimuli and comprehension questions
stimuli_pos.txt           : part-of-speech tags of sentence stimuli
selfpacedreading.subj.txt : self-paced reading subject information
selfpacedreading.RT.txt   : self-paced reading results
eyetracking.subj.txt      : eye-tracking subject information
eyetracking.RT.txt        : eye-tracking reading times per word
eyetracking.fix.txt       : eye-tracking fixation data

REFERENCES
[1] Frank, S.L., Fernandez Monsalve, I., Thompson, R.L., & Vigliocco, G. (2013). Reading-time data for evaluating
    broad-coverage models of English sentence processing. Behavior Research Methods, 45, 1182-1190.
[2] Fernandez Monsalve, I., Frank, S.L., & Vigliocco, G. (2012). Lexical surprisal as a general predictor of reading
    time. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
    (pp. 398-408). Avignon, France: Association for Computational Linguistics.
[3] Frank, S.L. (in press). Uncertainty reduction as a measure of cognitive load in sentence comprehension. Topics in
    Cognitive Science.
[4] Frank, S.L. & Thompson, R.L. (2012). Early effects of word surprisal on pupil size during reading. In: N. Miyake,
    D. Peebles, & R.P. Cooper (Eds.), Proceedings of the 34th Annual Conference of the Cognitive Science Society
    (pp. 1554-1559). Austin, TX: Cognitive Science Society.

[1] is the main reference for the current data set.
[2,3] use the self-paced reading data (native speakers only) for model evaluation.
[4] uses the eye-tracking data (of 17 monolingual participants) for model evaluation.

-----------------------------------------------------------------------------------
stimuli.txt 

sent_nr     : Sentence number
sentence    : The sentence, as a single string
question    : Comprehension question (or hyphen if no question)
answer      : Correct answer to comprehension question (or hyphen if no question)

NOTE -- After collecting the self-paced reading data, three typos were found
        in the sentence stimuli:
            sentence 43  : "Scott" was "Sott"
            sentence 269 : "at" was "that"
            sentence 337 : "Margaret" was "Margeret"
        These errors were fixed in the eye-tracking study.
NOTE -- Only the 205 sentences with the lowest number of letters were used in
        the eye-tracking study.

-----------------------------------------------------------------------------------
stimuli_pos.txt 

sent_nr     : Sentence number
pos         : String of part-of-speech tags (Penn Treebank style)

NOTE -- Part-of-speech tags were generated by Tsuruoka & Tsujii's (2005) automatic tagger, after which they were corrected
        by hand in accordance with the Penn Treebank part-of-speech tagging guidelines (Santorini, 1991).
NOTE -- Punctuation marks receive their own tag, and clitics are split into two tags (e.g., "doesn't" is tagged "VBZ RB").

-----------------------------------------------------------------------------------
selfpacedreading.subj.txt

subj_nr     : Subject number
age         : Subject's age in years
age_en      : Age at which subject began learning English (0 for native speakers)
sex         : Subject's sex (f/m)
hand        : Subject's handedness (r/l)
correct     : Fraction of correct responses to comprehension questions

-----------------------------------------------------------------------------------
selfpacedreading.RT.txt

subj_nr     : Subject number
sent_nr     : Sentence number
sent_pos    : Position of sentence in presentation sequence
correct     : Correctness of response to comprehension question (c for correct, e for error, hyphen if there was no question)
answer_time : Time in msec between question presentation and response, or NaN if there was no question
word_pos    : Position of word in sentence
word        : Presented word
RT          : Time in msec between word presentation and key press
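
EXAMPLE -- A minimal sketch of reading this column layout with Python's standard csv module and averaging RT per subject.
           It assumes the file starts with a header row holding the column names listed above. The sample rows and the
           function name mean_rt_per_subject are invented for illustration and are not real data.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the column layout described above (NOT real data).
SAMPLE = (
    "subj_nr\tsent_nr\tsent_pos\tcorrect\tanswer_time\tword_pos\tword\tRT\n"
    "1\t5\t1\tc\t1500\t1\tThe\t300\n"
    "1\t5\t1\tc\t1500\t2\tdog\t340\n"
    "2\t5\t7\t-\tNaN\t1\tThe\t280\n"
)

def mean_rt_per_subject(handle):
    """Return {subj_nr: mean RT in msec} from a tab-separated file handle."""
    # Assumption: the first line is a header row with the column names.
    reader = csv.DictReader(handle, delimiter="\t")
    totals = defaultdict(lambda: [0.0, 0])  # subj_nr -> [sum of RT, word count]
    for row in reader:
        entry = totals[int(row["subj_nr"])]
        entry[0] += float(row["RT"])
        entry[1] += 1
    return {subj: total / n for subj, (total, n) in totals.items()}

means = mean_rt_per_subject(io.StringIO(SAMPLE))
print(means)
```

           To run it on the actual file, pass open("selfpacedreading.RT.txt", newline="") instead of the in-memory sample.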

-----------------------------------------------------------------------------------
eyetracking.subj.txt

subj_nr     : Subject number
age         : Subject's age in years
age_en      : Age at which subject began learning English (0 for native speakers)
monoling    : 1 if the subject is monolingual, 0 if multilingual
sex         : Subject's sex (f/m)
hand        : Subject's handedness (r/l)
correct     : Fraction of correct responses to comprehension questions

NOTE -- Subject #25 scored only 14.5% correct. Most likely, this subject was misinformed
        (or confused) about the response buttons, so "correct" responses were counted
        as incorrect and vice versa.
NOTE -- Background information is missing for subject #41.
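
EXAMPLE -- If one accepts the interpretation that subject #25's responses were reversed, the c/e codes in the "correct"
           column of eyetracking.RT.txt could be swapped for that subject. The sketch below is only one way to do this.
           The names recode_correct and FLIPPED_SUBJECTS and the sample rows are hypothetical.

```python
def recode_correct(code, flip):
    """Swap 'c' and 'e' when flip is True; any other value (e.g. '-') is kept."""
    if not flip:
        return code
    return {"c": "e", "e": "c"}.get(code, code)

# Assumption: only subject 25 is treated as having reversed responses.
FLIPPED_SUBJECTS = {25}

# Invented (subj_nr, correct) pairs, NOT real data.
rows = [(25, "c"), (25, "e"), (25, "-"), (3, "c")]
recoded = [(subj, recode_correct(code, subj in FLIPPED_SUBJECTS)) for subj, code in rows]
print(recoded)
```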

-----------------------------------------------------------------------------------
eyetracking.RT.txt 

subj_nr     : Subject number
sent_nr     : Sentence number
sent_pos    : Position of sentence in presentation sequence
correct     : Correctness of response to comprehension question (c for correct, e for error, hyphen if there was no question)
answer_time : Time in msec between question presentation and response, or NaN if there was no question
word_pos    : Position of current word in sentence
word        : Current word
RTfirstfix  : First fixation time on current word (or 0 if word not fixated)
RTfirstpass : First-pass reading time (total fixation time on current word before first fixation on any other word)
RTrightbound: Right-bounded reading time (total fixation time on current word before first fixation on any word to the right)
RTgopast    : Go-past reading time (total fixation time from first fixation on current word up to first fixation on any word
                                    to the right)

NOTE -- Fixation times are 0 if a word more to the right was fixated before the first fixation on the current word.
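
EXAMPLE -- The four measures above can be sketched as a function over an ordered fixation sequence. This is an illustration of
           the definitions as worded here, not necessarily the procedure used to produce the file. It assumes every fixation
           in the input landed on a word; the function name reading_times and the sample fixations are invented.

```python
def reading_times(fixations, target):
    """Compute the four reading-time measures for one word.

    fixations: list of (word_pos, duration_msec) in order of occurrence.
    target:    word position of the current word.
    """
    zero = {"RTfirstfix": 0, "RTfirstpass": 0, "RTrightbound": 0, "RTgopast": 0}

    # Find the first fixation on the target; all measures are 0 if a word
    # further to the right was fixated earlier, or the target was never fixated.
    first = None
    for i, (pos, _) in enumerate(fixations):
        if pos > target:
            return zero
        if pos == target:
            first = i
            break
    if first is None:
        return zero

    # First pass: fixations on the target until any other word is fixated.
    first_pass = 0
    for pos, dur in fixations[first:]:
        if pos != target:
            break
        first_pass += dur

    # Right-bounded and go-past both stop at the first fixation to the right;
    # go-past also counts the time spent on words to the left in between.
    right_bound = 0
    go_past = 0
    for pos, dur in fixations[first:]:
        if pos > target:
            break
        go_past += dur
        if pos == target:
            right_bound += dur

    return {
        "RTfirstfix": fixations[first][1],
        "RTfirstpass": first_pass,
        "RTrightbound": right_bound,
        "RTgopast": go_past,
    }

# Invented fixation sequence: word 1, word 2, back to word 1, word 2 again, word 3.
fixations = [(1, 200), (2, 180), (1, 150), (2, 120), (3, 210)]
result = reading_times(fixations, 2)
print(result)
```

           For word 2 in this sequence, the regression to word 1 ends the first pass (180), the second visit is added to the
           right-bounded time (180 + 120 = 300), and the go-past time also includes the regression (180 + 150 + 120 = 450).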

-----------------------------------------------------------------------------------
eyetracking.fix.txt

subj_nr      : Subject number
sent_nr      : Sentence number
gaze_x       : Horizontal pixel coordinate of fixation location (0 = left edge; 39 = left margin; 1024 = right edge).
               Each letter (including space and punctuation) was 14 pixels wide.
gaze_y       : Vertical pixel coordinate of fixation location (0 = top of display; 768 = bottom of display)
fix_duration : Duration of fixation in msec
letter_pos   : Fixated letter position in sentence (including spaces), even if the fixation was to the left of the left margin or
               to the right of the last letter. Position 0 is directly to the left of the first letter. 
word_pos     : Position of fixated word in sentence, or NaN if no word fixated. A fixation on the space before a word counts as a
               fixation on that word.
word         : Fixated word, or hyphen if no word fixated.
blink        : Indicates whether a blink was detected directly 'before' or 'after' the current fixation (or 'both' before and after).

NOTE -- Within each sentence, fixations are listed in the order in which they occurred. If any fixation during sentence presentation
        was registered as being outside the display bounds, this was regarded as a tracking error and no data is presented for that
        sentence.
NOTE -- The letter and word position are NaN if gaze_y is outside the range (300,475) because that was considered too far above or
        below the presented sentence.
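
EXAMPLE -- A sketch of how letter_pos and word_pos relate to the gaze coordinates, inferred from the descriptions above (left
           margin at pixel 39, 14 pixels per letter, position 0 directly left of the first letter). The formula is an
           inference, not the authors' confirmed procedure. Further assumptions: the bounds 300 and 475 are treated as
           inside the range, words are separated by single spaces, and position 0 or positions past the last letter count
           as "no word fixated". The function names are hypothetical.

```python
import math

LEFT_MARGIN_PX = 39    # x coordinate where the first letter starts
LETTER_WIDTH_PX = 14   # width of each letter, space, or punctuation mark
Y_MIN, Y_MAX = 300, 475

def letter_position(gaze_x, gaze_y):
    """Letter position of a fixation (1 = first letter), or NaN if too far above/below."""
    # Assumption: the boundary values 300 and 475 themselves are still valid.
    if not (Y_MIN <= gaze_y <= Y_MAX):
        return math.nan
    return math.floor((gaze_x - LEFT_MARGIN_PX) / LETTER_WIDTH_PX) + 1

def word_position(letter_pos, sentence):
    """Word position (1 = first word) for a letter position, or NaN if no word."""
    # Assumption: positions outside the sentence are not assigned to any word.
    if math.isnan(letter_pos) or not (1 <= letter_pos <= len(sentence)):
        return math.nan
    # Counting spaces up to and including the fixated character assigns a
    # space to the word that follows it.
    return sentence[:letter_pos].count(" ") + 1

sentence = "The dog barked."           # invented sentence, not from stimuli.txt
pos = letter_position(39 + 3 * 14, 400)  # lands on the space after "The"
word = word_position(pos, sentence)
print(pos, word)
```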
-----------------------------------------------------------------------------------

