Research in English (historical and second language)
1) Longdale (SLA)
a) Corpus for the Longdale Nijmegen workshop 2014 (zip)
b) List of part-of-speech (POS) tags (web)
c) Description of the whole tagset (web)
d) Notes (pdf) for the hands-on sessions (session1, session2, session3)
e) Suggestions and remarks for queries (pdf)
2) English historical linguistics (Radboud University Nijmegen)
a) Corpus for EHL class 2013 (zip)
3) Lemmatization
The historical English corpora (YCOE, PPCME2, PPCEME, PPCMBE) are all being
lemmatized. Priority is given to the verbs.
Lists of lemmas and derived words (in normalized script)
Abbreviation | Period | Lemma lists | Remarks
OE | Old English | excel, html, txt | Comprehensive
ME | Middle English | excel, html, txt | Verbs only
eModE | Early Modern English | (none) |
LModE | Late Modern English | (none) |
Several crucial OE and ME graphemes are written differently depending on the resource.
The key below shows how each grapheme is represented in the historical corpora, in Cesax, and in the lemma lists provided above.
Grapheme | Name | Unicode | Historical corpora | Cesax | Lemma list
þ | thorn | U+00FE | +t | þ | th
ð | eth | U+00F0 | +d | ð | dh
æ | ash | U+00E6 | +a | æ | ae
ȝ | yogh | U+021D | +g | ġ | y
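The key above can be applied mechanically when moving between the corpus notation and the other two scripts. The following is a minimal Python sketch, not part of the corpora or of Cesax: the function names are hypothetical, only the four lowercase codes listed in the table are handled, and the characters used are the standard Unicode code points for thorn (U+00FE), eth (U+00F0), ash (U+00E6) and yogh (U+021D).

```python
# Corpus notation -> (Unicode character, lemma-list spelling),
# copied from the grapheme key; other codes are left untouched.
GRAPHEMES = {
    "+t": ("\u00fe", "th"),  # thorn
    "+d": ("\u00f0", "dh"),  # eth
    "+a": ("\u00e6", "ae"),  # ash
    "+g": ("\u021d", "y"),   # yogh
}


def corpus_to_unicode(text):
    """Replace the corpus codes (+t, +d, +a, +g) with Unicode characters."""
    for code, (char, _plain) in GRAPHEMES.items():
        text = text.replace(code, char)
    return text


def corpus_to_lemma_script(text):
    """Replace the corpus codes with the spelling used in the lemma lists."""
    for code, (_char, plain) in GRAPHEMES.items():
        text = text.replace(code, plain)
    return text


if __name__ == "__main__":
    for word in ["+t+at", "cw+a+d"]:
        print(word, corpus_to_unicode(word), corpus_to_lemma_script(word))
```

A word such as +t+at in the corpus notation would come out as þæt in Unicode and as thaet in the lemma-list script. Capitalised codes, if the corpora use them, would need additional entries in the mapping.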
Please direct comments or questions to: E.Komen@Let.ru.nl
History:
15/jan/2014 Information for Longdale workshop
14/jan/2014 Reference to Longdale workshop
31/oct/2013 Added EHL-2013 corpus
21/oct/2013 Added lemma-lists for OE and ME