Research in English (historical and second language)

 

1)      Longdale (SLA)

a)      Corpus for the Longdale Nijmegen workshop 2014 (zip)

b)      List of part-of-speech (POS) tags (web)

c)      Description of the whole tagset (web)

d)      Notes (pdf) for the hands-on sessions (session1, session2, session3)

e)      Suggestions and remarks for queries (pdf)

2)      English historical linguistics (Radboud University Nijmegen)

a)      Corpus for EHL class 2013 (zip)

3)      Lemmatization
The historical English corpora (YCOE, PPCME2, PPCEME, PPCMBE) are all being lemmatized. Priority is given to the verbs.

 

 

 

Lists of lemmas and derived words (in normalized script)

Abbreviation    Period                  Lemma lists         Remarks
OE              Old English             excel, html, txt    Comprehensive
ME              Middle English          excel, html, txt    Verbs only
eModE           Early Modern English    (none)
LmodE           Late Modern English     (none)

 

 

Normalization of scripts

Several crucial OE and ME graphemes are written differently depending on the resource.

The key below shows how each of these graphemes is represented in the historical corpora, in Cesax, and in the lemma lists provided above.

 

Grapheme    Name     Unicode    Historical corpora    Cesax    Lemma list
þ           thorn    U+00FE     +t                    þ        th
ð           eth      U+00F0     +d                    ð        dh
æ           ash      U+00E6     +a                    æ        ae
ȝ           yogh     U+021D     +g                    ġ        y
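The key above can be applied mechanically. The following is a minimal Python sketch, not part of the corpora or of Cesax: the names CORPUS_KEY, corpus_to_unicode and corpus_to_lemma_form are hypothetical, and the sketch assumes that only the four lower-case codes listed in the table occur in the input. Any upper-case or other variants would need extra entries. The Unicode escapes are the standard code points for these letters.

```python
# Corpus notation -> (Unicode grapheme, lemma-list form), following the key above.
# Assumption: only these four lower-case codes are handled.
CORPUS_KEY = {
    "+t": ("\u00fe", "th"),  # thorn
    "+d": ("\u00f0", "dh"),  # eth
    "+a": ("\u00e6", "ae"),  # ash
    "+g": ("\u021d", "y"),   # yogh
}


def corpus_to_unicode(text: str) -> str:
    """Replace each corpus code (+t, +d, +a, +g) with its Unicode grapheme."""
    for code, (grapheme, _lemma_form) in CORPUS_KEY.items():
        text = text.replace(code, grapheme)
    return text


def corpus_to_lemma_form(text: str) -> str:
    """Replace each corpus code with the plain-ASCII form used in the lemma lists."""
    for code, (_grapheme, lemma_form) in CORPUS_KEY.items():
        text = text.replace(code, lemma_form)
    return text


if __name__ == "__main__":
    print(corpus_to_unicode("+t+at"))     # þæt
    print(corpus_to_lemma_form("+t+at"))  # thaet
```

Because every code starts with "+" and is followed by a distinct letter, the order in which the replacements run does not matter here.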

 

 

Questions

 Please direct comments or questions to: E.Komen@Let.ru.nl

 

History:

15/jan/2014    Information for Longdale workshop

14/jan/2014    Reference to Longdale workshop

31/oct/2013    Added EHL-2013 corpus

21/oct/2013    Added lemma-lists for OE and ME