This page presents the results of several experiments.

Coreference chains

The coreference chains of several texts processed with Cesax (“cesaxed” texts) are available:

Text                    Period   Chains
Apollonius of Tyre      O3       coapollo
Saint Vincent           O14      covinceB
Euphrosyne (part)       O14      coeuphr
Sawles                  M1       cmsawles
Kentish sermons         M2       cmkentse
Horses                  M3       cmhorses
Fischer                 E1       fisher
Perrot                  E2       perrot
Oroonoko                E3       behn
Pinney                  E3       pinney
Robinson Crusoe         B1       defoe
English education       B1       brightland
Slave war               B3       long
Horses (Fleming)        B3       fleming
Horses (Skeavington)    B3       skeavington

Texts and translations

The following texts have been processed with Cesax. Only the text in the original language can be downloaded here, together with a translation into more modern English where one is available (“org” and “eng” in the Availability column).

Title                 File             Period   Availability
The apostle James     cojames          OE       org, eng
Saint Vincent         covinceB         OE       org, eng
Apollonius of Tyre    coapollo.o3      O3       org, eng
Oroonoko              behn-e3-p1       E3       org
Letter from Pinney    jpinney-e3-p1    E3       org
Robinson Crusoe       defoe-1719       B1       org
English education     brightland-1711  B1       org
Slave war             long-1866        B3       org

CGN results compared with English[1]

1.      Several types of first constituents in Dutch, as investigated using Corex:

a.       Those containing “d-words” (VNW19, VNW20, VNW21 + d*)

b.      Those containing any of VNW19-21

c.       Those containing “d-adverbia” (i.e. toen, derhalve, dus); occurrences of “toen” tagged VG2 are not counted

d.      All main clauses (SMAIN) containing a first constituent

2.      A CorpusStudio project on first constituents in Dutch:

a.       The CorpusStudio project used for the CGN

b.      The results of the CGN queries

3.      A CorpusStudio project on first constituents in English:

a.       The CorpusStudio project used for English

b.      The results of the English queries

4.      See also the tagset used for the CGN (Corpus Gesproken Nederlands, the Spoken Dutch Corpus).
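The Corex criteria in item 1 can be sketched as simple predicates over a first constituent. This is a minimal Python sketch, not the actual Corex query: it assumes a first constituent is available as a list of hypothetical (word, tag) pairs, with tags written in the numbered style used above (VNW19, VG2, ...), and it reads “+ d*” in criterion (a) as “the word form starts with d”, which is also an assumption.

```python
# Tags and words named in item 1; the (word, tag) representation is hypothetical.
D_WORD_TAGS = {"VNW19", "VNW20", "VNW21"}
D_ADVERBIA = {"toen", "derhalve", "dus"}


def has_d_word(constituent):
    """(a) A VNW19-21 word whose form starts with "d" (assumed reading of "+ d*")."""
    return any(tag in D_WORD_TAGS and word.lower().startswith("d")
               for word, tag in constituent)


def has_vnw19_21(constituent):
    """(b) Any word tagged VNW19, VNW20 or VNW21."""
    return any(tag in D_WORD_TAGS for _, tag in constituent)


def has_d_adverb(constituent):
    """(c) toen, derhalve or dus, where "toen" tagged VG2 does not count."""
    for word, tag in constituent:
        form = word.lower()
        if form in D_ADVERBIA and not (form == "toen" and tag == "VG2"):
            return True
    return False
```

For instance, `has_d_adverb([("toen", "VG2")])` is False because that reading of “toen” is excluded, while `has_d_adverb([("toen", "BW")])` is True; the tag values in such calls are illustrative only.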

 

Corex summary: the number of main clauses whose first constituent contains a D-word or a d-adverb, and their percentage of all main clauses with a first constituent:

                                         Number   Percentage
Any D-words (VNW19-21)                   2963     4,91%
The d-adverbia                           29       0,05%
Subtotal                                 2992     4,96%
All main clauses with first constituent  60360    100,00%
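Each percentage in this summary is the count divided by the 60360 main clauses with a first constituent, and the subtotal is the sum of the two categories. A minimal Python sketch of that arithmetic, using the decimal comma of the table (the function name `percentages` is illustrative, not part of Corex):

```python
def percentages(counts, total):
    """Express each count as a percentage of total, with two decimals and a decimal comma."""
    return {label: f"{100 * n / total:.2f}".replace(".", ",") + "%"
            for label, n in counts.items()}


TOTAL_MAIN_CLAUSES = 60360  # all main clauses with a first constituent

counts = {"Any D-words (VNW19-21)": 2963, "The d-adverbia": 29}
counts["Subtotal"] = sum(counts.values())  # 2963 + 29

result = percentages(counts, TOTAL_MAIN_CLAUSES)
print(result)
```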

CorpusStudio comparison: D-words[2] occurring within the first constituent of a main clause, compared with D-words occurring anywhere.

                        Dutch           Flemish         Total    English (immediately first constituent)
                        Free    Fixed   Free    Fixed   CGN      OE      ME      eModE   LModE
anyDword                23570   2187    11648   1356    38761    61028   19887   19935   10174
matFirstConst (Dword)   12535   1094    5346    522     19497    14441   8495    5443    2945
Percentage              53,2%   50,0%   45,9%   38,5%   50,3%    23,7%   42,7%   27,3%   28,9%
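In the D-word table, the percentage row is matFirstConst (Dword) divided by anyDword, column by column, and the CGN total column is the sum of the four Dutch and Flemish columns. Both relations can be checked with a short Python sketch; the column names and the helper `share` are illustrative, not part of CorpusStudio:

```python
COLUMNS = ["Dutch free", "Dutch fixed", "Flemish free", "Flemish fixed",
           "CGN total", "OE", "ME", "eModE", "LModE"]
ANY_DWORD = [23570, 2187, 11648, 1356, 38761, 61028, 19887, 19935, 10174]
FIRST_CONST_DWORD = [12535, 1094, 5346, 522, 19497, 14441, 8495, 5443, 2945]


def share(part, whole):
    """Format part/whole as a percentage with one decimal and a decimal comma."""
    return f"{100 * part / whole:.1f}".replace(".", ",") + "%"


# The CGN total is the sum of the Dutch and Flemish (free + fixed) columns.
assert sum(ANY_DWORD[:4]) == ANY_DWORD[4]
assert sum(FIRST_CONST_DWORD[:4]) == FIRST_CONST_DWORD[4]

percentage_row = {col: share(part, whole)
                  for col, part, whole in zip(COLUMNS, FIRST_CONST_DWORD, ANY_DWORD)}
print(percentage_row)
```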

CorpusStudio comparison: D-adverbs[3] occurring within the first constituent of a main clause, compared with D-adverbs occurring anywhere.

                        Dutch           Flemish         Total    English (immediately first constituent)
                        Free    Fixed   Free    Fixed   CGN      OE      ME      eModE   LModE
anyDadv                 9656    760     4032    531     14979    31427   12853   12168   3758
matFirstConst (Dadv)    3801    253     1288    141     5483     17052   6940    5081    1429
Percentage              39,4%   33,3%   31,9%   26,6%   36,6%    39,9%   48,8%   34,9%   24,4%

CorpusStudio: first constituents with D-forms (i.e. D-words and D-adverbs) compared with those without D-forms (only within main clauses, and only non-phrasal first constituents):

                           Dutch           Flemish         Total    English
                           Free    Fixed   Free    Fixed   CGN      OE      ME      eModE   LModE
matFirstConst              34118   4399    16583   3746    58846    66425   56805   63969   39677
matFirstConst (Dword)      12535   1094    5346    522     19497    14441   8495    5443    2945
matFirstConst (Dadv)       3801    253     1288    141     5483     12551   6278    4247    917
Percentage (Dadv + Dword)  47,9%   30,6%   40,0%   17,7%   42,4%    40,6%   26,0%   15,1%   9,7%

* Free = CGN components a, b, c, d, e, f, g, h, i, m, n; Fixed = components j, k, l, o
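The last row of the table on first constituents with D-forms can be reproduced from its three count rows: in every column it equals (matFirstConst (Dword) + matFirstConst (Dadv)) / matFirstConst. A Python sketch of that calculation (names are illustrative; adding the two counts treats the D-word and D-adverb rows as non-overlapping, which is an assumption of this sketch):

```python
COLUMNS = ["Dutch free", "Dutch fixed", "Flemish free", "Flemish fixed",
           "CGN total", "OE", "ME", "eModE", "LModE"]
FIRST_CONST = [34118, 4399, 16583, 3746, 58846, 66425, 56805, 63969, 39677]
DWORD = [12535, 1094, 5346, 522, 19497, 14441, 8495, 5443, 2945]
DADV = [3801, 253, 1288, 141, 5483, 12551, 6278, 4247, 917]


def combined_share(total, words, adverbs):
    """(D-word count + D-adverb count) / total, as a percentage with a decimal comma."""
    return f"{100 * (words + adverbs) / total:.1f}".replace(".", ",") + "%"


row = {col: combined_share(t, w, a)
       for col, t, w, a in zip(COLUMNS, FIRST_CONST, DWORD, DADV)}
print(row)
```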

First constituent PP

1.      PPfirst experiment using a CorpusStudio XQuery project

a.       The results of the query
(This is a large file and may not open easily in a web browser.)

b.      The corpus research project

PreCore

1.      PreCore experiment using a CorpusStudio XQuery project

a.       The results, grouped by time period and ordered by decreasing frequency of occurrence

b.      The results of the query
(This is a large file and may not open easily in a web browser.)

c.       A summary of the corpus research project
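The PreCore summary groups the query hits by time period and orders them by decreasing frequency. The sketch below illustrates that kind of grouping in Python; it is not the CorpusStudio implementation. It assumes each hit is a hypothetical (period, category) pair, with period codes such as O3, M1, E2 or B3 as in the tables on this page, and it assumes these codes are grouped by their first letter into OE, ME, eModE and LModE.

```python
from collections import Counter, defaultdict

# Assumed grouping of period codes (O3, O14, M1, E2, B3, ...) by first letter.
PERIOD_GROUPS = {"O": "OE", "M": "ME", "E": "eModE", "B": "LModE"}


def group_by_period(hits):
    """Count (period, category) hits per grouped period, most frequent first."""
    per_period = defaultdict(Counter)
    for period, category in hits:
        per_period[PERIOD_GROUPS[period[0]]][category] += 1
    # most_common() lists the categories by decreasing count.
    return {group: counts.most_common() for group, counts in per_period.items()}


# Hypothetical hits; the category labels are placeholders.
hits = [("O3", "x"), ("O14", "y"), ("O3", "x"), ("M1", "y"), ("B3", "z")]
print(group_by_period(hits))
```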

History:

16/oct/2012    ERK    PreCore results

8/sep/2011     ERK    Adapted D-results, incorporating project FirstConstD_eng_V3

2/sep/2011     ERK    Added PPfirst experiment results

15/jul/2011    ERK    Added HTML files of Cesaxed texts

12/jul/2011    ERK    CorpusStudio project V3 gives better results

2/jul/2011     ERK    First results with Corex and CorpusStudio (V2)



[1] CGN experiments: E. R. Komen. English experiments: R. Hebing and E. R. Komen.

[2] For the CGN, D-words are all words marked @cat=VNW19, VNW20 or VNW21. In the English data, “D-words” are words occurring in NPs of type “Dem” or “DemNP”.

[3] For the CGN, D-adverbs are a selection of variants of daar, d’r and hier tagged @cat=BW. In the English data, “D-adverbs” are words labelled “ADV…” that begin with d, t, ð or þ, optionally preceded by for-; some obvious exceptions, such as forth, are excluded.
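The English D-adverb criterion in note [3] amounts to a label test plus a test on the start of the word form. A minimal Python sketch, assuming the word and its label are available as plain strings; the exclusion set is hypothetical, since the note names only forth as an example of what is excluded:

```python
import re

# Optional "for-" followed by an initial d, t, ð or þ.
D_ADVERB_FORM = re.compile(r"^(?:for)?[dtðþ]")

# Hypothetical exclusion list: note [3] mentions only "forth" explicitly.
EXCLUDED = {"forth"}


def is_english_d_adverb(word, label):
    """True if an ADV-labelled word matches the D-adverb form test of note [3]."""
    form = word.lower()
    return (label.startswith("ADV")
            and form not in EXCLUDED
            and D_ADVERB_FORM.match(form) is not None)
```

Without the exclusion set, forth would pass the form test (for- followed by t), which illustrates why such cases have to be excluded explicitly.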