This page contains the results of several experiments.
The coreference chains of several “cesaxed” texts are available:
| Text | Period | Chains |
|---|---|---|
| | O3 | |
| Saint Vincent | O14 | |
| Euphrosyne (part) | O14 | |
| Sawles | M1 | |
| Kentish sermons | M2 | |
| Horses | M3 | |
| | E1 | |
| | E2 | |
| | E3 | |
| | E3 | |
| | B1 | |
| | B1 | |
| | B3 | |
| Horses (Fleming) | B3 | |
| Horses (Skeavington) | B3 | |
The following texts have been processed with Cesax. Only the text in the original language can be downloaded here, together with a translation into more modern English where one is available.
| Title | File | Period | Availability |
|---|---|---|---|
| The apostle James | cojames | OE | |
| Saint Vincent | covinceB | OE | |
| Apollonius of Tyre | coapollo.o3 | O3 | |
| Oroonoko | behn-e3-p1 | E3 | |
| Letter from Pinney | jpinney-e3-p1 | E3 | |
| Robinson Crusoe | defoe-1719 | B1 | |
| English education | brightland-1711 | B1 | |
| Slave war | long-1866 | B3 | |
1. Several types of first constituents in Dutch, as investigated using Corex:
a. Those containing “d-words” (VNW19,20,21 + d*)
b. Those containing any of VNW19-21
c. Those containing “d-adverbia” (i.e. toen, derhalve, dus); those with “toen” as VG2 do not count
d. All main clauses (SMAIN) containing a first constituent
2. A CorpusStudio project on first constituents in Dutch:
a. The CorpusStudio project used for the CGN
b. The results of the CGN queries
3. A CorpusStudio project on first constituents in English:
a. The CorpusStudio project used for English
b. The results of the English queries
4. Do check out the Tagset used for CGN (the corpus of spoken Dutch)
Corex summary, listing the percentage of D-words found in main-clause constituents:
| | Number | Percentage |
|---|---|---|
| Any D-words (VNW19-21) | 2963 | 4,91% |
| The d-adverbia | 29 | 0,05% |
| Subtotal | 2992 | 4,96% |
| All main clauses with first constituent | 60360 | 100,00% |
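Each percentage in the Corex summary above is a count divided by the number of main clauses with a first constituent (60360). The following is a minimal Python sketch of that arithmetic, not part of Corex itself; the helper name `share` and the decimal-comma formatting are illustrative assumptions chosen to match the way the figures are printed on this page.

```python
# Counts copied from the Corex summary; all names here are illustrative.
TOTAL_MAIN_CLAUSES = 60360
ANY_D_WORDS = 2963
D_ADVERBIA = 29
SUBTOTAL = ANY_D_WORDS + D_ADVERBIA


def share(count, total, digits=2):
    """Return count/total as a percentage string with a decimal comma."""
    return f"{100 * count / total:.{digits}f}%".replace(".", ",")


if __name__ == "__main__":
    for label, count in [
        ("Any D-words (VNW19-21)", ANY_D_WORDS),
        ("The d-adverbia", D_ADVERBIA),
        ("Subtotal", SUBTOTAL),
        ("All main clauses with first constituent", TOTAL_MAIN_CLAUSES),
    ]:
        print(f"{label}: {count} ({share(count, TOTAL_MAIN_CLAUSES)})")
```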
CorpusStudio comparison of D-words[2] within first constituents of main clauses with D-words anywhere.
| | Dutch | | Flemish | | Total CGN | English (immediately first constituent) | | | |
|---|---|---|---|---|---|---|---|---|---|
| | Free | Fixed | Free | Fixed | | OE | ME | eModE | LmodE |
| anyDword | 23570 | 2187 | 11648 | 1356 | 38761 | 61028 | 19887 | 19935 | 10174 |
| matFirstConst (Dword) | 12535 | 1094 | 5346 | 522 | 19497 | 14441 | 8495 | 5443 | 2945 |
| Percentage | 53,2% | 50,0% | 45,9% | 38,5% | 50,3% | 23,7% | 42,7% | 27,3% | 28,9% |
CorpusStudio comparison of D-adverbs[3] within first constituents of main clauses with D-adverbs anywhere.
| | Dutch | | Flemish | | Total CGN | English (immediately first constituent) | | | |
|---|---|---|---|---|---|---|---|---|---|
| | Free | Fixed | Free | Fixed | | OE | ME | eModE | LmodE |
| anyDadv | 9656 | 760 | 4032 | 531 | 14979 | 31427 | 12853 | 12168 | 3758 |
| matFirstConst (Dadv) | 3801 | 253 | 1288 | 141 | 5483 | 17052 | 6940 | 5081 | 1429 |
| Percentage | 39,4% | 33,3% | 31,9% | 26,6% | 36,6% | 39,9% | 48,8% | 34,9% | 24,4% |
CorpusStudio: first constituents with D-forms (i.e. D-words and D-adverbs) compared with those without D-forms (only within main clauses, and only non-phrasal first constituents):
| | Dutch | | Flemish | | Total CGN | English | | | |
|---|---|---|---|---|---|---|---|---|---|
| | Free | Fixed | Free | Fixed | | OE | ME | eModE | LmodE |
| matFirstConst | 34118 | 4399 | 16583 | 3746 | 58846 | 66425 | 56805 | 63969 | 39677 |
| matFirstConst (Dword) | 12535 | 1094 | 5346 | 522 | 19497 | 14441 | 8495 | 5443 | 2945 |
| matFirstConst (Dadv) | 3801 | 253 | 1288 | 141 | 5483 | 12551 | 6278 | 4247 | 917 |
| matFirstConst (Dadv + Dword) | 47,9% | 30,6% | 40,0% | 17,7% | 42,4% | 40,6% | 26,0% | 15,1% | 9,7% |
* Free = a,b,c,d,e,f,g,h,i,m,n; Fixed = j,k,l,o
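The last row of the table above appears to be the combined share of D-forms among all first constituents, that is (Dword + Dadv) / matFirstConst per column. The Python sketch below recomputes that row from the counts and checks that the Total CGN column is the sum of the four Dutch and Flemish columns. The variable and function names are illustrative assumptions; this is not the CorpusStudio project itself.

```python
# Counts copied from the table above; column order follows the table.
COLUMNS = ["Dutch Free", "Dutch Fixed", "Flemish Free", "Flemish Fixed",
           "Total CGN", "OE", "ME", "eModE", "LmodE"]
MAT_FIRST_CONST = [34118, 4399, 16583, 3746, 58846, 66425, 56805, 63969, 39677]
DWORD = [12535, 1094, 5346, 522, 19497, 14441, 8495, 5443, 2945]
DADV = [3801, 253, 1288, 141, 5483, 12551, 6278, 4247, 917]


def combined_share(dword, dadv, total):
    """Percentage of first constituents that contain a D-word or a D-adverb."""
    return 100 * (dword + dadv) / total


def cgn_total_matches(row):
    """True if the 'Total CGN' cell equals the sum of the four CGN columns."""
    return sum(row[:4]) == row[4]


SHARES = {
    name: f"{combined_share(w, a, t):.1f}"
    for name, w, a, t in zip(COLUMNS, DWORD, DADV, MAT_FIRST_CONST)
}

if __name__ == "__main__":
    for name, value in SHARES.items():
        print(f"{name}: {value}%")
```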
1. PPfirst experiment using CorpusStudio Xquery project
a. The results of the query. (This is a large file and may not open easily in your web browser.)
b. The corpus research project
1. PreCore experiment using CorpusStudio Xquery project
a. The results divided over (grouped) time-periods in decreasing frequency of occurrence
b. The results of the query. (This is a large file and may not open easily in your web browser.)
c. A summary of the corpus research project
History:
16/oct/2012 ERK PreCore results
8/sep/2011 ERK Adapted D-results, incorporating project FirstConstD_eng_V3
2/sep/2011 ERK Added PPfirst experiment results
15/jul/2011 ERK Added html files of Cesaxed texts
12/jul/2011 ERK CorpusStudio project V3 gives better results
2/jul/2011 ERK First results with Corex and CorpusStudio (V2)
[1] CGN experiments: E.R.Komen. English experiments: R.Hebing & E.R.Komen.
[2] D-words must be understood as all words marked with @cat=VNW19, 20, 21 for the CGN. English “D-words” are those occurring in NPs of type “Dem” or “DemNP”.
[3] D-adverbs are a selection of daar, d’r, hier variants with @cat=BW for the CGN. English “D-adverbs” are words labelled “ADV…” that begin with d, t, ð, or þ, possibly preceded by for-; some obvious exceptions such as forth are excluded.
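The selection rule for English D-adverbs in note [3] can be sketched as a small filter. This is only an illustration in Python, not the Xquery used in the CorpusStudio project: the function name `is_d_adverb` is hypothetical, and the exclusion list contains only `forth` because that is the one example the note gives.

```python
import re

# Assumption: the real exclusion list is longer; the note only names "forth".
EXCLUDED = {"forth"}

# Word starts with d, t, ð or þ, optionally preceded by the prefix "for".
D_ADVERB_START = re.compile(r"^(for)?[dtðþ]")


def is_d_adverb(word, label):
    """Return True if a word with the given POS label counts as a D-adverb."""
    word_lc = word.lower()
    if not label.startswith("ADV"):
        return False  # only words labelled "ADV..." qualify
    if word_lc in EXCLUDED:
        return False  # obvious exceptions such as "forth"
    return bool(D_ADVERB_START.match(word_lc))


if __name__ == "__main__":
    # The labels below are hypothetical examples of "ADV..." tags.
    for word, label in [("þa", "ADV"), ("forþam", "ADV"), ("forth", "ADV"),
                        ("there", "ADV"), ("soon", "ADV"), ("there", "N")]:
        print(word, label, is_d_adverb(word, label))
```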