EHL – CorpusStudio

Erwin R. Komen

October 2010

 

1.            Task

Today’s task:

 

 How does the choice between SVO and SOV order change in subclauses?

Our approach:

Step 1: Create a corpus research project and fill in the “General” tab.

 

Step 2: Define queries in the “Query” tab.

Name        Task
subS+O+V    subclauses with S, O and V in any order
subS-O-V    subclauses with the order S…O…V
subS-V-O    subclauses with the order S…V…O
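The three queries can be pictured as tests on the ordered child labels of one subclause. The following Python sketch is only an illustration of that idea, not how CorpusStudio evaluates queries: the label patterns are a simplified stand-in for the definition file, the function names are hypothetical, and only the first subject, object and verb in each clause are considered.

```python
from fnmatch import fnmatchcase

# Simplified label patterns (an assumption); the handout's definition file is richer.
SUBJECT = ["NP-SBJ*", "NP-NOM"]
OBJECT = ["NP-OB*", "NP-ACC*"]
VERB = ["VBD*", "VBP*", "BED*", "BEP*", "HVD*", "HVP*", "MD*"]


def first_index(children, patterns):
    """Position of the first child label matching any pattern, or None."""
    for i, label in enumerate(children):
        if any(fnmatchcase(label, p) for p in patterns):
            return i
    return None


def classify(children):
    """Names of the queries that the child labels of one subclause satisfy."""
    s = first_index(children, SUBJECT)
    o = first_index(children, OBJECT)
    v = first_index(children, VERB)
    if s is None or o is None or v is None:
        return []  # the clause lacks a subject, an object or a finite verb
    hits = ["subS+O+V"]  # all three are present, in any order
    if s < o < v:
        hits.append("subS-O-V")
    if s < v < o:
        hits.append("subS-V-O")
    return hits


print(classify(["C", "NP-NOM", "NP-ACC", "VBD"]))
print(classify(["NP-SBJ", "VBP", "NP-OB1"]))
```

Because subS-O-V and subS-V-O are both subsets of subS+O+V, the first query gives the total against which the other two can be compared.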

 

Step 3: Put the queries in order using the “Construction” tab.

 

Step 4: Verify the order using the “Hierarchy” tab.

 

Step 5: Run the queries using Tools/ExecuteConstructor (F10).

 

Step 6: Look at the results in the “Results” tab.

 

Step 7: Copy the results to Excel, and make a graph.
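The graph in Step 7 usually plots shares rather than raw hit counts. As a rough sketch, and assuming the “Results” tab gives one hit count per query for each period or text, the percentages could be computed as below. The function name and the counts are hypothetical; the numbers are placeholders, not corpus results.

```python
def order_shares(counts):
    """Map each period to (SOV %, SVO %) among the classified subclauses."""
    shares = {}
    for period, (sov, svo) in counts.items():
        total = sov + svo
        if total == 0:
            shares[period] = None  # no hits, so no percentage can be given
            continue
        shares[period] = (round(100 * sov / total, 1),
                          round(100 * svo / total, 1))
    return shares


# Placeholder counts for illustration only; these are NOT corpus results.
example = {"period A": (30, 10), "period B": (5, 15)}
print(order_shares(example))
```

The same calculation can of course be done with a formula in Excel; the point is that each period's two counts are normalised by their sum before plotting.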

 


 

2.            Labels

Clause level labels

Constituent   Examples   Explanation
IP            IP-MAT     Main clause
              IP-SUB     Subclause
              IP-INF     Infinitival clause
CP            CP-THT     That clause
              CP-REL     Relative clause
              CP-CLF     Cleft
              CP-ADV     Adverbial clause

 

Clause constituent labels

Constituent   Examples   Explanation
NP            NP         Undetermined NP
              NP-SBJ     Subject
              NP-NOM     Nominative NP
              NP-GEN     Genitive case NP
              NP-OB1     Direct object
              NP-OB2     Indirect object
              NP-LFD     Left dislocated NP
              NP-MSR     Measure NP
PP            PP         Any PP
              PP-LFD     Left dislocated PP
              PP-RSP     Resumptive PP
ADVP          ADVP       Adverbial phrase

 

Verb labels

Verb type     Examples   Explanation
to be         BED        Past tense (“was”, “were”)
              BEP        Present tense (“is”, “are”)
              BE         Infinitive
to have       HVD        Past tense
              HVP        Present tense
              HV         Infinitive
modals        MDD        Past tense
              MDP        Present tense
              MD         Infinitive
other verbs   VBD        Past tense
              VBP        Present tense
              VB         Infinitive

 

All label definitions can be found under “Syntactic Labels” at:

http://www-users.york.ac.uk/~lang22/YCOE/YcoeHome.htm

3.            Query commands

Type         Examples         Explanation
Domination   x iDoms y        y is a child of x
             x iDomsOnly y    y is the only child of x
             x iDomsFirst y   y is the first child of x
             x Dominates y    y is a descendant of x
Order        x iPrecedes y    y is the immediately following sibling of x
             x iFollows y     y is the immediately preceding sibling of x
             x Precedes y     y is a later sibling of x (other nodes may intervene)

Further query commands can be found at:

http://corpussearch.sourceforge.net/CS-manual/SearchFunctions.html
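To make the relations in the table concrete, here is a small Python sketch that implements them on a toy tree. It follows the sibling-based wording of this handout; the definitions in the CorpusSearch manual may differ in detail. The `Node` class and the function names are hypothetical and are not part of CorpusStudio or CorpusSearch.

```python
class Node:
    """A labelled tree node that knows its parent and its ordered children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def i_doms(x, y):
    """x iDoms y: y is a child of x."""
    return y.parent is x


def i_doms_only(x, y):
    """x iDomsOnly y: y is the only child of x."""
    return i_doms(x, y) and len(x.children) == 1


def i_doms_first(x, y):
    """x iDomsFirst y: y is the first child of x."""
    return i_doms(x, y) and x.children[0] is y


def dominates(x, y):
    """x Dominates y: y is a descendant of x."""
    node = y.parent
    while node is not None:
        if node is x:
            return True
        node = node.parent
    return False


def _positions(x, y):
    """Indices of x and y among their shared parent's children, or None."""
    if x.parent is None or x.parent is not y.parent:
        return None
    siblings = x.parent.children
    return siblings.index(x), siblings.index(y)


def precedes(x, y):
    """x Precedes y: y is a later sibling of x."""
    pos = _positions(x, y)
    return pos is not None and pos[0] < pos[1]


def i_precedes(x, y):
    """x iPrecedes y: y is the immediately following sibling of x."""
    pos = _positions(x, y)
    return pos is not None and pos[1] == pos[0] + 1


def i_follows(x, y):
    """x iFollows y: y is the immediately preceding sibling of x."""
    return i_precedes(y, x)


# Toy subclause: (IP-SUB (NP-NOM) (NP-ACC (N)) (VBD))
noun = Node("N")
subj, obj, verb = Node("NP-NOM"), Node("NP-ACC", [noun]), Node("VBD")
ip = Node("IP-SUB", [subj, obj, verb])
print(i_doms(ip, obj), dominates(ip, noun), precedes(subj, verb))
```

In this toy tree the subject precedes the verb but does not immediately precede it, because the object intervenes; that is the difference between `Precedes` and `iPrecedes` in the table.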

4.            Definition file

// --------------------------------------------------------------------
// Name:     OE+MEU.def
// Goal:     Combined definitions for OE and ME processing
// History:
// 10-07-2009 ERK   Combined from Ans van Kemenade
// 18-12-2009 ERK   Added definition of "contrast"
// --------------------------------------------------------------------

 

// --------------------------------------------------------------------------------
// Definitions of different IP categories
// --------------------------------------------------------------------------------
finiteIP: IP-MAT*|IP-SUB*
matrixIP: IP-MAT*
subIP: IP-SUB*
anyCP: CP|CP-*
anyXP: *P-*|*P
negation: NEG-*|NEG
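Each definition is a name followed by alternatives separated by `|`, where `*` stands for any run of characters. The Python sketch below shows how such a line can be read and tested against node labels. It assumes that shell-style globbing (Python's `fnmatch`) is a close enough model of the wildcard; the helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_definition(line):
    """Split 'name: a|b|c' into the name and its list of alternative patterns."""
    name, _, patterns = line.partition(":")
    return name.strip(), [p.strip() for p in patterns.split("|") if p.strip()]


def matches(label, patterns):
    """True if the label matches at least one of the alternative patterns."""
    return any(fnmatchcase(label, p) for p in patterns)


name, finite_ip = parse_definition("finiteIP: IP-MAT*|IP-SUB*")
for label in ["IP-MAT", "IP-SUB-SPE", "IP-INF"]:
    print(label, matches(label, finite_ip))
```

Note that `anyCP: CP|CP-*` needs both alternatives: `CP-*` alone would miss a bare `CP` label, while `CP*` would also accept unrelated labels that merely start with those two letters.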

 

// --------------------------------------------------------------------------------
// Definitions of verbal categories
// --------------------------------------------------------------------------------
nonfiniteverb: *BE|*BAG*|*BEN*|*HV|*HVG*|*HVN*|*AX|*AXG*|*AXN*|*VB|*VAG*|*VAN*|VBN*|VBG*|HAN*|HAG*
finiteverb: BEI|BEP*|BED*|UTP|*HVI|*HVP*|*HVD*|*AXI|*AXP*|*AXD*|*MD|VBI|*VBP*|*VBD*|*DOI|*DOP*|*DOD*|NEG+BEI|NEG+BEP*|NEG+BED*|NEG+AXI|NEG+*AXP*|NEG+*AXD*|NEG+*MD|NEG+VBI|NEG+*VBP*|NEG+*VBD
unaccfiniteverb: BEI|BEP*|BED*|NEG+BEI|NEG+BEP*|NEG+BED*
finite_BE: BEP*|BED*|NEG+BEP*|NEG+BED*
progressive: *ing*|*yng*

 

// --------------------------------------------------------------------------------
// Definitions of different AP categories
// --------------------------------------------------------------------------------
anypp: PP|PP-*
someap: ADVP-*|ADJP*
timeap: ADVP-TMP*
then_word: then|+ten|+ta|+tonne|than

 

// --------------------------------------------------------------------------------
// Definitions of different NP categories
// --------------------------------------------------------------------------------
subjectoe: NP-NOM|NP-NOM-#|NP-NOM-RSP
subject: $subjectoe|NP-SBJ*
badsubject: EX
timenp: NP*TMP
anynp: NP|NP-*
leftdisnp: NP-*LFD*
resumpnp: NP-*RSP*
posspro: PRO$|PRO$^*

 

 

// --------------------------------------------------------------------------------
// The following definition of an object NP excludes temporal NPs such as
//   NP-DAT-TMP and NP-GEN-TMP
// --------------------------------------------------------------------------------
object: NP-OB*|NP-DAT|NP-DAT-[A-SU-Z]*|NP-GEN|NP-GEN-[A-SU-Z]*|NP-ACC|NP-ACC-[A-SU-Z]*
argument: $object|$subjectoe
objectorpp: $object|PP-*
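The character class `[A-SU-Z]` accepts every capital letter except T, which is what keeps labels such as NP-DAT-TMP out of `object`. The Python sketch below checks this and also shows how a `$name` reference can be expanded into the patterns of an earlier definition. It assumes shell-style matching as provided by `fnmatch`; any special treatment the real tool gives to characters such as `#` is not modelled, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase

# Definitions copied from the file above, stored as name -> raw pattern string.
DEFS = {
    "subjectoe": "NP-NOM|NP-NOM-#|NP-NOM-RSP",
    "object": ("NP-OB*|NP-DAT|NP-DAT-[A-SU-Z]*|NP-GEN|NP-GEN-[A-SU-Z]*|"
               "NP-ACC|NP-ACC-[A-SU-Z]*"),
    "argument": "$object|$subjectoe",
}


def expand(name, defs):
    """Flatten a definition into plain patterns, following $name references."""
    patterns = []
    for part in defs[name].split("|"):
        part = part.strip()
        if part.startswith("$"):
            patterns.extend(expand(part[1:], defs))  # reference to another definition
        else:
            patterns.append(part)
    return patterns


def matches(label, name, defs=DEFS):
    """True if the label matches any pattern of the named definition."""
    return any(fnmatchcase(label, p) for p in expand(name, defs))


for label in ["NP-DAT", "NP-DAT-RSP", "NP-DAT-TMP", "NP-GEN-TMP"]:
    print(label, matches(label, "object"))
```

One side effect of this pattern is that any extension starting with T is excluded, not only TMP; whether that matters depends on which labels actually occur in the corpus.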

 

// --------------------------------------------------------------------------------
// Definitions of contents of NPs
// --------------------------------------------------------------------------------
noun: N-*|NR*|FW|*Q*|D*
dem: D^*|D-*|D
pronoun: PRO^N|PRO^A|PRO^G|PRO^D|PRO|DPRO^N|DPRO^A|DPRO^G|DPRO^D|PRO-*
nonpronominal: D*|ADJ*|N*|*Q*|NUM*|FP|FW|CP*|PTP*|V*|RP+V*|CONJ*

 

 

// -------------------------------------------------------------------
// Default values for ignore_nodes and ignore_words
// -------------------------------------------------------------------
// ignore_nodes: COMMENT|CODE|ID|LB|'|\"|,|E_S|.|/|RMV:*
// ignore_words: COMMENT|CODE|ID|LB|'|\"|,|E_S|.|/|RMV:*|0|\**
// -------------------------------------------------------------------