Research Report AI-1990-01

                           Artificial Intelligence Programs

                              The University of Georgia

                                Athens, Georgia 30602


                                         Available by ftp from

                                           aisun1.ai.uga.edu

                                            (128.192.12.9)


                                            Series editor: 

                                           Michael Covington

                                      mcovingt@aisun1.ai.uga.edu











                   A Dependency Parser for Variable-Word-Order Languages


                                         Michael A. Covington


                                   Artificial Intelligence Programs

                                       The University of Georgia

                                         Athens, Georgia 30602

                                            MCOVINGT@UGA.UGA.EDU


                                            January 2, 1990


                   ------------------------------------------------------------

                           0. Introduction

                          1. Variable word order: the problem

                          2. Dependency grammar (DG)

                          3. Unification-based dependency grammar

                          4. The parsing algorithm

                          5. The implementation

                          6. Evaluation of the IBM 3090 environment

                          7. Remaining issues

                          8. Conclusion

                           ------------------------------------------------------------


                  0.   Introduction


                        This paper presents a new approach to the recognition of

                  sentence structure by computer in human languages that have

                  variable word order. In a sense, the algorithm is not new; there

                  is good evidence that it was known 700 years ago (Covington 1984).

                  But it has not been implemented on computers, and the modern

                  implementations that are most like it fail to realize its crucial

                  advantage for dealing with variable word order.1 In fact,

                  present-day parsing technology is so tied to the fixed word order

                  of English that researchers in Germany and Japan customarily build

                  parsers for English rather than their own languages.


                        The new algorithm uses dependency grammar. Unlike the more

                  usual phrase structure grammars, a dependency grammar does not

                  divide the sentence up into phrases (constituents); instead, it

                  identifies the grammatical relations that connect one word to

                  another. This is advantageous in languages where the order of

                  words is variable and many of the constituents are discontinuous. 

                                          


                       1Phyllis  McLanahan  provided   invaluable  assistance   with

                  Russian  data. The  early stages  of this  work were  supported by

                  National  Science Foundation  Grant  IST-85-02477.  The  VM/Prolog

                  implementation of  GULP was  developed while visiting  the Seminar

                  fr natrlich-sprachliche Systeme, University of  Tbingen. Norman

                  Fraser and  Richard Hudson provided valuable  encouragement during

                  the later stages  of the project. Responsibility  for opinions and

                  errors rests solely with the author.











                        The algorithm presented here is implemented in Prolog on an

                  IBM 3090 and has been used successfully to parse Russian and Latin

                  sentences. The IBM 3090 is well suited for this and other

                  applications in artificial intelligence and symbolic computing

                  because of its large address space, memory caching, and ability to

                  prefetch along both alternatives of a conditional branch.

                  Performance is currently limited not by the hardware, but by the

                  VM/Prolog interpreter, which could be replaced by a considerably

                  faster compiler.



                  1.   Variable word order: the problem


                  1.1.  Most human languages have partly variable word order


                        Most of the languages of the world allow considerably more

                  variation of word order than does English. For example, the

                  English sentence


                        The dog sees the cat.


                  has six equally grammatical Russian translations:


                        Sobaka vidit koshku.

                        Sobaka koshku vidit.

                        Vidit sobaka koshku.

                        Vidit koshku sobaka.

                        Koshku vidit sobaka.

                        Koshku sobaka vidit.


                  These differ somewhat in emphasis but not in truth conditions. The

                  subject and object are identified, not by their positions, but by

                  their inflectional endings (-a for nominative case and -u for

                  accusative). By switching the endings, one can say "the cat sees

                  the dog" without changing the word order:


                        Sobaku vidit koshka. (etc.)


                        The languages of the world can be ranked as to the amount of

                  word order variability they allow. For example:


                        Almost no variation:    Chinese, English, French

                        Some variation:         Japanese, German, Finnish

                         Extensive variation:    Russian, Latin, Korean

                        Maximum variation:      Walbiri (Australia)


                  Because English is at one end of the scale -- permitting almost no

                  word order variation -- it is a poor sample of a human language to

                  do research on. A priori, one would expect that parsing techniques

                  developed solely for English might not work at all for other

                  languages, and might not be correct on a deeper level even for











                  English. In what follows I shall argue implicitly that this is the

                  case.



                  1.2.  Phrase-structure grammar (PSG) cannot handle variable word

                        order


                        Virtually all present-day linguistic theories analyze the

                  sentence by breaking it into substrings (constituents). For

                  example:



                                S

                   

                   

                          NP         VP

                   

                        D    N    V       NP

                   

                                        D    N

                                              

                        The  dog  sees  the  cat.



                        Here the dog is a noun phrase, sees the cat is a verb

                  phrase, and the grammar consists of phrase-structure rules (PS-

                  rules) such as


                         S →  NP + VP


                         NP →  D + N


                  supplemented to some extent with rules of other types.


                        This approach has been highly successful with English, but

                  it has trouble with variable word order for two reasons. First,

                  the order of the constituents is variable. Second, and more

                  seriously, variable word order often results in discontinuous

                  constituents. Consider the following sentence, from a short story

                  by Lermontov:


                  Khoroshaya   u       tebia   loshad'.

                  nom.                 pobj.   nom.

                  good         with    you     horse


                                                = `You have a good horse.'

                                                  (lit. `A good horse is with you.')


                  Here `good' and `horse' should, on any reasonable analysis, form a

                  constituent. Yet `you', which is not part of this constituent,











                  intervenes. A phrase-structure tree for this sentence would have

                  crossing branches, which phrase-structure grammar disallows:



                                     S



                                NP       VP




                           Adj           PP           N



                                      P      NP



                        khoroshaya     u    tebia    loshad'



                  (Russian omits the word is, allowing a VP to consist of a bare

                  predicative NP, PP, or AdjP.)


                        Even the object of a preposition can be broken up, as in the

                  example



                                      PP



                                   P      NP



                         NP                 N


                         N


                        minuty    cherez    dve

                        gen.                acc.

                        minutes   within    two


                              = `within two minutes' 

                                (lit. `within a pair of minutes')


                  from the same short story.


                        In Latin poetry, extremely scrambled sentences are common.

                  For example:











                                              S




                                       VP            NP




                   Adj                          NP          N




                           Adj      V       Adv     N



                  ultima  Cumaei  venit     iam  carminis  aetas

                  nom.    gen.                   gen.      nom.

                  last    Cumean  has-come  now  song      age


                              = `The last epoch of the Cumean song has now arrived'

                                 (Vergil, Eclogues IV.4)


                  Siewierska (1988) gives examples of discontinuity in Walbiri and

                  other languages.



                  1.3. Scrambling transformations do not solve the problem


                        One way to handle discontinuous constituents is to write

                  phrase-structure rules that generate them whole and then break

                  them up with syntactic transformations or "scrambling rules."

                  Twenty years ago, this was the preferred analysis (see e.g. Ross

                  1967). But such a theory claims that variable-order languages are

                  intrinsically more complex than fixed-order languages -- in fact

                  that a variable-order language is a fixed-order language plus a

                  possibly immense set of scrambling rules. 


                        If this were so, one would expect languages burdened by word

                  order variability to become "simpler" (i.e., more fixed) over

                  time, but this does not necessarily happen. Though the Indo-

                  European languages are generally drifting toward fixed order,

                  languages in other families, such as Finnish, are developing more

                  elaborate case systems that increase the possibilities for word

                  order variation. 


                        Intuitively, word order variability may well be a

                  simplification of the syntax that is compensated by a more

                  elaborate morphology. That is, a variable-word-order language is

                  one with a lack of word-order rules, not a superabundance of them.


                        More importantly, a transformational analysis is worse than

                  no help at all for parsing. Because transformations are tree-to-











                  tree mappings, a parser can only undo a transformation if it has

                  already recognized (parsed) the tree structure that represents the

                  output of the transformation. That is, the only way to parse a

                  transformed sentence is to undo the transformation -- but the only

                  way to undo the transformation is to parse the sentence first.



                  1.4.  ID/LP formalism cannot handle discontinuous constituents


                        Recently, several linguists have proposed handling variable

                  word order by splitting phrase-structure rules into two

                  components: immediate dominance rules (ID-rules) that say what a

                  phrase consists of, and linear precedence rules (LP-rules) that

                  state the order in which the constituents appear. 


                   The ID/LP framework has been quite successful in capturing

                  generalizations about word order in fixed-order languages. Gazdar

                  et al. (1985), for instance, account for the whole of English word

                  order with just three LP rules. 


                        Another claim often made for ID/LP formalism is that it

                  accounts for word order variability. If the relative order of a

                  set of constituents is unstated by the LP rules, they are allowed

                  to occur in any order. 


                        But this is not enough. Removing the LP rules merely allows

                  reordering the elements within each constituent, i.e., the nodes

                  immediately dominated by a single node. It still does not account

                  for discontinuous constituents. Faced with substantial word order

                  variation, the ID/LP grammarian must replace the

                  transformationalist's scrambling rules with "flattening rules"

                  that simply discard constituent structure and make nearly all

                  words hang directly from the S node. 


                        Flattening was pioneered and then rejected by Uszkoreit

                  (1986a, 1987). The problem is that a "flat" structure is no

                  structure at all -- or to put it differently, a tree that shows

                  all the words hanging from the same node is not really a tree at

                  all; it claims only that the words form a sentence, and begs the

                  question of what relations interconnect them.


                        Because of this, ID/LP parsing algorithms such as those

                  discussed by Evans (1987) are not adequate for variable word order

                  languages. Nor is the variable-word-order ATN parsing technique

                  discussed by Woods (1987), which is equivalent to a form of ID/LP

                  grammar.



                  1.5. Nonconfigurationality does not solve the problem


                        Chomsky (1981), Hale (1983), and others have split the

                  languages of the world sharply into two types: `configurational'











                  languages, such as English, in which grammatical relations are

                  defined by tree structure and word order is fixed; and `non-

                  configurational' languages, such as Walbiri, in which tree

                  structure is less important and word order may or may not be

                  variable.


                        Like flattening, nonconfigurationality begs the question of

                  how to represent structure in, and how to parse, a non-

                  configurational language. Kashket (1986) has developed a free-

                  word-order parser consistent with Chomsky's theories, but it works

                  rather like a dependency parser, searching through the input

                  string for arguments of each word.


                        More importantly, many linguists remain unconvinced that

                  there is a distinction between configurational and non-

                  configurational languages. Siewierska (1988) cites extensive

                  evidence that languages form a continuum from fully fixed to fully

                  variable word order, with no sharp transition from one type to the

                  other.



                  1.6.  Current parsing technology is limited by the limitations of

                        phrase-structure grammar


                        Almost all known parsing algorithms are based on

                  constituency grammars. As a result, they have trouble handling

                  variable word order. The normal practice among researchers in

                  Germany and Japan is to build parsers for English -- which fits

                  the constituency model relatively well -- rather than for their

                  own languages, which do not (e.g., Kolb 1987, Matsumoto et al.

                  1983, Tomita 1986). A business magazine recently reported that the

                  Japanese are developing successful computer programs to translate

                  English into Japanese but cannot go the other way because Japanese

                  is much harder to parse (Wood 1987).


                        Constituency grammars are popular for parsing largely

                  because they are easily modeled by the context-free phrase-

                  structure grammars (CF PSGs) of formal language theory. Thus

                  efficient parsing algorithms are available and the complexity of

                   the parsing task is easy to study.


                        CF PSGs are not entirely adequate for human language, and

                  linguists' concern has therefore been how to augment them. The

                  major augmentations include transformations (Chomsky 1957),

                  complex symbols (Chomsky 1965), reentrant feature structures

                  (Kaplan and Bresnan 1982), and slash features to denote missing

                   elements (Gazdar et al. 1985). Gazdar's theory counts it as a triumph

                  that it is closer to a CF PSG than any previous viable linguistic

                  theory.


                        However, one should remember that formal language theory --

                  the branch of mathematics from which PSG came -- has nothing to do











                  with languages, i.e., communication systems. Formal languages are

                  called languages only metaphorically (because they have grammars).

                  They are strings of symbols, and their most important use is to

                  represent sequences of operations performed by a machine. It is by

                  no means obvious that formal systems developed for this purpose

                  should be immediately applicable to human languages.


                        A much more satisfactory approach is to admit that

                  constituents are sometimes discontinuous and to develop a suitable

                  representation and parsing strategy. Recently, discontinuous

                  constituents have become an important issue in linguistic theory

                  (Huck and Ojeda 1987).


                        The advantages of dependency parsers for dealing with

                  discontinuity have not been generally recognized. Early

                  computational linguists worked with dependency grammars, but only

                  in forms that were easily convertible to PSGs and could be parsed

                  by essentially the same techniques (Hays and Ziehe 1960, Hays

                  1964, Bobrow 1967, Robinson 1970). Most present-day dependency

                  parsers impose an adjacency condition that explicitly forbids

                  discontinuous constituents (Starosta and Nomura 1986; Fraser 1989;

                   Hudson 1989; but not Hellwig 1986 and possibly not Jäppinen et al.

                  1986 and Schubert 1987).



                  2.   Dependency grammar (DG) 


                  2.1.  Dependency grammar analyzes structure as word-to-word links


                        The alternative to phrase structure grammar is to analyze a

                  sentence by establishing links between individual words,

                  specifying the type of link in each case. Thus we might say that,

                  in "The dog sees a cat,"


                        dog is the subject of sees

                        cat is the object of sees

                        the modifies dog

                        a modifies cat


                  or, speaking more formally,


                        dog depends on sees as subject

                        cat depends on sees as object

                        the depends on dog as determiner

                        a depends on cat as determiner


                  and sees is the head of the whole structure, since it does not

                  depend on or modify anything else.


                        This is dependency grammar. It has a long and distinguished 

                  history, having been used by traditional grammarians at least

                  since the Middle Ages (Covington 1984). The first modern treatment











                   is that of Tesnière (1953, 1959). Present-day syntacticians who

                  advocate dependency grammar include Baum (1976), Bauer (1979),

                  Hudson (1980, 1984), Tarvainen (1982), Shaumyan (1987),2 Schubert

                  (1987), Mel'cuk (1988), Starosta (1988), and Fraser (1989).

                  Moreover, as I shall point out below, the latest constituency

                  grammars include constraints that bring them closer to dependency grammar.


                        There are several ways to represent a dependency analysis

                  graphically. We can annotate the sentence with arrows pointing

                  from head to dependent: 



                          The big dog sees a little cat.


                  Or we can draw a "dependency tree" or D-tree, in which each head

                  is represented by a node placed higher than that of its

                  dependents:


                                        .


                                   .                 .


                           .   .           .    .



                          The big dog sees a little cat.



                  This is equivalent to:



                                      sees

                   


                                  dog               cat



                          the big          a little



                   Following Tesnière, we can make the tree neater by discarding the

                  information about word order and centering each node above its

                  dependents:






                                          


                       2Shaumyan uses  a combination of dependency  and constituency

                  representations.











                                        sees

                     

                     

                                  dog           cat

                      

                      

                              the    big      a    little



                  This, in turn, can be represented in outline-like form using

                  indentation rather than lines to indicate hierarchy:



                              sees

                                 dog

                                   the

                                   big

                                 cat

                                   a

                                   little



                  This last notation is particularly convenient for computer output.

                  It requires no line drawing, and annotations can be printed after

                  each word. 
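
                        As a concrete illustration (not part of the report's actual
                   code), a dependency tree represented by a term of the form
                   dep(Word, Dependents) -- a representation assumed here only for
                   the example -- can be printed in this indented style by a few
                   lines of Prolog:

                      % Illustrative sketch only.
                      print_dtree(Tree) :- print_dtree(Tree, 0).

                      print_dtree(dep(Word, Dependents), Indent) :-
                          tab(Indent),                  % write Indent spaces
                          write(Word), nl,
                          Next is Indent + 2,
                          print_deps(Dependents, Next).

                      print_deps([], _).
                      print_deps([D|Ds], Indent) :-
                          print_dtree(D, Indent),
                          print_deps(Ds, Indent).

                      % ?- print_dtree(dep(sees,
                      %        [dep(dog, [dep(the,[]), dep(big,[])]),
                      %         dep(cat, [dep(a,[]),   dep(little,[])])])).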



                  2.2.  Modern linguistic theories are evolving toward dependency

                        grammar


                        Every dependency analysis that specifies word order is

                  equivalent to a constituency analysis that (1) has no labels or

                  features on nonterminal nodes, and (2) picks out one node as the

                  "head" of each phrase (Hays 1964, Gaifman 1965, Robinson 1970). 


                        To see the equivalence, observe that the D-tree


                            .

                                .

                                    .


                            a   b   c  


                  can be converted into the constituency tree:


                              .


                                  .


                            a   b   c











                  which has the same branching structure. To convert this back into

                  the D-tree without loss of information, we must know whether b or

                  c is the head of the constituent bc; if we incorrectly take c as

                  head, we will get the incorrect D-tree:


                            .


                                    .

                                .  


                            a   b   c


                  It is obvious that a is the head of the larger constituent because

                  only an individual word, not a phrase, can be a head. 


                        Moreover, the D-tree has no nodes separate from the words

                  themselves, whence the requirement that the corresponding

                  constituency tree contain no features or other information on non-

                  terminal nodes (unless of course it is copied unchanged from

                  terminal nodes).


                        Most present-day syntactic theories have adopted essentially

                  these restrictions. Jackendoff (1977) promoted the concept that

                  every phrase has a head which is a single word. He pointed out

                  that a PS-rule such as


                         noun phrase →  verb + adverb


                  ought to be impossible even though classical transformational

                  grammar permits it. 


                        Intuitively, every noun phrase must contain a noun, every

                  verb phrase must contain a verb, and every prepositional phrase

                  must contain a preposition -- or more generally, every X phrase

                  must contain an X. Jackendoff's X-bar theory, which formalizes

                  these intuitive constraints and defines a head for every phrase,

                  has been accepted without controversy by subsequent workers in

                  several different theoretical frameworks (Chomsky 1981, Bresnan

                  1982, Gazdar et al. 1985).3


                        Likewise, it is uncontroversial that most, if not all, of

                  the features on phrasal nodes should be copied from their heads.

                  (A plural noun phrase is headed by a plural noun; a singular verb

                  phrase is headed by a singular verb; and so on.) Indeed, Gazdar et

                  al. (1985) have a rule, the Head Feature Convention, to ensure

                  that this is so. This is almost equivalent to not making a

                                          


                       3Admittedly,  X-bar theory  distinguishes multiple  levels of

                  phrasal structure,  which dependency  grammar cannot  do. However,

                  these  multiple levels  have become  less popular  in recent  work

                  (compare Jackendoff 1977 to Radford 1989). 











                  distinction between the phrasal node and the head of the phrase

                  (e.g., the noun and the noun phrase of which it is the head).


                        Finally, the latest syntactic theories put less and less

                  emphasis on trees as a representation of structure. The emphasis

                  is shifting toward the grammatical relations that link the head of

                  a phrase to the other constituents of the phrase. Examples of this

                  approach include lexical-functional grammar (Kaplan and Bresnan

                  1982), Chomsky's theory of government and binding (1981), head-

                  driven phrase-structure grammar (Sag 1987), and categorial

                  grammars (Bouma 1985, Uszkoreit 1986c, Flynn 1987, Steedman 1987).



                        Significantly, Uszkoreit argues that the constituents

                  defined by categorial rules need not be continuous, and Bouma

                  specifically addresses the Australian language Walbiri (Warlpiri),

                  whose word order is extremely variable.



                  3.   Unification-based dependency grammar


                  3.1.  Dependency is a theory-dependent concept


                        The relation of head to dependent corresponds roughly to two

                  concepts already well developed in grammatical theory:


                  (1) The dependent presupposes the presence of the head. That is,

                  adjectives depend on nouns, not vice versa. Verbs are heads and

                  their subcategorized arguments (subject, object, etc.) are their

                  dependents.


                  (2) Semantically, functors are heads and arguments are dependents.

                  That is, if the meaning of word X is incomplete without the

                  meaning of word Y, but not vice versa, then X is the head and Y is

                  the dependent. For example, the subject and object are dependents

                  of the verb.


                        As Gazdar et al. (1985:189-192) have shown, this does not

                  settle the issue, because in a semantics that includes partial

                  functions, the choice of functor and argument is somewhat

                  arbitrary. 


                        In the past, dependency grammarians have tried to find some

                  single observable property, such as optionalness, that

                  distinguishes head from dependent in all constructions. My

                  position is that this is a mistake. Dependency is a theoretical

                  abstraction, and its identification depends on multiple criteria;

                  questions about it should be resolved in the way that best

                  captures syntactic and semantic generalizations.



                  3.2.  D-rules specify possible relations











                        Following Miller (1985), I formalize a dependency grammar by

                  writing "D-rules," i.e., rules that allow one word to depend on

                  another. However, instead of using symbols like N and V for noun

                  and verb, I use feature structures in the tradition of

                  unification-based grammar (Shieber 1986).


                        Thus the relation of noun to adjective in Russian or Latin

                  is described by the following rule, where G, N, and C are

                  variables:


                        [ category: noun ]        [ category: adj ]
                        [ gender:   G    ]   →    [ gender:   G   ]
                        [ number:   N    ]        [ number:   N   ]
                        [ case:     C    ]        [ case:     C   ]


                  That is: "A word with category adj, gender G, number N, and case C

                  can depend on a word with category noun, gender G, number N, and

                  case C."


                        This rule does not specify word order; in the rule, the head

                  is always written first. First approximations to some other rules

                  used by the parser are as follows:


                  Verb and subject:


                        [ category: verb ]       [ category: noun ]
                        [ number:   N    ]   →   [ number:   N    ]
                        [ person:   P    ]       [ person:   P    ]


                  Verb and object:


                        [ category: verb ]   →   [ category: noun ]
                                                 [ case:     acc  ]


                  Preposition modifying verb:


                        [ category: verb ]   →   [ category: prep ]


                  Preposition and object:


                        [ category: prep ]   →   [ category: noun ]
                        [ objcase:  C    ]       [ case:     C    ]











                  These rules of course ignore many details.
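
                        For concreteness, rules of this kind might be transcribed
                   into Prolog roughly as follows. This is an illustrative sketch
                   only: the drule/2 facts and the fixed-arity term
                   fs(Category, Gender, Number, Person, Case) are assumed here for
                   the example, whereas the actual parser uses the GULP notation
                   described in section 5.2.

                      % Sketch only.  drule(HeadFS, DependentFS) means "a word
                      % matching DependentFS may depend on a word matching HeadFS".

                      % Adjective depends on noun, agreeing in gender, number, case:
                      drule(fs(noun, G, N, _, C), fs(adj, G, N, _, C)).

                      % Subject: noun depends on verb, agreeing in number and person
                      % (a first approximation, like the rules above):
                      drule(fs(verb, _, N, P, _), fs(noun, _, N, P, _)).

                      % Object: accusative noun depends on verb:
                      drule(fs(verb, _, _, _, _), fs(noun, _, _, _, acc)).

                      % A preposition may modify a verb:
                      drule(fs(verb, _, _, _, _), fs(prep, _, _, _, _)).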



                  3.3.  Unification builds and tests structures order-independently


                        This formalization relies crucially on unification (matching

                  and/or merging) of feature structures. The power of unification

                  comes from its ability to build complex objects through order-

                  independent operations.


                        Each of the feature structures in the rules above is only

                  partly instantiated -- that is, the values of most features are

                  unspecified. They will become instantiated through matching.

                  Crucially, an uninstantiated feature has no value; it is not

                  equivalent to a feature whose value is 0 or nil.


                        Two feature structures unify if (1) the features within them

                  that are already instantiated have matching values, and (2) every

                  feature that is instantiated in one structure but not in the other

                  becomes instantiated to the same value in both structures. Thus a

                  feature structure is built up with information acquired from many

                  sources -- some of it from the lexicon and some from every D-rule

                  that successfully applies to it.


                        The value of a feature can itself be a feature structure, in

                  which case the matching criterion applies recursively.


                        Unification is a process that succeeds or fails. If the

                  grammar requires two structures to unify and they cannot be

                  unified, the utterance is ungrammatical, i.e., it is not generated

                  or parsed by the grammar.


                        The result of unifying a set of structures is the same

                  regardless of the order in which the unifications are performed.

                  This means that, in a unification-based grammar or computing

                  system, many difficult questions about the order of operations

                  simply fail to arise. The power of unification comes from its

                  ability to make the most of whatever information is available at a

                  particular time, filling in missing details (or even missing

                  superstructures) when it becomes possible to do so.


                        For example, a transformational grammar might copy a feature

                  from one location to another. This implies that the feature is in

                  its original location before the copying takes place. A

                  unification-based grammar, by contrast, will merge features in two

                  locations (whether or not either one has been given a value) and,

                  either beforehand or afterward, give a value to one of them. The

                  question of which comes first, instantiation or copying, is

                  irrelevant if not meaningless.



                  3.4.  Representations of meaning are easy to build











                        The unification process can easily build semantic

                  representations as it goes. Consider for example the D-rule


                        [ category:  verb   ]       [ category:  noun ]
                        [ number:    N      ]   →   [ number:    N    ]
                        [ person:    P      ]       [ person:    P    ]
                        [ semantics: X(Y,Z) ]       [ semantics: Y    ]


                  This is just the subject-verb rule above with a crude semantic

                  representation added.


                        Suppose this D-rule applies to the two words


                   [ form:      vidit     ]        [ form:      sobaka ]
                   [ category:  verb      ]        [ category:  noun   ]
                   [ number:    1         ]        [ number:    1      ]
                   [ person:    3         ]        [ person:    3      ]
                   [ semantics: sees(U,V) ]        [ semantics: dog    ]

                                                                         

                  Unification will give the following values to the variables:


                        N=1  P=3  X=sees  Y=U=dog  V=Z 


                  and as a result the semantics feature of vidit will have the value

                  sees(dog,V), where V will get a value through further unification

                  when the rule applies that introduces the object.
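
                        The same step can be illustrated in plain Prolog terms (the
                   actual implementation uses GULP feature structures; the term
                   w(Category, Number, Person, Semantics) and the predicate names
                   below are assumed only for this example):

                      subj_rule(w(verb, N, P, Sem), w(noun, N, P, Y)) :-
                          arg(1, Sem, Y).    % the subject fills the verb's first
                                             % semantic argument position

                      entry(vidit,  w(verb, 1, 3, sees(_Subj, _Obj))).
                      entry(sobaka, w(noun, 1, 3, dog)).

                      % ?- entry(vidit, V), entry(sobaka, S), subj_rule(V, S),
                      %    V = w(_, _, _, Sem).
                      % Sem = sees(dog, _Obj)    (the object slot is still open)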


                        This is far from a complete theory of meaning, and the

                  reader is referred to Shieber (1986) and to Gazdar et al. (1985)

                  for further discussion of semantics in unification-based grammar.

                  The point here is simply that semantics in a dependency grammar

                  can use mechanisms that have already been extensively developed

                  for unification-based phrase-structure grammar.



                  3.5.  The meaning representation can ensure that obligatory

                        dependents are present                         


                        Any dependency grammar must distinguish between optional and

                  obligatory dependents. For instance, in English, the subject of

                  the verb is obligatory but the adverb of manner is not:4


                        John came.

                        John came quickly.

                        *Came.

                                          


                       4Asterisks denote ungrammatical examples.











                        *Came quickly.


                  Moreover, if an argument is obligatory it is also unique: a verb

                  may take several adverbs but it must have one and only one

                  subject.


                        John invariably came quickly.

                        *John Bill Harry came.


                        Obligatoriness of arguments is most easily handled through

                  the mechanism that builds representations of meaning. After all,

                  in the meaning, each verb can have only one value in each argument

                  position; this ensures there will never be two subjects or two

                  objects on a single verb. 


                        To this one can add a requirement that the complete semantic

                  representation, once built, must not contain any uninstantiated

                  argument positions. This, in turn, will ensure that all the

                  necessary arguments -- subject, object, and the like -- are

                  present.
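
                        In Prolog, one way to state this final completeness check
                   (an illustrative assumption, not the report's code) is simply to
                   require that the finished semantic representation be ground:

                      complete_semantics(Sem) :-
                          ground(Sem).   % fails if any argument is uninstantiated

                      % ?- complete_semantics(sees(dog, _Obj)).    fails
                      % ?- complete_semantics(sees(dog, cat)).     succeeds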


                        The list-valued subcat or syncat features of Shieber (1986)

                  and others cannot be used to handle subcategorization in

                  dependency grammar, because each D-rule brings in only one

                  argument. Nor are the lists appropriate for a variable-word-order

                  language, since the arguments are distinguished by inflectional

                  form rather than by their proximity to the verb. 



                  3.6.  Restrictions can be imposed on word order


                        Even a variable-word-order language has some constraints on

                  word order. For example, in Russian and Latin, the preposition

                  must precede its object. In the sentence


                  devushka   kladyot   knigu   na    gazetu

                  nom.                 acc.          acc.

                  girl       puts      book    on    newspaper


                  the object of na must be gazetu, not knigu, even though both are

                  in the correct case. That is, the preposition can only combine

                  with a noun that follows it.


                        We can handle this by annotating the preposition-object rule

                  as "head first," i.e., the head must precede the object. Rules are

                  likewise allowed to be annotated "head last," and such rules will

                  be common in Japanese though I know of none in Russian or Latin.


                        Further, the prepositional phrase must be continuous; that

                  is, all the direct or indirect dependents of the preposition must

                  form a continuous string. Thus in











                  devushka  kladyot  na    gazetu       knigu

                  nom.                     acc.         acc.

                  girl      puts     on    newspaper    book


                  the object of na must be gazetu, not knigu; the verb can be

                  separated from its object but the preposition cannot. (Recall that

                  in the example minuty cherez dve above, it was the NP, not the PP,

                  that was discontinuous.) 


                        The prototype parser does not handle contiguity

                  requirements. One way of doing so might be to endow the

                  preposition (for example) with a feature contig that is copied

                  recursively to all its dependents, and then insist that the whole

                  string of words bearing the same value of this feature be

                  contiguous.
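
                        The contiguity test itself is easy to state. Assuming each
                   word carries its position in the input string (an assumption
                   made only for this sketch), the positions of a word and all of
                   its direct and indirect dependents must form an unbroken block:

                      contiguous(Positions) :-   % positions are distinct integers
                          length(Positions, Len),
                          min_list(Positions, Min),
                          max_list(Positions, Max),
                          Len =:= Max - Min + 1.

                      % ?- contiguous([4, 3, 5]).    succeeds
                      % ?- contiguous([3, 5]).       fails: position 4 intervenes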


                        Hudson (1984) has shown that dependency grammars with

                  sufficient contiguity and word order requirements can handle

                  fixed-order languages such as English. 



                  4. The parsing algorithm


                  4.1.  The parser accepts words and tries to link them


                        Unlike other dependency parsers, this parser does not

                   require constituents to be continuous, but merely prefers them to be.


                        The parser maintains two lists, PrevWordList (containing all

                  words that have been accepted from input) and HeadList (containing

                  only words that are not dependents of other words). These are

                  initially empty. At the end, HeadList will contain only one word,

                  the head of the sentence.


                        Parsing is done by processing each word in the input string

                  as follows:


                        (1) Search PrevWordList for a word on which the current word

                  can depend. If there is one, establish the dependency; if there is

                  more than one, use the most recent one on the first try; if there

                  is none, add the current word to HeadList.


                        (2) Search HeadList for words that can depend on the current

                  word (there can be any number), and establish dependencies for any

                  that are found, removing them from HeadList as this is done.


                        This algorithm has been used successfully to parse Russian

                  and Latin. To add the adjacency requirement, one would modify the

                  two steps as follows:











                        (1) When looking for the word on which the current word

                  depends, consider only the previous word and all words on which it

                  directly or indirectly depends. 


                        (2) When looking for potential dependents of the current

                  word, consider only a contiguous series of members of HeadList

                  beginning with the one most recently added.


                        With these requirements added, the algorithm would then be

                  equivalent to that of Hudson (1989).



                  4.2.  The parser prefers continuous phrases but does not require

                  them


                        Comparing the two algorithms just given, it is obvious that

                  the parser for continuous constituents is a special case of the

                  parser for discontinuous constituents, and that, in fact, it

                  usually tries the parses that involve continuous constituents

                  first.


                        This strategy makes the parser's preferences resemble those

                  of human beings in two respects.


                        First, even when discontinuous constituents are allowed, the

                  parser has an easier time parsing continuous ones. Analogously,

                  human languages that allow discontinuity usually do not indulge

                  heavily in it, unless for poetic effect or some other special

                  communicative purpose.


                        Second, because of its search strategy, the parser adheres

                  to the psycholinguistic principle that near attachments are

                  preferred (Frazier 1987). Consider Frazier's example sentence Ken

                  said Dave left yesterday. Hearers prefer to interpret this as



                            .


                        .              .

                                   .          .


                        Ken said Dave left yesterday.



                  with yesterday modifying left, rather than



                            .           


                        .              .      .

                                  .











                        Ken said Dave left yesterday.



                  with yesterday modifying said. 


                        The parsing algorithm has the same preference. Both HeadList

                  and PrevWordList are searched beginning with the most recently

                  added element. Thus the parser follows the strategy


                        Attach each word to the nearest potential head or

                        dependent.


                  This is the dependency-grammar counterpart of Frazier's principle,

                  which says


                        Attach each node to the node currently under

                        consideration if possible.


                  Ueda (1984) has argued on independent grounds that this principle

                  applies to Japanese, a variable-word-order language.



                  4.3.  Unavoidably, parsing discontinuous constituents is complex


                        It is well known that phrase-structure parsing of an n-word

                  sentence with the most efficient algorithm takes, at most, time

                   proportional to n³. The same is true, in principle, of dependency

                  parsing with a dependency grammar that is convertible to a phrase-

                  structure grammar -- i.e., one that does not allow discontinuity.


                        Parsing with discontinuity is unavoidably more complex.

                  After all, it allows more possibilities; the parser can never be

                  completely sure that a constituent is over or that a subsequent

                  constituent has not yet begun. An exact analysis has not yet been

                   carried out, but the complexity of parsing with discontinuity may,

                   in the worst case, be as high as nⁿ.


                        Three things should be emphasized. First, the extra

                  complexity comes from allowing discontinuous constituents, not

                  from using dependency grammar. Discontinuous constituents are

                  necessary in some human languages, and hence unavoidable. Second,

                  as will be shown below, worst-case complexity is irrelevant to

                  natural language processing. Third, the complexity can be reduced

                  by putting arbitrary limits on how far away from the current word

                  a head or dependent can be sought. There is every reason to

                  believe that the human brain imposes such limits on hearers'

                  ability to understand speech, and therefore that all human

                  languages are thus constrained.



                  4.4.  With ambiguity and features, natural language parsing is NP-

                        complete











                        Barton, Berwick, and Ristad (1987:89-96) prove that parsing

                  is NP-complete in any phrase structure grammar that includes (1)

                  agreement features that are copied from some nodes to others (like

                  the agreement of subject and verb in natural language), and (2)

                  lexical ambiguity (the ability to rewrite more than one complex

                  terminal symbol as the same surface form). They do this by

                  reducing 3SAT (a well-understood theorem proving problem) to a

                  parsing problem for such a grammar.


                        The dependency grammars proposed here have agreement

                  features and lexical ambiguity. Although the details have not been

                  worked out, it should be obvious that the same reduction can be

                  carried out for a dependency grammar that has order and contiguity

                  requirements. In this respect, dependency parsing is no better and

                  no worse than phrase structure parsing.



                  4.5.  Average-case performance is what matters


                        The Barton-Berwick-Ristad proof indicates that all adequate

                  natural-language parsing algorithms have the same worst-case

                  complexity, i.e., they are NP-complete (unless of course some of

                  them turn out to be worse). Fortunately, worst cases in natural

                  language are quite rare. They do exist; an example is the English

                  sentence


                        BUFFALO BUFFALO BUFFALO BUFFALO BUFFALO


                  which has the same structure as "Boston cattle bewilder Boston

                  cattle." Once given the structure, human beings have no difficulty

                  interpreting the sentence and seeing that it is grammatical,

                  though they find it extremely difficult to discover the structure

                  without help.


                        The moral is that even the parsers in our heads do not

                  perform well in the worst case. Average-case complexity is much

                  more important. One way of limiting the average-case complexity of

                  dependency parsing is to place a limit on, for example, the

                  maximum size of HeadList and/or PrevWordList. This will prohibit

                  massive inversions of word order and wide separation of related

                  constituents -- exactly the things that are rare or impermissible

                  even in free-word-order languages.



                  5. The implementation


                  5.1.  The parser is written in IBM VM/Prolog


                        The present implementation uses IBM VM/Programming in Logic

                  (`VM/Prolog' for short) on the IBM 3090 Model 400-2VF at the

                  University of Georgia. It uses the IBM `Mixed' syntax, which











                  closely resembles the standard Edinburgh dialect of Prolog

                  described by Clocksin and Mellish (1981).


                        Prolog is usually thought of as a language for automated

                  reasoning and expert systems (Kowalski 1979, Walker 1987).

                  Nonetheless, it originated as a language for writing parsers

                  (Colmerauer 1973) and remains eminently suited for this purpose.


                        Three factors make Prolog ideal for natural language

                  processing. First, Prolog is a language for data as well as

                  operations. Every data object that can exist in the language has a

                  written representation.5 Complex data structures can be created

                  gradually and straightforwardly, with no need to declare them in

                  advance or perform system calls to allocate memory. Lisp-like

                  lists, decomposable into head and tail, are one of the many

                  structuring devices available.


                        Second, Prolog is designed to consider multiple alternative

                  paths of computation. Parsers typically have to try many

                  alternatives in order to parse a sentence successfully, and in

                  Prolog this is almost automatic. A procedure can be given multiple

                  definitions to express alternative ways of solving a problem.

                  Every computation either succeeds or fails, and if a computation

                  fails, execution backs up to the most recent untried alternative

                  and proceeds forward again. The programmer can put in instructions

                  (`cuts') to suppress backtracking where it is not wanted.


                        Third, a form of unification is built into Prolog. Prolog

                  unification is not identical to feature structure unification, but

                  the basic idea is the same: make the structures alike by

                  instantiating (giving values to) variables. For example, the list

                  [a,b,X] can unify with [Y,b,c] with the instantiations X=c, Y=a.

                  However, [X,b,X] cannot unify with [a,b,c] because X cannot take

                  on two values at once.
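
                        Both behaviors can be verified directly at the Prolog
                  prompt (Edinburgh-style queries):

                  ?- [a,b,X] = [Y,b,c].      /* succeeds:  X = c,  Y = a          */

                  ?- [X,b,X] = [a,b,c].      /* fails: X cannot be both a and c   */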



                  5.2.  GULP extends Prolog by adding feature structure unification


                        Prolog unification differs from feature structure

                  unification in one crucial way: Prolog identifies corresponding

                  features by position, whereas in feature structures, features are

                  identified by name. 


                        This is a substantial obstacle for implementing unification-

                  based grammars. The grammatical theory requires that any feature

                  structure should be unifiable with any other unless feature values

                  prevent it. This means every feature structure must reserve a

                                          


                       5Except  for  certain pathological  structures  that  contain

                  pointers to themselves. These structures are not normally created,

                  and some Prolog implementations treat them as runtime errors.











                  position for every feature that occurs in the grammar, even though

                  only one or two of them are mentioned in any specific instance. To

                  represent


                        ┌─          ─┐
                        │ case: nom  │
                        │ num:  sing │
                        └─          ─┘


                  the programmer has to write something like


                  [V1,V2,V3,V4,V5,V6,nom,V8,V9,V10,V11,sing,V13,V14,V15]


                  if the grammar uses a total of 15 features. Typographical errors

                  are inevitable.


                        There are two ways out of the dilemma: modify the Prolog

                  unifier, or modify Prolog syntax. If the unifier were modified, it

                  would be possible to write something like [case(nom),num(sing)]

                  and have the unifier figure out that, for example, this is

                  supposed to match [num(N),case(nom),pers(3)]. The trouble is that

                  this approach really slows down the program; all the extra work

                  has to be done at run time whenever a unification is attempted. 
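
                        To see where the run-time cost comes from, here is a rough
                  sketch of what such a modified unifier would have to do (this is
                  not GULP code and not the approach adopted here; features are
                  written f(Name,Value), and memberchk/2 is defined locally):

                  /* For every feature in one structure, search the other
                     structure for a feature of the same name and unify the
                     two values; a feature absent from one structure imposes
                     no constraint.  The searching is repeated on every
                     unification, hence the slowdown.                          */

                  fs_unify([], _).
                  fs_unify([f(Name,Value)|Rest], Other) :-
                      ( memberchk(f(Name,OtherValue), Other) ->
                            Value = OtherValue
                      ;     true
                      ),
                      fs_unify(Rest, Other).

                  memberchk(X, [X|_]) :- !.
                  memberchk(X, [_|Tail]) :- memberchk(X, Tail).

                  /* ?- fs_unify([f(case,nom),f(num,sing)],
                                 [f(num,N),f(case,nom),f(pers,3)]).
                     N = sing                                                  */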


                        GULP (Covington 1987, 1989) is an extension of Prolog that

                  modifies the syntax instead. The programmer writes feature

                  structures such as:6


                        case % nom %% num % sing


                  and GULP preprocesses the Prolog program to convert these into

                  list-like structures in which features are identified by position

                  rather than by name. The value of a feature can be any Prolog

                  object, including another feature structure. A routine is provided

                  to display feature structures in a neatly indented style. 
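
                        Schematically (fs/3 is invented for this illustration and
                  is not GULP's actual internal representation): if the grammar
                  used only the three features case, num, and gen, each feature
                  structure could be compiled into a fixed-arity term, and
                  unification by feature name would reduce to ordinary Prolog
                  unification by position:

                  /*  case%nom %% num%sing    -->   fs(nom, sing, _  )
                      num%sing %% gen%fem     -->   fs(_,   sing, fem)    */

                  ?- fs(nom, sing, _) = fs(_, sing, fem).
                  /*  succeeds, yielding fs(nom, sing, fem), the
                      unification of the two feature structures           */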












                                          


                       6In the original ASCII version of GULP, this is written

                  case:nom :: num:sing. The colon is already used for another

                  purpose in VM/Prolog, and although VM/Prolog is highly

                  modifiable, the requisite modifications have not yet been made.











                  5.3.  The implementation consists of a grammar, a lexicon, and a

                  parser


                  5.3.1.            Feature set


                        In the prototype parsers for Russian and Latin,7 each word

                  is represented by a feature structure. The features used are: 


                  phon -- The phonological or orthographic form of the word.


                  cat -- The syntactic category (noun, verb, etc.).


                  case, num, gen, pers -- Grammatical agreement features (case,

                        number, gender, and person). For brevity, case is used on

                        the preposition to mark the case required in the object;

                        this lacks generality, because other words (e.g.,

                        participles or the adjective `similar') can be in one case

                        while taking a complement in another.


                  id -- An identifying number assigned to each word in the input

                        string so that separate occurrences of the same word form

                        will not unify with each other.


                  subj, obj -- If the word is a verb, these become instantiated to

                        the values of the id features of its subject and object

                        respectively. These are merely stand-ins for argument

                        positions in a semantic component that has not yet been

                        implemented.


                  gloss -- An English translation of the word, for annotating

                        output.


                  gr -- The grammatical relation borne by the word to its head

                        (subject, object, modifier, etc.); for annotating output.

                        This identifies the D-rule that was used to establish the

                        relationship.


                  dep -- An open list (i.e., a list with an uninstantiated tail)

                        containing pointers to the full feature structures of all of

                        the word's dependents. The whole dependency tree can be

                        traced by recursively following the dep feature of the main

                        verb.
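
                        The point of keeping dep open is that a new dependent can
                  be added at any time by instantiating the unbound tail, without
                  rebuilding the list. A minimal sketch (add_dep/2 is illustrative,
                  not a predicate of the parser):

                  /* Walk down to the unbound tail and instantiate it, leaving
                     a fresh unbound tail so that the list stays open.          */

                  add_dep(Deps, Dep) :- var(Deps), !, Deps = [Dep|_].
                  add_dep([_|Rest], Dep) :- add_dep(Rest, Dep).

                  /* ?- Deps = [d1|_], add_dep(Deps, d2), add_dep(Deps, d3).
                     Deps = [d1,d2,d3|_]                                        */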



                  5.3.2.            Lexical entries


                        Since this parser was built purely to examine a syntactic

                  problem, its lexical component ignores morphology and simply lists

                                          


                       7Only the Russian  parser runs on the 3090; the  Latin parser

                  runs (considerably slower!) on a PS/2 Model 50 using ALS Prolog.











                  every form of every word. Part of the lexicon is shown in Listing

                  1 (at the end of this paper). In conventional notation, the

                  lexical entry for Russian sobaka, for example, is:


                        ┌─             ─┐
                        │ phon:  sobaka │
                        │ cat:   noun   │
                        │ gloss: 'dog'  │
                        │ case:  nom    │
                        │ num:   sg     │
                        │ gen:   fem    │
                        └─             ─┘

                                   

                  On input, a string of words is converted by the lexical scan

                  procedure into a list of feature structures, each of them only

                  partly instantiated. (For instance, the id and dep features are

                  not instantiated in the structure above.) The parser instantiates

                  the structures further as it does its work.



                  5.3.3.            D-rules


                        D-rules are stored as Prolog clauses with the principal

                  functor `<<', which is written between two feature structures.

                  Recall that a feature structure is a series of feature-value pairs

                  linked by `%%' and a feature is linked to its value by `%'. Using

                  all these notational devices, the adjective-noun rule


                        ┌─              ─┐        ┌─             ─┐
                        │ category: noun │        │ category: adj │
                        │ gender:   G    │    ─   │ gender:   G   │
                        │ number:   N    │        │ number:   N   │
                        │ case:     C    │        │ case:     C   │
                        └─              ─┘        └─             ─┘


                  would be written in Prolog as:


                  cat%noun %% gen%G %% num%N %% case%C

                                             << cat%adj %% gen%G %% num%N %% case%C.


                  The complete set of D-rules is shown in Listing 2.
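
                        Because the D-rules are ordinary clauses, applying one is
                  simply a matter of calling it. Very roughly (a sketch, not the
                  parser's actual code):

                  /* Succeeds if some D-rule licenses Dependent as a dependent
                     of Head; as a side effect, unification instantiates the
                     shared features (agreement, gr, id, etc.) of both words.
                     Backtracking into link/2 tries the remaining D-rules,
                     which is how alternative attachments are explored.        */

                  link(Head, Dependent) :- Head << Dependent.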



                  5.3.4.            Parsing process and output


                        The parser implements the algorithm described in section 4.1

                  above. Crucially, HeadList and PrevWordList are really lists of

                  pointers. The same word can appear in both lists, and when this

                  happens, there is actually only one copy of the word in memory.

                  Thus its features can be instantiated regardless of which list the

                  word was accessed through. At the end, HeadList has only one

                  element, the main verb.











                        Listing 3 shows a sample of the parser's output, which is

                  displayed by following the dep features from word to word to

                  obtain the complete dependency network. The values of the phon,

                  gloss, and gr features are displayed for each word.
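
                        A sketch of the idea behind the display routine follows
                  (print_tree/2 is not the program's actual predicate, and
                  get_feature/3 is a hypothetical accessor standing in for GULP's
                  own machinery):

                  /* Print the word's phon, gloss, and gr values, then recurse
                     over its dependents at a deeper indentation.              */

                  print_tree(Word, Indent) :-
                      get_feature(Word, phon,  Phon),
                      get_feature(Word, gloss, Gloss),
                      get_feature(Word, gr,    Gr),
                      tab(Indent), write(Phon),  write(' '),
                      write(Gloss), write(' '),  write(Gr), nl,
                      get_feature(Word, dep, Deps),
                      Deeper is Indent + 2,
                      print_deps(Deps, Deeper).

                  print_deps(Deps, _) :- var(Deps), !.     /* open tail: done  */
                  print_deps([D|Ds], Indent) :-
                      print_tree(D, Indent),
                      print_deps(Ds, Indent).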


                        Because it is written in Prolog, the parser automatically

                  has the ability to backtrack and try alternatives. In Listing 3,

                  this is put to good use to find two parses for an ambiguous

                  sentence.



                  6. Evaluation of the IBM 3090 environment


                  6.1.  Parsing (and natural language processing generally) has

                        specific machine requirements


                        Parsing is one of many applications that fall into the realm

                  of symbolic computing because the objects manipulated are neither

                  numbers nor character strings, but abstract symbols to

                  which the programmer assigns a meaning. The leading languages for

                  symbolic computation are Prolog and Lisp. Symbolic computation can

                  be done in many other languages, but it is cumbersome.


                        Symbolic computation requires the ability to create complex

                  data structures of arbitrary shape at run time -- parse trees,

                  feature structures, nested lists, and the like. Necessarily, such

                  structures occupy noncontiguous memory locations and are held

                  together by pointers. For example, the simplest way to represent

                  the list (a . (b . (c . nil))) (more commonly written [a,b,c]) is

                  the following:


                         ┌───┬───┐     ┌───┬───┐     ┌───┬───┐
                  ────── │ │ │ ──┼──── │ │ │ ──┼──── │ │ │ │ │
                         └─┼─┴───┘     └─┼─┴───┘     └─┼─┴─┼─┘
                           │             │             │   │
                           a             b             c   nil


                  This preserves the essential properties of the list: it can be

                  recognized and processed one element at a time, without knowing

                  the total length; it can be broken into head and tail at any

                  point; and if the last link is uninstantiated, the list can be

                  lengthened by instantiating it. (Prolog actually uses a more

                  complex representation of a list with additional pointers to the

                  dot.)
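
                        These properties correspond directly to ordinary Prolog
                  operations (illustrative Edinburgh-style queries):

                  ?- [a,b,c] = [Head|Tail].        /* broken into head and tail:
                                                      Head = a, Tail = [b,c]      */

                  ?- L = [a,b|End], End = [c|_].   /* last link uninstantiated, so
                                                      the list can be lengthened:
                                                      L = [a,b,c|_]               */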


                        Far from wasting time, pointer references normally speed up

                  symbolic computation. The reason is that it is faster to compare

                  pointers to atomic symbols than to compare the symbols themselves.

                  Accordingly, it is standard practice to tokenize all programs and

                  data, i.e., replace all atoms with pointers to a symbol table.











                        The execution of a symbolic program consists largely of

                  pointer dereferencing, comparisons, and conditional branches.

                  There is very little arithmetic; the objects being processed are

                  not numbers, and even their addresses are not calculated but

                  rather looked up via pointers.


                        These requirements are not artifacts of using Prolog or

                  Lisp; they are imposed by the applications. Small non-numeric

                  programs, including some compilers, have been written using

                  conventional data structures (arrays, character strings, etc.),

                  but for larger applications, the advantages of symbolic

                  computation are overwhelming. This is particularly the case in

                  parsing natural language because, unlike a programming language,

                  English cannot be designed to fit a simple transition network or

                  require only one character of lookahead.



                  6.2.  The IBM 3090 is well suited to symbolic computing


                        The IBM 3090 is a promising machine for symbolic computing

                  because, compared to other supercomputers, it is much less

                  narrowly specialized for non-symbolic applications. Other

                  supercomputers are specialized for vector arithmetic (Cray-

                  1, ETA-10) or for multiprocessing with relatively small amounts of

                  fast memory on each CPU (Intel iPSC, Control Data CYBERPLUS; see

                  Stearns and Covington 1987).


                        By contrast, the IBM 3090 reflects the System/360 and

                  System/370 heritage of design for all-purpose computing (Padegs

                  1981, Tucker 1986). Some of its "super" features are designed to

                  speed up all types of programs, not just numeric applications.

                  Perhaps more importantly, the 3090 imposes no penalty for the use

                  of instructions for which it was not specifically optimized.


                        Symbolic computing, and especially natural language

                  processing, requires a machine with large memory, fast access to

                  arbitrary non-consecutive memory locations, and the ability to

                  execute conditional branches rapidly.


                        The IBM 3090 meets all these requirements. At the University

                  of Georgia, a program can run in a 100-megabyte virtual memory

                  region if needed. The prototype parser uses only 6 megabytes and

                  could probably get by with less, but it is only a small part of a

                  foreseeable integrated natural language processing system.

                  (Natural language processing is not an end in itself -- it will

                  ultimately be the user interface to some other application, which

                  needs space of its own to run in.)


                        The large address space of the 3090 and the ease of access

                  to non-contiguous locations facilitate the use of the data

                  structures needed in symbolic computation. Non-contiguity of

                  memory locations is no impediment to caching. This is important











                  because data structures are allocated piece by piece and accessed

                  by pointers. There are no arrays or vectors in Prolog, and arrays

                  are uncommon in other symbolic processing languages.


                        Finally, the 3090 instruction pipeline can prefetch along

                  both alternatives of a conditional branch instruction. This is

                  important because parsing consists largely of decision making;

                  prefetching speeds up execution considerably.



                  6.3.  VM/Prolog is fast, but a much faster Prolog is possible


                        VM/Prolog is an interpreter, not a compiler. It was designed

                  for versatility, not speed, and is respectably fast, but it dates

                  from the first generation of full-featured Prolog implementations

                  in the mid-1980s. Prolog implementations for other CPUs have

                  advanced considerably since then. Table 1 (at the end of this

                  paper) shows that, on an IBM PS/2, there is a factor of 20

                  difference in speed between interpreted and compiled Prolog. 

                        Whether a comparable speedup could be achieved on the 3090

                  is uncertain, but there is clearly room for improvement. The

                  internals of the present interpreter have not been made public,

                  but the interpreter may gain some of its speed from a tight inner

                  loop that resides entirely in the 64K cache. 


                        Even so, a compiler has to be faster than an interpreter. If

                  a good compiler were available, and the speedup factor were indeed

                  10 to 20, then the 3090 would execute Prolog, not at 500 kLIPS

                  (which is itself impressive), but at an unprecedented speed of 5

                  to 10 megaLIPS.



                  7. Remaining issues


                  7.1.  Dependency is problematic in some constructions


                        There are constructions in which it is not clear which word

                  is the head and which is the dependent. Prepositional phrases are

                  an example. In a sentence like He came after lunch, it makes sense

                  to treat after as a modifier of the verb (it says he came after

                  something), and lunch as a required dependent of after. 


                        But in some constructions the preposition seems to be much

                  more tightly bound to the noun. For example, in Spanish the direct

                  object of the verb is sometimes marked with the preposition a

                  (which in other contexts means `to'). Does such a direct object

                  depend directly on the verb -- in which case the preposition

                  depends on the noun, rather than vice versa -- or do Spanish verbs

                  sometimes take a preposition instead of a noun as direct object?


                        There are other problematic constructions. Is the verb the

                  head of the sentence? We have assumed so, but there is a time-











                  honored traditional analysis that treats the subject rather than

                  the verb as the head (Covington 1984). And what about relative

                  clauses and other embedded sentences?


                        Fortunately, none of these problems is daunting. The

                  question is which is the best analysis, not whether there is a

                  possible analysis. In any case, the same questions of headship

                  arise within X-bar theory and are the object of vigorous research

                  efforts there (see e.g. Radford 1989).



                  7.2.  Conjunctions are problematic for both DG and PSG


                        Conjunctions pose a special problem. In a sentence like Joe

                  and Max arrived, the verb seems to have two subjects. Intuitively,

                  Joe and Max forms a single unit that serves as the subject. But

                  dependency grammar cannot describe this unit; dependency grammar

                  can only connect the verb to one single word. Phrase-structure

                  grammar seems to have the upper hand.


                        But Hudson (1988) has shown that PSG is not much better off

                  than DG. Consider for example the sentence


                        John drank coffee at breakfast and tea at lunch.


                  Here and joins coffee at breakfast with tea at lunch. Yet neither

                  coffee at breakfast nor tea at lunch is a constituent or a

                  grammatical unit. No reasonable constituency analysis comes out

                  any better than dependency grammar.


                        From this, Hudson argues that conjunctions license doubling

                  up of grammatical relations -- that is, because of and, the verb

                  can take two objects and two prepositional phrases, instead of

                  just one of each. Clearly, this analysis works just as well in DG

                  as in PSG.


                        The alternative is to argue that, at least some of the time,

                  conjunctions show the effect of a post-syntactic operation on the

                  string of words -- some kind of ellipsis or rearrangement not

                  based on grammatical relations, analogous to the insertion of

                  parenthetical remarks.



                  7.3.  Word order variation affects emphasis and cohesion


                        As Karttunen and Kay (1984) have noted, word order is

                  significant even when it is variable. The first element in the

                  sentence is most likely to be the topic (the continuing concern of

                  the discourse), and new information is introduced later in the

                  sentence.











                        A dependency parser can easily keep track of the actual word

                  order, or the position of various words relative to each other, by

                  means of additional features. The semantic component of the

                  grammar can use these features to identify topic and comment and

                  to recognize other effects of word order.



                  8. Conclusions


                        Variable-word-order parsing is an important but neglected

                  problem; progress on it is necessary if natural language

                  processing is ever going to deal with a wide variety of languages

                  other than English. The work reported here has shown that

                  dependency parsing is a feasible approach to the handling of

                  variable word order. The apparently high worst-case computational

                  complexity of dependency parsing is not an objection because

                  average-case rather than worst-case complexity is what matters;

                  even the human brain does not process `worst cases' successfully.


                        The technique presented here derives much of its power from

                  unification-based grammar, a formalism developed to augment

                  phrase-structure grammar but equally applicable to dependency

                  grammar. By unifying feature structures, the grammar can build

                  representations of syntax and meaning in a powerful, order-

                  independent way.


                        Some questions remain to be answered -- such as how to

                  handle conjunctions and subordinate clauses in dependency

                  grammar -- but the work of Hudson, Starosta, and others has shown

                  that satisfactory treatments are possible, and the question is now

                  which analysis is best, rather than whether a satisfactory

                  analysis can be obtained.




               References


                  Barton, G. E.; Berwick, R. C.; and Ristad, E. S. (1987)

                        Computational complexity and natural language. Cambridge,

                        Massachusetts: MIT Press.


                  Bauer, L. (1979) Some thoughts on dependency grammar. Linguistics

                        17 (new series) 301-315.


                  Baum, R. (1976) `Dependenzgrammatik': Tesnières Modell der

                        Sprachbeschreibung in wissenschaftsgeschichtlicher und

                        kritischer Sicht. (Zeitschrift für romanische Philologie,

                        Beiheft 151.) Tübingen: Max Niemeyer.


                  Bobrow, D. G. (1967) Syntactic theories in computer

                        implementations. Automated language processing, ed. H.

                        Borko, 215-251.











                  Bouma, G. (1985) Kategoriale grammatiek en het Warlpiri. Glot

                        8:227-256.


                  Bresnan, J. (1982a) Control and complementation. Bresnan

                        1982b:282-390.


                  Bresnan, J., ed. (1982b) The mental representation of grammatical

                        relations. Cambridge, Massachusetts: MIT Press.


                  Chomsky, N. (1957) Syntactic structures. The Hague: Mouton.


                  --- (1965) Aspects of the theory of syntax. Cambridge,

                        Massachusetts: MIT Press.


                  --- (1981) Lectures on government and binding. Dordrecht: Foris.


                  Clocksin, W. F., and Mellish, C. S. (1981) Programming in Prolog.

                        Berlin: Springer.


                  Colmerauer, A. (1973) Les Systèmes-Q ou un formalisme pour

                        analyser et synthétiser des phrases sur ordinateur.

                        Publication interne No. 43, Département d'Informatique,

                        Université de Montréal.


                  Covington, M. A. (1984) Syntactic theory in the High Middle Ages.

                        Cambridge University Press.


                  --- (1987) GULP 1.1: an extension of Prolog for unification-based

                        grammar. Research report 00-0021, Advanced Computational

                        Methods Center, University of Georgia.


                  --- (1989) GULP 2.0: an extension of Prolog for unification-based

                        grammar. Research report AI-1989-01, Artificial Intelligence

                        Programs, University of Georgia.


                  Covington, M. A., and Vellino, A. (1986) Prolog arrives. PC Tech

                        Journal 4.11:52-69. 


                  Evans, R. (1987) Direct interpretations of the GPSG formalism. J.

                        Hallam and C. Mellish, eds., Advances in artificial

                        intelligence (Proceedings of the 1987 AISB conference).

                        Chichester: Wiley.


                  Flynn, M. (1987) Categorial grammar and the domain specificity of

                        universal grammar. Garfield 1987:173-186.


                  Fraser, N. M. (1989) Parsing and dependency grammar. UCL Working

                        Papers in Linguistics, vol. 1, 296-319. University College

                        London.


                  Frazier, L. (1987) Theories of sentence processing. Garfield

                        1987:291-307.











                  Gaifman, H. (1965) Dependency systems and phrase-structure

                        systems. Information and Control 8:304-337.


                  Garfield, J. L., ed. (1987) Modularity in knowledge representation

                        and natural-language understanding. Cambridge,

                        Massachusetts: MIT Press.


                  Gazdar, G.; Klein, E.; Pullum, G.; and Sag, I. (1985) Generalized

                        phrase structure grammar. Cambridge, Massachusetts: Harvard

                        University Press.


                  Hale, K. (1983) Warlpiri and the grammar of non-configurational

                        languages. Natural Language and Linguistic Theory 1:5-47.


                  Hays, D. G. (1964) Dependency theory: a formalism and some

                        observations. Language 40:511-525.


                  --- (1966) Parsing. Readings in automatic language processing, ed.

                        D. G. Hays, 73-82. New York: American Elsevier.


                  Hays, D. G., and Ziehe, T. W. (1960) Studies in machine

                        translation -- 10: Russian sentence-structure determination.

                        Project RAND Research Memorandum RM-2538. Santa Monica: RAND

                        Corporation.


                  Hellwig, P. (1986) Dependency unification grammar. Proceedings,

                        COLING 86, 195-198.


                  Huck, G. J., and Ojeda, A. E., eds. (1987) Syntax and semantics,

                        vol. 20: Discontinuous constituency. Orlando: Academic

                        Press.


                  Hudson, R. A. (1980) Constituency and dependency. Linguistics 18

                        (n.s.) 179-198.


                  --- (1984) Word grammar. Oxford: Blackwell.


                  --- (1988) Coordination and grammatical relations. Journal of

                        Linguistics 24:303-342.


                  --- (1989) Towards a computer-testable Word Grammar of English.

                        UCL Working Papers in Linguistics, vol. 1, 321-339.

                        University College London.


                  IBM (1985) VM/Programming in Logic program description/operations

                        manual. IBM publication SH20-6541-0.


                  Jackendoff, R. (1977) X-bar syntax: a study of phrase structure.

                        Cambridge, Massachusetts: MIT Press.











                  Jäppinen, H.; Lehtola, A.; and Valkonen, K. (1986) Functional

                        structures for parsing dependency constraints. Proceedings,

                        COLING 86, 461-463.


                  Kaplan, R. M., and Bresnan, J. (1982) Lexical-functional grammar:

                        a formal system for grammatical representation. Bresnan

                        1982b:173-281.


                  Karttunen, L., and Kay, M. (1984) Parsing in a free word order

                        language. Dowty, D. R., et al., eds., Natural language

                        parsing, 279-306. Cambridge University Press.


                  Kashket, M. B. (1986) Parsing a free-word-order language:

                        Warlpiri. Proceedings, 24th Annual Meeting of the

                        Association for Computational Linguistics, 60-66. 


                  Kilbury, J. (1984) Earley-basierte Algorithmen für direktes Parsen

                        mit ID/LP-Grammatiken. KIT-Report 16, Technische Universität

                        Berlin.


                  Kolb, H.-P. (1987) Diskursrepräsentationstheorie und Deduktion.

                        Linguistische Berichte 110:247-282.


                  Kowalski, R. (1979) Logic for problem solving. New York: North-

                        Holland.


                  Matsumoto, Y.; Tanaka, H.; Hirakawa, H.; Miyoshi, H.; and

                        Yasukawa, H. (1983) BUP: a bottom-up parser embedded in

                        Prolog. New Generation Computing 1:145-158.


                  Mel'cuk, I. A. (1988) Dependency syntax: theory and practice.

                        Albany: State University of New York Press.


                  Miller, J. (1985) Semantics and syntax. Cambridge University

                        Press.


                  Padegs, A. (1981) System/360 and beyond. IBM Journal of Research

                        and Development 25:377-390.


                  Radford, A. (1989) Transformational grammar. Cambridge University

                        Press.


                  Robinson, J. J. (1970) Dependency structures and transformational

                        rules. Language 46:259-285.


                  Ross, J. R. (1967) Constraints on variables in syntax.

                        Dissertation, M.I.T. Published as Infinite syntax, Norwood,

                        N.J.: Ablex, 1986.


                  Sag, I. (1987) Grammatical hierarchy and linear precedence. Huck

                        and Ojeda 1987:303-340.











                  Schubert, K. (1987) Metataxis: contrastive dependency syntax for

                        machine translation. Dordrecht: Foris.


                  Shaumyan, S. (1987) A semiotic theory of language. Bloomington:

                        Indiana University Press.


                  Shieber, S. M. (1984) Direct parsing of ID/LP grammars.

                        Linguistics and Philosophy 7:135-154.


                  --- (1986) An introduction to unification-based approaches to

                        grammar. (CSLI Lecture Notes, 4.) Stanford: CSLI.


                  Siewierska, A. (1988) Word order rules. London: Croom Helm.


                  Starosta, S. (1988) The case for lexicase. London: Pinter.


                  Starosta, S., and Nomura, H. (1986) Lexicase parsing: a lexicon-

                        driven approach to syntactic analysis. Proceedings, COLING

                        86, 127-132.


                  Stearns, R. E., and Covington, M. (1987) Prolog on the CYBERPLUS:

                        a feasibility study. Research report 01-0019, Advanced

                        Computational Methods Center, University of Georgia.


                  Steedman, M. (1987) Combinatory grammar and human language

                        processing. Garfield 1987:187-210.


                  Tarvainen, K. (1982) Einführung in die Dependenzgrammatik.

                        Tübingen: Niemeyer.


                  Tesnière, L. (1953) Esquisse d'une syntaxe structurale. Paris:

                        Klincksieck. Cited by Robinson (1970).


                  --- (1959) Éléments de syntaxe structurale. Paris: Klincksieck.


                  Tomita, M. (1986) Efficient parsing for natural language. Boston:

                        Kluwer.


                  Tucker, S. G. (1986) The IBM 3090 system: an overview. IBM Systems

                        Journal 25.1:4-19.


                  Ueda, M. (1984) Notes on parsing in Japanese. Unpublished.

                        Department of Linguistics, University of Massachusetts, Amherst.


                  Uszkoreit, H. (1986a) Word order and constituent structure in

                        German. (CSLI Lecture Notes, 8.) Stanford: CSLI.


                  --- (1986b) Constraints on order. Report No. CSLI-86-46. Stanford

                        University.


                  --- (1986c) Categorial unification grammars. Report No. CSLI-86-

                        66. Stanford University.











                  --- (1987) Linear precedence in discontinuous constituents:

                        complex fronting in German. Huck and Ojeda 1987:405-425.


                  Walker, A., ed. (1987) Knowledge systems and Prolog. Reading,

                        Massachusetts: Addison-Wesley.


                  Wood, R. C. (1987) The language advantage: Japan's machine

                        translators rule the market. High Technology Business 7.11

                        (November), p. 17.


                  Woods, W. A. (1987) Grammar, augmented transition network.

                        Shapiro, S. C., ed., Encyclopedia of artificial intelligence

                        1:323-333. New York: Wiley.











                  Listing 1. Part of the lexicon, which ignores morphology and

                  simply lists every form of every word. This is a stand-in for the

                  morphological component that would be needed in a practical

                  system.



                 word(phon%koshka %% cat%noun %% gloss%"'cat'"  %% case%nom %% num%sg %% gen%fem).

                 word(phon%koshku %% cat%noun %% gloss%"'cat'"  %% case%acc %% num%sg %% gen%fem).

                 word(phon%koshki %% cat%noun %% gloss%"'cats'" %% case%nom %% num%pl %% gen%fem).

                 word(phon%koshki %% cat%noun %% gloss%"'cats'" %% case%acc %% num%pl %% gen%fem).


                 word(phon%sobaka %% cat%noun %% gloss%"'dog'"  %% case%nom %% num%sg %% gen%fem).

                 word(phon%sobaku %% cat%noun %% gloss%"'dog'"  %% case%acc %% num%sg %% gen%fem).

                 word(phon%sobaki %% cat%noun %% gloss%"'dogs'" %% case%nom %% num%pl %% gen%fem).

                 word(phon%sobaki %% cat%noun %% gloss%"'dogs'" %% case%acc %% num%pl %% gen%fem).


                 word(phon%vidit %% cat%verb %% gloss%"'sees'" %% num%sg %% pers%3).

                 word(phon%vidut %% cat%verb %% gloss%"'see'"  %% num%pl %% pers%3).


                 word(phon%presleduet  %% cat%verb %% gloss%"'pursues'" %% num%sg %% pers%3).

                 word(phon%presleduyut %% cat%verb %% gloss%"'pursue'"  %% num%pl %% pers%3).


                 word(phon%cherez   %% cat%prep %% gloss%"'through'" %% case%acc).











                  Listing 2. D-rules used by the prototype Russian parser.


                   

            cat%verb %% pers%P %% num%N %% subj%S 

              << cat%noun %% case%nom %% pers%P %% num%N %% gr%subject %% id%S.

             

            cat%verb %% obj%Ob

              << cat%noun %% case%acc %% gr%direct_object %% id%Ob.

             

            cat%verb

              << cat%prep %% gr%modifier.

             

            cat%noun %% case%C %% num%N %% gen%G

              <<  cat%adj %% case%C %% num%N %% gen%G %% gr%modifier.

             

            cat%prep %% case%C %% obj%Ob %% posn%1

              <<  cat%noun %% case%C %% gr%object_of_preposition %% id%Ob.











                  Listing  3. Output  of  a typical  parsing  run. The  sentence  is

                  ambiguous  as  to whether  belye modifies  sobaki or  koshki; both

                  parses are found.



                  [vidut,sobaki,belye,koshki,v,chornom,lesu]


                  Parsed structure:

                  vidut 'see'

                    sobaki 'dogs' subject

                      belye 'white' modifier

                    koshki 'cats' direct_object

                    v 'in' modifier

                      lesu 'forest' object_of_preposition

                        chornom 'black' modifier


                  Parsed structure:

                  vidut 'see'

                    sobaki 'dogs' subject

                    koshki 'cats' direct_object

                      belye 'white' modifier

                    v 'in' modifier

                      lesu 'forest' object_of_preposition

                        chornom 'black' modifier











                  Table 1. Comparative speed of several Prolog implementations.



                                                       Search-and-        256-element
                                                       backtrack          list reversal



                          IBM PS/2-50

                            Arity Prolog 4.1           21.6 sec           47.1 sec

                              (interpreted)            1.5 kLIPS          0.7 kLIPS


                          IBM PS/2-50

                            ALS Prolog 1.2             1.2 sec            2.5 sec

                              (compiled)               27.5 kLIPS         13.2 kLIPS


                          Sun Sparcstation

                            Quintus Prolog             0.166 sec          0.133 sec

                              (compiled)               197 kLIPS          130 kLIPS


                          IBM 3090-400

                            VM/Prolog                  0.066 sec          0.297 sec

                              (interpreted)            496 kLIPS          112 kLIPS




                  Benchmark programs are from Covington and Vellino (1986). 

                  kLIPS = thousand logical inferences per second.