Add neural-symbolic parsing with Prolog validation

Integrate a Prolog-based grammatical validation layer on top of the Stanza
neural parser for enhanced error detection and grammatical analysis.
Changes:
1. Added Prolog files:
- coptic_grammar.pl (13K) - DCG grammar rules
- coptic_lexicon.pl (486K) - Coptic lexicon
- coptic_prolog_rules.py (28K) - Python-Prolog interface
2. Updated Dockerfile:
- Install SWI-Prolog (swi-prolog package)
- Required for symbolic grammatical validation
3. Updated requirements.txt:
- Added pyswip>=0.2.10 for Python-Prolog integration
4. Extended coptic_parser_core.py:
- Added _init_prolog() to initialize Prolog engine
- Added _validate_with_prolog() for grammatical validation
- Modified parse_text() to include optional Prolog validation
- Returns validation results (patterns detected, warnings, errors)
Neural-Symbolic Architecture:
- Neural layer (Stanza): Dependency parsing, POS tagging, lemmatization
- Symbolic layer (Prolog): Grammatical rule validation, error detection
- Hybrid output: Parse tree + validation feedback
This creates a scholarly tool combining statistical and rule-based approaches,
ideal for Coptic language research and education.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Dockerfile +5 -0
- coptic_grammar.pl +400 -0
- coptic_lexicon.pl +0 -0
- coptic_parser_core.py +80 -6
- coptic_prolog_rules.py +671 -0
- requirements.txt +1 -0
Dockerfile:

@@ -1,5 +1,10 @@
 FROM python:3.9
 
+# Install SWI-Prolog for neural-symbolic parsing
+RUN apt-get update && apt-get install -y \
+    swi-prolog \
+    && rm -rf /var/lib/apt/lists/*
+
 WORKDIR /code
 
 COPY requirements.txt .
coptic_grammar.pl (new file):

@@ -0,0 +1,400 @@
%******************************************************************************
% COPTIC_DEPENDENCY_RULES.PL - Prolog Dependency Grammar for Coptic
%******************************************************************************
%
% This module demonstrates the adaptation from DCG (DETECT5.PRO style)
% to modern dependency grammar formalism.
%
% PARADIGM SHIFT:
%   DCG:        sentence --> NP, VP.          (hierarchical constituents)
%   Dependency: dep(verb, subject, nsubj).    (head-dependent relations)
%
% Based on Universal Dependencies annotation scheme adapted for Coptic
% linguistic patterns (VSO word order, tripartite sentences, etc.)
%
% Author: Adapted from DETECT5.PRO (André Linden, 1989-91)
% Date: 2025
%
%******************************************************************************

:- module(coptic_dependency_rules, [
    dependency_pattern/3,
    validate_dependency/4,
    suggest_parse/3,
    apply_dependency_rules/3
]).

:- ensure_loaded(coptic_lexicon).

%******************************************************************************
% CORE DEPENDENCY PATTERNS
%******************************************************************************

% Pattern 1: VSO Transitive Sentence
% Example: ⲥⲱⲧⲙ ⲡⲣⲱⲙⲉ ⲡϣⲁϫⲉ (hear the-man the-word = "The man hears the word")
%
% Dependency structure:
%   ⲥⲱⲧⲙ (VERB, root)
%   ├── ⲡⲣⲱⲙⲉ (NOUN, nsubj)
%   └── ⲡϣⲁϫⲉ (NOUN, obj)
%
dependency_pattern(vso_transitive,
                   Words,
                   [dep(Subj, SubjPOS, SIdx, Verb, VIdx, nsubj),
                    dep(Obj, ObjPOS, OIdx, Verb, VIdx, obj)]) :-
    % Verb at position VIdx
    nth1(VIdx, Words, word(Verb, VerbPOS, _)),
    member(VerbPOS, ['VERB', 'AUX']),

    % Subject at position SIdx
    nth1(SIdx, Words, word(Subj, SubjPOS, _)),
    member(SubjPOS, ['NOUN', 'PRON', 'PROPN']),

    % Object at position OIdx
    nth1(OIdx, Words, word(Obj, ObjPOS, _)),
    member(ObjPOS, ['NOUN', 'PRON', 'PROPN']),

    % VSO word order constraint (crucial for Coptic!)
    VIdx < SIdx,
    SIdx < OIdx,

    % Verify verb is transitive
    is_transitive(Verb).

% Pattern 2: VS Intransitive Sentence
% Example: ⲃⲱⲕ ⲡⲣⲱⲙⲉ (go the-man = "The man goes")
%
dependency_pattern(vs_intransitive,
                   Words,
                   [dep(Subj, SubjPOS, SIdx, Verb, VIdx, nsubj)]) :-
    % Verb
    nth1(VIdx, Words, word(Verb, VerbPOS, _)),
    member(VerbPOS, ['VERB', 'AUX']),

    % Subject
    nth1(SIdx, Words, word(Subj, SubjPOS, _)),
    member(SubjPOS, ['NOUN', 'PRON', 'PROPN']),

    % VS word order
    VIdx < SIdx,

    % Verify verb is intransitive
    is_intransitive(Verb).

% Pattern 3: Tripartite Nominal Sentence
% Example: ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ (I am the-god = "I am God")
%
% Structure: Subject + Copula + Predicate
% In UD: Predicate is head, Subject and Copula depend on it
%
%   ⲡⲛⲟⲩⲧⲉ (NOUN, root)
%   ├── ⲁⲛⲟⲕ (PRON, nsubj)
%   └── ⲡⲉ (AUX, cop)
%
dependency_pattern(tripartite,
                   Words,
                   [dep(Subj, SubjPOS, SIdx, Pred, PIdx, nsubj),
                    dep(Cop, 'AUX', CIdx, Pred, PIdx, cop)]) :-
    % Subject (first position, typically)
    nth1(SIdx, Words, word(Subj, SubjPOS, _)),
    member(SubjPOS, ['NOUN', 'PRON', 'PROPN']),

    % Copula (ⲡⲉ, ⲧⲉ, ⲛⲉ)
    nth1(CIdx, Words, word(Cop, 'AUX', _)),
    member(Cop, ['ⲡⲉ', 'ⲧⲉ', 'ⲛⲉ']),

    % Predicate (nominal or adjectival)
    nth1(PIdx, Words, word(Pred, PredPOS, _)),
    member(PredPOS, ['NOUN', 'ADJ', 'PROPN']),

    % Typical order: S - Cop - Pred (but can vary)
    SIdx < PIdx,

    % Gender/number agreement between copula and predicate
    copula_agrees_with_predicate(Cop, Pred).

% Pattern 4: Converted Tripartite (Predicate-Subject-Copula)
% Example: ⲡⲛⲟⲩⲧⲉ ⲁⲛⲟⲕ ⲡⲉ (God I am = "I am God" - emphatic)
%
dependency_pattern(tripartite_converted,
                   Words,
                   [dep(Subj, SubjPOS, SIdx, Pred, PIdx, nsubj),
                    dep(Cop, 'AUX', CIdx, Pred, PIdx, cop)]) :-
    nth1(PIdx, Words, word(Pred, PredPOS, _)),
    member(PredPOS, ['NOUN', 'ADJ', 'PROPN']),

    nth1(SIdx, Words, word(Subj, SubjPOS, _)),
    member(SubjPOS, ['NOUN', 'PRON', 'PROPN']),

    nth1(CIdx, Words, word(Cop, 'AUX', _)),
    member(Cop, ['ⲡⲉ', 'ⲧⲉ', 'ⲛⲉ']),

    % Converted order: Pred before Subj
    PIdx < SIdx,

    copula_agrees_with_predicate(Cop, Pred).

% Pattern 5: Determiner + Noun
% Example: ⲡⲣⲱⲙⲉ (the-man)
%
% In Coptic, articles often attach as prefixes, but in tokenized form:
%   ⲣⲱⲙⲉ
%   ├── ⲡ (DET, det)
%
dependency_pattern(determiner_noun,
                   Words,
                   [dep(Det, 'DET', DIdx, Noun, NIdx, det)]) :-
    nth1(DIdx, Words, word(Det, 'DET', _)),
    nth1(NIdx, Words, word(Noun, 'NOUN', _)),

    % Determiner precedes noun in Coptic
    DIdx < NIdx,

    % Adjacent or nearly adjacent
    NIdx - DIdx =< 2,

    % Gender agreement
    determiner_gender_agrees(Det, Noun).

% Pattern 6: Adjective Modification
% Example: ⲡⲣⲱⲙⲉ ⲛⲁⲛⲟⲩϥ (the-man good = "the good man")
%
% In Coptic, adjectives typically follow nouns
%   ⲣⲱⲙⲉ (NOUN)
%   └── ⲛⲁⲛⲟⲩϥ (ADJ, amod)
%
dependency_pattern(noun_adjective,
                   Words,
                   [dep(Adj, 'ADJ', AIdx, Noun, NIdx, amod)]) :-
    nth1(NIdx, Words, word(Noun, 'NOUN', _)),
    nth1(AIdx, Words, word(Adj, 'ADJ', _)),

    % Coptic: Adjective follows noun (typically)
    NIdx < AIdx,

    % Should be adjacent or nearly so
    AIdx - NIdx =< 2,

    % Gender/number agreement
    adjective_agrees(Adj, Noun).

% Pattern 7: Prepositional Phrase
% Example: ϩⲛ ⲧⲡⲟⲗⲓⲥ (in the-city)
%
%   ⲧⲡⲟⲗⲓⲥ (NOUN, head in larger structure)
%   ├── ϩⲛ (ADP, case)
%
dependency_pattern(prepositional_phrase,
                   Words,
                   [dep(Prep, 'ADP', PIdx, Noun, NIdx, case)]) :-
    nth1(PIdx, Words, word(Prep, 'ADP', _)),
    nth1(NIdx, Words, word(Noun, NounPOS, _)),
    member(NounPOS, ['NOUN', 'PRON', 'PROPN']),

    % Preposition before noun
    PIdx < NIdx,

    % Adjacent
    NIdx - PIdx =< 2.

% Pattern 8: Conjunction
% Example: ⲡⲣⲱⲙⲉ ⲙⲛ ⲧⲉϣⲓⲙⲉ (the-man and the-woman)
%
dependency_pattern(coordination,
                   Words,
                   [dep(Conj, 'CCONJ', CIdx, Head, HIdx, cc),
                    dep(Coord2, Coord2POS, C2Idx, Head, HIdx, conj)]) :-
    nth1(HIdx, Words, word(Head, HeadPOS, _)),
    member(HeadPOS, ['NOUN', 'VERB', 'ADJ']),

    nth1(CIdx, Words, word(Conj, 'CCONJ', _)),

    nth1(C2Idx, Words, word(Coord2, Coord2POS, _)),
    Coord2POS = HeadPOS,  % Same POS as head

    % Order: Head < Conj < Coord2
    HIdx < CIdx,
    CIdx < C2Idx.

%******************************************************************************
% CONSTRAINT CHECKING
%******************************************************************************

% Check if verb is transitive (requires object)
is_transitive(Verb) :-
    coptic_verb(Verb, Features),
    member(transitive, Features), !.
is_transitive(_).  % Default: assume transitive if unknown

% Check if verb is intransitive (no object)
is_intransitive(Verb) :-
    coptic_verb(Verb, Features),
    member(intransitive, Features), !.
is_intransitive(_).  % Default: allow intransitive

% Copula-predicate agreement
copula_agrees_with_predicate(Cop, Pred) :-
    coptic_noun(Pred, Gender, Number), !,
    copula_form(Cop, Gender, Number).
copula_agrees_with_predicate(_, _).  % Allow if not in lexicon

copula_form('ⲡⲉ', masc, sing).
copula_form('ⲧⲉ', fem, sing).
copula_form('ⲛⲉ', _, plur).
copula_form('ⲛⲉ', masc, plur).
copula_form('ⲛⲉ', fem, plur).

% Determiner-noun gender agreement
determiner_gender_agrees(Det, Noun) :-
    coptic_noun(Noun, Gender, Number), !,
    determiner_form(Det, Gender, Number).
determiner_gender_agrees(_, _).  % Allow if not in lexicon

determiner_form('ⲡ', masc, sing).
determiner_form('ⲧ', fem, sing).
determiner_form('ⲛ', _, plur).
determiner_form('ⲟⲩ', _, _).  % Indefinite: any gender/number

% Adjective-noun agreement
adjective_agrees(Adj, Noun) :-
    coptic_noun(Noun, Gender, Number),
    coptic_adjective(Adj, Gender, Number), !.
adjective_agrees(_, _).  % Allow if not in lexicon

%******************************************************************************
% VALIDATION AND ERROR DETECTION
%******************************************************************************

% validate_dependency(+Token, +Head, +Relation, +Words)
% Check if a proposed dependency is valid according to Coptic grammar
validate_dependency(Token, Head, Relation, Words) :-
    % Find positions
    nth1(TokenIdx, Words, word(Token, TokenPOS, _)),
    nth1(HeadIdx, Words, word(Head, HeadPOS, _)),

    % Check if relation is valid for this POS pair
    valid_relation(TokenPOS, HeadPOS, Relation),

    % Check linguistic constraints
    check_constraints(Token, TokenPOS, TokenIdx, Head, HeadPOS, HeadIdx, Relation, Words).

% Valid dependency relations (simplified from UD)
valid_relation('NOUN', 'VERB', nsubj).
valid_relation('PRON', 'VERB', nsubj).
valid_relation('PROPN', 'VERB', nsubj).
valid_relation('NOUN', 'VERB', obj).
valid_relation('PRON', 'VERB', obj).
valid_relation('NOUN', 'NOUN', nmod).
valid_relation('ADJ', 'NOUN', amod).
valid_relation('DET', 'NOUN', det).
valid_relation('ADP', 'NOUN', case).
valid_relation('ADP', 'PRON', case).
valid_relation('AUX', 'NOUN', cop).
valid_relation('AUX', 'ADJ', cop).
valid_relation('CCONJ', 'NOUN', cc).
valid_relation('CCONJ', 'VERB', cc).
valid_relation(_, _, root).  % Root can be anything

% Constraint checking
check_constraints(_Token, _TokenPOS, TokenIdx, _Head, HeadPOS, HeadIdx, Relation, _Words) :-
    % Word order constraints
    (   Relation = nsubj,
        member(HeadPOS, ['VERB', 'AUX'])
    ->  % In VSO, subject follows verb
        TokenIdx > HeadIdx
    ;   true
    ),

    (   Relation = obj,
        HeadPOS = 'VERB'
    ->  % Object follows subject in VSO
        TokenIdx > HeadIdx
    ;   true
    ),

    (   Relation = det
    ->  % Determiner precedes noun
        TokenIdx < HeadIdx
    ;   true
    ),

    (   Relation = amod
    ->  % Adjective typically follows noun in Coptic
        TokenIdx > HeadIdx
    ;   true
    ).

%******************************************************************************
% PARSING WITH DEPENDENCY RULES
%******************************************************************************

% suggest_parse(+Words, +POSTags, -Dependencies)
% Use dependency rules to suggest a parse
suggest_parse(Words, POSTags, Dependencies) :-
    % Build word structures
    length(Words, N),
    build_word_list(Words, POSTags, 1, N, WordList),

    % Try to match patterns
    findall(Deps, dependency_pattern(_, WordList, Deps), AllDeps),

    % Combine non-overlapping dependencies
    flatten(AllDeps, FlatDeps),
    sort(FlatDeps, Dependencies).

build_word_list([], [], _, _, []).
build_word_list([W|Ws], [P|Ps], Idx, N, [word(W, P, Idx)|Rest]) :-
    NextIdx is Idx + 1,
    build_word_list(Ws, Ps, NextIdx, N, Rest).

% apply_dependency_rules(+Tokens, +POSTags, -ParseTree)
% Full parsing using dependency rules
apply_dependency_rules(Tokens, POSTags, ParseTree) :-
    suggest_parse(Tokens, POSTags, Dependencies),

    % Find root
    (   select(dep(Root, RootPOS, RootIdx, _, 0, root), Dependencies, OtherDeps)
    ->  true
    ;   % No root found - pick first verb or noun
        nth1(RootIdx, POSTags, RootPOS),
        member(RootPOS, ['VERB', 'NOUN', 'AUX']),
        nth1(RootIdx, Tokens, Root),
        OtherDeps = Dependencies
    ),

    ParseTree = dep_tree{
        root: Root,
        root_pos: RootPOS,
        root_index: RootIdx,
        dependencies: OtherDeps,
        parser: 'Dependency Rules'
    }.

%******************************************************************************
% COMPARISON: DCG vs DEPENDENCY
%******************************************************************************

% EXAMPLE: How DETECT5.PRO might have encoded a rule
%
% DCG Style (old):
%   sentence --> verb_phrase.
%   verb_phrase --> verb(V, trans), noun_phrase(Subj), noun_phrase(Obj),
%                   {vso_order(V, Subj, Obj)}.
%   noun_phrase --> determiner(D), noun(N), {gender_agrees(D, N)}.
%
% Dependency Style (new):
%   dependency_pattern(vso,
%                      [verb(V, VIdx), noun(S, SIdx), noun(O, OIdx)],
%                      [dep(S, SIdx, V, VIdx, nsubj),
%                       dep(O, OIdx, V, VIdx, obj)]) :-
%       VIdx < SIdx, SIdx < OIdx.
%
% KEY DIFFERENCES:
% 1. DCG builds hierarchical structure (VP contains NPs)
% 2. Dependency expresses direct relations (verb governs subject)
% 3. Dependency is more flexible for free word order
% 4. Dependency better matches modern neural parser output

%******************************************************************************
% END OF MODULE
%******************************************************************************
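For readers unfamiliar with Prolog, the vso_transitive pattern above amounts to a search over word positions with ordering constraints. The sketch below mirrors it in plain Python (not part of the commit; it uses 0-based indices instead of Prolog's 1-based nth1, and omits the lexicon-backed is_transitive check):

```python
def vso_transitive(words):
    """words: list of (form, upos) pairs; return all (verb_i, subj_i, obj_i) matches."""
    nominal = {'NOUN', 'PRON', 'PROPN'}
    matches = []
    for vi, (_, vpos) in enumerate(words):
        if vpos not in ('VERB', 'AUX'):
            continue
        for si, (_, spos) in enumerate(words):
            if spos not in nominal or si <= vi:  # subject must follow the verb
                continue
            for oi, (_, opos) in enumerate(words):
                if opos in nominal and oi > si:  # object must follow the subject
                    matches.append((vi, si, oi))
    return matches

# ⲥⲱⲧⲙ ⲡⲣⲱⲙⲉ ⲡϣⲁϫⲉ (hear the-man the-word)
sent = [('ⲥⲱⲧⲙ', 'VERB'), ('ⲡⲣⲱⲙⲉ', 'NOUN'), ('ⲡϣⲁϫⲉ', 'NOUN')]
print(vso_transitive(sent))  # [(0, 1, 2)]
```

Prolog's backtracking gives this nested search for free; the declarative rule states only the constraints (VIdx < SIdx < OIdx) and lets the engine enumerate solutions.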
coptic_lexicon.pl:

The diff for this file is too large to render.
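Although the lexicon itself is not rendered, the predicates the grammar queries (coptic_noun/3, determiner_form/3, and the permissive fallback clauses) amount to table lookups. The Python sketch below illustrates the determiner-noun agreement check; the entries are hypothetical illustrations, not actual contents of coptic_lexicon.pl:

```python
# Hypothetical lexicon entries mirroring coptic_noun/3 and determiner_form/3.
NOUNS = {'ⲣⲱⲙⲉ': ('masc', 'sing'),   # man
         'ⲡⲟⲗⲓⲥ': ('fem', 'sing')}   # city
DETERMINERS = {'ⲡ': ('masc', 'sing'),
               'ⲧ': ('fem', 'sing'),
               'ⲛ': (None, 'plur')}  # None = any gender, like Prolog's _

def determiner_gender_agrees(det, noun):
    if noun not in NOUNS:
        return True   # permissive default, as in the Prolog fallback clause
    if det not in DETERMINERS:
        return False
    det_gender, det_number = DETERMINERS[det]
    noun_gender, noun_number = NOUNS[noun]
    return (det_gender is None or det_gender == noun_gender) and det_number == noun_number

print(determiner_gender_agrees('ⲡ', 'ⲣⲱⲙⲉ'))  # True  (masc det + masc noun)
print(determiner_gender_agrees('ⲧ', 'ⲣⲱⲙⲉ'))  # False (fem det + masc noun)
```

The permissive fallback matters for coverage: an out-of-lexicon word never triggers a false agreement error, at the cost of missing some real ones.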
coptic_parser_core.py:

@@ -2,6 +2,9 @@
 """
 Coptic Dependency Parser - Core Module (Web-Compatible)
 
+Neural-Symbolic Hybrid Parser combining Stanza (neural) with Prolog (symbolic)
+for enhanced grammatical validation and error detection.
+
 Extracted from coptic-parser.py for integration with web interfaces.
 Author: André Linden (2025)
 License: CC BY-NC-SA 4.0
@@ -12,11 +15,25 @@ import warnings
 warnings.filterwarnings('ignore')
 
 class CopticParserCore:
-    """Lightweight Coptic parser for web applications"""
+    """Lightweight neural-symbolic Coptic parser for web applications"""
 
     def __init__(self):
         self.nlp = None
         self.diaparser = None
+        self.prolog = None  # Prolog engine for grammatical validation
+        self._init_prolog()
+
+    def _init_prolog(self):
+        """Initialize Prolog engine for grammatical validation (optional)"""
+        try:
+            from coptic_prolog_rules import create_prolog_engine
+            self.prolog = create_prolog_engine()
+            if self.prolog and self.prolog.prolog_initialized:
+                print("✓ Prolog engine initialized successfully")
+        except Exception as e:
+            print(f"ℹ Prolog validation not available: {e}")
+            print("  Parser will continue with neural-only mode")
+            self.prolog = None
 
     def load_parser(self):
         """Initialize Stanza parser with Coptic models"""
@@ -33,7 +50,7 @@ class CopticParserCore:
                 download_method=None,
                 verbose=False
             )
-            print("✓ Coptic parser loaded successfully")
+            print("✓ Coptic neural parser loaded successfully")
 
         except Exception as e:
             # If models not found, download them
@@ -58,12 +75,13 @@ class CopticParserCore:
             print(f"❌ Failed to load parser: {e}")
             raise
 
-    def parse_text(self, text):
+    def parse_text(self, text, include_prolog_validation=True):
         """
-        Parse Coptic text and return structured results
+        Parse Coptic text and return structured results with optional Prolog validation
 
         Args:
             text: Coptic text to parse
+            include_prolog_validation: Whether to run Prolog grammatical validation (default: True)
 
         Returns:
             dict with:
@@ -71,6 +89,7 @@ class CopticParserCore:
             - total_sentences: int
             - total_tokens: int
             - text: original text
+            - prolog_validation: dict with validation results (if enabled and available)
         """
         if not text or not text.strip():
             return None
@@ -78,7 +97,7 @@ class CopticParserCore:
         # Ensure parser is loaded
         self.load_parser()
 
-        # Parse with Stanza
+        # Parse with Stanza (neural)
         doc = self.nlp(text)
 
         if not doc.sentences:
@@ -112,13 +131,68 @@ class CopticParserCore:
                 'words': words_data
             })
 
-        return {
+        result = {
             'sentences': sentences,
             'total_sentences': len(sentences),
             'total_tokens': total_tokens,
             'text': text
         }
 
+        # Add Prolog validation (symbolic) if available and requested
+        if include_prolog_validation and self.prolog and hasattr(self.prolog, 'prolog_initialized') and self.prolog.prolog_initialized:
+            try:
+                validation = self._validate_with_prolog(sentences)
+                result['prolog_validation'] = validation
+            except Exception as e:
+                print(f"ℹ Prolog validation skipped: {e}")
+                result['prolog_validation'] = None
+
+        return result
+
+    def _validate_with_prolog(self, sentences):
+        """
+        Validate parsed sentences using Prolog grammatical rules
+
+        Args:
+            sentences: List of parsed sentence data
+
+        Returns:
+            dict with validation results including patterns detected and warnings
+        """
+        if not self.prolog:
+            return None
+
+        validation_results = {
+            'patterns_detected': [],
+            'warnings': [],
+            'has_errors': False
+        }
+
+        for sentence in sentences:
+            # Extract tokens, POS tags, heads, and dependency relations
+            tokens = [word['form'] for word in sentence['words']]
+            pos_tags = [word['upos'] for word in sentence['words']]
+            heads = [word['head'] for word in sentence['words']]
+            deprels = [word['deprel'] for word in sentence['words']]
+
+            # Validate with Prolog
+            try:
+                sent_validation = self.prolog.validate_parse_tree(tokens, pos_tags, heads, deprels)
+
+                if sent_validation:
+                    # Merge results
+                    if sent_validation.get('patterns'):
+                        validation_results['patterns_detected'].extend(sent_validation['patterns'])
+
+                    if sent_validation.get('warnings'):
+                        validation_results['warnings'].extend(sent_validation['warnings'])
+                        validation_results['has_errors'] = True
+
+            except Exception as e:
+                print(f"ℹ Prolog validation error for sentence: {e}")
+
+        return validation_results
+
     def format_conllu(self, parse_result):
         """Format parse result as CoNLL-U"""
         if not parse_result:
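Downstream code can consume the new prolog_validation key without caring whether Prolog was available. The dict below is hand-made to show the result shape from the diff above, not real parser output:

```python
# Hand-made example of the dict shape parse_text() now returns (fields elided).
result = {
    'sentences': [],  # per-sentence word data elided
    'total_sentences': 1,
    'total_tokens': 3,
    'text': 'ⲥⲱⲧⲙ ⲡⲣⲱⲙⲉ ⲡϣⲁϫⲉ',
    'prolog_validation': {
        'patterns_detected': ['vso_transitive'],
        'warnings': [],
        'has_errors': False,
    },
}

validation = result.get('prolog_validation')
if validation is None:
    print('neural-only parse (Prolog unavailable or disabled)')
elif validation['has_errors']:
    for warning in validation['warnings']:
        print('warning:', warning)
else:
    print('patterns:', ', '.join(validation['patterns_detected']))
```

Because _init_prolog() falls back to neural-only mode and parse_text() catches validation failures, callers only need the `is None` branch to stay robust.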
@@ -0,0 +1,671 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
#!/usr/bin/env python3
"""
Coptic Prolog Rules - Neural-Symbolic Integration
==================================================

Integrates Prolog logic programming with neural dependency parsing
to enhance parsing accuracy through explicit grammatical rules.

Uses pyswip (a Python interface to SWI-Prolog) for the integration.

Author: Coptic NLP Project
License: CC BY-NC-SA 4.0
"""

import warnings

from pyswip import Prolog

warnings.filterwarnings('ignore')


class CopticPrologRules:
    """
    Prolog-based grammatical rule engine for Coptic parsing validation
    and enhancement.
    """

    def __init__(self):
        """Initialize the Prolog engine and load Coptic grammar rules"""
        self.prolog_initialized = False
        self.prolog = None
        self.dcg_loaded = False  # set by _load_dcg_grammar()
        self._initialize_prolog()

    def _initialize_prolog(self):
        """Initialize SWI-Prolog and define Coptic grammatical rules"""
        try:
            # Initialize pyswip Prolog instance
            self.prolog = Prolog()

            # Define Coptic-specific grammatical rules
            self._load_coptic_grammar()

            self.prolog_initialized = True
            print("✓ Prolog engine initialized successfully")

        except Exception as e:
            print(f"⚠️ Warning: Prolog initialization failed: {e}")
            print("   Parser will continue without Prolog validation")
            self.prolog_initialized = False

    def _load_dcg_grammar(self):
        """
        Load DCG-based grammar rules from coptic_grammar.pl
        and the Coptic lexicon from coptic_lexicon.pl.

        This adds more sophisticated pattern matching using Definite Clause
        Grammars, adapted from the French DETECT5.PRO error detector.
        """
        try:
            from pathlib import Path

            # Get the path to the DCG grammar file.
            # Note: the grammar file loads the lexicon itself via ensure_loaded.
            current_dir = Path(__file__).parent
            grammar_file = current_dir / "coptic_grammar.pl"

            # Load grammar rules (which will load the lexicon)
            if grammar_file.exists():
                # Convert the path to a Prolog-compatible format
                grammar_path = str(grammar_file.absolute()).replace('\\', '/')

                # Load the module
                query = f"consult('{grammar_path}')"
                list(self.prolog.query(query))

                print(f"✓ DCG grammar rules and lexicon loaded from {grammar_file.name}")
                self.dcg_loaded = True
            else:
                print(f"ℹ DCG grammar file not found at {grammar_file}")
                self.dcg_loaded = False

        except Exception as e:
            print(f"⚠️ Warning: Could not load DCG grammar: {e}")
            self.dcg_loaded = False

    def _load_coptic_grammar(self):
        """Load Coptic linguistic rules into Prolog"""

        # Try to load the DCG grammar file if it exists
        self._load_dcg_grammar()

        # ===================================================================
        # COPTIC MORPHOLOGICAL RULES
        # ===================================================================

        # Article system: definite articles
        self.prolog.assertz("definite_article('ⲡ')")   # masculine singular
        self.prolog.assertz("definite_article('ⲧ')")   # feminine singular
        self.prolog.assertz("definite_article('ⲛ')")   # plural
        self.prolog.assertz("definite_article('ⲡⲉ')")  # masculine singular (variant)
        self.prolog.assertz("definite_article('ⲧⲉ')")  # feminine singular (variant)
        self.prolog.assertz("definite_article('ⲛⲉ')")  # plural (variant)

        # Pronominal system - independent pronouns
        self.prolog.assertz("independent_pronoun('ⲁⲛⲟⲕ')")   # I
        self.prolog.assertz("independent_pronoun('ⲛⲧⲟⲕ')")   # you (m.sg)
        self.prolog.assertz("independent_pronoun('ⲛⲧⲟ')")    # you (f.sg)
        self.prolog.assertz("independent_pronoun('ⲛⲧⲟϥ')")   # he
        self.prolog.assertz("independent_pronoun('ⲛⲧⲟⲥ')")   # she
        self.prolog.assertz("independent_pronoun('ⲁⲛⲟⲛ')")   # we
        self.prolog.assertz("independent_pronoun('ⲛⲧⲱⲧⲛ')")  # you (pl)
        self.prolog.assertz("independent_pronoun('ⲛⲧⲟⲟⲩ')")  # they

        # Suffix pronouns (enclitic)
        self.prolog.assertz("suffix_pronoun('ⲓ')")   # my/me
        self.prolog.assertz("suffix_pronoun('ⲕ')")   # your (m.sg)
        self.prolog.assertz("suffix_pronoun('ϥ')")   # his/him
        self.prolog.assertz("suffix_pronoun('ⲥ')")   # her
        self.prolog.assertz("suffix_pronoun('ⲛ')")   # our/us
        self.prolog.assertz("suffix_pronoun('ⲧⲛ')")  # your (pl)
        self.prolog.assertz("suffix_pronoun('ⲟⲩ')")  # their/them

        # Coptic verbal system - conjugation bases (tense/aspect markers)
        self.prolog.assertz("conjugation_base('ⲁ')")      # Perfect (aorist)
        self.prolog.assertz("conjugation_base('ⲛⲉ')")     # Imperfect/past
        self.prolog.assertz("conjugation_base('ϣⲁ')")     # Future/conditional
        self.prolog.assertz("conjugation_base('ⲙⲡⲉ')")    # Negative perfect
        self.prolog.assertz("conjugation_base('ⲙⲛ')")     # Negative existential
        self.prolog.assertz("conjugation_base('ⲉⲣϣⲁⲛ')")  # Conditional

        # Auxiliary verbs (copulas)
        self.prolog.assertz("copula('ⲡⲉ')")  # is (m.sg)
        self.prolog.assertz("copula('ⲧⲉ')")  # is (f.sg)
        self.prolog.assertz("copula('ⲛⲉ')")  # are (pl)

        # ===================================================================
        # COPTIC SYNTACTIC RULES
        # ===================================================================

        # Noun phrase structure: Article + Noun
        self.prolog.assertz("valid_np(Article, Noun) :- definite_article(Article), noun_compatible(Noun)")

        # Helper: any word can be a noun (simplified)
        self.prolog.assertz("noun_compatible(_)")

        # Definiteness agreement rule - in Coptic, definiteness is marked by articles
        self.prolog.assertz("requires_definiteness(Noun, Article) :- definite_article(Article)")

        # Tripartite nominal sentence pattern: Subject - Copula - Predicate
        # Example: ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ (I am God)
        self.prolog.assertz("tripartite_sentence(Subject, Copula, Predicate) :- independent_pronoun(Subject), copula(Copula), noun_compatible(Predicate)")

        # Verbal sentence: Conjugation + Subject + Verb
        self.prolog.assertz("verbal_sentence(Conj, Subject, Verb) :- conjugation_base(Conj), (independent_pronoun(Subject) ; definite_article(Subject)), verb_compatible(Verb)")

        # Helper: any word can be a verb (simplified)
        self.prolog.assertz("verb_compatible(_)")

        # ===================================================================
        # DEPENDENCY VALIDATION RULES
        # ===================================================================

        # Validate the subject-verb relationship
        self.prolog.assertz("valid_subject_verb(Subject, Verb, SubjPOS, VerbPOS) :- member(SubjPOS, ['PRON', 'NOUN', 'PROPN']), member(VerbPOS, ['VERB', 'AUX'])")

        # Validate the determiner-noun relationship
        self.prolog.assertz("valid_det_noun(Det, Noun, DetPOS, NounPOS) :- DetPOS = 'DET', member(NounPOS, ['NOUN', 'PROPN'])")

        # Validate modifier relationships
        self.prolog.assertz("valid_modifier(Head, Modifier, ModPOS) :- member(ModPOS, ['ADJ', 'ADV', 'DET'])")

        # Validate punctuation assignments - content words should NOT be punct.
        # Only actual punctuation marks (PUNCT POS tag) should carry the punct relation.
        self.prolog.assertz("invalid_punct(Word, POS, Relation) :- Relation = 'punct', member(POS, ['VERB', 'NOUN', 'PRON', 'PROPN', 'DET', 'ADJ', 'ADV', 'AUX', 'NUM'])")

        # ===================================================================
        # ERROR CORRECTION RULES
        # ===================================================================

        # DET before NOUN should carry the 'det' relation
        self.prolog.assertz("suggest_correction('DET', _, 'det')")

        # PRON is typically subject (nsubj), object (obj), or possessive
        self.prolog.assertz("suggest_correction('PRON', 'VERB', 'nsubj')")  # pronoun before verb = subject
        self.prolog.assertz("suggest_correction('PRON', 'AUX', 'nsubj')")   # pronoun before aux = subject
        self.prolog.assertz("suggest_correction('PRON', _, 'nsubj')")       # default for pronoun

        # NOUN
        self.prolog.assertz("suggest_correction('NOUN', 'VERB', 'obj')")    # noun after verb = object
        self.prolog.assertz("suggest_correction('NOUN', 'AUX', 'nsubj')")   # noun after copula = predicate nominal
        self.prolog.assertz("suggest_correction('NOUN', _, 'obl')")         # default for noun

        # VERB: main verbs are often root, ccomp (complement clause), or advcl (adverbial clause)
        self.prolog.assertz("suggest_correction('VERB', 'SCONJ', 'ccomp')")  # verb after subordinator = complement
        self.prolog.assertz("suggest_correction('VERB', 'VERB', 'ccomp')")   # verb after verb = complement
        self.prolog.assertz("suggest_correction('VERB', _, 'root')")         # default for verb

        # AUX (auxiliary/copula)
        self.prolog.assertz("suggest_correction('AUX', _, 'cop')")  # copula relation

        # ADJ (adjective)
        self.prolog.assertz("suggest_correction('ADJ', 'NOUN', 'amod')")  # adjective modifying noun

        # ADV (adverb)
        self.prolog.assertz("suggest_correction('ADV', _, 'advmod')")  # adverbial modifier

        # NUM (number)
        self.prolog.assertz("suggest_correction('NUM', 'NOUN', 'nummod')")  # number modifying noun
        self.prolog.assertz("suggest_correction('NUM', _, 'obl')")          # default for number (temporal/oblique)

        # ===================================================================
        # MORPHOLOGICAL ANALYSIS RULES
        # ===================================================================

        # Clitic attachment patterns
        self.prolog.assertz("has_suffix_pronoun(Word, Base, Suffix) :- atom_concat(Base, Suffix, Word), suffix_pronoun(Suffix), atom_length(Base, BaseLen), BaseLen > 0")

        # Article stripping for lemmatization
        self.prolog.assertz("strip_article(Word, Lemma) :- definite_article(Article), atom_concat(Article, Lemma, Word), atom_length(Lemma, LemmaLen), LemmaLen > 0")

        # If no article is found, the word is its own lemma
        self.prolog.assertz("strip_article(Word, Word) :- \\+ (definite_article(Article), atom_concat(Article, _, Word))")

        print("✓ Coptic grammatical rules loaded into Prolog")

    # ===================================================================
    # PYTHON INTERFACE METHODS
    # ===================================================================

    def validate_dependency(self, head_word, dep_word, head_pos, dep_pos, relation):
        """
        Validate a dependency relation using Prolog rules

        Args:
            head_word: The head word text
            dep_word: The dependent word text
            head_pos: POS tag of the head
            dep_pos: POS tag of the dependent
            relation: Dependency relation (nsubj, obj, det, etc.)

        Returns:
            dict: Validation result with status and suggestions
        """
        if not self.prolog_initialized:
            return {"valid": True, "message": "Prolog not available"}

        try:
            result = {"valid": True, "warnings": [], "suggestions": []}

            # Check subject-verb relationships
            if relation in ['nsubj', 'csubj']:
                query = f"valid_subject_verb('{dep_word}', '{head_word}', '{dep_pos}', '{head_pos}')"
                query_result = list(self.prolog.query(query))
                if not query_result:
                    result["warnings"].append(
                        f"Unusual subject-verb: {dep_word} ({dep_pos}) → {head_word} ({head_pos})"
                    )

            # Check determiner-noun relationships
            elif relation == 'det':
                query = f"valid_det_noun('{dep_word}', '{head_word}', '{dep_pos}', '{head_pos}')"
                query_result = list(self.prolog.query(query))
                if not query_result:
                    result["warnings"].append(
                        f"Unusual det-noun: {dep_word} → {head_word}"
                    )

            # Check for incorrect punctuation assignments and suggest corrections
            query = f"invalid_punct('{dep_word}', '{dep_pos}', '{relation}')"
            query_result = list(self.prolog.query(query))
            if query_result:
                # Query for a suggested correction
                correction_query = f"suggest_correction('{dep_pos}', '{head_pos}', Suggestion)"
                correction_result = list(self.prolog.query(correction_query))

                if correction_result and 'Suggestion' in correction_result[0]:
                    suggested_rel = correction_result[0]['Suggestion']
                    result["warnings"].append(
                        f"⚠️ PARSER ERROR: '{dep_word}' ({dep_pos}) incorrectly labeled as 'punct' → SUGGESTED: '{suggested_rel}'"
                    )
                    result["suggestions"].append({
                        "word": dep_word,
                        "pos": dep_pos,
                        "incorrect": relation,
                        "suggested": suggested_rel,
                        "head_pos": head_pos
                    })
                else:
                    result["warnings"].append(
                        f"⚠️ PARSER ERROR: '{dep_word}' ({dep_pos}) incorrectly labeled as 'punct' - should be a content relation"
                    )

            return result

        except Exception as e:
            return {"valid": True, "message": f"Validation error: {e}"}

    def check_tripartite_pattern(self, words, pos_tags):
        """
        Check whether a sentence follows the Coptic tripartite nominal pattern

        Args:
            words: List of word forms
            pos_tags: List of POS tags

        Returns:
            dict: Pattern analysis results
        """
        if not self.prolog_initialized or len(words) < 3:
            return {"is_tripartite": False}

        try:
            # Check for the tripartite pattern: Pronoun - Copula - Noun
            subj, cop, pred = words[0], words[1], words[2]

            query = f"tripartite_sentence('{subj}', '{cop}', '{pred}')"
            query_result = list(self.prolog.query(query))
            is_tripartite = len(query_result) > 0

            return {
                "is_tripartite": is_tripartite,
                "pattern": f"{subj} - {cop} - {pred}" if is_tripartite else None,
                "description": "Tripartite nominal sentence" if is_tripartite else None
            }

        except Exception as e:
            return {"is_tripartite": False, "error": str(e)}

    def analyze_morphology(self, word):
        """
        Analyze word morphology using Prolog rules

        Args:
            word: Coptic word to analyze

        Returns:
            dict: Morphological analysis
        """
        if not self.prolog_initialized:
            return {"word": word, "analyzed": False}

        try:
            analysis = {"word": word, "components": []}

            # Check for a definite article
            article_query = f"strip_article('{word}', Lemma)"
            results = list(self.prolog.query(article_query))
            if results:
                result = results[0]
                if 'Lemma' in result:
                    lemma = result['Lemma']
                    if lemma != word:
                        analysis["has_article"] = True
                        analysis["lemma"] = lemma
                        analysis["article"] = word.replace(lemma, '')

            # Check for suffix pronouns
            suffix_query = f"has_suffix_pronoun('{word}', Base, Suffix)"
            results = list(self.prolog.query(suffix_query))
            if results:
                result = results[0]
                analysis["has_suffix"] = True
                analysis["base"] = result.get('Base')
                analysis["suffix"] = result.get('Suffix')

            return analysis

        except Exception as e:
            return {"word": word, "error": str(e)}

    def validate_parse_tree(self, words, pos_tags, heads, deprels):
        """
        Validate an entire parse tree using Prolog constraints

        Args:
            words: List of word forms
            pos_tags: List of POS tags
            heads: List of head indices
            deprels: List of dependency relations

        Returns:
            dict: Overall validation results with warnings and suggestions
        """
        if not self.prolog_initialized:
            return {"validated": False, "reason": "Prolog not available"}

        try:
            results = {
                "validated": True,
                "warnings": [],
                "suggestions": [],
                "patterns_found": []
            }

            # Check for the tripartite pattern (basic assertz-based rules)
            tripartite = self.check_tripartite_pattern(words, pos_tags)
            if tripartite.get("is_tripartite"):
                results["patterns_found"].append(tripartite)

            # If the DCG grammar is loaded, use advanced pattern matching
            if self.dcg_loaded:
                try:
                    dcg_results = self._validate_with_dcg(words, pos_tags, heads, deprels)
                    if dcg_results and isinstance(dcg_results, dict):
                        # Merge DCG results
                        if dcg_results.get("patterns_found"):
                            results["patterns_found"].extend(dcg_results["patterns_found"])
                        if dcg_results.get("warnings"):
                            results["warnings"].extend(dcg_results["warnings"])
                except Exception as e:
                    print(f"Warning: DCG validation failed: {e}")
                    # Continue with basic validation even if DCG fails

            # Validate each dependency (basic rule-based validation)
            for word, pos, head, rel in zip(words, pos_tags, heads, deprels):
                if 0 < head <= len(words):  # not root
                    head_word = words[head - 1]
                    head_pos = pos_tags[head - 1]

                    validation = self.validate_dependency(head_word, word, head_pos, pos, rel)
                    if validation.get("warnings"):
                        results["warnings"].extend(validation["warnings"])

            return results

        except Exception as e:
            return {"validated": False, "error": str(e)}

    def _validate_with_dcg(self, words, pos_tags, heads, deprels):
        """
        Validate a parse tree using the DCG grammar rules

        Args:
            words: List of word tokens
            pos_tags: List of POS tags
            heads: List of head indices
            deprels: List of dependency relations

        Returns:
            dict: DCG validation results
        """
        try:
            # Convert Python lists to Prolog syntax
            words_pl = self._list_to_prolog_atoms(words)
            pos_pl = self._list_to_prolog_atoms(pos_tags)
            heads_pl = '[' + ','.join(map(str, heads)) + ']'
            deprels_pl = self._list_to_prolog_atoms(deprels)

            # Query the DCG validation predicate
            query = f"coptic_grammar:validate_parse_tree({words_pl}, {pos_pl}, {heads_pl}, {deprels_pl})"

            # Execute the query - it asserts patterns and warnings as side effects
            list(self.prolog.query(query))

            # Retrieve asserted patterns
            patterns = []
            pattern_query = "coptic_grammar:pattern_found(P)"
            try:
                for result in self.prolog.query(pattern_query):
                    if isinstance(result, dict) and 'P' in result:
                        pattern_data = result.get('P')
                        if pattern_data:
                            patterns.append(self._format_prolog_term(pattern_data))
            except Exception as e:
                print(f"Warning: Error retrieving patterns: {e}")

            # Retrieve asserted warnings (avoid shadowing the warnings module)
            dcg_warnings = []
            warning_query = "coptic_grammar:warning(W)"
            try:
                for result in self.prolog.query(warning_query):
                    if isinstance(result, dict) and 'W' in result:
                        warning_data = result.get('W')
                        if warning_data:
                            dcg_warnings.append(self._format_prolog_term(warning_data))
            except Exception as e:
                print(f"Warning: Error retrieving warnings: {e}")

            # Clean up dynamic predicates
            try:
                list(self.prolog.query("coptic_grammar:retractall(pattern_found(_))"))
                list(self.prolog.query("coptic_grammar:retractall(warning(_))"))
            except Exception as e:
                print(f"Warning: Error cleaning up Prolog predicates: {e}")

            return {
                "patterns_found": patterns,
                "warnings": dcg_warnings
            }

        except Exception as e:
            print(f"DCG validation error: {e}")
            import traceback
            traceback.print_exc()
            return {
                "patterns_found": [],
                "warnings": []
            }

    def _list_to_prolog_atoms(self, python_list):
        """
        Convert a Python list of strings to a Prolog list of quoted atoms

        Args:
            python_list: Python list of strings

        Returns:
            str: Prolog list syntax
        """
        if not python_list:
            return "[]"

        # Quote and escape each string
        items = []
        for item in python_list:
            # Escape single quotes
            escaped = str(item).replace("'", "\\'")
            items.append(f"'{escaped}'")

        return '[' + ','.join(items) + ']'

    def _format_prolog_term(self, term):
        """
        Format a Prolog term for Python display

        Args:
            term: Prolog term (atom, list, or compound)

        Returns:
            dict: Formatted representation (always a dict)
        """
        if isinstance(term, list):
            result = {}
            for item in term:
                if hasattr(item, 'name') and hasattr(item, 'args'):
                    # Compound term such as pattern_name('...')
                    key = item.name
                    value = item.args[0] if len(item.args) > 0 else None
                    result[key] = str(value) if value is not None else ''
            return result if result else {'data': str(term)}
        elif isinstance(term, str):
            # Simple string/atom - wrap in a dict
            return {'type': term, 'data': term}
        else:
            # Other types - convert to string and wrap
            return {'data': str(term)}

    def query_prolog(self, query_string):
        """
        Direct Prolog query interface for custom queries

        Args:
            query_string: Prolog query as a string

        Returns:
            First query result, or None
        """
        if not self.prolog_initialized:
            return None

        try:
            results = list(self.prolog.query(query_string))
            return results[0] if results else None
        except Exception as e:
            print(f"Prolog query error: {e}")
            return None

    def cleanup(self):
        """Shut down the Prolog engine and release its threads"""
        if self.prolog_initialized and self.prolog is not None:
            try:
                # Attempt to halt the Prolog engine cleanly; this stops all
                # Prolog threads. halt raises as Prolog stops, which is expected.
                try:
                    list(self.prolog.query("halt"))
                except Exception:
                    pass

                # Drop the Prolog instance
                self.prolog = None
                self.prolog_initialized = False
                print("✓ Prolog engine cleaned up successfully")
            except Exception as e:
                print(f"Warning: Error during Prolog cleanup: {e}")


# ===================================================================
# CONVENIENCE FUNCTIONS
# ===================================================================

def create_prolog_engine():
    """Factory function to create and initialize a Prolog engine"""
    return CopticPrologRules()


# ===================================================================
# EXAMPLE USAGE
# ===================================================================

if __name__ == "__main__":
    print("=" * 70)
    print("Coptic Prolog Rules - Test Suite")
    print("=" * 70)

    # Initialize the engine
    prolog = create_prolog_engine()

    if not prolog.prolog_initialized:
        print("\n⚠️ Prolog not available. Cannot run tests.")
        exit(1)

    print("\n" + "=" * 70)
    print("TEST 1: Tripartite Pattern Recognition")
    print("=" * 70)

    # Tripartite sentence: ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ (I am God)
    words = ['ⲁⲛⲟⲕ', 'ⲡⲉ', 'ⲡⲛⲟⲩⲧⲉ']
    pos_tags = ['PRON', 'AUX', 'NOUN']

    result = prolog.check_tripartite_pattern(words, pos_tags)
    print(f"\nInput: {' '.join(words)}")
    print(f"Result: {result}")

    print("\n" + "=" * 70)
    print("TEST 2: Morphological Analysis")
    print("=" * 70)

    # Article stripping
    test_words = ['ⲡⲛⲟⲩⲧⲉ', 'ⲧⲃⲁϣⲟⲣ', 'ⲛⲣⲱⲙⲉ']
    for word in test_words:
        analysis = prolog.analyze_morphology(word)
        print(f"\nWord: {word}")
        print(f"Analysis: {analysis}")

    print("\n" + "=" * 70)
    print("TEST 3: Dependency Validation")
    print("=" * 70)

    # Subject-verb relationship
    validation = prolog.validate_dependency(
        head_word='ⲡⲉ',
        dep_word='ⲁⲛⲟⲕ',
        head_pos='AUX',
        dep_pos='PRON',
        relation='nsubj'
    )
    print("\nDependency: ⲁⲛⲟⲕ (PRON) --nsubj--> ⲡⲉ (AUX)")
    print(f"Validation: {validation}")

    print("\n" + "=" * 70)
    print("TEST 4: Custom Prolog Query")
    print("=" * 70)

    result = prolog.query_prolog("definite_article(X)")
    print("\nQuery: definite_article(X)")
    print(f"Result: {result}")

    print("\n" + "=" * 70)
    print("All tests completed!")
    print("=" * 70)
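The list-marshalling step that `_validate_with_dcg` relies on can be exercised on its own. This standalone sketch reproduces the `_list_to_prolog_atoms` logic as a free function so it can be tried without SWI-Prolog or pyswip installed; the function name here is just for illustration:

```python
# Standalone reproduction of the _list_to_prolog_atoms helper: turns a Python
# list of strings into Prolog list syntax with quoted, escaped atoms.
def list_to_prolog_atoms(python_list):
    if not python_list:
        return "[]"
    items = []
    for item in python_list:
        escaped = str(item).replace("'", "\\'")  # escape single quotes: ' -> \'
        items.append(f"'{escaped}'")
    return '[' + ','.join(items) + ']'

print(list_to_prolog_atoms(['ⲁⲛⲟⲕ', 'ⲡⲉ', 'ⲡⲛⲟⲩⲧⲉ']))
# → ['ⲁⲛⲟⲕ','ⲡⲉ','ⲡⲛⲟⲩⲧⲉ']
```

Quoting every token as an atom is what lets Coptic Unicode strings pass into queries such as `coptic_grammar:validate_parse_tree/4` unmodified.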
@@ -5,3 +5,4 @@ stanza
 torch
 transformers>=4.30.0
 sentencepiece>=0.1.99
+pyswip>=0.2.10
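For quick experimentation outside the container, the `strip_article/2` rule asserted in `_load_coptic_grammar` can be mirrored in plain Python. This is an illustrative simplification, not code shipped in this commit: it tries the two-letter article variants (ⲡⲉ/ⲧⲉ/ⲛⲉ) before the one-letter ones, an ordering assumption the Prolog facts do not enforce.

```python
# Plain-Python mirror of the Prolog strip_article/2 rule (illustrative only;
# the real validation runs inside SWI-Prolog). Longest articles first is an
# assumption made here, not present in the asserted facts.
ARTICLES = ['ⲡⲉ', 'ⲧⲉ', 'ⲛⲉ', 'ⲡ', 'ⲧ', 'ⲛ']

def strip_article(word):
    """Return (article, lemma); article is '' when no article matches."""
    for art in ARTICLES:
        if word.startswith(art) and len(word) > len(art):
            return art, word[len(art):]
    return '', word

print(strip_article('ⲡⲛⲟⲩⲧⲉ'))  # ('ⲡ', 'ⲛⲟⲩⲧⲉ') - article + "god"
print(strip_article('ⲁⲛⲟⲕ'))    # ('', 'ⲁⲛⲟⲕ')   - no article
```

The `len(word) > len(art)` guard mirrors the `atom_length(Lemma, LemmaLen), LemmaLen > 0` condition in the Prolog rule, so a bare article is never reduced to an empty lemma.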