Rogaton Claude committed on
Commit 1fe0b70 · 1 Parent(s): fbdd2e7

Add neural-symbolic parsing with Prolog validation

Integrate a Prolog-based grammatical validation layer on top of the Stanza
neural parser for enhanced error detection and grammatical analysis.

Changes:
1. Added Prolog grammar, lexicon, and integration files:
- coptic_grammar.pl (13K) - DCG grammar rules
- coptic_lexicon.pl (486K) - Coptic lexicon
- coptic_prolog_rules.py (28K) - Python-Prolog interface

2. Updated Dockerfile:
- Install SWI-Prolog (swi-prolog package)
- Required for symbolic grammatical validation

3. Updated requirements.txt:
- Added pyswip>=0.2.10 for Python-Prolog integration

4. Extended coptic_parser_core.py:
- Added _init_prolog() to initialize Prolog engine
- Added _validate_with_prolog() for grammatical validation
- Modified parse_text() to include optional Prolog validation
- Returns validation results (patterns detected, warnings, errors)

Neural-Symbolic Architecture:
- Neural layer (Stanza): Dependency parsing, POS tagging, lemmatization
- Symbolic layer (Prolog): Grammatical rule validation, error detection
- Hybrid output: Parse tree + validation feedback (see the usage sketch after the commit message)

This creates a scholarly tool combining statistical and rule-based approaches,
ideal for Coptic language research and education.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
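
A minimal usage sketch of the hybrid pipeline (not part of the committed code). It assumes the Coptic Stanza models and SWI-Prolog are installed; the field names follow the `parse_text()` / `_validate_with_prolog()` interface shown in the diff below, so treat it as an illustration rather than a guaranteed API.

```python
# Sketch: neural parse + optional Prolog validation (assumes models and SWI-Prolog are installed)
from coptic_parser_core import CopticParserCore

parser = CopticParserCore()                       # also tries to start the Prolog engine
result = parser.parse_text("ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ",      # "I am God" (tripartite nominal sentence)
                           include_prolog_validation=True)

print(result["total_sentences"], result["total_tokens"])
print(result["sentences"][0]["words"][0])         # per-word form, upos, head, deprel, ...

validation = result.get("prolog_validation")      # None when Prolog is unavailable
if validation:
    print(validation["patterns_detected"])        # e.g. a recognised tripartite pattern
    print(validation["warnings"])                 # rule-based warnings / suggested fixes
    print(validation["has_errors"])
```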

Dockerfile CHANGED
@@ -1,5 +1,10 @@
 FROM python:3.9
 
+# Install SWI-Prolog for neural-symbolic parsing
+RUN apt-get update && apt-get install -y \
+    swi-prolog \
+    && rm -rf /var/lib/apt/lists/*
+
 WORKDIR /code
 
 COPY requirements.txt .
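
Since the symbolic layer degrades gracefully when SWI-Prolog is missing, a runtime check can make the fallback explicit. This is a hypothetical helper sketch, not part of the commit; it only assumes the `swipl` binary installed above ends up on the PATH.

```python
# Hypothetical helper: detect whether the swipl binary from the Dockerfile is available
import shutil
import subprocess

def prolog_available() -> bool:
    """Return True if SWI-Prolog (swipl) can be executed in this environment."""
    swipl = shutil.which("swipl")
    if swipl is None:
        return False
    try:
        subprocess.run([swipl, "--version"], check=True, capture_output=True)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False

if __name__ == "__main__":
    mode = "neural-symbolic" if prolog_available() else "neural-only"
    print(f"Parser will run in {mode} mode")
```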
coptic_grammar.pl ADDED
@@ -0,0 +1,400 @@
1
+ %******************************************************************************
2
+ % COPTIC_GRAMMAR.PL - Prolog Dependency Grammar for Coptic
3
+ %******************************************************************************
4
+ %
5
+ % This module demonstrates the adaptation from DCG (DETECT5.PRO style)
6
+ % to modern dependency grammar formalism.
7
+ %
8
+ % PARADIGM SHIFT:
9
+ % DCG: sentence --> NP, VP. (hierarchical constituents)
10
+ % Dependency: dep(verb, subject, nsubj). (head-dependent relations)
11
+ %
12
+ % Based on Universal Dependencies annotation scheme adapted for Coptic
13
+ % linguistic patterns (VSO word order, tripartite sentences, etc.)
14
+ %
15
+ % Author: Adapted from DETECT5.PRO (André Linden, 1989-91)
16
+ % Date: 2025
17
+ %
18
+ %******************************************************************************
19
+
20
+ :- module(coptic_dependency_rules, [
21
+ dependency_pattern/3,
22
+ validate_dependency/4,
23
+ suggest_parse/3,
24
+ apply_dependency_rules/3
25
+ ]).
26
+
27
+ :- ensure_loaded(coptic_lexicon).
28
+
29
+ %******************************************************************************
30
+ % CORE DEPENDENCY PATTERNS
31
+ %******************************************************************************
32
+
33
+ % Pattern 1: VSO Transitive Sentence
34
+ % Example: ⲥⲱⲧⲙ ⲡⲣⲱⲙⲉ ⲡϣⲁϫⲉ (hear the-man the-word = "The man hears the word")
35
+ %
36
+ % Dependency structure:
37
+ % ⲥⲱⲧⲙ (VERB, root)
38
+ % ├── ⲡⲣⲱⲙⲉ (NOUN, nsubj)
39
+ % └── ⲡϣⲁϫⲉ (NOUN, obj)
40
+ %
41
+ dependency_pattern(vso_transitive,
42
+ Words,
43
+ [dep(Subj, SubjPOS, SIdx, Verb, VIdx, nsubj),
44
+ dep(Obj, ObjPOS, OIdx, Verb, VIdx, obj)]) :-
45
+ % Verb at position VIdx
46
+ nth1(VIdx, Words, word(Verb, VerbPOS, _)),
47
+ member(VerbPOS, ['VERB', 'AUX']),
48
+
49
+ % Subject at position SIdx
50
+ nth1(SIdx, Words, word(Subj, SubjPOS, _)),
51
+ member(SubjPOS, ['NOUN', 'PRON', 'PROPN']),
52
+
53
+ % Object at position OIdx
54
+ nth1(OIdx, Words, word(Obj, ObjPOS, _)),
55
+ member(ObjPOS, ['NOUN', 'PRON', 'PROPN']),
56
+
57
+ % VSO word order constraint (crucial for Coptic!)
58
+ VIdx < SIdx,
59
+ SIdx < OIdx,
60
+
61
+ % Verify verb is transitive
62
+ is_transitive(Verb).
63
+
64
+ % Pattern 2: VS Intransitive Sentence
65
+ % Example: ⲃⲱⲕ ⲡⲣⲱⲙⲉ (go the-man = "The man goes")
66
+ %
67
+ dependency_pattern(vs_intransitive,
68
+ Words,
69
+ [dep(Subj, SubjPOS, SIdx, Verb, VIdx, nsubj)]) :-
70
+ % Verb
71
+ nth1(VIdx, Words, word(Verb, VerbPOS, _)),
72
+ member(VerbPOS, ['VERB', 'AUX']),
73
+
74
+ % Subject
75
+ nth1(SIdx, Words, word(Subj, SubjPOS, _)),
76
+ member(SubjPOS, ['NOUN', 'PRON', 'PROPN']),
77
+
78
+ % VS word order
79
+ VIdx < SIdx,
80
+
81
+ % Verify verb is intransitive
82
+ is_intransitive(Verb).
83
+
84
+ % Pattern 3: Tripartite Nominal Sentence
85
+ % Example: ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ (I am the-god = "I am God")
86
+ %
87
+ % Structure: Subject + Copula + Predicate
88
+ % In UD: Predicate is head, Subject and Copula depend on it
89
+ %
90
+ % ⲡⲛⲟⲩⲧⲉ (NOUN, root)
91
+ % ├── ⲁⲛⲟⲕ (PRON, nsubj)
92
+ % └── ⲡⲉ (AUX, cop)
93
+ %
94
+ dependency_pattern(tripartite,
95
+ Words,
96
+ [dep(Subj, SubjPOS, SIdx, Pred, PIdx, nsubj),
97
+ dep(Cop, 'AUX', CIdx, Pred, PIdx, cop)]) :-
98
+ % Subject (first position, typically)
99
+ nth1(SIdx, Words, word(Subj, SubjPOS, _)),
100
+ member(SubjPOS, ['NOUN', 'PRON', 'PROPN']),
101
+
102
+ % Copula (ⲡⲉ, ⲧⲉ, ⲛⲉ)
103
+ nth1(CIdx, Words, word(Cop, 'AUX', _)),
104
+ member(Cop, ['ⲡⲉ', 'ⲧⲉ', 'ⲛⲉ']),
105
+
106
+ % Predicate (nominal or adjectival)
107
+ nth1(PIdx, Words, word(Pred, PredPOS, _)),
108
+ member(PredPOS, ['NOUN', 'ADJ', 'PROPN']),
109
+
110
+ % Typical order: S - Cop - Pred (but can vary)
111
+ SIdx < PIdx,
112
+
113
+ % Gender/number agreement between copula and predicate
114
+ copula_agrees_with_predicate(Cop, Pred).
115
+
116
+ % Pattern 4: Converted Tripartite (Predicate-Subject-Copula)
117
+ % Example: ⲡⲛⲟⲩⲧⲉ ⲁⲛⲟⲕ ⲡⲉ (God I am = "I am God" - emphatic)
118
+ %
119
+ dependency_pattern(tripartite_converted,
120
+ Words,
121
+ [dep(Subj, SubjPOS, SIdx, Pred, PIdx, nsubj),
122
+ dep(Cop, 'AUX', CIdx, Pred, PIdx, cop)]) :-
123
+ nth1(PIdx, Words, word(Pred, PredPOS, _)),
124
+ member(PredPOS, ['NOUN', 'ADJ', 'PROPN']),
125
+
126
+ nth1(SIdx, Words, word(Subj, SubjPOS, _)),
127
+ member(SubjPOS, ['NOUN', 'PRON', 'PROPN']),
128
+
129
+ nth1(CIdx, Words, word(Cop, 'AUX', _)),
130
+ member(Cop, ['ⲡⲉ', 'ⲧⲉ', 'ⲛⲉ']),
131
+
132
+ % Converted order: Pred before Subj
133
+ PIdx < SIdx,
134
+
135
+ copula_agrees_with_predicate(Cop, Pred).
136
+
137
+ % Pattern 5: Determiner + Noun
138
+ % Example: ⲡⲣⲱⲙⲉ (the-man)
139
+ %
140
+ % In Coptic, articles often attach as prefixes, but in tokenized form:
141
+ % ⲡⲣⲱⲙⲉ
142
+ % ├── ⲡ (DET, det)
143
+ %
144
+ dependency_pattern(determiner_noun,
145
+ Words,
146
+ [dep(Det, 'DET', DIdx, Noun, NIdx, det)]) :-
147
+ nth1(DIdx, Words, word(Det, 'DET', _)),
148
+ nth1(NIdx, Words, word(Noun, 'NOUN', _)),
149
+
150
+ % Determiner precedes noun in Coptic
151
+ DIdx < NIdx,
152
+
153
+ % Adjacent or nearly adjacent
154
+ NIdx - DIdx =< 2,
155
+
156
+ % Gender agreement
157
+ determiner_gender_agrees(Det, Noun).
158
+
159
+ % Pattern 6: Adjective Modification
160
+ % Example: ⲡⲣⲱⲙⲉ ⲛⲁⲛⲟⲩϥ (the-man good = "the good man")
161
+ %
162
+ % In Coptic, adjectives typically follow nouns
163
+ % ⲣⲱⲙⲉ (NOUN)
164
+ % └── ⲛⲁⲛⲟⲩϥ (ADJ, amod)
165
+ %
166
+ dependency_pattern(noun_adjective,
167
+ Words,
168
+ [dep(Adj, 'ADJ', AIdx, Noun, NIdx, amod)]) :-
169
+ nth1(NIdx, Words, word(Noun, 'NOUN', _)),
170
+ nth1(AIdx, Words, word(Adj, 'ADJ', _)),
171
+
172
+ % Coptic: Adjective follows noun (typically)
173
+ NIdx < AIdx,
174
+
175
+ % Should be adjacent or nearly so
176
+ AIdx - NIdx =< 2,
177
+
178
+ % Gender/number agreement
179
+ adjective_agrees(Adj, Noun).
180
+
181
+ % Pattern 7: Prepositional Phrase
182
+ % Example: ϩⲛ ⲧⲡⲟⲗⲓⲥ (in the-city)
183
+ %
184
+ % ⲧⲡⲟⲗⲓⲥ (NOUN, head in larger structure)
185
+ % ├── ϩⲛ (ADP, case)
186
+ %
187
+ dependency_pattern(prepositional_phrase,
188
+ Words,
189
+ [dep(Prep, 'ADP', PIdx, Noun, NIdx, case)]) :-
190
+ nth1(PIdx, Words, word(Prep, 'ADP', _)),
191
+ nth1(NIdx, Words, word(Noun, NounPOS, _)),
192
+ member(NounPOS, ['NOUN', 'PRON', 'PROPN']),
193
+
194
+ % Preposition before noun
195
+ PIdx < NIdx,
196
+
197
+ % Adjacent
198
+ NIdx - PIdx =< 2.
199
+
200
+ % Pattern 8: Conjunction
201
+ % Example: ⲡⲣⲱⲙⲉ ⲙⲛ ⲧⲉϣⲓⲙⲉ (the-man and the-woman)
202
+ %
203
+ dependency_pattern(coordination,
204
+ Words,
205
+ [dep(Conj, 'CCONJ', CIdx, Head, HIdx, cc),
206
+ dep(Coord2, Coord2POS, C2Idx, Head, HIdx, conj)]) :-
207
+ nth1(HIdx, Words, word(Head, HeadPOS, _)),
208
+ member(HeadPOS, ['NOUN', 'VERB', 'ADJ']),
209
+
210
+ nth1(CIdx, Words, word(Conj, 'CCONJ', _)),
211
+
212
+ nth1(C2Idx, Words, word(Coord2, Coord2POS, _)),
213
+ Coord2POS = HeadPOS, % Same POS as head
214
+
215
+ % Order: Head < Conj < Coord2
216
+ HIdx < CIdx,
217
+ CIdx < C2Idx.
218
+
219
+ %******************************************************************************
220
+ % CONSTRAINT CHECKING
221
+ %******************************************************************************
222
+
223
+ % Check if verb is transitive (requires object)
224
+ is_transitive(Verb) :-
225
+ coptic_verb(Verb, Features),
226
+ member(transitive, Features), !.
227
+ is_transitive(_). % Default: assume transitive if unknown
228
+
229
+ % Check if verb is intransitive (no object)
230
+ is_intransitive(Verb) :-
231
+ coptic_verb(Verb, Features),
232
+ member(intransitive, Features), !.
233
+ is_intransitive(_). % Default: allow intransitive
234
+
235
+ % Copula-predicate agreement
236
+ copula_agrees_with_predicate(Cop, Pred) :-
237
+ coptic_noun(Pred, Gender, Number), !,
238
+ copula_form(Cop, Gender, Number).
239
+ copula_agrees_with_predicate(_, _). % Allow if not in lexicon
240
+
241
+ copula_form('ⲡⲉ', masc, sing).
242
+ copula_form('ⲧⲉ', fem, sing).
243
+ copula_form('ⲛⲉ', _, plur).
244
+ copula_form('ⲛⲉ', masc, plur).
245
+ copula_form('ⲛⲉ', fem, plur).
246
+
247
+ % Determiner-noun gender agreement
248
+ determiner_gender_agrees(Det, Noun) :-
249
+ coptic_noun(Noun, Gender, Number), !,
250
+ determiner_form(Det, Gender, Number).
251
+ determiner_gender_agrees(_, _). % Allow if not in lexicon
252
+
253
+ determiner_form('ⲡ', masc, sing).
254
+ determiner_form('ⲧ', fem, sing).
255
+ determiner_form('ⲛ', _, plur).
256
+ determiner_form('ⲟⲩ', _, _). % Indefinite: any gender/number
257
+
258
+ % Adjective-noun agreement
259
+ adjective_agrees(Adj, Noun) :-
260
+ coptic_noun(Noun, Gender, Number),
261
+ coptic_adjective(Adj, Gender, Number), !.
262
+ adjective_agrees(_, _). % Allow if not in lexicon
263
+
264
+ %******************************************************************************
265
+ % VALIDATION AND ERROR DETECTION
266
+ %******************************************************************************
267
+
268
+ % validate_dependency(+Token, +Head, +Relation, +Words)
269
+ % Check if a proposed dependency is valid according to Coptic grammar
270
+ validate_dependency(Token, Head, Relation, Words) :-
271
+ % Find positions
272
+ nth1(TokenIdx, Words, word(Token, TokenPOS, _)),
273
+ nth1(HeadIdx, Words, word(Head, HeadPOS, _)),
274
+
275
+ % Check if relation is valid for this POS pair
276
+ valid_relation(TokenPOS, HeadPOS, Relation),
277
+
278
+ % Check linguistic constraints
279
+ check_constraints(Token, TokenPOS, TokenIdx, Head, HeadPOS, HeadIdx, Relation, Words).
280
+
281
+ % Valid dependency relations (simplified from UD)
282
+ valid_relation('NOUN', 'VERB', nsubj).
283
+ valid_relation('PRON', 'VERB', nsubj).
284
+ valid_relation('PROPN', 'VERB', nsubj).
285
+ valid_relation('NOUN', 'VERB', obj).
286
+ valid_relation('PRON', 'VERB', obj).
287
+ valid_relation('NOUN', 'NOUN', nmod).
288
+ valid_relation('ADJ', 'NOUN', amod).
289
+ valid_relation('DET', 'NOUN', det).
290
+ valid_relation('ADP', 'NOUN', case).
291
+ valid_relation('ADP', 'PRON', case).
292
+ valid_relation('AUX', 'NOUN', cop).
293
+ valid_relation('AUX', 'ADJ', cop).
294
+ valid_relation('CCONJ', 'NOUN', cc).
295
+ valid_relation('CCONJ', 'VERB', cc).
296
+ valid_relation(_, _, root). % Root can be anything
297
+
298
+ % Constraint checking
299
+ check_constraints(_Token, _TokenPOS, TokenIdx, _Head, HeadPOS, HeadIdx, Relation, _Words) :-
300
+ % Word order constraints
301
+ ( Relation = nsubj,
302
+ member(HeadPOS, ['VERB', 'AUX'])
303
+ -> % In VSO, subject follows verb
304
+ TokenIdx > HeadIdx
305
+ ; true
306
+ ),
307
+
308
+ ( Relation = obj,
309
+ HeadPOS = 'VERB'
310
+ -> % Object follows subject in VSO
311
+ TokenIdx > HeadIdx
312
+ ; true
313
+ ),
314
+
315
+ ( Relation = det
316
+ -> % Determiner precedes noun
317
+ TokenIdx < HeadIdx
318
+ ; true
319
+ ),
320
+
321
+ ( Relation = amod
322
+ -> % Adjective typically follows noun in Coptic
323
+ TokenIdx > HeadIdx
324
+ ; true
325
+ ).
326
+
327
+ %******************************************************************************
328
+ % PARSING WITH DEPENDENCY RULES
329
+ %******************************************************************************
330
+
331
+ % suggest_parse(+Words, +POSTags, -Dependencies)
332
+ % Use dependency rules to suggest a parse
333
+ suggest_parse(Words, POSTags, Dependencies) :-
334
+ % Build word structures
335
+ length(Words, N),
336
+ build_word_list(Words, POSTags, 1, N, WordList),
337
+
338
+ % Try to match patterns
339
+ findall(Deps, dependency_pattern(_, WordList, Deps), AllDeps),
340
+
341
+ % Combine non-overlapping dependencies
342
+ flatten(AllDeps, FlatDeps),
343
+ sort(FlatDeps, Dependencies).
344
+
345
+ build_word_list([], [], _, _, []).
346
+ build_word_list([W|Ws], [P|Ps], Idx, N, [word(W, P, Idx)|Rest]) :-
347
+ NextIdx is Idx + 1,
348
+ build_word_list(Ws, Ps, NextIdx, N, Rest).
349
+
350
+ % apply_dependency_rules(+Tokens, +POSTags, -ParseTree)
351
+ % Full parsing using dependency rules
352
+ apply_dependency_rules(Tokens, POSTags, ParseTree) :-
353
+ suggest_parse(Tokens, POSTags, Dependencies),
354
+
355
+ % Find root
356
+ ( select(dep(Root, RootPOS, RootIdx, _, 0, root), Dependencies, OtherDeps)
357
+ -> true
358
+ ; % No root found - pick first verb or noun
359
+ nth1(RootIdx, POSTags, RootPOS),
360
+ member(RootPOS, ['VERB', 'NOUN', 'AUX']),
361
+ nth1(RootIdx, Tokens, Root),
362
+ OtherDeps = Dependencies
363
+ ),
364
+
365
+ ParseTree = dep_tree{
366
+ root: Root,
367
+ root_pos: RootPOS,
368
+ root_index: RootIdx,
369
+ dependencies: OtherDeps,
370
+ parser: 'Dependency Rules'
371
+ }.
372
+
373
+ %******************************************************************************
374
+ % COMPARISON: DCG vs DEPENDENCY
375
+ %******************************************************************************
376
+
377
+ % EXAMPLE: How DETECT5.PRO might have encoded a rule
378
+ %
379
+ % DCG Style (old):
380
+ % sentence --> verb_phrase.
381
+ % verb_phrase --> verb(V, trans), noun_phrase(Subj), noun_phrase(Obj),
382
+ % {vso_order(V, Subj, Obj)}.
383
+ % noun_phrase --> determiner(D), noun(N), {gender_agrees(D, N)}.
384
+ %
385
+ % Dependency Style (new):
386
+ % dependency_pattern(vso,
387
+ % [verb(V, VIdx), noun(S, SIdx), noun(O, OIdx)],
388
+ % [dep(S, SIdx, V, VIdx, nsubj),
389
+ % dep(O, OIdx, V, VIdx, obj)]) :-
390
+ % VIdx < SIdx, SIdx < OIdx.
391
+ %
392
+ % KEY DIFFERENCES:
393
+ % 1. DCG builds hierarchical structure (VP contains NPs)
394
+ % 2. Dependency expresses direct relations (verb governs subject)
395
+ % 3. Dependency is more flexible for free word order
396
+ % 4. Dependency better matches modern neural parser output
397
+
398
+ %******************************************************************************
399
+ % END OF MODULE
400
+ %******************************************************************************
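
The dependency patterns above can also be queried directly from Python via pyswip, outside the full parser. A minimal sketch, assuming coptic_grammar.pl and coptic_lexicon.pl sit in the working directory, that pyswip passes the Coptic atoms through the query string unchanged, and that unknown verbs fall back to the permissive `is_transitive(_)` default:

```python
# Sketch: ask the Prolog grammar which dependency pattern covers a VSO sentence
from pyswip import Prolog

prolog = Prolog()
list(prolog.query("consult('coptic_grammar.pl')"))   # also pulls in the lexicon via ensure_loaded

# ⲥⲱⲧⲙ ⲡⲣⲱⲙⲉ ⲡϣⲁϫⲉ -- "The man hears the word" (verb-subject-object order)
words = "[word('ⲥⲱⲧⲙ','VERB',1), word('ⲡⲣⲱⲙⲉ','NOUN',2), word('ⲡϣⲁϫⲉ','NOUN',3)]"

for solution in prolog.query(f"dependency_pattern(Name, {words}, Deps)"):
    # vso_transitive should be among the solutions; the permissive defaults
    # for unknown verbs may also admit vs_intransitive readings
    print(solution["Name"], solution["Deps"])
```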
coptic_lexicon.pl ADDED
The diff for this file is too large to render. See raw diff
 
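Although the lexicon is too large to render here, the grammar module above shows which predicate shapes it must provide: `coptic_noun/3` (gender, number), `coptic_verb/2` (feature list with transitive/intransitive), and `coptic_adjective/3`. The facts below are hypothetical illustrations of those shapes asserted through pyswip, not actual entries from coptic_lexicon.pl:

```python
# Hypothetical lexicon entries, in the predicate shapes coptic_grammar.pl consults
from pyswip import Prolog

prolog = Prolog()
prolog.assertz("coptic_noun('ⲣⲱⲙⲉ', masc, sing)")         # "man"  - masculine singular
prolog.assertz("coptic_noun('ⲡⲟⲗⲓⲥ', fem, sing)")         # "city" - feminine singular
prolog.assertz("coptic_verb('ⲥⲱⲧⲙ', [transitive])")       # "hear" - takes an object
prolog.assertz("coptic_verb('ⲃⲱⲕ', [intransitive])")      # "go"   - no object
prolog.assertz("coptic_adjective('ⲛⲁⲛⲟⲩϥ', masc, sing)")  # "good" - agrees with its noun

# The agreement checks in the grammar (determiner_gender_agrees/2 etc.) query these facts
print(list(prolog.query("coptic_noun(N, masc, sing)")))
```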
coptic_parser_core.py CHANGED
@@ -2,6 +2,9 @@
 """
 Coptic Dependency Parser - Core Module (Web-Compatible)
 
+Neural-Symbolic Hybrid Parser combining Stanza (neural) with Prolog (symbolic)
+for enhanced grammatical validation and error detection.
+
 Extracted from coptic-parser.py for integration with web interfaces.
 Author: André Linden (2025)
 License: CC BY-NC-SA 4.0
@@ -12,11 +15,25 @@ import warnings
 warnings.filterwarnings('ignore')
 
 class CopticParserCore:
-    """Lightweight Coptic parser for web applications"""
+    """Lightweight neural-symbolic Coptic parser for web applications"""
 
     def __init__(self):
         self.nlp = None
         self.diaparser = None
+        self.prolog = None  # Prolog engine for grammatical validation
+        self._init_prolog()
+
+    def _init_prolog(self):
+        """Initialize Prolog engine for grammatical validation (optional)"""
+        try:
+            from coptic_prolog_rules import create_prolog_engine
+            self.prolog = create_prolog_engine()
+            if self.prolog and self.prolog.prolog_initialized:
+                print("✓ Prolog engine initialized successfully")
+        except Exception as e:
+            print(f"ℹ Prolog validation not available: {e}")
+            print("  Parser will continue with neural-only mode")
+            self.prolog = None
 
     def load_parser(self):
         """Initialize Stanza parser with Coptic models"""
@@ -33,7 +50,7 @@ class CopticParserCore:
                 download_method=None,
                 verbose=False
             )
-            print("✓ Coptic parser loaded successfully")
+            print("✓ Coptic neural parser loaded successfully")
 
         except Exception as e:
             # If models not found, download them
@@ -58,12 +75,13 @@ class CopticParserCore:
             print(f"❌ Failed to load parser: {e}")
             raise
 
-    def parse_text(self, text):
+    def parse_text(self, text, include_prolog_validation=True):
         """
-        Parse Coptic text and return structured results
+        Parse Coptic text and return structured results with optional Prolog validation
 
         Args:
             text: Coptic text to parse
+            include_prolog_validation: Whether to run Prolog grammatical validation (default: True)
 
         Returns:
             dict with:
@@ -71,6 +89,7 @@ class CopticParserCore:
             - total_sentences: int
             - total_tokens: int
             - text: original text
+            - prolog_validation: dict with validation results (if enabled and available)
         """
         if not text or not text.strip():
             return None
@@ -78,7 +97,7 @@ class CopticParserCore:
         # Ensure parser is loaded
         self.load_parser()
 
-        # Parse with Stanza
+        # Parse with Stanza (neural)
         doc = self.nlp(text)
 
         if not doc.sentences:
@@ -112,13 +131,68 @@
                 'words': words_data
             })
 
-        return {
+        result = {
             'sentences': sentences,
             'total_sentences': len(sentences),
            'total_tokens': total_tokens,
             'text': text
         }
 
+        # Add Prolog validation (symbolic) if available and requested
+        if include_prolog_validation and self.prolog and hasattr(self.prolog, 'prolog_initialized') and self.prolog.prolog_initialized:
+            try:
+                validation = self._validate_with_prolog(sentences)
+                result['prolog_validation'] = validation
+            except Exception as e:
+                print(f"ℹ Prolog validation skipped: {e}")
+                result['prolog_validation'] = None
+
+        return result
+
+    def _validate_with_prolog(self, sentences):
+        """
+        Validate parsed sentences using Prolog grammatical rules
+
+        Args:
+            sentences: List of parsed sentence data
+
+        Returns:
+            dict with validation results including patterns detected and warnings
+        """
+        if not self.prolog:
+            return None
+
+        validation_results = {
+            'patterns_detected': [],
+            'warnings': [],
+            'has_errors': False
+        }
+
+        for sentence in sentences:
+            # Extract tokens, POS tags, heads, and dependency relations
+            tokens = [word['form'] for word in sentence['words']]
+            pos_tags = [word['upos'] for word in sentence['words']]
+            heads = [word['head'] for word in sentence['words']]
+            deprels = [word['deprel'] for word in sentence['words']]
+
+            # Validate with Prolog
+            try:
+                sent_validation = self.prolog.validate_parse_tree(tokens, pos_tags, heads, deprels)
+
+                if sent_validation:
+                    # Merge results (validate_parse_tree returns 'patterns_found' and 'warnings')
+                    if sent_validation.get('patterns_found'):
+                        validation_results['patterns_detected'].extend(sent_validation['patterns_found'])
+
+                    if sent_validation.get('warnings'):
+                        validation_results['warnings'].extend(sent_validation['warnings'])
+                        validation_results['has_errors'] = True
+
+            except Exception as e:
+                print(f"ℹ Prolog validation error for sentence: {e}")
+
+        return validation_results
+
     def format_conllu(self, parse_result):
         """Format parse result as CoNLL-U"""
         if not parse_result:
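
For reference, the prolog_validation field added to the parse_text() return value has the shape built by _validate_with_prolog() above. The concrete strings are illustrative only; real pattern and warning text comes from the Prolog rules:

```python
# Illustrative shape of result["prolog_validation"] (values are made up for illustration)
example_validation = {
    "patterns_detected": [
        {"is_tripartite": True,
         "pattern": "ⲁⲛⲟⲕ - ⲡⲉ - ⲡⲛⲟⲩⲧⲉ",
         "description": "Tripartite nominal sentence"},
    ],
    "warnings": [
        "⚠️ PARSER ERROR: 'ⲥⲱⲧⲙ' (VERB) incorrectly labeled as 'punct' → SUGGESTED: 'root'",
    ],
    "has_errors": True,   # set whenever at least one warning was raised
}
```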
coptic_prolog_rules.py ADDED
@@ -0,0 +1,671 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Coptic Prolog Rules - Neural-Symbolic Integration
4
+ ==================================================
5
+
6
+ Integrates Prolog logic programming with neural dependency parsing
7
+ to enhance parsing accuracy through explicit grammatical rules.
8
+
9
+ Uses pyswip (SWI-Prolog Python interface) for bidirectional integration.
10
+
11
+ Author: Coptic NLP Project
12
+ License: CC BY-NC-SA 4.0
13
+ """
14
+
15
+ from pyswip import Prolog
16
+ import warnings
17
+ warnings.filterwarnings('ignore')
18
+
19
+
20
+ class CopticPrologRules:
21
+ """
22
+ Prolog-based grammatical rule engine for Coptic parsing validation
23
+ and enhancement.
24
+ """
25
+
26
+ def __init__(self):
27
+ """Initialize Prolog engine and load Coptic grammar rules"""
28
+ self.prolog_initialized = False
29
+ self.prolog = None
30
+ self._initialize_prolog()
31
+
32
+ def _initialize_prolog(self):
33
+ """Initialize SWI-Prolog and define Coptic grammatical rules"""
34
+ try:
35
+ # Initialize pyswip Prolog instance
36
+ self.prolog = Prolog()
37
+
38
+ # Define Coptic-specific grammatical rules
39
+ self._load_coptic_grammar()
40
+
41
+ self.prolog_initialized = True
42
+ print("✓ Prolog engine initialized successfully")
43
+
44
+ except Exception as e:
45
+ print(f"⚠️ Warning: Prolog initialization failed: {e}")
46
+ print(" Parser will continue without Prolog validation")
47
+ self.prolog_initialized = False
48
+
49
+ def _load_dcg_grammar(self):
50
+ """
51
+ Load DCG-based grammar rules from coptic_grammar.pl
52
+ and Coptic lexicon from coptic_lexicon.pl
53
+
54
+ This adds more sophisticated pattern matching using Definite Clause Grammars,
55
+ adapted from the French DETECT5.PRO error detector.
56
+ """
57
+ try:
58
+ from pathlib import Path
59
+
60
+ # Get path to DCG grammar file
61
+ # Note: The grammar file will load the lexicon automatically via ensure_loaded
62
+ current_dir = Path(__file__).parent
63
+ grammar_file = current_dir / "coptic_grammar.pl"
64
+
65
+ # Load grammar rules (which will load the lexicon)
66
+ if grammar_file.exists():
67
+ # Convert path to Prolog-compatible format
68
+ grammar_path = str(grammar_file.absolute()).replace('\\', '/')
69
+
70
+ # Load the module
71
+ query = f"consult('{grammar_path}')"
72
+ list(self.prolog.query(query))
73
+
74
+ print(f"✓ DCG grammar rules and lexicon loaded from {grammar_file.name}")
75
+ self.dcg_loaded = True
76
+ else:
77
+ print(f"ℹ DCG grammar file not found at {grammar_file}")
78
+ self.dcg_loaded = False
79
+
80
+ except Exception as e:
81
+ print(f"⚠️ Warning: Could not load DCG grammar: {e}")
82
+ self.dcg_loaded = False
83
+
84
+ def _load_coptic_grammar(self):
85
+ """Load Coptic linguistic rules into Prolog"""
86
+
87
+ # Try to load DCG grammar file if it exists
88
+ self._load_dcg_grammar()
89
+
90
+ # ===================================================================
91
+ # COPTIC MORPHOLOGICAL RULES
92
+ # ===================================================================
93
+
94
+ # Article system: definite articles
95
+ self.prolog.assertz("definite_article('ⲡ')") # masculine singular
96
+ self.prolog.assertz("definite_article('ⲧ')") # feminine singular
97
+ self.prolog.assertz("definite_article('ⲛ')") # plural
98
+ self.prolog.assertz("definite_article('ⲡⲉ')") # masculine singular (variant)
99
+ self.prolog.assertz("definite_article('ⲧⲉ')") # feminine singular (variant)
100
+ self.prolog.assertz("definite_article('ⲛⲉ')") # plural (variant)
101
+
102
+ # Pronominal system - Independent pronouns
103
+ self.prolog.assertz("independent_pronoun('ⲁⲛⲟⲕ')") # I
104
+ self.prolog.assertz("independent_pronoun('ⲛⲧⲟⲕ')") # you (m.sg)
105
+ self.prolog.assertz("independent_pronoun('ⲛⲧⲟ')") # you (f.sg)
106
+ self.prolog.assertz("independent_pronoun('ⲛⲧⲟϥ')") # he
107
+ self.prolog.assertz("independent_pronoun('ⲛⲧⲟⲥ')") # she
108
+ self.prolog.assertz("independent_pronoun('ⲁⲛⲟⲛ')") # we
109
+ self.prolog.assertz("independent_pronoun('ⲛⲧⲱⲧⲛ')") # you (pl)
110
+ self.prolog.assertz("independent_pronoun('ⲛⲧⲟⲟⲩ')") # they
111
+
112
+ # Suffix pronouns (enclitic)
113
+ self.prolog.assertz("suffix_pronoun('ⲓ')") # my/me
114
+ self.prolog.assertz("suffix_pronoun('ⲕ')") # your (m.sg)
115
+ self.prolog.assertz("suffix_pronoun('ϥ')") # his/him
116
+ self.prolog.assertz("suffix_pronoun('ⲥ')") # her
117
+ self.prolog.assertz("suffix_pronoun('ⲛ')") # our/us
118
+ self.prolog.assertz("suffix_pronoun('ⲧⲛ')") # your (pl)
119
+ self.prolog.assertz("suffix_pronoun('ⲟⲩ')") # their/them
120
+
121
+ # Coptic verbal system - Conjugation bases (tense/aspect markers)
122
+ self.prolog.assertz("conjugation_base('ⲁ')") # Perfect (aorist)
123
+ self.prolog.assertz("conjugation_base('ⲛⲉ')") # Imperfect/past
124
+ self.prolog.assertz("conjugation_base('ϣⲁ')") # Future/conditional
125
+ self.prolog.assertz("conjugation_base('ⲙⲡⲉ')") # Negative perfect
126
+ self.prolog.assertz("conjugation_base('ⲙⲛ')") # Negative existential
127
+ self.prolog.assertz("conjugation_base('ⲉⲣϣⲁⲛ')") # Conditional
128
+
129
+ # Auxiliary verbs (copulas)
130
+ self.prolog.assertz("copula('ⲡⲉ')") # is (m.sg)
131
+ self.prolog.assertz("copula('ⲧⲉ')") # is (f.sg)
132
+ self.prolog.assertz("copula('ⲛⲉ')") # are (pl)
133
+
134
+ # ===================================================================
135
+ # COPTIC SYNTACTIC RULES
136
+ # ===================================================================
137
+
138
+ # Noun phrase structure rules
139
+ # Valid NP structure: Article + Noun
140
+ self.prolog.assertz("valid_np(Article, Noun) :- definite_article(Article), noun_compatible(Noun)")
141
+
142
+ # Helper: Any word can be a noun (simplified)
143
+ self.prolog.assertz("noun_compatible(_)")
144
+
145
+ # Definiteness agreement rule - In Coptic, definiteness is marked by articles
146
+ self.prolog.assertz("requires_definiteness(Noun, Article) :- definite_article(Article)")
147
+
148
+ # Tripartite nominal sentence pattern
149
+ # Coptic tripartite pattern: Subject - Copula - Predicate
150
+ # Example: ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ (I am God)
151
+ self.prolog.assertz("tripartite_sentence(Subject, Copula, Predicate) :- independent_pronoun(Subject), copula(Copula), noun_compatible(Predicate)")
152
+
153
+ # Verbal sentence patterns
154
+ # Verbal sentence: Conjugation + Subject + Verb
155
+ self.prolog.assertz("verbal_sentence(Conj, Subject, Verb) :- conjugation_base(Conj), (independent_pronoun(Subject) ; definite_article(Subject)), verb_compatible(Verb)")
156
+
157
+ # Helper: Any word can be a verb (simplified)
158
+ self.prolog.assertz("verb_compatible(_)")
159
+
160
+ # ===================================================================
161
+ # DEPENDENCY VALIDATION RULES
162
+ # ===================================================================
163
+
164
+ # Validate subject-verb relationship
165
+ self.prolog.assertz("valid_subject_verb(Subject, Verb, SubjPOS, VerbPOS) :- member(SubjPOS, ['PRON', 'NOUN', 'PROPN']), member(VerbPOS, ['VERB', 'AUX'])")
166
+
167
+ # Validate determiner-noun relationship
168
+ self.prolog.assertz("valid_det_noun(Det, Noun, DetPOS, NounPOS) :- DetPOS = 'DET', member(NounPOS, ['NOUN', 'PROPN'])")
169
+
170
+ # Validate modifier relationships
171
+ self.prolog.assertz("valid_modifier(Head, Modifier, ModPOS) :- member(ModPOS, ['ADJ', 'ADV', 'DET'])")
172
+
173
+ # Validate punctuation assignments - content words should NOT be punct
174
+ # Only actual punctuation marks (PUNCT POS tag) should have punct relation
175
+ self.prolog.assertz("invalid_punct(Word, POS, Relation) :- Relation = 'punct', member(POS, ['VERB', 'NOUN', 'PRON', 'PROPN', 'DET', 'ADJ', 'ADV', 'AUX', 'NUM'])")
176
+
177
+ # ===================================================================
178
+ # ERROR CORRECTION RULES
179
+ # ===================================================================
180
+
181
+ # Suggest correct relation for DET (determiner)
182
+ # DET before NOUN should be 'det' relation
183
+ self.prolog.assertz("suggest_correction('DET', _, 'det')")
184
+
185
+ # Suggest correct relation for PRON (pronoun)
186
+ # PRON is typically subject (nsubj), object (obj), or possessive
187
+ self.prolog.assertz("suggest_correction('PRON', 'VERB', 'nsubj')") # Pronoun before verb = subject
188
+ self.prolog.assertz("suggest_correction('PRON', 'AUX', 'nsubj')") # Pronoun before aux = subject
189
+ self.prolog.assertz("suggest_correction('PRON', _, 'nsubj')") # Default for pronoun
190
+
191
+ # Suggest correct relation for NOUN
192
+ self.prolog.assertz("suggest_correction('NOUN', 'VERB', 'obj')") # Noun after verb = object
193
+ self.prolog.assertz("suggest_correction('NOUN', 'AUX', 'nsubj')") # Noun after copula = predicate nominal
194
+ self.prolog.assertz("suggest_correction('NOUN', _, 'obl')") # Default for noun
195
+
196
+ # Suggest correct relation for VERB
197
+ # Main verbs are often root, ccomp (complement clause), or advcl (adverbial clause)
198
+ self.prolog.assertz("suggest_correction('VERB', 'SCONJ', 'ccomp')") # Verb after subordinator = complement
199
+ self.prolog.assertz("suggest_correction('VERB', 'VERB', 'ccomp')") # Verb after verb = complement
200
+ self.prolog.assertz("suggest_correction('VERB', _, 'root')") # Default for verb
201
+
202
+ # Suggest correct relation for AUX (auxiliary/copula)
203
+ self.prolog.assertz("suggest_correction('AUX', _, 'cop')") # Copula relation
204
+
205
+ # Suggest correct relation for ADJ (adjective)
206
+ self.prolog.assertz("suggest_correction('ADJ', 'NOUN', 'amod')") # Adjective modifying noun
207
+
208
+ # Suggest correct relation for ADV (adverb)
209
+ self.prolog.assertz("suggest_correction('ADV', _, 'advmod')") # Adverbial modifier
210
+
211
+ # Suggest correct relation for NUM (number)
212
+ self.prolog.assertz("suggest_correction('NUM', 'NOUN', 'nummod')") # Number modifying noun
213
+ self.prolog.assertz("suggest_correction('NUM', _, 'obl')") # Default for number (temporal/oblique)
214
+
215
+ # ===================================================================
216
+ # MORPHOLOGICAL ANALYSIS RULES
217
+ # ===================================================================
218
+
219
+ # Clitic attachment patterns
220
+ self.prolog.assertz("has_suffix_pronoun(Word, Base, Suffix) :- atom_concat(Base, Suffix, Word), suffix_pronoun(Suffix), atom_length(Base, BaseLen), BaseLen > 0")
221
+
222
+ # Article stripping for lemmatization
223
+ self.prolog.assertz("strip_article(Word, Lemma) :- definite_article(Article), atom_concat(Article, Lemma, Word), atom_length(Lemma, LemmaLen), LemmaLen > 0")
224
+
225
+ # If no article found, word is its own lemma
226
+ self.prolog.assertz("strip_article(Word, Word) :- \\+ (definite_article(Article), atom_concat(Article, _, Word))")
227
+
228
+ print("✓ Coptic grammatical rules loaded into Prolog")
229
+
230
+ # ===================================================================
231
+ # PYTHON INTERFACE METHODS
232
+ # ===================================================================
233
+
234
+ def validate_dependency(self, head_word, dep_word, head_pos, dep_pos, relation):
235
+ """
236
+ Validate a dependency relation using Prolog rules
237
+
238
+ Args:
239
+ head_word: The head word text
240
+ dep_word: The dependent word text
241
+ head_pos: POS tag of head
242
+ dep_pos: POS tag of dependent
243
+ relation: Dependency relation (nsubj, obj, det, etc.)
244
+
245
+ Returns:
246
+ dict: Validation result with status and suggestions
247
+ """
248
+ if not self.prolog_initialized:
249
+ return {"valid": True, "message": "Prolog not available"}
250
+
251
+ try:
252
+ result = {"valid": True, "warnings": [], "suggestions": []}
253
+
254
+ # Check subject-verb relationships
255
+ if relation in ['nsubj', 'csubj']:
256
+ query = f"valid_subject_verb('{dep_word}', '{head_word}', '{dep_pos}', '{head_pos}')"
257
+ query_result = list(self.prolog.query(query))
258
+ if not query_result:
259
+ result["warnings"].append(
260
+ f"Unusual subject-verb: {dep_word} ({dep_pos}) → {head_word} ({head_pos})"
261
+ )
262
+
263
+ # Check determiner-noun relationships
264
+ elif relation == 'det':
265
+ query = f"valid_det_noun('{dep_word}', '{head_word}', '{dep_pos}', '{head_pos}')"
266
+ query_result = list(self.prolog.query(query))
267
+ if not query_result:
268
+ result["warnings"].append(
269
+ f"Unusual det-noun: {dep_word} → {head_word}"
270
+ )
271
+
272
+ # Check for incorrect punctuation assignments and suggest corrections
273
+ query = f"invalid_punct('{dep_word}', '{dep_pos}', '{relation}')"
274
+ query_result = list(self.prolog.query(query))
275
+ if query_result:
276
+ # Query for suggested correction
277
+ correction_query = f"suggest_correction('{dep_pos}', '{head_pos}', Suggestion)"
278
+ correction_result = list(self.prolog.query(correction_query))
279
+
280
+ if correction_result and 'Suggestion' in correction_result[0]:
281
+ suggested_rel = correction_result[0]['Suggestion']
282
+ result["warnings"].append(
283
+ f"⚠️ PARSER ERROR: '{dep_word}' ({dep_pos}) incorrectly labeled as 'punct' → SUGGESTED: '{suggested_rel}'"
284
+ )
285
+ result["suggestions"].append({
286
+ "word": dep_word,
287
+ "pos": dep_pos,
288
+ "incorrect": relation,
289
+ "suggested": suggested_rel,
290
+ "head_pos": head_pos
291
+ })
292
+ else:
293
+ result["warnings"].append(
294
+ f"⚠️ PARSER ERROR: '{dep_word}' ({dep_pos}) incorrectly labeled as 'punct' - should be a content relation"
295
+ )
296
+
297
+ return result
298
+
299
+ except Exception as e:
300
+ return {"valid": True, "message": f"Validation error: {e}"}
301
+
302
+ def check_tripartite_pattern(self, words, pos_tags):
303
+ """
304
+ Check if a sentence follows the Coptic tripartite nominal pattern
305
+
306
+ Args:
307
+ words: List of word forms
308
+ pos_tags: List of POS tags
309
+
310
+ Returns:
311
+ dict: Pattern analysis results
312
+ """
313
+ if not self.prolog_initialized or len(words) < 3:
314
+ return {"is_tripartite": False}
315
+
316
+ try:
317
+ # Check for tripartite pattern: Pronoun - Copula - Noun
318
+ subj, cop, pred = words[0], words[1], words[2]
319
+
320
+ query = f"tripartite_sentence('{subj}', '{cop}', '{pred}')"
321
+ query_result = list(self.prolog.query(query))
322
+ is_tripartite = len(query_result) > 0
323
+
324
+ return {
325
+ "is_tripartite": is_tripartite,
326
+ "pattern": f"{subj} - {cop} - {pred}" if is_tripartite else None,
327
+ "description": "Tripartite nominal sentence" if is_tripartite else None
328
+ }
329
+
330
+ except Exception as e:
331
+ return {"is_tripartite": False, "error": str(e)}
332
+
333
+ def analyze_morphology(self, word):
334
+ """
335
+ Analyze word morphology using Prolog rules
336
+
337
+ Args:
338
+ word: Coptic word to analyze
339
+
340
+ Returns:
341
+ dict: Morphological analysis
342
+ """
343
+ if not self.prolog_initialized:
344
+ return {"word": word, "analyzed": False}
345
+
346
+ try:
347
+ analysis = {"word": word, "components": []}
348
+
349
+ # Check for definite article
350
+ article_query = f"strip_article('{word}', Lemma)"
351
+ results = list(self.prolog.query(article_query))
352
+ if results:
353
+ result = results[0]
354
+ if 'Lemma' in result:
355
+ lemma = result['Lemma']
356
+ if lemma != word:
357
+ analysis["has_article"] = True
358
+ analysis["lemma"] = lemma
359
+ analysis["article"] = word.replace(lemma, '')
360
+
361
+ # Check for suffix pronouns
362
+ suffix_query = f"has_suffix_pronoun('{word}', Base, Suffix)"
363
+ results = list(self.prolog.query(suffix_query))
364
+ if results:
365
+ result = results[0]
366
+ analysis["has_suffix"] = True
367
+ analysis["base"] = result.get('Base')
368
+ analysis["suffix"] = result.get('Suffix')
369
+
370
+ return analysis
371
+
372
+ except Exception as e:
373
+ return {"word": word, "error": str(e)}
374
+
375
+ def validate_parse_tree(self, words, pos_tags, heads, deprels):
376
+ """
377
+ Validate an entire parse tree using Prolog constraints
378
+
379
+ Args:
380
+ words: List of word forms
381
+ pos_tags: List of POS tags
382
+ heads: List of head indices
383
+ deprels: List of dependency relations
384
+
385
+ Returns:
386
+ dict: Overall validation results with warnings and suggestions
387
+ """
388
+ if not self.prolog_initialized:
389
+ return {"validated": False, "reason": "Prolog not available"}
390
+
391
+ try:
392
+ results = {
393
+ "validated": True,
394
+ "warnings": [],
395
+ "suggestions": [],
396
+ "patterns_found": []
397
+ }
398
+
399
+ # Check for tripartite pattern (basic assertz-based)
400
+ tripartite = self.check_tripartite_pattern(words, pos_tags)
401
+ if tripartite.get("is_tripartite"):
402
+ results["patterns_found"].append(tripartite)
403
+
404
+ # If DCG grammar is loaded, use advanced pattern matching
405
+ if hasattr(self, 'dcg_loaded') and self.dcg_loaded:
406
+ try:
407
+ dcg_results = self._validate_with_dcg(words, pos_tags, heads, deprels)
408
+ if dcg_results and isinstance(dcg_results, dict):
409
+ # Merge DCG results
410
+ if "patterns_found" in dcg_results and dcg_results["patterns_found"]:
411
+ results["patterns_found"].extend(dcg_results["patterns_found"])
412
+ if "warnings" in dcg_results and dcg_results["warnings"]:
413
+ results["warnings"].extend(dcg_results["warnings"])
414
+ except Exception as e:
415
+ print(f"Warning: DCG validation failed: {e}")
416
+ # Continue with basic validation even if DCG fails
417
+
418
+ # Validate each dependency (existing validation)
419
+ for i, (word, pos, head, rel) in enumerate(zip(words, pos_tags, heads, deprels)):
420
+ if head > 0 and head <= len(words): # Not root
421
+ head_word = words[head - 1]
422
+ head_pos = pos_tags[head - 1]
423
+
424
+ validation = self.validate_dependency(head_word, word, head_pos, pos, rel)
425
+ if validation.get("warnings"):
426
+ results["warnings"].extend(validation["warnings"])
427
+
428
+ return results
429
+
430
+ except Exception as e:
431
+ return {"validated": False, "error": str(e)}
432
+
433
+ def _validate_with_dcg(self, words, pos_tags, heads, deprels):
434
+ """
435
+ Validate parse tree using DCG grammar rules
436
+
437
+ Args:
438
+ words: List of word tokens
439
+ pos_tags: List of POS tags
440
+ heads: List of head indices
441
+ deprels: List of dependency relations
442
+
443
+ Returns:
444
+ dict: DCG validation results
445
+ """
446
+ try:
447
+ # Convert Python lists to Prolog format
448
+ words_pl = self._list_to_prolog_atoms(words)
449
+ pos_pl = self._list_to_prolog_atoms(pos_tags)
450
+ heads_pl = '[' + ','.join(map(str, heads)) + ']'
451
+ deprels_pl = self._list_to_prolog_atoms(deprels)
452
+
453
+ # Query the DCG validation predicate
454
+ query = f"coptic_grammar:validate_parse_tree({words_pl}, {pos_pl}, {heads_pl}, {deprels_pl})"
455
+
456
+ # Execute query - it asserts patterns and warnings
457
+ list(self.prolog.query(query))
458
+
459
+ # Retrieve patterns
460
+ patterns = []
461
+ pattern_query = "coptic_grammar:pattern_found(P)"
462
+ try:
463
+ for result in self.prolog.query(pattern_query):
464
+ if isinstance(result, dict) and 'P' in result:
465
+ pattern_data = result.get('P')
466
+ if pattern_data:
467
+ formatted = self._format_prolog_term(pattern_data)
468
+ patterns.append(formatted)
469
+ except Exception as e:
470
+ print(f"Warning: Error retrieving patterns: {e}")
471
+
472
+ # Retrieve warnings
473
+ warnings = []
474
+ warning_query = "coptic_grammar:warning(W)"
475
+ try:
476
+ for result in self.prolog.query(warning_query):
477
+ if isinstance(result, dict) and 'W' in result:
478
+ warning_data = result.get('W')
479
+ if warning_data:
480
+ formatted = self._format_prolog_term(warning_data)
481
+ warnings.append(formatted)
482
+ except Exception as e:
483
+ print(f"Warning: Error retrieving warnings: {e}")
484
+
485
+ # Clean up dynamic predicates
486
+ try:
487
+ list(self.prolog.query("coptic_grammar:retractall(pattern_found(_))"))
488
+ list(self.prolog.query("coptic_grammar:retractall(warning(_))"))
489
+ except Exception as e:
490
+ print(f"Warning: Error cleaning up Prolog predicates: {e}")
491
+
492
+ return {
493
+ "patterns_found": patterns,
494
+ "warnings": warnings
495
+ }
496
+
497
+ except Exception as e:
498
+ print(f"DCG validation error: {e}")
499
+ import traceback
500
+ traceback.print_exc()
501
+ return {
502
+ "patterns_found": [],
503
+ "warnings": []
504
+ }
505
+
506
+ def _list_to_prolog_atoms(self, python_list):
507
+ """
508
+ Convert Python list of strings to Prolog list with properly quoted atoms
509
+
510
+ Args:
511
+ python_list: Python list of strings
512
+
513
+ Returns:
514
+ str: Prolog list syntax
515
+ """
516
+ if not python_list:
517
+ return "[]"
518
+
519
+ # Quote and escape each string
520
+ items = []
521
+ for item in python_list:
522
+ # Escape single quotes
523
+ escaped = str(item).replace("'", "\\'")
524
+ items.append(f"'{escaped}'")
525
+
526
+ return '[' + ','.join(items) + ']'
527
+
528
+ def _format_prolog_term(self, term):
529
+ """
530
+ Format a Prolog term for Python display
531
+
532
+ Args:
533
+ term: Prolog term (can be atom, list, or compound)
534
+
535
+ Returns:
536
+ dict: Formatted representation (always a dict)
537
+ """
538
+ if isinstance(term, list):
539
+ result = {}
540
+ for item in term:
541
+ if hasattr(item, 'name') and hasattr(item, 'args'):
542
+ # Compound term like pattern_name('...')
543
+ key = item.name
544
+ value = item.args[0] if len(item.args) > 0 else None
545
+ result[key] = str(value) if value is not None else ''
546
+ return result if result else {'data': str(term)}
547
+ elif isinstance(term, str):
548
+ # Simple string/atom - wrap in dict
549
+ return {'type': term, 'data': term}
550
+ else:
551
+ # Other types - convert to string and wrap
552
+ return {'data': str(term)}
553
+
554
+ def query_prolog(self, query_string):
555
+ """
556
+ Direct Prolog query interface for custom queries
557
+
558
+ Args:
559
+ query_string: Prolog query as string
560
+
561
+ Returns:
562
+ Query result or None
563
+ """
564
+ if not self.prolog_initialized:
565
+ return None
566
+
567
+ try:
568
+ results = list(self.prolog.query(query_string))
569
+ return results[0] if results else None
570
+ except Exception as e:
571
+ print(f"Prolog query error: {e}")
572
+ return None
573
+
574
+ def cleanup(self):
575
+ """
576
+ Cleanup Prolog engine and threads properly
577
+ """
578
+ if self.prolog_initialized and self.prolog is not None:
579
+ try:
580
+ # Try to properly halt the Prolog engine
581
+ # This attempts to stop all Prolog threads
582
+ try:
583
+ # Query halt to stop Prolog cleanly
584
+ list(self.prolog.query("halt"))
585
+ except:
586
+ # halt will raise an exception as Prolog stops, which is expected
587
+ pass
588
+
589
+ # Clean up the Prolog instance
590
+ self.prolog = None
591
+ self.prolog_initialized = False
592
+ print("✓ Prolog engine cleaned up successfully")
593
+ except Exception as e:
594
+ print(f"Warning: Error during Prolog cleanup: {e}")
595
+
596
+
597
+ # ===================================================================
598
+ # CONVENIENCE FUNCTIONS
599
+ # ===================================================================
600
+
601
+ def create_prolog_engine():
602
+ """Factory function to create and initialize Prolog engine"""
603
+ return CopticPrologRules()
604
+
605
+
606
+ # ===================================================================
607
+ # EXAMPLE USAGE
608
+ # ===================================================================
609
+
610
+ if __name__ == "__main__":
611
+ print("="*70)
612
+ print("Coptic Prolog Rules - Test Suite")
613
+ print("="*70)
614
+
615
+ # Initialize engine
616
+ prolog = create_prolog_engine()
617
+
618
+ if not prolog.prolog_initialized:
619
+ print("\n⚠️ Prolog not available. Cannot run tests.")
620
+ exit(1)
621
+
622
+ print("\n" + "="*70)
623
+ print("TEST 1: Tripartite Pattern Recognition")
624
+ print("="*70)
625
+
626
+ # Test tripartite sentence: ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ (I am God)
627
+ words = ['ⲁⲛⲟⲕ', 'ⲡⲉ', 'ⲡⲛⲟⲩⲧⲉ']
628
+ pos_tags = ['PRON', 'AUX', 'NOUN']
629
+
630
+ result = prolog.check_tripartite_pattern(words, pos_tags)
631
+ print(f"\nInput: {' '.join(words)}")
632
+ print(f"Result: {result}")
633
+
634
+ print("\n" + "="*70)
635
+ print("TEST 2: Morphological Analysis")
636
+ print("="*70)
637
+
638
+ # Test article stripping
639
+ test_words = ['ⲡⲛⲟⲩⲧⲉ', 'ⲧⲃⲁϣⲟⲣ', 'ⲛⲣⲱⲙⲉ']
640
+ for word in test_words:
641
+ analysis = prolog.analyze_morphology(word)
642
+ print(f"\nWord: {word}")
643
+ print(f"Analysis: {analysis}")
644
+
645
+ print("\n" + "="*70)
646
+ print("TEST 3: Dependency Validation")
647
+ print("="*70)
648
+
649
+ # Test subject-verb relationship
650
+ validation = prolog.validate_dependency(
651
+ head_word='ⲡⲉ',
652
+ dep_word='ⲁⲛⲟⲕ',
653
+ head_pos='AUX',
654
+ dep_pos='PRON',
655
+ relation='nsubj'
656
+ )
657
+ print(f"\nDependency: ⲁⲛⲟⲕ (PRON) --nsubj--> ⲡⲉ (AUX)")
658
+ print(f"Validation: {validation}")
659
+
660
+ print("\n" + "="*70)
661
+ print("TEST 4: Custom Prolog Query")
662
+ print("="*70)
663
+
664
+ # Test custom query
665
+ result = prolog.query_prolog("definite_article(X)")
666
+ print(f"\nQuery: definite_article(X)")
667
+ print(f"Result: {result}")
668
+
669
+ print("\n" + "="*70)
670
+ print("All tests completed!")
671
+ print("="*70)
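
The __main__ tests above exercise the individual helpers; the sketch below shows the whole-tree entry point validate_parse_tree() that coptic_parser_core.py calls, on a small hand-built parse with one deliberately wrong 'punct' relation (heads are 1-based, 0 marks the root). It assumes SWI-Prolog and pyswip are installed; the exact warning wording is whatever the Prolog rules produce.

```python
# Sketch: validate a small hand-built parse tree with the Prolog rule engine
from coptic_prolog_rules import create_prolog_engine

engine = create_prolog_engine()
if engine.prolog_initialized:
    words    = ['ⲥⲱⲧⲙ', 'ⲡⲣⲱⲙⲉ', 'ⲡϣⲁϫⲉ']    # "hear", "the man", "the word"
    pos_tags = ['VERB', 'NOUN', 'NOUN']
    heads    = [0, 1, 1]                      # both nouns attach to the verb
    deprels  = ['root', 'nsubj', 'punct']     # 'punct' is deliberately wrong

    report = engine.validate_parse_tree(words, pos_tags, heads, deprels)
    print(report["patterns_found"])           # grammatical patterns recognised, if any
    print(report["warnings"])                 # should flag the bogus 'punct' and suggest 'obj'
    engine.cleanup()
```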
requirements.txt CHANGED
@@ -5,3 +5,4 @@ stanza
 torch
 transformers>=4.30.0
 sentencepiece>=0.1.99
+pyswip>=0.2.10