Plans for support of ordered set?

I saw the release of 2.19, hurray, but thought to ask, are ordered sets on the drawing board for a future release?

I ask because using TypeDB to represent hypergraphs of texts will require me to create an ordered relationship: words (think entities) aren’t just members of a set comprising a verse, they are members of an ordered set comprising a verse. Think of the verse as a container object. The verses are also ordered, but can be ordered differently depending upon the tradition/witness.

That may answer my own question since the order can be unique in any particular instance and default ordering would save recording relationships only for modern printed editions.

Thoughts?

BTW, refs to verse citations won’t be attributes of verses, because the content a verse ref points to varies across traditions. So references (think John 1:1) are going to be separate entities, with additional attributes identifying authorities.

Thanks!

Ordered sets have been spoken about, but I don’t think we have a timeline for that yet. @Joshua will be able to confirm that.

Could you share your current schema and a hypothetical schema if you were to have an ordered-set?
I’d like to understand why you can’t model the ordered verse with native typeql. Or is the problem only when you’re trying to query it?

Ah, yes to sharing the data. “Can’t model with native typeql” came from my ignorance of the sort operator; I was thinking only in terms of the hypergraph.

I have elided a large number of attributes from each word to make John 1:1 easy to view.

<verse id="JHN 1:1">

The <w> elements are sub-parts of a verse. (Note that, as entities, each occurs only once but can be referenced from multiple verses. That avoids the errors that come with repeating the attributes, but introduces the need to note variations.)

<w ref="JHN 1:1!1" xml:id="n43001001001">Ἐν</w>
<w ref="JHN 1:1!2" xml:id="n43001001002">ἀρχῇ</w>
<w ref="JHN 1:1!3" xml:id="n43001001003">ἦν</w>
<w ref="JHN 1:1!4" xml:id="n43001001004">ὁ</w>
<w ref="JHN 1:1!5" xml:id="n43001001005">Λόγος</w>
<w ref="JHN 1:1!6" xml:id="n43001001006">καὶ</w>
<w ref="JHN 1:1!7" xml:id="n43001001007">ὁ</w>
<w ref="JHN 1:1!8" xml:id="n43001001008">Λόγος</w>
<w ref="JHN 1:1!9" xml:id="n43001001009">ἦν</w>
<w ref="JHN 1:1!10" xml:id="n43001001010">πρὸς</w>
<w ref="JHN 1:1!11" xml:id="n43001001011">τὸν</w>
<w ref="JHN 1:1!12" xml:id="n43001001012">Θεόν</w>
<w ref="JHN 1:1!13" xml:id="n43001001013">καὶ</w>
<w ref="JHN 1:1!14" xml:id="n43001001014">Θεὸς</w>
<w ref="JHN 1:1!15" xml:id="n43001001015">ἦν</w>
<w ref="JHN 1:1!16" xml:id="n43001001016">ὁ</w>
<w ref="JHN 1:1!17" xml:id="n43001001017">Λόγος</w>

Using the sort operator from typeql, it’s trivial to sort the words in the proper order, at least with these attributes. What I was missing is the default document order axis of XML, which enables me to ask for the 5th word, without explicit sorting. (Same sorting logic applies to verses, chapters, books, assuming ids for sorting.)

This text is from a particular printed edition. The issues become more complex when we start working with different sources, which have different words, different orders of words, and different numbering systems. Our identifiers need a scope because JHN 1:1!1 can return any number of words, including a null result, depending upon the source you are querying.

I suspect the answer is that we will have to build a sort into any query that returns results to our users. Thanks for the quick response!

Patrick

I understand why you’d want to bake the ordering into the graph.

To see if I understand your problem: does this model make sense, or am I missing something? (Assuming you’re OK with using sort.)

define
id sub attribute, value string;
ref sub attribute, value string;

position sub attribute, value long;
word-text sub attribute, value string;
word sub entity, owns id, owns word-text;

word-position sub relation, relates word, relates position; 
ordered-verse sub relation, owns ref, relates ordered-word;

word plays word-position:word;
position plays word-position:position;
word-position plays ordered-verse:ordered-word;

With data:

insert 
$wp1 1 isa position;
#...
$wp17 17 isa position;
#...


$n43001001001 isa word, has id "n43001001001", has word-text "Ἐν";
$jhn_1_1_1 (word: $n43001001001, position: $wp1) isa word-position;
#....
$n43001001017 isa word, has id "n43001001017", has word-text "Λόγος";
$jhn_1_1_17 (word: $n43001001017, position: $wp17) isa word-position;


$jhn_1_1 (ordered-word: $jhn_1_1_1, #...,
 ordered-word: $jhn_1_1_17 ) isa ordered-verse ;
$jhn_1_1 has ref "JHN 1:1";

And a query


match
$verse-ref = "JHN 1:1";
$ordered-verse (ordered-word: $ordered-word) isa ordered-verse, has ref $verse-ref;

$ordered-word (word: $word, position: $position ) isa word-position;
$word has word-text $text;
get $text, $position, $verse-ref;
sort $position;
group $verse-ref;

I do see why querying would be inconvenient without the ability to return “sets”. I’m using a match-group, but it’s limited to one level of nesting. We do have plans on the roadmap for supporting nested results of depth > 1 with a fetch construct (assuming I understand the feature correctly). Hopefully that’s only a few months away.

Thanks!

That’s very helpful but I’m uncertain about word-text as an attribute.

If you have a moment, type says:

“All of the person type instances have ownership of the same instance of the name type with the value “Bob”).”

I had to think about that for a while, because the same instance can have different readings depending on context. But it being the same instance is no bar to adding sub relations to bind different readings to the same instance. OK, that works. Thanks again!

Sorry, I didn’t mean the schema as a suggestion. I was just doing the equivalent of paraphrasing to test my understanding of your domain. It’s very useful to guide us when designing/implementing features.

I think your understanding of the “Bob” problem is indeed what was intended. I believe this was done to encourage the user to model the situation properly. (Please correct me if I don’t understand your domain properly.) Analogous to the case of multiple persons having the same name, if there are multiple readings having the same ‘text’, they should be modelled as different ‘readings’, rather than as different instances of the same text.

Thanks! Appreciate all the help I can get!

I’ll try to flesh out the most basic problem with “readings” first.

The notion that we have a fixed text against which variation is expressed is a construct of the profession. For both the Hebrew Bible and New Testament, there are “standard editions,” which don’t really exist anywhere other than as historical publications. That is, no manuscript witness is the same as those texts. We work from them only as a matter of convenience.

In a very real sense, the “standard” texts are variations upon the manuscript witnesses and a “base text” is only a matter of personal choice.

What appears to be static numbering of the words, is only true for one edition or manuscript. In my example, it was derived from the 1904 Nestle Greek NT.

So, it’s not fair to say “John 1:1!5” and then list readings for the fifth word, because there may be no fifth word in some witnesses. The number of words itself can change, and entire verses or ranges of verses can be absent.

Q: What if every word had an evidenced-by relationship to a witness (which could include a printed edition), and an order attribute (within a verse) that is meaningful only if you ask for a verse, with that witness, and sort the words based on the order attribute? That way even though the word-text might be the same, identity and other attributes are keyed on the evidenced-by relationship. Thanks!
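As a rough illustration of the question above, here is a minimal Python sketch (not TypeQL) of ordering that is meaningful only within a witness and a verse. The `Reading` record, the function names, and the witness labels are assumptions made for illustration. The first three words come from the edition quoted earlier in the thread; the second witness and its shorter text are invented purely to show a missing position.

```python
from typing import List, NamedTuple, Optional


class Reading(NamedTuple):
    witness: str  # hypothetical label standing in for the evidenced-by relation
    verse: str    # verse reference, scoped by the witness
    order: int    # position within the verse, meaningful only for this witness
    text: str


# Illustrative data only. "witness-B" is a made-up witness, not a real collation.
READINGS = [
    Reading("nestle-1904", "JHN 1:1", 2, "ἀρχῇ"),
    Reading("nestle-1904", "JHN 1:1", 1, "Ἐν"),
    Reading("nestle-1904", "JHN 1:1", 3, "ἦν"),
    Reading("witness-B", "JHN 1:1", 1, "Ἐν"),
    Reading("witness-B", "JHN 1:1", 2, "ἀρχῇ"),
]


def words_in_order(readings: List[Reading], witness: str, verse: str) -> List[str]:
    """Return the words of one verse as attested by one witness, sorted by order."""
    selected = [r for r in readings if r.witness == witness and r.verse == verse]
    return [r.text for r in sorted(selected, key=lambda r: r.order)]


def nth_word(readings: List[Reading], witness: str, verse: str, n: int) -> Optional[str]:
    """Return the n-th word (1-based) for a witness, or None if it has no such word."""
    words = words_in_order(readings, witness, verse)
    return words[n - 1] if 1 <= n <= len(words) else None
```

Asking for the third word of the verse succeeds for one witness and returns `None` for the other, which is the "null result" case that scoped identifiers have to allow for.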

I’ll be grinding away tomorrow, but I just realized (damage from long use of texts) that the identifiers I associate with words are truthfully identifiers for word positions on a line, not particular words. Some witnesses will have more or fewer words, including gaps. So the word sub entity only needs id and ref, plus a relation that specifies the text for that position, with that relation also including a witness, parsing, etc. I’ll try to write it up like you did in the morning. Thanks for the prod on abstraction.

Here it is, with some scars and obviously better modeling decisions still to be made. Witness isn’t connected to text-instance, for example.

define

case sub attribute, value string;
citation sub attribute, value string;
class sub attribute, value string;
domain sub attribute, value string;
gender sub attribute, value string;
gloss sub attribute, value string;
id sub attribute, value string;
ln sub attribute, value string;
morph sub attribute, value string;
number sub attribute, value string;
parsing sub attribute, value string;
range sub attribute, value string;
ref sub attribute, value string;
role sub attribute, value string;
rule sub attribute, value string;
sigla sub attribute, value string;
strong sub attribute, value string;
cltype sub attribute, value string;
word sub attribute, value string;
wtype sub attribute, value string;

witnessed-by sub relation,
relates witness,
relates text;

text-instance sub relation,
relates text,
relates w,
owns case,
owns class,
owns domain,
owns gender,
owns gloss,
owns ln,
owns morph,
owns number,
owns parsing,
owns role,
owns strong,
owns wtype;

sentence sub entity,
owns ref,
owns range;

text sub entity,
owns word,
plays witnessed-by:text;

w sub entity,
owns id,
owns ref,
plays text-instance:w;

wg sub entity,
owns role,
owns rule,
owns cltype,
owns range;

witness sub entity,
owns sigla,
owns citation,
plays witnessed-by:witness;

In the webinars earlier this week (great ones, by the way), it was mentioned that TypeDB is being rewritten in Rust (yes?). If that’s the case, I think Rust already has support for ordered sets, e.g. OrderedSet in the phf crate.

Although after sitting with my text case, I’m not sure I need ordered sets. If I have an unadorned string of words in the attribute of an entity, their order is fixed. If words are represented by separate entities, then a word “appears” in other relations, even though there is only one instance of the entity that owns it. (Sorry, text damage from thinking of words, even the same word, as other than an association with a position on a line. Hidden from the user of course, but I suspect that is how I would model it.)

Ordering is something we’ve been actively thinking about. We have two loose plans in mind:

  1. Introduce composite attribute values:
coordinate sub attribute, value [Double, Double]

for example. This should allow for primitive valued lists within an attribute, enums, and possibly other types of structures such as key-value. These would be immutable values, same as our attributes now.

  2. Introduce ordered connections

We are considering introducing an @ordered annotation to allow ordering connections as ordered sets:

file sub entity,
  owns edit-history @ordered;

archive sub entity,
  relates content @ordered;

This would preserve all the current semantics of owns and relates/plays, but allow writing and reading these connections with control over ordering:

insert 
$x isa file, has edit-history [2021-01-01, 2022-01-01] # create initial ordered owns

or alternatively 2 queries with implicit ‘append’ ordering:

insert $x isa file, has edit-history 2021-01-01;
match $x isa file; insert $x has edit-history 2022-01-01;

Insert at position:

match
$x isa file;
insert
$x has edit-history[0] 2020-01-01;

matching the ordered list:

match 
$x isa file, has edit-history[] $h;

would return $h = [2020..., 2021..., 2022...]

We imagine if we have ordered collections, they’re indexable:

match 
$x isa file, has edit-history[] $h;
$y isa file, has edit-history[0] $h2;
$h[0] = $h2

We can also do something similar for role playing:

insert
$archive (content: [$file0, $file1, $file2]);
# equivalent to a sequential insert appending files to the archive as new role players of `content`

matching ordered role players:

match
$archive (content[]: $c) isa archive;
$c[0] has edit-history[0] 2021...;

Note that if we’re going to allow variables to hold some kind of a set of things, I don’t see why we can’t introduce that feature earlier:

match $x isa file, has edit-history[] $h

could return an unordered set for $h.

This feature is pretty interesting, early whiteboard days, but we definitely will want something that allows explicit ordering (there will be a cost, such as an extra index).

It’s hard to give an estimate for the time being, but I hope outlining this gives you some ideas of what could be coming up :slight_smile:

Regarding attributes, some use cases require duplicated values, like this array:
[“yellow”,“blue”,“blue”,“red”]

This can currently be stored in different ways, thanks to attributes being able to play roles:


define

orderedOwnership sub relation,
relates target,
relates next,
relates owner,
plays orderedOwnership:next;

orderedStringAttribute sub attribute, abstract, value string,
plays orderedOwnership:target;

labels sub orderedStringAttribute;

book sub entity, owns labels, plays orderedOwnership:owner;

insert

$blue isa labels;
$blue 'blue';
$yellow isa labels;
$yellow 'yellow';
$red isa labels;
$red 'red';

$b1 isa book;

$t1 (target: $yellow, next: $t2, owner: $b1) isa orderedOwnership;
$t2 (target: $blue, next: $t3, owner: $b1) isa orderedOwnership;
$t3 (target: $blue, next: $t4, owner:$b1) isa orderedOwnership;
$t4 (target: $red, owner:$b1) isa orderedOwnership;
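To show how a reader might recover the array from this linked-list encoding, here is a minimal Python sketch. It assumes the orderedOwnership relations for one book have already been fetched into a dictionary keyed by relation; the keys `t1` to `t4` mirror the variables above, and the `read_in_order` function is hypothetical, not part of any TypeDB client.

```python
from typing import Dict, List, Optional, Tuple

# One entry per orderedOwnership relation: relation -> (target value, next relation).
# The last element has no `next` role player, represented here as None.
LINKS: Dict[str, Tuple[str, Optional[str]]] = {
    "t1": ("yellow", "t2"),
    "t2": ("blue", "t3"),
    "t3": ("blue", "t4"),
    "t4": ("red", None),
}


def read_in_order(links: Dict[str, Tuple[str, Optional[str]]]) -> List[str]:
    """Walk the `next` chain from its head and return the target values in order."""
    pointed_at = {nxt for _, nxt in links.values() if nxt is not None}
    heads = [rel for rel in links if rel not in pointed_at]
    if len(heads) != 1:
        raise ValueError("expected exactly one head relation")

    ordered: List[str] = []
    seen = set()
    current: Optional[str] = heads[0]
    while current is not None:
        if current in seen:
            raise ValueError("cycle detected in next chain")
        seen.add(current)
        value, current = links[current]
        ordered.append(value)
    return ordered
```

Because the two middle relations both target the same "blue" attribute instance, the duplicate survives: the order and repetition live in the relations, not in the attribute values.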

Another thing to consider, both for arrays (ordered attributes) and vectors (ordered edges), is that this would require a new API to modify order, as we should be able to:

  1. Append something (or a list of things) at the end
  2. Insert something at a particular position
  3. Remove last
  4. Remove something in a particular position
  5. Change the position of a particular thing into a different position (rearrange)
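The five operations listed above can be sketched on a plain Python list. This is only an illustration of the intended semantics, not a proposed TypeDB API: all function names are hypothetical, each returns a new list rather than mutating its input, and indexes are assumed to be non-negative and in range.

```python
from typing import List, Sequence


def append(items: List[str], new: Sequence[str]) -> List[str]:
    """1. Append one or more things at the end."""
    return items + list(new)


def insert_at(items: List[str], index: int, new: str) -> List[str]:
    """2. Insert something at a particular position."""
    return items[:index] + [new] + items[index:]


def remove_last(items: List[str]) -> List[str]:
    """3. Remove the last element."""
    return items[:-1]


def remove_at(items: List[str], index: int) -> List[str]:
    """4. Remove the element at a particular position."""
    return items[:index] + items[index + 1:]


def move(items: List[str], src: int, dst: int) -> List[str]:
    """5. Move the element at position src so that it ends up at position dst."""
    rest = items[:src] + items[src + 1:]
    return rest[:dst] + [items[src]] + rest[dst:]
```

For example, `move(["yellow", "blue", "blue", "red"], 3, 0)` rearranges the list so that "red" comes first while the duplicated "blue" entries keep their relative order.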

Also, a particularly easy example to imagine is storing an html/jsx file in TypeDB.

This one requires:

  • Order
  • Nesting
  • Repetition.

As we could store something like this

<div>
  <MyComp/>
  <br/>
  <image src="hey"/>
  <div> 
     <MyComp/>
  </div>
  <MyComp/>
</div>

Where we can see the 3 things (repetition, order and nested things)
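
As a small illustration of those three requirements together, here is a Python sketch of the same document as an ordered tree. The `Node` class and the `render` function are hypothetical names for this example, and attributes such as `src` are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str
    # Order: children are kept in a list, so document order is preserved.
    # Repetition: the same tag may appear several times in that list.
    # Nesting: each child is itself a Node with its own children.
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Return one line per node in document order, indented by nesting depth."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


doc = Node("div", [
    Node("MyComp"),
    Node("br"),
    Node("image"),
    Node("div", [Node("MyComp")]),
    Node("MyComp"),
])
```

Calling `render(doc)` walks the tree in document order, so the three `MyComp` nodes show up as distinct positions even though they share a tag.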