Contextual Search

From KatWiki

Contextual Searching

This page is for discussing which features we need for contextual searching. It categorizes the possible types of context and formalizes possible relations between documents. Unlike the freedesktop.org (http://www.freedesktop.org) proposal for a standard on shared file metadata (http://www.freedesktop.org/wiki/Standards_2fshared_2dfilemetadata_2dspec), I think we should focus not on metadata fields but on relations between objects. The reason is that a context is not defined by a match in metadata; it is a fuzzier concept: a cloud of objects that are related to a central object.

These approaches partially overlap and might be complementary. Metadata could be expressed as 'has' relations, but I think the power of context lies in other types of relations between objects.

For example, the context of a script is defined not just by its author and format: the libraries it uses and the data files it accesses are part of its context as well. I don't think the metadata concept can capture such information.

Some relations might be created automatically by indexing programs. For example, the similarity of textual documents can be estimated by comparing the important words in each document (using whatever method is chosen).
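
Comparing important words can be done, for instance, with word-count vectors and cosine similarity. This is a minimal sketch under assumptions: the tiny stop-word list, the hypothetical `is_very_similar` helper and its 0.5 threshold are illustrative choices, not part of Kat.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real indexer would use
# a fuller, language-specific list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on"}

def important_words(text: str) -> Counter:
    """Count the lower-cased words in `text`, ignoring stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_very_similar(text1: str, text2: str, threshold: float = 0.5) -> bool:
    """Hypothetical rule for creating an isVerySimilarTo relation."""
    return cosine_similarity(important_words(text1), important_words(text2)) >= threshold
```

For "Kat indexes the files on the desktop" and "The desktop files are indexed by Kat", the shared words kat, files and desktop give a similarity of about 0.61, so this rule would relate the two documents. A real indexer would also stem words (indexes/indexed) and give rare words more weight, e.g. with tf-idf.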

Implementation in Kat

To implement this, we need to extend Kat with an ontology of relations; a first concept for it is given below.

The index itself would need to be extended with a table describing the relations between objects. This could simply be a three-column table with the fields objectID1, objectID2 and relationID.
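
A minimal sketch of such a table, assuming SQLite (through Python's `sqlite3` module) as storage and plain integer IDs. The relation IDs and the `context_of` helper are hypothetical; the helper collects the direct context of one object by following relations from either end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE relations (
           objectID1  INTEGER NOT NULL,
           objectID2  INTEGER NOT NULL,
           relationID INTEGER NOT NULL,
           PRIMARY KEY (objectID1, objectID2, relationID)
       )"""
)

# Hypothetical relation IDs that would come from the ontology.
IN_SAME_DIRECTORY, HAS_AUTHOR = 1, 2

conn.executemany(
    "INSERT INTO relations VALUES (?, ?, ?)",
    [(10, 11, IN_SAME_DIRECTORY), (10, 99, HAS_AUTHOR), (12, 10, IN_SAME_DIRECTORY)],
)

def context_of(object_id: int) -> list[tuple[int, int]]:
    """Return (related object, relation) pairs for an object.

    Relations are followed from either end, so an object that appears in
    the objectID2 column is still part of the context of objectID1 and
    vice versa.
    """
    rows = conn.execute(
        """SELECT objectID2, relationID FROM relations WHERE objectID1 = ?
           UNION
           SELECT objectID1, relationID FROM relations WHERE objectID2 = ?
           ORDER BY 1""",
        (object_id, object_id),
    )
    return rows.fetchall()

print(context_of(10))  # [(11, 1), (12, 1), (99, 2)]
```

A composite primary key keeps the same relation from being stored twice; an extra index on (objectID2) would speed up the reverse lookup.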

The architecture would consist of:

  • an ontology (written in RDF, OWL, or similar)
  • a new type of Kat plugin, which extracts relationships
  • a new type of Kat processor, which will compare documents, e.g. using lucene
  • a new Kat index that will contain the found relationships
  • a new result view which will show the context of a file, e.g. using graphviz
  • a new way to classify the result list for normal queries, i.e. by context, e.g. using hierarchical clustering or clique detection
  • a new way of querying, e.g. restrictions such as 'likeDocument X'
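
As one simple way to classify a result list by context, results can be grouped into the connected components of the relation graph, so that any chain of relations puts two results in the same group. This is a much cruder technique than the hierarchical clustering or clique detection mentioned above; the sketch assumes relations are available as (objectID1, objectID2, relationID) triples, and `group_by_context` is a hypothetical name.

```python
from collections import defaultdict

def group_by_context(results, relations):
    """Group result IDs so that related results end up in the same group.

    Only relations between two results are used; each returned group is a
    connected component of that graph (computed with union-find).
    """
    parent = {r: r for r in results}

    def find(x):
        # Walk up to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _relation in relations:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for r in results:
        groups[find(r)].append(r)
    return list(groups.values())

print(group_by_context([1, 2, 3, 4, 5], [(1, 2, 7), (2, 3, 1), (4, 99, 2)]))
# [[1, 2, 3], [4], [5]]
```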

Relationship Ontology

As mentioned above, Kat would require a fixed ontology of relationships. Below is a raw dump of my first thoughts on which relations we could have. Feel free to add your suggestions for both new context areas and relation types.

File Context

The context of a file is described primarily by:

  • the path
  • the creation and access dates
  • the author

Relations between objects are:

  • inSameDirectory
  • inSamePath
  • hasAuthor
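
Some of these relations can be derived from paths alone. A minimal sketch for inSameDirectory, assuming files are identified by their absolute paths and relation names are kept as strings; only inSameDirectory is covered here.

```python
import os.path
from collections import defaultdict
from itertools import combinations

def in_same_directory(paths):
    """Yield (path1, path2, "inSameDirectory") for every pair of files sharing a parent directory."""
    by_dir = defaultdict(list)
    for p in paths:
        by_dir[os.path.dirname(p)].append(p)
    for siblings in by_dir.values():
        # Each unordered pair is emitted once, in sorted order.
        for a, b in combinations(sorted(siblings), 2):
            yield (a, b, "inSameDirectory")

print(list(in_same_directory(["/home/u/a.txt", "/home/u/b.txt", "/tmp/c.txt"])))
# [('/home/u/a.txt', '/home/u/b.txt', 'inSameDirectory')]
```

The number of pairs grows quadratically with the directory size, so a real index might store the directory itself as an object and relate each file to it instead.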

Script/Source Context (extends File Context)

  • the libraries/programs it uses
  • the programming language

Relations between objects are:

  • extends
  • usesLibrary/usesProgram
  • isWrittenIn
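
As an example of a relation-extracting plugin, usesLibrary could be derived for Python scripts with the standard `ast` module. This sketch only handles Python source and assumes every imported top-level module counts as a library; relative imports are skipped because they refer to the script's own package. Other languages would need their own parsers.

```python
import ast

def used_libraries(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules

script = """
import os, sys
from xml.dom import minidom
from . import helpers
"""
print(sorted(used_libraries(script)))  # ['os', 'sys', 'xml']
```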

Textual Documents Context (extends File Context)

  • language
  • keywords
  • word pattern
  • hard-coded links (e.g. <a href> elements in HTML, or <ulink> elements in DocBook)

Relations between objects are:

  • isVerySimilarTo (based on, for example, word pattern)
  • isWrittenIn
  • discusses
  • includes (for images, etc.)
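
Hard-coded links in HTML can be collected with the standard `html.parser` module. A minimal sketch, assuming `<img src>` references feed the includes relation; a relation name for plain hyperlinks (the `linksTo` in the comment) is hypothetical, since the list above does not name one.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect <a href> targets and <img src> references from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []     # candidates for a hypothetical "linksTo" relation
        self.includes = []  # candidates for the "includes" relation

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.includes.append(attributes["src"])

extractor = LinkExtractor()
extractor.feed('<p>See <a href="notes.html">notes</a> <img src="chart.png"></p>')
print(extractor.links, extractor.includes)  # ['notes.html'] ['chart.png']
```

Relative targets such as notes.html would still have to be resolved against the document's own path before they can be matched to indexed objects.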

Email Context (extends Textual Documents Context)

  • routing information
  • attachments
  • spam status

Relations between objects are:

  • isReplyTo
  • cameFromSameServer
  • contains (for relations between emails and files!)

Etc
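
The isReplyTo relation can usually be read from mail headers. A minimal sketch using Python's `email` package, assuming every message has a Message-ID and that In-Reply-To holds a single message ID; References headers and malformed values are ignored, and only replies to messages that are themselves indexed are related.

```python
from email import message_from_string

def is_reply_to(raw_messages):
    """Yield (reply_id, original_id, "isReplyTo") based on the In-Reply-To header."""
    messages = [message_from_string(raw) for raw in raw_messages]
    known = {msg["Message-ID"].strip() for msg in messages if msg["Message-ID"]}
    for msg in messages:
        parent = msg["In-Reply-To"]
        if msg["Message-ID"] and parent and parent.strip() in known:
            yield (msg["Message-ID"].strip(), parent.strip(), "isReplyTo")

original = "Message-ID: <1@example.org>\nSubject: Kat\n\nHello\n"
reply = "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.org>\nSubject: Re: Kat\n\nHi\n"
print(list(is_reply_to([original, reply])))
# [('<2@example.org>', '<1@example.org>', 'isReplyTo')]
```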

Package Context (extends File Context)

  • isPartOfPackage <package>
  • isDocumentationOfPackage <package>
  • isConfigOfPackage <package>
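
Package relations could be derived from the file lists that package managers keep (on Debian, `dpkg -L <package>` prints one). A minimal sketch, assuming such a list of paths is already available; the rule that /usr/share/doc/ means documentation and /etc/ means configuration is a heuristic, and the example paths are made up.

```python
def package_relations(package, paths):
    """Yield package relations for the files installed by one package.

    Every file is part of the package; documentation and configuration
    files are recognized by location (a heuristic, not a standard).
    """
    for path in paths:
        yield (path, package, "isPartOfPackage")
        if path.startswith("/usr/share/doc/"):
            yield (path, package, "isDocumentationOfPackage")
        elif path.startswith("/etc/"):
            yield (path, package, "isConfigOfPackage")

for triple in package_relations("kat", ["/usr/bin/kat", "/etc/katrc", "/usr/share/doc/kat/README"]):
    print(triple)
```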

Revision Context (e.g. CVS) (extends File Context)

  • isRevisionOf <file>
  • revisionNumber <revision number>
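
CVS-style revision numbers are dot-separated integers, so they must be compared numerically rather than as strings ("1.10" comes after "1.9"). A minimal sketch, assuming the revisions of one file are already known; the `file@revision` ID scheme is hypothetical.

```python
def revision_key(revision: str) -> tuple[int, ...]:
    """Turn a revision number such as "1.10" into (1, 10) for numeric ordering."""
    return tuple(int(part) for part in revision.split("."))

def revision_relations(file_id, revisions):
    """Yield isRevisionOf and revisionNumber facts, lowest revision first."""
    for rev in sorted(revisions, key=revision_key):
        rev_id = f"{file_id}@{rev}"  # hypothetical ID for a stored revision
        yield (rev_id, file_id, "isRevisionOf")
        yield (rev_id, rev, "revisionNumber")

print(sorted(["1.10", "1.9", "1.2"], key=revision_key))  # ['1.2', '1.9', '1.10']
```

Tuple ordering places branch revisions such as 1.2.2.1 between 1.2 and 1.3, which only approximates the actual history.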

Video Context

  • isSubtitleOf <video file>
  • isSoundTrackOf <video file>
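
A common convention is to store subtitles and separate soundtracks next to the video under the same base name. A minimal sketch of that heuristic, assuming files in the same directory with the same name but different extensions belong together; the extension sets are illustrative, and names such as movie.en.srt would not match.

```python
from pathlib import PurePosixPath

# Illustrative extension sets (assumptions, not a complete list).
VIDEO_EXTENSIONS = {".avi", ".mpg", ".mkv", ".ogm"}
SUBTITLE_EXTENSIONS = {".srt", ".sub", ".ssa"}
AUDIO_EXTENSIONS = {".mp3", ".ogg", ".wav"}

def video_relations(paths):
    """Pair subtitle and audio files with a video of the same name in the same directory."""
    videos = {}
    for p in map(PurePosixPath, paths):
        if p.suffix.lower() in VIDEO_EXTENSIONS:
            videos[p.with_suffix("")] = p  # key: path without extension
    for p in map(PurePosixPath, paths):
        video = videos.get(p.with_suffix(""))
        if video is None:
            continue
        if p.suffix.lower() in SUBTITLE_EXTENSIONS:
            yield (str(p), str(video), "isSubtitleOf")
        elif p.suffix.lower() in AUDIO_EXTENSIONS:
            yield (str(p), str(video), "isSoundTrackOf")

files = ["/films/holiday.avi", "/films/holiday.srt", "/films/holiday.ogg", "/films/other.srt"]
for triple in video_relations(files):
    print(triple)
```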

Further Reading

Scott Wheeler (http://developer.kde.org/~wheeler/) explained some of his thoughts on Tenor in an interview on dot.kde.org (http://dot.kde.org/1109163846/) and in this PDF (http://websvn.kde.org/trunk/playground/base/tenor/docs/tenor-architecture.pdf).

If you are interested, make sure to subscribe to the klink (http://mail.kde.org/pipermail/klink/) mailing list.
