aaron's picture

scientio has got something - using 'concept structures'


Edmonds (2007). Using concept structures for efficient document comparison and location. Conference proceeding.

I was so pleased to finally (and serendipitously, I might add) find a computer science article that describes what I was trying to do with my master's work, from outside the discipline.

This is a quick read. It spells out terms very clearly for non-adepts, so I found it quite accessible.


what are 'n-grams'


From Wikipedia: "An n-gram is a sub-sequence of n items from a given sequence." So the scope and granularity of their application matter greatly!

At the 'word-level', n-grams are constituted by successive groups of n words.

N-grams can be used (as in NLP) for 'efficient approximate matching.'...
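To make this concrete, here is a minimal sketch (my own illustration, not from any of the papers above) of word-level n-grams and how they support approximate matching — two strings are compared by the overlap of their n-gram sets (Jaccard similarity):

```python
def word_ngrams(text, n):
    """Return the successive n-word sub-sequences of a text."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def ngram_similarity(a, b, n=2):
    """Approximate match: Jaccard overlap of the two texts' n-gram sets."""
    set_a, set_b = set(word_ngrams(a, n)), set(word_ngrams(b, n))
    if not (set_a or set_b):
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)
```

With n=2, "the quick brown fox" yields the bigrams ("the", "quick"), ("quick", "brown"), ("brown", "fox"); texts sharing many such bigrams score close to 1, unrelated texts score near 0.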


visualizing textual clusters


I started my search on Google Scholar, to identify recent articles on this topic.

"visualizing text clusters" produced 0 hits.

"visualizing clusters" produced 120 hits. (32 post 2006)

Found one great one:

Chen, K. & Liu, L. (2006). iVIBRATE: Interactive visualization-based framework for clustering large datasets. ACM Transactions on Information Systems (TOIS), 24, 245-294.
Georgia Tech seems to put out a lot of good stuff.

Then I decided to change tack, to see if 'text content' or document clustering comes up:


reading for gist


Got to get going on my IAT 814 and 802 term projects.

Started with the idea of reading for gist, a model principally from

O'Halloran, K. (2003). Critical discourse analysis and language cognition. Edinburgh: Edinburgh University Press.

Wanted to get a deeper view on the concept, so did a Google Scholar search using combined SFUBC proxies and VPNs.

Found about 32 articles with the following strategy:

"reading for gist" -esl -teacher

Downloaded (and meta-scraped) 11 articles...but only about 7 came from this search.


tex.tuals process


As I read through a text (I've remarked on the paper-to-digital conversion process elsewhere) I want to be able to highlight and capture whole swaths of text.

It is critically important, for repurposing my captured snippets (bits, fits, blobs, fragments, portions), that I am able to recontextualize them easily.

How can this recontextualization happen?

First, the bibliographic information must be embedded in each bit.

Also, its relative location in the linear flow of the text must be recorded, so as to be able to quickly pull up various degrees of context around the bit.
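The two requirements above could be sketched as a simple data structure — a snippet that carries its bibliographic source and its character offsets in the original text, plus a helper that pulls up a widening window of context around it. This is only an illustrative sketch; the names (`Snippet`, `context`) and the character-offset approach are my assumptions, not an existing implementation:

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    source_bib: str  # embedded bibliographic info, e.g. "O'Halloran (2003)"
    start: int       # character offset of the bit in the source text
    end: int         # end offset, recording its place in the linear flow
    text: str        # the captured bit itself

def context(snippet, full_text, radius=200):
    """Return the snippet plus `radius` characters of surrounding context."""
    lo = max(0, snippet.start - radius)
    hi = min(len(full_text), snippet.end + radius)
    return full_text[lo:hi]
```

Calling `context` with increasing `radius` values gives the "various degrees of context" around a bit, while `source_bib` keeps the citation attached to it wherever it is repurposed.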
