aaron's picture

scientio has got something - using 'concept structures'


Edmonds. 2007. Using concept structures for efficient document comparison and location. Conference Proceeding

I was so pleased to finally (and serendipitously, I might add) find a computer science article that describes what I was trying to do with my masters work, from outside the discipline.

This is a quick read. Spells out terms very clearly for non-adepts, so I found it to be quite accessible.


what are 'n-grams'


From Wikipedia: "An n-gram is a sub-sequence of n items from a given sequence." So the scope and granularity of their application matter greatly!

At the 'word-level', n-grams are constituted by successive groups of n words.

N-grams can be used (as in NLP) for 'efficient approximate matching.'...
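To make the idea concrete, here is a minimal sketch of word-level n-grams and one common way they support approximate matching: comparing two texts by the overlap (Jaccard similarity) of their n-gram sets. The function names are mine, not from any of the cited sources.

```python
def word_ngrams(text, n):
    """Word-level n-grams: successive groups of n words from the text."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def ngram_similarity(a, b, n=2):
    """Approximate match score: Jaccard overlap of the two texts' n-gram sets."""
    sa, sb = set(word_ngrams(a, n)), set(word_ngrams(b, n))
    if not (sa or sb):
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

For example, `word_ngrams("the quick brown fox", 2)` yields the three bigrams ("the quick"), ("quick brown"), ("brown fox"); two texts that differ in a single word still share most of their bigrams, which is what makes the matching "approximate."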


reading for gist


Got to get going on my IAT 814 and 802 term projects.

Started with the idea of reading for gist, a model principally from

O'Halloran, K. (2003). Critical discourse analysis and language cognition. Edinburgh: Edinburgh University Press.

Wanted to get a deeper view of the concept, so I did a Google Scholar search using the combined SFU/UBC proxies and VPNs.

Found about 32 articles with the following strategy:

"reading for gist" -esl -teacher

Downloaded (and meta-scraped) 11 articles in all, but only about 7 came from this search.


googling 'textual analytics'



InfoProfiler beta - Features

Information sources:

InfoProfiler can deal with all content sources where text is available in a machine-readable format, e.g., the World Wide Web, SEC filings, proprietary databases, and company intranets.

Extracts Text and Information pieces:

InfoProfiler has inbuilt text extraction capabilities and can extract information from a single website as well as from a given set of websites. It understands the structure of a webpage and decides whether it is a news site, forum, review site, bulletin board, or blog. InfoProfiler can also remove unwanted text portions (such as advertisements) from a given information source.
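InfoProfiler's actual extraction pipeline isn't published, but the basic idea of pulling visible text while discarding non-content regions can be sketched with the standard library. This is a crude stand-in, not the product's method: it simply skips a hand-picked set of tags that typically hold scripts, styles, and navigation.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping tags that usually hold non-content
    (scripts, styles, navigation) -- a crude stand-in for ad removal."""
    SKIP = {"script", "style", "nav", "aside"}

    def __init__(self):
        super().__init__()
        self.depth = 0          # how many skipped tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Real systems go much further (layout analysis, ad-block lists, learned classifiers for page type), but the skeleton is the same: parse the structure, then keep only the regions judged to be content.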


tex.tuals process


As I read through a text (I've remarked on the paper-to-digital conversion process elsewhere) I want to be able to highlight and capture whole swaths of text.

It is critically important to repurposing my captured snippets (bits, fits, blobs, fragments, portions) that I be able to recontextualize them easily.

How can this recontextualization happen?

First, the bibliographic information must be embedded in each bit.

Also, its relative location in the linear flow of the text must be recorded, so as to be able to quickly pull up various degrees of context around the bit.
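The two requirements above could be sketched as a tiny data model: each captured bit carries its citation and its character offset into the source, and a helper pulls a widening window of surrounding text on demand. The field names here are illustrative, not a published schema.

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    """A captured bit of text plus what is needed to recontextualize it."""
    text: str       # the highlighted swath itself
    citation: str   # embedded bibliographic information
    start: int      # character offset into the linear flow of the source

def context(snippet, source, window=80):
    """Pull up a degree of context around the bit; widen `window` for more."""
    lo = max(0, snippet.start - window)
    hi = min(len(source), snippet.start + len(snippet.text) + window)
    return source[lo:hi]
```

Because the offset is stored rather than the context itself, one capture can be re-displayed at any granularity, from a sentence to a whole page, by varying the window.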
