
scientio has got something - using 'concept structures'
Computer Science | Linguistics | Computational Linguistics | Natural Language Processing | Concept Mapping | Document Similarity | WordNet
Edmonds. 2007. Using concept structures for efficient document comparison and location. Conference Proceeding
I was so pleased to finally (and serendipitously, I might add) find a computer science article that describes, from outside my own discipline, what I was trying to do with my master's work.
This is a quick read. Spells out terms very clearly for non-adepts, so I found it to be quite accessible.

what are 'n-grams'
Information Technology | Computational Linguistics | Natural Language Processing | N-grams
From Wikipedia: "An n-gram is a sub-sequence of n items from a given sequence." So the scope and granularity of their application matters greatly!
At the 'word-level', n-grams are constituted by successive groups of n words.
N-grams can be used (as in NLP) for 'efficient approximate matching.'...
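To make the word-level definition concrete, here is a minimal Python sketch. The function names `word_ngrams` and `ngram_overlap` are my own, and using Jaccard overlap of n-gram sets as the "approximate matching" score is an assumption for illustration, not something taken from the Wikipedia article.

```python
def word_ngrams(text, n):
    """Return the successive groups of n words in text, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()  # naive whitespace tokenization (an assumption)
    # A text with fewer than n words yields an empty list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_overlap(a, b, n=2):
    """Jaccard overlap of the word n-gram sets of two strings (0.0 to 1.0)."""
    set_a = set(word_ngrams(a, n))
    set_b = set(word_ngrams(b, n))
    if not set_a and not set_b:
        return 0.0  # nothing to compare
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    print(word_ngrams("the quick brown fox", 2))
    print(ngram_overlap("the quick brown fox", "the quick red fox"))
```

Changing `n` changes the granularity: with n=1 the two example sentences share most of their items, while with n=2 they share only one bigram, which is why scope and granularity matter so much.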

reading for gist
Cognitive Science | Computer Science | Information Science | Computational Linguistics | Natural Language Processing | Gist | Reading | Visualization
Got to get going on my IAT 814 and 802 term projects.
Started with the idea of reading for gist, a model principally from
O'Halloran, K. (2003). Critical discourse analysis and language cognition. Edinburgh: Edinburgh University Press.
Wanted to get a deeper view on the concept, so did a Google Scholar search using combined SFUBC proxies and VPNs.
Found about 32 articles with the following strategy:
"reading for gist" -esl -teacher
Downloaded (and meta-scraped) 11...but only about 7 from this search.

googling 'textual analytics'
Computer Science | Information Technology | Linguistics | Computational Linguistics | Entity Recognition | Information Extraction
Source: http://www.textualanalytics.com/solutions/web/information_analysis/infop...
InfoProfiler beta - Features
Information sources:
InfoProfiler can deal with all content sources where text is available in a machine-readable format e.g., World Wide Web, SEC Filings, Proprietary databases, and company Intranets.
Extracts Text and Information pieces:
InfoProfiler has inbuilt text extraction capabilities and can extract information from a website as well as from a given set of websites. It understands the structure of a webpage and decides whether it is a news, forum, review, bulletin board or a blog. InfoProfiler can also remove unwanted text portions (such as, advertisements) from a given information source.

tex.tuals process
Information Science | Information Technology | Linguistics | Philosophy | Computational Linguistics | Natural Language Processing | Ontology | Argument Recognition | Context | Definition Recognition
As I read through a text (I've remarked on the paper-to-digital conversion process elsewhere) I want to be able to highlight and capture whole swaths of text.
It is critically important to repurposing my captured snippets (bits, fits, blobs, fragments, portions) that I am able to recontextualize them easily.
How can this recontextualization happen?
First, the bibliographic information must be embedded in each bit.
Also, its relative location in the linear flow of the text must be recorded, so as to be able to quickly pull up various degrees of context around the bit.
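The two requirements above can be sketched as a small Python data structure. This is only an illustration under my own assumptions: the names `Snippet`, `capture`, and `context` are hypothetical, position is recorded as character offsets into the digitized text, and "degrees of context" is modeled as a character margin around the bit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    citation: str  # bibliographic information embedded in the bit
    start: int     # character offset where the bit begins in the source
    end: int       # character offset just past the end of the bit
    text: str      # the captured text itself


def capture(source_text, citation, start, end):
    """Capture source_text[start:end] together with its citation and offsets."""
    return Snippet(citation, start, end, source_text[start:end])


def context(source_text, snippet, margin):
    """Return the bit plus up to `margin` characters on either side."""
    lo = max(0, snippet.start - margin)
    hi = min(len(source_text), snippet.end + margin)
    return source_text[lo:hi]


if __name__ == "__main__":
    source = "One. Two is the key sentence. Three."
    begin = source.index("key")
    bit = capture(source, "Example citation (hypothetical)", begin, begin + 3)
    print(bit.text)
    print(context(source, bit, 4))
```

Because the offsets travel with the bit, widening the margin pulls up more of the surrounding text without having to search for the snippet again.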