Semantic Restructuring is the pursuit of enlightenment, enlivenment, empowerment through the creative re-arranging of the building blocks of meaning. For a better description, Start Here.
I am curious about the growing reliance on Google for statistical fact checks. More and more I'm seeing the best and the brightest using Google searches to substantiate their claims, as in this post at Language Log.
I recall, a good while back, when I was first learning about the vi/emacs holy war, there was the "EDITORs Sucks-Rules-O-Meter", which led me to the Operating System SROM, which in turn led me to the "Tool of Objective Truth" (formerly at zdnet uk; apparently the plug's been pulled on that one). I can't imagine anyone ever thought the Sucks-Rules-O-Meter was actually measuring anything; rather, it counts reports. To say there are possible sampling issues with those reports is to put the matter mildly. And even if one were to argue that the sample is sufficiently controlled as to warrant drawing conclusions, the conclusions drawn would still be about the opinions of the universe of respondents sampled and in no way reflective of any objective, criteria-based analysis of the tools in question (the kind of analysis that would show Windows can't suck as much as we Linux lovers love to say; if it did there wouldn't be the steady flowing river of formerly-winDoze boxes on which to run Linux). (Put differently, what Linux really rocks at is (a) making the most of hand-me-down machines and (b) implementing unix. The great majority of users don't give a good gosh darn about the OS; they just want their fancy typewriter/jukebox to work as advertised and not have to cope with any learning curve greater than that associated with setting the clock on a VHS recorder. For these users Linux is still a nightmare, Mac still a little obscure, and the gates virus rocks and rules supreme...however much that truth may turn our stomachs.)
Back to my concerns with using Google as a representative sample of language use. The language reflected in the Google db is language filtered from the norm in some non-trivial ways. It is the language of literacy; try as we might to forget it, the language of writing is only superficially related to language in general; literacy is not a criterion for language use. As if this weren't enough of a skew toward elitism, this language of literacy is further filtered by the criteria of publication; even a publication in a blog betrays relevant socio-cultural memberships and status. Plus, and this is the one that most concerns me, how the heck do we know that the database itself is inviolate? Sure, there is no conceivable reason anyone would monkey with the data in the Google db; the more accurately the Google db reflects what the Google spiders have found, the more valuable that db is. Still, the assumption that there is no benefit to someone skewing the numbers isn't the same as conclusively establishing the validity of those numbers...validity, that is, of those numbers for the select sub-set of language use that those numbers represent.
Oh well; this kind of Google searching is damned useful, a great start, better than nothing. I guess I'd just like to see the occasional disclaimer along the lines of, "The language represented by these Google searches is skewed in favor of that dialect or social register associated with published writing, an elite subset of language use in general." The Operating System Sucks-Rules-O-Meter page has links to "Tool of Objective Truth" and "COMPLETELY UNSCIENTIFIC comparison of programming languages." Sadly, both of those links have fallen to link rot since my last visit, sometime in the 20th century. Still, even as busted links, they represent something simple but important: one of them tells it like it is, right up front.
Addendum
I wrote Geoffrey K. Pullum, the author of the relevant Language Log post, trying to get trackback turned on for said post. Here is an excerpt of his reply:
I drew no statistical conclusions of the sort where sampling bias could arise; the numbers merely show that his claim is wildly, overwhelmingly wrong, a piece of unchecked hyperbole so extreme that it should never have been made...I don't see that the objections are worth raising: they are hypothetically relevant, to kinds of points that I didn't try to make. (Emphasis added: RL)
Entirely true and valid. I would have done well to add a disclaimer of my own: while I worry what casual readers of Language Log will make of such examples, I am certain that the folks posting there are too sharp to miss such subtleties. That my little rant was such a non sequitur is plain embarrassing.
writebacks: 1 (writeback = trackback +/- comment)
Duane wrote
On a related note
Did you catch the following? http://www.frozennorth.org/C2011481421/E652809545/ It is a bit of a response to this article, which cast doubt upon the entire web as a reference thing: http://www.syracuse.com/news/poststandard/index.ssf?/base/news-0/1093338972139211.xml