Information Retrieval Research – IDF/ TVT

Online marketing information can change quickly. This article is 16 years and 133 days old, and the facts and opinions contained in it may be out of date.

Inverse Document Frequency/ Term Vector Theory

Well, I haven’t had as much chance as I would like to do research on this latest update, but I did see Jake mention IDF in this post at SEW, which also has a ton of other good information in it (msg #50 from xan, among others). I’ve also heard it mentioned a handful of times, and figured it was high time to sit down and do some dedicated research on at least one of the speculative new technologies (LSI, Hilltop, and all the incredible information orion is bombarding us with these days). While everything seems to point back to “quality relevant links”, I think it’s good to broaden one’s horizons and understand what determines “quality” in a changing environment.

Inverse Document Frequency
– a measure used to help determine the position of a term in a vector space model. The fewer documents a term appears in, the higher its IDF.
Formula for IDF:
IDF = log(D/d) where D = collection size and d = number of documents containing a given term.

weight of a term, w=tf*IDF
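To put some numbers on the two formulas above, here is a minimal Python sketch. The function names (idf, term_weight) and the sample counts are made up for illustration, and base-10 logs are an assumption, since the formula as quoted doesn't say which base to use.

```python
import math

def idf(collection_size, docs_with_term):
    """IDF = log(D/d). Base 10 is an assumption; the formula above names no base."""
    if docs_with_term <= 0 or docs_with_term > collection_size:
        raise ValueError("d must satisfy 0 < d <= D")
    return math.log10(collection_size / docs_with_term)

def term_weight(tf, collection_size, docs_with_term):
    """w = tf * IDF, where tf is the term's count within one document."""
    return tf * idf(collection_size, docs_with_term)

# Hypothetical numbers: a term used 3 times on a page, in a 1,000-document collection.
rare = term_weight(3, 1000, 10)      # term appears in only 10 documents
common = term_weight(3, 1000, 1000)  # term appears in every document

print(rare, common)
```

The rare term ends up with a weight of about 6, while the term that appears in every document gets a weight of 0, because log(1) = 0. That is the whole point of IDF: a term found everywhere says nothing about what makes one page different from another.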

– see also Term Vector Theory

According to orion at the above-mentioned TVT thread, the formula for term vector theory is as follows:
w(i) = tf(i)*IDF = tf(i)*log[D/df(i)]


tf(i) = term frequency, number of times a term i occurs in a document
IDF = Inverse document frequency = log[D/df(i)]
D = database size or number of documents available
df(i) = number of documents containing term i
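Here is a rough sketch of how those definitions could turn a handful of pages into term vectors. The three-page toy collection, the whitespace tokenizing, and the base-10 log are all assumptions for illustration; nothing here is claimed to be what any search engine actually runs.

```python
import math
from collections import Counter

# Hypothetical toy collection; real documents would need proper tokenizing.
docs = {
    "doc1": "seo links quality links",
    "doc2": "quality content seo",
    "doc3": "woods logs woods",
}

def term_vectors(docs):
    """Return {doc: {term: w(i)}} with w(i) = tf(i) * log10(D / df(i))."""
    D = len(docs)  # database size
    tokenized = {name: text.split() for name, text in docs.items()}

    # df(i): number of documents containing term i (count each doc once)
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)  # tf(i): occurrences of term i in this document
        vectors[name] = {t: tf[t] * math.log10(D / df[t]) for t in tf}
    return vectors

vectors = term_vectors(docs)
for name, weights in vectors.items():
    print(name, {t: round(w, 3) for t, w in weights.items()})
```

In doc1, "links" (used twice, found in only one document) comes out weighted well above "seo" (used once, found in two of the three documents). Each document's set of weights is its vector, and those weights are what position it in the vector space model.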

I wish they’d do more pictures of this stuff for the slower people in the crowd:
Term Vector Theory Chart
More on Term Vector Theory at Webmasterworld, and an Art vs. Science discussion of the term weight formula from HighRankings.

Not sure if I digested all this, but at least now I have some good bookmarks for later. My take is that you may start seeing more pages (if you haven’t already ;) returned in the SERPs that don’t contain the actual keyphrase you searched for.

More information about Todd Malicoat aka stuntdubl.

  • Brandon

    And I thought logs only occurred naturally… in woods…

    *whew that joke is funny on so many levels

  • Soren

    We need someone to convert this theory into words without the math. Any volunteers?