Short URL(SRL) : http://srl.diigo.com/12ew

Shirky: Ontology is Overrated -- Categories, Links, and Tags

Bookmark History

Saved by 194 people (-65 private), first by anonymous user on 2006-03-02


Public Comment

on 2005-12-22 by jgentry

Good stuff.

on 2006-07-27 by davemorehouse

Clay Shirky's enlightening distillation of how tagging fundamentally differs from categorization and offers new organizational paradigms based on user-centric ontologies. Yeah, something like that.

on 2006-08-02 by jasonfleming73

The rise of user-developed classification.

on 2006-08-10 by worldnagata

Looks interesting. But it's in English. So it only "looks" interesting. I keep thinking I need to do something about that.

on 2006-10-25 by rjjjsp

Ontology is Overrated: Categories, Links, and Tags This piece is based on two talks I gave in the spring of 2005 -- one at the O'Reilly ETech conference in March, entitled "Ontology Is Overrated", and one at the IMCExpo in April entitled "Folksonomies & T

on 2006-11-03 by pistos

This piece is based on two talks I gave in the spring of 2005 -- one at the O'Reilly ETech conference in March, entitled "Ontology Is Overrated", and one at the IMCExpo in April entitled "Folksonomies & Tags: The rise of user-developed classification." Th

on 2006-11-14 by mikeheth

file systems - Google vs. Yahoo

on 2006-12-03 by sholman

seminal essay by Clay Shirky

on 2007-09-04 by forestfortrees

This piece is based on two talks Clay gave in the spring of 2005 -- one at the O'Reilly ETech conference in March, entitled "Ontology Is Overrated", and one at the IMCExpo in April entitled "Folksonomies & Tags: The rise of user-developed classification."

on 2007-10-28 by dedlily

excellent discussion!!

on 2009-07-29 by grahamperrin

I should read this alongside Organizing projects and contexts in Chandler.

on 2009-07-30 by grahamperrin

on 2009-11-06 by anmelu

Good point - there is no shelf in cyberspace

Public Sticky notes

We understand better than you how the world is organized, because we are trained professionals. So if you mistakenly think that Books and Literature are entertainment, we'll put a little flag up so we can set you right, but to see those links, you have to 'go' to where they 'are'

Highlighted by mstrohm

organic ways of organizing information

Highlighted by clakesnapster

Today I want to talk about categorization, and I want to convince you that a lot of what we think we know about categorization is wrong. In particular, I want to convince you that many of the ways we're attempting to apply categorization to the electronic world are actually a bad fit, because we've adopted habits of mind that are left over from earlier strategies. I also want to convince you that what we're seeing when we see the Web is actually a radical break with previous categorization strategies, rather than an extension of them. The second part of the talk is more speculative, because it is often the case that old systems get broken before people know what's going to take their place. (Anyone watching the music industry can see this at work today.) That's what I think is happening with categorization. What I think is coming instead are much more organic ways of organizing information than our current categorization schemes allow, based on two units -- the link, which can point to anything, and the tag, which is a way of attaching labels to links. The strategy of tagging -- free-form labeling, without regard to categorical constraints -- seems like a recipe for disaster, but as the Web has shown us, you can extract a surprising amount of value from big messy data sets.

Highlighted by kfortowsky

Google can decide what goes with what after hearing from the user, rather than trying to predict in advance what it is you need to know.

Highlighted by jgentry

letting individuals create value for one another, often without realizing it.

Highlighted by jgentry

Signal Loss from Expression

Highlighted by jgentry

That strategy of designing categories to cover possible cases in advance is what I'm primarily concerned with, because it is both widely used and badly overrated in terms of its value in the digital world.

Highlighted by jgentry

In a world where publishing is cheap, putting something out there says nothing about its quality. It's what happens after it gets published that matters.

Highlighted by jgentry

They also underestimate the loss from erasing difference of expression, and they overestimate loss from the lack of a thesaurus.

Highlighted by jgentry

The other big problem is that predicting the future turns out to be hard, and yet any classification system meant to be stable over time puts the categorizer in the position of fortune teller.

Highlighted by jgentry

Yahoo, faced with the possibility that they could organize things with no physical constraints, added the shelf back

Highlighted by jgentry

Yahoo is saying "We understand better than you how the world is organized,

Highlighted by jgentry

The essence of a book isn't the ideas it contains. The essence of a book is "book." Thinking that library catalogs exist to organize concepts confuses the container for the thing contained. The categorization scheme is a response to physical constraints on storage, and to people's inability to keep the location of more than a few hundred things in their mind at once.

Highlighted by jgentry

Tagging, by contrast, gets better with scale.

Highlighted by jgentry

Venn diagram

Highlighted by jgentry

The solution to this sort of signal loss is growth.

Highlighted by jgentry

You merge from the URLs, and then try and derive something about the categorization from there. This allows for partial, incomplete, or probabilistic merges that are better fits to uncertain environments -- such as the real world -- than rigid classification schemes.

Highlighted by jgentry

The Parable of the Ontologist, or, "There Is No Shelf"

A little over ten years ago, a couple of guys out of Stanford launched a service called Yahoo that offered a list of things available on the Web. It was the first really significant attempt to bring order to the Web. As the Web expanded, the Yahoo list grew into a hierarchy with categories. As the Web expanded more they realized that, to maintain the value in the directory, they were going to have to systematize, so they hired a professional ontologist, and they developed their now-familiar top-level categories, which go to subcategories, each subcategory contains links to still other subcategories, and so on. Now we have this ontologically managed list of what's out there.

Highlighted by mwesch

many of the ways we're attempting to apply categorization to the electronic world are actually a bad fit, because we've adopted habits of mind that are left over from earlier strategies.

Highlighted by topyli

What I think is coming instead are much more organic ways of organizing information than our current categorization schemes allow, based on two units -- the link, which can point to anything, and the tag, which is a way of attaching labels to links.

Highlighted by topyli

organic ways of organizing information than our current categorization schemes allow, based on two units -- the link, which can point to anything, and the tag, which is a way of attaching labels to links.

Highlighted by mstrohm

an explicit specification of a conceptualization

Highlighted by mstrohm

What kinds of things exist or can exist in the world, and what manner of relations can those things have to each other

Highlighted by mstrohm

I want to argue that even the ontological ideal is a mistake. Even using theoretical perfection as a measure of practical success leads to misapplication of resources

Highlighted by mstrohm

Browse versus search is a radical increase in the trust we put in link infrastructure

Highlighted by mstrohm

Browse says the people making the ontology, the people doing the categorization, have the responsibility to organize the world in advance.

Highlighted by mstrohm

It says nobody gets to tell you in advance what it is you need

Highlighted by mstrohm

When Does Ontological Classification Work Well?

Highlighted by mstrohm

  • Small corpus
  • Formal categories
  • Stable entities
  • Restricted entities
  • Clear edges

Highlighted by mstrohm

  • Expert catalogers
  • Authoritative source of judgment
  • Coordinated users
  • Expert users

Highlighted by mstrohm

    If you've got a large, ill-defined corpus, if you've got naive users, if your cataloguers aren't expert, if there's no one to say authoritatively what's going on, then ontology is going to be a bad strategy.

    Highlighted by mstrohm

    This is voodoo categorization, where acting on the model changes the world

    Highlighted by mstrohm

    it forces the categorizers to take on two jobs that have historically been quite hard: mind reading, and fortune telling. It forces categorizers to guess what their users are thinking, and to make predictions about the future.

    Highlighted by mstrohm

    assertion that restricting vocabularies improves signal assumes that there's no signal in the difference itself

    Highlighted by mstrohm

    You can't collapse these categorizations without some signal loss

    Highlighted by mstrohm

    They're not merging at the category level. They're merging at the globally unique item level

    Highlighted by mstrohm

    The Web is mainly notable for two things -- the way it ignored most of the theories of hypertext and rich metadata, and how much better it works than any of the proposed alternatives

    Highlighted by mstrohm

    market logic, where you deal with individual motivation, but group value

    Highlighted by mstrohm

    Each individual categorization scheme is worth less than a professional categorization scheme. But there are many, many more of them.

    Highlighted by mstrohm

    Market logic allows many distinct points of view to co-exist, because it allows individuals to preserve their point of view,

    Highlighted by mstrohm

    Signal Loss from Expression

    Highlighted by mstrohm

    Tagging, by contrast, gets better with scale

    Highlighted by mstrohm

    If there is no shelf, then even imagining that there is one right way to organize things is an error.

    Highlighted by mstrohm

    the idea that the categorization is done after things are tagged is incredibly foreign to cataloguers

    Highlighted by mstrohm

    You don't merge tagging schemes at the category level

    Highlighted by mstrohm

    Today I want to talk about categorization, and I want to convince you that a lot of what we think we know about categorization is wrong. In particular, I want to convince you that many of the ways we're attempting to apply categorization to the electronic world are actually a bad fit, because we've adopted habits of mind that are left over from earlier strategies. I also want to convince you that what we're seeing when we see the Web is actually a radical break with previous categorization strategies, rather than an extension of them. The second part of the talk is more speculative, because it is often the case that old systems get broken before people know what's going to take their place. (Anyone watching the music industry can see this at work today.) That's what I think is happening with categorization. What I think is coming instead are much more organic ways of organizing information than our current categorization schemes allow, based on two units -- the link, which can point to anything, and the tag, which is a way of attaching labels to links. The strategy of tagging -- free-form labeling, without regard to categorical constraints -- seems like a recipe for disaster, but as the Web has shown us, you can extract a surprising amount of value from big messy data sets.

    Highlighted by florizal

    This allows for partial, incomplete, or probabilistic merges

    Highlighted by mstrohm

    Merges create partial overlap between tags, rather than defining tags as synonyms

    Highlighted by mstrohm

    We move from a binary choice between saying two tags are the same or different to the Venn diagram option of "kind of is/somewhat is/sort of is/overlaps to this degree"

    Highlighted by mstrohm
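The highlights above describe merging at the URL level and getting a "Venn diagram" degree of overlap between tags instead of declaring them synonyms. A minimal sketch of that idea follows, assuming a hypothetical list of `(user, url, tags)` bookmarks and using Jaccard similarity as the overlap measure. Both the sample data and the choice of Jaccard are assumptions made for illustration, not something the essay specifies.

```python
from collections import defaultdict


def urls_by_tag(bookmarks):
    """Group URLs under each tag; the URL is the unique handle we merge on."""
    index = defaultdict(set)
    for _user, url, tags in bookmarks:
        for tag in tags:
            index[tag].add(url)
    return index


def overlap(index, tag_a, tag_b):
    """Jaccard overlap of two tags' URL sets: 0.0 is disjoint, 1.0 is identical."""
    a = index.get(tag_a, set())
    b = index.get(tag_b, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical bookmarks: (user, url, tags). Invented for this sketch.
BOOKMARKS = [
    ("alice", "http://example.com/1", {"movies", "film"}),
    ("bob", "http://example.com/1", {"cinema"}),
    ("bob", "http://example.com/2", {"film", "cinema"}),
    ("carol", "http://example.com/3", {"movies"}),
]

INDEX = urls_by_tag(BOOKMARKS)

for pair in [("movies", "film"), ("film", "cinema")]:
    print(pair, round(overlap(INDEX, *pair), 2))
```

In this toy data "film" and "cinema" land on exactly the same URLs, while "movies" and "film" only partly overlap. No tag is ever rewritten into another; the relationship is derived after the fact from what users attached to each URL.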

    This is a single user's tags

    Highlighted by mstrohm

    Once you expand your time scale to include the actual life of the categorization scheme itself, you recognize that the distinction between temporary and permanent is awfully vague. There isn't in fact a binary condition of a tag that can or cannot survive any kind of long-term examination.

    Highlighted by mstrohm

    Does the world make sense or do we make sense of the world?

    Highlighted by mstrohm

    value in aggregate

    Highlighted by mstrohm

    Critically, the semantics here are in the users, not in the system

    Highlighted by mstrohm

    del.icio.us has no idea what the tags mean.

    Highlighted by mstrohm

    The signal benefit of these systems is that they don't recreate the structured, hierarchical categorization so often forced onto us by our physical systems.

    Highlighted by mstrohm

    you merge individual contents, because we now have URLs as unique handles.

    Highlighted by mstrohm

    Clay Shirky's Writings About the Internet

    Highlighted by hanswobbe

    on 2009-05-18 by hanswobbe

    ...

    The list of factors making ontology a bad fit is, also, an almost perfect description of the Web -- largest corpus, most naive users, no global authority, and so on. The more you push in the direction of scale, spread, fluidity, flexibility, the harder it becomes to handle the expense of starting a cataloguing system and the hassle of maintaining it, to say nothing of the amount of force you have to get to exert over users to get them to drop their own world view in favor of yours.

    Highlighted by anmelu

    Ontology is Overrated: Categories, Links, and Tags

    Highlighted by wshis77

    Ontology is Overrated: Categories, Links, and Tags

    Highlighted by grahamperrin

    Ontology

    Highlighted by dmedelong

    spring of 2005

    Highlighted by grahamperrin

    This piece is based on two talks

    Highlighted by grahamperrin

    a heavily edited concatenation

    Highlighted by grahamperrin

    user-developed classification

    Highlighted by sallygla

    many of the ways we're attempting to apply categorization to the electronic world are actually a bad fit, because we've adopted habits of mind that are left over from earlier strategies.

    Highlighted by anmelu

    I also want to convince you that what we're seeing when we see the Web is actually a radical break with previous categorization strategies, rather than an extension of them. The second part of the

    Highlighted by anmelu

    Web is actually a radical break with previous categorization strategies, rather than an extension of them

    Highlighted by martinnguyen

    what we're seeing when we see the Web is actually a radical break with previous categorization strategies, rather than an extension of them.

    Highlighted by lindseybp

    much more organic ways of organizing information than our current categorization schemes allow, based on two units -- the link, which can point to anything, and the tag, which is a way of attaching labels to links. The strategy of tagging -- free-form labeling, without regard to categorical constraints -- seems like a recipe for disaster, but as the Web has shown us, you can extract a surprising amount of value from big messy data sets.

    Highlighted by anmelu

    tagging -- free-form labeling,

    Highlighted by sallygla

    The strategy of tagging -- free-form labeling, without regard to categorical constraints -- seems like a recipe for disaster, but as the Web has shown us, you can extract a surprising amount of value from big messy data sets.

    Highlighted by forestfortrees

    free-form labeling,

    Highlighted by dmedelong

    What is Ontology?

    Highlighted by dmedelong

    The main thread of ontology in the philosophical sense is the study of entities and their relations.

    Highlighted by forestfortrees

    The question ontology asks is: What kinds of things exist or can exist in the world, and what manner of relations can those things have to each other?

    Highlighted by neilp9

    The sense of ontology there is something like "an explicit specification of a conceptualization."

    Highlighted by forestfortrees

    The sense of ontology there is something like "an explicit specification of a conceptualization."

    Highlighted by martinnguyen

    And then there's ontological classification or categorization, which is organizing a set of entities into groups, based on their essences and possible relations

    Highlighted by neilp9

    Now, anyone who deals with categorization for a living will tell you they can never get a perfect system. In working classification systems, success is not "Did we get the ideal arrangement?" but rather "How close did we come, and on what measures?"

    Highlighted by clakesnapster

    periodic table of the elements is my vote for "Best. Classification. Evar."

    Highlighted by clakesnapster

    The periodic table of the elements is my vote for "Best. Classification. Evar."

    Highlighted by neilp9

    Lacking the right measurements, they assumed that gaseousness was an essential aspect -- literally, part of the essence -- of those elements.

    Highlighted by dedlily

    The periodic table of the elements is my vote for "Best. Classification. Evar."

    Highlighted by martinnguyen

    periodic table of the elements

    Highlighted by dmedelong

    If it's impossible to create a completely coherent categorization

    Highlighted by neilp9

    this is the Dewey Decimal System's categorization for religions of the world, which is the 200 category.

    Dewey, 200: Religion
    210 Natural theology
    220 Bible
    230 Christian theology
    240 Christian moral & devotional theology
    250 Christian orders & local church
    260 Christian social theology
    270 Christian church history
    280 Christian sects & denominations
    290 Other religions
    

    How much is this not the categorization you want in the 21st century?

    Highlighted by sallygla

    What's being optimized is number of books on the shelf. That's what the categorization scheme is categorizing.

    Highlighted by clakesnapster

    What's being optimized is number of books on the shelf. That's what the categorization scheme is categorizing.

    Highlighted by dedlily

    What's being optimized is number of books on the shelf. That's what the categorization scheme is categorizing. It's tempting to think that the classification schemes that libraries have optimized for in the past can be extended in an uncomplicated way into the digital world. This badly underestimates, in my view, the degree to which what libraries have historically been managing is an entirely different problem.

    Highlighted by sallygla

    What's being optimized is number of books on the shelf. That's what the categorization scheme is categorizing.

    Highlighted by martinnguyen

    Highlighted by dedlily

    on 2007-10-28 by dedlily

    the key point is non overlapping

    on 2007-11-26 by sallygla

    Very true. A strength and goal of LCSH

    organized into non-overlapping categories

    Highlighted by clakesnapster

    It is organized into non-overlapping categories that get more detailed at lower and lower levels

    Highlighted by dedlily

    any concept is supposed to fit in one category and in no other categories

    Highlighted by clakesnapster

    the system is really built, is designed to minimize seek time on shelves.

    Highlighted by martinnguyen

    What's being optimized is number of books on the shelf

    Highlighted by dmedelong

    The categorization scheme is a response to physical constraints on storage

    Highlighted by dedlily

    frailty of human memory

    Highlighted by sallygla

    a book can be about several things at once.

    Highlighted by martinnguyen

    The essence of a book is "book."

    Highlighted by dmedelong

    a book has to be declared to be about some main thing

    Highlighted by martinnguyen

    People have been freaking out about the virtuality of data for decades, and you'd think we'd have internalized the obvious truth: there is no shelf. In the digital world, there is no physical constraint that's forcing this kind of organization on us any longer. We can do without it, and you'd think we'd have learned that lesson by now.

    Highlighted by anmelu

    As the Web expanded more they realized that, to maintain the value in the directory, they were going to have to systematize, so they hired a professional ontologist, and they developed their now-familiar top-level categories, which go to subcategories, each subcategory contains links to still other subcategories, and so on. Now we have this ontologically managed list of what's out there.

    Highlighted by neilp9

    the obvious truth: there is no shelf

    Highlighted by rickdude

    Yahoo, faced with the possibility that they could organize things with no physical constraints, added the shelf back.

    Highlighted by martinnguyen

    a priori organization

    Highlighted by martinnguyen

    Both of those explanations may have been true at different times and in different measures, but the effect was to override the users' sense of where things ought to be, and to insist on the Yahoo view instead.

    Highlighted by neilp9

    It's easy to see how the Yahoo hierarchy maps to technological constraints as well as physical ones. The constraints in the Yahoo directory describe both a library categorization scheme and, obviously, a file system

    Highlighted by neilp9

    There is no file system. The links alone are enough.

    Highlighted by dedlily

    Berners-Lee

    Highlighted by dmedelong

    Highlighted by martinnguyen

    They missed the end of this progression, which is that, if you've got enough links, you don't need the hierarchy anymore. There is no shelf. There is no file system. The links alone are enough

    Highlighted by neilp9

    One reason Google was adopted so quickly when it came along is that Google understood there is no shelf, and that there is no file system. Google can decide what goes with what after hearing from the user, rather than trying to predict in advance what it is you need to know.

    Highlighted by brantles

    One reason Google was adopted so quickly when it came along is that Google understood there is no shelf, and that there is no file system.

    Highlighted by martinnguyen

    Browse versus search is a radical increase in the trust we put in link infrastructure

    Highlighted by dedlily

    Browse versus search is a radical increase in the trust we put in link infrastructure, and in the degree of power derived from that link structure. Browse says the people making the ontology, the people doing the categorization, have the responsibility to organize the world in advance. Given this requirement, the views of the catalogers necessarily override the user's needs and the user's view of the world. If you want something that hasn't been categorized in the way you think about it, you're out of luck.

    Highlighted by brantles

    The search paradigm says the reverse. It says nobody gets to tell you in advance what it is you need. Search says that, at the moment that you are looking for it, we will do our best to service it based on this link structure, because we believe we can build a world where we don't need the hierarchy to coexist with the link structure.

    Highlighted by brantles

    Browse versus search is a radical increase in the trust we put in link infrastructure, and in the degree of power derived from that link structure.

    Highlighted by martinnguyen

    there are a number of cases where you get significant value out of not categorizing

    Highlighted by grahamperrin

    the critical question: Is categorization a good idea?

    Highlighted by grahamperrin

    Let's say I need every Web page with the word "obstreperous" and "Minnesota" in it. You can't ask a cataloguer in advance to say "Well, that's going to be a useful category, we should encode that in advance." Instead, what the cataloguer is going to say is, "Obstreperous plus Minnesota! Forget it, we're not going to optimize for one-offs like that." Google, on the other hand, says, "Who cares? We're not going to tell the user what to do, because the link structure is more complex than we can read, except in response to a user query."

    Highlighted by neilp9
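The "obstreperous" plus "Minnesota" example above contrasts categories fixed in advance with grouping done at query time. A toy sketch of the query-time side follows, assuming a hypothetical handful of pages held in a dictionary and a plain inverted index. It makes no claim about how Google works, and it ignores the link structure the passage mentions; it only shows that no cataloguer had to anticipate the combination.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(url)
    return index


def search(index, *words):
    """Return the URLs containing every query word; computed only when asked."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()


# Hypothetical pages, invented for this sketch.
PAGES = {
    "http://example.com/a": "An obstreperous crowd gathered in Minnesota.",
    "http://example.com/b": "Minnesota has many lakes.",
    "http://example.com/c": "An obstreperous goose chased the mail carrier.",
}

INDEX = build_index(PAGES)
print(search(INDEX, "obstreperous", "Minnesota"))
```

The index stores nothing about which word combinations are "useful categories"; any intersection a user asks for is assembled on demand.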

    When Does Ontological Classification Work Well?

    Highlighted by clakesnapster

    Browse versus search is a radical increase in the trust we put in link infrastructure, and in the degree of power derived from that link structure. Browse says the people making the ontology, the people doing the categorization, have the responsibility to organize the world in advance. Given this requirement, the views of the catalogers necessarily override the user's needs and the user's view of the world. If you want something that hasn't been categorized in the way you think about it, you're out of luck.

    The search paradigm says the reverse. It says nobody gets to tell you in advance what it is you need. Search says that, at the moment that you are looking for it, we will do our best to service it based on this link structure, because we believe we can build a world where we don't need the hierarchy to coexist with the link structure.

    Highlighted by neilp9

    Highlighted by martinnguyen

    Google understood there is no shelf

    Highlighted by dmedelong

    there is no shelf, and that there is no file system. Google can decide what goes with what after hearing from the user, rather than trying to predict in advance what it is you need to know.

    Highlighted by carlaarena

    versus search is a radical increase in the trust we put in link infrastructure, and in the degree of power derived from that link structure. Browse says the people making the ontology, the people doing the categorization, have the responsibility to organize the world in advance. Given this requirement, the views of the catalogers necessarily override the user's needs and the user's view of the world. If you want something that hasn't been categorized in the way

    Highlighted by anmelu

    Browse versus search is a radical increase in the trust we put in link infrastructure, and in the degree of power derived from that link structure. Browse says the

    Highlighted by anmelu

    says the people making the ontology, the people doing the categorization, have the responsibility to organize the world in advance. Given this requirement, the views of the catalogers necessarily override the user's needs and the user's view of the world. If you want something that hasn't been categorized in the way you think about it, you're out of luck.

    Highlighted by anmelu

    Domain to be Organized

    • Small corpus
    • Formal categories
    • Stable entities
    • Restricted entities
    • Clear edges

    Highlighted by clakesnapster

    When Does Ontological Classification Work Well?

    Highlighted by martinnguyen

    have the responsibility to organize the world in advance.

    Highlighted by dmedelong

    When people were offered search and categorization side-by-side, fewer and fewer people were using categorization to find things.

    Highlighted by neilp9

    Domain to be Organized

    • Small corpus
    • Formal categories
    • Stable entities
    • Restricted entities
    • Clear edges

    Highlighted by martinnguyen

    at the moment that you are looking for it, we will do our best to service it based on this link structure

    Highlighted by dedlily

    Participants

    • Expert catalogers
    • Authoritative source of judgment
    • Coordinated users
    • Expert users

    Highlighted by clakesnapster

    Domain to be Organized

    • Small corpus
    • Formal categories
    • Stable entities
    • Restricted entities
    • Clear edges

    Highlighted by neilp9

    The more of those characteristics that are true, the better a fit ontology is likely to be.

    Highlighted by martinnguyen

    Participants

    • Expert catalogers
    • Authoritative source of judgment
    • Coordinated users
    • Expert users

    Highlighted by martinnguyen

    some characteristics where ontological classification doesn't work well":

    Highlighted by clakesnapster

    Domain

    • Large corpus
    • No formal categories
    • Unstable entities
    • Unrestricted entities
    • No clear edges

    Participants

    • Uncoordinated users
    • Amateur users
    • Naive catalogers
    • No Authority

    Highlighted by clakesnapster

    Users have a terrifically hard time guessing how something they want will have been categorized in advance, unless they have been educated about those categories in advance as well, and the bigger the user base, the more work that user education is.

    Highlighted by martinnguyen

    "Here are some characteristics where ontological classification doesn't work well":

    Domain

    • Large corpus
    • No formal categories
    • Unstable entities
    • Unrestricted entities
    • No clear edges

    Participants

    • Uncoordinated users
    • Amateur users
    • Naive catalogers
    • No Authority

    Highlighted by martinnguyen

    You can also turn that list around. You can say "Here are some characteristics where ontological classification doesn't work well":

    Domain

    • Large corpus
    • No formal categories
    • Unstable entities
    • Unrestricted entities
    • No clear edges

    Participants

    • Uncoordinated users
    • Amateur users
    • Naive catalogers
    • No Authority

    Highlighted by neilp9

    classification scheme that works

    Highlighted by dmedelong

    The list of factors making ontology a bad fit is, also, an almost perfect description of the Web -- largest corpus, most naive users, no global authority, and so on. The more you push in the direction of scale, spread, fluidity, flexibility, the harder it becomes to handle the expense of starting a cataloguing system and the hassle of maintaining it, to say nothing of the amount of force you have to get to exert over users to get them to drop their own world view in favor of yours.

    Highlighted by martinnguyen

    This is voodoo categorization, where acting on the model changes the world

    Highlighted by martinnguyen

    categorizers to take on two jobs that have historically been quite hard: mind reading, and fortune telling.

    Highlighted by clakesnapster

    people doing the categorizing believe, even if only unconsciously, that naming the world changes it.

    Highlighted by martinnguyen

    The list of factors making ontology a bad fit is, also, an almost perfect description of the Web -- largest corpus, most naive users, no global authority, and so

    Highlighted by anmelu

    One of the biggest problems with categorizing things in advance is that it forces the categorizers to take on two jobs that have historically been quite hard: mind reading, and fortune telling. It forces categorizers to guess what their users are thinking, and to make predictions about the future.

    Highlighted by martinnguyen

    The reason we don't know whether or not Buffy, The Vampire Slayer is science fiction, for example, is because there's no one who can say definitively yes or no. In environments where there's no authority and no force that can be applied to the user, it's very difficult to support the voodoo style of organization

    Highlighted by neilp9

    thesaurus

    Highlighted by martinnguyen

    One of the biggest problems with categorizing things in advance is that it forces the categorizers to take on two jobs that have historically been quite hard: mind reading, and fortune telling. It forces categorizers to guess what their users are thinking, and to make predictions about the future.

    Highlighted by neilp9

    Some people say they're interested in movies. Some people say they're interested in film. Some people say they're interested in cinema.

    Highlighted by martinnguyen

    You can't collapse these categorizations without some signal loss.

    Highlighted by clakesnapster

    Those terms actually encode different things, and the assertion that restricting vocabularies improves signal assumes that there's no signal in the difference itself, and no value in protecting the user from too many matches.

    Highlighted by martinnguyen

    naming the world changes it

    Highlighted by dmedelong

    The other big problem is that predicting the future turns out to be hard, and yet any classification system meant to be stable over time puts the categorizer in the position of fortune teller.

    Highlighted by sallygla

    You can't collapse these categorizations without some signal loss.

    Highlighted by martinnguyen

    underestimate the loss from erasing difference of expression, and they overestimate loss from the lack of a thesaurus.

    Highlighted by martinnguyen

    The movie people don't want to hang out with the cinema people." Those terms actually encode different things, and the assertion that restricting vocabularies improves signal assumes that there's no signal in the difference itself, and no value in protecting the user from too many matches.

    Highlighted by neilp9

    LiveJournal makes absolutely no attempt to enforce solidarity or a thesaurus or a minimal set of terms, no check-box, no drop-box, just free-text typing. Some people say they're interested in movies. Some people say they're interested in film. Some people say they're interested in cinema.

    Highlighted by dedlily

    Cities are real. They are real, physical facts. Countries are social fictions. It is much easier for a country to disappear than for a city to disappear, so when you're saying that the small thing is contained by the large thing, you're actually mixing radically different kinds of entities. We pretend that 'country' refers to a physical area the same way 'city' does, but it's not true, as we know from places like the former Yugoslavia.

    Highlighted by martinnguyen

    You can't collapse these categorizations without some signal loss

    Highlighted by dedlily

    imagine a world where everything can have a unique identifier.

    Highlighted by martinnguyen

    And once you can do that, anyone can label those pointers, can tag those URLs, in ways that make them more valuable, and all without requiring top-down organization schemes. And this -- an explosion in free-form labeling of links, followed by all sorts of ways of grabbing value from those labels -- is what I think is happening now.

    Highlighted by martinnguyen

    The presence of unique labels means that merging libraries doesn't require merging categorization schemes.

    Highlighted by dedlily

    Now imagine a world where everything can have a unique identifier

    Highlighted by dedlily

    tags enable a huge amount of user-produced organizational value, at vanishingly small cost.

    Highlighted by martinnguyen

    "Each individual categorization scheme is worth less than a professional categorization scheme. But there are many, many more of them."

    Highlighted by grahamperrin

    And if you can find any way to create value from combining myriad amateur classifications over time, they will come to be more valuable than professional categorization schemes, particularly with regards to robustness and cost of creation.

    Highlighted by martinnguyen

    individual differences don't have to be homogenized.

    Highlighted by martinnguyen

    The addition of a few simple labels hardly seems so momentous, but the surprise here, as so often with the Web, is the surprise of simplicity. Tags are important mainly for what they leave out. By forgoing formal classification, tags enable a huge amount of user-produced organizational value, at vanishingly small cost.

    Highlighted by neilp9

    Well-managed, well-groomed organizational schemes get worse with scale

    Highlighted by jhknight

    "Each individual categorization scheme is worth less than a professional categorization scheme. But there are many, many more of them." If you find a way to make it valuable to individuals to tag their stuff, you'll generate a lot more data about any given object than if you pay a professional to tag it once and only once. And if you can find any way to create value from combining myriad amateur classifications over time, they will come to be more valuable than professional categorization schemes, particularly with regards to robustness and cost of creation.

    Highlighted by neilp9

    The signal loss in traditional categorization schemes comes from compressing things into a restricted number of categories

    Highlighted by martinnguyen

    Tags are important mainly for what they leave out

    Highlighted by dedlily

    The solution to this sort of signal loss is growth. Well-managed, well-groomed organizational schemes get worse with scale, both because the costs of supporting such schemes at large volumes are prohibitive, and, as I noted earlier, scaling over time is also a serious problem. Tagging, by contrast, gets better with scale.

    Highlighted by martinnguyen

    In a world where publishing is expensive, the act of publishing is also a statement of quality -- the filter comes before the publication. In a world where publishing is cheap, putting something out there says nothing about its quality. It's what happens after it gets published that matters.

    Highlighted by martinnguyen

    The Web has an editor, it's everybody.

    Highlighted by martinnguyen

    "Each individual categorization scheme is worth less than a professional categorization scheme. But there are many, many more of them."

    Highlighted by dedlily

    robustness and cost of creation

    Highlighted by dedlily

    The signal loss in traditional categorization schemes comes from compressing things into a restricted number of categories. With tagging, when there is signal loss, it comes from people not having any commonality in talking about things. The loss is from the multiplicity of points of view, rather than from compression around a single point of view. But in a world where enough points of view are likely to provide some commonality, the aggregate signal loss falls with scale in tagging systems, while it grows with scale in systems with single points of view.

    Highlighted by neilp9

    "We are Yahoo We do not have biases. This is just how the world is. The world is organized into a dozen categories.

    Highlighted by dmedelong

    The solution to this sort of signal loss is growth. Well-managed, well-groomed organizational schemes get worse with scale, both because the costs of supporting such schemes at large volumes are prohibitive, and, as I noted earlier, scaling over time is also a serious problem. Tagging, by contrast, gets better with scale. With a multiplicity of points of view the question isn't "Is everyone tagging any given link 'correctly'", but rather "Is anyone tagging it the way I do?" As long as at least one other person tags something the way you would, you'll find it

    Highlighted by neilp9

    The Filtering is Done Post Hoc - There's an analogy here with every journalist who has ever looked at the Web and said "Well, it needs an editor." The Web has an editor, it's everybody. In a world where publishing is expensive, the act of publishing is also a statement of quality -- the filter comes before the publication. In a world where publishing is cheap, putting something out there says nothing about its quality. It's what happens after it gets published that matters. If people don't point to it, other people won't read it. But the idea that the filtering is after the publishing is incredibly foreign to journalists.

    Highlighted by lindseybp

    We move from a binary choice between saying two tags are the same or different to the Venn diagram option of "kind of is/somewhat is/sort of is/overlaps to this degree".

    Highlighted by martinnguyen

    The loss is from the multiplicity of points of view, rather than from compression around a single point of view. But in a world where enough points of view are likely to provide some commonality, the aggregate signal loss falls with scale in tagging systems, while it grows with scale in systems with single points of view.

    Highlighted by dedlily

    The solution to this sort of signal loss is growth

    Highlighted by dedlily

    question isn't "Is everyone tagging any given link 'correctly'", but rather "Is anyone tagging it the way I do?

    Highlighted by dedlily

    using a thesaurus to force everyone's tags into tighter synchrony would actually worsen the noise you'll get with your signal. If there is no shelf, then even imagining that there is one right way to organize things is an error.

    Highlighted by dedlily

    on 2007-10-28 by dedlily

    IMPORTANT. See rhizome.

    Merges create partial overlap between tags, rather than defining tags as synonyms. Instead of saying that any given tag "is" or "is not" the same as another tag, del.icio.us is able to recommend related tags by saying "A lot of people who tagged this 'Mac' also tagged it 'OSX'." We move from a binary choice between saying two tags are the same or different to the Venn diagram option of "kind of is/somewhat is/sort of is/overlaps to this degree".

    Highlighted by neilp9
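
    The "people who tagged this 'Mac' also tagged it 'OSX'" idea in the highlight above can be sketched as a simple co-occurrence count. This is only an illustration under stated assumptions, not how del.icio.us actually computes related tags (the essay does not say); the sample `bookmarks` data and the `related_tags` function are hypothetical.

```python
from collections import Counter

# Hypothetical sample data: each entry is the set of tags
# one user applied to one URL.
bookmarks = [
    {"mac", "osx", "apple"},
    {"mac", "osx"},
    {"mac", "hardware"},
    {"osx", "unix"},
]


def related_tags(bookmarks, tag):
    """Map each co-occurring tag to the fraction of bookmarks
    carrying `tag` that also carry that other tag."""
    co_counts = Counter()
    total = 0
    for tags in bookmarks:
        if tag in tags:
            total += 1
            # Count every other tag that appears alongside `tag`.
            co_counts.update(tags - {tag})
    if total == 0:
        return {}
    return {other: n / total for other, n in co_counts.items()}


overlap = related_tags(bookmarks, "mac")
# Sort by degree of overlap, strongest first.
for other, share in sorted(overlap.items(), key=lambda kv: -kv[1]):
    print(f"{other}: {share:.2f}")
```

    The result is a degree of overlap between 0 and 1 rather than a yes/no synonym declaration, which is the "overlaps to this degree" option the passage describes.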

    The existence of an odd or unusual tag is a problem if it's the only way a given link has been tagged, or if there is no way for a user to avoid that tag

    Highlighted by dedlily

    by letting users tag URLs and then aggregating those tags, we're going to be able to build alternate organizational systems, systems that, like the Web itself, do a better job of letting individuals create value for one another, often without realizing it.

    Highlighted by tomgeorge2348

    Organization Goes Organic

    Highlighted by martinnguyen

    We are moving away from binary categorization -- books either are or are not entertainment -- and into this probabilistic world,

    Highlighted by martinnguyen

    It's all dependent on human context. This is what we're starting to see with del.icio.us, with Flickr, with systems that are allowing for and aggregating tags. The signal benefit of these systems is that they don't recreate the structured, hierarchical categorization so often forced onto us by our physical systems. Instead, we're dealing with a significant break -- by letting users tag URLs and then aggregating those tags, we're going to be able to build alternate organizational systems, systems that, like the Web itself, do a better job of letting individuals create value for one another, often without realizing it.

    Highlighted by avanelk

    the semantics here are in the users, not in the system

    Highlighted by martinnguyen

    The tag overlap is in the system, but the tag semantics are in the users.

    Highlighted by michaeltwofish

    We are moving away from binary categorization -- books either are or are not entertainment -- and into this probabilistic world, where N% of users think books are entertainment. It may well be that within Yahoo, there was a big debate about whether or not books are entertainment. But they either had no way of reflecting that debate or they decided not to expose it to the users. What instead happened was it became an all-or-nothing categorization, "This is entertainment, this is not entertainment." We're moving away from that sort of absolute declaration, and towards being able to roll up this kind of value by observing how people handle it in practice.

    It comes down ultimately to a question of philosophy. Does the world make sense or do we make sense of the world? If you believe the world makes sense, then anyone who tries to make sense of the world differently than you is presenting you with a situation that needs to be reconciled formally, because if you get it wrong, you're getting it wrong about the real world.

    Highlighted by neilp9
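
    The "N% of users think books are entertainment" idea in the highlight above can be sketched as a share computed over observed taggings. This is a minimal sketch with made-up data; the `taggings` list, the user names, and the `tag_share` function are assumptions for illustration and do not come from the essay.

```python
from collections import defaultdict

# Hypothetical observations: (user, item, tag) triples.
taggings = [
    ("alice", "books", "entertainment"),
    ("alice", "books", "reading"),
    ("bob", "books", "entertainment"),
    ("carol", "books", "education"),
    ("dave", "books", "entertainment"),
    ("dave", "films", "entertainment"),
]


def tag_share(taggings, item, tag):
    """Fraction of the users who tagged `item` at all
    that applied `tag` to it."""
    tags_by_user = defaultdict(set)
    for user, tagged_item, user_tag in taggings:
        if tagged_item == item:
            tags_by_user[user].add(user_tag)
    if not tags_by_user:
        return 0.0
    agreeing = sum(1 for tags in tags_by_user.values() if tag in tags)
    return agreeing / len(tags_by_user)


share = tag_share(taggings, "books", "entertainment")
print(f"{share:.0%} of taggers call books entertainment")
```

    Nothing here decides whether books "are" entertainment; the number only reports how people handled the item in practice.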

    It's all dependent on human context. This is what we're starting to see with del.icio.us, with Flickr, with systems that are allowing for and aggregating tags. The signal benefit of these systems is that they don't recreate the structured, hierarchical categorization so often forced onto us by our physical systems. Instead, we're dealing with a significant break -- by letting users tag URLs and then aggregating those tags, we're going to be able to build alternate organizational systems, systems that, like the Web itself, do a better job of letting individuals create value for one another, often without realizing it.

    Highlighted by lindseybp

    Organization Goes Organic

    Highlighted by dedlily

    It comes down ultimately to a question of philosophy. Does the world make sense or do we make sense of the world? If you believe the world makes sense, then anyone who tries to make sense of the world differently than you is presenting you with a situation that needs to be reconciled formally, because if you get it wrong, you're getting it wrong about the real world.

    Highlighted by dmedelong

    without a goal

    Highlighted by dedlily

    the semantics here are in the users, not in the system.

    Highlighted by carlaarena

    the semantics here are in the users, not in the system

    Highlighted by dedlily

    The tag overlap is in the system, but the tag semantics are in the users. This is not a way to inject linguistic meaning into the machine.

    Highlighted by dedlily

    It's all dependent on human context

    Highlighted by carlaarena

    The signal benefit of these systems is that they don't recreate the structured, hierarchical categorization so often forced onto us by our physical systems. Instead, we're dealing with a significant break -- by letting users tag URLs and then aggregating those tags, we're going to be able to build alternate organizational systems, systems that, like the Web itself, do a better job of letting individuals create value for one another, often without realizing it.

    Highlighted by carlaarena