Tags

What are Tags?

LibraryThing members tag their books with free-text keywords. A tag is a unique string, and can be more than one word long. Every distinct string has its own id, so "history," "History" and "HISTORY" have three separate ids.

Tag Combination/Aliasing

LibraryThing's software and members "combine" tags together, creating bundles of unique strings that are fundamentally the same thing. LibraryThing then picks the "best" of the tags—generally the most popular—and links (or "aliases") all the other versions to that version.

For example, "new york" and "New York" have been combined together, with "New York" picked as the winning form.

Tag aliasing extends beyond case. LibraryThing members also combine tags with different spellings but the same meaning, such as "wwii," "world war 2," "ww2" and so forth. In this case, all these variants are "aliased to" one tag id.

We recommend that feed users display only the final, aliased form, but consider using both the final and intermediate forms in searching.
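
The recommendation above (display the final form, search on every form) can be sketched in Python as follows. This is only an illustration under stated assumptions: the tag ids, the "NEW YORK" variant, and the helper names `build_search_index` and `display_texts` are invented for the example and do not come from the feeds.

```python
# Hypothetical (tag_id, text, aliased_to_id) rows; the ids are invented.
TAGS = [
    (1, "New York", None),   # final form: not aliased to anything
    (2, "new york", 1),      # intermediate form, aliased to tag 1
    (3, "NEW YORK", 1),      # intermediate form, aliased to tag 1
]

def build_search_index(tags):
    """Map every tag text, final or intermediate, to its final tag id."""
    return {text: (aliased_to if aliased_to is not None else tag_id)
            for tag_id, text, aliased_to in tags}

def display_texts(tags):
    """Map each final tag id to the text that should be displayed."""
    return {tag_id: text
            for tag_id, text, aliased_to in tags
            if aliased_to is None}

index = build_search_index(TAGS)
labels = display_texts(TAGS)

# A search for any variant finds the same tag, shown in its final form.
print(labels[index["new york"]])  # New York
```

A query for any variant then lands on the same final tag id, while only the final text is ever shown to the user.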

Tag Approval

LibraryThing staff "approve" and reject tags. Tags are approved or rejected based upon their usefulness overall, and especially for searching.

Approval generally requires that the tag has been used a number of times (the minimum varies) and has an essentially clear meaning.

Criteria for rejection include:

  • Personal tags ("read in 2015", "Box 32")
  • Purely evaluative tags ("awesome")
  • Tags relating closely to a specific edition ("leather-bound," "audiobook", classification numbers)
  • Obscene tags ("shit")
  • Ambiguous tags ("Smith" is sometimes a character in a book, sometimes an author, sometimes the person the member borrowed the book from, etc.)

We recommend most feed users display only "approved" tags. Approval was designed for our LibraryThing for Libraries product, and indicates the tag is probably useful and safe for display. Unapproved tags can be useful in improving search, but should not be used by themselves.

Audience Level

Audience level was developed so that libraries could avoid showing certain hot-button, mostly sexual, tags (even if they held books that fit the tags). Audience level is a number, set by hand by LibraryThing staff. The numbers correspond to MPAA movie ratings.

  • 10 = PG-13. e.g., erotic, teen sex, lingerie, food porn
  • 20 = R. e.g., kink, oral sex, fetishes
  • 30 = NC-17. e.g., porn, and any number of dirty words and phrases

Numbers apart from these three do not currently exist, but you should program any system to expect that they might in the future (i.e., ignore tags with an audience level above 20, not just with an audience level of 30).
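
The threshold advice above can be sketched in Python as follows. This is a minimal illustration, not part of the feeds: the function name `is_displayable`, the `max_level` parameter, and the convention that a tag without an audience level is represented by `None` are all assumptions made for the example.

```python
from typing import Optional

def is_displayable(audience_level: Optional[int], max_level: int = 20) -> bool:
    """Return True if a tag may be shown given a maximum audience level.

    Hypothetical helper: a tag with no audience level (None) is treated
    as unrestricted. The check is a threshold, not an equality test, so
    levels introduced in the future (say 25 or 40) are still handled.
    """
    if audience_level is None:
        return True
    return audience_level <= max_level

levels = [None, 10, 20, 25, 30]
print([lvl for lvl in levels if is_displayable(lvl)])  # [None, 10, 20]
```

Comparing against a maximum, rather than testing for the value 30, is what keeps the filter correct if intermediate or higher levels are added later.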

Work to Tags

Filename: worktotags.xml

Sample Feed: worktotags_small.xml

Contents and Purpose

This is a catalog of the tags applied to all the works in the LibraryThing system.

It is normally used together with the "Tag Information" feed, below.

A simpler version of the work-to-tags information, restricted to approved tags and in the final, canonical form, can be found under "Simple Works to Tags," below.

Example

<worktotags>
    <work workcode="2">
        <tag id="302312" count="14">college football</tag>
        <tag id="931452" count="1">nebraska football</tag>
        <tag id="624418" count="1">civil law</tag>
        <tag id="3373" count="1">europe</tag>
        <tag id="1042723" count="1" aliasedtoid="18587">Reader</tag>
    </work>
    <work workcode="7">
        <tag id="84729" count="1" aliasedtoid="339">Classroom Library</tag>
        <tag id="715036" count="1" aliasedtoid="4877">Reference</tag>
    </work>
</worktotags>

Elements and Attributes

The basic structure should be clear from the example above.

<worktotags> is the top-level element.
  • <work> is repeated. Works that do not have any tags will not appear.
    • workcode="N" attr (required) As above, "workcode" means work id. The <work> elements are ordered low-to-high by workcode.
    • <tag> contains the text of the tag itself.
      • Note: The text given in the <tag> is the non-aliased form of the tag. You probably want the aliased form. For this you need to ingest and process the "Tag Information" feed, below.
      • id="N" attr (required) This is the id of the tag (see "Concepts" above).
      • count="N" attr (required) This is the (cached) number of times the tag was applied to the work in question. The tags are ordered high-to-low by count.
      • aliasedtoid="N" attr (optional) This is the id of the tag to which the tag has been aliased (see "Concepts," above). It is absent when the tag has not been aliased.
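
One way to read this feed is sketched below in Python, using the standard library's streaming parser so the whole file need not be held in memory. The function name `iter_work_tags` and the shortened inline sample are assumptions for illustration. The sample feed writes the alias attribute as `aliasedtoid`; as a precaution the sketch also checks `aliasedto`.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the example above, used in place of worktotags.xml.
SAMPLE = """<worktotags>
    <work workcode="2">
        <tag id="302312" count="14">college football</tag>
        <tag id="1042723" count="1" aliasedtoid="18587">Reader</tag>
    </work>
    <work workcode="7">
        <tag id="84729" count="1" aliasedtoid="339">Classroom Library</tag>
    </work>
</worktotags>"""

def iter_work_tags(source):
    """Yield (workcode, tag_id, count, aliased_to_id, text) for every <tag>."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "work":
            continue
        workcode = int(elem.get("workcode"))
        for tag in elem.findall("tag"):
            # The alias attribute is optional; None means "not aliased".
            alias = tag.get("aliasedtoid") or tag.get("aliasedto")
            yield (workcode, int(tag.get("id")), int(tag.get("count")),
                   int(alias) if alias else None, tag.text)
        elem.clear()  # release the finished <work> subtree

rows = list(iter_work_tags(io.BytesIO(SAMPLE.encode("utf-8"))))
for row in rows:
    print(row)
```

For the real feed, a path such as `"worktotags.xml"` can be passed in place of the in-memory sample, since `iterparse` accepts either a filename or a file object.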

Tag Information

Filename: taginfo.xml

Sample Feed: taginfo_small.xml

Contents and Purpose

This is a catalog of all tag ids. It lists each id, together with the id and text of the tag to which it has been aliased, if any.

It is normally used together with the "Work to Tags" feed, above.

It also lists the total count for that tag and whether or not the tag has been "approved" by LibraryThing staff.

Example

<taginfo>
    <tag id="6">
        <text>children's</text>
        <totalcount>221742</totalcount>
        <approved>true</approved>
    </tag>
    ...
    <tag id="11">
        <text>childrens</text>
        <aliasedto id="6">children's</aliasedto>
        <totalcount>221742</totalcount>
    </tag>
    ...
    <tag id="34">
        <text>favorites</text>
        <totalcount>19762</totalcount>
        <approved>false</approved>
    </tag>
</taginfo>

Elements and Attributes

<taginfo> is the top-level element.
  • <tag> is repeated. It has the required attribute:
    • id="N" attr is the numerical id of the tag.
    • <text> (required) The text of the tag itself.
    • <aliasedto> (optional) The text of the tag to which the tag has been aliased.
      • id="N" attr (required) is the id of the tag to which the tag has been aliased.
    • <totalcount> (optional) The number of times the tag has been used.
      • This number includes all tags that have been combined with the tag, not just the count for the particular tag.
    • <approved> (optional) Values are "true" and "false."
      • Approval includes all tags that have been combined with the tag, not just the approval of the particular tag.
    • <audiencelevel> (optional) See "Audience Level," above.
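
A possible way to load this feed and pick the text to display is sketched below in Python. The names `load_taginfo` and `display_text` are hypothetical. The sketch also assumes that aliases are a single hop deep and that approval should be read from the tag aliased to, since the aliased tag in the example carries no <approved> element; neither assumption is confirmed by this document.

```python
import io
import xml.etree.ElementTree as ET

# The example above without the "..." gaps, used in place of taginfo.xml.
SAMPLE = """<taginfo>
    <tag id="6">
        <text>children's</text>
        <totalcount>221742</totalcount>
        <approved>true</approved>
    </tag>
    <tag id="11">
        <text>childrens</text>
        <aliasedto id="6">children's</aliasedto>
        <totalcount>221742</totalcount>
    </tag>
    <tag id="34">
        <text>favorites</text>
        <totalcount>19762</totalcount>
        <approved>false</approved>
    </tag>
</taginfo>"""

def load_taginfo(source):
    """Return {tag_id: record} built from a taginfo feed."""
    info = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tag":
            continue
        alias = elem.find("aliasedto")
        level = elem.findtext("audiencelevel")
        info[int(elem.get("id"))] = {
            "text": elem.findtext("text"),
            "aliased_to": int(alias.get("id")) if alias is not None else None,
            "approved": elem.findtext("approved") == "true",
            "audience_level": int(level) if level else None,
        }
        elem.clear()
    return info

def display_text(tag_id, info):
    """Return the final text of an approved tag, or None to hide it."""
    record = info.get(tag_id)
    if record is None:
        return None
    # Assumption: one-hop aliases; approval is taken from the alias target.
    canonical = info.get(record["aliased_to"], record)
    return canonical["text"] if canonical["approved"] else None

info = load_taginfo(io.BytesIO(SAMPLE.encode("utf-8")))
print(display_text(11, info))  # children's
print(display_text(34, info))  # None
```

The ids found in the "Work to Tags" feed can be passed to `display_text` to obtain the aliased, approved form, with unapproved tags filtered out.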

Simple Works to Tags

Filename: worktotags_simple.xml

Sample Feed: worktotags_simple_small.xml

Contents and Purpose

This is a simplified catalog of all works and the tags applied to them, and should be suitable for most members.

Only approved tags are shown. Tags are shown in their final, aliased form (i.e., "fiction" and "Fiction" are combined, with "fiction" chosen as the text).

This file can be used without the "Tag Information" file.

Example

<worktotags>
    <work>
        <workcode>17</workcode>
        <taglist>
            <tag>
                <id>2</id>
                <text>fiction</text>
                <count>193</count>
            </tag>
            <tag>
                <id>142</id>
                <text>humor</text>
                <count>87</count>
            </tag>
            <tag>
                <id>630994</id>
                <text>realistic fiction</text>
                <count>74</count>
            </tag>
            <tag>
                <id>44773</id>
                <text>freckles</text>
                <count>70</count>
            </tag>
            ...
        </taglist>
    </work>
</worktotags>

Elements and Attributes

<worktotags> is the top-level element.
  • <work> (repeated) is the work and its tags.
    • <workcode> (required) is the work's id, called workcode.
    • <taglist> (required) contains the <tag> elements.
      • <tag> (repeated) is one tag.
        • <id> (required) is the tag id.
        • <text> (required) is the text of the tag.
        • <count> (required) is the number of times the tag was applied to the work.
        • <audiencelevel> (optional) See above under "Tag Information."
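
Because this feed needs no second file, reading it is simpler. The Python sketch below is one way to do it; the function name `load_simple_tags`, the tuple layout, and the shortened inline sample are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the example above, used in place of worktotags_simple.xml.
SAMPLE = """<worktotags>
    <work>
        <workcode>17</workcode>
        <taglist>
            <tag><id>2</id><text>fiction</text><count>193</count></tag>
            <tag><id>142</id><text>humor</text><count>87</count></tag>
        </taglist>
    </work>
</worktotags>"""

def load_simple_tags(source):
    """Return {workcode: [(tag_id, text, count, audience_level), ...]}."""
    works = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "work":
            continue
        tags = []
        for tag in elem.iter("tag"):
            # <audiencelevel> is optional; None means it was not given.
            level = tag.findtext("audiencelevel")
            tags.append((int(tag.findtext("id")), tag.findtext("text"),
                         int(tag.findtext("count")),
                         int(level) if level else None))
        works[int(elem.findtext("workcode"))] = tags
        elem.clear()  # release the finished <work> subtree
    return works

works = load_simple_tags(io.BytesIO(SAMPLE.encode("utf-8")))
print(works[17])
```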

Tag Translations

Filename: tagtranslations.xml

Sample Feed: tagtranslations_small.xml

Contents and Purpose

This is a feed of LibraryThing "translations" for tags, based on Wikipedia data and LibraryThing members' curation.

These translations are not guaranteed to be correct or complete. The most common problem is that the English text shows up as the "translation." Because members take an interest in curating them, translations that are actually wrong are uncommon.

Example

<tagtranslations>
    <tag id="10">
        <id>10</id>
        <text>wizards</text>
        <translationlist>
            <translation>
                <id>949063</id>
                <text>tovenaars</text>
                <lang>dut</lang>
            </translation>
            <translation>
                <id>9568134</id>
                <text>sorĉistoj</text>
                <lang>epo</lang>
            </translation>
            <translation>
                <id>9719312</id>
                <text>Velho</text>
                <lang>fin</lang>
            </translation>
            <translation>
                <id>1873655</id>
                <text>Sorciers</text>
                <lang>fre</lang>
            </translation>
            <translation>
                <id>433</id>
                <text>Zauberer</text>
                <lang>ger</lang>
            </translation>
            ...
        </translationlist>
    </tag>
</tagtranslations>

Elements and Attributes

<tagtranslations> is the top-level element.
  • <tag> (repeated) is the tag which is to be translated. It has the required attribute:
    • id="N" attr is the numerical id of the tag.
    • <id> (required) is the tag's id (again).
    • <text> (required) is the text of the tag.
    • <translationlist> (required) contains the <translation> elements.
      • <translation> (repeated) is the translation for one language.
        • <id> (required) is the tag id of the translation. (All translations have their own tag id.)
        • <text> (required) is the text of the translation.
        • <lang> (required) is the language of the translation (see "Language abbreviations" in the Introduction).
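
A lookup table for this feed can be built along the following lines. This Python sketch is illustrative only: the names `load_translations` and `translate`, the fallback behaviour, and the shortened inline sample are assumptions, not part of the feed.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the example above, used in place of tagtranslations.xml.
SAMPLE = """<tagtranslations>
    <tag id="10">
        <id>10</id>
        <text>wizards</text>
        <translationlist>
            <translation><id>949063</id><text>tovenaars</text><lang>dut</lang></translation>
            <translation><id>433</id><text>Zauberer</text><lang>ger</lang></translation>
        </translationlist>
    </tag>
</tagtranslations>"""

def load_translations(source):
    """Return {tag_id: {lang: (translation_tag_id, text)}}."""
    table = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tag":
            continue
        by_lang = {}
        for tr in elem.iter("translation"):
            by_lang[tr.findtext("lang")] = (int(tr.findtext("id")),
                                            tr.findtext("text"))
        table[int(elem.get("id"))] = by_lang
        elem.clear()  # release the finished <tag> subtree
    return table

def translate(tag_id, lang, table, fallback):
    """Return the translated text, or the fallback if none is listed."""
    return table.get(tag_id, {}).get(lang, (None, fallback))[1]

table = load_translations(io.BytesIO(SAMPLE.encode("utf-8")))
print(translate(10, "ger", table, "wizards"))  # Zauberer
print(translate(10, "fre", table, "wizards"))  # wizards
```

Falling back to the original text keeps the display usable for languages that have no translation in the feed.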