Scholarly Link Specification Framework (S-Link-S)
Purpose
The Scholarly Link Specification (S-Link-S) Framework is designed to facilitate
reference linking to diverse targets. Previously, a database or journal publisher, or a library, wanting to make links
to an electronic resource first had to work out a linking agreement, then work out a method for interchanging
linking data, and finally have programmers implement the links. S-Link-S streamlines this process by providing
a well-defined syntax and vocabulary for the exchange of the necessary information. A single software module can then implement
reference linking for a large number of target sites. S-Link-S is widely used to link to publisher and aggregator sites
from libraries and link servers.
OCLC Openly Informatics has developed the S-Link-S specification and uses it as the basis for its
1Cate Linking Engine software and related services for the scholarly information industry.
This specification has been public from the very beginning: it was first released in 1998 as a draft, to seek comment and criticism from
potential users of the software.
Openly's 1Cate software, which implements S-Link-S, is available for licensing.
The Journalseek database includes a library of over a thousand S-Link-S linking templates and is also available for licensing.
This version
This public draft was published on July 11, 2006: http://nj.oclc.org/SLinkS/SLinkS-20060711.html
Changes are detailed below.
A previous version was published on September 28, 2005: http://nj.oclc.org/SLinkS/SLinkS-20050928.html
The most recent version of this document can be found at http://nj.oclc.org/SLinkS/SLinkS.html
The S-Link-S template language, as described in its XML document type definition, has reached version 1.15, and can be considered stable enough to serve as a basis for software implementations.
Please note the statement of Copyright and Permitted Use of this document.
Design
The S-Link-S framework has two components:
- a URL templating language
- a metadata vocabulary
The URL templating language tells how to construct a URL based on bibliographic data, while
the metadata give information about what the URL leads to and how it may be used.
The templating vocabulary is expressed using XML syntax
conforming to an XML DTD, or Document Type Definition. The metadata is
expressed using RDF (Resource Description Framework) syntax
using a vocabulary defined in
an RDF Schema. At present, the RDF metadata component of S-Link-S is not being actively maintained.
Software based on the S-Link-S specification can take bibliographic data and return a URL, or a POST form based on S-Link-S data.
Before discussing the workings of S-Link-S in detail, it's useful to sketch out the overall architectures of some example systems that could make use of S-Link-S.
The first example is pictured below. Here, it is imagined that S-Link-S specifications are collected and maintained in a centralized clearinghouse (such as Journalseek).
To link references, Publisher A sends reference data to a S-Link-S processing engine. The processing engine uses a database of S-Link-S specifications to construct links
pertaining to the references, and then returns them to the publisher. The publisher then provides these links to its users.
This model assumes that the links are relatively static, perhaps needing monthly or yearly updating. Since the links are static, the processing engine has time to look up
individual articles from databases to fully resolve indirect links. Publishers wanting to specify link methods for their
publications only have to deal with the clearinghouse, not all the other publishers.

In the second example, Publisher A has built or licensed a database of S-Link-S specifications. Included in these
might be specifications obtained privately from another publisher. This publisher serves his content out of a database, building web pages on the fly, and
uses a S-Link-S Engine to dynamically construct links. The dynamic linking allows this publisher to deliver links that expire in a few hours, which his agreement with Publisher C requires.

In a similar configuration, a 3rd party, such as a library, may operate a dynamic linking service. This vision for a Scholarly linking environment has largely come to pass with the advent of the OpenURL Link-Servers deployed by libraries. The 1Cate Link-Server is an example of such a use.
In the third example, Publisher B has licensed the full text of a journal to a library. Publisher A includes a generic link to an article in Publisher B's journal.
The library's users access the web through a proxy server equipped with a S-Link-S based filter. When User A follows the citation link in publisher A's journal,
the S-Link-S filter recognizes it as one that the library has subscribed to. An enhanced HTTP request is then forwarded to publisher B. User A
gets full access to the journal without even knowing that her library has intervened on her behalf. Here a generic URL might be http://www.publisher.com/vol1/page35 and "enhanced" URL's might be http://harvard:password@www.publisher.com/vol1/page35 (password added) or http://www.library.edu/www.publisher.com/vol1/page35 (local holding).
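The two kinds of "enhanced" URL mentioned above can be sketched in Python as follows. This is only an illustration of the rewriting step, assuming the filter already knows which hosts the library subscribes to; the function names and the rewrites table are hypothetical and are not part of S-Link-S.

```python
from urllib.parse import urlsplit, urlunsplit

def add_credentials(url, user, password):
    """Return the URL with user:password@ placed in front of the host."""
    p = urlsplit(url)
    return urlunsplit((p.scheme, "%s:%s@%s" % (user, password, p.netloc),
                       p.path, p.query, p.fragment))

def via_local_holding(url, library_host):
    """Return a URL on the library's host whose path starts with the original host."""
    p = urlsplit(url)
    return urlunsplit((p.scheme, library_host, "/" + p.netloc + p.path,
                       p.query, p.fragment))

def enhance(url, rewrites):
    """Apply the rewrite registered for the URL's host, or return the URL unchanged."""
    rewrite = rewrites.get(urlsplit(url).netloc)
    return rewrite(url) if rewrite else url

generic = "http://www.publisher.com/vol1/page35"
# Hypothetical subscription table: host -> rewriting function.
rewrites = {"www.publisher.com": lambda u: via_local_holding(u, "www.library.edu")}
print(enhance(generic, rewrites))
print(add_credentials(generic, "harvard", "password"))
```

A URL whose host is not in the table passes through unchanged, which matches the idea that the filter only intervenes for subscribed titles.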

Link Templating
The basic idea for URL template strings is that most articles can be accessed using
a template in which field place-holders are replaced by bibliographic data strings.
Some simple manipulations of the resulting string may be required for
generation of the final URL.
We represent the place-holders for bibliographic data items using XML general entities, which start with
"&" and end with ";". For example, if the volume number is used in the URL, it is denoted by &volume;. Entity names are case
sensitive.
Manipulations of strings are denoted using element mark-up of the template string. For example, to pad the volume number with 0's to make 3 characters, you would write
<pad padChar="0" length="3">&volume;</pad>
Place-holders for functions of the bibliographic data are denoted using empty elements. These start with "<" and end with "/>".
For example, the place-holder for an ISO format string (YYYY-MM-DD) formed from the publication date given by the bibliographic data is
<parsedDate/>
In this example of a S-Link-S template, we model a journal at http://www.publisher.com/ in which articles have URLs based on the volume and start page:
| Volume | Start Page | URL |
| 3 | 25 | http://www.publisher.com/003/25/ |
| 10 | 485 | http://www.publisher.com/010/485/ |
<?xml version="1.0"?>
<!DOCTYPE slinks SYSTEM "slinks.dtd">
<slinks ID="example">
<URL>http://www.publisher.com/<pad padChar="0" length="3">&volume;</pad>/&startPage;/</URL>
</slinks>
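As a rough illustration of how a processor might evaluate this template, the Python sketch below declares the bibliographic values as entities in the internal subset of the document type declaration, lets the XML parser expand them, and then applies pad. The names build_url, evaluate and pad are hypothetical, the external slinks.dtd is not read, and the choice that a right-aligned pad keeps the rightmost characters when it chops is an assumption, not something the specification states.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TEMPLATE = (
    '<slinks ID="example">'
    '<URL>http://www.publisher.com/'
    '<pad padChar="0" length="3">&volume;</pad>/&startPage;/</URL>'
    '</slinks>'
)

def pad(text, length, pad_char="0", align="right"):
    """Pad or chop text to a fixed length; do nothing if length is not an integer."""
    try:
        n = int(length)
    except (TypeError, ValueError):
        return text
    if n <= 0:
        return ""
    if align == "left":
        return text.ljust(n, pad_char)[:n]
    # Assumption: a right-aligned value keeps its rightmost characters when chopped.
    return text.rjust(n, pad_char)[-n:]

def evaluate(elem):
    """Concatenate an element's text with the results of its child elements."""
    out = elem.text or ""
    for child in elem:
        inner = evaluate(child)
        if child.tag == "pad":
            inner = pad(inner, child.get("length"),
                        child.get("padChar", "0"), child.get("align", "right"))
        out += inner + (child.tail or "")
    return out

def build_url(template, data):
    """Expand the URL element of a template using the given bibliographic data."""
    # Each bibliographic value becomes an internal general entity declaration.
    decls = "".join(
        '<!ENTITY %s "%s">' % (name, escape(value, {'"': "&quot;", "%": "&#37;"}))
        for name, value in data.items()
    )
    root = ET.fromstring("<!DOCTYPE slinks [%s]>%s" % (decls, template))
    return evaluate(root.find("URL"))

print(build_url(TEMPLATE, {"volume": "3", "startPage": "25"}))
print(build_url(TEMPLATE, {"volume": "10", "startPage": "485"}))
```

Only pad is handled here; a fuller processor would dispatch on the other manipulation elements in the same place.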
XML is a syntax for structural markup of text.
A S-Link-S template has a top level element, slinks, which contains all the template information.
In our example, "<slinks ID="example">" denotes the beginning of a template element with the identifier "example".
Templates need to have an "ID" so that they can be referred to in statements like "Use the template with ID=example to make links to the journal with ISSN=1234-5678".
"</slinks>" signifies the end of the S-Link-S template.
If you're just skimming the specification, you should skip ahead to the description of metadata, as the next section starts to describe
the S-Link-S Template elements in gory technical detail.
If you're studying the details of the specification, you'll want to look at the DTD (Document Type Definition), which is available at http://nj.oclc.org/SLinkS/slinks.dtd. There is also an HTML version.
The next sections go through the DTD, element by element and entity by entity, and explain their purposes.
Top level container elements
Elements are chunks of text marked by tags enclosed in "<" and ">".
- <slinks>
- This
is the top level element of the linking template. It is referred to in metadata
using its ID attribute. It can contain "var" elements, "lookUpTable" elements, "scratch" elements, a "DOi" element, a "URL" element, a "postArgs" element, a "cookie" element, a "notRequired" element and "locator" elements, in that order. The order is chosen to enable single-pass parsing.
- attributes:
-
- ID
- An ID used to refer to the S-Link-S template. (required)
- vers
- The version of S-Link-S. Should be "1" for the present specification.
- complete
- A complete template is self-contained and can be used without a network
connection; it needs no on-line look-ups or digital signatures. This attribute can be inferred from parsing the file; it is provided to allow applications to store the result for the benefit of subsequent parsing. (yes | no) Optional.
- resultType
- a place to put information about what the template resolves to,
i.e. "article", "abstract", "search", "volume", "issue", "homepage". Optional.
A "slinks" element can contain different types of templates.
The most common link will be the URL, which is what you type into your web browser to get someplace on the internet. Occasionally,
specialized links may require form data to be sent as "HTTP POST" arguments or in the "HTTP-Cookie" header. In the case of POST arguments, the template defines a set of name-value pairs.
An emerging technology for linking to digital resources is the "Digital Object Identifier".
Templates for DOi's (lower case i is used to indicate that you're talking about the actual DOI string) go in the DOi element.
- <URL>
- This
element contains a template for a URL. In this and other elements declared as
mixed content, white space is preserved. URL encoding should be assumed to
occur only after all parsing and manipulation has been done.
- attributes:
-
- usage
- a string describing how the URL is to be interpreted.
possible values:
- literal
- If a template is literal, the string it forms is to be used
as the URL etc.
- redirected
- A redirected template forms a URI which may get redirected to
another URI. The redirection URI is to be used as the "result"
of the element. Multiple redirections should be followed to the end.
- query
- A query URL is to be used only with a locator element to do web-page
look-ups.
In principle, we could add this to the DOi element, but that would require adding
handle resolving code to the implementation.
- <DOi>
- This
element contains a template for a digital object identifier. When this element
is present, it can be used in a URL template using the <getDOi/> element. If the DOi must be retrieved from a database, use a locator element.
- <postArgs>
- Occasionally,
a server may require form data to be submitted in a post arguments header. This
element contains one or more <postItem> elements which contain the required form data. Implementing software may choose to provide a user with an HTML form containing the data, or perhaps a javascripted form submission.
- attributes:
-
- encoding
- The character encoding that the target expects for the form data.
Default: "UTF-8"
- <postItem>
- This
element contains a template for a value of a Post argument.
- attributes:
-
- key
- The name of the POST argument.
- isCheck
- (optional,(true|false) default: false) whether a Post argument represents a checkbox or not. The reason this is needed is because web browsers treat checkboxes differently from other inputs. The key is not sent when the box is not checked; as a result, many cgi resolvers just check for the presence/absence of the key.
- <cookie>
- Session ID's and user identification are often handled using a special header in the HTTP protocol called the HTTP-Cookie.
The cookie element is included in S-Link-S to enable specification of site authentication in the intranet/library service scenario.
The cookie element should never be used in generic or public link specifications.
Example:
<slinks ID="example2">
<DOi>(the contents of the DOi element)</DOi>
<URL usage="literal">(the contents of the URL element)</URL>
<postArgs><postItem key="param1">value1</postItem></postArgs>
</slinks>
- <notRequired>
- Experience has shown that an important piece of information for using a S-Link-S template is knowledge of which elements are and are not required.
Normally, the interpreter can deduce which bibliographic data is required, but occasionally it helps to tell the interpreter explicitly what is required.
Any entities needed to compute the string templated in the notRequired element are considered to be not required in the URL.
Locators
- <locator>
- The "locator" element can be used to find a string on a web page retrieved from a URL identified by the query attribute.
Its content is a "PERL 5 Regular Expression" which can be used to search for a text pattern on a web page. Note that the characters ">", "<", and "&" must be escaped with the entities "&gt;", "&lt;", and "&amp;" in the content of this element.
- attributes:
-
- name
- a name of the thing that is being sought (required)
- group
- if there is more than one parenthesized group, then this selects which group.
if group = 0 then the whole match is used. Default: "1"
- query
- The query attribute identifies an element to be used as the query string for the locator.
Default: "URL". If the value is "URL", then the URL template is used to compute a URL which is used to retrieve a web page off the internet.
If the value of query is the varID of a "var" element, the content of the var element is used as the retrieval URL.
If the value of the query attribute is the name of another locator, then the result of that locator is used as the retrieval URL.
By using this attribute in multiple locator elements, a chain of queries can be made. This is useful when the result of a query contains URL's to pages which contain the information being sought.
Example (this locator will find the PubMed ID in a PubMed Search Result):
<locator name="pmid" group="1" query="URL">PMID: +(\d+)</locator>
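In Python terms, applying this locator to a retrieved page amounts to a regular-expression search followed by selection of a group. The sketch below is illustrative only: apply_locator is a hypothetical name, the page text is invented, and Python's re module is close to but not identical with Perl 5 regular expressions.

```python
import re

def apply_locator(pattern, page, group=1):
    """Return the selected group of the first match, or None if nothing matches."""
    match = re.search(pattern, page)
    return match.group(group) if match else None

# Invented page fragment standing in for a search result page.
page = "<dd>Smith J. J Chem. 1999;3:25. PMID:  10123456 [indexed]</dd>"

pmid = apply_locator(r"PMID: +(\d+)", page)                # group="1", the default
whole = apply_locator(r"PMID: +(\d+)", page, group=0)      # group="0", the whole match
print(pmid, whole)
```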
Example 2:
<?xml version="1.0"?>
<!DOCTYPE slinks SYSTEM "slinks.dtd">
<slinks ID="publistquery" complete="no">
<URL usage="query">http://www.publist.com/cgi-bin/search?SearchType=Adv&amp;Title=&amp;ISSN=<replace for="-" with="">&ISSN;</replace>&amp;Desc=&amp;Pub=&amp;MaxHits=10&amp;SortBy=Format&amp;Format=1</URL>
<locator name="URL2">HREF="/cgi-bin/(show\?PLID=\d+)"</locator>
<locator name="title" query="URL2">Title:&lt;/B&gt;(.+)&lt;/TD&gt;</locator>
<locator name="subject" query="URL2">TARGET="SubjectListWin"&gt;(.+)&lt;/A&gt;</locator>
<locator name="publisher" query="URL2">TARGET="PubWin"&gt;(.+)&lt;/A&gt;</locator>
</slinks>
This example shows how to use chaining to look up data items from the PubList website.
The first locator uses the URL element to supply the initial search URL.
The web page returned for this URL contains links to other web pages, and these URLs are matched using the regular expression in the locator element's content.
The rest of the locators have a query attribute of "URL2", which is the "name" attribute of the first locator, and so the match result of the first query is used as the query URL for these locators.
The web page at URL2 is retrieved, and the regular expressions in the locators are used to extract the data items.
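The chaining described here can be sketched in Python with a stand-in for the network. Everything in this block is an assumption made for illustration: the run_locators function, the canned PAGES dictionary, the shortened search URL, and in particular the decision to resolve a matched relative link against the URL of the page it was found on.

```python
import re
from urllib.parse import urljoin

def run_locators(start_url, locators, fetch):
    """locators: list of (name, query, pattern) tuples in document order."""
    values = {}                      # locator name -> matched text
    retrieval = {"URL": start_url}   # query name -> absolute URL to fetch
    for name, query, pattern in locators:
        url = retrieval[query]
        match = re.search(pattern, fetch(url))
        values[name] = match.group(1) if match else ""
        # Assumption: a match that is later used as a query is resolved
        # against the URL of the page on which it was found.
        retrieval[name] = urljoin(url, values[name])
    return values

# Canned pages standing in for the web site; the content is invented.
PAGES = {
    "http://www.publist.com/cgi-bin/search?ISSN=12345678":
        '<A HREF="/cgi-bin/show?PLID=42">Details</A>',
    "http://www.publist.com/cgi-bin/show?PLID=42":
        '<TD><B>Title:</B>Journal of Examples</TD>',
}

locators = [
    ("URL2", "URL", r'HREF="/cgi-bin/(show\?PLID=\d+)"'),
    ("title", "URL2", r"Title:</B>(.+)</TD>"),
]

result = run_locators("http://www.publist.com/cgi-bin/search?ISSN=12345678",
                      locators, PAGES.__getitem__)
print(result)
```

A real engine would fetch each page once and reuse it for all locators that share a query; the sketch fetches on every use to stay short.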
Variable Containers
- <var>
- contains a template for a variable string that other elements may use.
- attributes:
-
- ID
- The label used to refer to the variable section. Must be unique in the document.
- example:
| <var ID="a5g16">Volume &volume;</var> |
- <lookUpTable>
- Together with lookUp, this element allows almost any site to be described in the S-Link-S
framework. Nonetheless, it's bad practice to use a lookUpTable in place of a
search engine if you would need a table entry for every article in your journal. "item" is
the only allowed content.
- attributes:
-
- ID
- a
unique identifier used to address the lookup table (required)
- default
- The
value returned if no match is found default:""
-
<item>
- a
record in a lookUpTable. (empty)
- attributes:
-
- key
- a
string used to access the record. If there are two items with the same key,
then only the first is used.
- value
- The
string returned by a look-up
-
<scratch>
- is a var by another name. It contains a template for
a variable string that other elements may use. Scratch elements can be used as calculation registers so that a template may build up a calculation.
- attributes:
-
- ID
- The
label used to refer to the scratch section. Must be unique in the document.
- example:
| <scratch ID="volval"><param name="a5g16"/></scratch> |
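The lookUpTable and item rules above (a default value for misses, and the first of two items with the same key winning) can be sketched in Python as follows. The table content, the default of "unknown" and the function names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

TABLE_XML = """<lookUpTable ID="yrs" default="unknown">
  <item key="1993" value="old/7"/>
  <item key="1994" value="papers/8"/>
  <item key="1993" value="ignored"/>
</lookUpTable>"""

def load_table(xml_text):
    """Return (mapping, default) for a lookUpTable element."""
    elem = ET.fromstring(xml_text)
    table = {}
    for item in elem.findall("item"):
        # setdefault keeps the first value seen for a key.
        table.setdefault(item.get("key"), item.get("value"))
    return table, elem.get("default", "")

def look_up(table, default, key):
    """Case-sensitive look-up that falls back to the table's default."""
    return table.get(key, default)

table, default = load_table(TABLE_XML)
print(look_up(table, default, "1993"), look_up(table, default, "1990"))
```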
Bibliographic data place-holders
"Bibliographic data place-holders" indicate where
bibliographic inputs are to be substituted in a template string. Since XML syntax is used in S-Link-S, XML "SYSTEM" entities
are used to represent most of these tokens. For example, if a publisher wants to
specify that links to an article on page 587 in volume 3 should use the URL
http://www.publisher.com/3/587/, the string "http://www.publisher.com/&volume;/&startPage;/"
would be used. Entities always start with "&" and end with ";". The names
are case-sensitive.
(Technical Note: A few place-holders are expressed as empty elements, such as <SICI/>. S-Link-S uses entities to represent input strings, and uses elements to represent text strings which are the result of processing input strings.
A preferred implementation may be to insert data for entities in "the internal subset" of a template document type declaration.)
General rules for place-holder normalization
Bibliographic data is usually produced by humans, and therefore can acquire "noise". For example a journal may have a page number "L - 123". Authors may transcribe this as "l123", "p.L123", "pp:L123". For this reason, the standard normalizations of the bibliographic input strings are specified to try to remove some of the noise. ("1123" and "123" are also likely, but we can't do much there.)
The normalization operations should
take place in an order specified for each entity. In general, the normalizations
are irreversible. For example, if the first author's name is R. McDonald, &authLast; is "mcdonald". You should not require "McDonald" in a case-sensitive URL, because linkers may only know that the name is "MCDONALD".
The defined normalizations are:
- replaceSlash
- Occasionally, a data string may have a "/" in it; most servers treat
"/" as a path delimiter. In such cases, "/" should be
replaced with "-" in the place-holders which specify this normalization. This is denoted in the
following list as replaceSlash.
- removeWhiteSpace
- White
space (as defined in the XML spec) should be removed in data place-holders which specify this normalization.
- underscoreWhiteSpace
- All
instances of one or more white space (as defined in the XML recommendation)
should be replaced by a single underscore character in data place-holders which specify this normalization.
- removePunctuation
- The following punctuation characters should be replaced with white space in data place-holders which specify this normalization: "#",
",", ".", ":", "(", ")", "[", "]", "{", "}", "!", ";", and the double-quote character.
- trimPunctuation
- Punctuation (enumerated above) and whitespace are removed from the end and beginning of the place-holder string.
- removeStrings
- Many entities should have certain strings removed for normalization. The strings in the
list should be removed in the order given.
- lowerCase
- The
data string should be converted to lower case.
- toAscii
- Accented and special characters should be replaced by the closest unaccented equivalents. A Unicode-to-ASCII table for use in such conversions is available at http://nj.oclc.org/SLinkS/unicodeMap.txt
An S-Link-S implementation should normalize bibliographic data on entry. In cases
where specific formats are required, a S-Link-S implementation should consider
malformed or ambiguous entries to be invalid. A S-Link-S implementation may apply
correct formatting, such as adding a dash to ISSN's.
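The normalizations above compose naturally as small string functions applied in a fixed order. The Python sketch below shows the sequences listed for &volume; and &authLast; further down in this document. The function names are hypothetical, white space is taken to mean space, tab, carriage return and line feed, and to_ascii uses Unicode decomposition as a stand-in for the published mapping table, so its results may differ from that table.

```python
import re
import unicodedata

PUNCTUATION = '#,.:()[]{}!;"'
WHITE_SPACE = " \t\r\n"

def lower_case(s):
    return s.lower()

def remove_strings(s, strings):
    for item in strings:            # removed in the order listed
        s = s.replace(item, "")
    return s

def replace_slash(s):
    return s.replace("/", "-")

def trim_punctuation(s):
    return s.strip(PUNCTUATION + WHITE_SPACE)

def remove_white_space(s):
    return re.sub("[%s]+" % WHITE_SPACE, "", s)

def underscore_white_space(s):
    return re.sub("[%s]+" % WHITE_SPACE, "_", s)

def remove_punctuation(s):
    return s.translate(str.maketrans(PUNCTUATION, " " * len(PUNCTUATION)))

def to_ascii(s):
    # Assumption: decomposition plus dropping non-ASCII marks approximates
    # the "closest unaccented equivalent".
    return unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")

def normalize_volume(raw):
    s = lower_case(raw)
    s = remove_strings(s, ["volume", "vol"])
    s = replace_slash(s)
    s = trim_punctuation(s)
    return remove_white_space(s)

def normalize_auth_last(raw):
    s = to_ascii(raw)
    s = lower_case(s)
    s = remove_punctuation(s)
    return underscore_white_space(s)

print(normalize_volume("Vol. 12/13"), normalize_auth_last("McDonald"))
```

Because "volume" is removed before "vol", an input such as "Volume 3" reduces to "3" rather than leaving "ume 3" behind, which is why the order of the removeStrings list matters.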
URL Encoding and White Space Handling
In elements declared as mixed content (anywhere you have character data), white space
is preserved. URL encoding should be assumed to occur only after
all parsing and manipulation has been done.
General Bibliographic data place-holders
- &baseURL;
- The
base URL for the journal. This URL should start with a protocol declaration
(usually "http://"). It can be set from S-Link-S metadata using the "baseURL" Property of a WebService
Normalization:
removeWhiteSpace
- &volume;
- A
string denoting the journal volume.
Normalization:
lowerCase, removeStrings {"volume", "vol"},
replaceSlash, trimPunctuation, removeWhiteSpace
- &issue;
- A
string denoting an issue number.
Normalization:
lowerCase, removeStrings {"issue", "iss",
"no", "number", "num"}, replaceSlash,
trimPunctuation, removeWhiteSpace
- &pages;
- A
string denoting the page numbers. &pages; = &startPage;-&endPage;
- &startPage;
- A
string denoting the page on which an article starts.
Normalization:
lowerCase, removeStrings {"pages", "page",
"no", "number", "num"}, replaceSlash,
trimPunctuation, removeWhiteSpace
- &endPage;
- A
string denoting the page on which an article ends.
Normalization:
lowerCase, removeStrings {"pages", "page",
"no", "number", "num"}, replaceSlash,
trimPunctuation, removeWhiteSpace
- &pSeq;
- When more than one article is found on a page, this string denotes which article on the page is referred to.
The &pSeq; string is most commonly a letter.
Normalization:
lowerCase, replaceSlash, trimPunctuation, removeWhiteSpace
- &artNum;
- A
string denoting an article number in cases where there are no pages.
Normalization:
lowerCase, removeStrings {"pages",
"page", "no" , "number" ,
"num"}, trimPunctuation, removeWhiteSpace
- &itemNumExact;
- A
string denoting an item number for databases, archives, reports, patents. Normalization is minimal to allow for a wide variety of item number formats.
Normalization: removeWhiteSpace
- &itemNum;
- A
string denoting an item number for databases, archives, reports, patents. Normalization is maximal, and should be used for simple item numbers.
Normalization: lowerCase, replaceSlash, removeStrings { "number" ,"no." ,"no" ,
"num.","num", "#"}, removeWhiteSpace
- &ISSN;
- The
ISSN of the journal. If there is a separate ISSN for an on-line version, this
place-holder refers to the print version ISSN. The match string for an ISSN is
"[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9x]". The last character is
a check digit. It should be set from S-Link-S metadata using the "ISSN" Property
- &eISSN;
- The
ISSN of an on-line version of a print journal. Same format as &ISSN;. It can be set from S-Link-S metadata using the "eISSN" Property
- &CODEN;
- The
CODEN of the journal. The match string for a CODEN string is
"[A-Z][A-Z][A-Z][A-Z][A-Z][A-Z0-9]". The last digit is a check digit. It should be set from S-Link-S metadata using the "CODEN" Property
- &ISBN;
- The
International Standard Book Number for a book. The match string for an ISBN is
"[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9x]". The last character is
a check digit. Hyphens are omitted.
- &ISBN13;
- The
13-digit International Standard Book Number for a book. Hyphens are omitted.
- &doi;
- The digital object identifier for an article, where it is known a priori.
This is different from getDOi element (below) which is meant to trigger a lookup from a doi database such as crossref or to use a computed doi from the DOi template element.
The practical difference is that the entity triggers a request in the host software, whereas a getDOi triggers a request in the S-Link-S framework.
I expect that getDOi will be deprecated.
- &jKey;
- Many
publishers use the same template string for all their journals, and use a key
string to distinguish among them. The key string has to be declared in the
journal metadata.
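As an example of checking an identifier place-holder, the Python sketch below applies the &ISSN; match string given above and adds the dash when it is missing. The function names are hypothetical. The check-digit test follows the general ISSN rule (weights 8 down to 2, modulus 11, with "x" standing for 10); that rule comes from the ISSN standard rather than from S-Link-S, which only states that the last character is a check digit.

```python
import re

ISSN_MATCH = re.compile(r"[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9x]")

def normalize_issn(raw):
    """Lower-case, strip white space, add a missing dash; None if malformed."""
    s = re.sub(r"\s+", "", raw).lower()
    if re.fullmatch(r"[0-9]{7}[0-9x]", s):
        s = s[:4] + "-" + s[4:]
    return s if ISSN_MATCH.fullmatch(s) else None

def issn_check_ok(issn):
    """Verify the check character of an ISSN that already matches ISSN_MATCH."""
    digits = issn.replace("-", "")
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    return digits[7] == ("x" if expected == 10 else str(expected))

print(normalize_issn("03785955"), issn_check_ok("0378-5955"))
```

Returning None for a malformed value reflects the rule that malformed or ambiguous entries should be considered invalid.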
Publication Date Placeholders
A S-Link-S implementation should handle dates specially. If the
bibliographic token "month" is entered as "January" then a template
calling for "mo" should get "01", and "ssn" should get "winter", etc.
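One way to picture this special handling is a function that derives the related date tokens from a month name. The Python sketch below is an assumption-laden illustration: date_tokens is a hypothetical name, only the January to "winter" pairing comes from the text (the remaining season assignments are a northern-hemisphere guess), and quarters are taken to be calendar quarters. The "mo" value is written as two digits to agree with the match string given for &mo; in the list that follows.

```python
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# Assumption: only January -> "winter" is stated in the specification.
SEASONS = ["winter", "winter", "spring", "spring", "spring", "summer",
           "summer", "summer", "fall", "fall", "fall", "winter"]

def date_tokens(month, year=None, day=None):
    """Derive related date tokens from a month name and optional year and day."""
    name = month.strip().lower()
    number = MONTHS.index(name) + 1            # ValueError for unknown names
    tokens = {
        "month": name,
        "mon": name[:3],
        "mo": "%02d" % number,
        "quarter": str((number - 1) // 3 + 1),  # assumption: calendar quarters
        "ssn": SEASONS[number - 1],
    }
    if year is not None:
        tokens["year"] = "%04d" % int(year)
        tokens["yr"] = tokens["year"][2:]
        if day is not None:
            tokens["day"] = "%02d" % int(day)
            tokens["parsedDate"] = "-".join(
                (tokens["year"], tokens["mo"], tokens["day"]))
    return tokens

print(date_tokens("January"))
print(date_tokens("July", 2006, 11)["parsedDate"])
```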
- &year;
- A
string denoting the year of publication. The match string is
"[0-2][0-9][0-9][0-9]". It's probably safe to assume that most journals published before 1 A. D. are not on-line.
- &yr;
- A
two-digit publication year string. The match string is
"[0-9][0-9]". Publishers are admonished not to use this because of
Y2K. "00" shall be interpreted to mean "1900".
- &month;
- A
string representing the month of publication. This string should match
"(january|february|march|april|may|june|july|august|september|october|november|december)"
- &mon;
- A
3-letter string representing the month of publication. This string should match
"(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)"
- &mo;
- A
2-digit string representing the month of publication. This string should match
"(01|02|03|04|05|06|07|08|09|10|11|12)"
- &day;
- A
2-digit string representing the day of publication. This string should match
"[0-3][0-9]"
- <parsedDate/>
- An ISO format string (YYYY-MM-DD) formed from &year; or &yr;, &mo; or
&month;, and &day;. Note that this is an (empty) element rather than an entity because it is not an input, but rather a string computed from the inputs. The parsedDate element can also be used to format date strings (see below).
- &ssn;
- A
string representing the season of publication. This string should match
"(winter|spring|summer|fall)"
- &quarter;
- A
string representing the quarter of publication. This string should match
"(1|2|3|4)"
- &authLast;
- A
string representing the first author's last name.
Normalization:
toAscii, lowerCase, removePunctuation, underscoreWhiteSpace.
- &authInit;
- A
string composed of the first author's first and middle initials.
Normalization:
toAscii, lowerCase, removePunctuation, removeWhiteSpace.
Input-only place-holders
These place-holders should not be used in URL specifications, because their representations
are not generally unique. They are often useful, however, for disambiguation or search
and are inputs needed for SICI generation.
- &uTitle;
- title
of the item, represented as a Unicode string.
- &aTitle;
- title
of the item, represented as a 7-bit ascii string.
- &jTitle;
- title
of the journal containing the item, represented as a 7-bit ascii string.
- &uAuthLast;
- First author's last name, represented as a Unicode string.
Public database key place-holders
Certain article databases are sufficiently public, accessible or widely used that their
keys may be suitable for use as place-holders in link URLs. In other words, S-Link-S software should know where to look these up.
- <pmid/>
- The NCBI PubMed unique identifier number for the article. The pmid lookup is accomplished using the pmid.xml template file.
- <getDOi/>
- The
Digital Object identifier for the article should be retrieved and placed here. If there is content in the DOi element, it will be placed here, otherwise, the DOi.xml S-Link-S template will be used to do the look-up. At the present time, there is no global DOi look-up facility. The DOi.xml file currently provided uses the Wiley DOi server to look up Wiley DOi's. For internal use with S-Link-S Calculator, the DOi.xml default template may be altered to accomplish private DOi database lookup.
String manipulation mark-up
S-Link-S templates may make use of these text manipulation functions.
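As a hedged preview of three of the functions described in the list that follows (replace, changeCase and encode), here is a minimal Python sketch. The function and parameter names are hypothetical. Several behaviours are assumptions rather than statements of the specification: in grep mode the "with" string is inserted literally, title case changes only the first letter of each word, and Python's re and urllib stand in for Perl 5 regular expressions and for whatever URL encoder an implementation uses.

```python
import re
from urllib.parse import quote_plus

def replace(text, for_, with_, grep="no"):
    """Substitute one string for another, optionally treating for_ as a regex."""
    if grep == "yes":
        # Assumption: the replacement is literal (no back-references).
        return re.sub(for_, lambda m: with_, text)
    return text.replace(for_, with_)

def change_case(text, to, offset=0):
    """Change case after the first `offset` characters, which stay unchanged."""
    head, tail = text[:offset], text[offset:]
    if to == "upper":
        tail = tail.upper()
    elif to == "lower":
        tail = tail.lower()
    elif to == "title":
        # Non-alphanumerics separate words; capitalize each word's first letter.
        tail = re.sub(r"[A-Za-z0-9]+",
                      lambda m: m.group(0)[0].upper() + m.group(0)[1:], tail)
    return head + tail

def encode(text, encoding="UTF-8"):
    """URL-encode text, using the given character encoding for non-ASCII."""
    return quote_plus(text, encoding=encoding)

print(replace("12", "1", "one"))
print(change_case("r1260", "upper"))
print(encode("That's all folks!"))
```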
- <pad>
- used
to markup strings which need padding to make a fixed-length string
- attributes:
-
- padChar
- a
character to pad with (default:"0" zero)
- length
- how long the padded string should be (required) (an integer). If length is shorter than the string to be padded, then the string is chopped. If length is not an integer (more precisely, if Integer(length) throws a NumberFormatException), pad will do nothing.
- align
- The
side the text should align to (left|right) (default:right)
- examples:
| <pad padChar="0" length="3">2</pad> becomes "002" |
Here pad is used to chop:
| <pad align="left" length="1">1999</pad> becomes "1" |
- <replace>
- substitute
one string for another in the element content
- attributes:
-
- for
- The
string to replace
- with
- The
string to substitute
- grep
- whether
to use PERL5 regular expressions "(yes|no)" default:"no"
- example:
| <replace for="1" with="one">12</replace> becomes "one2" |
-
<changeCase>
- change
case of the text in the element content
- attributes:
-
- to
- (upper|lower|title)
the text can be changed to UPPER case, lower case, or Title Case. Title case
treats all non-alphanumerics as word separators, and then capitalizes the first
letter of each word. (required)
- offset
- characters
up to the offset character are unchanged. default: 0
- Example:
| <changeCase to="upper">r1260</changeCase> becomes "R1260" |
-
<encode>
- URLencode the text in the element content
- attributes:
-
- encoding
- For non-ASCII characters, a target may expect a particular encoding; this attribute allows the template to specify that encoding. (Default: "UTF-8")
- Example:
| <encode>That's all folks!</encode> becomes "That%27s+all+folks%21" |
- <if>
- This element implements conditional sections, and can
contain "case", "match", "notEmpty" and "else" elements. case and match contain boolean conditions
as attributes. notEmpty is true if its content is not empty, and else is always true. The content of the first element with a true condition is selected, and remaining conditional elements need not be evaluated.
In some situations where you think this logic is appropriate, you may want to consider using separate S-Link-S templates and setting validity ranges in the metadata using the starts or startDate property.
- <case>
- The
attributes define a comparison of strings. If true, the content of this element
becomes the value of the parent if element
- attributes:
-
- varID
- The ID of a variable section to use as the
left hand value for a comparison.
- op
- a
comparison operator, matches "(gt|lt|eq|ne|ge|le)"
- const
- The
right hand value for a comparison.
- order
- (numeric|alpha|date)
default: numeric. In a numeric comparison, the numbers are extracted from the strings
before comparison. In a date comparison,
both strings should be either YYYY-MM-DD format strings
or simple token values, e.g. "july". Date string parsing
is lenient. In alpha comparisons, the compareTo
method of the Java String object is used, with the English Locale.
- <else>
- like
case, but always true
- example:
<var ID="v">&volume;</var>
...
<if>
<case varID="v" op="ge" const="3">V&volume;</case>
<else>1-2</else>
</if>
-
<match>
- tests
a match of one string in another
- attributes:
-
- with
- The
match string
- varID
- The ID of the "var" or "scratch" element containing the
string in which to search for the match string
- grep
- whether
to use PERL5 regular expressions "(yes|no)" default:"no"
- example (here we check if the startPage has an "l" or an "L" in it, and change the URL accordingly):
<var ID="p">&startPage;</var>
...
<if>
<match varID="p" with="[Ll]" grep="yes">letters/&startPage;</match>
<else>articles/&startPage;</else>
</if>
- <notEmpty>
- notEmpty is true if its content is not equal to the empty string. This has proven to be useful in cases where you want to use placeholder text when a token is not provided.
- example:
<var ID="v">&volume;</var>
...
http://www.site.com/query?issue=<if>
<notEmpty>&issue;</notEmpty>
<else>all</else>
</if>
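The "first true condition wins" rule for if, together with case, match, notEmpty and else, can be sketched in Python as follows. The data layout (a list of tuples holding already-expanded content) and all function names are hypothetical. Taking the first run of digits as "the number" in a numeric comparison is an assumption, date ordering is left out, and plain Python string comparison stands in for the Java compareTo method in alpha order.

```python
import operator
import re

OPS = {"gt": operator.gt, "lt": operator.lt, "eq": operator.eq,
       "ne": operator.ne, "ge": operator.ge, "le": operator.le}

def first_number(text):
    """Assumption: the number in a string is its first run of digits."""
    match = re.search(r"\d+", text)
    return int(match.group(0)) if match else None

def case_true(value, op, const, order="numeric"):
    if order == "numeric":
        left, right = first_number(value), first_number(const)
        if left is None or right is None:
            return False
        return OPS[op](left, right)
    return OPS[op](value, const)        # alpha order; date order not sketched

def match_true(value, pattern, grep="no"):
    if grep == "yes":
        return re.search(pattern, value) is not None
    return pattern in value

def select(branches, variables):
    """branches: (kind, attributes, expanded content) tuples in document order."""
    for kind, attrs, content in branches:
        if kind == "case":
            ok = case_true(variables[attrs["varID"]], attrs["op"],
                           attrs["const"], attrs.get("order", "numeric"))
        elif kind == "match":
            ok = match_true(variables[attrs["varID"]], attrs["with"],
                            attrs.get("grep", "no"))
        elif kind == "notEmpty":
            ok = content != ""
        else:                           # "else" is always true
            ok = True
        if ok:
            return content              # later branches are not evaluated
    return ""

branches = [("case", {"varID": "v", "op": "ge", "const": "3"}, "V10"),
            ("else", {}, "1-2")]
print(select(branches, {"v": "10"}), select(branches, {"v": "2"}))
```

With volume "10" the numeric order matters: compared as strings, "10" would sort before "3" and the case branch would wrongly fail.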
-
<option>
- The contents of the option element may be redundant and can be omitted if it contains a placeholder which was not resolved.
- example:
| <option>&issue;/</option>
|
- <pattern>
- Specifies a pattern for the element content. Specifying patterns will improve the reliability of
link formation and is absolutely essential for the library/intranet scenario where generic link URLs
must be recognized and replaced with enhanced link URLs.
- attributes:
-
- model
- A PERL5 regular expression that the element content should match.
- example:
| <pattern
model="l?[0-9]+">&volume;</pattern>
|
This example expresses that the &volume; placeholder should be either a number or a number preceded by the letter "l". Note that the pattern element is descriptive, not prescriptive. It has no effect on the enclosed text, except that a processor can use it to flag errors.
In this example, if the capital letter "L" was used instead of "l", the standard normalization (lower-case) for &volume; would contradict the model.
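A processor that uses the model only to flag errors might look like the sketch below. The names are hypothetical, Python's re module stands in for PERL5 regular expressions, and the model is assumed to apply to the whole element content rather than to a substring.

```python
import re

def render_pattern(model, content, warnings):
    """Return content unchanged; record a warning if it does not fit the model."""
    if re.fullmatch(model, content) is None:  # assumption: whole-string match
        warnings.append("content %r does not match model %r" % (content, model))
    return content  # descriptive, not prescriptive: the text is never altered

warnings = []
print(render_pattern("l?[0-9]+", "l12", warnings))  # l12
print(render_pattern("l?[0-9]+", "L12", warnings))  # L12
print(len(warnings))                                # 1
```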
-
<lookUp>
- looks
up a value in a lookUpTable using the content of this element as the key. The
entire element is replaced by the returned value. Look-up is case sensitive.
- attributes:
-
- ref
- The
ID of the lookUpTable to use
- example:
<lookUpTable ID="yrs">
<item key="1991" value="old/5"/>
<item key="1992" value="old/6"/>
<item key="1993" value="old/7"/>
<item key="1994" value="papers/8"/>
<item key="1995" value="papers/9"/>
<item key="1996" value="papers/10"/>
</lookUpTable>
|
|
<lookUp ref="yrs">&year;</lookUp> returns
"old/7"
when
&year;
is
"1993" |
Functions of Variable Text.
The "hash" and "checkSum" elements are functions of a text string which is either in a "var" element referenced by the "varID" attribute, or of what we call the current text. The current text is the text resulting from parsing the character data and elements which occur in the parent element before the relevant hash or checkSum element. It's easier to illustrate by example:
| <URL>&baseURL;volume=<pad length="3">&volume;</pad><checkSum/>1234.html</URL> |
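Here the part of the URL content that precedes <checkSum/>, i.e. &baseURL;volume=<pad length="3">&volume;</pad> after it has been resolved, is the "current text" for the checkSum element.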
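The choice between a referenced var and the current text can be sketched in Python. The sketch uses the MD5 function of the hash element rather than a checksum, because MD5 is available in the standard library. The function names are hypothetical, the text on either side of the element is assumed to be already resolved, and UTF-8 is an assumed character encoding.

```python
import hashlib

def md5_hex_upper(text, encoding="utf-8"):
    """MD5 of text, as 32 hexadecimal characters in capital letters."""
    return hashlib.md5(text.encode(encoding)).hexdigest().upper()

def hash_element(current_text, variables, var_id=None):
    """Hash the referenced var if varID is given, otherwise the current text."""
    source = variables[var_id] if var_id is not None else current_text
    return md5_hex_upper(source)

def render_url(before, after, variables=None, var_id=None):
    """before/after: resolved text preceding and following the hash element."""
    return before + hash_element(before, variables or {}, var_id) + after

print(render_url("abc", "1234.html"))
# abc900150983CD24FB0D6963F7D28E17F721234.html
```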
- <hash>
- an
empty element which is replaced by an MD5 hash (expressed in hexadecimal, capital letters) of the targeted marked element
- attributes:
-
- varID
- The
ID of a marked target section
- example:
| <hash varID="1"/>
becomes
"29B0FE973D179E0E5B147598137D28CF" |
-
<checkSum>
- an
empty element which is replaced by a checksum of the targeted marked element.
Supported algorithms are: "mod37", which is useful for alphanumeric strings and
is specified in Z39.56-1996 (The version 2 SICI). Other useful checksum
algorithms may be added as experience warrants.
- attributes:
-
- varID
- The
ID of a marked target section
- type
- The
name of the algorithm used to calculate the checksum. (mod37).
- example:
-
<var ID="cs1">0066-4200(1990)25&lt;&gt;1.0.TX;2-</var>
<checkSum varID="cs1"/>
becomes
"S"
|
(In this example, note that "<>" must be escaped as "&lt;&gt;")
-
<parsedDate>
- An
ISO format string (YYYY-MM-DD) formed from either the publication date (pubDate), the date of processing (today) or the date and time of processing as yyyy-MM-dd:HH:mm:ss (now).
- attributes:
-
- when
- The
date to be formatted. (pubDate | today | now) default: pubDate.
- example:
-
<parsedDate when="today"/>
becomes
"2005-08-03"
|
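The three values of the "when" attribute can be sketched with Python's datetime module. The function and parameter names are hypothetical; the clock parameter exists only so the example is reproducible, and returning an empty string when no publication date is known is an assumption.

```python
from datetime import date, datetime

def parsed_date(when="pubDate", pub_date=None, clock=None):
    """Format a date the way the parsedDate element describes."""
    current = clock if clock is not None else datetime.now()
    if when == "pubDate":
        if pub_date is None:
            return ""  # assumption: no publication date yields no text
        return pub_date.strftime("%Y-%m-%d")
    if when == "today":
        return current.strftime("%Y-%m-%d")
    if when == "now":
        # yyyy-MM-dd:HH:mm:ss, with a colon between date and time
        return current.strftime("%Y-%m-%d:%H:%M:%S")
    raise ValueError("when must be pubDate, today or now")

fixed = datetime(2005, 8, 3, 14, 5, 9)
print(parsed_date("today", clock=fixed))  # 2005-08-03
print(parsed_date("now", clock=fixed))    # 2005-08-03:14:05:09
```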
Commerce
and security place-holders
One advantage of specifying rules for link construction instead of just exchanging
tables of links is that you can ask linkers to embed extra information in the
links they construct.
- &linkerID;
- This
is an identifier which, by mutual arrangement, can be used to identify the
linker. A null string will be substituted when there is no arrangement between
linker and linkee. For an example of how this might be used, consider the
Amazon.com associates program. The URL in this case would be described as
http://www.amazon.com/exec/obidos/ISBN=&ISBN;/&linkerID;/
. Remember that since the resulting URL can be bookmarked, linkerID is only
useful in situations where it benefits the linker to add the place-holder, i.e. you
can use this to pay for people to link to you, but you can't use it to charge
them.
Although
you might think this might be useful for source tracking, it's really only
useful for that if you add a signature as well...
- <private/>
This is a place-holder which can be used to implement private arrangements. It is an element to facilitate external function calls by an S-Link-S engine.
As an example of how this might be used, consider a case where, as part of a business arrangement, a publisher wants
links to be perishable. To make the links perishable, the publishers agree to set <private/> to
today's date as a YYYY-MM-DD string, and then use a template like
http://www.publisher.com/&volume;/&startPage;/<private/><dsig/>.
The linked publisher then verifies the hash and rejects invalid URLs.
This URL cannot usefully be bookmarked, because it is rejected once the date changes.
- attribute:
-
- data
- whatever extra data is needed for the private arrangement.
- example:
-
| <private data="YYYY-MM-DD"/> |
In principle, publishers can use the private place-holder to exchange other sorts of information.
It is expected that
future versions of this specification will include additional specific information exchange place-holders as
the uses of these become clear.
- <param/>
This is a way to insert arbitrary named parameters into a URL. If the named parameter cannot be found, the S-Link-S interpreter
will behave as though an entity of that name was not found, so option elements can be used.
As an example of how this might be used, consider a case where a link server wants to pass through a parameter "foo"
supplied on a GET query.
http://www.aggregator.com/get?foo=value&issn=1213-3423
-->
http://www.publisher.com/1213-3423?foo=value
.
While "issn" is a well known parameter name, "foo" isn't.
- attribute:
-
- name
- the name of the parameter.
- example:
-
| http://www.publisher.com/&ISSN;?foo=<param name="foo"/> |
This element was added to assist with parameter passing in OpenURL link servers.
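The pass-through described above can be sketched with Python's urllib.parse. The function names are hypothetical, and the sketch behaves as if the "?foo=" part of the template were wrapped in an option element, so that it disappears when "foo" was not supplied on the incoming query.

```python
from urllib.parse import parse_qs, quote, urlsplit

def param(name, incoming_url):
    """Return the first value of a named GET parameter, or None if absent."""
    values = parse_qs(urlsplit(incoming_url).query).get(name)
    return values[0] if values else None

def build_link(incoming_url):
    """Build the publisher link, passing "foo" through when it is present."""
    issn = param("issn", incoming_url)
    if issn is None:
        raise ValueError("issn is required to build the link")
    link = "http://www.publisher.com/" + issn
    foo = param("foo", incoming_url)
    if foo is not None:  # otherwise the optional part is omitted
        link += "?foo=" + quote(foo)
    return link

print(build_link("http://www.aggregator.com/get?foo=value&issn=1213-3423"))
# http://www.publisher.com/1213-3423?foo=value
```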
- <dsig/>
- dsig is a 128 bit hex-coded (32 character) integer which is the
"MD5" one-way hash of the referenced variable or the current text
of the parent element, with the linkee's password appended
to it. This can be used to implement "digital signing" of a URL.
An S-Link-S implementation will have a centralized signature authority
which authenticates a user and adds the linkee's password to the signed
string before computing and returning a hash string to the linker.
- attributes:
-
- varID
- ID of the variable to sign
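The signing step, and the check a linkee would run on an incoming URL, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the password "secret" are hypothetical, UTF-8 and upper-case hex digits are assumed, and the signature is assumed to be the last 32 characters of the URL. In a real deployment the hash would be computed by the signature authority, since the linker does not know the linkee's password.

```python
import hashlib

def dsig(signed_text, linkee_password, encoding="utf-8"):
    """MD5 of the text with the linkee's password appended, as 32 hex digits."""
    material = (signed_text + linkee_password).encode(encoding)
    return hashlib.md5(material).hexdigest().upper()

def sign_url(unsigned_url, linkee_password):
    """Append the signature of the current text to the URL."""
    return unsigned_url + dsig(unsigned_url, linkee_password)

def verify_url(signed_url, linkee_password):
    """Linkee side: recompute the signature and compare with the one received."""
    unsigned_url, signature = signed_url[:-32], signed_url[-32:]
    return signature == dsig(unsigned_url, linkee_password)

url = sign_url("http://www.publisher.com/12/345/2005-08-03", "secret")
print(verify_url(url, "secret"))                                      # True
print(verify_url(url.replace("2005-08-03", "2005-08-04"), "secret"))  # False
```

Combined with a date supplied through the private place-holder, a check like verify_url lets the linkee reject links whose date has been altered.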
There are a number of security and commerce possibilities that have not been included.
- Encryption: Encryption has not been included because in order to make sense, there needs to be someone to hide the information from.
Links, by their nature, are rather public and should be storable and exchangeable. If a linkee wants a linker to communicate something privately, putting the data in encrypted links is probably the silliest way to accomplish it.
- Immovable Links: Using hashes, it is possible to make links that work only from specific pages. The principal effect of this would be to annoy users, since determined robots could easily spoof the referring page.
- Mechanisms to Charge the Linker: Again, if a linkee wants to collect money from people linking to them, it's much easier to do this by private agreement.
SICI-related
markup
A SICI element is defined to allow publishers to use SICI's in URL's and DOI's. The titleCode element is for SICI support.
- <titleCode/>
- an
ascii string derived from the title of the item using the rules specified in
ANSI/NISO Z39.56-1991 or ANSI/NISO Z39.56-1996.
- attributes:
-
- vers
-
(1|2) default: 2. the SICI version. vers="1" corresponds to ANSI/NISO
Z39.56-1991; vers="2" corresponds to ANSI/NISO Z39.56-1996
- example:
- when
&aTitle; is "Characteristics of InSb Photovoltaic Detectors at 77 K and Below",
<titleCode vers="1"/>
becomes
"CIPD"
- <SICI/>
- an
empty element which is replaced by SICI strings according to the specified
attributes.
- attributes:
-
- vers
-
(1|2) default: 2. the SICI version. vers="1" corresponds to ANSI/NISO
Z39.56-1991; vers="2" corresponds to ANSI/NISO Z39.56-1996
- titleCode
- whether
to include the Title Code (yes|no) in the contribution segment. default: "no".
- enumeration
- (v|vn)
Default: "v". This attribute specifies the required level of detail in the
enumeration string.
- chronology
- (year|yearMo|yearMoDa|yearQ|yearS) Default:year. This attribute specifies the
detail required for the chronology string.
- CSI
- (1|2|3)
Default "2". the SICI-2 code structure identifier. CSI="1" is the SICI-2 for a
journal issue; CSI="3" contains private codes.
- DPI
- (0|1|2|3)
Default:"0". The SICI-2 Derivative Part Identifier. DPI="0" is a contribution,
DPI="1" is a table of contents, DPI="2" is an index, DPI="3" is an abstract.
- MFI
- (TX|TL|TH|TS|TB|CD|CF|CT|CO|HE|HD|SC|VX|ZN|ZU|ZZ)
Default:"TX". The SICI-2 "Medium/Format Identifier".
- example:
- when
&ISSN; is 00368075, &year; is 1992, &volume; is 256 and &page; is
784,
<SICI/>
becomes
"0036-8075(1992)256<784>2.0.TX;2-Z"
Note that the "&", "<", ">" and '"' characters may need to be escaped. If you use a SICI in a URL, you'll need to deal with this on the receiving end.
Changes
Many of the changes have been made in response to feedback and criticism from members of the publishing community.
We are extremely grateful to Mark Doyle, David Ephron, Arthur Smith, Steve Hitchcock, Herbert Van de Sompel and Dan Connolly for comments and criticisms.
Miles Poindextexter at Openly has also participated in the refinement of S-Link-S.
- 10/26/98 First version on the web
- Expanded on &jkey;.
- Expanded on &pSeq;.
- Added "Open Issues".
- Corrected template example.
- Clarified hashing.
- 10/28/98
- Extensive editing and clarifications.
- Changed some cases for names.
- Added itemNumExact.
- Changed some token normalizations.
- Added unAccent normalization.
- added crosslinks
- added option element
- 10/29/98
- More editing and clarifications.
- Added link property and LinkMethod Class to "reify" the linkage.
- Added dc:Coverage, dc:Language, dc:Subject notes
- Added the Translation value to Relationship
- 10/30/98
- Added system architecture sketches.
- hashURL, medline, getDOi, titleCode, parseddate made into elements
- added &unknown; input entity
- 11/2/98
-
- 11/7/98
- Revised UserType definitions
- Added validQuery property and Class
- added linkClassName Property
- added useMarked value for Resolution
- 11/24/98
- Added intranet/library usage scenario
- Added pattern, cookie elements
- "tokens" are now called "placeholders"
- 12/8/98
- Corrected 1st S-Link-S Template Example
- 2/8/99
- Made a number of name changes; after working with the old names for a while, we decided that some of them were confusing.
- "Linkable" changed to "WebService"
- "Citeable" changed to "CiteablePub"
- "ClassSite" changed to "ContainingSite"
- "Access" changed to "LinkedObjectType"
- "access" changed to "linkedObjectType"
- Added two values to "LinkedObjectType" (was "Access"), "TOC", and "RightsData"
- 2/23/99
- Lots of Changes made.
- Many typos fixed.
- Made a bunch more name changes.
- "medline" changed to "muid"
- "link" changed to "linkMethod" (To make the naming convention consistent.)
- "relatedTo" changed to "citeablePub" (ditto)
- "UserType" changed to "AllowedUser", "userType" changed to "allowedUser"
- "Everyone" changed to "Anything"
- "linkPermission" changed to "linkPolicy", "LinkPermission" changed to "LinkPolicy"
- "linkPermissionURL" changed to "linkPolicyURL"
- "Friendly" changed to "Reciprocal"
- "linkNotify" changed to "linkNotifyEmail"
- "linkButton" changed to "linkButtonURL"
- padChar now defaults to zero, and we clarify that pad can be used to chop.
- the RDF example is changed to reflect the changes in the RDF syntax and schema specifications
- added "Subscriber" to the allowed "AllowedUser"'s.
- 2 new attributes, "webServiceType" and "legalRelation" were added to replace "relationship".
- new attribute for WebServices: serviceArea. dc:Coverage is omitted-it didn't mean what I thought it meant.
- The availability of "subPropertyOf" in the latest RDF Schema has led us to bring Dublin Core properties into the S-Link-S Scheme, where they are declared as Dublin Core subProperties.
- Added Class SLinkSFile to serve as a domain for meta-metadata.
- Added property SLinkSFileLocation.
- linkPolicy is a property of a WebService, not a LinkMethod
- Properties which set entities now modify LinkMethod
- added Classes Books and Serial
- added property clicksAway
- added LinkedObjectType: Holding; removed Biblio
- 3/11/99
- described what will happen if pad is not an integer.
- 5/5/99
- S-Link-S Template language thoroughly revised to reflect the implementation in code. The template language is at the "final call" stage and will soon be frozen.
- "SLinkS" is changed to "S-Link-S". "S-Link-S" is being registered as a Service Mark of Openly Informatics, Inc.
- changed "muid" placeholder to "pmid" (there exists a muid, but pmid is preferred for linking)
- added "locator"
- removed "unknown"
- changed "mark" to "var"
- "match" and "case" now work differently. Both now reference a var element instead of containing variable text in attribute values. This was necessary because XML restrictions on external entities in attribute values made the previous construct rather awkward.
- The implementation of "hash" and "checkSum" is now that if the "varID" attribute is missing the hash or checkSum is generated from the "current text" of the parent element.
- unAccent normalization renamed toAscii.
- Attribute "ref" is changed to "varID" wherever it refers to a var.
- "passHash" element replaced with "dsig".
- hashURL removed; hash is implemented in a way that will do the same thing
- removed "password" placeholder.
- 6/2/99
- S-Link-S Template language reaches version 1.0. Minor changes were made in order to move the functions of the "Resolution" metadata into the template language.
Changes in metadata:
- Resolution property and class removed from Metadata Scheme.
- linkClassName property removed.
- validQuery property removed. This property belongs in another scheme.
- the RDF example is now legal.
- the links to RDF schema and syntax have been updated.
Changes in template language:
- added "complete" attribute to slinks element to help distinguish templates that need
a network connection from those that don't.
- added "usage" attribute to URL. This changes the way that query URL's are interpreted
- added "query" attribute to locator. The behavior of locator has changed a bit.
- changed &URL; to &baseURL; to reduce confusion with the element <URL>
- 6/22/99
- Corrections made in RDF example.
- 7/14/99
- No changes have been made in the template language, but the explanation of locators has been elaborated.
Extensive changes in the metadata schema have been made, partly to reflect the emerging form of the S-Link-S Legal Framework, and partly to weed out properties that were not being implemented.
- Added all the S-Link-S BibData tokens, mostly so that comments and labels in rdfs syntax can be added.
- URL property of LinkMethod changed to baseURL of WebService to synch with template entities.
- Removed password property. It is now assumed that passwords are attached to linkerID's, and thus they needn't be a property of any of the objects that this schema defines.
- Removed legalRelation property. It was felt that this property was not really relevant to linking and that comparing the publisher of WebService and of CiteablePub was sufficient for anything that legalRelation would have been used for.
- Removed serviceArea. We'd been hoping that standards for the content of this field would emerge; since it hasn't happened, it was felt that it would be better to add it later than to have people submit unusable information.
- Removed linkPolicyURL. It was felt that supporting this item would be detrimental to the goal of automated link policies.
- Replaced linkPolicy with restriction and allowedContext based on the version 3 legal framework.
- Added mime-type.
- 8/26/99
- Added the "Redirection" Allowed Context.
- 11/10/99
- Noted Openly's "Link.Openly" software.
- added ":(){}[]!;"" to list of removed punctuation during normalization
- added the trimPunctuation normalization
- added download for unicode to ascii table.
- changed removePunctuation to trimPunctuation in normalizations for &volume;, &issue;, &startPage;, &endPage;, &pSeq;, &itemNum;
- added &jTitle; entity to template.
- redundant allowed context removed
- the explanation of the template property was changed. In previous versions, the value of the property was allowed to be "literal" XML content conforming to the S-Link-S DTD. Unfortunately, this forced validation, because of the external entities typically used in S-Link-S XML. Thus, practical considerations forced us to abandon true XML content for this item; instead, the XML document is URL encoded and embedded as a string in the RDF. It's still OK to use an external URI.
- added webService core property; added note about properties implied by the linkMethod properties
- 12/7/2000
- Added link to Link.Openly Server
- Minor typos corrected
- Added new element- "notEmpty"
- 3/20/2001
- Corrections courtesy of John Punin
- Added note about using URI resources in RDF. I should really come back and fix rather than just the little bandaid.
- Typo in RDF example corrected
- 7/11/2001
- Additions to deal with the real world
- added &doi; entity
- added <encode> element to deal with URL encoding
- added <param> element to help OpenURL parameter passing
- added <notRequired> element to help requirement analysis
- 5/26/2002
- corrections
- the lookUpTable example was not well formed; item elements should be empty
- 5/4/2003
- Additions
- <scratch> element added
- <postItem> element added
- encoding attribute added to <encode> and <postArgs>
- content model for <postArgs> is now postItem*
- added resultType attribute to root element
- updates in text
- 8/2/2004
- Additions
- isCheck attribute added to postItem
- 8/2/2005
- Overdue modernization
- Removed RDF section to separate document
- added function to parsedDate
- added uAuthLast
- 9/28/2005
- Additions
- 7/11/2006
- Revisions
- Openly is now part of OCLC
- Links to corrected DTD v1.15
- Updated introductory matter
Copyright
and Permitted Use
Author:
Eric S. Hellman, eric@openly.com
Copyright
(1998-2006) by OCLC Online Computer Library Center, Inc.
All
Rights Reserved.
Redistribution
and use in source and binary forms, with or without modification, are permitted
provided that the following conditions are met:
- Redistributions
of source must retain the above copyright notice, this list of conditions and
the following disclaimer.
- Redistributions
in binary form must reproduce the above copyright notice, this list of
conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution.
- The
name of the author may not be used to endorse or promote products derived
from this software without specific prior written permission.
- "S-Link-S" is a Service Mark of OCLC Online Computer Library Center, Inc. The S-Link-S
name and logo may not be used to endorse or promote
products derived from this specification without specific prior written
permission of OCLC Online Computer Library Center, Inc.
THIS
INFORMATION IS PROVIDED "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
OF SUCH DAMAGE.