Below we describe how the detect* and do* functions of Zotero translators can and should be coded. If you are unfamiliar with JavaScript, work through a JavaScript tutorial first to get familiar with the syntax. In addition to the information on this page, it is often very informative to look at existing translators to see how things are done.
Web Translators
detectWeb
detectWeb is run to determine whether item metadata can be retrieved from the webpage. The return value of this function should be the detected item type (e.g. “journalArticle”, see the overview of Zotero item types) or, if multiple items are found, “multiple”.
detectWeb receives two arguments, the webpage document object and URL (typically named doc and url). In some cases, the URL provides all the information needed to determine whether item metadata is available, allowing for a simple detectWeb function, e.g. (example from Cell Press.js):
function detectWeb(doc, url) {
    if (url.indexOf("search/results") != -1) {
        return "multiple";
    } else if (url.indexOf("content/article") != -1) {
        return "journalArticle";
    }
}
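When the URL alone is not enough, detectWeb can also inspect the document. The following is a minimal sketch and is not taken from an existing translator; the URL fragment and the CSS selectors (a.result-title, meta[name="citation_title"]) are assumptions chosen for illustration.

```javascript
// Hypothetical detectWeb that falls back to inspecting the page.
// The URL fragment and the selectors below are illustrative assumptions.
function detectWeb(doc, url) {
    // Search pages: report "multiple" only if result links are present.
    if (url.indexOf("search/results") != -1) {
        if (doc.querySelectorAll("a.result-title").length > 0) {
            return "multiple";
        }
        return false;
    }
    // Article pages: look for a citation_title meta tag in the document.
    if (doc.querySelector('meta[name="citation_title"]')) {
        return "journalArticle";
    }
    // Nothing recognizable on this page.
    return false;
}
```

Returning false here signals that no item was detected, so the translator is not offered for the page.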
doWeb
doWeb is run when a user, wishing to save one or more items, activates the selected translator. Sidestepping the retrieval of item metadata, we'll first focus on how doWeb can be used to save retrieved item metadata (as well as attachments and notes) to your Zotero library.
Saving Single Items
Metadata
The first step towards saving an item is to create an item object of the desired item type (examples from “NCBI PubMed.js”):
var newItem = new Zotero.Item("journalArticle");
Metadata can then be stored in the properties of the object. Of the different fields available for the chosen item type (see the Field Index), only the title is required. E.g.:
var title = article.ArticleTitle.text().toString();
newItem.title = title;
var PMID = citation.PMID.text().toString();
newItem.url = "http://www.ncbi.nlm.nih.gov/pubmed/" + PMID;
After all metadata has been stored in the item object, the item can be saved:
newItem.complete();
This process can be repeated (e.g. using a loop) to save multiple items.
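Put together, the steps above could be repeated in a loop roughly as follows. This is a sketch only: the records array and its title and pmid properties are hypothetical stand-ins for whatever metadata the translator has already retrieved.

```javascript
// Sketch: save one Zotero item per record.
// `records` is a hypothetical array of plain objects with `title` and `pmid`.
function saveRecords(records) {
    for (var i = 0; i < records.length; i++) {
        var newItem = new Zotero.Item("journalArticle");
        // Only the title is required; other fields are optional.
        newItem.title = records[i].title;
        newItem.url = "http://www.ncbi.nlm.nih.gov/pubmed/" + records[i].pmid;
        // complete() hands the finished item over for saving.
        newItem.complete();
    }
}
```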
Attachments
Attachments may be saved alongside item metadata via the item object's attachments property. Common attachment types are full-text PDFs, links and snapshots. An example from “Pubmed Central.js”:
var linkurl = "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC" + ids[i] + "/";
newItem.attachments = [{
    url: linkurl,
    title: "PubMed Central Link",
    mimeType: "text/html",
    snapshot: false
}];
var pdfurl = "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC" + ids[i] + "/pdf/" + pdfFileName;
newItem.attachments.push({
    title: "PubMed Central Full Text PDF",
    mimeType: "application/pdf",
    url: pdfurl
});
An attachment can only be saved if the source is indicated. The source is often a URL (set on the url property), but can also be a file path (set on path) or a document object (set on document). Other properties that can be set are mimeType (“text/html” for webpages, “application/pdf” for PDFs), title, and snapshot (if the latter is set to false, an attached webpage is always saved as a link).
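To illustrate the different source properties, the sketch below attaches the current page via its document object and, when one is known, a PDF via its URL. The helper name addAttachments, the attachment titles, and the pdfUrl argument are hypothetical.

```javascript
// Sketch: attach a snapshot of the current page and, if known, a PDF.
// `pdfUrl` is a hypothetical value the translator has already extracted.
function addAttachments(newItem, doc, pdfUrl) {
    newItem.attachments = newItem.attachments || [];
    // Source given as a document object.
    newItem.attachments.push({
        title: "Snapshot",
        mimeType: "text/html",
        document: doc
    });
    // Source given as a URL.
    if (pdfUrl) {
        newItem.attachments.push({
            title: "Full Text PDF",
            mimeType: "application/pdf",
            url: pdfUrl
        });
    }
    return newItem;
}
```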
Notes
Notes are saved similarly to attachments. The content of the note, which must be a string, is stored in the note property of an object added to the item's notes property. A title, stored in the title property, is optional. E.g.:
bbCite = "Bluebook citation: " + bbCite + ".";
newItem.notes.push({note:bbCite});
Saving Multiple Items
Some webpages, such as those showing search results or the index of a journal issue, list multiple items. For these pages, web translators can be written to a) allow the user to select one or more items and b) batch save the selected items to the user's Zotero library.
Item Selection
To present the user with a selection window that shows all the items that have been found on the webpage, a JavaScript object should be created. Then, for each item, an item ID and label should be stored in the object as a property/value pair. The item ID is used internally by the translator, and can be a URL, DOI, or any other identifier, whereas the label is shown to the user (this will usually be the item's title). Passing the object to the Zotero.selectItems function will trigger the selection window, and the function will return the items that the user selected. An example from ESpacenet.js:
if (detectWeb(doc, url) == "multiple") {
    var items = new Object();
    ...
    while (next_title = titles.iterateNext()) {
        items[next_title.href] = Zotero.Utilities.trim(next_title.textContent);
    }
    items = Zotero.selectItems(items);
    ...
}
For compatibility with Zotero Connectors, Zotero.selectItems should preferably be called with a callback function as the second parameter. This callback function receives the object with the selected items. We need an example of a translator that does this.
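Until an example from an existing translator is available, the callback form might look roughly like the sketch below. It is not taken from a real translator: the selector a.result-title and the scrape function passed in by the caller are hypothetical, and the guard for an empty or cancelled selection is an assumption about what the callback receives in that case.

```javascript
// Sketch of the callback form of Zotero.selectItems.
// `scrape` is a hypothetical function that saves one item from a document.
function saveSelected(doc, scrape) {
    var items = {};
    var links = doc.querySelectorAll("a.result-title");
    for (var i = 0; i < links.length; i++) {
        // Item ID (here the URL) maps to the label shown to the user.
        items[links[i].href] = Zotero.Utilities.trim(links[i].textContent);
    }
    Zotero.selectItems(items, function (selectedItems) {
        // Assumed: a falsy value means nothing was selected.
        if (!selectedItems) {
            return;
        }
        var urls = [];
        for (var id in selectedItems) {
            urls.push(id);
        }
        // Load each selected page and pass its document to scrape.
        Zotero.Utilities.processDocuments(urls, scrape);
    });
}
```

Because the selection arrives in the callback rather than as a return value, all further work on the selected items has to happen inside that callback.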
Batch Saving
Asynchronous
You will often need to make additional requests to fetch all the metadata needed, either to make multiple items, or to get additional information on a single item. The most common and reliable way to make such requests is with the utility functions Zotero.Utilities.doGet, Zotero.Utilities.doPost, and Zotero.Utilities.processDocuments.
Zotero.Utilities.doGet(url, callback, onDone, charset) sends a GET request to the specified URL or to each in an array of URLs, and then calls the function callback with three arguments: response string, response object, and the URL. This function is frequently used to fetch standard representations of items in formats like RIS and BibTeX. The function onDone is called when the input URLs have all been processed. The optional charset argument forces the response to be interpreted in the specified character set.
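As a rough illustration of how callback and onDone work together, the sketch below collects the response text for each URL and hands the full list over once every request has finished. The names fetchRecords and handleRecords are hypothetical.

```javascript
// Sketch: fetch several records as text with doGet.
// `handleRecords` is a hypothetical function supplied by the caller.
function fetchRecords(urls, handleRecords) {
    var records = [];
    Zotero.Utilities.doGet(
        urls,
        // callback: runs once per URL with the response text.
        function (text, response, url) {
            records.push({ url: url, text: text });
        },
        // onDone: runs after all URLs have been processed.
        function () {
            handleRecords(records);
        }
    );
}
```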
Zotero.Utilities.doPost(url, postdata, callback, charset) sends a POST request to the specified URL (not an array), with the POST string defined in postdata, and then calls the function callback with two arguments: response string and response object. The optional charset argument forces the response to be interpreted in the specified character set.
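Since postdata is a plain string, it usually has to be assembled and URL-encoded by the translator. A minimal sketch follows; the helper names and the parameter names format and ids are hypothetical and depend entirely on what the target site expects.

```javascript
// Sketch: build a form-encoded POST body and send it with doPost.
// The parameter names used in requestExport are hypothetical.
function buildPostData(params) {
    var pairs = [];
    for (var key in params) {
        pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(params[key]));
    }
    return pairs.join("&");
}

function requestExport(exportUrl, ids, callback) {
    var postdata = buildPostData({ format: "ris", ids: ids.join(",") });
    // callback receives the response string and the response object.
    Zotero.Utilities.doPost(exportUrl, postdata, callback);
}
```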
Zotero.Utilities.processDocuments(url, callback, onDone, charset) sends a GET request to the specified URL or to each in an array of URLs, and then calls the function callback with XXXXX arguments: DOM document object, URL, and XXXX. The optional charset argument forces the response to be interpreted in the specified character set. This is approximately the equivalent of doGet, except that it returns DOM document objects instead of strings.
Note: The response objects passed to the callbacks above are described in detail in the MDC Documentation.
Zotero.Utilities.processAsync(sets, callbacks, onDone) can be used from translators to make it easier to correctly chain sets of asynchronous callbacks, since many translators that require multiple callbacks do it incorrectly [text from commit message, r4262].
Synchronous
Note: While synchronous loading of sources is easier to implement, it should be avoided in new code to ensure compatibility with Zotero Connectors.
Webpages can be loaded synchronously with Zotero.Utilities.retrieveDocument, which requires a URL as its argument and returns a DOM document object, e.g. (example from Nagoya University OPAC.js):
for (var url in items) {
    var doc = Zotero.Utilities.retrieveDocument(url);
    scrapeAndParse(doc, url);
}
Metadata documents can be loaded synchronously using Zotero.Utilities.retrieveSource. This function can be called with only a URL, in which case a GET request is executed, or with additional body, headers and responseCharset parameters, in which case a POST request is executed. The body, headers and responseCharset parameters are, respectively, the request body to POST to the URL, the HTTP headers to include in the request, and the character set to force on the response. An example of Zotero.Utilities.retrieveSource used for a GET request (from Google Scholar.js):
var bibtexData = Zotero.Utilities.retrieveSource(this.bibtexLink);
Cross-Domain Restrictions
Note that all the above functions are affected by Firefox's HTTP Access Control. See the linked article at the Mozilla Developer Center for more details, but the gist of it is that Zotero.Utilities.retrieveDocument and Zotero.Utilities.processDocuments will not in general work when a page on one domain requests documents from another domain. Such arrangements are actually fairly common for site index and search pages. The other functions, like Zotero.Utilities.doGet, will work, but the response will be a plain text string, which will usually have to be processed using regular expressions rather than XPath or other DOM-based approaches.
When such HTTP Access Control prevents an action, you will see an error like this in the error console or debug output:
00:41:50 Translation using Test failed:
message => Permission denied to access property 'documentElement'
fileName => chrome://zotero/content/xpcom/translation/browser_firefox.js
lineNumber => 451
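When only the text response from Zotero.Utilities.doGet is available, a value can be pulled out of the raw HTML with a regular expression. The helper below is a sketch under assumptions: the name extractMetaContent is hypothetical, and the pattern expects the name attribute to appear before the content attribute, which will not hold for every page.

```javascript
// Sketch: pull a <meta> value out of raw HTML text.
// Assumes the name attribute precedes the content attribute.
function extractMetaContent(html, name) {
    // Escape regular expression metacharacters in the meta name.
    var escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    var re = new RegExp(
        '<meta[^>]*name=["\']' + escaped + '["\'][^>]*content=["\']([^"\']*)["\']',
        "i"
    );
    var match = re.exec(html);
    return match ? match[1] : null;
}
```

Inside a doGet callback, the response string would be passed as the html argument, for instance with "citation_title" as the name to look up.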
Import Translators
Export Translators
Search Translators
Utility functions
Zotero provides several utility functions for translators to use. Some of them are used for asynchronous and synchronous HTTP requests; those are discussed above. In addition to those HTTP functions and the many standard functions provided by JavaScript, Zotero provides:
Zotero.Utilities.capitalizeTitle(title, ignorePreference)
Zotero.Utilities.cleanAuthor(author, creatorType, hasComma)
Zotero.Utilities.trimInternal(text)
Working with the Translator object
Methods
Zotero.loadTranslator(type)
translator.setSearch(OpenURL ContextObject)
For search translators. Takes a skeleton item object and …
translator.setString(string)
For import translators. Sets the string that the translator will import from.
translator.getTranslators()
Returns an array of translators that should be able to run on the given data. That is, those translators that return a non-false value for detectImport, detectSearch or detectWeb when passed the input given with setString, setSearch, etc.
translator.setTranslator(translator)
Takes a translator object (returned by getTranslators(..)), or the UUID of a translator.
translator.setHandler(event, callback)
translator.translate()
Calling an import translator
use RIS as an example, then maybe MARC
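Pending a full write-up, calling the RIS import translator with the methods listed above might look roughly like this sketch. The function name importRis is hypothetical, and the UUID is assumed to be that of the RIS translator shipped with Zotero; verify it against the RIS.js header before relying on it.

```javascript
// Sketch: hand RIS text to the RIS import translator.
// The UUID below is assumed to identify the bundled RIS translator.
function importRis(risText) {
    var translator = Zotero.loadTranslator("import");
    translator.setTranslator("32d59d2d-b65a-4da4-b0a3-bdd3cfb979e7");
    translator.setString(risText);
    // itemDone runs once per parsed item; complete() saves it.
    translator.setHandler("itemDone", function (obj, item) {
        item.complete();
    });
    translator.translate();
}
```

The itemDone handler is also the place to adjust an item (for example, to add attachments) before it is saved.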
Calling a search translator
(from COinS.js:53-67)
var search = Zotero.loadTranslator("search");
search.setHandler("itemDone", function(obj, item) {
    newItems.push(item);
});
search.setHandler("done", function() {
    retrieveNextCOinS(needFullItems, newItems, couldUseFullItems, doc);
});
search.setSearch(item);
// look for translators
var translators = search.getTranslators();
search.setTranslator(translators);
search.translate();
Translator Framework
Many web translators can be written in a simplified form by using the Translator Framework, a library for translator development. Translators written in this way consist of simple sets of rules for scraping item metadata from specified portions of the page. See the Translator Framework page for details.