Intersections in cyberspace 2: Centrelink and DIAC multilingual content
September 28th, 2007 by Andrew Cunningham
In part 1 I indicated that I would discuss internationalization of Australian Government websites with respect to accessibility by referencing two particular federal government websites. In this article I’ll be discussing the Centrelink and DIAC websites.
My first impressions of these sites’ approach to content in community languages are twofold. Firstly, the solutions they use for non-English content hark back to 1995 and 1996 rather than 2007; I’ve seen more sophisticated approaches to some of these languages coming from web developers in third world countries. Secondly, their approaches violate mandatory federal government web publishing requirements; specifically, they fail to meet level A checkpoints in WCAG 1.0 relating to provision of alternative text for images and marking up change of language. These failures tend to be specific to content in community languages.
Background
Federal Government agencies’ compliance with the Federal Government’s Web Publishing Guide on accessibility is mandatory. The Accessibility Web Publishing Guide builds on the Government’s access and equity policies. Within this context accessibility also implies that a website is usable regardless of the user’s linguistic and cultural background. The guide indicates that:
Australian Government departments and agencies are also required to maximise their use of new technologies by ensuring that their websites address access and equity issues for people from culturally and linguistically diverse backgrounds.
Audience
Access methods were briefly discussed in the Community Languages Online report. This idea of access models in Government and community websites was articulated in the NSLA discussion paper Languages in transition and other sources.
Two of the access models articulated in these reports are:
- mediated access; and
- direct access.
The mediated access model is the access model most commonly used for government information in community languages. Translated content is most often made available as PDF files. Access and navigation to these documents is via an English language user interface. It is assumed that mediators (service providers, community leaders, etc.) would locate the information, print it out and make it available to the end user. The intended user of the website and the intended user of the translated information are seen as separate, distinct audiences. Often little thought is given to how to promote the content to target mediators, encourage mediators to access and distribute the information, and how to resource mediators to enable them to do so.
Often translated information is placed on a government website because the agency already has the PDF files handy, rather than as part of a well planned and implemented communications strategy.
In the direct access model, links from the front page of the website allow people with low levels of English literacy to navigate to and access information in their mother language. In this model the intended audience of the website and the intended audience of the translated information coincide. Currently it is rare for government websites to use a direct access model.
Through accident, rather than design, the Centrelink and DIAC websites fall into the mediated access model.
DIAC
On the DIAC website, the index page linking to the various translations is buried within the site: you must navigate through five levels of English language content before you have the option to select a language. Direct access to the language index pages is not possible from the site’s homepage.
The irony and contradiction occur with subsequent navigation. Once you have located the index page for the Beginning a life in Australia booklets and chosen a language, the language specific page that follows is in the target language, with no corresponding English language text. There is no method of navigating to this information in language. Access needs to be mediated, yet the final pages you reach are designed for the end user, not the mediator, to choose a PDF document.
Centrelink
The Centrelink site also has its oddities, although somewhat different in nature. Centrelink has an English language link to its translated content, allowing mediators to access all translated content. Simultaneously the Centrelink site uses an animated GIF containing images of various translations of “We speak your language” to navigate to an index page listing all available languages.
At this level, it would appear that Centrelink is attempting to provide both mediated and direct access. Once you navigate to the languages’ index page, it is possible to navigate to an index page listing all the documents available in your language. These language index pages are strange. The introductory welcome text is written in the target language while the titles of the documents are listed in English.
Centrelink provides direct access in community languages up to a certain level, but at the last stage requires you to use English to select the document you need.
Ultimately the Centrelink documents require mediation, unless you have sufficient English literacy to negotiate document titles laden with Centrelink jargon.
Character encodings
One of the core issues for supporting non-English language text is the capacity of the content management system to support multilingual content. In cases where a website includes content in community languages and the internet is intended to be used as a communication medium to provide information in community languages, a migration strategy should be put into place to ensure the acquisition of a CMS capable of supporting the range of community languages required.
An interesting puzzle about the Centrelink website is that English language content is published in the UTF-8 character encoding, while non-English content is embedded as images. The DIAC website, on the other hand, uses the more restrictive ISO-8859-1 (Latin 1) character encoding.
The W3C has published a tutorial on declaring the document encoding.
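To illustrate why ISO-8859-1 is the more restrictive choice, here is a minimal Python sketch that checks whether a piece of text can be represented in a given encoding. The helper names `can_encode` and `coverage` and the sample strings (language names written in their own scripts) are assumptions made for illustration; none of them come from either website.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative sample strings (assumed): language names in their own scripts.
SAMPLES = {
    "es": "Español",
    "vi": "Tiếng Việt",
    "zh": "中文",
    "km": "ភាសាខ្មែរ",
}


def coverage(encoding: str) -> dict:
    """Map each sample language code to whether the encoding can hold it."""
    return {lang: can_encode(text, encoding) for lang, text in SAMPLES.items()}
```

With these samples, `coverage("iso-8859-1")` accepts only the Spanish string, while `coverage("utf-8")` accepts all four.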
Language tagging
W3C internationalization best practice techniques for Specifying Language in XHTML & HTML Content include the recommendation that you declare the language of content by using lang and xml:lang attributes (as appropriate) in the html element to set the default language for the whole document. Language attributes are then used on any element containing content in a different language to indicate change of language.
My understanding of the W3C Internationalization model is that the text processing language should be identified and identifiable for every element in a document.
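One way to picture this model is a rough Python sketch that walks a page and records the effective language of each run of text, inheriting `lang` (or `xml:lang`) from the nearest ancestor that declares it. The class name `LanguageTracker`, the `runs` list and the decision to treat an image's alt text as a text run are assumptions made for this illustration, not part of any W3C tool.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class LanguageTracker(HTMLParser):
    """Collect (text, language) pairs, inheriting lang from ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, effective language) for each open element
        self.runs = []   # (text, effective language or None)

    def _current_lang(self):
        return self.stack[-1][1] if self.stack else None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        lang = (attributes.get("lang") or attributes.get("xml:lang")
                or self._current_lang())
        if tag in VOID_ELEMENTS:
            # Alt text is read out in place of the image, so record it too.
            if tag == "img" and attributes.get("alt"):
                self.runs.append((attributes["alt"], lang))
            return
        self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # Pop back to the most recent matching open element, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((text, self._current_lang()))
```

Any run whose language comes back as `None`, or as the page default when the text is clearly in another language, is a candidate for missing language markup.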
WCAG 1.0 distinguishes between the need to identify any changes in the natural language of a document (level A) and the identification of the primary language of the document (level AAA), giving preferential weighting to marking up change in natural language without necessarily having to identify the initial language used by the document (for those sites seeking to comply with level A or AA checkpoints).
In the current WCAG 2.0 draft, however, the priority has been reversed. “Specifying the default human language of each Web page” is a level A requirement and “specifying the human language of each passage or phrase in the content” is a level AA checkpoint.
Content
As with other Australian government websites, most content on the Centrelink and DIAC web sites is in English, with selected content in community languages.
The community language content on these sites falls into two formats: PDF files and images. The images may be images of individual words or phrases on one hand or images of multiple paragraphs of text on the other.
It is important to note that there is no community language text in the HTML documents on the websites. All non-English text exists only in PDF documents or in images. I’ll leave discussions of PDF files for another post. I’ll focus on Centrelink and DIAC’s use of images as a mechanism for rendering non-English text.
Centrelink language index page
The Centrelink languages index page is a UTF-8 encoded web page. The web page has a declared default language of English. The page contains a series of images. Each image is a link to an index page containing a list of documents in a specific language.
The text within the image is a translation of the Centrelink tagline “We speak your language”.
The alt text for each image is in English. For instance, the image containing the text “Noi parliamo al tua lingua” has an alt value of “Italian Language Publications”.
<a href="/internet/internet.nsf/languages/it.htm"><img alt="Italian Language Publications" src="/internet/internet.nsf/filestores/images_languages/$file/italian.gif" height="30" width="185" /></a>
There are a couple of issues here:
- the mismatch of languages; and
- the use of images where text based links could have been used
An alternative representation that utilises the images and uses an Italian language alt attribute would have been:
<a href="/internet/internet.nsf/languages/it.htm"><img lang="it" xml:lang="it" alt="Noi parliamo al tua lingua" src="/internet/internet.nsf/filestores/images_languages/$file/italian.gif" height="30" width="185" /></a>
A simpler and much better alternative would be to get rid of the image altogether and use a text based link:
<a href="/internet/internet.nsf/languages/it.htm" lang="it" xml:lang="it">Noi parliamo al tua lingua</a>
Some web browsers render the title attribute as a tooltip. This allows an interesting compromise for bilingual access. Assuming the default language for the index page is English:
<span title="Italian Language Publications"><a href="/internet/internet.nsf/languages/it.htm" lang="it" xml:lang="it">Noi parliamo al tua lingua</a></span>
The link is in Italian and marked up as Italian text, but a pointer in the code, and for some web browsers a visual cue, in English assists monolingual web developers and information mediators.
If you want to go one step further, you could also add the hreflang attribute to the a element. See Identifying the language of a target document for further information.
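If links like these are generated by a template or CMS rather than written by hand, the pattern can be captured in a small helper. The following is only a sketch in Python: the function name `language_link` and its parameters are hypothetical, and it assumes the target document is in the same language as the link text, so the same code is reused for `hreflang`.

```python
from html import escape
from typing import Optional


def language_link(href: str, label: str, lang: str,
                  title: Optional[str] = None) -> str:
    """Build a text link marked up with lang, xml:lang and hreflang.

    If title is given (for example an English description), the link is
    wrapped in a span carrying that title, so the English text sits
    outside the element marked up as the target language.
    """
    link = '<a href="{0}" lang="{1}" xml:lang="{1}" hreflang="{1}">{2}</a>'.format(
        escape(href, quote=True), escape(lang, quote=True), escape(label)
    )
    if title is None:
        return link
    return '<span title="{0}">{1}</span>'.format(escape(title, quote=True), link)
```

Called with the Italian link from the examples above and the title “Italian Language Publications”, it produces the same span-wrapped markup with an added `hreflang="it"`.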
DIAC language index page
The language index page for the Beginning a life in Australia booklets is an English language page that uses images as buttons to each language specific index page.
Looking at the link to the Spanish page:
<a href="/living-in-australia/settle-in-australia/beginning-life/select/spa.htm">
<img src="/living-in-australia/_images/btn_spa.gif" alt="Spanish" />
</a>
The image has the Spanish word “Español” within the image and an English alt attribute value. This would be better marked up as:
<a href="/living-in-australia/settle-in-australia/beginning-life/select/spa.htm" lang="es" xml:lang="es">
Español</a>
or
<span title="Spanish"><a href="/living-in-australia/settle-in-australia/beginning-life/select/spa.htm" lang="es" xml:lang="es">
Español</a></span>
Centrelink language specific pages
As mentioned above, language specific pages on the Centrelink web site suffer from a split personality, uncertain which audience they are intended for. I’ll use the Italian publications page as an example.
The title of the page is in English. The title is followed by a welcome message in Italian and four short paragraphs in Italian describing the content. This introductory text is then followed by a list of document titles which are links to PDF files. The document titles are in English.
I’ll focus on the Italian text. Briefly:
- The web page is UTF-8 encoded
- Default language for the web page is English
- There is no markup indicating changes in language
- All Italian text is made available as a single GIF image
The unique content in Italian is provided as:
<p><img src="/internet/internet.nsf/languagelist2/Italian/$file/Italian.gif" height="300" width="500" alt="Italian Header"></p>
The alt attribute value to describe twelve lines of Italian text is “Italian Header”. Not exactly in the spirit or the letter of WCAG 1.0.
If the page is intended for direct access, the titles should be in Italian, not English. If the content is intended to be used via mediation, this whole section is redundant because the site already provides an alternative method for mediated access.
That amount of text should never be included in an image, especially when the CMS apparently supports Unicode. It would be so simple to provide the Italian in the image as actual Unicode text.
The only languages that would provide technical difficulties in rendering Unicode text are Burmese (Myanmar), Sinhala and Khmer, but they can be done as proved by websites within Sri Lanka, Cambodia and the Myanmar Union.
DIAC language specific pages
I’ll use the Spanish language index to the Beginning a life in Australia booklets as an example. The Spanish language page contains an introduction in Spanish and a series of links to the PDF files for each state and territory version of the booklet.
A summary of the page’s features:
- The web page is ISO-8859-1 encoded
- Default language for the web page is English
- There is no markup indicating changes in language
- All Spanish text is made available as images
All the text, including the links, is a single image that has been spliced to act as an image map. This is not optimal for web accessibility and runs counter to good internationalisation techniques.
Like the Centrelink site, there are a couple of languages present that will provide technical difficulties: Burmese (Myanmar) and Khmer. Karen will provide more severe technical difficulties. The Karen languages, written in the Myanmar script, are not supported in the current version of Unicode. For further information look at the post on the Myanmar script.
Alternative formats
One interesting aspect of accessibility that often gets overlooked is the need at times to provide non-text alternatives in order to facilitate access to information. Certain new and emerging communities have low mother language literacy rates.
Centrelink has responded by developing CD-ROM based solutions involving both text and audio components. The next step would be to make the audio components of these documents available on the Centrelink website.
It would be possible to have the text of a document in a web page and embed an object to play an audio file. Alternatively a link can be provided to download and play the audio file. A couple of sites have already experimented with this approach, including the Kids Count site.
Other possibilities
Text based non-English links can be made more visually attractive by applying background images. W3C have a useful article on using background images for aiding in localising websites: background images that support localization.
The mechanism for navigation from the homepage to translated content is very important. If a small number of languages are involved, it is possible to list each language. A text based link would be preferable to using an image of text. The use of flags to indicate language should be avoided.
For a longer list of languages, it is possible to either have direct links to key languages with an additional English link to an index of all languages or simply have a single link in English to an index of languages. Animated images with scrolling or rotating text should be avoided.
A list of language codes acting as links to translated content may be too obscure for some users.
A couple of government websites are experimenting with using the national interpreter symbol as a link to translated text. This approach has both its strengths and weaknesses, although a standard icon across government sites to indicate the presence of translated content could be useful.
W3C also have a useful article on using the select element to provide access to translated content.
Posted in Accessibility, Web i18n
October 12th, 2007 at 10:22 am
Driven by new NSW Govt web standards, the NSW Food Authority is attempting to do both direct and mediated access. Project is not yet rolled out to all targeted languages but we are most progressed for Chinese and Vietnamese. There is a homepage ‘in-languages’ link graphic - perhaps too subtle. Then a series of side-by-side ‘in-language’ paths into the site. (see e.g. http://www.foodauthority.nsw.gov.au/n-chinese.asp and http://www.foodauthority.nsw.gov.au/n-vietnamese.asp). It’s no doubt not yet perfect.
My observation though, is that the consistency required to achieve this was possible in our small case through centralisation of the publication approval process and even to a certain extent the content generation process. How large organisations and federated publication processes would achieve a similar outcome in a cost-effective way is no small feat notwithstanding an equitable and worthwhile one.
Also that our case was coded manually; we have not yet put our pilot CMS to the test of 11 languages mixed with English, with compliant code.
I’d be very interested in any CMSs that participants feel handle multiple LOTE well.
October 12th, 2007 at 12:07 pm
The side-by-side language approach is quite effective for index pages. One issue is how a multilingual website addresses accessibility requirements. One of the WCAG 1.0 priority 1 checkpoints requires marking up change in language. So Chinese and Vietnamese text needs to be appropriately identified in the markup.
Language tagging is also useful for CJK data. Using your site as an example: the site contains both Chinese and Japanese content. The stylesheets for the site do not specify the fonts to be used for Chinese and Japanese text.
When fonts aren’t specified, the web browsers will choose an appropriate CJK font based on the language tag. If the language is not specified in the markup the browser will fall back to the default font for CJK text.
Richard Ishida put together a test on web browsers’ automatic font assignment for CJK text. IE7 will default to a Simplified Chinese font for all CJK text while Firefox and IE6 will default to a Japanese font instead. Simplified Chinese text or Traditional Chinese text should not be rendered using a Japanese font. Likewise Japanese text should not be rendered using a Simplified Chinese font.
As to CMSs that support LOTE well, I don’t have a preference for any one CMS. My preference is for a CMS that is flexible and easily customised. I like to track the language of any content so that language changes can be marked up. I distinguish between the default language of a web page and the user interface language.
Internally within our own projects we have been developing PHP classes and functions that handle some of the internationalization issues. These are used in conjunction with templating systems and CSS to achieve better results within various web applications.
One important aspect, especially if Vietnamese is one of the languages that needs to be supported, is that the CMS chosen must support Unicode normalization. When you edit content, that content needs to be normalized when submitted. Before a search term is processed, the search term should be normalised. If normalisation does not occur some Vietnamese clients will not be able to search Vietnamese on a website.
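As a rough illustration of the point about normalisation, here is a minimal Python sketch that normalises both stored content and search terms before comparing them. The choice of Normalization Form C (NFC) and the function names `normalise` and `search` are assumptions made for this example; the same idea applies in PHP or any other language a CMS is built on.

```python
import unicodedata


def normalise(text: str) -> str:
    """Return text in Unicode Normalization Form C (NFC); the form is an assumption."""
    return unicodedata.normalize("NFC", text)


def search(documents, term):
    """Return the documents containing term, comparing NFC-normalised text."""
    needle = normalise(term)
    return [doc for doc in documents if needle in normalise(doc)]


# The same Vietnamese word typed two ways by different keyboards or editors.
PRECOMPOSED = "Vi\u1ec7t"        # "e with circumflex and dot below" as one code point
DECOMPOSED = "Vie\u0323\u0302t"  # e + combining dot below + combining circumflex
```

The two strings look identical on screen but differ as code point sequences, so a plain substring comparison between them fails. After normalisation, `search([PRECOMPOSED], DECOMPOSED)` finds the document.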