Tuesday, 13 December 2011

Shift from 3.6 to 8.0 - looks like it works

The weekly report for SL for the week of 2011-12-11 states:

The biggest changes are in 7.0 (-30.6%) and in 3.6 (-13%). While the drift from 7.0 to 8.0 has been around for some time and was thus expected, the drop in 3.6 (amounting to 17.10% with 3.5 thrown in) could well be a result of the "gentle prodding" activated lately, with users of those versions getting a note about upgrading. The increase in 8.0, in spite of the overall drop of 3%, supports this gut feeling.

Tuesday, 15 November 2011

How can you improve the average quality of spoken English?

Simple: stop trying and give Anglo-Saxons a chance at your own mother tongue.

From Süddeutsche Zeitung No. 263, p. 11: Audienzgerecht
Would anybody come to Germany to get exposed to bad English?
German as a language of science? A tricky question.


In Lessing's "Minna von Barnhelm" the baffled Frenchman de Riccaut la Marliniere asks: "Vous speakez pas French, Madame?" and the heroine gives him a straightforward, common-sense answer: "Sir, I would give it a try in France." This proud modesty in relation to our own mother tongue is missing everywhere in this country.

A New Zealander of Chinese origin, who has lived in Germany for years, complained that she hardly gets a chance to speak German: even the clerk at the post office counter will grab the opportunity to practice his English on her. This is rather rude of Germans, even when meant as a polite gesture. A participant in the discussion put it in a nutshell: do we seriously expect that foreign students will come to Germany to listen to lectures in bad English?...


Sunday, 6 November 2011

Thar She Blows - Quantitative Look at the Localize.Mozilla.org site


Note: this is a draft version of the article and will be changed and expanded in the future. Stay in touch ;) The intention of this communication is to look into the rear-view mirror presented at this site and learn how we get the work done.

A number of localization projects (SuMo, AddOns and MDN, to name the biggest) are run via the Localize/Verbatim site.

Here are the basic statistics of the work in progress:
  • active projects:  13
  • languages involved: 85
  • number of active l10n projects:  335
The data discussed and analyzed here have been culled from this site. I apologize in advance for possible inconsistencies. Note also that the site is alive and kicking and changes all the time. Still, I do not think the changes would alter the results or the overall impression presented here.

About projects

Here's the diagram of running projects (sorted by percent finished):

Fig 1: Degree of completion - overall in %

The bars above are all scaled to the same total size of 100% on the web site. Using a different metric, i.e. the number of words involved, the situation looks quite different:

Fig 2: Degree of completion - overall in words

This is a much more relevant point of view - after all, we localize words and not percentages. Note that Addons Mozilla (325k) and Support Mozilla (190k) together have an amount of red twice the size of the green in the rest of the projects.

For every project and language involved, the date and time of last access is also recorded. Below is the amount of unfinished work in words vs. the date of last access:

Fig 3: Amount of unfinished work vs date of last access

For 41% of the unfinished business the date of the last access is unknown. However, given the level of "unfinished" for these records one can safely assume they have never been really active. The following diagram shows the average levels of project completion  for the four date/time slots:

Fig 4: Percent unfinished vs date of last access

The "unknown" slot (leftmost column) could indeed be named "never". The other three time slots tell quite a different story: they paint a much greener picture. The rising trend shows the site is alive: fresh new projects turn up to be localized, while older projects may be getting a fresh layer of paint.

About languages

The above analysis gives an overall picture. What can we say about the different languages and their record? The analysis below is based on the following data for each language:
  • the number of ongoing projects
  • the total number of words in the portfolio
  • the average completion level
Altogether 335 records for 85 languages (Afrikaans - White Russian) x 12 projects (Addons - Webify me) have been clustered using Ward's method:

Fig 4: dendrogram of active languages 

There are clearly at least four clusters present, so the decision was to ask for 6 clusters, which turned out to be a pretty good guess. Based on their average properties, the six clusters were named as follows:

cluster                 # languages
Window shoppers         32
Tyre kickers            11
The cautious             7
VTOLs                   12
Working class heroes    15
The awesome              8
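For readers who want to try this kind of grouping themselves, here is a minimal sketch of agglomerative clustering with Ward's criterion in plain Python. It is not the tool used for the analysis above. The function name `ward_clusters`, the language codes and all the numbers are hypothetical, and the three features (number of projects, portfolio share, completion level) are assumed to be scaled to comparable ranges beforehand.

```python
from itertools import combinations

def ward_clusters(points, k):
    """Merge clusters by Ward's criterion until only k clusters remain.

    points: list of equal-length numeric tuples, already scaled.
    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    # Every cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            na, nb = len(ma), len(mb)
            dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            # Ward's merge cost: the increase in within-cluster sum of squares.
            cost = na * nb / (na + nb) * dist2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        # a < b always holds here, so deleting b keeps index a valid.
        del clusters[b]
        clusters[a] = (ma + mb, centroid)
    return sorted(sorted(members) for members, _ in clusters)

# Hypothetical records: (active projects / 12, portfolio share, completion level)
languages = {
    "aa": (1 / 12, 0.02, 0.01),
    "bb": (2 / 12, 0.03, 0.02),
    "cc": (1 / 12, 0.02, 0.95),
    "dd": (2 / 12, 0.03, 0.93),
    "ee": (10 / 12, 0.90, 0.92),
    "ff": (11 / 12, 0.88, 0.91),
}
names = list(languages)
groups = ward_clusters([languages[n] for n in names], k=3)
labelled = [[names[i] for i in group] for group in groups]
print(labelled)
```

With real data one would feed in all 85 language records and ask for 6 clusters; the toy data above only separates three obvious groups.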

Let us have a closer look at the suggested clusters:
  • Window shoppers: completion level ~1%, 1-2 projects, of small size. The favourite project in this group is Bugzilla Component Descriptions with 121 words to be translated.
  • Tyre kickers: they have on average 4-5 projects, completed to 25% on average, and are working on a portfolio half the possible maximum size.
  • The cautious: one would be tempted to include them in the window shoppers' group (1-2 small projects), but their average completion level is a respectable 94%. One can expect they will come back for more.
  • VTOLs: they have 3-4 projects amounting to ~20% of the total, with a completion level close to 50%. They are taking off.
  • Working class heroes: a group with a distinctly higher number of projects (6-7), an average portfolio size above 50% and an attractive average completion level of ~70%.
  • The awesome: they have ~10 active projects, amounting to close to 90% of the total portfolio, with an average completion level of 92%.
Note: for the curious, who would want to know, which cluster their language falls into:
  • check the number of active projects for your language
  • check their total size, expressed as percent of the total portfolio size (57313)
  • check your overall completion level
  • consult the description above
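The lookup steps above can be sketched as a nearest-profile match. This is only an illustration: the profile values are midpoints read off the cluster descriptions, the 2% portfolio share for the "small" portfolios and the 55% for "above 50%" are my assumptions, and the names `PROFILES`, `SCALE` and `nearest_cluster` are hypothetical.

```python
# Typical profile per cluster:
# (active projects, portfolio share in %, completion level in %).
# Midpoints of the ranges quoted in the descriptions; the 2% and 55%
# portfolio shares are assumptions, not figures from the analysis.
PROFILES = {
    "Window shoppers":      (1.5,  2.0,  1.0),
    "Tyre kickers":         (4.5, 50.0, 25.0),
    "The cautious":         (1.5,  2.0, 94.0),
    "VTOLs":                (3.5, 20.0, 50.0),
    "Working class heroes": (6.5, 55.0, 70.0),
    "The awesome":          (10.0, 90.0, 92.0),
}

# Rough range of each feature, so that no single one dominates the distance.
SCALE = (12.0, 100.0, 100.0)

def nearest_cluster(projects, share_pct, completion_pct):
    """Return the name of the cluster whose profile is closest to the record."""
    record = (projects, share_pct, completion_pct)

    def distance(profile):
        return sum(((r - p) / s) ** 2 for r, p, s in zip(record, profile, SCALE))

    return min(PROFILES, key=lambda name: distance(PROFILES[name]))

print(nearest_cluster(7, 60, 75))
```

A language with 7 active projects, 60% of the portfolio and 75% completion lands among the working class heroes; borderline cases will of course depend on the assumed profile values.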
    The difference between clusters goes deeper and it involves the internal workings of a given localization group and its mode of operation. More on that later.

    Who done it!? - about language teams

    The analysis in this section is based on the on-line activity log, available for every language (example: SL - AMO project):

    Fig 5: example of the on-line activity log
    The totals for the language group can serve as an indicator of the on-line activity of the language group or of the individual. Note that the Verbatim site also serves as a VCS gate and that the batch-mode activity of translators is not recorded here - as we will see, this can be gauged only indirectly.

    Fig 6: average on-line activity (left) and number of projects (right) for the clusters

    The top two clusters have significantly higher on-line activity, both in terms of events (left) and in the number of projects active on-line (right).

    Behind all these numbers are people - altogether ~460 participants. Let us assume there are two projects running for the XY language, with Alice doing the majority of the work for project #1 and Bob for project #2. Their shares of the work are, for instance, 80-20 in favour of Alice in project #1 and 30-70 in favour of Bob in project #2. The maximum score (80% in this case, or rather 100-80 = 20%) is a measure of collaboration within the language group and is shown below for the language clusters:

    Fig 7: cooperativity level within the language clusters

    In case of the four clusters on the left it does not make much sense to talk about cooperativity, due to the number of projects involved (1 or 2, maximum 4 for tyre kickers). The two clusters on the right hand side indicate however a growing level of cooperation, accompanied by the specialization and spreading of tasks.
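The Alice and Bob example can be written down as a small calculation. This is a sketch of one reading of the measure (100 minus the largest single-contributor share found in any project of the language group); the function name `cooperativity` and the data layout are hypothetical.

```python
def cooperativity(shares_by_project):
    """100 minus the largest single-contributor share (in %) in any project."""
    top_share = max(
        max(shares.values()) for shares in shares_by_project.values()
    )
    return 100 - top_share

# The XY language from the example: work shares in percent per project.
xy = {
    "project 1": {"Alice": 80, "Bob": 20},
    "project 2": {"Alice": 30, "Bob": 70},
}
print(cooperativity(xy))  # 20
```

A group where every project is done by a single person scores 0; the more evenly the work is spread, the higher the score.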

    Verbatim is just a part of the Mozilla l10n landscape!

    Here's an off-the-cuff summary of the localizable material within Mozilla:

    Fig 8: Mozilla l10n segmentation
    The material discussed in this article is about a quarter of the total task. It is also second only to product localization. All this can mean different things to different people / clusters:

    • Window shoppers: they are for sure involved with product localization, at least with localizing Firefox. Their level of completion cannot be much different there.
    • Tyre kickers: ditto, possibly with first kicking visits to TB and SuMo support articles
    • The cautious: they may branch out - or may have already done so
    • VTOLs: may unexpectedly come out on top some time soon
    • Working class heroes: trying to do all of it with a mixed level of success
    • The awesome: doing it all and succeeding, but with teething pains
    The immediate problem facing anybody branching out of any given segment in Fig 8 is the change of working environment:

    segment               repository / env.   file types
    Mozilla SL - aurora   HG                  DTD, Properties
    Verbatim              CVS (verbatim)      PO
    SuMo articles         Django online       UTF8 + syntax
    Web contents          SVN                 HTML

    The localizer, whose qualification should above all be of a linguistic nature, should thus also be a wizard in all kinds of incompatible environments. There are not many of this kind in any locale, and below some critical language size I am afraid there are none. Of course tyre kicking and window shopping do not require that much specialization...


    Every one of the described language group types has its own specific immediate problems, with the common one being "too much work and not enough time and manpower". Here are my personal suggestions:

    Window shoppers, tyre kickers

    It is easy to make a commitment, but the road gets more and more slippery with the time spent on the job. They should focus on short projects - even if slightly ridiculous, Bugzilla Component Descriptions is a perfect project to kick off with, with Mozillians and Affiliates next on the list. There should be more projects like this.

    The cautious, VTOLs

    I am sure that at least the cautious will not jump right into - say - Mozilla AddOns. Some guidelines on what gives localizers more bang for their buck would be welcome. Support articles are a good example: the number of articles is humongous by now, but doing the first 10 or 20 covers a lot of ground. Unfortunately this is not so in some other cases - see the Fx, TB, AMO and MDN projects.

    Working class heroes

    They should look for fresh members for the l10n team and for productivity tools. The way to higher levels in any case involves building a team and dividing the tasks, plus a high level of cooperativity among the members of the team. "Owning" some project is a sure way down into home-made problems.

    The awesome

    The localizers need to hear their story. And Mozilla should listen closely when they raise their hand.

    Sunday, 3 April 2011

    Open-Source-Software for translators

    by Alexandra Kleijn - originally published in Heise C't Open source

    The world of translators is a Windows world: MS Word, and the translation tool SDL Trados are the measure of things. Whoever prefers a different operating system or does not want to bow to the dictates of SDL, has so far had a hard time. But now alternative tools are available - even for Mac and Linux users. And they are in part Open Source.  

    Hardly any professional group is as loyal to Microsoft as translators. This is not surprising when you consider that their clients in many cases still provide source texts in the well-known doc format. The migration to Office 2007 and the still relatively new Office 2010, and with it to XML-based, open file formats, is taking place only slowly.

    With the open source office suite OpenOffice, however,  a free alternative to MS Office is available, which is hardly inferior to its competition in terms of its functionality. A major advantage of the open-source suite is the fact that it runs equally well on Windows, Mac OS X and Linux. OpenOffice as an independent office software is a good alternative for a number of users as well.

    OpenOffice is open source: the application is available together with its source code and anyone who feels called upon can change the application according to his own ideas. Open-source software may also be in general distributed without any restrictions.  Even if the developer of an open source program is free to ask for a license fee for his application, open-source software is usually free of licensing costs. Open source software is generally based on open standards and data exchange between different programs is easy.

    Document formats: MS Word reigns
    OpenOffice Writer, the word processing application in OpenOffice, can handle the well-known doc format used in Microsoft Office 2003 and earlier versions. Problems arise, however, with MS Office documents containing macros, many embedded images or forms. Even complex formatting, such as a division of the text into columns, must often be fixed manually.

    It does not look so good for the exchange of documents in Microsoft's new XML-based Office Open XML format (OOXML), the default file format for MS Office 2007 and 2010 (file extension .docx). The original formatting is often lost when opening docx files in OpenOffice. The way back is (still) blocked: OpenOffice Writer does not save as docx, at least not under Windows and on the Mac; only the Linux version of OpenOffice offers this possibility. This rather limits the usefulness of OpenOffice as an MS Office replacement for translators: a docx file received from the client for translation must be returned in the old doc format or in the ODF format available in OpenOffice.

    ODF, a standard format for text documents approved by ISO, can be opened directly in Word 2007 with Service Pack 2 and in Word 2010. Older MS Office versions can be upgraded to ODF support using the Sun ODF Plug-in. The new owner Oracle will ask for money, but one can still download the plug-in from Softpedia free of charge. The plug-in knows the functions of the new ODF specification 1.2, whereas Microsoft Office 2007 and 2010 only support ODF 1.1. Mac users are left with nothing: Microsoft Office for Mac 2008 and 2011 do not recognize the ODF format.
    LibreOffice saves in docx format as well
    Since autumn 2010, the OpenOffice offshoot LibreOffice has been courting the attention of users. This so-called fork was created after Oracle's takeover of Sun, the lead developer of the open source office suite. The current LibreOffice 3.3, based on the same source code as OpenOffice 3.3, can open as well as save docx files. It is plagued, however, by the same formatting problems as OpenOffice.
    For Mac users, Microsoft Office for Mac 2011 has been available since last October. The new version has moved closer to the Windows version. It also finally supports Visual Basic for Applications (VBA), so that Office macros should work across platforms.

    Translation Tool Translation Memory
    For many translators a translation memory (TM) system is indispensable. The market for TM environments has a lot of providers. In the last few years the top dog SDL, the manufacturer of the TM system Trados, has had quite some competition breathing down its neck; in Germany, for example, Across and MemoQ. All these tools are, however, proprietary software. Still, cross-platform TM tools such as Wordfast Pro and Swordfish have broken Microsoft's hegemony somewhat. They are written in Java and also run under Mac OS X and Linux.

    A few years ago everybody was cooking his own little dish, with all the resulting compatibility issues. The latest trend goes increasingly toward open standards: XLIFF (XML Localization Interchange File Format) for document exchange and TMX (Translation Memory Exchange) for translation memories. Both are based on XML. The eXtensible Markup Language separates the content from other information such as formatting and meta tags, and has established itself as a standard for cross-platform and cross-program data exchange of all kinds. The situation is by no means optimal: many manufacturers push their own interpretations of these standards - so for many, the "dirty" bilingual Word document remains the measure of things.
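To give an idea of what such an XML-based exchange format looks like, here is a minimal sketch that reads a tiny TMX document with Python's standard library. The sample content, the header values and the function name `translation_units` are made up for illustration; real files exported by TM tools carry more attributes and inline markup.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written TMX document; all values are illustrative only.
TMX_SAMPLE = """
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0.1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open file</seg></tuv>
      <tuv xml:lang="de"><seg>Datei öffnen</seg></tuv>
    </tu>
  </body>
</tmx>
"""

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def translation_units(tmx_text):
    """Return one dict per translation unit, mapping language code to segment text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        units.append(
            {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        )
    return units

print(translation_units(TMX_SAMPLE))
```

The text of each segment sits in its own element, separate from the language tag and the header metadata, which is what makes the exchange between different tools possible in principle.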


    OmegaT - open source TM-based translation tool for Windows, Mac OS X and Linux
    At the moment the only open source TM tool ready for production use is OmegaT. It is written in Java and can work directly on Microsoft's OOXML texts. Documents created in earlier versions of Office must first be converted, either with MS Office 2007/2010/2011 (Mac) or to OpenOffice, before they can be translated. When converting to OpenOffice, the previously mentioned risk still exists, namely that elaborately formatted documents do not survive the transition without formatting changes.

    The project openTM2 has yet to grow beyond the beta stage. The focus here is the open-source implementation of a TM oldie: the IBM Translation Manager. The lofty goal of openTM2 has been to become the reference platform for the translation memory exchange standard TMX. The trial version currently available runs only on Windows. 

    ...to be continued ... 1/2                  

    Translation: smo 

    Monday, 7 March 2011

    A week of awesome tweets

    Note: this was posted Jan 16th, 2011 9:04pm somewhere else. I decided to bring it in here, to keep my chickens in one place.

    One of the features in the new SUMO is the invitation to help Firefox users on Twitter (https://support.mozilla.com/en-US/army-of-awesome). Tweets, mentioning Firefox,  get filtered out from the main stream and the Army of awesome (AoA) gets the opportunity to help.
    Here's one typical yelp for help, with user's Firefox evidently falling to pieces:

    The careless spelling and the four-letter wording are typical for this environment: I have a problem and I want to have it fixed now. The answer provided is a copy of one of the 16-odd boilerplate answers available to AoA.
    I have collected about a week of answers to find out
    • how many are involved in AoA
    • what are the problems they try to address
    • how successful they are
    • ... just curious
    How many are involved?
    In the week of 7-15 Jan there were at least 772 tweets with #fxhelp tag; at least, because my own participation in this army has been consistently and completely stricken from the public tag listing, statistics etc since around the middle of December. I have thus added a typical week of my own saved contributions. Note that it's quite possible that the dark number is even bigger;  nobody but Twitter can tell.
    How many are involved in the army: there have been 54 contributors. Here's a general picture:

    legend: how many participants tweeted how many times; for instance 5 tweeted 10-20x.
    On the low side there's a group of casual participants (<10 tweets) with 36 members, 20 of them turning up only once. On the high side there's the group at the top with 7 participants who tweeted considerably more.
    The following diagram shows the total number of tweets for each of the five groups above:

    The top group, counting 7, accounts for 538 / 853 = ~63% of the total traffic.
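The concentration of the traffic can be checked with a few lines of arithmetic. All figures are taken from the text above; only the variable names are mine.

```python
top_group_tweets = 538   # tweets from the 7 most active contributors
total_tweets = 853       # all tweets counted for the week
top_group_size = 7
contributors = 54

tweet_share = top_group_tweets / total_tweets
head_share = top_group_size / contributors
print(f"{head_share:.0%} of contributors wrote {tweet_share:.0%} of the tweets")
```

In other words, roughly one contributor in eight produced close to two thirds of the answers.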
    What are the problems?
    The answers provided indicate what the original question has been. The boilerplate suggestions have the possible problems and suggestions pretty well covered. The suggestions are provided in five groups, tagged with #~ in the pie-chart below:

    The support questions were one third of the tweets and, surprisingly, the "no URL provided" group was the second biggest at 25%. Here are the inner workings of the Support group:
    •   107 Fix crashes
    •    66 slow Firefox startup / Firefox is slow
    •    53 Firefox does not behave
    •    32 High RAM usage 
    •    16 Quick Firefox fixes
    The "Get involved" group on the other side of the spectrum looks as follows:
    • 10 Become a beta tester
    •   9 Report a bug
    •   6 Get involved with Mozilla
    •   3 Mozilla Developer Network 
    •   0 Join Drumbeat 
    The question of course remains: does it matter? Is there any traction behind it? Surprisingly I have received a lot of feedback, about 50 since the beginning of January. Negative reactions were just a few, some of them sobering and useful like this one:
    • ... that was lame. give me a link when I have no browser.LOL I rebooted & now i'm good. thanks. 
    The majority, however, was worth a smiley each. Here's a few:
    • thanks :) yeah I had a couple to update but it is all good now :) I have bookmarked that link too and I will check it regularly :)
    • My addons/plugins are all up to date. I'll try disabling them to see if any are causing problems. Thanks for the suggestion!
    • Thanks for the suggestion will check it out. Happened when we attached files in google mail.
    • Hey thanks a lot!!! :)
    To avoid ending on the high-fiving note, here are a few thoughts:
    • Tweeting can turn into spam if used in a fire-and-forget mode. Do not forget to check the feedback.
    • The perfect score of 0 for Drumbeat provokes me. To be honest, it is hard to sell Drumbeat to desperate teenagers, who can't tweet to their heart's content because Firefox is giving them hives.
    • Thinking of those 20 single-tweet contributors: the fact is they tagged their message with fxhelp. Our future warriors? A pool of dig-Mozilla aka 5$ Mozillians? Dig it!