Google AJAX Language API first impressions

By Kevin Pirkl (Intel) (21 posts) on March 21, 2008 at 12:24 pm

Darn, I have been one-upped by Google: they have finally put a serviceable API on top of their language translation services!


Google announced a new AJAX Language API yesterday. Language detection is not high on my list of priorities, but the actual “contextual translation” of HTML and text has always been of great interest to me. In fact, I created a screen-scraper version of the Google HTML Translator page and wrapped it in a nice jQuery JavaScript library for fun.

My original work (which, I might mention, was done over a year ago) was finally posted to my personal blog, ZombieBob's Weblog, on March 11. The full article on Google Docs, “Spinning SOA onto Google Language Translator”, shows off that work: a translator API that screen-scrapes the Google Language Translator page.

You might want to peek at the examples page, “Localizing Web Page Elements using jQuery”, which contains my original work on the subject along with quite a few implementation samples.

Here is the documentation for the Google AJAX Language API and the Class reference can be found there as well.

My work started from this Google example translator page, which I saved to disk and modified a bit to support a little more text and jQuery-style code. Here are some of my notes.

 

I started with a TEXTAREA field so I could submit more data at one time and stuffed in the following HTML for test purposes.

<div>This is an test article for DPD. This <div>article</div> <div>belong sto Multi Core.</div></div>
<h2 class="sectionHeading">Section I - This is good stuff</h2>
<p>This is the stuff that I like to see..</p>
<h2 class="sectionHeadingText">Article Heading A</h2>
<h3 class="sectionBody">Subsection - I.1</h3>

Browser imposed HTTP GET limitations

The limit is around 2048 characters, or 2000 characters for the path portion. I Googled the terms “Browser imposed HTTP GET limitations of 2048 characters” for you and came up with “WWW FAQs: What is the maximum length of a URL?”, which explains the limitation well enough.

I stuffed the test page with the HTML above multiple times so that the number of characters would go well beyond the HTTP GET limit of most browsers and I could make a point. In this case you have a character limit plus a cross-domain security constraint that prevents any such API (this one included) from doing a POST and thus getting around the 2048-character limit. The end result is that the call will bomb, because Google’s API doesn’t check the URL string length against this limit and warn you about it. Perhaps it should. Note that this just means there is an imposed limit on the size of the text that you are planning to submit to this API, and you should plan and code accordingly.
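One way to plan for that limit is to split long text into pieces before submitting it. The following is only a sketch under my own assumptions: splitForGet is a hypothetical helper, not part of the Google API, and the 1500-character default budget is an arbitrary safety margin chosen to leave room for the rest of the URL.

```javascript
// Hypothetical helper: split text into chunks whose URL-encoded form
// stays within a chosen budget, so each chunk fits in one GET request.
// The default of 1500 is an assumed safety margin, not a documented limit.
function splitForGet(text, maxEncodedLength = 1500) {
  const chunks = [];
  let current = "";
  // Split on whitespace but keep the separators, so that joining the
  // chunks back together reproduces the original text exactly.
  const tokens = text.split(/(\s+)/);
  for (const token of tokens) {
    if (token === "") continue;
    const candidate = current + token;
    if (encodeURIComponent(candidate).length <= maxEncodedLength) {
      current = candidate;
    } else {
      if (current !== "") chunks.push(current);
      // Caveat: a single token longer than the budget is passed through
      // unsplit and would still need separate handling.
      current = token;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Each chunk could then be passed to google.language.translate() in turn. Keep in mind that splitting on whitespace can cut a sentence in half, which may hurt the quality of a contextual translation.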
Context - First letter capitalized




Note that after a period, the returned translation capitalizes the first character inside the DIV tag.

DIV is moved - Single word context nested in HTML

 

When a word is encapsulated within an HTML tag it gets moved. Notice that the word "article" in the returned translation gets shifted left. If you put in an HTML STRONG tag you see the same result. This is pretty annoying.

Carriage Return/LineFeed and HTML Entities translation results

Here is another test to understand some more of the data processing glitches. The data submitted was the following: &nbsp;hello\r\nworld

I used Microsoft Fiddler to capture the following underpinnings

 

The URL captured by Fiddler is:

http://www.google.com/uds/Gtranslate?callback=google.language.callbacks.id101&context=22&q=%26nbsp%3Bhello%0D%0Aworld&langpair=en%7Ces&key=internal&v=1.0&nocache=1206122786074

Note that the HTML entity &nbsp; is sent in the request as is, and so are the \r\n characters.
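The encoding of the q and langpair parameters in that captured URL can be reproduced with the standard encodeURIComponent function. This is just a sketch to illustrate what goes over the wire: buildTranslateQuery is a hypothetical name of mine, and I am not claiming this is how Google's loader builds the request internally.

```javascript
// Sketch only: rebuilds the q and langpair parameters seen in the capture.
// buildTranslateQuery is a hypothetical name, not part of the Google API.
function buildTranslateQuery(text, src, dest) {
  return "q=" + encodeURIComponent(text) +
         "&langpair=" + encodeURIComponent(src + "|" + dest);
}

console.log(buildTranslateQuery("&nbsp;hello\r\nworld", "en", "es"));
// q=%26nbsp%3Bhello%0D%0Aworld&langpair=en%7Ces
```

Every &, ;, carriage return and line feed costs three characters once encoded, so markup-heavy text eats into the URL length limit faster than its raw character count suggests.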

The Fiddler response shows only the following result. Note that the HTML entity is gone and only the newline character is sent back (the \r\n context was lost, and I think that is a big deal).

google.language.callbacks.id101('22',{"translatedText":"Hola \n Mundo"}, 200, null, 200)


I did a quick test with a plain space instead of the HTML non-breaking space, and this is the result. Note the two extra spaces in front of the translation (space space Hola \n Mundo).

google.language.callbacks.id103('22',{"translatedText":"  Hola \n Mundo"}, 200, null, 200) 
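If the extra padding gets in your way, a small cleanup pass over the returned text is one option. This is a sketch under an assumption of my own: that spaces next to a line break and at either end of the result are never intentional. tidyTranslation is a hypothetical helper, not something the API provides.

```javascript
// Hypothetical cleanup for the padding seen in the responses above.
// Assumes spaces around line breaks and at the ends are never wanted.
function tidyTranslation(translatedText) {
  return translatedText
    .replace(/[ \t]*\n[ \t]*/g, "\n")   // drop padding around line breaks
    .replace(/^[ \t]+|[ \t]+$/g, "");   // drop leading and trailing padding
}

console.log(JSON.stringify(tidyTranslation("  Hola \n Mundo")));
// "Hola\nMundo"
```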

Rewriting the code to a more favored JS Library style

I rewrote the code in a more jQuery-like fashion, because I would get extremely tired of typing google.language.translate() and the pattern/callback almost always remains the same.

Here is the original code, taken from the Google example page:

   <script type="text/javascript">
    google.load("language", "1");
    google.setOnLoadCallback(submitChange);
    function submitChange() {
      var value = document.getElementById('source').value;
      var langpair = document.getElementById('langpair');
      var pair = langpair.options[langpair.selectedIndex].value.split('|');
      var src = pair[0];
      var dest = pair[1];
      google.language.translate(value, src, dest, translateResult);
      return false;
    }
    function translateResult(result) {
      var resultBody = document.getElementById("results_body");
      if (result.translation) {
        resultBody.innerHTML = result.translation;
      } else {
        resultBody.innerHTML = '<span style="color:red">Error Translating</span>';
      }
    }
  </script>

I did not want my code to auto-run, so I removed that line (the google.setOnLoadCallback call). I removed the HTML FORM as well, jQuery-stylized everything, and just placed a click event on the button. Here is the modified version of the code (in this case I make multiple calls to gt(...) to translate multiple items).

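  <script type="text/javascript">
    google.load("language", "1");
    // Translate the selected page elements when the button is clicked.
    $("#cmdTranslate").click(function () {
      // The selected value looks like "en|es": source language, then target.
      var langpair = $("#langpair").get(0);
      var pair = langpair.options[langpair.selectedIndex].value.split('|');
      var src = pair[0];
      var dest = pair[1];
      // gt() translates the text of whatever the selector or element matches
      // and writes the result back in place; failed translations are left as is.
      var gt = function (fld) {
        google.language.translate($(fld).text(), src, dest, function (result) {
          if (result.translation)
            $(fld).text(result.translation);
        });
      };
      gt(".pageHeading");
      $("#PublishedModifiedDate b").each(function () { gt(this); });
      gt("#DocumentBody");
      return false;
    });
  </script>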

What I would consider on a live implementation

Working from our content delivery system and making some modifications to add the translator API yields this small screen snip

 


In this case I added only the Article Title, some labels on the page, and the main Document Body for translation. Given what I learned about the Google Translator API above, I know that stripping out all the HTML, keeping the carriage returns, translating the whole text, and then wrapping the translated result in an HTML PRE tag would be the way to go. Internet Explorer and Firefox seem to display the \n differently, though, which messes up the result, so some string replacement on the translated results might be in order (converting \n to \r\n) for a more unified experience in both browsers.
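A rough sketch of that strip, translate, and wrap idea follows. Both function names are hypothetical, and the regex-based tag stripping is deliberately crude: it is good enough for simple markup like the test HTML above, but it is not a real HTML parser.

```javascript
// Step 1 (before translating): crude tag removal that leaves line breaks
// in the text untouched. Not a real HTML parser, just a sketch.
function htmlToPlainText(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Step 2 (after translating): escape the text, normalize line breaks
// to \r\n, and wrap the result in a PRE tag.
function wrapInPre(translatedText) {
  const escaped = translatedText
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return "<pre>" + escaped.replace(/\r?\n/g, "\r\n") + "</pre>";
}
```

The plain text from htmlToPlainText() would go to google.language.translate(), and the translation from the callback would go through wrapInPre() before being written back to the page.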

In my own wanderings I find that this style of jQuery code is cleaner and easier on the brain. jQuery gives JavaScript code a more comprehensive bite. I would probably use the jQuery Core Data Cache to store the original HTML so that I can revert it all if necessary; I had that concept in my own earlier work example, before I knew about the jQuery Data Cache.
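Here is a minimal sketch of how that revert idea might look with jQuery's .data() and .html() methods. The two function names and the "originalHtml" key are hypothetical choices of mine, and the sketch assumes $el is a jQuery object wrapping a single element.

```javascript
// Assumes $el is a jQuery object wrapping one element.
// "originalHtml" is a hypothetical cache key chosen for this sketch.
function rememberOriginal($el) {
  // Store the markup only once, so a second translation pass
  // does not overwrite the untranslated original.
  if ($el.data("originalHtml") === undefined) {
    $el.data("originalHtml", $el.html());
  }
}

function revertOriginal($el) {
  var original = $el.data("originalHtml");
  if (original !== undefined) {
    $el.html(original);
  }
}
```

You would call rememberOriginal($("#DocumentBody")) just before translating and revertOriginal($("#DocumentBody")) from a "show original" button.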

Since I did not get around to all of the things I mention, I will just leave you with the end result anyway. Given the API query string length limitation, such a bit of work would not work on a large document anyway.

 

There are a ton of different possible uses here from auto text chat style translations to Twitter translation on the fly or auto label text localization on data entry forms.  Take your pick..

One last thought before I hit the Publish key and this goes down on the record books.. 


My thinking is that context-aware HTML page translation, or machine translation of HTML snippets, is perhaps a long way off. I am sure there are tools you can hook into that make page translation easy, and the Google Page Translator service engine does an excellent job of keeping the look and feel (I would not mind seeing some notes from the Google team on how they do that), but I still find the first version of this API lacking. For the small stuff it's very cool, but who wants to sweat the small stuff anyway...

Well that is about all I have to say for now and I hope these details help you in making your own decisions.

Kevin Pirkl

Categories: Uncategorized

Comments (6)

March 24, 2008 3:00 AM PDT


choonkeat
google translate + a wiki interface to contribute corrections (to the site only) might take us far enough to giving 80% website that 80% completeness in i18n. better than 99% websites with 0% i18n support?
March 24, 2008 1:54 PM PDT


Kevin Pirkl
One big item that I missed is that the return results have the special characters like > or < unicode encoded... If you want to call the API with anything but text you will need to decode that return data... Outbound %3E and inbound \u003e
March 26, 2008 5:32 PM PDT


Roland
If you folks want to see the API in action, we've implemented in a live webchat solution....

http://hab.la/articles/2008/03/21/hab-la-can-now-translate-t.....our-choice
September 22, 2008 10:29 AM PDT

Kevin Pirkl (Intel)
Cool work Roland, sweet!

I also recently ran across this bookmark scriptlet that converts HTML blocks directly on any web page (since the TOS for the translation API only supports 500 characters): http://code.google.com/p/jquery-translate/
November 5, 2008 9:50 PM PST


chandan paul
I need translate code . please if you have this code intregrating google api or others api please send me this code in my mail.
December 19, 2008 6:30 AM PST


Online Translation Service
I have developed http://translator.vndv.com/ page which uses Google AJAX Language API. I also used Google AJAX Language API to translate the user interface of Online Translation Service to the following languages: English, Arabic, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, Finnish, French, German, Greek, Hindi, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish.
