


C-DAC and Indic URLs

Suggested framework for Malayalam URLs ‘flawed’

Technical inconsistencies and ‘mistakes’ mar initial spadework

C-DAC game for another round of consultations

Developer community seeks technical debate

KOCHI: The initial technical spadework done to pave the way eventually for the use of Malayalam in Internet domain names (website addresses) is riddled with technical inconsistencies and “mistakes,” say key members of a developer community associated with Malayalam open-source software projects.

Unless these are sorted out, Internet users may be saddled with certain problems and limitations in making use of Malayalam website names later on, when the language is brought under the purview of the Internationalised Domain Name (IDN) regime, say members of the Swathanthra Malayalam Computing (SMC) community.

The draft technical policy for the use of Malayalam in website addresses was released by the Centre for Development of Advanced Computing (C-DAC) about a year ago.

The IDN regime was launched globally by the Internet Corporation for Assigned Names and Numbers (ICANN), the body that oversees the functioning of the Internet web address system, in November 2009. A top C-DAC functionary in Pune said the draft policy could be open to changes if “deemed” necessary.

It would not be appropriate to talk immediately of “mistakes” and “inconsistencies” in the policy without going into the details of the logic behind these, Mahesh Kulkarni, Programme Coordinator, C-DAC, Pune, told The Hindu.

The SMC has, however, been told earlier this week that C-DAC did not mind having one more round of consultation on the issues raised.

Many languages ahead

Since the launch of the IDN, the languages of a growing number of countries have become part of the global web address system, but India is still in the process of getting technical clearance for the use of the first set of Indian languages in website addresses.

The IDN system makes it possible for languages with non-Latin scripts to be used in website names. Its launch, about a year ago, has technically opened up the possibility for using many of the world’s key languages, spoken by large sections of the world’s population, such as Chinese, Arabic and Hindi, in website addresses.
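As a rough illustration of how a non-Latin name can travel through a system built for ASCII, the sketch below encodes a single label with Punycode and adds the "xn--" prefix used for internationalised labels. The function names are made up for this example, and the preparation and validity checks that a real IDN implementation performs are deliberately left out.

```python
ACE_PREFIX = "xn--"  # prefix that marks an encoded internationalised label


def to_ascii_label(label: str) -> str:
    """Encode one label with Punycode (hypothetical helper, no IDNA checks)."""
    if label.isascii():
        return label  # plain ASCII labels are left untouched
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ascii_label(label: str) -> str:
    """Reverse of to_ascii_label for labels that carry the prefix."""
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


# The word "Malayalam" written in Malayalam script, given as escapes.
malayalam = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"
encoded = to_ascii_label(malayalam)
print(encoded)                                 # ASCII-only form of the label
print(from_ascii_label(encoded) == malayalam)  # round trip restores the text
```

The ASCII form is what the name servers see; the browser is expected to show the user the original script.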

India has drawn up a draft policy that envisages the introduction of the country’s languages to the Internet web address system in phases. However, the first clutch of languages that are likely to be introduced soon does not include Malayalam. There are several challenges in using different, yet related, languages in web addresses because of possible mix-ups in the way the characters that make up these languages are used for the purpose. There are several characters that could be used individually by more than one language.

One particular concern is that such issues can be exploited by those with malicious intent to mislead and cheat Internet users.
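The kind of abuse described here is easiest to see with a well-known case outside Malayalam: a Cyrillic letter that looks like a Latin one. The sketch below is only a heuristic, assuming that the first word of a character's Unicode name is a good enough stand-in for its script; the helper names and the sample label are invented for the illustration.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Guess the scripts in a label from the first word of each Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are shared by many scripts
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(label: str) -> bool:
    """Flag a label that draws its letters from more than one script."""
    return len(scripts_in_label(label)) > 1


genuine = "example"
spoofed = "ex\u0430mple"  # third letter is CYRILLIC SMALL LETTER A

print(scripts_in_label(genuine), looks_mixed(genuine))
print(scripts_in_label(spoofed), looks_mixed(spoofed))
```

A check like this catches mixing across scripts only. Look-alike characters within one script, which is the harder problem for related Indian languages, need a separate table of confusable forms.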

It is to ensure the safety of users that ICANN has asked the respective countries to prepare technical rules that spell out the range of characters that may be used in website addresses, in combination with which other characters, and in what manner. Much of it has to do with Unicode, an encoding standard that covers a wide range of letters, punctuation, and symbols in use in the world’s writing systems.
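A rule that "spells out the range of characters" can be pictured as a repertoire check. The sketch below assumes, purely for illustration, that the permitted set is the whole Unicode Malayalam block (U+0D00 to U+0D7F); the actual table in the draft policy is not reproduced here and is certainly narrower. The helper names are hypothetical.

```python
import unicodedata

# Assumption for this sketch: every code point in the Malayalam block is allowed.
MALAYALAM_BLOCK = range(0x0D00, 0x0D80)


def describe(label: str) -> list:
    """List each character as a (code point, Unicode name) pair."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in label]


def outside_repertoire(label: str, allowed=MALAYALAM_BLOCK) -> list:
    """Return the characters of a label that fall outside the allowed range."""
    return [ch for ch in label if ord(ch) not in allowed]


label = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"  # "Malayalam" in Malayalam script
for code_point, name in describe(label):
    print(code_point, name)

print(outside_repertoire(label))        # nothing outside the block
print(outside_repertoire(label + "a"))  # the Latin letter is rejected
```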

Modern script

One of the issues is that the draft IDN policy for Malayalam is based on the use of the modern script. SMC members question the technical assumption that binds the IDN policy to the modern Malayalam script, and say that there is no precise definition of what constitutes the modern script. And the issue of security should be taken up “independent” of the kind of script used, they argue.

The SMC members have also called into question the C-DAC position that the policy has been framed on the basis of a study of script rendering by existing browsers. Standards and policy that point to future implementation should not be based on how existing browsers render text, they contend, pointing out that incorrect rendering could be due to the “buggy behaviour” of the browsers.

Another issue that concerns the SMC community is that the technical policy ties web addresses to version 5.1 of Unicode, which is said to have thrown up technical problems as far as Malayalam is concerned. Its members say that web addresses should work with any Malayalam font that complies with Unicode version 5.0 and “correctly implements Malayalam language rules.”
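The article does not say which problems are meant. One change in Unicode 5.1 that affects Malayalam is the addition of single code points for the chillu letters (U+0D7A to U+0D7F), which earlier text wrote as consonant + virama + zero-width joiner. Assuming this is among the issues in question, the sketch below shows why the version matters for addresses: the two spellings of the same visible letter are different strings, and normalization does not merge them.

```python
import unicodedata

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Two encodings of the chillu form of NA.
chillu_n_sequence = "\u0d28" + VIRAMA + ZWJ  # spelling used before Unicode 5.1
chillu_n_atomic = "\u0d7b"                   # single code point added in 5.1


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after canonical (NFC) normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(unicodedata.name(chillu_n_atomic))
print(len(chillu_n_sequence), len(chillu_n_atomic))
print(same_after_nfc(chillu_n_sequence, chillu_n_atomic))
```

Because the comparison fails, two addresses that look the same on screen could resolve to different sites unless the policy states which spelling is valid or treats the two as variants.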

The SMC community has called for a technical debate on the draft IDN policy covering all Malayalam stakeholders — language experts, language computing experts and developers. C-DAC has now indicated that it was game for a round of consultations in this regard.

Courtesy: The Hindu.

1 comment to C-DAC and Indic URLs

  • I don’t know whether you have read C-DAC’s reply. Still, let me
    explain it a little more simply.
    The document we are discussing is the standard for Internet
    addresses in Malayalam. I don’t think there is any need to explain
    how flawless it has to be. You will have read many news reports of
    even English addresses being spoofed. Apart from security, it
    should of course be framed so that all the words in use in
    Malayalam can be used in Internet addresses

    But the policy document prepared by C-DAC excludes a number of
    Malayalam letters from Internet addresses. For example, ന്ത and ന്ന
    have been placed in the table of variant characters on the grounds
    that they look alike.
    “Variant characters are characters with two or more representations
    (that may appear confusingly similar to each other)”
    Even small children who know Malayalam can see how foolish this
    is. How can these letters, in appearance,

    Look at the reply we got when we questioned this:

    “The variant table is based on the observations how Malayalam
    characters and conjuncts are rendered in the address bars of standard
    browsers like IE, Mozilla and Safari. While ന്ത and ന്ന are perfectly
    rendered in Mozilla and Safari, they are not legibly rendered in
    various versions of IE. The mirror imaged nature of the glyphs was not
    the criterion for the two glyphs to be qualified as variants. Also
    note that the variant table is not a full-proof mechanism which can
    prevent spoofing.”

    In other words, the reason is that Internet Explorer has trouble
    rendering ന്ത and ന്ന! What reply can one give to foolishness
    like this?

    Now look at the next piece of foolishness from C-DAC.
    “The IDN system devised for Malayalam is based only on the modern
    script. It doesn’t address the old script or the fonts based on old
    script. Also, a detailed study was done before proposing homographs in
    each of the languages. The study included observing the visual form of
    the conjunct in the point size of the Address bars of major browsers.
    The mirror imaged nature of the glyphs was not the criterion for the
    two glyphs to be qualified as variants.”

    That is, IDN will support only new-script fonts; a Malayalam
    address written in the old script will not work in IDN. On top of
    that, this was supposedly decided after observing how conjuncts
    appear in the address bars of current browsers.
    What is a font? What is rendering? What are glyph forms? What is
    Unicode? What is a browser? Would anyone with even a rough
    understanding of these things [commit] such foolishness

    Read the rest on the page.
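The two conjuncts named in the comment, ന്ത and ന്ന, can be compared at the level of code points. The sketch below is only an illustration with made-up helper names; it shows that the stored text differs in its final letter, whatever a given browser happens to draw in its address bar.

```python
import unicodedata

nta = "\u0d28\u0d4d\u0d24"  # NA + VIRAMA + TA
nna = "\u0d28\u0d4d\u0d28"  # NA + VIRAMA + NA


def code_points(text: str) -> list:
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def first_difference(a: str, b: str) -> int:
    """Index of the first position where two strings differ, or -1 if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))


print(code_points(nta))
print(code_points(nna))
i = first_difference(nta, nna)
print(i, unicodedata.name(nta[i]), "vs", unicodedata.name(nna[i]))
```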