Language Examples
Here are some examples of Indic languages displayed on a webpage.
- Sanskrit संस्कृतम्
- Hindi इन भाषाओं में
- Bengali বাংলা
- Telugu తెలుగు
- Marathi मराठी
- Tamil தமிழ்
- Gujarati ગુજરાતી
- Kannada ಕನ್ನಡ
- Malayalam മലയാളം
- Punjabi ਪੰਜਾਬੀ
UTF-8
Use UTF-8 character encoding everywhere. More specifically, use UTF-8 without the byte-order mark (BOM). Set the character encoding to UTF-8 in your HTML, your editor, your database, and your web server (Apache).
Set UTF-8 in Your Editor
Save all your files - .html, .php, etc. - as UTF-8 without the BOM.
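If you are unsure whether an editor wrote a BOM, you can check the first three bytes of the file. The following is a minimal Python sketch, not part of any editor's tooling; the helper names has_utf8_bom and strip_utf8_bom are made up for this illustration.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte-order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data

# Simulate a file that an editor saved as UTF-8 with a BOM.
with_bom = codecs.BOM_UTF8 + "मराठी".encode("utf-8")
clean = strip_utf8_bom(with_bom)

print(has_utf8_bom(with_bom), has_utf8_bom(clean))  # True False
print(clean.decode("utf-8"))
```

To apply this to a real file, read it in binary mode, pass the bytes through strip_utf8_bom, and write the result back.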
UltraEdit
In the status bar, to the right of the Ln and Col numbers, you will see either DOS or U8-DOS, to indicate the encoding as either ASCII or UTF-8 respectively.
To change an open ASCII file to UTF-8, click File -> Conversions -> ASCII to UTF-8 (Unicode editing).
By default, every new file you create in UltraEdit uses ASCII encoding. To change the default, click Advanced -> Configuration -> Editor -> New File Creation, and select the option Create new files as Unicode.
To open existing files correctly, click Advanced -> Configuration -> File Handling -> Unicode/UTF-8 Detection, and check Auto detect UTF-8 files.
Dreamweaver
Click Modify -> Page Properties -> Title/Encoding, and use the Encoding drop-down to change it to UTF-8.
Set UTF-8 in HTML
Add the charset meta-tag to the head section of every HTML webpage. Make this the first line after the <head> tag, even before the title tag.
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
Or the new shorter version for HTML5:
<meta charset="UTF-8">
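One reason to put the tag first is that the HTML standard expects the encoding declaration to fall within the first 1024 bytes of the document. The sketch below is a rough Python check for that. It relies on a simple regular expression rather than a real HTML parser, and the function name declared_charset is hypothetical.

```python
import re

# Matches both the long http-equiv form and the short HTML5 form.
META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?\s*([\w-]+)', re.IGNORECASE)

def declared_charset(html_bytes: bytes):
    """Return the charset named in a meta tag within the first 1024 bytes, or None."""
    match = META_CHARSET.search(html_bytes[:1024])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()

page = b'<html><head><meta charset="UTF-8"><title>Test</title></head></html>'
print(declared_charset(page))  # utf-8
```

A return value of None suggests the declaration is missing or sits too far down the page.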
Set UTF-8 in Database
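Recent versions of both MySQL and PostgreSQL normally default to UTF-8. If this is not the case in your installation, then set the charset option when creating the database. In MySQL, choose utf8mb4, because the older utf8 charset stores at most three bytes per character.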
Keyboard Input
After changing the input language, bring up the on-screen keyboard.
Control Panel -> Ease of Access Center -> Start On-Screen Keyboard
An alternative to installing the Windows language keyboards:
http://www.branah.com/
Sanskrit and Malayalam Language Tools
Alphabets
Sanskrit is one of the oldest attested Indo-European languages. It is related through a common ancestor to Latin, English, Spanish, French, German and Italian. Sanskrit is also the basis of many languages spoken in India, especially in the north.
Hindi, the most widely spoken Indian language, is written in the same script as Sanskrit, although its sentence structure and grammar are quite different. Pure Hindi, called "shuddh Hindi," takes its vocabulary directly from Sanskrit and thus shares many words with it. Hindustani is the colloquial form that blends Hindi with Urdu, which draws much of its vocabulary from Persian. Hindustani is the most common form of Hindi spoken in India today.
It is important to distinguish between a written script and a language. A language is a way of communicating, and has a grammar which defines its word and sentence formation. For example, "I eat soup" is a simple sentence in the English language. To write a language on paper, you need a script. In English, we use what is called Roman script. The Sanskrit language is usually written in a script called Devanagari.
Terms
- language: a way of communicating, with a grammar that defines its word and sentence formation. For example, "I eat soup" is a simple sentence in the English language.
- script: a way of writing a language on paper
- glyph: each letter in the script is a glyph
- diacritical marks: little marks added above or below the letters. Other Indo-European languages use the Roman script as well, sometimes with these marks added to certain letters. For example, the French acute accent or the German umlaut.
In almost all cases, one particular language is written in one particular script. Sanskrit is the exception.
Sanskrit uses the Devanagari script, the same as Hindi, Marathi, and Nepali. But some Vedic documents use additional characters and accents that are not in this script, and not in Unicode. This is a special issue being addressed by committees.
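The difference between script and language can be seen in software. The Python sketch below guesses the script of a string from the Unicode names of its letters. It is only an illustration, and the function name dominant_script is hypothetical. It identifies the script but cannot tell which language is written in it.

```python
import unicodedata
from collections import Counter

def dominant_script(text: str) -> str:
    """Guess the script from the first word of each letter's Unicode name."""
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()  # skips spaces, digits, and combining vowel signs
    )
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"

print(dominant_script("मराठी"))          # Marathi text -> DEVANAGARI
print(dominant_script("इन भाषाओं में"))   # Hindi text   -> DEVANAGARI
print(dominant_script("தமிழ்"))          # Tamil text   -> TAMIL
print(dominant_script("I eat soup"))     # English text -> LATIN
```

The Marathi and Hindi samples give the same answer because both languages share one script.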
Assumption: script = alphabet
Language Families
- Indo-European
  - Greek
  - Italic
    - Romance, developed from Latin in the 6th through 9th centuries: Spanish, Portuguese, French, Italian, Romanian, and 18 others
  - Proto-Germanic, an Iron Age language
    - Germanic: German, English, Swedish, Dutch
- Greek
  - Early Cyrillic, 9th century AD
    - Cyrillic: Slavic, Russian
- Arabic
- Persian
- Indic
  - Sanskrit
Latin Alphabet
- Also known as the Roman alphabet.
- Originated from the Greek alphabet.
- First used to write Latin.
- Much of the world now uses the Latin alphabet.
- Most European languages, including English, German, Spanish
- Czech, Polish, Romanian, Vietnamese, Igbo
- When the Soviet Union broke up, many Eastern European countries switched from the Cyrillic alphabet to the Latin alphabet.
- After World War II, several Turkic-speaking countries, including Turkmenistan, Uzbekistan, and Azerbaijan, changed their original alphabets (Arabic, Persian or Cyrillic) to the Latin alphabet.
Indic Alphabets
Many Indic languages share the same alphabetic sound structure. Each script uses a different written glyph for the same actual sound. Because the sounds are the same, you can write one language in many different scripts. For example, T. Krishnamacarya, the teacher of BKS Iyengar, Pattabhi Jois and TKV Desikachar, wrote Sanskrit verses using his native Telugu script. The Sanskrit language is usually written in a script called Devanagari. Indic scripts have no capitalization. The Sanskrit alphabet in Devanagari is shown below.
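As a side illustration of the shared sound structure described above: Unicode lays out the Devanagari block (starting at U+0900) and the Telugu block (starting at U+0C00) in parallel, so many letters sit at the same offset in both. The Python sketch below uses that layout for a naive Devanagari-to-Telugu conversion. The function name is hypothetical. The mapping is only approximate, since some code points have no counterpart or a slightly different one, so treat it as a demonstration and not as a transliteration tool.

```python
import unicodedata

DEVANAGARI_START = 0x0900
TELUGU_START = 0x0C00
BLOCK_SIZE = 0x80

def devanagari_to_telugu(text: str) -> str:
    """Shift Devanagari characters to the same offset in the Telugu block."""
    out = []
    for ch in text:
        offset = ord(ch) - DEVANAGARI_START
        if 0 <= offset < BLOCK_SIZE:
            candidate = chr(TELUGU_START + offset)
            # Keep the original character when Telugu has no assigned code point there.
            if unicodedata.name(candidate, ""):
                ch = candidate
        out.append(ch)
    return "".join(out)

print(devanagari_to_telugu("योग"))  # the Sanskrit word "yoga" -> యోగ
```

Characters outside the Devanagari block, such as Latin letters and spaces, pass through unchanged.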
Sources
- Sanskrit Pronunciation
- omniglot
- SanskritDocuments
- Monier-Williams dictionary
- tilakpyle
- malayalam pronunciation
- remifa unicode chart
- Wikner Tutorial: narrative explanation of tongue position. 12 vowels
- Tilakpyle Audio: 13 vowels
- Omniglot: 14 vowels
- Sanskrit Sounds: narrative explanation. 14 vowels
- Unicode table: 19 vowels
- Unicode table via la re mi fa so: 19 vowels