Website languages in the .at-zone
nic.at News - 28.08.2020 07:32
Every few weeks, our Research & Development team analyses the home pages of all .at websites to determine their language. This is done not only to get a more detailed overview of the .at-zone but also for security reasons: if the language of a website suddenly changes, this can indicate that the site has been hacked.
Not surprisingly, the majority of .at sites are in German (88%), followed by English (10%) and a small share of other languages. These figures refer to the language in which a site is presented when it is accessed.
Incidentally, the examination of the home pages is similar to what search engines do, except that the aim is not to analyse the actual content of the website. Instead, language recognition is based on a simple idea: certain letter combinations and characters occur more frequently in some languages than in others. For example, a "¿" indicates that a text was written in Spanish, while a "Z" at the beginning of a word is typical of German. A statistical evaluation of the letter combinations that occur on a page therefore makes it possible to predict its language with a very high probability, without having to understand the page's content.
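To make the idea of "statistics over letter combinations" concrete, here is a minimal sketch of character-trigram language guessing with a naive-Bayes score. This is an illustrative assumption, not the method used in the nic.at library: the class name `NgramLanguageGuesser`, the trigram size and the tiny training sentences are all hypothetical, and a real system would train on far larger corpora.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split text into overlapping character n-grams; spaces mark word boundaries."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageGuesser:
    """Hypothetical naive-Bayes language guesser over character n-grams."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: dict[str, Counter] = {}
        self.totals: dict[str, int] = {}
        self.vocab: set[str] = set()

    def train(self, lang: str, sample: str) -> None:
        grams = char_ngrams(sample, self.n)
        self.counts.setdefault(lang, Counter()).update(grams)
        self.totals[lang] = sum(self.counts[lang].values())
        self.vocab.update(grams)

    def scores(self, text: str) -> dict[str, float]:
        """Log-likelihood of the text under each language's n-gram model."""
        grams = char_ngrams(text, self.n)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        result = {}
        for lang, counts in self.counts.items():
            total = self.totals[lang]
            # Add-one (Laplace) smoothing so an unseen n-gram never zeroes out a language
            result[lang] = sum(math.log((counts[g] + 1) / (total + v)) for g in grams)
        return result

    def guess(self, text: str) -> str:
        s = self.scores(text)
        return max(s, key=s.get)


# Hypothetical toy training data; note the "z" word starts in German and "¿" in Spanish.
guesser = NgramLanguageGuesser()
guesser.train("de", "Die Zeitung berichtet über die Zukunft der Gemeinde und zeigt, "
                    "wie zwei Zimmer zur Verfügung stehen.")
guesser.train("en", "The weather is nice and the children are playing in the park "
                    "with their friends.")
guesser.train("es", "¿Dónde está la estación? ¿Qué hora es? El niño quiere comer con su familia.")

print(guesser.guess("Zwei Zimmer in der Zeitung"))
print(guesser.guess("¿Dónde está la estación?"))
```

No word is "understood" here: each language simply scores how often the page's trigrams (such as " zw" or " ¿d") appeared in its training text, and the highest-scoring language wins.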
"Crawling .at web pages is a basic technology that is also an essential building block for IT security: the same techniques used to detect language are also useful for determining whether a web page has been hacked," says Aaron Kaplan (CERT.at). "Imagine that the language of a municipality's home page changes from one day to the next and suddenly online betting games are being sold there."
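The security use case boils down to comparing successive crawls. The sketch below assumes each crawl has already produced a mapping from domain to detected language code; the function name, the `LanguageChange` record and the `site-a.at`-style domains are hypothetical. A change is only a signal for manual review, since legitimate sites also switch languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageChange:
    domain: str
    before: str
    after: str


def detect_language_changes(previous: dict[str, str],
                            current: dict[str, str]) -> list[LanguageChange]:
    """Report domains present in both crawls whose detected language differs."""
    changes = []
    for domain in sorted(previous.keys() & current.keys()):
        if previous[domain] != current[domain]:
            changes.append(LanguageChange(domain, previous[domain], current[domain]))
    return changes


# Hypothetical crawl snapshots (domain -> detected language)
previous_crawl = {"site-a.at": "de", "site-b.at": "de", "site-c.at": "en"}
current_crawl = {"site-a.at": "en", "site-b.at": "de", "site-c.at": "en"}

for change in detect_language_changes(previous_crawl, current_crawl):
    print(f"{change.domain}: {change.before} -> {change.after} (review for possible compromise)")
```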
The program library developed for this purpose is freely available on GitHub, and the full project description can be found on the CEF website.
The project is co-financed by the Connecting Europe Facility (CEF) of the European Union. This EU fund for pan-European infrastructure investments in transport, energy and digital projects is intended to improve connectivity between the member states of the European Union.
The contents of this publication are the sole responsibility of nic.at and do not necessarily reflect the opinion of the European Union.