o n_L@sbdZddlmZmZddlmZGdddeZidedddgd d d gd d ed ddgdddgd dedddgdddgd dedddddgddgd dedddgd d!d"gd d#ed#d$dd%d&gd'd(gd d)ed)d*dd+d,gd-d.gd d/ed/d0dd%d&gd1gd2d3ed3d4dd5gd6d7gd d8ed8d9dgd d:d;gd dd?d@gd dAedAdBdgd dCdDgd dEedEdFdgd dGdHdIgd dJedJdKddLdMgdNdOgd dPedPdQdddgdRdSgd dTedTdUdddgdVdWgd dXedXdYdgd dZd[gd ed\d]dgd^d_d`gd edadbdgd^dcddgd ededfdgdgdhdigd edjdkdd%d&gdlgd2edmdndddgdodpgd edqdrdgd dsdtgd edudvdddgdwdxgd edydzdgd{d|d}gd ed~ddddgddgd edddddgddgd edddgdgdgdedddgdddgd edddgdddgd eddddgddgd dZdS)z Metadata about languages used by our model training code for our SingleByteCharSetProbers. Could be used for other things in the future. This code is based on the language metadata from the uchardet project. )absolute_importprint_function) ascii_letterscs.eZdZdZ  dfdd ZddZZS) LanguageaMetadata about a language useful for training models :ivar name: The human name for the language, in English. :type name: str :ivar iso_code: 2-letter ISO 639-1 if possible, 3-letter ISO code otherwise, or use another catalog as a last resort. :type iso_code: str :ivar use_ascii: Whether or not ASCII letters should be included in trained models. :type use_ascii: bool :ivar charsets: The charsets we want to support and create data for. :type charsets: list of str :ivar alphabet: The characters in the language's alphabet. If `use_ascii` is `True`, you only need to add those not in the ASCII set. :type alphabet: str :ivar wiki_start_pages: The Wikipedia pages to start from if we're crawling Wikipedia for training data. :type wiki_start_pages: list of str NTcsrtt|||_||_||_||_|jr |r|t7}n t}n|s&td|r1d t t |nd|_ ||_ dS)Nz*Must supply alphabet if use_ascii is False)superr__init__nameiso_code use_asciicharsetsr ValueErrorjoinsortedsetalphabetwiki_start_pages)selfr r r r rr __class__7s z$Language.__repr__..)rr__name__r__dict__items)rrrr__repr__5s  zLanguage.__repr__)NNTNNN)r __module__ __qualname____doc__rr" __classcell__rrrrrs rArabicarF)z ISO-8859-6z WINDOWS-1256CP720CP864ubءآأؤإئابةتثجحخدذرزسشصضطظعغػؼؽؾؿـفقكلمنهوىيًٌٍَُِّuالصفحة_الرئيسية)r r r r rr Belarusianbe) ISO-8859-5 WINDOWS-1251IBM866 MacCyrillicuАБВГДЕЁЖЗІЙКЛМНОПРСТУЎФХЦЧШЫЬЭЮЯабвгдеёжзійклмнопрстуўфхцчшыьэюяʼu!Галоўная_старонка Bulgarianbg)r-r.IBM855uxАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЬЮЯабвгдежзийклмнопрстуфхцчшщъьюяuНачална_страницаCzechczTz ISO-8859-2z WINDOWS-1250u<áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽuHlavní_stranaDanishda) ISO-8859-1z ISO-8859-15 WINDOWS-1252u æøåÆØÅForsideGermander8r9uäöüßÄÖÜzWikipedia:HauptseiteGreekelz ISO-8859-7z WINDOWS-1253uαβγδεζηθικλμνξοπρσςτυφχψωάέήίόύώΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΣΤΥΦΧΨΩΆΈΉΊΌΎΏuΠύλη:ΚύριαEnglishen Main_Page)r r r r r Esperantoeo ISO-8859-3uDabcĉdefgĝhĥijĵklmnoprsŝtuŭvzABCĈDEFGĜHĤIJĴKLMNOPRSŜTUŬVZuVikipedio:ĈefpaĝoSpanishesuñáéíóúüÑÁÉÍÓÚÜzWikipedia:PortadaEstonianet) ISO-8859-4 ISO-8859-13 WINDOWS-1257u6ABDEGHIJKLMNOPRSTUVÕÄÖÜabdeghijklmnoprstuvõäöüEsilehtFinnishfiuÅÄÖŠŽåäöšžzWikipedia:EtusivuFrenchfru,œàâçèéîïùûêŒÀÂÇÈÉÎÏÙÛÊuWikipédia:Accueil_principaluBœuf (animal)Hebrewhez ISO-8859-8z WINDOWS-1255u<אבגדהוזחטיךכלםמןנסעףפץצקרשתװױײuעמוד_ראשיCroatianhru@abcčćdđefghijklmnoprsštuvzžABCČĆDĐEFGHIJKLMNOPRSŠTUVZŽGlavna_stranica HungarianhuuPabcdefghijklmnoprstuvzáéíóöőúüűABCDEFGHIJKLMNOPRSTUVZÁÉÍÓÖŐÚÜŰu KezdőlapItalianituÀÈÉÌÒÓÙàèéìòóùPagina_principale Lithuanianlt)rJrKrIuRAĄBCČDEĘĖFGHIĮYJKLMNOPRSŠTUŲŪVZŽaąbcčdeęėfghiįyjklmnoprsštuųūvzžPagrindinis_puslapisLatvianlvuXAĀBCČDEĒFGĢHIĪJKĶLĻMNŅOPRSŠTUŪVZŽaābcčdeēfgģhiījkķlļmnņoprsštuūvzžu Sākumlapa Macedonianmk)r-r.r0r3u|АБВГДЃЕЖЗЅИЈКЛЉМНЊОПРСТЌУФХЦЧЏШабвгдѓежзѕијклљмнњопрстќуфхцчџшuГлавна_страницаDutchnl HoofdpaginaPolishpluRAĄBCĆDEĘFGHIJKLŁMNŃOÓPRSŚTUWYZŹŻaąbcćdeęfghijklłmnńoóprsśtuwyzźżuWikipedia:Strona_główna Portugueseptu0ÁÂÃÀÇÉÊÍÓÔÕÚáâãàçéêíóôõúuWikipédia:Página_principalRomanianrouăâîșțĂÂÎȘȚuPagina_principalăRussianru)r-r.zKOI8-Rr0r/r3uабвгдеёжзийклмнопрстуфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯu#Заглавная_страницаSlovakskuDáäčďéíĺľňóôŕšťúýžÁÄČĎÉÍĹĽŇÓÔŔŠŤÚÝŽuHlavná_stránkaSloveneslu8abcčdefghijklmnoprsštuvzžABCČDEFGHIJKLMNOPRSŠTUVZŽ Glavna_stranSerbiansruxАБВГДЂЕЖЗИЈКЛЉМНЊОПРСТЋУФХЦЧЏШабвгдђежзијклљмнњопрстћуфхцчџшuГлавна_страна)r r rr rThaith)z ISO-8859-11zTIS-620CP874uกขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮฯะัาำิีึืฺุู฿เแโใไๅๆ็่้๊๋์ํ๎๏๐๑๒๓๔๕๖๗๘๙๚๛uหน้าหลักTurkishtr)rDz ISO-8859-9z WINDOWS-1254uRabcçdefgğhıijklmnoöprsştuüvyzâîûABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZÂÎÛ Ana_Sayfa Vietnameseviz WINDOWS-1258uHaăâbcdđeêghiklmnoôơpqrstuưvxyAĂÂBCDĐEÊGHIKLMNOÔƠPQRSTUƯVXYuChữ_Quốc_ngữ)r[r^r`rbrergrirkrmrorrrtrwrzN) r% __future__rrstringrobjectr LANGUAGESrrrrs , !(.5:BISZbhpx