"""
    pygments.lexer
    ~~~~~~~~~~~~~~

    Base lexer classes.

    :copyright: Copyright 2006-2021 by the Pygments team, see AUTHORS.
    :license: BSD, see LICENSE for details.
"""

import re
import sys
import time

from pygments.filter import apply_filters, Filter
from pygments.filters import get_filter_by_name
from pygments.token import Error, Text, Other, _TokenType
from pygments.util import get_bool_opt, get_int_opt, get_list_opt, \
    make_analysator, Future, guess_decode
from pygments.regexopt import regex_opt

__all__ = ['Lexer', 'RegexLexer', 'ExtendedRegexLexer', 'DelegatingLexer',
           'LexerContext', 'include', 'inherit', 'bygroups', 'using', 'this',
           'default', 'words']

_encoding_map = [(b'\xef\xbb\xbf', 'utf-8'),
                 (b'\xff\xfe\0\0', 'utf-32'),
                 (b'\0\0\xfe\xff', 'utf-32be'),
                 (b'\xff\xfe', 'utf-16'),
                 (b'\xfe\xff', 'utf-16be')]

_default_analyse = staticmethod(lambda x: 0.0)


class LexerMeta(type):
    """
    This metaclass automagically converts ``analyse_text`` methods into
    static methods which always return float values.
    """

    def __new__(mcs, name, bases, d):
        if 'analyse_text' in d:
            d['analyse_text'] = make_analysator(d['analyse_text'])
        return type.__new__(mcs, name, bases, d)


class Lexer(metaclass=LexerMeta):
    """
    Lexer for a specific language.

    Basic options recognized:
    ``stripnl``
        Strip leading and trailing newlines from the input (default: True).
    ``stripall``
        Strip all leading and trailing whitespace from the input
        (default: False).
    ``ensurenl``
        Make sure that the input ends with a newline (default: True).  This
        is required for some lexers that consume input linewise.

        .. versionadded:: 1.3

    ``tabsize``
        If given and greater than 0, expand tabs in the input (default: 0).
    ``encoding``
        If given, must be an encoding name. This encoding will be used to
        convert the input string to Unicode, if it is not already a Unicode
        string (default: ``'guess'``, which uses a simple UTF-8 / Locale /
        Latin1 detection.  Can also be ``'chardet'`` to use the chardet
        library, if it is installed.
    ``inencoding``
        Overrides the ``encoding`` if given.
    """

    #: Name of the lexer
    name = None

    #: Shortcuts for the lexer
    aliases = []

    #: File name globs
    filenames = []

    #: Secondary file name globs
    alias_filenames = []

    #: MIME types
    mimetypes = []

    #: Priority, should multiple lexers match and no content is provided
    priority = 0

    def __init__(self, **options):
        self.options = options
        self.stripnl = get_bool_opt(options, 'stripnl', True)
        self.stripall = get_bool_opt(options, 'stripall', False)
        self.ensurenl = get_bool_opt(options, 'ensurenl', True)
        self.tabsize = get_int_opt(options, 'tabsize', 0)
        self.encoding = options.get('encoding', 'guess')
        self.encoding = options.get('inencoding') or self.encoding
        self.filters = []
        for filter_ in get_list_opt(options, 'filters', ()):
            self.add_filter(filter_)

    def __repr__(self):
        if self.options:
            return '<pygments.lexers.%s with %r>' % (self.__class__.__name__,
                                                     self.options)
        else:
            return '<pygments.lexers.%s>' % self.__class__.__name__

    def add_filter(self, filter_, **options):
        """
        Add a new stream filter to this lexer.
        """
        if not isinstance(filter_, Filter):
            filter_ = get_filter_by_name(filter_, **options)
        self.filters.append(filter_)

    def analyse_text(text):
        """
        Has to return a float between ``0`` and ``1`` that indicates
        if a lexer wants to highlight this text. Used by ``guess_lexer``.
        If this method returns ``0`` it won't highlight it in any case, if
        it returns ``1`` highlighting with this lexer is guaranteed.

        The `LexerMeta` metaclass automatically wraps this function so
        that it works like a static method (no ``self`` or ``cls``
        parameter) and the return value is automatically converted to
        `float`. If the return value is an object that is boolean `False`
        it's the same as if the return values was ``0.0``.
        """

    def get_tokens(self, text, unfiltered=False):
        """
        Return an iterable of (tokentype, value) pairs generated from
        `text`. If `unfiltered` is set to `True`, the filtering mechanism
        is bypassed even if filters are defined.

        Also preprocess the text, i.e. expand tabs and strip it if
        wanted and applies registered filters.
        """
        if not isinstance(text, str):
            if self.encoding == 'guess':
                text, _ = guess_decode(text)
            elif self.encoding == 'chardet':
                try:
                    import chardet
                except ImportError as e:
                    raise ImportError('To enable chardet encoding guessing, '
                                      'please install the chardet library '
                                      'from http://chardet.feedparser.org/') from e
                # check for BOM first
                decoded = None
                for bom, encoding in _encoding_map:
                    if text.startswith(bom):
                        decoded = text[len(bom):].decode(encoding, 'replace')
                        break
                # no BOM found, so use chardet
                if decoded is None:
                    enc = chardet.detect(text[:1024])  # Guess using first 1KB
                    decoded = text.decode(enc.get('encoding') or 'utf-8',
                                          'replace')
                text = decoded
            else:
                text = text.decode(self.encoding)
                if text.startswith('\ufeff'):
                    text = text[len('\ufeff'):]
        else:
            if text.startswith('\ufeff'):
                text = text[len('\ufeff'):]

        # text now *is* a unicode string
        text = text.replace('\r\n', '\n')
        text = text.replace('\r', '\n')
        if self.stripall:
            text = text.strip()
        elif self.stripnl:
            text = text.strip('\n')
        if self.tabsize > 0:
            text = text.expandtabs(self.tabsize)
        if self.ensurenl and not text.endswith('\n'):
            text += '\n'

        def streamer():
            for _, t, v in self.get_tokens_unprocessed(text):
                yield t, v
        stream = streamer()
        if not unfiltered:
            stream = apply_filters(stream, self.filters, self)
        return stream

    def get_tokens_unprocessed(self, text):
        """
        Return an iterable of (index, tokentype, value) pairs where "index"
        is the starting position of the token within the input text.

        In subclasses, implement this method as a generator to
        maximize effectiveness.
        """
        raise NotImplementedError


class DelegatingLexer(Lexer):
    """
    This lexer takes two lexer as arguments. A root lexer and
    a language lexer. First everything is scanned using the language
    lexer, afterwards all ``Other`` tokens are lexed using the root
    lexer.

    The lexers from the ``template`` lexer package use this base lexer.
    """

    def __init__(self, _root_lexer, _language_lexer, _needle=Other, **options):
        self.root_lexer = _root_lexer(**options)
        self.language_lexer = _language_lexer(**options)
        self.needle = _needle
        Lexer.__init__(self, **options)

    def get_tokens_unprocessed(self, text):
        buffered = ''
        insertions = []
        lng_buffer = []
        for i, t, v in self.language_lexer.get_tokens_unprocessed(text):
            if t is self.needle:
                if lng_buffer:
                    insertions.append((len(buffered), lng_buffer))
                    lng_buffer = []
                buffered += v
            else:
                lng_buffer.append((i, t, v))
        if lng_buffer:
            insertions.append((len(buffered), lng_buffer))
        return do_insertions(insertions,
                             self.root_lexer.get_tokens_unprocessed(buffered))


# ------------------------------------------------------------------------------
# RegexLexer and ExtendedRegexLexer
#


class include(str):
    """
    Indicates that a state should include rules from another state.
    """
    pass


class _inherit:
    """
    Indicates that a state should inherit from its superclass.
    """
    def __repr__(self):
        return 'inherit'

inherit = _inherit()


class combined(tuple):
    """
    Indicates a state combined from multiple states.
    """

    def __new__(cls, *args):
        return tuple.__new__(cls, args)

    def __init__(self, *args):
        # tuple.__init__ doesn't do anything
        pass


class _PseudoMatch:
    """
    A pseudo match object constructed from a string.
    """

    def __init__(self, start, text):
        self._text = text
        self._start = start

    def start(self, arg=None):
        return self._start

    def end(self, arg=None):
        return self._start + len(self._text)

    def group(self, arg=None):
        if arg:
            raise IndexError('No such group')
        return self._text

    def groups(self):
        return (self._text,)

    def groupdict(self):
        return {}


def bygroups(*args):
    """
    Callback that yields multiple actions for each group in the match.
    """
    def callback(lexer, match, ctx=None):
        for i, action in enumerate(args):
            if action is None:
                continue
            elif type(action) is _TokenType:
                data = match.group(i + 1)
                if data:
                    yield match.start(i + 1), action, data
            else:
                data = match.group(i + 1)
                if data is not None:
                    if ctx:
                        ctx.pos = match.start(i + 1)
                    for item in action(lexer,
                                       _PseudoMatch(match.start(i + 1), data),
                                       ctx):
                        if item:
                            yield item
        if ctx:
            ctx.pos = match.end()
    return callback


class _This:
    """
    Special singleton used for indicating the caller class. Used by ``using``.
    """

this = _This()


def using(_other, **kwargs):
    """
    Callback that processes the match with a different lexer.

    The keyword arguments are forwarded to the lexer, except `state` which
    is handled separately.

    `state` specifies the state that the new lexer will start in, and can
    be an enumerable such as ('root', 'inline', 'string') or a simple
    string which is assumed to be on top of the root state.

    Note: For that to work, `_other` must not be an `ExtendedRegexLexer`.
    """
    gt_kwargs = {}
    if 'state' in kwargs:
        s = kwargs.pop('state')
        if isinstance(s, (list, tuple)):
            gt_kwargs['stack'] = s
        else:
            gt_kwargs['stack'] = ('root', s)

    if _other is this:
        def callback(lexer, match, ctx=None):
            # if keyword arguments are given the callback
            # function has to create a new lexer instance
            if kwargs:
                kwargs.update(lexer.options)
                lx = lexer.__class__(**kwargs)
            else:
                lx = lexer
            s = match.start()
            for i, t, v in lx.get_tokens_unprocessed(match.group(), **gt_kwargs):
                yield i + s, t, v
            if ctx:
                ctx.pos = match.end()
    else:
        def callback(lexer, match, ctx=None):
            kwargs.update(lexer.options)
            lx = _other(**kwargs)

            s = match.start()
            for i, t, v in lx.get_tokens_unprocessed(match.group(), **gt_kwargs):
                yield i + s, t, v
            if ctx:
                ctx.pos = match.end()
    return callback


class default:
    """
    Indicates a state or state action (e.g. #pop) to apply.
    For example default('#pop') is equivalent to ('', Token, '#pop')
    Note that state tuples may be used as well.

    .. versionadded:: 2.0
    """
    def __init__(self, state):
        self.state = state


class words(Future):
    """
    Indicates a list of literal words that is transformed into an optimized
    regex that matches any of the words.

    .. versionadded:: 2.0
    """
    def __init__(self, words, prefix='', suffix=''):
        self.words = words
        self.prefix = prefix
        self.suffix = suffix

    def get(self):
        return regex_opt(self.words, prefix=self.prefix, suffix=self.suffix)


class RegexLexerMeta(LexerMeta):
    """
    Metaclass for RegexLexer, creates the self._tokens attribute from
    self.tokens on the first instantiation.
    """

    def _process_regex(cls, regex, rflags, state):
        """Preprocess the regular expression component of a token definition."""
        if isinstance(regex, Future):
            regex = regex.get()
        return re.compile(regex, rflags).match

    def _process_token(cls, token):
        """Preprocess the token component of a token definition."""
        assert type(token) is _TokenType or callable(token), \
            'token type must be simple type or callable, not %r' % (token,)
        return token

    def _process_new_state(cls, new_state, unprocessed, processed):
        """Preprocess the state transition action of a token definition."""
        if isinstance(new_state, str):
            # an existing state
            if new_state == '#pop':
                return -1
            elif new_state in unprocessed:
                return (new_state,)
            elif new_state == '#push':
                return new_state
            elif new_state[:5] == '#pop:':
                return -int(new_state[5:])
            else:
                assert False, 'unknown new state %r' % new_state
        elif isinstance(new_state, combined):
            # combine a new state from existing ones
            tmp_state = '_tmp_%d' % cls._tmpname
            cls._tmpname += 1
            itokens = []
            for istate in new_state:
                assert istate != new_state, 'circular state ref %r' % istate
                itokens.extend(cls._process_state(unprocessed,
                                                  processed, istate))
            processed[tmp_state] = itokens
            return (tmp_state,)
        elif isinstance(new_state, tuple):
            # push more than one state
            for istate in new_state:
                assert (istate in unprocessed or
                        istate in ('#pop', '#push')), \
                    'unknown new state ' + istate
            return new_state
        else:
            assert False, 'unknown new state def %r' % new_state

    def _process_state(cls, unprocessed, processed, state):
        """Preprocess a single state definition."""
        assert type(state) is str, "wrong state name %r" % state
        assert state[0] != '#', "invalid state name %r" % state
        if state in processed:
            return processed[state]
        tokens = processed[state] = []
        rflags = cls.flags
        for tdef in unprocessed[state]:
            if isinstance(tdef, include):
                # it's a state reference
                assert tdef != state, "circular state reference %r" % state
                tokens.extend(cls._process_state(unprocessed, processed,
                                                 str(tdef)))
                continue
            if isinstance(tdef, _inherit):
                # should be processed already, but may not in the case of:
                # 1. the state has no counterpart in superclass
                # 2. the state includes more than one 'inherit'
                continue
            if isinstance(tdef, default):
                new_state = cls._process_new_state(tdef.state,
                                                   unprocessed, processed)
                tokens.append((re.compile('').match, None, new_state))
                continue

            assert type(tdef) is tuple, "wrong rule def %r" % tdef

            try:
                rex = cls._process_regex(tdef[0], rflags, state)
            except Exception as err:
                raise ValueError("uncompilable regex %r in state %r of %r: %s" %
                                 (tdef[0], state, cls, err)) from err

            token = cls._process_token(tdef[1])

            if len(tdef) == 2:
                new_state = None
            else:
                new_state = cls._process_new_state(tdef[2],
                                                   unprocessed, processed)

            tokens.append((rex, token, new_state))
        return tokens

    def process_tokendef(cls, name, tokendefs=None):
        """Preprocess a dictionary of token definitions."""
        processed = cls._all_tokens[name] = {}
        tokendefs = tokendefs or cls.tokens[name]
        for state in list(tokendefs):
            cls._process_state(tokendefs, processed, state)
        return processed

    def get_tokendefs(cls):
        """
        Merge tokens from superclasses in MRO order, returning a single
        tokendef dictionary.

        Any state that is not defined by a subclass will be inherited
        automatically.  States that *are* defined by subclasses will, by
        default, override that state in the superclass.  If a subclass
        wishes to inherit definitions from a superclass, it can use the
        special value "inherit", which will cause the superclass' state
        definition to be included at that point in the state.
        """
        tokens = {}
        inheritable = {}
        for c in cls.__mro__:
            toks = c.__dict__.get('tokens', {})

            for state, items in toks.items():
                curitems = tokens.get(state)
                if curitems is None:
                    # N.b. because this is assigned by reference, sufficiently
                    # deep hierarchies are processed incrementally
                    tokens[state] = items
                    try:
                        inherit_ndx = items.index(inherit)
                    except ValueError:
                        continue
                    inheritable[state] = inherit_ndx
                    continue

                inherit_ndx = inheritable.pop(state, None)
                if inherit_ndx is None:
                    continue

                # Replace the "inherit" value with the items
                curitems[inherit_ndx:inherit_ndx + 1] = items
                try:
                    # N.b. this is the index in items (that is, the superclass
                    # copy), so offset required when storing below.
                    new_inh_ndx = items.index(inherit)
                except ValueError:
                    pass
                else:
                    inheritable[state] = inherit_ndx + new_inh_ndx

        return tokens

    def __call__(cls, *args, **kwds):
        """Instantiate cls after preprocessing its token definitions."""
        if '_tokens' not in cls.__dict__:
            cls._all_tokens = {}
            cls._tmpname = 0
            if hasattr(cls, 'token_variants') and cls.token_variants:
                # don't process yet
                pass
            else:
                cls._tokens = cls.process_tokendef('', cls.get_tokendefs())

        return type.__call__(cls, *args, **kwds)


class RegexLexer(Lexer, metaclass=RegexLexerMeta):
    """
    Base for simple stateful regular expression-based lexers.
    Simplifies the lexing process so that you need only
    provide a list of states and regular expressions.
    """

    #: Flags for compiling the regular expressions.
    #: Defaults to MULTILINE.
    flags = re.MULTILINE

    #: Dict of ``{'state': [(regex, tokentype, new_state), ...], ...}``
    tokens = {}

    def get_tokens_unprocessed(self, text, stack=('root',)):
        """
        Split ``text`` into (tokentype, text) pairs.

        ``stack`` is the initial stack (default: ``['root']``)
        """
        pos = 0
        tokendefs = self._tokens
        statestack = list(stack)
        statetokens = tokendefs[statestack[-1]]
        while 1:
            for rexmatch, action, new_state in statetokens:
                m = rexmatch(text, pos)
                if m:
                    if action is not None:
                        if type(action) is _TokenType:
                            yield pos, action, m.group()
                        else:
                            yield from action(self, m)
                    pos = m.end()
                    if new_state is not None:
                        # state transition
                        if isinstance(new_state, tuple):
                            for state in new_state:
                                if state == '#pop':
                                    if len(statestack) > 1:
                                        statestack.pop()
                                elif state == '#push':
                                    statestack.append(statestack[-1])
                                else:
                                    statestack.append(state)
                        elif isinstance(new_state, int):
                            # pop, but keep at least one state on the stack
                            if abs(new_state) >= len(statestack):
                                del statestack[1:]
                            else:
                                del statestack[new_state:]
                        elif new_state == '#push':
                            statestack.append(statestack[-1])
                        else:
                            assert False, "wrong state def: %r" % new_state
                        statetokens = tokendefs[statestack[-1]]
                    break
            else:
                # We are here only if all state tokens do not match.
                try:
                    if text[pos] == '\n':
                        # at EOL, reset state to "root"
                        statestack = ['root']
                        statetokens = tokendefs['root']
                        yield pos, Text, '\n'
                        pos += 1
                        continue
                    yield pos, Error, text[pos]
                    pos += 1
                except IndexError:
                    break


class LexerContext:
    """
    A helper object that holds lexer position data.
    """

    def __init__(self, text, pos, stack=None, end=None):
        self.text = text
        self.pos = pos
        self.end = end or len(text)  # end=0 not supported ;-)
        self.stack = stack or ['root']

    def __repr__(self):
        return 'LexerContext(%r, %r, %r)' % (
            self.text, self.pos, self.stack)


class ExtendedRegexLexer(RegexLexer):
    """
    A RegexLexer that uses a context object to store its state.
    """

    def get_tokens_unprocessed(self, text=None, context=None):
        """
        Split ``text`` into (tokentype, text) pairs.
        If ``context`` is given, use this lexer context instead.
        """
        tokendefs = self._tokens
        if not context:
            ctx = LexerContext(text, 0)
            statetokens = tokendefs['root']
        else:
            ctx = context
            statetokens = tokendefs[ctx.stack[-1]]
            text = ctx.text
        while 1:
            for rexmatch, action, new_state in statetokens:
                m = rexmatch(text, ctx.pos, ctx.end)
                if m:
                    if action is not None:
                        if type(action) is _TokenType:
                            yield ctx.pos, action, m.group()
                            ctx.pos = m.end()
                        else:
                            yield from action(self, m, ctx)
                            if not new_state:
                                # altered the state stack?
                                statetokens = tokendefs[ctx.stack[-1]]
                    # CAUTION: callback must set ctx.pos!
                    if new_state is not None:
                        # state transition
                        if isinstance(new_state, tuple):
                            for state in new_state:
                                if state == '#pop':
                                    if len(ctx.stack) > 1:
                                        ctx.stack.pop()
                                elif state == '#push':
                                    ctx.stack.append(ctx.stack[-1])
                                else:
                                    ctx.stack.append(state)
                        elif isinstance(new_state, int):
                            # see RegexLexer for why this check is made
                            if abs(new_state) >= len(ctx.stack):
                                del ctx.stack[1:]
                            else:
                                del ctx.stack[new_state:]
                        elif new_state == '#push':
                            ctx.stack.append(ctx.stack[-1])
                        else:
                            assert False, "wrong state def: %r" % new_state
                        statetokens = tokendefs[ctx.stack[-1]]
                    break
            else:
                try:
                    if ctx.pos >= ctx.end:
                        break
                    if text[ctx.pos] == '\n':
                        # at EOL, reset state to "root"
                        ctx.stack = ['root']
                        statetokens = tokendefs['root']
                        yield ctx.pos, Text, '\n'
                        ctx.pos += 1
                        continue
                    yield ctx.pos, Error, text[ctx.pos]
                    ctx.pos += 1
                except IndexError:
                    break


def do_insertions(insertions, tokens):
    """
    Helper for lexers which must combine the results of several
    sublexers.

    ``insertions`` is a list of ``(index, itokens)`` pairs.
    Each ``itokens`` iterable should be inserted at position
    ``index`` into the token stream given by the ``tokens``
    argument.

    The result is a combined token stream.

    TODO: clean up the code here.
    """
    insertions = iter(insertions)
    try:
        index, itokens = next(insertions)
    except StopIteration:
        # no insertions
        yield from tokens
        return

    realpos = None
    insleft = True

    # iterate over the token stream where we want to insert
    # the tokens from the insertion list.
    for i, t, v in tokens:
        # first iteration. store the position of first item
        if realpos is None:
            realpos = i
        oldi = 0
        while insleft and i + len(v) >= index:
            tmpval = v[oldi:index - i]
            if tmpval:
                yield realpos, t, tmpval
                realpos += len(tmpval)
            for it_index, it_token, it_value in itokens:
                yield realpos, it_token, it_value
                realpos += len(it_value)
            oldi = index - i
            try:
                index, itokens = next(insertions)
            except StopIteration:
                insleft = False
                break  # not strictly necessary
        if oldi < len(v):
            yield realpos, t, v[oldi:]
            realpos += len(v) - oldi

    # leftover tokens
    while insleft:
        # no normal tokens, set realpos to zero
        realpos = realpos or 0
        for p, t, v in itokens:
            yield realpos, t, v
            realpos += len(v)
        try:
            index, itokens = next(insertions)
        except StopIteration:
            insleft = False
            break  # not strictly necessary


class ProfilingRegexLexerMeta(RegexLexerMeta):
    """Metaclass for ProfilingRegexLexer, collects regex timing info."""

    def _process_regex(cls, regex, rflags, state):
        if isinstance(regex, words):
            rex = regex_opt(regex.words, prefix=regex.prefix,
                            suffix=regex.suffix)
        else:
            rex = regex
        compiled = re.compile(rex, rflags)

        def match_func(text, pos, endpos=sys.maxsize):
            info = cls._prof_data[-1].setdefault((state, rex), [0, 0.0])
            t0 = time.time()
            res = compiled.match(text, pos, endpos)
            t1 = time.time()
            info[0] += 1
            info[1] += t1 - t0
            return res
        return match_func


class ProfilingRegexLexer(RegexLexer, metaclass=ProfilingRegexLexerMeta):
    """Drop-in replacement for RegexLexer that does profiling of its regexes."""

    _prof_data = []
    _prof_sort_index = 4  # defaults to time per call

    def get_tokens_unprocessed(self, text, stack=('root',)):
        # this needs to be a stack, since using(this) will produce nested calls
        self.__class__._prof_data.append({})
        yield from RegexLexer.get_tokens_unprocessed(self, text, stack)
        rawdata = self.__class__._prof_data.pop()
        data = sorted(((s, repr(r).strip("u'").replace('\\\\', '\\')[:65],
                        n, 1000 * t, 1000 * t / n)
                       for ((s, r), (n, t)) in rawdata.items()),
                      key=lambda x: x[self._prof_sort_index],
                      reverse=True)
        sum_total = sum(x[3] for x in data)

        print()
        print('Profiling result for %s lexing %d chars in %.3f ms' %
              (self.__class__.__name__, len(text), sum_total))
        print('=' * 110)
        print('%-20s %-64s ncalls  tottime  percall' % ('state', 'regex'))
        print('-' * 110)
        for d in data:
            print('%-20s %-65s %5d %8.4f %8.4f' % d)
        print('=' * 110)
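# Usage sketch (not part of the module): a minimal, hypothetical RegexLexer
# subclass showing the ``tokens`` state table and ``bygroups`` from this
# module. ``KeyValueLexer`` and its rules are illustrative assumptions,
# tokenizing simple ``key = value`` lines.

```python
# Illustrative example only: KeyValueLexer is a made-up toy lexer.
from pygments.lexer import RegexLexer, bygroups
from pygments.token import Name, Operator, String, Text


class KeyValueLexer(RegexLexer):
    """Toy lexer for ``key = value`` pairs, one per line."""
    name = 'KeyValue'
    tokens = {
        'root': [
            # whitespace (including newlines)
            (r'\s+', Text),
            # key, '=' with surrounding spaces, then the value
            (r'(\w+)(\s*=\s*)([^\n]*)',
             bygroups(Name.Attribute, Operator, String)),
        ],
    }


# get_tokens() yields (tokentype, value) pairs after preprocessing
pairs = list(KeyValueLexer().get_tokens('a = 1\n'))
```

Because ``RegexLexerMeta`` compiles the ``tokens`` table lazily, the regexes are only processed on the first instantiation of ``KeyValueLexer``.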