/usr/lib/python3.6/charsetprober.py

class CharSetProber

Methods: __init__, reset, charset_name, feed, state, get_confidence, filter_high_byte_only, filter_international_words

filter_international_words:

    We define three types of bytes:
    alphabet: english alphabets [a-zA-Z]
    international: international characters [\x80-\xFF]
    marker: everything else [^a-zA-Z\x80-\xFF]

    The input buffer can be thought to contain a series of words delimited
    by markers. This function works to filter all words that contain at
    least one international character. All contiguous sequences of markers
    are replaced by a single space ascii character.

    This filter applies to all scripts which do not use English characters.

Docstring of the method that follows filter_international_words:

    Returns a copy of ``buf`` that retains only the sequences of English
    alphabet and high byte characters that are not between <> characters.
    Also retains English alphabet and high byte characters immediately
    before occurrences of >.

    This filter can be applied to all scripts which contain both English
    characters and extended ASCII characters, but is currently only used by
    ``Latin1Prober``.
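
The word filter described in the `filter_international_words` docstring can be sketched roughly as follows. This is a minimal standalone illustration, not the compiled module's code: the regular expression is an assumption reconstructed from the byte classes the docstring names (alphabet, international, marker), and the function is written as a free function rather than a method.

```python
import re

# Assumed pattern, built from the docstring's three byte classes:
# optional ASCII letters, at least one high byte (0x80-0xFF), optional
# ASCII letters, then at most one trailing marker byte.
_INTERNATIONAL_WORD = re.compile(
    rb"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?"
)


def filter_international_words(buf: bytes) -> bytearray:
    """Keep only words containing at least one high byte.

    A trailing marker byte after a kept word is replaced by one space.
    """
    filtered = bytearray()
    for word in _INTERNATIONAL_WORD.findall(buf):
        # Everything except the final byte is always part of the word.
        filtered.extend(word[:-1])
        last_char = word[-1:]
        # bytes.isalpha() is ASCII-only, so high bytes need a separate check:
        # only a non-letter, non-high byte (a marker) becomes a space.
        if not last_char.isalpha() and last_char < b"\x80":
            last_char = b" "
        filtered.extend(last_char)
    return filtered
```

Words made up only of ASCII letters never match the pattern, so they are dropped entirely; that is what makes the filter useful for scripts that do not rely on English characters.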