m i:Nc@swdZdklZdZdZdZdZdklZl Z dk Z dk Z dk Z dk Z dkZydklZWnej o hZnXyeWn ej od klZnXe id e_e id ie _d Zd ZdefdYZdeefdYZdefdYZdefdYZ defdYZ!defdYZ"defdYZ#dfdYZ$de%fdYZ&d Z'd!e#efd"YZ(d#e(fd$YZ)d%e*fd&YZ+d'e)fd(YZ,d)e)fd*YZ-d+e(fd,YZ.d-e(fd.YZ/d/e)fd0YZ0d1e,fd2YZ1d3e-fd4YZ2d5e.fd6YZ3y dk4Z4Wnej o e5Z4nXy dk6Z7Wnej onXy dk8Z8Wnej onXd7fd8YZ9e:d9jo'dk;Z;e)e;i<Z=e=i>GHndS(:s Beautiful Soup Elixir and Tonic "The Screen-Scraper's Friend" http://www.crummy.com/software/BeautifulSoup/ Beautiful Soup parses a (possibly invalid) XML or HTML document into a tree representation. It provides methods and Pythonic idioms that make it easy to navigate, search, and modify the tree. A well-formed XML/HTML document yields a well-formed data structure. An ill-formed XML/HTML document yields a correspondingly ill-formed data structure. If your document is only locally well-formed, you can use this library to find and process the well-formed part of it. Beautiful Soup works with Python 2.2 and up. It has no external dependencies, but you'll have more success at converting data to UTF-8 if you also install these three packages: * chardet, for auto-detecting character encodings http://chardet.feedparser.org/ * cjkcodecs and iconv_codec, which add more encodings to the ones supported by stock Python. http://cjkpython.i18n.org/ Beautiful Soup defines classes for two main parsing strategies: * BeautifulStoneSoup, for parsing XML, SGML, or your domain-specific language that kind of looks like XML. * BeautifulSoup, for parsing run-of-the-mill HTML code, be it valid or invalid. This class has web browser-like heuristics for obtaining a sensible parse tree in the face of common HTML errors. Beautiful Soup also defines a class (UnicodeDammit) for autodetecting the encoding of an HTML or XML document, and converting it to Unicode. Much of this code is taken from Mark Pilgrim's Universal Feed Parser. 
For more than you ever wanted to know about Beautiful Soup, see the
documentation:
http://www.crummy.com/software/BeautifulSoup/documentation.html

Here, have some legalese:

Copyright (c) 2004-2010, Leonard Richardson

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.

  * Redistributions in binary form must reproduce the above
    copyright notice, this list of conditions and the following
    disclaimer in the documentation and/or other materials provided
    with the distribution.

  * Neither the name of the Beautiful Soup Consortium and All
    Night Kosher Bakery nor the names of its contributors may be
    used to endorse or promote products derived from this software
    without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE, DAMMIT.
(s generatorss*Leonard Richardson (leonardr@segfault.org)s3.2.0s*Copyright (c) 2004-2010 Leonard Richardsons New-style BSD(s SGMLParsersSGMLParseErrorN(sname2codepoint(sSets[a-zA-Z][-_.:a-zA-Z0-9]*s-zA-Z][-_.:a-zA-Z0-9]*\s*sutf-8cCstid|S(s(Build a RE to match the given CSS class.s(^|.*\s)%s($|\s)N(tretcompiletstr(R((t0/home/member/dmb/public_html/gs/BeautifulSoup.pyt_match_css_classkst PageElementcBstZdZeedZdZdZdZdZdZ dZ ehedZ eheed Z ehed Z eheed ZeZehed Zeheed ZeZehedZeheedZeZehdZehedZeZdZdZdZdZdZdZdZedZ edZ!RS(seContains the navigational information for some part of the page (either a tag or a piece of text)cCsk||_||_d|_d|_d|_|io0|iio#|iid|_||i_ndS(sNSets up the initial relations between this element and other elements.iN(tparenttselftprevioustNonetnexttpreviousSiblingt nextSiblingtcontents(RRR((Rtsetupus     cCs|i}|ii|}t|doK|i|ijo8|ii|}|o||jo|d}qvn|i|i||dS(NRi( RRt oldParenttindextmyIndexthasattrt replaceWithtextracttinsert(RRRRR((RRs # cCsc|i}|ii|}|it|i}|i x|D]}|i ||qEWdS(N( RRtmyParentRRRtlistR treversedChildrentreversetchildR(RRRRR((RtreplaceWithChildrens   cCs|io7y|ii|ii|=WqAtj oqAXn|i}|i}|i o||i _n|o|i |_ nd|_ d|_d|_|i o|i |i _ n|i o|i |i _ nd|_ |_ |S(s0Destructively rips this element out of the tree.N( RRR Rt ValueErrort_lastRecursiveChildt lastChildR t nextElementRR R R (RRR((RRs*          cCs9|}x,t|do|io|id}q W|S(s8Finds the last element beneath this object to be parsed.R iN(RRRR (RR((RRs c Cs%t|to!t|t ot|}nt|t|i}t |do\|i dj oL|i |jo.|i |}||jo|d}qn|i n||_ d}|djod|_||_n6|i|d}||_||i_|i|_|io||i_n|i}|t|ijocd|_|}d}x*|p"|i}|i }|pPqpqpW|o ||_qd|_n:|i|}||_|io||i_n||_|io||i_n|ii||dS(NRii(t isinstancetnewChildt basestringtNavigableStringtmintpositiontlenRR RRR RRt previousChildR RR RR tnewChildsLastElementtparentsNextSiblingt nextChildR( RR%R!R)RR(RR'R*((RRsT!                     
cCs|it|i|dS(s2Appends the given tag to the contents of this tag.N(RRR&R ttag(RR+((RtappendscKs|i|i||||S(sjReturns the first item that matches the given criteria and appears after this Tag in the document.N(Rt_findOnet findAllNexttnametattrsttexttkwargs(RR/R0R1R2((RtfindNextscKs|i|||||i|S(sbReturns all items that match the given criteria and appear after this Tag in the document.N(Rt_findAllR/R0R1tlimitt nextGeneratorR2(RR/R0R1R5R2((RR.scKs|i|i||||S(s{Returns the closest sibling to this Tag that matches the given criteria and appears after this Tag in the document.N(RR-tfindNextSiblingsR/R0R1R2(RR/R0R1R2((RtfindNextSiblingscKs|i|||||i|S(sqReturns the siblings of this Tag that match the given criteria and appear after this Tag in the document.N(RR4R/R0R1R5tnextSiblingGeneratorR2(RR/R0R1R5R2((RR7scKs|i|i||||S(skReturns the first item that matches the given criteria and appears before this Tag in the document.N(RR-tfindAllPreviousR/R0R1R2(RR/R0R1R2((Rt findPreviousscKs|i|||||i|S(scReturns all items that match the given criteria and appear before this Tag in the document.N(RR4R/R0R1R5tpreviousGeneratorR2(RR/R0R1R5R2((RR:scKs|i|i||||S(s|Returns the closest sibling to this Tag that matches the given criteria and appears before this Tag in the document.N(RR-tfindPreviousSiblingsR/R0R1R2(RR/R0R1R2((RtfindPreviousSibling#scKs|i|||||i|S(srReturns the siblings of this Tag that match the given criteria and appear before this Tag in the document.N(RR4R/R0R1R5tpreviousSiblingGeneratorR2(RR/R0R1R5R2((RR=)scKs4d}|i||d}|o|d}n|S(sOReturns the closest parent of this Tag that matches the given criteria.iiN(R trRt findParentsR/R0tl(RR/R0R2RBR@((Rt findParent1s cKs|i||d||i|S(sFReturns the parents of this Tag that match the given criteria.N(RR4R/R0R R5tparentGeneratorR2(RR/R0R5R2((RRA<scKs7d}||||d|}|o|d}n|S(Nii(R R@tmethodR/R0R1R2RB(RRER/R0R1R2RBR@((RR-Fs cKst|to |}n|djo| o| o| o|t jo:g} |D]!} t| t o | | qZqZ~ Sqt|toJg} |D]1} t| t o| i|jo | | qq~ Sqt||||}nt||||}t|}|} xt o|y| i} Wntj oPnX| 
oJ|i| } | o0|i| |ot||joPqqq&q&W|S(s8Iterates over a generator looking for things that match.N(R R/t SoupStrainertstrainerR1R R5R0R2tTruet_[1]t generatortelementtTagR"t ResultSettresultstgR tit StopIterationtsearchtfoundR,R&(RR/R0R1R5RJR2RNRGRORPRKRIRS((RR4Ms2 % :J   ccs,|}x|dj o|i}|Vq WdS(N(RRPR R (RRP((RR6ss   ccs,|}x|dj o|i}|Vq WdS(N(RRPR R (RRP((RR9ys   ccs,|}x|dj o|i}|Vq WdS(N(RRPR R(RRP((RR<s   ccs,|}x|dj o|i}|Vq WdS(N(RRPR R (RRP((RR?s   ccs,|}x|dj o|i}|Vq WdS(N(RRPR R(RRP((RRDs   cCs|pd}|id|S(Nsutf-8s%SOUP-ENCODING%(tencodingRtreplace(RRRT((RtsubstituteEncodings cCst|to|o|i|}qnjt|to*|o|i|}qt|}n0|o|it||}n t|}|S(sHEncodes an object to a string in some encoding, or to Unicode. .N(R tstunicodeRTtencodeRRt toEncoding(RRWRT((RRZs ("t__name__t __module__t__doc__R RRRRRRR,R3R.R8R7tfetchNextSiblingsR;R:t fetchPreviousR>R=tfetchPreviousSiblingsRCRAt fetchParentsR-R4R6R9R<R?RDRVRZ(((RRqs>     ;    &      R#cBs8tZdZdZdZdZedZRS(NcCs7t|toti||Snti||tS(s-Create a new NavigableString. When unpickling a NavigableString, this method is called with the string in DEFAULT_OUTPUT_ENCODING. That encoding needs to be passed in to the superclass's __new__ or the superclass won't know how to handle non-ASCII characters. N(R tvalueRXt__new__tclstDEFAULT_OUTPUT_ENCODING(RdRb((RRcscCsti|fS(N(R#t__str__R(R((Rt__getnewargs__scCs2|djo|Sntd|ii|fdS(stext.string gives you text. 
This is for backwards compatibility for Navigable*String, but for CData* it lets you get the string without the CData wrapper.tstrings!'%s' object has no attribute '%s'N(tattrRtAttributeErrort __class__R[(RRi((Rt __getattr__s cCst|itS(N(RRtdecodeRe(R((Rt __unicode__scCs |o|i|Sn|SdS(N(RTRRY(RRT((RRfs(R[R\RcRgRlRnReRf(((RR#s   tCDatacBstZedZRS(NcCsdti||S(Ns(R#RfRRT(RRT((RRfs(R[R\ReRf(((RRostProcessingInstructioncBstZedZRS(NcCs=|}d|jo|i||}nd|i||S(Ns%SOUP-ENCODING%s(RtoutputRVRTRZ(RRTRq((RRfs (R[R\ReRf(((RRpstCommentcBstZedZRS(NcCsdti||S(Ns (R#RfRRT(RRT((RRfs(R[R\ReRf(((RRrst DeclarationcBstZedZRS(NcCsdti||S(Ns(R#RfRRT(RRT((RRfs(R[R\ReRf(((RRssRLcBs#tZdZdZhdd<dd<dd<dd <d d cCs |id}|io|tjott|Sn||ijo%|io|i|Sqd|Snt |djoh|ddjoWt |djo,|ddjott |ddSqtt |dSn|i o d|Sn d|Sd S( sUsed in a call to re.sub to replace HTML, XML, and numeric entities with the appropriate Unicode characters. If HTML entities are being converted, any unrecognized entities are escaped.iu&%s;it#txiiu&%s;N( tmatchtgroupRRtconvertHTMLEntitiestname2codepointtunichrtXML_ENTITIES_TO_SPECIAL_CHARStconvertXMLEntitiesR&tinttescapeUnrecognizedEntities(RRR((Rt_convertEntitiess  $$  cs|i_|i|_|_|djo g}n!t |t o|i }n|_g_ i ||t_t_|i_|i_|i_d}t|i_dS(sBasic constructor.cs(|\}}|tidi|fS(Ns&(#\d+|#x[0-9a-fA-F]+|\w+);(RvtvalRtsubRR(t.0RvR(R(Rt"sN(tparserRkRt parserClasstisSelfClosingTagR/t isSelfClosingR0R R tdictRuR RRRtFalsethiddentcontainsSubstitutionsRRRtconverttmap(RRR/R0RRR((RRt__init__ s$            cCs@t|idjo&t|idto|idSndS(Nii(R&RR R R#(R((Rt getString(s-cCs|i|i|dS(s-Replace the contents of the tag with a stringN(RtclearR,Rh(RRh((Rt setString-s ucCst|ipdSn|ii}g}|id}xB||j o4t|t o|i |i n|i}q=W|i |S(Nui(R&RR RR tstopNodetstringstcurrentR R#R,tstript separatortjoin(RRRRR((RtgetText4s  cCs|ii||S(sReturns the value of the 'key' attribute for the tag, or the value given for 'default' if it doesn't have that attribute.N(Rt _getAttrMaptgettkeytdefault(RRR((RRBscCs#x|iD]}|iq WdS(sExtract all children.N(RR RR(RR((RRHs 
cCsEx2t|iD]!\}}||jo|SqqWtddS(NsTag.index: element not in tag(t enumerateRR RPRRKR(RRKRPR((RRMs    cCs|ii|S(N(RRthas_keyR(RR((RRSscCs|i|S(sqtag[key] returns the value of the 'key' attribute for the tag, and throws an exception if it's not there.N(RRR(RR((Rt __getitem__VscCs t|iS(s0Iterating over a tag iterates over its contents.N(titerRR (R((Rt__iter__[scCs t|iS(s:The length of a tag is the length of its list of contents.N(R&RR (R((Rt__len___scCs ||ijS(N(RRR (RR((Rt __contains__cscCstS(s-A tag is non-None even if it has no contents.N(RH(R((Rt __nonzero__fscCs|i||i|]|s&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;)t)cCs d|i|idddS(smUsed with a regular expression to substitute the appropriate XML entity for an XML special character.R~it;N(RtXML_SPECIAL_CHARS_TO_ENTITIESRR(RR((Rt _sub_entitysicCs|i|i|}g} |iox|iD]\} }d}t |t o|i o#d|jo|i ||}nd|jo-d}d|jo|i dd}qn|ii|i|}n| i||i| ||i||fq/Wnd} d}|io d} n d |}d\}}|o"|}d |d } |d }n|i|||}|io |} ng} d}| od d i| }n|o| i| n| id ||| f|o| idn| i||o)|o"|ddjo| idn|o|o| i| n| i||o"|o|i o| idndi| } | S(sReturns a string or Unicode representation of this tag and its contents. To get Unicode, pass None for encoding. NOTE: since Python's HTML parser consumes whitespace, this method is not certain to reproduce the whitespace present in the original string.s%s="%s"s%SOUP-ENCODING%R|s%s='%s'Rzs&squot;ts /sit is<%s%s%s>s iN(ii(!RRZR/RTt encodedNameR0RRtfmtR R"RRVRUtBARE_AMPERSAND_OR_BRACKETRRR,tclosetcloseTagRt indentTagtindentContentst prettyPrintt indentLeveltspacetrenderContentsR RRWtattributeStringRR (RRTRRRRRRRRRWR0RRRRR ((RRfs`    7        cCs|it|idjodSn|id}xi|dj o[|i}t|to |i2nd|_ d|_ d|_ d|_d|_ |}q8WdS(s/Recursively destroys the contents of this tree.iN( RRR&R RR R R RLRRR R (RR R((Rt decompose s           cCs|i|tS(N(RRfRTRH(RRT((RtprettifyscCsg}x|D]}d}t|to|i|}n1t|t o |i |i|||n|o|o|i }n|oI|o|i d|dn|i ||o|i dqq q Wdi|S(s{Renders the contents of this tag as a string in the given encoding. 
If encoding is None, returns a Unicode string..Ris RN(RWRtcR R1R R#RfRTRLR,RRRR(RRTRRRR1RW((RRs$  cKs=d}|i||||d|}|o|d}n|S(sLReturn only the first child of this Tag matching the given criteria.iiN( R R@RRR/R0t recursiveR1R2RB(RR/R0RR1R2RBR@((RR5s cKs9|i}|p |i}n|i||||||S(sExtracts a list of Tag objects that match the given criteria. You can specify the name of the Tag and any attributes you want the Tag to have. The value of a key-value pair in the 'attrs' map can be a string, a list of strings, a regular expression object, or a callable that takes a string and returns whether or not the string matches for some custom definition of 'matches'. The same is true of the tag name.N( RtrecursiveChildGeneratorRJRtchildGeneratorR4R/R0R1R5R2(RR/R0RR1R5R2RJ((RR@s   cCs|id|d|d|S(NR1RR5(RRR1RR5(RR1RR5((Rt fetchTextUscCs|id|d|S(NR1R(RRR1R(RR1R((Rt firstTextXscCsKt|dp4h|_x(|iD]\}}||i|" actually means "". [Another possible explanation is "", but since this class defines no SELF_CLOSING_TAGS, it will never use that explanation.] 
This class is useful for parsing XML or made-up markup languages, or when BeautifulSoup makes an assumption counter to what you were expecting.s (<[^<>]*)/>cCs|iddS(Nis />(RR(R((RR%ss]*)>cCsd|iddS(Ns (No space between name of closing tag and tag close) (Extraneous whitespace in declaration) You can pass in a custom list of (RE object, replace method) tuples to get Beautiful Soup to scrub your input the way you want.treadtisHTMLN(tparseOnlyTheseRt fromEncodingt smartQuotesTotconvertEntitiesR t HTML_ENTITIESRRRHRRtXHTML_ENTITIESt XML_ENTITIESRtselfClosingTagstinstanceSelfClosingTagst SGMLParserRRRRt markupMassaget_feedRt StopParsing( RRRRRRRRR((RR8sB                    cCs]yt|}Wntj o dSnXd|jo djnpdSn|i|S(s/This method fixes a bug in Python's SGMLParser.Nii(RR/tnRRtconvert_codepoint(RR/R ((Rtconvert_charref}s cCs@|i}t|to!t|dp d|_qnIt||i|gd|i d|}|i}|i|_|i |_ |og|ioYt|idp|i|_nx)|iD]\}}|i||}qW|`qn|iti|||ix%|ii|ijo|iqWdS(NtoriginalEncodingRRR(RRR RXRR Rt UnicodeDammitRtinDocumentEncodingRRtdammittdeclaredHTMLEncodingRtMARKUP_MASSAGEtfixtmRtresetRtfeedtendDatat currentTagR/t ROOT_TAG_NAMEtpopTag(RRRRRRR((RR s.        
cCsr|idp |idp|idoti||Sn+|idpti||SntdS(sThis method routes method call requests to either the SGMLParser superclass or the Tag superclass, depending on the method name.tstart_tend_tdo_RN(t methodNamet startswithRRlRRLRj(RR((RRls 0cCs#|ii|p|ii|S(seReturns true iff the given string is the name of a self-closing tag according to this parser.N(RtSELF_CLOSING_TAGSRR/R(RR/((RRscCsati|||id|_ti|g|_d|_ g|_ g|_ |i |dS(Ni( RLRRRRRRt currentDataR RttagStackt quoteStacktpushTag(R((RRs      cCs4|ii}|io|id|_n|iS(Ni(RR#tpopR+R(RR+((RRs cCsE|io|iii|n|ii||id|_dS(Ni(RRR R,R+R#(RR+((RR%s cCsD|io6di|i}|i|idjo\tg}|iD]}||i qF~i |i  o!d|jo d}qd}ng|_|i o@t |idjo*|i i p|i i| odSn||}|i|i|i|io||i_n||_|iii|ndS(NuRs Ri(RR"Rt translatetSTRIP_ASCII_SPACEStsetRIR#R+R/t intersectiontPRESERVE_WHITESPACE_TAGSRR&R1RRtcontainerClasstoRRRR R R,(RR,RIR+R-R"((RRs T    B   cCs||ijodSnd}d}xVtt|idddD]5}||i|ijot|i|}PqDqDW|p|d}nx#td|D]}|i }qW|S(sPops the tag stack up to and including the most recent instance of the given tag. If inclusivePop is false, pops the tag stack up to but *not* including the most recent instqance of the given tag.Niii( R/RRtnumPopsR t mostRecentTagRR&R#RPt inclusivePopR(RR/R0R.RPR/((Rt _popToTags   c Cs!|ii|}|dj}|ii|}d}t }xt t|idddD]}|i|}| p|i|jo| o |}Pn|dj o|i|jp*|djo1|o*|ii|io|i}t}Pn|i}q\W|o|i||ndS(sWe need to pop up to the previous tag of this type, unless one of this tag's nesting reset triggers comes between this tag and the previous tag of this type, OR unless this tag is a generic nesting trigger and another generic nesting trigger comes between this tag and the previous tag of this type. Examples:

 <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
 <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
 <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.

 <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
 <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'
    ** should pop to 'tr', not the first 'td' iiiN(Rt NESTABLE_TAGSRR/tnestingResetTriggersR t isNestabletRESET_NESTING_TAGSRtisResetNestingtpopToRHt inclusiveRR&R#RPtpRRR1( RR/R9R8RPR4R7R3R6((Rt _smartPops&    G  icCs|ioYdig}|D]\}}|d||fq~}|id||fdSn|i |i | o| o|i |n|i oBt|idjo,|i ip|i i|| odSnt||||i|i}|io||i_n||_|i||p|i |o|in||ijo|ii|d|_n|S(NRs %s="%s"s<%s%s>i(RR$RRIR0Rtyt handle_dataR/RRt selfClosingR:RR&R#R1RRLRRR+R R%Rt QUOTE_TAGSR,tliteral(RR/R0R=RIR+R;R((Rtunknown_starttag/s( : D    cCs|io-|id|jo|id|dSn|i|i||io=|id|jo)|iit|idj|_ndS(Nisi( RR$R/R<RR1R&R&R?(RR/((Rtunknown_endtagMs   cCs|ii|dS(N(RR"R,tdata(RRB((RR<ZscCs(|i|i||i|dS(sOAdds a certain piece of text to the tree as a NavigableString subclass.N(RRR<R1tsubclass(RR1RC((Rt_toStringSubclass]s  cCs/|d djo d}n|i|tdS(sHandle a processing instruction as a ProcessingInstruction object, possibly one with a %SOUP-ENCODING% slot into which an encoding will be plugged later.iRu,xml version='1.0' encoding='%SOUP-ENCODING%'N(R1RRDRp(RR1((Rt handle_pids cCs|i|tdS(s#Handle comments as Comment objects.N(RRDR1Rr(RR1((Rthandle_commentlscCs;|iott|}n d|}|i|dS(s$Handle character references as data.s&#%s;N(RRRRtrefRBR<(RRGRB((Rthandle_charrefps   cCsd}|io.ytt|}Wq>tj oq>Xn| o |io|i i |}n| o,|io"|i i | od|}n|pd|}n|i |dS(sHandle entity references as data, possibly converting known HTML and/or XML entity references to the corresponding Unicode characters.s&%ss&%s;N( R RBRRRRRGtKeyErrorRRRR<(RRGRB((Rthandle_entityrefxs  &cCs|i|tdS(s4Handle DOCTYPEs and the like as Declaration objects.N(RRDRBRs(RRB((Rt handle_declscCsd}|i||d!djog|iid|}|djot|i}n|i|d|!}|d}|i |t nWyt i ||}Wn=t j o1|i|}|i||t|}nX|S(s`Treat a bogus SGML declaration as raw data. Treat a CDATA declaration as a CData object.i s iiN(R tjRtrawdataRPRRvR&RBRDRoRtparse_declarationtSGMLParseErrorttoHandleR<(RRPRvRLRPRB((RRNs     (*R[R\R]R!R2R5R>R+RRRRRRRt ALL_ENTITIESR R(RHRRR R RlRRRR%R#RR1R:R@RAR<RDRERFRHRJRKRN(((RR sD 03!E !      .       
+ t BeautifulSoupc Bs[tZdZdZeed/Zed d gZhde<d e tag should implicitly close the previous

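<p> tag.

 <p>Para1<p>Para2
  should be transformed into:
 <p>Para1</p><p>Para2

Some tags can be nested arbitrarily. For instance, the occurrence of a <blockquote> tag should _not_ implicitly close the previous <blockquote> tag.

 Alice said: <blockquote>Bob said: <blockquote>Blah
  should NOT be transformed into:
 Alice said: <blockquote>Bob said: </blockquote><blockquote>Blah

Some tags can be nested, but the nesting is reset by the interposition of other tags. For instance, a <tr> tag should implicitly close the previous <tr> tag within the same <table>, but not close a <tr> tag in another table.

 <table><tr>Blah<tr>Blah
  should be transformed into:
 <table><tr>Blah</tr><tr>Blah
  but,
 <tr>Blah<table><tr>Blah
  should NOT be transformed into
 <tr>Blah<table></tr><tr>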
    Blah Differing assumptions about tag nesting rules are a major source of problems with the BeautifulSoup class. If BeautifulSoup is not treating as nestable a tag your page author treats as nestable, try ICantBelieveItsBeautifulSoup, MinimalSoup, or BeautifulStoneSoup before writing your own subclass.cOsB|idp|i|d    #    ( RSshrsinputRVsmetaRXslinksframesbaseR\(sspansfontRbsobjectRdssubRescenter(RgsdivRiRjRk(saddressRzR9spre(R[R\R]RRR R!R)R+R>tNESTABLE_INLINE_TAGStNESTABLE_BLOCK_TAGStNESTABLE_LIST_TAGStNESTABLE_TABLE_TAGStNON_NESTABLE_BLOCK_TAGSR5R2RRtMRR(((RRRs& .  H`     R cBstZRS(N(R[R\(((RR TstICantBelieveItsBeautifulSoupcBs2tZdZdZdZegeieeZRS(syThe BeautifulSoup class is oriented towards skipping over common HTML errors like unclosed tags. However, sometimes it makes errors of its own. For instance, consider this fragment: FooBar This is perfectly valid (if bizarre) HTML. However, the BeautifulSoup class will implicitly close the first b tag when it encounters the second 'b'. It will think the author wrote "FooBar", and didn't close the first 'b' tag, because there's no real-world reason to bold something that's already bold. When it encounters '' it will close two more 'b' tags, for a grand total of three tags closed instead of two. This can throw off the rest of your document structure. The same is true of a number of other tags, listed below. It's much more common for someone to forget to close a 'b' tag than to actually use nested 'b' tags, and the BeautifulSoup class handles the common case. 
This class handles the not-co-common case: where you can't believe someone wrote what they did, but it's valid HTML and BeautifulSoup screwed up by assuming it wouldn't be.temtbigRPtsmallttttabbrtacronymtstrongtcitetcodetdfntkbdtsamptvartbR{(RRRPRsttRRRRscodeRRRRRRR(snoscript(R[R\R]t*I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGSt)I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGSRRRR2(((RRWs  t MinimalSoupcBs tZdZedZhZRS(sThe MinimalSoup class is for parsing HTML that contains pathologically bad markup. It makes no assumptions about tag nesting, but it does know which tags are self-closing, that