Package org.jsoup.parser
Class HtmlTreeBuilder
java.lang.Object
  org.jsoup.parser.TreeBuilder
    org.jsoup.parser.HtmlTreeBuilder
HTML Tree Builder; creates a DOM from Tokens.
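To illustrate what "creates a DOM from Tokens" involves, here is a minimal sketch of a stack-based builder. `Tok`, `SimpleNode` and `MiniTreeBuilder` are hypothetical types for this example, not jsoup classes. The sketch also ignores everything that makes HTML hard (insertion modes, implied end tags, formatting-element reconstruction, foster parenting). Start tags push onto a stack of open elements, end tags pop back to the nearest open element of the same name, and text goes into the current element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type for this sketch (not jsoup's Token class).
final class Tok {
    enum Kind { START, END, TEXT }

    final Kind kind;
    final String value; // tag name for START/END, character data for TEXT

    private Tok(Kind kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    static Tok start(String name) { return new Tok(Kind.START, name); }
    static Tok end(String name) { return new Tok(Kind.END, name); }
    static Tok text(String data) { return new Tok(Kind.TEXT, data); }
}

// Hypothetical node type: an element (name + children) or a text node (name "#text").
final class SimpleNode {
    final String name;
    final String text;
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    // Serialises the subtree as simple markup so the result is easy to inspect.
    String outer() {
        if (name.equals("#text")) return text;
        StringBuilder sb = new StringBuilder("<").append(name).append('>');
        for (SimpleNode child : children) sb.append(child.outer());
        return sb.append("</").append(name).append('>').toString();
    }
}

final class MiniTreeBuilder {
    static SimpleNode build(List<Tok> tokens) {
        SimpleNode root = new SimpleNode("#root", null);
        List<SimpleNode> stack = new ArrayList<>(); // stack of open elements; index 0 is the root
        stack.add(root);
        for (Tok t : tokens) {
            SimpleNode current = stack.get(stack.size() - 1);
            switch (t.kind) {
                case START: {
                    SimpleNode el = new SimpleNode(t.value, null);
                    current.children.add(el); // insert into the current element...
                    stack.add(el);            // ...and make it the new current element
                    break;
                }
                case END: {
                    // Pop back to the nearest open element with this name; stray end tags are ignored.
                    for (int i = stack.size() - 1; i > 0; i--) {
                        if (stack.get(i).name.equals(t.value)) {
                            stack.subList(i, stack.size()).clear();
                            break;
                        }
                    }
                    break;
                }
                case TEXT:
                    current.children.add(new SimpleNode("#text", t.value));
                    break;
            }
        }
        return root;
    }
}
```

For the tokens of `<p>a<b>b</p>c`, this sketch silently closes the unclosed `<b>` at `</p>`. A spec-conforming builder can produce a different tree here, for example by reopening formatting elements such as `<b>` for the trailing text.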
-
Field Summary
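Fields
private boolean baseUriSetFromDoc
private Element contextElement
private Token.EndTag emptyEnd
private ArrayList<Element> formattingElements
private FormElement formElement
private boolean fosterInserts
private boolean fragmentParsing
private boolean framesetOk
private Element headElement
private static final int maxQueueDepth
static final int MaxScopeSearchDepth
private static final int maxUsedFormattingElements
private HtmlTreeBuilderState originalState
private List<Token.Character> pendingTableCharacters
private final String[] specificScopeTarget
private HtmlTreeBuilderState state
(package private) static final String[] TagMathMlTextIntegration
(package private) static final String[] TagSearchButton
(package private) static final String[] TagSearchEndTags
(package private) static final String[] TagSearchList
(package private) static final String[] TagSearchSelectScope
(package private) static final String[] TagSearchSpecial
(package private) static final String[] TagSearchTableScope
(package private) static final String[] TagsSearchInScope
(package private) static final String[] TagSvgHtmlIntegration
(package private) static final String[] TagThoroughSearchEndTags
private ArrayList<HtmlTreeBuilderState> tmplInsertMode
Fields inherited from class org.jsoup.parser.TreeBuilder
baseUri, currentToken, doc, parser, reader, seenTags, settings, stack, tokeniser, trackSourceRange -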
Constructor Summary
Constructors
HtmlTreeBuilder() -
Method Summary
Methods
(package private) Element aboveOnStack(Element el)
(package private) void addPendingTableCharacters
(package private) void checkActiveFormattingElements
(package private) void clearFormattingElementsToLastMarker()
private void clearStackToContext(String... nodeNames)
    Removes elements from the stack until one of the supplied HTML elements is removed.
(package private) void clearStackToTableBodyContext()
(package private) void clearStackToTableContext()
(package private) void clearStackToTableRowContext()
(package private) void closeElement(String name)
(package private) Element createElementFor(Token.StartTag startTag, String namespace, boolean forcePreserveCase)
(package private) HtmlTreeBuilderState currentTemplateMode()
(package private) ParseSettings defaultSettings()
private void doInsertElement(Element el, Token token)
    Inserts the Element onto the stack.
(package private) void error(HtmlTreeBuilderState state)
(package private) boolean framesetOk()
(package private) void framesetOk(boolean framesetOk)
(package private) void generateImpliedEndTags()
(package private) void generateImpliedEndTags(boolean thorough)
    Pops HTML elements off the stack according to the implied end tag rules.
(package private) void generateImpliedEndTags(String excludeTag)
    13.2.6.3 Closing elements that have implied end tags: while the current node is a dd, dt, li, optgroup, option, p, rb, rp, rt, or rtc element, the UA must pop the current node off the stack of open elements.
(package private) Element getActiveFormattingElement(String nodeName)
(package private) String getBaseUri()
(package private) Document getDocument()
(package private) FormElement getFormElement()
(package private) Element getFromStack(String elName)
    Gets the nearest (lowest) HTML element with the given name from the stack.
(package private) Element getHeadElement()
(package private) List<Token.Character> getPendingTableCharacters()
(package private) ArrayList<Element> getStack()
(package private) boolean inButtonScope(String targetName)
protected void initialiseParse(Reader input, String baseUri, Parser parser)
(package private) boolean inListItemScope(String targetName)
(package private) boolean inScope (three overloads)
(package private) boolean inSelectScope(String targetName)
(package private) void insertCharacterNode(Token.Character characterToken)
    Inserts the provided character token into the current element.
(package private) void insertCharacterToElement(Token.Character characterToken, Element el)
    Inserts the provided character token into the provided element.
(package private) void insertCommentNode(Token.Comment token)
(package private) Element insertElementFor(Token.StartTag startTag)
    Inserts an HTML element for the given tag.
(package private) Element insertEmptyElementFor(Token.StartTag startTag)
(package private) Element insertForeignElementFor(Token.StartTag startTag, String namespace)
    Inserts a foreign element.
(package private) FormElement insertFormElement(Token.StartTag startTag, boolean onStack, boolean checkTemplateStack)
(package private) void insertInFosterParent
(package private) void insertMarkerToFormattingElements()
(package private) void insertOnStackAfter(Element after, Element in)
private boolean inSpecificScope(String[] targetNames, String[] baseTypes, String[] extraTypes)
private boolean inSpecificScope(String targetName, String[] baseTypes, String[] extraTypes)
(package private) boolean inTableScope(String targetName)
protected boolean isContentForTagData(String normalName)
    (An internal method, visible for Element. For HTML parse, signals that script and style text should be treated as Data Nodes.)
(package private) boolean isFosterInserts()
(package private) boolean isFragmentParsing()
(package private) static boolean isHtmlIntegration
(package private) boolean isInActiveFormattingElements
(package private) static boolean isMathmlTextIntegration
private static boolean isSameFormattingElement
(package private) static boolean isSpecial
(package private) Element lastFormattingElement()
(package private) void markInsertionMode()
(package private) void maybeSetBaseUri(Element base)
(package private) HtmlTreeBuilder newInstance()
    Create a new copy of this TreeBuilder.
(package private) boolean onStack
    Checks if there is an HTML element with the given name on the stack.
private static boolean onStack
(package private) boolean onStack
(package private) boolean onStackNot(String[] allowedTags)
    Tests if there is some element on the stack that is not in the provided set.
(package private) HtmlTreeBuilderState originalState()
(package private) List<Node> parseFragment(String inputFragment, Element context, String baseUri, Parser parser)
(package private) Element popStackToClose(String elName)
    Pops the stack until the given HTML element is removed.
(package private) void popStackToClose(String... elNames)
    Pops the stack until one of the given HTML elements is removed.
(package private) Element popStackToCloseAnyNamespace(String elName)
    Pops the stack until an element with the supplied name is removed, irrespective of namespace.
(package private) HtmlTreeBuilderState popTemplateMode()
(package private) int positionOfElement
protected boolean process
(package private) boolean process(Token token, HtmlTreeBuilderState state)
(package private) void pushActiveFormattingElements
(package private) void pushTemplateMode
(package private) void pushWithBookmark(Element in, int bookmark)
(package private) void reconstructFormattingElements()
(package private) void removeFromActiveFormattingElements
(package private) boolean removeFromStack
(package private) Element removeLastFormattingElement()
(package private) void replaceActiveFormattingElement(Element out, Element in)
private static void replaceInQueue(ArrayList<Element> queue, Element out, Element in)
(package private) void replaceOnStack(Element out, Element in)
(package private) void resetBody()
    Places the body back onto the stack and moves to InBody, for cases in AfterBody / AfterAfterBody when more content follows.
(package private) boolean resetInsertionMode()
    Reset the insertion mode, by searching up the stack for an appropriate insertion mode.
(package private) void resetPendingTableCharacters()
(package private) void setFormElement(FormElement formElement)
(package private) void setFosterInserts(boolean fosterInserts)
(package private) void setHeadElement(Element headElement)
(package private) HtmlTreeBuilderState state()
(package private) int templateModeSize()
String toString()
(package private) void transition(HtmlTreeBuilderState state)
(package private) boolean useCurrentOrForeignInsert(Token token)
Methods inherited from class org.jsoup.parser.TreeBuilder
currentElement, currentElementIs, currentElementIs, defaultNamespace, error, error, onNodeClosed, onNodeInserted, parse, pop, processEndTag, processStartTag, processStartTag, push, runParser, tagFor, tagFor
-
Field Details
-
TagsSearchInScope
-
TagSearchList
-
TagSearchButton
-
TagSearchTableScope
-
TagSearchSelectScope
-
TagSearchEndTags
-
TagThoroughSearchEndTags
-
TagSearchSpecial
-
TagMathMlTextIntegration
-
TagSvgHtmlIntegration
-
MaxScopeSearchDepth
public static final int MaxScopeSearchDepth
-
state
-
originalState
-
baseUriSetFromDoc
private boolean baseUriSetFromDoc -
headElement
-
formElement
-
contextElement
-
formattingElements
-
tmplInsertMode
-
pendingTableCharacters
-
emptyEnd
-
framesetOk
private boolean framesetOk -
fosterInserts
private boolean fosterInserts -
fragmentParsing
private boolean fragmentParsing -
maxQueueDepth
private static final int maxQueueDepth
-
specificScopeTarget
-
maxUsedFormattingElements
private static final int maxUsedFormattingElements
-
-
Constructor Details
-
HtmlTreeBuilder
public HtmlTreeBuilder()
-
-
Method Details
-
defaultSettings
ParseSettings defaultSettings()
Specified by: defaultSettings in class TreeBuilder
-
newInstance
HtmlTreeBuilder newInstance()
Description copied from class: TreeBuilder
Create a new copy of this TreeBuilder.
Specified by: newInstance in class TreeBuilder
Returns: copy, ready for a new parse
-
initialiseParse
Overrides: initialiseParse in class TreeBuilder
-
parseFragment
Specified by: parseFragment in class TreeBuilder
-
process
Specified by: process in class TreeBuilder
-
useCurrentOrForeignInsert
-
isMathmlTextIntegration
-
isHtmlIntegration
-
process
-
transition
-
state
HtmlTreeBuilderState state() -
markInsertionMode
void markInsertionMode() -
originalState
HtmlTreeBuilderState originalState() -
framesetOk
void framesetOk(boolean framesetOk) -
framesetOk
boolean framesetOk() -
getDocument
Document getDocument() -
getBaseUri
String getBaseUri() -
maybeSetBaseUri
-
isFragmentParsing
boolean isFragmentParsing() -
error
-
createElementFor
-
insertElementFor
Inserts an HTML element for the given tag. -
insertForeignElementFor
Inserts a foreign element. Preserves the case of the tag name and of the attributes. -
insertEmptyElementFor
-
insertFormElement
-
doInsertElement
Inserts the Element onto the stack. All element inserts must run through this method. Performs any general tests on the Element before insertion.
Parameters:
el - the Element to insert and make the current element
token - the token this element was parsed from. If null, uses a zero-width current token as intrinsic insert
-
insertCommentNode
-
insertCharacterNode
Inserts the provided character token into the current element. -
insertCharacterToElement
Inserts the provided character token into the provided element. -
getStack
-
onStack
-
onStack
Checks if there is an HTML element with the given name on the stack. -
onStack
-
getFromStack
Gets the nearest (lowest) HTML element with the given name from the stack. -
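The two lookups above can be sketched as a single top-down search. This sketch assumes that "nearest (lowest)" means the element closest to the top of the stack (the most recently opened one) and that "HTML element" means an element in the HTML namespace. `OpenEl` and `StackLookup` are hypothetical. jsoup's own versions work on its `Element` stack and may apply further limits.

```java
import java.util.List;

// Hypothetical open-element record: a local name and a namespace URI.
final class OpenEl {
    static final String HTML_NS = "http://www.w3.org/1999/xhtml";
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    final String name;
    final String namespace;

    OpenEl(String name, String namespace) {
        this.name = name;
        this.namespace = namespace;
    }
}

final class StackLookup {
    // Nearest (most recently pushed) HTML-namespace element with the given name, or null if none is open.
    static OpenEl getFromStack(List<OpenEl> stack, String elName) {
        for (int pos = stack.size() - 1; pos >= 0; pos--) {
            OpenEl el = stack.get(pos);
            if (el.name.equals(elName) && OpenEl.HTML_NS.equals(el.namespace)) return el;
        }
        return null;
    }

    // True if an HTML element with the given name is anywhere on the stack.
    static boolean onStack(List<OpenEl> stack, String elName) {
        return getFromStack(stack, elName) != null;
    }
}
```

Searching from the top means nested elements with the same name (for example two open `div`s) resolve to the innermost one. An SVG `title` does not match a lookup for the HTML `title`.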
removeFromStack
-
popStackToClose
Pops the stack until the given HTML element is removed. -
popStackToCloseAnyNamespace
Pops the stack until an element with the supplied name is removed, irrespective of namespace. -
popStackToClose
Pops the stack until one of the given HTML elements is removed. -
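The three pop-to-close variants differ only in which element ends the pop. The sketch below shares one loop between them. `StackEl`, `PopToClose` and `popUntil` are hypothetical names, and the multi-name variant is called `popStackToCloseOneOf` here to avoid an overload clash with the single-name one. In this sketch, the stack is left untouched when nothing matches. That is a choice made for the example: callers in the HTML spec usually check scope first, and jsoup's own handling of that case may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical open-element record: a local name and a namespace URI.
final class StackEl {
    static final String HTML_NS = "http://www.w3.org/1999/xhtml";
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    final String name;
    final String ns;

    StackEl(String name, String ns) {
        this.name = name;
        this.ns = ns;
    }
}

final class PopToClose {
    // Removes the nearest element matching `target` and everything above it, then returns that element.
    // If nothing matches, the stack is unchanged and null is returned.
    static StackEl popUntil(List<StackEl> stack, Predicate<StackEl> target) {
        for (int pos = stack.size() - 1; pos >= 0; pos--) {
            StackEl el = stack.get(pos);
            if (target.test(el)) {
                stack.subList(pos, stack.size()).clear();
                return el;
            }
        }
        return null;
    }

    // Closes the nearest HTML element with the given name.
    static StackEl popStackToClose(List<StackEl> stack, String elName) {
        return popUntil(stack, el -> el.name.equals(elName) && StackEl.HTML_NS.equals(el.ns));
    }

    // Closes the nearest HTML element whose name is any of elNames.
    static StackEl popStackToCloseOneOf(List<StackEl> stack, String... elNames) {
        List<String> names = Arrays.asList(elNames);
        return popUntil(stack, el -> names.contains(el.name) && StackEl.HTML_NS.equals(el.ns));
    }

    // Closes the nearest element with the given name, whatever its namespace.
    static StackEl popStackToCloseAnyNamespace(List<StackEl> stack, String elName) {
        return popUntil(stack, el -> el.name.equals(elName));
    }
}
```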
clearStackToTableContext
void clearStackToTableContext() -
clearStackToTableBodyContext
void clearStackToTableBodyContext() -
clearStackToTableRowContext
void clearStackToTableRowContext() -
clearStackToContext
Removes elements from the stack until one of the supplied HTML elements is removed. -
aboveOnStack
-
insertOnStackAfter
-
replaceOnStack
-
replaceInQueue
-
resetInsertionMode
boolean resetInsertionMode()
Reset the insertion mode, by searching up the stack for an appropriate insertion mode. The stack search depth is limited to maxQueueDepth.
Returns: true if the insertion mode was actually changed.
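The search described above can be sketched as follows. This is a simplified reading of the HTML spec's "reset the insertion mode appropriately" algorithm. It assumes element names alone decide the mode, and it omits the spec's handling of template, frameset and html, the select-in-table case, and the fragment context element. `Mode` and `ModeResetter` are hypothetical, and `maxDepth` stands in for `maxQueueDepth`, whose value is not given here.

```java
import java.util.List;

// Hypothetical, reduced set of insertion modes; the HTML spec and jsoup define more.
enum Mode { IN_BODY, IN_SELECT, IN_CELL, IN_ROW, IN_TABLE_BODY, IN_CAPTION, IN_COLUMN_GROUP, IN_TABLE, IN_HEAD }

final class ModeResetter {
    Mode mode = Mode.IN_BODY;
    private final int maxDepth; // how many stack entries to examine, counting from the top

    ModeResetter(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    // Walks down from the current (top) node and takes the mode implied by the first element name
    // that determines one; falls back to IN_BODY. Returns true if the mode changed.
    boolean resetInsertionMode(List<String> stack) {
        Mode previous = mode;
        Mode found = null;
        int stop = Math.max(0, stack.size() - maxDepth);
        for (int pos = stack.size() - 1; pos >= stop && found == null; pos--) {
            switch (stack.get(pos)) {
                case "select": found = Mode.IN_SELECT; break;
                case "td": case "th": found = Mode.IN_CELL; break;
                case "tr": found = Mode.IN_ROW; break;
                case "tbody": case "thead": case "tfoot": found = Mode.IN_TABLE_BODY; break;
                case "caption": found = Mode.IN_CAPTION; break;
                case "colgroup": found = Mode.IN_COLUMN_GROUP; break;
                case "table": found = Mode.IN_TABLE; break;
                case "head": found = Mode.IN_HEAD; break;
                case "body": found = Mode.IN_BODY; break;
                default: break; // e.g. b, div, p: keep searching further down
            }
        }
        mode = (found != null) ? found : Mode.IN_BODY;
        return mode != previous;
    }
}
```

The depth limit trades completeness for bounded work: with a small `maxDepth`, a deeply nested stack can fall back to `IN_BODY` even though a table element sits further down.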
-
resetBody
void resetBody()
Places the body back onto the stack and moves to InBody, for cases in AfterBody / AfterAfterBody when more content follows. -
inSpecificScope
-
inSpecificScope
-
inScope
-
inScope
-
inScope
-
inListItemScope
-
inButtonScope
-
inTableScope
-
inSelectScope
-
onStackNot
Tests if there is some element on the stack that is not in the provided set. -
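A minimal sketch of this test. It takes element names as plain strings, an assumption made for the example since jsoup compares its own `Element` objects, and `StackCheck` is hypothetical. Checks of this shape appear in the HTML spec. For example, on a `body` end tag, any open element other than dd, dt, li, optgroup, option, p, rb, rp, rt, rtc, tbody, td, tfoot, th, thead, tr, body or html is a parse error.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StackCheck {
    // Returns true as soon as any open element's name is outside allowedTags.
    static boolean onStackNot(List<String> stack, String[] allowedTags) {
        Set<String> allowed = new HashSet<>(Arrays.asList(allowedTags));
        for (int pos = stack.size() - 1; pos >= 0; pos--) {
            if (!allowed.contains(stack.get(pos))) return true;
        }
        return false;
    }
}
```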
setHeadElement
-
getHeadElement
Element getHeadElement() -
isFosterInserts
boolean isFosterInserts() -
setFosterInserts
void setFosterInserts(boolean fosterInserts) -
getFormElement
FormElement getFormElement() -
setFormElement
-
resetPendingTableCharacters
void resetPendingTableCharacters() -
getPendingTableCharacters
List<Token.Character> getPendingTableCharacters() -
addPendingTableCharacters
-
generateImpliedEndTags
13.2.6.3 Closing elements that have implied end tags
When the steps below require the UA to generate implied end tags, then, while the current node is a dd element, a dt element, an li element, an optgroup element, an option element, a p element, an rb element, an rp element, an rt element, or an rtc element, the UA must pop the current node off the stack of open elements.
If a step requires the UA to generate implied end tags but lists an element to exclude from the process, then the UA must perform the above steps as if that element was not in the above list.
When the steps below require the UA to generate all implied end tags thoroughly, then, while the current node is a caption element, a colgroup element, a dd element, a dt element, an li element, an optgroup element, an option element, a p element, an rb element, an rp element, an rt element, an rtc element, a tbody element, a td element, a tfoot element, a th element, a thead element, or a tr element, the UA must pop the current node off the stack of open elements.
Parameters:
excludeTag - If a step requires the UA to generate implied end tags but lists an element to exclude from the process, then the UA must perform the above steps as if that element was not in the above list.
-
generateImpliedEndTags
void generateImpliedEndTags() -
generateImpliedEndTags
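void generateImpliedEndTags(boolean thorough)
Pops HTML elements off the stack according to the implied end tag rules.
Parameters:
thorough - if true, generate all implied end tags thoroughly, which also pops table elements (caption, colgroup, tbody, td, tfoot, th, thead, tr); if false, use the standard list only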
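The quoted rules reduce to "pop while the current node's name is in a set". The sketch below works on a stack of plain element names. That is an assumption made for the example, since jsoup works on `Element` objects and also checks namespaces. `ImpliedEndTags` is a hypothetical holder class. Treating `excludeTag` "as if that element was not in the above list" simply means the loop stops when it reaches that element.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ImpliedEndTags {
    // Element names listed in 13.2.6.3 for generating implied end tags.
    static final Set<String> IMPLIED = new HashSet<>(Arrays.asList(
            "dd", "dt", "li", "optgroup", "option", "p", "rb", "rp", "rt", "rtc"));

    // The longer list used when generating all implied end tags thoroughly.
    static final Set<String> THOROUGH = new HashSet<>(Arrays.asList(
            "caption", "colgroup", "dd", "dt", "li", "optgroup", "option", "p", "rb", "rp",
            "rt", "rtc", "tbody", "td", "tfoot", "th", "thead", "tr"));

    // Pops the current node while it is in IMPLIED, treating excludeTag (may be null) as not in the list.
    static void generateImpliedEndTags(List<String> stack, String excludeTag) {
        while (!stack.isEmpty()) {
            String current = stack.get(stack.size() - 1);
            if (!IMPLIED.contains(current) || current.equals(excludeTag)) break;
            stack.remove(stack.size() - 1);
        }
    }

    // Pops the current node while it is in the plain or the thorough list.
    static void generateImpliedEndTags(List<String> stack, boolean thorough) {
        Set<String> names = thorough ? THOROUGH : IMPLIED;
        while (!stack.isEmpty() && names.contains(stack.get(stack.size() - 1))) {
            stack.remove(stack.size() - 1);
        }
    }
}
```

For a stack ending in `ul, li, p`, excluding `li` pops only the `p`. Inside a table cell, the plain rules stop at `td`, while the thorough rules continue popping up to (but not including) `table`.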
-
closeElement
-
isSpecial
-
lastFormattingElement
Element lastFormattingElement() -
positionOfElement
-
removeLastFormattingElement
Element removeLastFormattingElement() -
pushActiveFormattingElements
-
pushWithBookmark
-
checkActiveFormattingElements
-
isSameFormattingElement
-
reconstructFormattingElements
void reconstructFormattingElements() -
clearFormattingElementsToLastMarker
void clearFormattingElementsToLastMarker() -
removeFromActiveFormattingElements
-
isInActiveFormattingElements
-
getActiveFormattingElement
-
replaceActiveFormattingElement
-
insertMarkerToFormattingElements
void insertMarkerToFormattingElements() -
insertInFosterParent
-
pushTemplateMode
-
popTemplateMode
HtmlTreeBuilderState popTemplateMode() -
templateModeSize
int templateModeSize() -
currentTemplateMode
HtmlTreeBuilderState currentTemplateMode() -
toString
-
isContentForTagData
Description copied from class: TreeBuilder
(An internal method, visible for Element. For HTML parse, signals that script and style text should be treated as Data Nodes.)
Overrides: isContentForTagData in class TreeBuilder
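The distinction matters on output. If script and style contents were treated as ordinary text, characters such as `<` and `&` would be escaped and the script or stylesheet would no longer work. The sketch below assumes the usual convention that data content is written verbatim while text is escaped. `DataContent` and `serializeText` are hypothetical, and the tag set is only what the description above states (script and style).

```java
final class DataContent {
    // Per the description above: text inside script and style is data, not ordinary text.
    static boolean isContentForTagData(String normalName) {
        return normalName.equals("script") || normalName.equals("style");
    }

    // Hypothetical serializer step: data content is emitted verbatim, ordinary text is escaped.
    static String serializeText(String parentName, String text) {
        if (isContentForTagData(parentName)) return text;
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```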
-