Package org.jsoup.parser
Class HtmlTreeBuilder
java.lang.Object
org.jsoup.parser.TreeBuilder
org.jsoup.parser.HtmlTreeBuilder
HTML Tree Builder; creates a DOM from Tokens.
-
Field Summary
Fields
Modifier and Type, Field, Description
private boolean
private Element
private Token.EndTag
private FormElement
private boolean
private boolean
private boolean
private Element
private static final int
static final int
private static final int
private HtmlTreeBuilderState
private List<Token.Character>
private String[]
private HtmlTreeBuilderState
(package private) static final String[]
(package private) static final String[]
(package private) static final String[]
(package private) static final String[]
(package private) static final String[]
(package private) static final String[]
(package private) static final String[]
(package private) static final String[]
private ArrayList<HtmlTreeBuilderState>
Fields inherited from class org.jsoup.parser.TreeBuilder
baseUri, currentToken, doc, parser, reader, seenTags, settings, stack, tokeniser
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
(package private) Element
aboveOnStack
(Element el) (package private) void
(package private) void
(package private) void
private void
clearStackToContext
(String... nodeNames) (package private) void
(package private) void
(package private) void
(package private) void
closeElement
(String name) (package private) HtmlTreeBuilderState
(package private) ParseSettings
(package private) void
error
(HtmlTreeBuilderState state) (package private) boolean
(package private) void
framesetOk
(boolean framesetOk) (package private) void
(package private) void
generateImpliedEndTags
(boolean thorough) Pops elements off the stack according to the implied end tag rules.
(package private) void
generateImpliedEndTags
(String excludeTag) 13.2.6.3 Closing elements that have implied end tags: When the steps below require the UA to generate implied end tags, then, while the current node is a dd element, a dt element, an li element, an optgroup element, an option element, a p element, an rb element, an rp element, an rt element, or an rtc element, the UA must pop the current node off the stack of open elements.
(package private) Element
getActiveFormattingElement
(String nodeName) (package private) String
(package private) Document
(package private) FormElement
(package private) Element
getFromStack
(String elName) (package private) Element
(package private) List<Token.Character>
getStack()
(package private) boolean
inButtonScope
(String targetName) protected void
initialiseParse
(Reader input, String baseUri, Parser parser) (package private) boolean
inListItemScope
(String targetName) (package private) boolean
(package private) boolean
(package private) boolean
(package private) boolean
inSelectScope
(String targetName) (package private) void
private void
(package private) void
insert
(Token.Character characterToken) Inserts the provided character token into the current element.
(package private) void
insert
(Token.Character characterToken, Element el) (package private) void
insert
(Token.Comment commentToken) (package private) Element
insert
(Token.StartTag startTag) (package private) Element
insertEmpty
(Token.StartTag startTag) (package private) FormElement
insertForm
(Token.StartTag startTag, boolean onStack, boolean checkTemplateStack) (package private) void
(package private) void
private void
insertNode
(Node node, Token token) Inserts the provided character token into the provided element.
(package private) void
insertOnStackAfter
(Element after, Element in) (package private) Element
insertStartTag
(String startTagName) private boolean
inSpecificScope
(String[] targetNames, String[] baseTypes, String[] extraTypes) private boolean
inSpecificScope
(String targetName, String[] baseTypes, String[] extraTypes) (package private) boolean
inTableScope
(String targetName) protected boolean
isContentForTagData
(String normalName) (An internal method, visible for Element. For HTML parse, signals that script and style text should be treated as Data Nodes).
(package private) boolean
(package private) boolean
(package private) boolean
private boolean
(package private) boolean
(package private) Element
(package private) void
(package private) void
maybeSetBaseUri
(Element base) (package private) HtmlTreeBuilder
Create a new copy of this TreeBuilder.
(package private) boolean
private static boolean
(package private) boolean
(package private) boolean
onStackNot
(String[] allowedTags) Tests if there is some element on the stack that is not in the provided set.
(package private) HtmlTreeBuilderState
parseFragment
(String inputFragment, Element context, String baseUri, Parser parser) (package private) Element
pop()
(package private) void
popStackToBefore
(String elName) (package private) Element
popStackToClose
(String elName) (package private) void
popStackToClose
(String... elNames) (package private) HtmlTreeBuilderState
(package private) int
protected boolean
(package private) boolean
process
(Token token, HtmlTreeBuilderState state) (package private) void
(package private) void
(package private) void
(package private) void
pushWithBookmark
(Element in, int bookmark) (package private) void
(package private) void
(package private) boolean
(package private) Element
(package private) void
replaceActiveFormattingElement
(Element out, Element in) private void
replaceInQueue
(ArrayList<Element> queue, Element out, Element in) (package private) void
replaceOnStack
(Element out, Element in) (package private) void
Places the body back onto the stack and moves to InBody, for cases in AfterBody / AfterAfterBody when more content comes.
(package private) boolean
Reset the insertion mode, by searching up the stack for an appropriate insertion mode.
(package private) void
(package private) void
setFormElement
(FormElement formElement) (package private) void
setFosterInserts
(boolean fosterInserts) (package private) void
setHeadElement
(Element headElement) (package private) HtmlTreeBuilderState
state()
(package private) int
toString()
(package private) void
transition
(HtmlTreeBuilderState state)
Methods inherited from class org.jsoup.parser.TreeBuilder
currentElement, currentElementIs, error, error, onNodeClosed, onNodeInserted, parse, processEndTag, processStartTag, processStartTag, runParser, tagFor
-
Field Details
-
TagsSearchInScope
-
TagSearchList
-
TagSearchButton
-
TagSearchTableScope
-
TagSearchSelectScope
-
TagSearchEndTags
-
TagThoroughSearchEndTags
-
TagSearchSpecial
-
MaxScopeSearchDepth
public static final int MaxScopeSearchDepth
-
state
-
originalState
-
baseUriSetFromDoc
private boolean baseUriSetFromDoc -
headElement
-
formElement
-
contextElement
-
formattingElements
-
tmplInsertMode
-
pendingTableCharacters
-
emptyEnd
-
framesetOk
private boolean framesetOk -
fosterInserts
private boolean fosterInserts -
fragmentParsing
private boolean fragmentParsing -
maxQueueDepth
private static final int maxQueueDepth
-
specificScopeTarget
-
maxUsedFormattingElements
private static final int maxUsedFormattingElements
-
-
Constructor Details
-
HtmlTreeBuilder
public HtmlTreeBuilder()
-
-
Method Details
-
defaultSettings
ParseSettings defaultSettings()
- Specified by: defaultSettings in class TreeBuilder
-
newInstance
HtmlTreeBuilder newInstance()
Description copied from class: TreeBuilder
Create a new copy of this TreeBuilder.
- Specified by: newInstance in class TreeBuilder
- Returns: copy, ready for a new parse
-
initialiseParse
@ParametersAreNonnullByDefault protected void initialiseParse(Reader input, String baseUri, Parser parser)
- Overrides: initialiseParse in class TreeBuilder
-
parseFragment
List<Node> parseFragment(String inputFragment, @Nullable Element context, String baseUri, Parser parser)
- Specified by: parseFragment in class TreeBuilder
-
process
- Specified by: process in class TreeBuilder
-
process
-
transition
-
state
HtmlTreeBuilderState state() -
markInsertionMode
void markInsertionMode() -
originalState
HtmlTreeBuilderState originalState() -
framesetOk
void framesetOk(boolean framesetOk) -
framesetOk
boolean framesetOk() -
getDocument
Document getDocument() -
getBaseUri
String getBaseUri() -
maybeSetBaseUri
-
isFragmentParsing
boolean isFragmentParsing() -
error
-
insert
-
insertStartTag
-
insert
-
insert
-
insertEmpty
-
insertForm
-
insert
-
insert
Inserts the provided character token into the current element. -
insert
-
insertNode
Inserts the provided character token into the provided element. Use when not going onto stack element -
pop
Element pop() -
push
-
getStack
-
onStack
-
onStack
-
onStack
-
getFromStack
-
removeFromStack
-
popStackToClose
-
popStackToClose
-
popStackToBefore
-
clearStackToTableContext
void clearStackToTableContext() -
clearStackToTableBodyContext
void clearStackToTableBodyContext() -
clearStackToTableRowContext
void clearStackToTableRowContext() -
clearStackToContext
-
aboveOnStack
-
insertOnStackAfter
-
replaceOnStack
-
replaceInQueue
-
resetInsertionMode
boolean resetInsertionMode()
Reset the insertion mode, by searching up the stack for an appropriate insertion mode. The stack search depth is limited to maxQueueDepth.
- Returns: true if the insertion mode was actually changed.
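The bounded stack search described above can be illustrated with a small standalone sketch. It is not jsoup's implementation: tag-name strings stand in for Element objects, the Mode enum and the modeFor helper are hypothetical and cover only a few tags, the depth limit is passed as a parameter rather than read from maxQueueDepth, and the special cases of the real algorithm (such as fragment contexts) are left out.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; all names here are assumptions, not jsoup API.
class ResetInsertionModeSketch {
    // Hypothetical, heavily reduced set of insertion modes.
    enum Mode { InBody, InTable, InRow, InCell, InSelect }

    Mode mode = Mode.InBody;

    // Maps a tag name to an insertion mode, or null if the tag decides nothing.
    static Mode modeFor(String tag) {
        switch (tag) {
            case "select": return Mode.InSelect;
            case "td":
            case "th":     return Mode.InCell;
            case "tr":     return Mode.InRow;
            case "table":  return Mode.InTable;
            case "body":   return Mode.InBody;
            default:       return null;
        }
    }

    // Walks from the top of the stack (end of the list) towards the bottom,
    // inspecting at most maxDepth entries. Returns true if the mode changed.
    boolean resetInsertionMode(List<String> stack, int maxDepth) {
        Mode previous = mode;
        Mode found = Mode.InBody; // fallback when nothing within reach matches
        int lowest = Math.max(0, stack.size() - maxDepth);
        for (int pos = stack.size() - 1; pos >= lowest; pos--) {
            Mode candidate = modeFor(stack.get(pos));
            if (candidate != null) {
                found = candidate;
                break;
            }
        }
        mode = found;
        return mode != previous;
    }

    public static void main(String[] args) {
        ResetInsertionModeSketch builder = new ResetInsertionModeSketch();
        List<String> stack = Arrays.asList("html", "body", "table", "tr", "td");
        System.out.println(builder.resetInsertionMode(stack, 256)); // prints "true"
        System.out.println(builder.mode);                           // prints "InCell"
        System.out.println(builder.resetInsertionMode(stack, 256)); // prints "false"
    }
}
```

The second call returns false because the search arrives at the mode that is already set, which matches the documented return value: true only if the insertion mode was actually changed.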
-
resetBody
void resetBody()
Places the body back onto the stack and moves to InBody, for cases in AfterBody / AfterAfterBody when more content comes.
-
inSpecificScope
-
inSpecificScope
-
inScope
-
inScope
-
inScope
-
inListItemScope
-
inButtonScope
-
inTableScope
-
inSelectScope
-
onStackNot
Tests if there is some element on the stack that is not in the provided set. -
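As a rough illustration of the check described for onStackNot, the following standalone sketch scans a stack for any entry outside an allowed set. It is an assumption-laden stand-in, not jsoup's code: tag-name strings replace Element objects, and the class name and list-based stack are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not jsoup's implementation.
class OnStackNotSketch {
    // Returns true if at least one entry on the stack is not in allowedTags.
    static boolean onStackNot(List<String> stack, String[] allowedTags) {
        List<String> allowed = Arrays.asList(allowedTags);
        // Scan from the top of the stack (end of the list) downwards.
        for (int pos = stack.size() - 1; pos >= 0; pos--) {
            if (!allowed.contains(stack.get(pos))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> stack = Arrays.asList("html", "body", "p");
        System.out.println(onStackNot(stack, new String[] {"html", "body", "p"})); // prints "false"
        System.out.println(onStackNot(stack, new String[] {"html", "body"}));      // prints "true"
    }
}
```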
setHeadElement
-
getHeadElement
Element getHeadElement() -
isFosterInserts
boolean isFosterInserts() -
setFosterInserts
void setFosterInserts(boolean fosterInserts) -
getFormElement
-
setFormElement
-
resetPendingTableCharacters
void resetPendingTableCharacters() -
getPendingTableCharacters
List<Token.Character> getPendingTableCharacters() -
addPendingTableCharacters
-
generateImpliedEndTags
13.2.6.3 Closing elements that have implied end tags
When the steps below require the UA to generate implied end tags, then, while the current node is a dd element, a dt element, an li element, an optgroup element, an option element, a p element, an rb element, an rp element, an rt element, or an rtc element, the UA must pop the current node off the stack of open elements.
If a step requires the UA to generate implied end tags but lists an element to exclude from the process, then the UA must perform the above steps as if that element was not in the above list.
When the steps below require the UA to generate all implied end tags thoroughly, then, while the current node is a caption element, a colgroup element, a dd element, a dt element, an li element, an optgroup element, an option element, a p element, an rb element, an rp element, an rt element, an rtc element, a tbody element, a td element, a tfoot element, a th element, a thead element, or a tr element, the UA must pop the current node off the stack of open elements.
- Parameters: excludeTag - If a step requires the UA to generate implied end tags but lists an element to exclude from the process, then the UA must perform the above steps as if that element was not in the above list.
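The rules quoted above can be sketched in a few lines of standalone Java. This is a minimal illustration under stated assumptions, not jsoup's implementation: tag-name strings stand in for Element objects, a Deque stands in for the stack of open elements, and the class, constant, and method names are hypothetical. The tag lists are taken from the quoted text.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; names are assumptions, not jsoup API.
class ImpliedEndTagsSketch {
    // Tags closed in the normal case, as listed in the quoted text.
    static final Set<String> IMPLIED = new HashSet<>(Arrays.asList(
        "dd", "dt", "li", "optgroup", "option", "p", "rb", "rp", "rt", "rtc"));

    // Tags closed in the thorough case: the normal list plus table-related tags.
    static final Set<String> THOROUGH = new HashSet<>(Arrays.asList(
        "caption", "colgroup", "dd", "dt", "li", "optgroup", "option", "p",
        "rb", "rp", "rt", "rtc", "tbody", "td", "tfoot", "th", "thead", "tr"));

    // Pops while the current node is in the list; excludeTag (may be null)
    // is treated as if it were not in the list.
    static void generateImpliedEndTags(Deque<String> stack, String excludeTag) {
        popWhileIn(stack, IMPLIED, excludeTag);
    }

    // Thorough variant: also pops table-related elements.
    static void generateImpliedEndTagsThoroughly(Deque<String> stack) {
        popWhileIn(stack, THOROUGH, null);
    }

    private static void popWhileIn(Deque<String> stack, Set<String> tags, String excludeTag) {
        while (!stack.isEmpty()) {
            String current = stack.peek(); // current node is the top of the stack
            if (current.equals(excludeTag) || !tags.contains(current)) {
                break;
            }
            stack.pop();
        }
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        for (String tag : new String[] {"html", "body", "ul", "li", "p"}) {
            stack.push(tag);
        }
        generateImpliedEndTags(stack, null);
        System.out.println(stack.peek()); // prints "ul"
    }
}
```

Passing "li" as excludeTag on the same stack would pop only the p element and stop at li, which is the behavior the excludeTag parameter describes.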
-
generateImpliedEndTags
void generateImpliedEndTags() -
generateImpliedEndTags
void generateImpliedEndTags(boolean thorough)
Pops elements off the stack according to the implied end tag rules.
- Parameters: thorough - if we are thorough (includes table elements etc) or not
-
closeElement
-
isSpecial
-
lastFormattingElement
Element lastFormattingElement() -
positionOfElement
-
removeLastFormattingElement
Element removeLastFormattingElement() -
pushActiveFormattingElements
-
pushWithBookmark
-
checkActiveFormattingElements
-
isSameFormattingElement
-
reconstructFormattingElements
void reconstructFormattingElements() -
clearFormattingElementsToLastMarker
void clearFormattingElementsToLastMarker() -
removeFromActiveFormattingElements
-
isInActiveFormattingElements
-
getActiveFormattingElement
-
replaceActiveFormattingElement
-
insertMarkerToFormattingElements
void insertMarkerToFormattingElements() -
insertInFosterParent
-
pushTemplateMode
-
popTemplateMode
-
templateModeSize
int templateModeSize() -
currentTemplateMode
-
toString
-
isContentForTagData
Description copied from class: TreeBuilder
(An internal method, visible for Element. For HTML parse, signals that script and style text should be treated as Data Nodes).
- Overrides: isContentForTagData in class TreeBuilder
-