Class DataUtil

java.lang.Object
org.jsoup.helper.DataUtil

public final class DataUtil extends Object
Internal static utilities for handling data.
  • Field Details

    • charsetPattern

      private static final Pattern charsetPattern
    • UTF_8

      public static final Charset UTF_8
    • defaultCharsetName

      static final String defaultCharsetName
    • firstReadBufferSize

      private static final int firstReadBufferSize
    • bufferSize

      static final int bufferSize
    • mimeBoundaryChars

      private static final char[] mimeBoundaryChars
    • boundaryLength

      static final int boundaryLength
  • Constructor Details

    • DataUtil

      private DataUtil()
  • Method Details

    • load

      public static Document load(File file, @Nullable String charsetName, String baseUri) throws IOException
      Loads and parses a file to a Document, with the HtmlParser. Files that are compressed with gzip (and end in .gz or .z) are supported in addition to uncompressed files.
      Parameters:
      file - file to load
      charsetName - (optional) character set of input; specify null to attempt to autodetect. A BOM in the file will always override this setting.
      baseUri - base URI of document, to resolve relative links against
      Returns:
      Document
      Throws:
      IOException - on IO error
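The gzip handling described for this method can be illustrated with a small standalone sketch. This is not jsoup's code: the class `GzipAwareOpen` and the method `openMaybeGzipped` are hypothetical names, and the check against the gzip magic bytes (0x1f 0x8b) is an assumption about how one might confirm that a `.gz` or `.z` file really is compressed.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

// Illustrative sketch only; the class and method names are hypothetical.
final class GzipAwareOpen {
    private GzipAwareOpen() {}

    // Opens the file, transparently decompressing it when the name ends in
    // .gz or .z AND the content starts with the gzip magic bytes.
    static InputStream openMaybeGzipped(File file) throws IOException {
        InputStream stream = new BufferedInputStream(new FileInputStream(file));
        try {
            String name = file.getName().toLowerCase(Locale.ROOT);
            if (name.endsWith(".gz") || name.endsWith(".z")) {
                stream.mark(2); // remember the start so the peek can be undone
                boolean gzipped = stream.read() == 0x1f && stream.read() == 0x8b;
                stream.reset();
                if (gzipped) {
                    return new GZIPInputStream(stream);
                }
            }
            return stream;
        } catch (IOException e) {
            stream.close(); // do not leak the file handle on failure
            throw e;
        }
    }
}
```

The returned stream could then be handed to a parser; the caller is responsible for closing it.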
    • load

      public static Document load(File file, @Nullable String charsetName, String baseUri, Parser parser) throws IOException
      Loads and parses a file to a Document. Files that are compressed with gzip (and end in .gz or .z) are supported in addition to uncompressed files.
      Parameters:
      file - file to load
      charsetName - (optional) character set of input; specify null to attempt to autodetect. A BOM in the file will always override this setting.
      baseUri - base URI of document, to resolve relative links against
      parser - alternate parser to use.
      Returns:
      Document
      Throws:
      IOException - on IO error
      Since:
      1.14.2
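Because a BOM overrides the supplied charsetName, it may help to see what BOM sniffing can look like. The following is a minimal sketch under stated assumptions, not jsoup's implementation: `BomSniffer` and `charsetFromBom` are hypothetical names, and the set of recognised BOMs (UTF-8, UTF-16, UTF-32) is an assumption.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only; the class and method names are hypothetical.
final class BomSniffer {
    private BomSniffer() {}

    // Returns a charset name implied by a leading byte order mark, or null if none is found.
    static String charsetFromBom(ByteBuffer data) {
        ByteBuffer view = data.duplicate(); // independent position; caller's buffer is untouched
        byte[] bom = new byte[Math.min(4, view.remaining())];
        view.get(bom);

        // UTF-32 must be tested before UTF-16: FF FE 00 00 also starts with FF FE.
        if (bom.length >= 4
                && ((bom[0] == 0 && bom[1] == 0 && bom[2] == (byte) 0xFE && bom[3] == (byte) 0xFF)
                 || (bom[0] == (byte) 0xFF && bom[1] == (byte) 0xFE && bom[2] == 0 && bom[3] == 0))) {
            return "UTF-32";
        }
        if (bom.length >= 3
                && bom[0] == (byte) 0xEF && bom[1] == (byte) 0xBB && bom[2] == (byte) 0xBF) {
            return "UTF-8";
        }
        if (bom.length >= 2
                && ((bom[0] == (byte) 0xFE && bom[1] == (byte) 0xFF)
                 || (bom[0] == (byte) 0xFF && bom[1] == (byte) 0xFE))) {
            return "UTF-16";
        }
        return null;
    }
}
```

A caller would prefer the sniffed name when it is non-null and fall back to the supplied charsetName otherwise.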
    • load

      public static Document load(@WillClose InputStream in, @Nullable String charsetName, String baseUri) throws IOException
      Parses a Document from an input stream.
      Parameters:
      in - input stream to parse. The stream will be closed after reading.
      charsetName - character set of input (optional)
      baseUri - base URI of document, to resolve relative links against
      Returns:
      Document
      Throws:
      IOException - on IO error
    • load

      public static Document load(@WillClose InputStream in, @Nullable String charsetName, String baseUri, Parser parser) throws IOException
      Parses a Document from an input stream, using the provided Parser.
      Parameters:
      in - input stream to parse. The stream will be closed after reading.
      charsetName - character set of input (optional)
      baseUri - base URI of document, to resolve relative links against
      parser - alternate parser to use.
      Returns:
      Document
      Throws:
      IOException - on IO error
    • crossStreams

      static void crossStreams(InputStream in, OutputStream out) throws IOException
      Writes the input stream to the output stream. Doesn't close them.
      Parameters:
      in - input stream to read from
      out - output stream to write to
      Throws:
      IOException - on IO error
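A stream-to-stream copy of this kind is commonly written as a buffered read/write loop. The sketch below is a generic illustration rather than jsoup's source; `StreamCopy.copy` is a hypothetical name and the 8192-byte buffer is an assumed size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch only; the class and method names are hypothetical.
final class StreamCopy {
    private StreamCopy() {}

    // Copies everything from in to out. Neither stream is closed.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // assumed buffer size
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }
    }
}
```

Leaving both streams open keeps ownership with the caller, which matches the "Doesn't close them" contract stated above.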
    • parseInputStream

      static Document parseInputStream(@Nullable @WillClose InputStream input, @Nullable String charsetName, String baseUri, Parser parser) throws IOException
      Throws:
      IOException
    • readToByteBuffer

      public static ByteBuffer readToByteBuffer(InputStream inStream, int maxSize) throws IOException
      Reads the input stream into a byte buffer. To deal with slow input streams, you may interrupt the thread this method is executing on; the data read up to the point of interruption will be available.
      Parameters:
      inStream - the input stream to read from
      maxSize - the maximum size in bytes to read from the stream. Set to 0 to be unlimited.
      Returns:
      the filled byte buffer
      Throws:
      IOException - if an exception occurs whilst reading from the input stream.
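The maxSize convention (0 means unlimited) can be sketched as follows. This is an assumption-laden illustration, not jsoup's implementation: `CappedRead.readCapped` is a hypothetical name, the 4096-byte chunk size is arbitrary, and the interrupt handling described above is not modelled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Illustrative sketch only; the class and method names are hypothetical.
final class CappedRead {
    private CappedRead() {}

    // Reads at most maxSize bytes from in; maxSize == 0 means no limit.
    static ByteBuffer readCapped(InputStream in, int maxSize) throws IOException {
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must be 0 (unlimited) or greater");
        }
        boolean capped = maxSize > 0;
        int remaining = maxSize;
        byte[] buffer = new byte[4096]; // assumed chunk size
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        while (true) {
            int want = capped ? Math.min(buffer.length, remaining) : buffer.length;
            if (capped && want == 0) {
                break; // the cap has been reached
            }
            int read = in.read(buffer, 0, want);
            if (read == -1) {
                break; // end of stream
            }
            out.write(buffer, 0, read);
            if (capped) {
                remaining -= read;
            }
        }
        return ByteBuffer.wrap(out.toByteArray());
    }
}
```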
    • emptyByteBuffer

      static ByteBuffer emptyByteBuffer()
    • getCharsetFromContentType

      @Nullable static String getCharsetFromContentType(@Nullable String contentType)
      Parses the charset from a Content-Type header. Returns null if the charset is not supported, so that the default charset is used instead.
      Parameters:
      contentType - e.g. "text/html; charset=EUC-JP"
      Returns:
      "EUC-JP", or null if not found. Charset is trimmed and uppercased.
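The described behaviour (extract, trim, uppercase, return null when unsupported) might be sketched like this. The regular expression, the class `ContentTypeCharset`, and the method `fromContentType` are all assumptions made for illustration; the actual `charsetPattern` used by the library is not shown on this page.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; the class, method, and pattern are hypothetical.
final class ContentTypeCharset {
    // Assumed pattern: "charset=", optional whitespace and quote, then the name.
    private static final Pattern CHARSET =
            Pattern.compile("(?i)\\bcharset=\\s*[\"']?([^\\s,;\"']*)");

    private ContentTypeCharset() {}

    // Returns the uppercased charset name, or null if absent or unsupported.
    static String fromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return null;
        }
        String name = m.group(1).trim();
        if (name.isEmpty()) {
            return null;
        }
        try {
            if (Charset.isSupported(name)) {
                return name.toUpperCase(Locale.ROOT);
            }
        } catch (IllegalCharsetNameException e) {
            // syntactically illegal name: treat the same as unsupported
        }
        return null;
    }
}
```

Under these assumptions, `fromContentType("text/html; charset=utf-8")` yields `"UTF-8"`, while a header with no charset parameter yields null.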
    • validateCharset

      @Nullable private static String validateCharset(@Nullable String cs)
    • mimeBoundary

      static String mimeBoundary()
      Creates a random string suitable for use as a MIME boundary.
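Generating such a boundary usually amounts to picking random characters from a fixed alphabet. In the sketch below, the alphabet, the length of 32, the use of `SecureRandom`, and the names `BoundarySketch` and `randomBoundary` are assumptions for illustration; the values of `mimeBoundaryChars` and `boundaryLength` are not given on this page.

```java
import java.security.SecureRandom;

// Illustrative sketch only; names, alphabet, and length are assumptions.
final class BoundarySketch {
    private static final char[] CHARS =
            "-_1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();
    private static final int LENGTH = 32; // assumed boundary length
    private static final SecureRandom RANDOM = new SecureRandom();

    private BoundarySketch() {}

    // Builds a random string of LENGTH characters drawn uniformly from CHARS.
    static String randomBoundary() {
        StringBuilder sb = new StringBuilder(LENGTH);
        for (int i = 0; i < LENGTH; i++) {
            sb.append(CHARS[RANDOM.nextInt(CHARS.length)]);
        }
        return sb.toString();
    }
}
```

A sufficiently long random boundary makes an accidental collision with the body content very unlikely, which is why no scan of the payload is attempted here.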
    • detectCharsetFromBom

      @Nullable private static DataUtil.BomCharset detectCharsetFromBom(ByteBuffer byteData)