Chapter 7: I/O and XML

File

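Why learn File? Data on a computer is stored in files, and a program needs an object that represents such a file.

We need to understand:

① Where is the file stored? The path.

Absolute path: an absolute path includes the drive letter.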

"C:\VScode\MainProject_rewrite\image\about_img.png"
"C:\VScode\logindownload\image\bg.jpg"

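Relative path: a relative path has no drive letter and is resolved relative to the current project (the program's working directory).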

"MainProject_rewrite\image\about_img.png"
"logindownload\image\bg.jpg"

② How is the data transferred? Through I/O streams.
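To make the relative-path rule above concrete, here is a minimal sketch of how a relative File is resolved against the JVM's working directory (the `user.dir` system property). The class name `RelativePathDemo` and the sample path are placeholders chosen for illustration, not part of the original notes.

```java
import java.io.File;

public class RelativePathDemo {
    // Returns the absolute form of a path; a relative path is resolved
    // against the working directory the JVM was started in.
    public static String resolve(String path) {
        return new File(path).getAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println("working directory: " + System.getProperty("user.dir"));
        System.out.println("resolved path:     " + resolve("image" + File.separator + "bg.jpg"));
    }
}
```

When the program is started from an IDE, the working directory is usually the project root, which is why the notes describe relative paths as relative to the current project.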

Constructors of the File class

A File instance is constructed from the path of a file or directory.

Common constructor:

File file = new File("pathname");

    String str = "C:\\JavaStudy\\a.txt";
    File f1 = new File(str);
    System.out.println(f1);

Creates a new File instance whose path is pathname.

Other constructors:

File(String parent, String child);

    String parent = "C:\\JavaStudy";
    String child = "a.txt";
    File f2 = new File(parent, child);

    // Equivalent to concatenating the path yourself (not recommended)
    File f3 = new File(parent + "\\" + child);

Creates a new File instance whose path is formed by joining parent and child.

File(File parent, String child);

    File parent = new File("C:\\JavaStudy");
    String child = "a.txt";
    File f3 = new File(parent, child);

Creates a new File instance in which parent is the directory and child is the file name, so the path of the instance is the file child inside the directory parent.

File(URI uri);

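Creates a new File instance whose path is given by a URI argument.

When constructing a File, the path must follow the naming rules of the operating system.

File.pathSeparator: the system-dependent path-list separator as a String; its value is ";" on Windows and ":" on Linux.
File.pathSeparatorChar: the same separator as a char; its value is ';' on Windows and ':' on Linux.
File.separator: the system-dependent separator between path levels as a String; its value is "\" on Windows and "/" on Linux.
File.separatorChar: the same separator as a char; its value is '\' on Windows and '/' on Linux.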

import java.io.File;
public class TestFileSeparator {
    public static void main(String[] args) {
        // The printed values depend on the operating system the program runs on
        System.out.println("pathSeparator on this system: " + File.pathSeparator);
        System.out.println("separator on this system: " + File.separator);
    }
}

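File API

Checking and querying a File

Method	Description
public boolean isDirectory()	Tests whether the File denoted by this path is a directory
public boolean isFile()	Tests whether the File denoted by this path is a regular file
public boolean exists()	Tests whether the File denoted by this path exists
public long length()	Returns the size of the file (in bytes)
public File getAbsoluteFile()	Returns the absolute form of this path as a File (getAbsolutePath() returns it as a String)
public String getPath()	Returns the path that was used when the File was defined
public String getName()	Returns the name of the file, including the extension
public long lastModified()	Returns the time the file was last modified (in milliseconds since the epoch)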

// Requires: import java.io.File; import java.text.SimpleDateFormat; import java.util.Date;
File f1 = new File("C:\\JavaStudy\\HelloWorld.java");
File f2 = new File("C:\\JavaStudy");
File f3 = new File("MyProjcet\\src");

System.out.println("f1 is a directory: " + f1.isDirectory());
System.out.println("f2 is a directory: " + f2.isDirectory());
System.out.println("f1 is a file: " + f1.isFile());
System.out.println("f1 exists: " + f1.exists());

// length() returns the size of the file
long a = f1.length();
System.out.println("size of f1: " + a);
// For a directory the return value of length() is unspecified and varies by OS; do not rely on it
System.out.println("size of f2: " + f2.length());

// getAbsoluteFile() returns the absolute path
System.out.println("absolute path of f1: " + f1.getAbsoluteFile());
System.out.println("absolute path of f2: " + f2.getAbsoluteFile());
System.out.println("absolute path of f3: " + f3.getAbsoluteFile());

// getPath() returns the path used when the File was defined
System.out.println("path f1 was defined with: " + f1.getPath());
System.out.println("path f3 was defined with: " + f3.getPath());

// A file returns its name plus extension; a directory returns the directory name
String str1 =  f1.getName();
String str2 =  f2.getName();
System.out.println("name of f1: " + str1);
System.out.println("name of f2: " + str2);

// lastModified() returns the last modification time in milliseconds
long b = f1.lastModified();
System.out.println("last modified time of f1: " + b);
// Convert the millisecond value into a formatted date string
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
Date date = new Date(f1.lastModified());
System.out.println(sdf.format(date));

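Creating and deleting a File

Method	Description
public boolean createNewFile()	Creates a new, empty file
public boolean mkdir()	Creates a single-level directory
public boolean mkdirs()	Creates a multi-level directory
public boolean delete()	Deletes a file or directory (bypassing the recycle bin)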

File f1 = new File("C:\\JavaStudy\\createNewFile.txt");
boolean a = f1.createNewFile();
System.out.println("create file: " + a);
boolean b = f1.createNewFile();
// 1. If the file does not exist it is created and true is returned; if it already exists, false is returned
System.out.println("create the same file again: " + b);
// 2. Creating a file under a parent directory that does not exist throws IOException
File f2 = new File("C:\\JavaStudy\\test\\createNewFile.txt");
// f2.createNewFile();   // would throw IOException while C:\JavaStudy\test does not exist
// 3. Without an extension, a file with no extension is created
File f3 = new File("C:\\JavaStudy\\createNewFile");
boolean c = f3.createNewFile();
System.out.println("create file without extension: " + c);

// A directory cannot reuse a path that is already taken (here by the file f3), so mkdir() returns false
File f4 = new File("C:\\JavaStudy\\createNewFile");
boolean d = f4.mkdir();
System.out.println("create directory with an existing path: " + d);
// mkdir() can only create a single level
File f5 = new File("C:\\JavaStudy\\test");
boolean e = f5.mkdir();
System.out.println("create directory: " + e);

// mkdirs() can create a single-level directory as well as a multi-level one
File f6 = new File("C:\\JavaStudy\\test\\oth");
boolean f = f6.mkdirs();
System.out.println("create multi-level directory: " + f);

// 1. A file is deleted directly
// 2. An empty directory is deleted directly; deleting a non-empty directory fails
boolean g = f6.delete();
System.out.println("delete file or directory: " + g);

Listing and traversing with File

Method	Description
🍀 public File[] listFiles()	Returns everything in the current path

File f1 = new File("C:\\JavaStudy");
File[] files = f1.listFiles();
for (File file : files) {
    // file represents each entry in the JavaStudy directory in turn
    System.out.println(file);
}

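When the path denoted by the calling File does not exist, null is returned.
When the path denoted by the calling File is a file, null is returned.
When the path denoted by the calling File is an empty directory, an array of length 0 is returned.
When the path denoted by the calling File is a directory with content, the paths of all files and directories inside it are returned in a File array.
When the path denoted by the calling File is a directory containing hidden files, the paths of all files and directories inside it are returned in a File array, hidden files included.
When the path denoted by the calling File is a directory that requires permission to access, null is returned.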
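Because listFiles() returns null in three of the cases above, looping over its result directly can throw a NullPointerException. The following is a small sketch of a null-safe wrapper; the class name `SafeListDemo` and the method `children` are hypothetical helpers, not part of the File API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SafeListDemo {
    // Returns the entries of dir, or an empty list when listFiles() returns null
    // (dir does not exist, is a regular file, or cannot be read).
    public static List<File> children(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(files));
    }

    public static void main(String[] args) {
        // The path is the sample directory used throughout these notes
        for (File f : children(new File("C:\\JavaStudy"))) {
            System.out.println(f);
        }
    }
}
```

Returning an empty list keeps the calling loop simple; code that needs to tell "empty directory" apart from "not a directory" should check for null itself.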

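Other listing and traversal methods of File

Method	Description
public static File[] listRoots()	Lists the available file system roots
public String[] list()	Returns everything in the current path (names only)
public String[] list(FilenameFilter filter)	Returns the names in the current path that pass a file name filter
public File[] listFiles()	Returns everything in the current path
public File[] listFiles(FileFilter filter)	Returns the entries in the current path that pass a file filter
public File[] listFiles(FilenameFilter filter)	Returns the entries in the current path that pass a file name filter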

// List all drive roots on the system
File[] arr = File.listRoots();
System.out.println("system roots: "+Arrays.toString(arr));

// List everything in the current path; only names are returned
File f1 = new File("C:\\JavaStudy");
String[] list = f1.list();
String str = "";
for (String s : list) {
    // Compare string content with isEmpty(); != would only compare references
    if (!str.isEmpty()){
        str = str + ", " + s;
    }else str = s;
}
System.out.println("entries in the current path: " + "["+str+"]");

// Use a file name filter to list the contents of the current directory
// Example: list all txt files in the current directory
String[] arr1 = f1.list(new FilenameFilter() {	// anonymous inner class
    @Override
    public boolean accept(File dir, String name) {
        // Join the parent path and the child name
        File src = new File(dir,name);
        // Returning true keeps the current entry; returning false discards it
        return src.isFile() && name.endsWith(".txt");
    }
});
System.out.println("txt files in the current directory: " + Arrays.toString(arr1));

// Get everything with f1.listFiles(), then pick out the txt files by extension
File[] files = f1.listFiles();
System.out.println("everything in the current directory: " + Arrays.toString(files));
System.out.print("txt files in the current directory: ");
for (File file : files) {
    if (file.isFile() && file.getName().endsWith(".txt")) {
        System.out.println(file);
    }
}

// public File[] listFiles(FileFilter filter)
File[] arr2 = f1.listFiles(new FileFilter() {
    @Override
    public boolean accept(File pathname) {
        // return true;
        return pathname.isFile() && pathname.getName().endsWith(".txt");
    }
});
System.out.println
        ("txt files returned by listFiles(FileFilter filter): " + Arrays.toString(arr2));

// public File[] listFiles(FilenameFilter filter)
File[] arr3 = f1.listFiles(new FilenameFilter() {
    // Unlike listFiles(FileFilter filter), accept receives the parent directory and the child name separately
    @Override
    public boolean accept(File dir, String name) {
        File src = new File(dir,name);
        return src.isFile() && name.endsWith(".txt");
    }
});
System.out.println
        ("txt files returned by listFiles(FilenameFilter filter): " + Arrays.toString(arr3));

Comprehensive exercises

① Create a file a.txt in the aaa directory under the current module.

import java.io.File;
import java.io.IOException;

public class Test1 {
    public static void main(String[] args) throws IOException {
        File f1 = new File("aaa");
        if(!f1.exists()){
            f1.mkdirs();
        }
        File f2 = new File(f1,"a.txt");
        f2.createNewFile();
    }
}

② Define a method that checks whether a directory contains a video whose name ends with mp4. (Subdirectories are not considered.)

import java.io.File;

public class Test2 {
    public static void main(String[] args) {
        File file = new File("C:\\JavaStudy");
        System.out.println(haveMp4(file));
    }
    
    public static boolean haveMp4(File file) {
        File[] files = file.listFiles();
        // listFiles() returns null when file is not a readable directory
        if (files == null) {
            return false;
        }
        for (File f : files) {
            if (f.isFile() && f.getName().endsWith(".mp4")) {
                // System.out.println(f.getName());
                return true;
            }
        }
        return false;
    }
}

③ Find every video on the computer whose name ends with mp4.

import java.io.File;

public class Test3 {
    public static void main(String[] args) {
        findMp4();
    }

    public static void findMp4(File f1){
        File[] files = f1.listFiles();
        // files is not null (the root or directory is accessible)
        if (files != null){
            for (File file : files) {
                if (file.isFile()){
                    // A file: check the extension
                    if (file.getName().endsWith(".mp4")){
                        System.out.println(file);
                    }
                }else {
                    // Not a file, so treat it as a directory and recurse
                    findMp4(file);
                }
            }
        }
    }

    public static void findMp4(){
        File[] arr = File.listRoots();
        for (File f : arr) {
            findMp4(f);
        }
    }
}

④ Delete a multi-level directory.

import java.io.File;

public class Test4 {
    public static void main(String[] args) {
        File f1 = new File("aaa");
        deleteFile(f1);
    }

    public static void deleteFile(File f1){
        File[] files = f1.listFiles();
        // files is null when f1 is a regular file, does not exist, or cannot be read
        if (files != null) {
            for (File file : files) {
                if (file.isFile()){
                    file.delete();
                }else {
                    // Recurse into the subdirectory
                    deleteFile(file);
                }
            }
        }
        // Finally delete f1 itself
        f1.delete();
    }
}

⑤ Compute the total size of a directory.

import java.io.File;

public class Test5 {
    public static void main(String[] args) {
        File f1 = new File("C:\\JavaStudy");
        System.out.println(getSize(f1));
    }

    public static long getSize(File f1){
        long size = 0;
        File[] files = f1.listFiles();
        // files is null when f1 is not a readable directory
        if (files == null) {
            return size;
        }
        for (File file : files) {
            if (file.isFile()){
                size += file.length();
            }else {
                // size is a local variable, so each recursive call starts from 0;
                // add the subtotal it returns to the running total
                size += getSize(file);
            }
        }
        return size;
    }
}

🏆 ⑥ Count the number of files of each type.

import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class Test6 {
    public static void main(String[] args) {
        /*
            Task: count the files of each type in a directory and print the result. (Subdirectories are included.)
            Output: txt: 3   doc: 4   jpg: 6
            Topics: File, recursion, Map
         */
        File f1 = new File("C:\\JavaStudy");
        HashMap<String, Integer> m1 = getCount(f1);

        Set<Map.Entry<String, Integer>> entries = m1.entrySet();
        for (Map.Entry<String, Integer> entry : entries) {
            String key = entry.getKey();
            int value = entry.getValue();
            System.out.println(key + ": " + value);
        }
    }

    public static HashMap<String, Integer> getCount(File f1) {
        HashMap<String, Integer> m1 = new HashMap<>();
        File[] files = f1.listFiles();
        // files is null when f1 is not a readable directory
        if (files == null) {
            return m1;
        }
        for (File file : files) {
            if (file.isFile()) {
                String name = file.getName();
                String[] arr = name.split("\\.");
                if (arr.length >= 2) {
                    // The part after the last dot is the extension
                    String endName = arr[arr.length - 1];
                    if (m1.containsKey(endName)) {
                        int count = m1.get(endName);
                        count++;
                        m1.put(endName, count);
                    } else {
                        m1.put(endName, 1);
                    }
                }
            } else {
                // Merge the counts of the subdirectory into the counts of this directory
                HashMap<String, Integer> m2 = getCount(file);
                Set<Map.Entry<String, Integer>> entries = m2.entrySet();
                for (Map.Entry<String, Integer> entry : entries) {
                    String key = entry.getKey();
                    int value = entry.getValue();
                    if (m1.containsKey(key)) {
                        int count = m1.get(key);
                        count += value;
                        m1.put(key, count);
                    } else {
                        m1.put(key, value);
                    }
                }
            }
        }
        return m1;
    }
}
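The counting in exercise ⑥ can be written more compactly. The sketch below is an alternative, not the approach of the notes: it passes a single map down the recursion and uses `Map.merge` in place of the containsKey/get/put sequence. The class name `ExtensionCountDemo` and the method `count` are made up for this illustration.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class ExtensionCountDemo {
    // Adds the extension counts of everything under dir (recursively) into counts.
    public static void count(File dir, Map<String, Integer> counts) {
        File[] files = dir.listFiles();
        if (files == null) {
            return; // dir is not a readable directory
        }
        for (File file : files) {
            if (file.isDirectory()) {
                count(file, counts);
                continue;
            }
            String name = file.getName();
            int dot = name.lastIndexOf('.');
            // Skip names without a dot and names that end with a dot
            if (dot >= 0 && dot < name.length() - 1) {
                // merge inserts 1 for a new key, otherwise adds 1 to the old value
                counts.merge(name.substring(dot + 1), 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        count(new File("C:\\JavaStudy"), counts);
        counts.forEach((ext, n) -> System.out.println(ext + ": " + n));
    }
}
```

Sharing one map avoids building and merging a temporary map for every subdirectory.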

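I/O streams

Introduction to I/O streams

The File class can only operate on the file itself; it cannot read or write the data stored in the file. To read or write data you need I/O streams.

I/O streams can:

① save data from the program to a local file (writing, output);

② load data from a local file into the program (reading, input).

Byte streams are typically used for binary files such as music and images. Because every kind of data is ultimately made of bytes, byte streams can in fact handle any type of data.

Character streams work with characters rather than raw bytes. Java represents text as Unicode characters and a character stream encodes and decodes them for you, so character streams are the better fit for text processing and internationalization.

Plain text file: a file that is readable when opened with the Notepad application that ships with Windows. (txt, md, xml, lrc ✔; docx, xlsx ✖)

InputStream, OutputStream, Reader and Writer are all abstract classes, so they cannot be instantiated directly; we work with their subclasses instead. Taking byte streams as an example: FileInputStream is the byte input stream for local files, where File names what it operates on and InputStream names the class it inherits from; FileOutputStream is the byte output stream for local files; BufferedInputStream is a byte input stream with a buffer.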

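Byte streams

File byte output stream: FileOutputStream

FileOutputStream: the byte output stream for local files; it writes data from the program to a local file.

Steps: ① create the byte output stream object, ② write data, ③ release the resource.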

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class Test7 {
    public static void main(String[] args) throws IOException {
        File f1 = new File("src\\a.txt");
        if (!f1.exists()){
            f1.createNewFile();
        }

        // Create the stream object
        FileOutputStream fos1 = new FileOutputStream("src\\a.txt");
        // FileOutputStream fos1 = new FileOutputStream(new File("src\\a.txt"));
        // Write data
        fos1.write(65);
        // Release the resource
        fos1.close();
    }
}

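① When creating the object

The argument passed to the FileOutputStream constructor can be a path given as a String or a File object.

If the file does not exist, a new file is created, but the parent directory must already exist.

If the file already exists, its contents are cleared before the new data is written.

② When writing data

The parameter of write is an int, but only its low 8 bits are written, as a single byte. Opened in a text editor, that byte is shown as the character with that code in the ASCII table (for values 0 to 127).

To write the text 97, call fos1.write(57); fos1.write(55); which writes the characters '9' and '7'.

③ When releasing the resource

Always close the stream when you are done with it, so that the file is no longer held open.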
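The behavior of write(int) described in ② can be checked without touching the disk. This is a minimal sketch that swaps FileOutputStream for an in-memory ByteArrayOutputStream (write(int) has the same contract on every OutputStream); the class name `WriteIntDemo` and the helper `writeAll` are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class WriteIntDemo {
    // Writes each value with OutputStream.write(int) and returns the bytes produced.
    public static byte[] writeAll(int... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            out.write(v); // only the low 8 bits of v are written
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 65 is the ASCII code of 'A'
        System.out.println(new String(writeAll(65), StandardCharsets.US_ASCII));     // A
        // 57 and 55 are the ASCII codes of '9' and '7'
        System.out.println(new String(writeAll(57, 55), StandardCharsets.US_ASCII)); // 97
        // 25105 is 0x6211; only the low byte 0x11 (decimal 17) is kept
        System.out.println(writeAll(25105)[0]);                                      // 17
    }
}
```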

Three ways to write data with FileOutputStream:

Method	Description
void write(int b)	Writes one byte at a time
void write(byte[] b)	Writes a whole byte array at a time
void write(byte[] b, int off, int len)	Writes part of a byte array at a time

import java.io.FileOutputStream;
import java.io.IOException;

public class Test8 {
    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream("src\\a.txt");
        fos.write(65);  // A

        byte[] bytes = {66, 67, 68, 69, 70};    // B C D E F
        fos.write(bytes);

        // void write (byte[] b, int off, int len)
        // off is the start index, len is the number of bytes
        fos.write(bytes, 1, 2);     // C D

        fos.close();
    }
}

Question ① How do you write a line break with FileOutputStream?

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class Test9 {
    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream("src\\a.txt");
        String str1 = "Hello World";
        // Line terminators differ between operating systems
        // Windows: \r\n
        // Linux:   \n
        // macOS:   \n (classic Mac OS, up to version 9, used \r)
        // On Windows many programs also display a lone \r or \n as a line break,
        // but writing the full \r\n is the safest choice
        String str2 = "\r\n";
        byte[] bytes1 = str1.getBytes();
        byte[] bytes2 = str2.getBytes();
        System.out.println(Arrays.toString(bytes1));
        fos.write(bytes1);
        fos.write(bytes2);
        fos.write(bytes1);
        fos.close();
    }
}

Question ② How does FileOutputStream keep the existing data when it writes again?

// The FileOutputStream constructor accepts a second argument, append; it defaults to false, and true means new data is appended
public FileOutputStream(String name) throws FileNotFoundException {
    this(name != null ? new File(name) : null, false);
}

FileOutputStream fos = new FileOutputStream("src\\a.txt", true);
...
if (f1.getName().endsWith(".txt") && f1.length() != 0){
    fos.write(bytes2);
    fos.write(bytes1);
}else {
    fos.write(bytes1);
}

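File byte input stream: FileInputStream

FileInputStream: the byte input stream for local files; it reads data from a local file into the program.

Steps: ① create the byte input stream object, ② read data, ③ release the resource.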

import java.io.FileInputStream;
import java.io.IOException;

public class Test10 {
    public static void main(String[] args) throws IOException {
        FileInputStream fis = new FileInputStream("src\\a.txt");
        int a = fis.read();
        System.out.println((char) a);
        int b = fis.read();
        System.out.println((char) b);
        // When read() has no more data to read from the file, it returns -1
        fis.close();
    }
}

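① When creating the object

The argument passed to the FileInputStream constructor can be a path given as a String or a File object.

If the file does not exist, a FileNotFoundException is thrown.

② When reading data

read() reads one byte at a time and returns its value as an int from 0 to 255; for ASCII text this is the character's code in the ASCII table.

At the end of the file, read returns -1.

③ When releasing the resource

Always close the stream when you are done with it, so that the file is no longer held open.

Reading in a loop with FileInputStream: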

import java.io.FileInputStream;
import java.io.IOException;

public class Test11 {
    public static void main(String[] args) throws IOException {
        FileInputStream fis = new FileInputStream("src\\a.txt");
        int a;
        while ((a = fis.read()) != -1){
            System.out.print((char) a);
        }
        fis.close();
    }
}

Copying a file

The core idea of copying a file: loop over the data, writing it out as it is read.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class Test12 {
    public static void main(String[] args) throws IOException {
        // Read data
        FileInputStream fis = new FileInputStream("src\\note.pdf");
        // Write data
        FileOutputStream fos = new FileOutputStream("out\\copy.pdf");
        int a;
        while ((a = fis.read()) != -1) {
            fos.write(a);
        }
        // Rule for releasing resources: the stream opened first is closed last
        fos.close();
        fis.close();
    }
}

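If the file being copied is large, this is very slow, because only one byte is read at a time.

How can we read several bytes at once to speed up the copy?

Reading multiple bytes at a time with FileInputStream:

Method	Description
public int read()	Reads one byte at a time
public int read(byte[] buffer)	Reads up to one byte array of data at a time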

import java.io.FileInputStream;
import java.io.IOException;

public class Test13 {
    public static void main(String[] args) throws IOException {
        // a.txt contains "Hello World"
        FileInputStream fis = new FileInputStream("src\\a.txt");
        byte[] bytes = new byte[5];
        // The int return value is the number of bytes read this time
        // The bytes that were read are stored in the bytes array
        int num = fis.read(bytes);
        System.out.println(num);
        // Convert the whole array to a string and print it; bytes holds "Hello"
        // Prints "Hello"
        String str = new String(bytes);
        System.out.println(str);

        // Second read: bytes holds " Worl"
        // Prints " Worl"
        int num2 = fis.read(bytes);
        System.out.println(num2);
        String str2 = new String(bytes);
        System.out.println(str2);

        // Third read: only "d" is read, so bytes holds "dWorl"
        // Convert only num3 bytes starting at index 0; prints "d"
        int num3 = fis.read(bytes);
        System.out.println(num3);
        String str3 = new String(bytes, 0, num3);
        System.out.println(str3);

        fis.close();
    }
}

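① When copying a large file, read the data through a byte array. The array size is usually a multiple of 1024, for example 1024 * 1024 * 5 (5 MB); the array itself occupies memory, so it should not be too large.

② Each read fills the array as far as it can, overwriting the result of the previous read from index 0; positions that are not overwritten keep their old values.

③ As with the plain read method, -1 is returned when no more data can be read.

④ A part of the byte array can be converted to a string by passing a start index and a length.

Exercise: write a program that copies a file of about 10 MB from one directory to another (a byte array must be used).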

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class Test14 {
    public static void main(String[] args) throws IOException {
        FileInputStream fis = new FileInputStream("src\\note.pdf");
        FileOutputStream fos = new FileOutputStream("out\\copy.pdf");
        byte[] bytes = new byte[1024 * 1024 * 5];
        int len;
        while ((len = fis.read(bytes)) != -1) {
            fos.write(bytes, 0, len);
        }
        fos.close();
        fis.close();
    }
}
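The copy programs above close their streams by hand, so a stream stays open if read or write throws. As a hedged alternative, the following sketch performs the same buffered copy with try-with-resources (available since Java 7), which closes both streams automatically in reverse order of opening. The class name `CopyWithResourcesDemo`, the method `copy` and the 8192-byte buffer size are choices made for this example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CopyWithResourcesDemo {
    // Copies source to target and returns the number of bytes copied.
    public static long copy(File source, File target) throws IOException {
        long total = 0;
        // Both streams are closed automatically, even if an exception is thrown
        try (FileInputStream in = new FileInputStream(source);
             FileOutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = in.read(buffer)) != -1) {
                // Write only the len bytes that were actually read
                out.write(buffer, 0, len);
                total += len;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Paths taken from the example above
        System.out.println(copy(new File("src\\note.pdf"), new File("out\\copy.pdf")) + " bytes copied");
    }
}
```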

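Character sets

GBK and ASCII

Garbled text (mojibake): when the data that is read back differs from the original data, we say the text is garbled.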

import java.io.FileInputStream;
import java.io.IOException;

public class Test15 {
    public static void main(String[] args) throws IOException {
        // b.txt contains "你好。"
        FileInputStream fis = new FileInputStream("src\\b.txt");
        int a;
        while ((a = fis.read()) != -1){
            // Each byte of a multi-byte character is cast to its own char, which garbles the text
            System.out.print((char) a);
        }
        // Prints "ä½ å¥½ã ,"
        fis.close();
    }
}

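In a computer, all data is stored in binary, for example 01100100. A single 0 or 1 is called a bit. Eight bits are grouped into a byte, which can represent 2^8 = 256 different values. The byte is the smallest unit of storage that programs normally address, and a single byte is enough to store an English letter.

English storage: ASCII

An English letter needs only one byte. ASCII codes have 7 bits, so the code is padded with leading 0s to fill 8 bits and the first bit is always 0.

Chinese storage: GBK

[Figure: how GBK stores an English letter]

[Figure: how GBK stores a Chinese character]

A Chinese character needs two bytes: the left byte is called the high byte and the right byte the low byte. The high byte always starts with 1, which distinguishes it from English text, so read as a signed byte (as Java's byte type does) it is always negative. The low byte may start with either 1 or 0.

① In GBK, an English letter takes one byte, and its first bit is 0;

② In GBK, a Chinese character takes two bytes, and the first bit of its high byte is 1.

Exercise 1: The following is binary data encoded with the GBK character set. How many Chinese characters and how many English letters does each line contain?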

10111010 10111010 01100001
01100001 01100010 01100011
10110000 10100010 11100111 11100010 10111010 11000011 11001011 1010011
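The two GBK rules above can be applied mechanically: a byte whose first bit is 0 is one English letter, and a byte whose first bit is 1 is a high byte that takes the following byte with it. Below is a minimal sketch of that walk; the class name `GbkCountDemo` and the method `count` are hypothetical, and the input is assumed to be well-formed GBK.

```java
public class GbkCountDemo {
    // Returns {chineseCount, englishCount} for well-formed GBK bytes.
    public static int[] count(byte[] gbk) {
        int chinese = 0;
        int english = 0;
        int i = 0;
        while (i < gbk.length) {
            if (gbk[i] < 0) {
                // First bit is 1 (negative as a signed byte): high byte of a two-byte character
                chinese++;
                i += 2;
            } else {
                // First bit is 0: a single-byte English character
                english++;
                i += 1;
            }
        }
        return new int[]{chinese, english};
    }

    public static void main(String[] args) {
        // First line of exercise 1: 10111010 10111010 01100001
        byte[] line1 = {(byte) 0b10111010, (byte) 0b10111010, 0b01100001};
        int[] result = count(line1);
        System.out.println(result[0] + " Chinese, " + result[1] + " English"); // 1 Chinese, 1 English
    }
}
```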

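Unicode

Unicode has several encoding schemes. One of the earlier ones is UTF-16, where UTF stands for Unicode Transformation Format, a way of converting the characters of Unicode into bytes. UTF-16 stores each character in 2 or 4 bytes, built from 16-bit units. UTF-32 stores every character in a fixed 4 bytes. Both waste a lot of space on text that is mostly English, so UTF-8 became the most widely used encoding.

UTF-8 encoding rule: each character is stored in 1 to 4 bytes.

English letters from ASCII take 1 byte, and commonly used Simplified Chinese characters take 3 bytes.

Example: a corresponds to the number 97, which is 110 0001 in binary,

and is stored in UTF-8 as 0110 0001.

The character 汉 corresponds to the number 27721, which is 01101100 01001001 in binary,

and is stored in UTF-8 as 1110 0110 10 110001 10 001001.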
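The two worked examples above can be reproduced in code. The sketch below prints the UTF-8 bytes of a string as 8-bit binary groups; the class name `Utf8BitsDemo` and the method `utf8Bits` are made up for the illustration, and 汉 is written as the escape `\u6c49` so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BitsDemo {
    // Returns the UTF-8 bytes of text as space-separated 8-bit binary groups.
    public static String utf8Bits(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // b & 0xFF turns the signed byte into a value from 0 to 255
            String bits = Integer.toBinaryString(b & 0xFF);
            // Left-pad with zeros to 8 digits
            for (int i = bits.length(); i < 8; i++) {
                sb.append('0');
            }
            sb.append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Bits("a"));      // 01100001
        System.out.println(utf8Bits("\u6c49")); // 11100110 10110001 10001001
    }
}
```

The three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx carries the 16 bits of 27721 in the x positions.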

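Exercise 2: Is UTF-8 a character set?

Exercise 3: The following is binary data from the Unicode character set encoded with UTF-8. How many Chinese characters and how many English letters does each line contain?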

01001010 01100001 01110110 01100001
01100001 01101001 11100100 10111101 10100000 11100101 10010011 10011111

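Garbled text, encoding and decoding

Now that we have seen the various encodings, let us ask again: why does garbled text occur?

① A Chinese character was not read in full (a byte stream reads only one byte at a time).

② Encoding and decoding use different schemes (for example, encoding with UTF-8 and decoding with GBK).

Solutions:

① Do not read text files with byte streams;

② Use the same code table and the same encoding for both encoding and decoding.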
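Cause ② is easy to reproduce in memory. The sketch below encodes a string with UTF-8 and decodes it once with the wrong charset and once with the right one. It uses ISO-8859-1 as the wrong charset rather than GBK, purely because ISO-8859-1 is guaranteed to exist on every JVM; the class and method names are assumptions of this example.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes with UTF-8 but decodes with ISO-8859-1: every byte becomes its own character.
    public static String decodeWrong(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encodes and decodes with the same charset: the text survives the round trip.
    public static String decodeRight(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4f60\u597d"; // the two characters of "你好"
        System.out.println(decodeWrong(original).length()); // 6, one char per UTF-8 byte
        System.out.println(decodeRight(original).equals(original)); // true
    }
}
```

Passing the charset explicitly on both sides, instead of relying on the platform default, is what keeps the two steps consistent.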

package com.company.io;

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class Test16 {
    public static void main(String[] args) throws UnsupportedEncodingException {
        /*
            Encoding methods in Java
                public byte[] getBytes()                            encode with the default charset
                public byte[] getBytes(String charsetName)          encode with the specified charset
            Decoding methods in Java
                String(byte[] bytes)                                decode with the default charset
                String(byte[] bytes, String charsetName)            decode with the specified charset
        */
        
        // Encode
        String str = "Hello你好";
        byte[] bytes = str.getBytes();	// the encoded bytes; on disk they are stored in binary form
        System.out.println(Arrays.toString(bytes));

        byte[] gbks = str.getBytes("GBK");
        System.out.println(Arrays.toString(gbks));
        
        // Decode
        String trans1 = new String(bytes);
        System.out.println(trans1);

        // Decoding with GBK garbles the text when bytes was encoded with a different default charset such as UTF-8
        String trans2 = new String(bytes, "GBK");
        System.out.println(trans2);
    }
}

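Character streams

Character streams are built on top of byte streams and add the concept of a character set.

Input stream: reads one byte at a time, and reads several bytes at once when it meets a Chinese character.

Output stream: encodes the data with the specified encoding, turns it into bytes, and then writes the bytes to the file.

File character input stream: FileReader

FileReader: reads data from a plain text file.

Steps: ① create the character input stream object, ② read data, ③ release the resource.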

package com.company.io;

import java.io.FileReader;
import java.io.IOException;

public class Test17 {
    public static void main(String[] args) throws IOException {
        // Create the stream object
        FileReader fr = new FileReader("src\\c.txt");

        // Store the return value of fr.read() in a variable; calling read() twice per loop would skip characters
        int ch;
        while ((ch = fr.read()) != -1){
            // Cast to char to print the character
            System.out.println((char) ch);
        }

        // Release the resource
        fr.close();
    }
}

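① When creating the object

The argument passed to the FileReader constructor can be a path given as a String or a File object.

If the file does not exist, a FileNotFoundException is thrown.

② When reading data

Data is read byte by byte, and several bytes are read at once for a Chinese character; the bytes are decoded and returned as an integer. At the end of the file, -1 is returned.

Internally the method decodes the data it has read and returns the result as a decimal integer.

That integer is the number of the character in the character set; cast it to char to print the character.

③ When releasing the resource

Always close the stream when you are done with it, so that the file is no longer held open.

Two ways to read data with FileReader:

Method	Description
public int read()	Reads one character; returns -1 at the end
public int read(char[] buffer)	Reads several characters; returns -1 at the end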

package com.company.io;

import java.io.FileReader;
import java.io.IOException;

public class Test17 {
    public static void main(String[] args) throws IOException {
        FileReader fr = new FileReader("src\\c.txt");

        //        int ch;
        //        while ((ch = fr.read()) != -1){
        //            System.out.println((char) ch);
        //        }
        
        // Byte streams use byte[]; character streams use char[]
        char[] chars = new char[2];
        int len;
        // read(char[]) reads, decodes and casts in one step and stores the resulting characters in the array
        // It is equivalent to the no-argument read() plus a cast to char
        while ((len = fr.read(chars)) != -1) {
            // Convert to a string and print only the len characters read this time
            System.out.println(new String(chars, 0, len));
        //            for (char aChar : chars) {
        //                System.out.print(aChar);
        //            }
        }

        fr.close();
    }
}

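File character output stream: FileWriter

FileWriter: the character output stream for local files; it writes data from the program to a local file.

Steps: ① create the character output stream object, ② write data, ③ release the resource.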

package com.company.io;

import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;

public class Test18 {
    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream("src\\a.txt");
        // A byte stream writes only the low 8 bits of 25105 (0x6211), so the single byte 0x11 is written
        fos.write(25105);
        fos.close();

        FileWriter fw = new FileWriter("src\\b.txt", true);
        // The character is encoded with the charset in use and the encoded bytes are written to the file
        // Without an explicit charset FileWriter uses the platform default (UTF-8 in this IDEA setup)
        fw.write(25105);    // writes the character 我 (U+6211)

        // Write a String
        String str = "\r\n今天天气真不错。";
        fw.write(str);

        // Write a char array
        char[] chars = {'\r', '\n', 'h', 'i', '你', '好'};
        fw.write(chars);

        fw.close();
    }
}

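① When creating the object

The argument passed to the FileWriter constructor can be a path given as a String or a File object.

If the file does not exist, a new file is created, but the parent directory must already exist.

If the file already exists it is cleared; to keep its contents, turn on the append switch.

② When writing data

If the argument of write is an integer, what is actually written to the file is the character that the integer corresponds to in the character set.

③ When releasing the resource

Always close the stream when you are done with it, so that the file is no longer held open.

Four constructors of FileWriter:

Method	Description
public FileWriter(File file)	Creates a character output stream bound to a local file
public FileWriter(String pathname)	Creates a character output stream bound to a local file
public FileWriter(File file, boolean append)	Creates a character output stream bound to a local file, with append
public FileWriter(String pathname, boolean append)	Creates a character output stream bound to a local file, with append

Five ways to write data with FileWriter:

Method	Description
void write(int c)	Writes one character
void write(String str)	Writes a string
void write(String str, int off, int len)	Writes part of a string
void write(char[] cbuf)	Writes a character array
void write(char[] cbuf, int off, int len)	Writes part of a character array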

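Summary

Notes

1. When to use byte streams and character streams

Byte streams: copying files of any type.

Character streams: ① reading data from plain text files; ② writing data to plain text files.

2. Steps

① Create the byte / character input or output stream object;

② Read / write data;

③ Release the resource.

3. Character sets

Kinds of character sets:

① Chinese: GBK;

② English: ASCII;

③ Universal: Unicode (UTF-8 encoding)

Garbled text occurs because:

① A Chinese character was not read in full (a byte stream reads only one byte at a time).

② Encoding and decoding use different schemes (for example, encoding with UTF-8 and decoding with GBK).

In UTF-8 a common Chinese character takes 3 bytes, each starting with binary 1 (in GBK it takes 2 bytes, with the high byte starting with 1); an English letter takes 1 byte, starting with binary 0.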

4. Common statements and methods

File:

File file = new File("pathname");
public boolean isFile()
public boolean exists()
public long length()
public String getName()
public boolean createNewFile()
public boolean mkdirs()
public File[] listFiles()

Byte streams:

FileOutputStream fos = new FileOutputStream("pathname");
FileInputStream fis = new FileInputStream("pathname");
void write(byte[] b, int off, int len)
public int read(byte[] buffer)

Character streams:

FileWriter fw = new FileWriter("pathname");
FileReader fr = new FileReader("pathname");
void write(String str, int off, int len)
public int read(char[] buffer)

Exercise: copying a directory

Copy a directory, including its subdirectories.

package com.company.io;

import java.io.*;

public class Pra1 {
    public static void main(String[] args) throws IOException {
        // Source directory
        File f1 = new File("C:\\JavaStudy");
        // Destination of the copy
        File f2 = new File("Copy");
        copy(f1, f2);
    }

    // Copies a directory. Parameter 1: source directory; parameter 2: destination directory
    public static void copy(File f1, File f2) throws IOException {
        // The destination directory has to be created first
        f2.mkdirs();
        /*
            Enter the source directory and collect the paths of all entries in an array.
            Loop over the array: copy each file, and recurse into each directory.
         */
        File[] files = f1.listFiles();
        if (files != null) {
            for (File file : files) {
                if (file.isFile()) {
                    // A file: copy it with byte streams
                    // The copy goes from the source file to a file of the same name in the destination
                    FileInputStream fis = new FileInputStream(file);
                    // The parent path is the directory f2; the child is a new file with the same name as file
                    FileOutputStream fos = new FileOutputStream(new File(f2, file.getName()));
                    byte[] bytes = new byte[1024];
                    int len;
                    while ((len = fis.read(bytes)) != -1) {
                        // Write len bytes of the array, starting at index 0
                        fos.write(bytes, 0, len);
                    }
                    fos.close();
                    fis.close();
                } else {
                    // A directory: copy it recursively into a directory of the same name
                    copy(file, new File(f2, file.getName()));
                }
            }
        }
    }
}

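Other streams

The basic file streams above do little or no buffering of their own: FileInputStream and FileOutputStream hand every read and write to the operating system. This hurts performance badly, because each operation may trigger a disk or network access, and those accesses consume a lot of system resources.

Decorator pattern

The buffered streams, conversion streams, print streams and compression streams that follow all share one design pattern underneath: the decorator pattern.

The decorator pattern wraps one object in another to add behavior. For example, passing a FileOutputStream to the constructor of BufferedOutputStream gives a BufferedOutputStream that writes through to the file stream; passing that BufferedOutputStream to the constructor of DataOutputStream gives a DataOutputStream.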

package com.company.io;

import java.io.*;

public class OthDemo1 {
    public static void main(String[] args) throws IOException {
        // Create the file byte output stream
        FileOutputStream fos = new FileOutputStream("src\\a.txt");
        // Pass it to the constructor of BufferedOutputStream
        BufferedOutputStream bos = new BufferedOutputStream(fos);
        // Pass that to the constructor of DataOutputStream
        DataOutputStream dos = new DataOutputStream(bos);
        // Closing the outermost stream also closes the streams it wraps
        dos.close();
    }
}

From the outside each of them is still an OutputStream, but the functionality is enhanced and a richer API is available.
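To show what the richer API buys, here is a small sketch that stacks the same decorators and uses writeInt and readInt from DataOutputStream and DataInputStream. It writes to an in-memory ByteArrayOutputStream instead of a file so it can run anywhere; the class name `DecoratorDemo` and the method `roundTrip` are assumptions made for this example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DecoratorDemo {
    // Writes an int through DataOutputStream -> BufferedOutputStream -> byte sink,
    // then reads it back through the matching input decorators.
    public static int roundTrip(int value) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(sink))) {
            out.writeInt(value); // writes 4 bytes, high byte first
        } // closing the outer stream flushes the buffer into sink

        byte[] stored = sink.toByteArray();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(stored)))) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(25105)); // 25105
    }
}
```

Compare this with write(int) on a plain byte stream, which keeps only the low 8 bits of its argument.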

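Buffered streams

Byte buffered streams: BufferedInputStream & BufferedOutputStream

Buffered streams exist to improve read and write efficiency: they add a buffer on top of an existing byte or character stream.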

package com.company.io;

import java.io.*;

public class OthDemo2 {
    public static void main(String[] args) throws IOException {
        // Create the buffered stream objects
        BufferedInputStream bis = 
            new BufferedInputStream(new FileInputStream("src\\a.txt"));
        BufferedOutputStream bos = 
            new BufferedOutputStream(new FileOutputStream("src\\b.txt"));

        // Read in a loop and write each byte to the file
        int b;
        while ((b = bis.read()) != -1){
            bos.write(b);
        }

        // Release the resources; closing a buffered stream also closes the stream it wraps
        bos.close();
        bis.close();
    }
}

In the same way, a buffered stream can read several bytes at a time through an array.

byte[] bytes = new byte[1024];
int len;
while ((len = bis.read(bytes)) != -1) {
    bos.write(bytes, 0, len);
}

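Character buffered streams: BufferedReader & BufferedWriter

Methods specific to character buffered streams

Character buffered input stream: reads one line of data and returns null when there is no more data to read.

The method stops at a line terminator but does not return the terminator as part of the line. (Printing the results with print therefore puts them all on one line.)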

public String readLine()

package com.company.io;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class OthDemo4 {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("src\\a.txt"));

        //        String str1 = br.readLine();
        //        String str2 = br.readLine();
        //        System.out.println(str1);
        //        System.out.println(str2);

        String line;
        while ((line = br.readLine()) != null){
            System.out.println(line);
        }

        br.close();
    }
}

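Character buffered output stream: newLine() writes a platform-independent line break.

The method writes the line separator of the current operating system, i.e. the value of System.lineSeparator(): \r\n on Windows and \n on Linux and macOS (only classic Mac OS, version 9 and earlier, used \r).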

public void newLine()

package com.company.io;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class OthDemo5 {
    public static void main(String[] args) throws IOException {
        // If the file does not exist it is created. Without the second argument every run
        // would truncate the file, so append mode is turned on (note: appending is a
        // feature of FileWriter, not of BufferedWriter)
        BufferedWriter bw = new BufferedWriter(new FileWriter("src\\d.txt", true));

        bw.write("你好");
        //        bw.write("\r\n");   // a hard-coded \r\n is not the native line break on other platforms
        bw.newLine();
        bw.write("今天天气真不错");
        bw.newLine();

        bw.close();
    }
}

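Conversion streams: InputStreamReader & OutputStreamWriter

Conversion streams are character streams that act as a bridge between character streams and byte streams. There are two of them: the input conversion stream InputStreamReader and the output conversion stream OutputStreamWriter.

Use cases: ① reading or writing with a specific character set (since JDK 11 this is rarely needed, because FileReader and FileWriter accept a Charset directly; the conversion streams themselves are not deprecated); ② using character-stream methods on top of a byte stream.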

package com.company.io;

import java.io.*;
import java.nio.charset.Charset;

public class ConvertDemo1 {
    public static void main(String[] args) throws IOException {
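        // Convert.txt was saved via "Save As" with the encoding changed to ANSI
        // (GBK on a Simplified Chinese Windows), so decode it as GBK
        InputStreamReader isr = new InputStreamReader(new FileInputStream("C:\\JavaStudy\\Convert.txt"), "GBK");
        // Read the data one character at a time, as with any character stream
        int len;
        while ((len = isr.read()) != -1){
            System.out.print((char) len);
        }
        isr.close();

        // Alternative available since JDK 11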
        FileReader fr = new FileReader("C:\\JavaStudy\\Convert.txt", Charset.forName("GBK"));
        int ch;
        while ((ch = fr.read()) != -1){
            System.out.print((char) ch);
        }
        fr.close();
    }
}

package com.company.io;

import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class ConvertDemo2 {
    public static void main(String[] args) throws IOException {
        OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream("e.txt"), "GBK");
        // The text is written as GBK; IDEA opens files as UTF-8 by default, so e.txt looks garbled in the editor
        osw.write("你好");
        osw.close();

        // Alternative available since JDK 11
        FileWriter fw = new FileWriter("e.txt", Charset.forName("GBK"));
        fw.write("你好");
        fw.close();
    }
}

Exercise: convert a local GBK-encoded file to UTF-8.

package com.company.io;

import java.io.*;
import java.nio.charset.Charset;

public class ConvertDemo3 {
    public static void main(String[] args) throws IOException {
        // Before JDK 11
        InputStreamReader isr = new InputStreamReader(new FileInputStream("C:\\JavaStudy\\convert.txt"), "GBK");
        OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream("src\\convert.txt"), "UTF-8");
        int a;
        while ((a = isr.read()) != -1){
            osw.write(a);
        }
        osw.close();
        isr.close();

        // Alternative available since JDK 11
        FileReader fr = new FileReader("C:\\JavaStudy\\convert.txt", Charset.forName("GBK"));
        FileWriter fw = new FileWriter("src\\convert.txt", Charset.forName("UTF-8"));
        int b;
        while ((b = fr.read()) != -1){
            fw.write(b);
        }
        fw.close();
        fr.close();
    }
}
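The same conversion can be tried in memory. The sketch below is an illustration rather than the exercise's solution: it assumes a hypothetical helper named transcode, and it converts from UTF-16BE to UTF-8 because both charsets are guaranteed to exist on every JDK; passing Charset.forName("GBK") as the source charset works the same way wherever GBK is available.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeSketch {
    // Hypothetical helper: decodes input with the charset "from" and re-encodes it with "to".
    static byte[] transcode(byte[] input, Charset from, Charset to) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer writer = new OutputStreamWriter(sink, to)) {
            char[] chars = new char[1024];
            int len; // number of characters decoded in this round
            while ((len = reader.read(chars)) != -1) {
                writer.write(chars, 0, len);
            }
        } // closing the writer flushes the encoded bytes into the sink
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4f60\u597d"; // the two characters of the greeting used above
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);
        byte[] utf8 = transcode(utf16, StandardCharsets.UTF_16BE, StandardCharsets.UTF_8);
        System.out.println(utf16.length + " bytes in, " + utf8.length + " bytes out"); // prints 4 bytes in, 6 bytes out
    }
}
```

The characters stay the same; only their byte representation changes, which is why the byte counts differ.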

Exercise: read a file through a byte stream, but read a whole line at a time and without garbled characters.

package com.company.io;

import java.io.*;

public class LaDemo1 {
    public static void main(String[] args) throws IOException {
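        // The data comes from a file, so start with a byte input stream
        FileInputStream fis = new FileInputStream("src\\c.txt");
        // To avoid garbled characters, decode the bytes with a conversion stream. The platform
        // default charset is used here; pass a charset name as the second argument if the file
        // uses another one (since JDK 11 a FileReader with a Charset would also work)
        InputStreamReader isr = new InputStreamReader(fis);
        // To read a whole line at a time, wrap it in a BufferedReader
        BufferedReader br = new BufferedReader(isr);

        // BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("src\\c.txt")));

        // readLine() returns one line without its terminator, or null at the end of the file
        String line;
        while ((line = br.readLine()) != null){
            System.out.println(line);
        }

        // Closing the outermost stream closes the wrapped streams as well
        br.close();
        // isr.close();
        // fis.close();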
    }
}

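Data streams: DataOutputStream & DataInputStream

The data streams DataOutputStream and DataInputStream let a stream read and write primitive values and strings directly.

Commonly used methods:

① dos.writeUTF(); ② dis.readUTF(); ③ dos.writeInt() / dos.writeDouble();

④ dis.readInt() / dis.readDouble(); ⑤ dis.readByte(); ⑥ dos.writeChar();

⑦ dis.readChar(); ⑧ dos.writeBoolean(); ⑨ dis.readBoolean(); ⑩ dos.writeByte();

Note: values must be read back in the same order, and with the same types, in which they were written.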

package com.company.io;

import java.io.*;

public class pra2 {
    public static void main(String[] args) {
        try {
            DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(new File("src\\a.txt"))));
            dos.writeBoolean(true);
            dos.writeChar('A');
            dos.writeDouble(12.3);
            dos.writeInt(4);
            dos.writeUTF("this is");
            dos.writeUTF("DataOutputStream");
            dos.close();
            DataInputStream dis = new DataInputStream(new BufferedInputStream(new FileInputStream(new File("src\\a.txt"))));
            boolean t = dis.readBoolean();
            char y = dis.readChar();
            double x = dis.readDouble();
            int d = dis.readInt();
            String b = dis.readUTF();
            String c = dis.readUTF();
            System.out.println(t + " " + y + " " + x + " " + d + " " + b + " " + c);
            dis.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
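Why the read order has to match the write order becomes visible when it is deliberately broken. The sketch below is a hypothetical illustration (the class DataOrderSketch and the method readIntAsChars are made-up names): it writes a single int and reads the same four bytes back as two chars. No exception is thrown; the bytes are simply interpreted differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataOrderSketch {
    // Writes one int (4 bytes) and reads the same bytes back as two chars (2 bytes each).
    static String readIntAsChars(int value) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(sink)) {
            dos.writeInt(value);
        }
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(sink.toByteArray()))) {
            char first = dis.readChar();
            char second = dis.readChar();
            return "" + first + second;
        }
    }

    public static void main(String[] args) throws IOException {
        // 0x00410042 is stored as the bytes 00 41 00 42, which are also the chars 'A' and 'B'
        System.out.println(readIntAsChars(0x00410042)); // prints AB
    }
}
```

The stream stores no type information, so the reader has to know the layout that the writer used.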

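XML

Introduction to XML

XML (Extensible Markup Language) can be used to store data, hold system configuration, and exchange data.

① XML tags can be user-defined, and elements can be nested (but must not overlap).

② An XML document normally begins with an XML declaration, which tells the processing program that the document is XML; when present, it must be the very first thing in the file.

③ The XML declaration usually carries information such as the version and the encoding. It starts with <?xml and ends with ?>.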
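The nesting rule in ① can be checked with the JDK's own parser. The following is a minimal sketch, assuming a hypothetical helper named isWellFormed; it reports whether a string parses as XML, so properly nested tags pass and overlapping tags fail.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedSketch {
    // Hypothetical helper: true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to the console
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String declaration = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        System.out.println(isWellFormed(declaration + "<a><b></b></a>")); // prints true
        System.out.println(isWellFormed(declaration + "<a><b></a></b>")); // prints false
    }
}
```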

<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="ChangeListManager">
    <list default="true" id="84e1f6fc-df86-47b2-8556-49f0336f0415" name="Changes" comment="" />
    <option name="SHOW_DIALOG" value="false" />
    <option name="HIGHLIGHT_CONFLICTS" value="true" />
    <option name="HIGHLIGHT_NON_ACTIVE_CHANGELIST" value="false" />
    <option name="LAST_RESOLUTION" value="IGNORE" />
  </component>
  <component name="FileTemplateManagerImpl">
    <option name="RECENT_TEMPLATES">
      <list>
        <option value="Python Script" />
      </list>
    </option>
  </component>
  <component name="MarkdownSettingsMigration">
    <option name="stateVersion" value="1" />
  </component>
  <component name="ProjectId" id="28yhW2JRvVXqQOKiHkafMq6ldRg" />
  <component name="ProjectViewState">
    <option name="hideEmptyMiddlePackages" value="true" />
    <option name="showLibraryContents" value="true" />
  </component>
  <component name="RunManager" selected="Python.demo">
    <configuration name="core" type="PythonConfigurationType" factoryName="Python" temporary="true" nameIsGenerated="true">
      <module name="pythonProject" />
      <option name="INTERPRETER_OPTIONS" value="" />
      <option name="PARENT_ENVS" value="true" />
      <envs>
        <env name="PYTHONUNBUFFERED" value="1" />
      </envs>
      <option name="SDK_HOME" value="" />
      <option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/noknow-python-master/noknow" />
      <option name="IS_MODULE_SDK" value="true" />
      <option name="ADD_CONTENT_ROOTS" value="true" />
      <option name="ADD_SOURCE_ROOTS" value="true" />
      <option name="SCRIPT_NAME" value="$PROJECT_DIR$/noknow-python-master/noknow/core.py" />
      <option name="PARAMETERS" value="" />
      <option name="SHOW_COMMAND_LINE" value="false" />
      <option name="EMULATE_TERMINAL" value="false" />
      <option name="MODULE_MODE" value="false" />
      <option name="REDIRECT_INPUT" value="false" />
      <option name="INPUT_FILE" value="" />
      <method v="2" />
    </configuration>
    <recent_temporary>
      <list>
        <item itemvalue="Python.demo" />
        <item itemvalue="Python.data" />
        <item itemvalue="Python.core" />
        <item itemvalue="Python.test" />
        <item itemvalue="Python.setup" />
      </list>
    </recent_temporary>
  </component>
  <component name="SpellCheckerSettings" RuntimeDictionaries="0" Folders="0" CustomDictionaries="0" DefaultDictionary="application-level" UseSingleDictionary="true" transferred="true" />
  <component name="TaskManager">
    <task active="true" id="Default" summary="Default task">
      <changelist id="84e1f6fc-df86-47b2-8556-49f0336f0415" name="Changes" comment="" />
      <created>1652194761520</created>
      <option name="number" value="Default" />
      <option name="presentableId" value="Default" />
      <updated>1652194761520</updated>
    </task>
    <servers />
  </component>
</project>

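Tags can have attributes (attribute values must be quoted). An attribute further describes its tag; a tag can have several attributes, each with its own name and value, and the attributes are part of the tag.

The main technologies for parsing XML are:

① DOM, i.e. org.w3c.dom: the W3C-recommended interfaces for parsing an XML document into a DOM tree

② SAX, i.e. org.xml.sax: the interfaces for parsing an XML document with SAX

Parsing XML with DOM

DOM maps an XML document onto a hierarchical object model: a tree of nodes generated from the document. However simple or complex the document is, after DOM has parsed it all of its information is held in one node tree. The tree has a single root node, and every other node is a descendant of that root. Once the tree has been built, nodes and content can be read, modified, added, and removed through the DOM interfaces.

DOM parsing steps:

① Create a DocumentBuilderFactory (the parser factory) with DocumentBuilderFactory.newInstance()

② Create a DocumentBuilder with newDocumentBuilder()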

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
...
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();	

③ Parse the file to obtain a Document object

④ Walk the nodes (tags) through NodeList

Extension: parse an XML file, store all the data obtained in a List, and return it.

The XML file:

<?xml version="1.0" encoding="UTF-8"?>
<friends>
  <friend id="1">
    <name>Alice</name>
    <age>18</age>
    <gender>female</gender>
  </friend>
  <friend id="2">
    <name>Bob</name>
    <age>19</age>
    <gender>male</gender>
  </friend>
  <friend id="3">
    <name>Lisa</name>
    <age>17</age>
    <gender>female</gender>
  </friend>
</friends>

Create the FriendEx class, with fields defined according to the contents of the XML file:

package com.company.io;

public class FriendEx {
    private int id;
    private String name;
    private int age;
    private String gender;

    @Override
    public String toString() {
        return "friends{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", age=" + age +
                ", gender='" + gender + '\'' +
                '}';
    }

    public FriendEx() {
    }

    public FriendEx(int id, String name, int age, String gender) {
        this.id = id;
        this.name = name;
        this.age = age;
        this.gender = gender;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public String getGender() {
        return gender;
    }

    public void setGender(String gender) {
        this.gender = gender;
    }
}

Parse the XML file: define a method that takes the path of the XML file as a String and returns a List:

package com.company.io;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class XmlDemo1 {
    public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException {
        List<FriendEx> friends = parseXmlToList("src\\friends.xml");
        System.out.println(friends);
    }
    
    public static List<FriendEx> parseXmlToList(String file) throws ParserConfigurationException, IOException, SAXException {
        List<FriendEx> friends = new ArrayList<>();
        // Parse friends.xml into a List<FriendEx> named friends
        // Create a parser factory instance and obtain a parser from it
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = dbf.newDocumentBuilder();

        // Parse the file into a Document object that Java can work with
        Document document = builder.parse(new FileInputStream(file));
        // Get the root element of the document (<friends>)
        Element element = document.getDocumentElement();
        // Get every <friend> element under the root
        NodeList nodeList = element.getElementsByTagName("friend");

        for (int i = 0;i< nodeList.getLength();i++){
            FriendEx friend = new FriendEx();
            // Each <friend> has an id attribute and the child elements name, age and gender
            Element friendElement = (Element)nodeList.item(i);
            //            Node node = nodeList.item(i);
            // Read the id attribute of <friend>
            int id = Integer.parseInt(friendElement.getAttribute("id"));
            friend.setId(id);

            // Get the child nodes of <friend>
            NodeList childNodes = friendElement.getChildNodes();
            // Iterate over the child nodes
            for (int j = 0;j<childNodes.getLength();j++){
                Node friendChildNode = childNodes.item(j);
                // Child nodes also include text nodes for line breaks and indentation, so keep elements only
                if (friendChildNode.getNodeType() == Node.ELEMENT_NODE){
                    if (friendChildNode.getNodeName().equals("name")){
                        String name = friendChildNode.getFirstChild().getNodeValue();
                        friend.setName(name);
                    }else if (friendChildNode.getNodeName().equals("age")){
                        int age = Integer.parseInt(friendChildNode.getFirstChild().getNodeValue());
                        friend.setAge(age);
                    }else if (friendChildNode.getNodeName().equals("gender")){
                        String gender = friendChildNode.getFirstChild().getNodeValue();
                        friend.setGender(gender);
                    }
                }
            }
            friends.add(friend);
        }
        return friends;
    }
}

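Common Node methods

Method	Description
NodeList getChildNodes()	Returns a NodeList of all children of this node
Node getFirstChild()	Returns the first child of this node
Node getLastChild()	Returns the last child of this node
Node getNextSibling()	Returns the node immediately following this node
Node getPreviousSibling()	Returns the node immediately preceding this node
Document getOwnerDocument()	Returns the Document object associated with this node
Node getParentNode()	Returns the parent of this node
short getNodeType()	Returns the type of this node
String getNodeName()	Returns the name of this node, depending on its type
String getNodeValue()	Returns the value of this node, depending on its type
String getTextContent()	Returns the text content of this node and its descendants
void setNodeValue(String nodeValue)	Sets the value of this node, depending on its type
void setTextContent(String textContent)	Sets the text content of this node
Node appendChild(Node newChild)	Adds newChild to the end of this node's list of children
Node insertBefore(Node newChild, Node refChild)	Inserts newChild before the existing child refChild
Node removeChild(Node oldChild)	Removes the child oldChild from the list of children and returns it
Node replaceChild(Node newChild, Node oldChild)	Replaces the child oldChild with newChild and returns oldChild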
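A few of the modifying methods in the table can be tried on a tree built in memory. This is a minimal sketch under the assumption that an empty Document is created with DocumentBuilder.newDocument(); the class NodeEditSketch and the method buildFriend are hypothetical names, and the element names follow the friends.xml example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NodeEditSketch {
    // Builds <friend><age>18</age></friend> step by step and returns the <friend> element.
    static Element buildFriend() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element friend = doc.createElement("friend");
        doc.appendChild(friend);            // <friend> becomes the root element

        Element name = doc.createElement("name");
        name.setTextContent("Alice");
        friend.appendChild(name);           // children: name

        Element age = doc.createElement("age");
        age.setTextContent("18");
        friend.insertBefore(age, name);     // children: age, name

        friend.removeChild(name);           // children: age
        return friend;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Element friend = buildFriend();
        System.out.println(friend.getFirstChild().getNodeName()); // prints age
        System.out.println(friend.getTextContent());              // prints 18
    }
}
```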

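Common Document methods

Method	Description
Element getDocumentElement()	Returns the Element that is the root of the DOM tree
NodeList getElementsByTagName(String tagname)	Returns a NodeList of all Elements in the document with the given tag name, in document order

Common NodeList methods

Method	Description
int getLength()	Returns the number of nodes in the list
Node item(int index)	Returns the item at position index in the list (indices start at 0)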

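Parsing XML with SAX

SAX, short for Simple API for XML, is both a set of interfaces and a software package, and it is an alternative way of parsing XML. Unlike DOM, SAX reads the document sequentially from start to end and parses it as it goes. Because the application only inspects the data while it is being read, the document does not have to be held in memory, which is a major advantage for large documents.

SAX is event-driven. Parsing is implemented by extending DefaultHandler and overriding five key methods.

Method	Description
startDocument()	Called once when the document starts
endDocument()	Called once when the document ends
startElement(String uri, String localName, String qName, Attributes attributes)	Called at every start tag; compare qName (or localName, if the parser is namespace-aware) to find the element you want
endElement(String uri, String localName, String qName)	Called at every end tag; compare qName (or localName, if the parser is namespace-aware) to find the element you want
characters(char[] ch, int start, int length)	Called with the character data found inside elements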

/*
* @author lastwhisper
* Parameters of the callbacks below:
* uri        the namespace URI of the element
* localName  the tag name without prefix (empty unless the parser is namespace-aware)
* qName      the tag name as written, including any namespace prefix
* attributes the attributes of the tag
* ch         the character array that contains the current text
* start      the index in ch at which the current text starts
* length     the number of characters of the current text
*/

// Name of the element currently being read (a field of the handler class)
private String tagName;

@Override
public void startDocument() throws SAXException {
    System.out.println("Start parsing the books2 document");
}

@Override
public void endDocument() throws SAXException {
    System.out.println("Finished parsing the books2 document");
}

@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    // Print every attribute value of <book>
    if (qName.equals("book")) {
        for(int i=0;i<attributes.getLength();i++){
            System.out.println("ID: "+attributes.getValue(i));
        }
    }
    // Remember which element the following text belongs to
    this.tagName = qName;
}

@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
    // The element is closed, so text that follows no longer belongs to it
    this.tagName = null;
}

@Override
public void characters(char[] ch, int start, int length) throws SAXException {
    if (this.tagName != null) {
        String data = new String(ch, start, length);
        if (this.tagName.equals("bookname")) {
            System.out.println("Title: "+data);
        }
        if (this.tagName.equals("bookauthor")) {
            System.out.println("Author: "+data);
        }
        if (this.tagName.equals("bookprice")) {
            System.out.println("Price: "+data);
        }
    }
}
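The callbacks above are only a fragment of a handler class. A complete, runnable variant might look like the sketch below. It is an illustration under assumptions, not the original program: the class names, the bookNames helper and the sample XML (with a books root element) are made up, and the handler collects the text in a list instead of printing it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    // Collects the text of every <bookname> element.
    static class BookNameHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private StringBuilder text; // non-null only while inside <bookname>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("bookname".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length); // may be called more than once per element
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("bookname".equals(qName)) {
                names.add(text.toString());
                text = null;
            }
        }
    }

    // Hypothetical helper: parses the XML text and returns all book names in document order.
    static List<String> bookNames(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BookNameHandler handler = new BookNameHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book id=\"1\"><bookname>Java</bookname></book>"
                   + "<book id=\"2\"><bookname>XML</bookname></book></books>";
        System.out.println(bookNames(xml)); // prints [Java, XML]
    }
}
```

Appending to a StringBuilder in characters() is a deliberate choice: a parser is allowed to deliver the text of one element in several chunks.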

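Exercises

try-catch exception handling

If a catch block for a subclass exception is placed after the catch block for its superclass, the code fails to compile, because the subclass block could never be reached.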
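The valid ordering can be sketched as follows; the class CatchOrderSketch and the method classify are hypothetical names used only for this illustration.

```java
class CatchOrderSketch {
    // Returns a description of what happened when reading a[index].
    static String classify(int index) {
        int[] a = {1, 2, 3};
        try {
            return "value " + a[index];
        } catch (ArrayIndexOutOfBoundsException e) { // subclass first
            return "index out of bounds";
        } catch (RuntimeException e) {               // superclass last
            return "other runtime error";
        }
        // Swapping the two catch blocks does not compile: the compiler reports that
        // ArrayIndexOutOfBoundsException has already been caught.
    }

    public static void main(String[] args) {
        System.out.println(classify(0)); // prints value 1
        System.out.println(classify(3)); // prints index out of bounds
    }
}
```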

try {
    int[] a = {1,2,3};
    System.out.print(a[3]);
    System.out.print(1);
} catch(Exception e) {
    System.out.print(2);
    System.exit(0);
} finally {
    System.out.print(3);
}

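Unlike return, System.exit(0) takes precedence over finally: the JVM shuts down as soon as it is called, so the finally block never runs and the code above prints only 2.

System.exit(0) and System.exit(1) both terminate the whole JVM immediately, whatever the program is doing at that moment. The argument is the exit status reported to the operating system: by convention 0 means the program ended normally and a non-zero value means it ended abnormally.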
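For comparison, this is how return behaves. The sketch uses a hypothetical method withReturn that records the order of execution in a StringBuilder passed in by the caller.

```java
class FinallySketch {
    // Appends a trace of the executed blocks to log and returns 1.
    static int withReturn(StringBuilder log) {
        try {
            log.append("try ");
            return 1;              // the return value is computed first ...
        } finally {
            log.append("finally"); // ... but finally still runs before the method returns
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        int result = withReturn(log);
        System.out.println(log + " -> " + result); // prints try finally -> 1
    }
}
```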

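An exception that is thrown outwards and then caught by an outer try-catch can cause an infinite loop if that try-catch sits inside a loop that simply retries the failing operation.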

Collections

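ArrayList<String> a = new ArrayList<String>();
a.add(true);
a.add(123);
a.add("abc");
System.out.print(a);

// What does the console print after running this? Nothing: it is a compile error.
// Once a collection is declared with a generic type, elements that do not match that type cannot be added.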

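List a = new ArrayList();
a.add(1);
a.add(2);
a.add(3);
a.remove(1);
System.out.print(a);
// What does the console print after running this? [1, 3]

List has two remove methods, remove(Object o) and remove(int index). Which one does the argument 1 match? The two methods are overloads, and the compiler chooses between them by the static type of the argument. The literal 1 is an int, so remove(int index) matches exactly, while remove(Object o) would need boxing to Integer; the exact match wins and the element at index 1 (the value 2) is removed. To remove the value 1 instead, pass an object: a.remove(Integer.valueOf(1)).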
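Both overloads side by side, as a minimal sketch with hypothetical method names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveOverloadSketch {
    static List<Integer> removeByIndex() {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3));
        a.remove(1);                  // int argument: removes the element at index 1
        return a;
    }

    static List<Integer> removeByValue() {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3));
        a.remove(Integer.valueOf(1)); // Object argument: removes the first element equal to 1
        return a;
    }

    public static void main(String[] args) {
        System.out.println(removeByIndex()); // prints [1, 3]
        System.out.println(removeByValue()); // prints [2, 3]
    }
}
```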

Set ts = new TreeSet();
ts.add("zs");
ts.add("ls");
ts.add("ww");
System.out.print(ts);
// What does the console print after running this?

TreeSet sorts strings in ascending lexicographic order by default, so the answer is: [ls, ww, zs]

IO and XML

// Assume the file c:/a.txt contains abc
// and the following code is run
try {
    File f = new File("c:/a.txt");
    System.out.print(f.length());
    OutputStream out = new FileOutputStream(f);
    System.out.print(f.length());
    out.write(97);
    System.out.print(f.length());
    out.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
// What does the console print after running this? 301

After the File object is created, f.length() returns 3.

After the FileOutputStream is created, f.length() returns 0, because by default the constructor truncates an existing file. To keep the existing content and append to it, use new FileOutputStream(f, true);.

After out.write(97) writes the letter a, f.length() returns 1.
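The effect of the second constructor argument can be compared directly. The sketch below assumes a temporary file instead of c:/a.txt, and the method name lengthAfterSecondOpen is hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

class AppendSketch {
    // Creates a file containing "abc", reopens it with the given append flag,
    // writes one more byte, and returns the resulting file length.
    static long lengthAfterSecondOpen(boolean append) throws IOException {
        File f = File.createTempFile("append-demo", ".txt");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[] {'a', 'b', 'c'});
        }
        try (FileOutputStream out = new FileOutputStream(f, append)) {
            out.write(97); // the letter a
        }
        return f.length();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lengthAfterSecondOpen(false)); // prints 1: the old content was truncated
        System.out.println(lengthAfterSecondOpen(true));  // prints 4: the byte was appended
    }
}
```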

if(node2 instanceof Element){
    String string = node2.getNodeName();
    String ste = node2.getTextContent();
    System.out.println(string + " " + ste);
}

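When parsing with org.w3c.dom.Node, the line breaks and indentation between tags also become nodes (text nodes), so getLength() on the child NodeList returns a larger number than the number of elements written in the file. Either remove the extra spaces and line breaks from the file, or skip every node that is not an Element, as the instanceof check above does.

If node2 instanceof Text, getNodeName() returns #text.

If node2 instanceof Element, getNodeName() returns the tag name.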
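The extra text nodes can be counted with a short experiment. This is a minimal sketch, assuming a hypothetical helper countChildren and a small sample document with line breaks and indentation between its tags.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class WhitespaceNodeSketch {
    // Returns {number of all child nodes, number of element children} of the root element.
    static int[] countChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList children = doc.getDocumentElement().getChildNodes();
        int elements = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                elements++; // whitespace text nodes are skipped
            }
        }
        return new int[] {children.getLength(), elements};
    }

    public static void main(String[] args) throws Exception {
        String xml = "<friend>\n  <name>Alice</name>\n  <age>18</age>\n</friend>";
        int[] counts = countChildren(xml);
        System.out.println(counts[0] + " nodes, " + counts[1] + " elements"); // prints 5 nodes, 2 elements
    }
}
```

The three extra nodes are the whitespace before name, between name and age, and after age.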