Google Protocol Buffers 入门(protobuf tutorial译文)

 protobuf  Google Protocol Buffers 入门(protobuf tutorial译文)已关闭评论
6月 122016
 

分享好文, 转自:http://www.cnblogs.com/shitouer/archive/2013/04/10/google-protocol-buffers-tutorial.html , google-protocol-buffer-tutorial的中文翻译文章!


1. 前言

这篇入门教程是基于Java语言的,这篇文章我们将会:

  1. 创建一个.proto文件,在其内定义一些PB message
  2. 使用PB编译器
  3. 使用PB Java API 读写数据

这篇文章仅是入门手册,如果想深入学习及了解,可以参看: Protocol Buffer Language GuideJava API ReferenceJava Generated Code Guide, 以及Encoding Reference

2. 为什么使用Protocol Buffers

接下来用“通讯簿”这样一个非常简单的应用来举例。该应用能够写入并读取“联系人”信息,每个联系人由name,ID,email address以及contact photo number组成。这些信息的最终存储在文件中。

如何序列化并检索这样的结构化数据呢?有以下解决方案:

  1.  使用Java序列化(Java Serialization)。这是最直接的解决方式,因为该方式是内置于Java语言的,但是,这种方式有许多问题(Effective Java 对此有详细介绍),而且当有其他应用程序(比如C++ 程序及Python程序书写的应用)与之共享数据的时候,这种方式就不能工作了。
  2. 将数据项编码成一种特殊的字符串。例如将四个整数编码成“12:3:-23:67”。这种方法简单且灵活,但是却需要编写独立的,只需要用一次的编码和解码代码,并且解析过程需要一些运行成本。这种方式对于简单的数据结构非常有效。
  3. 将数据序列化为XML。这种方式非常诱人,因为易于阅读(某种程度上)并且有不同语言的多种解析库。在需要与其他应用或者项目共享数据的时候,这是一种非常有效的方式。但是,XML是出了名的耗空间,在编码解码上会有很大的性能损耗。而且呢,操作XML DOM数非常的复杂,远不如操作类中的字段简单。

Protocol Buffers可以灵活,高效且自动化的解决该问题,只需要:

  1. 创建一个.proto 文件,描述希望数据存储结构
  2. 使用PB compiler 创建一个类,该类可以高效的,以二进制方式自动编码和解析PB数据

该生成类提供组成PB数据字段的getter和setter方法,甚至考虑了如何高效的读写PB数据。更厉害的是,PB友好的支持字段拓展,拓展后的代码,依然能够正确的读取原来格式编码的数据。

3. 定义协议格式

首先需要创建一个.proto文件。非常简单,每一个需要序列化的数据结构,编码一个PB message,然后为message中的字段指明一个名字和类型即可。该“通讯簿”的.proto 文件addressbook.proto定义如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
package tutorial;
 
option java_package = “com.example.tutorial”;
option java_outer_classname = “AddressBookProtos”;
 
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
 
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
 
  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }
 
  repeated PhoneNumber phone = 4;
}
message AddressBook {
  repeated Person person = 1;
}

可以看到,语法非常类似Java或者C++,接下来,我们一条一条来过一遍每句话的含义:

  • .proto文件以一个package声明开始。该声明有助于避免不同项目建设的命名冲突。Java版的PB,在没有指明java_package的情况下,生成的类默认的package即为此package。这里我们生命的java_package,所以最终生成的类会位于com.example.tutorial package下。这里需要强调一下,即使指明了java_package,我们建议依旧定义.proto文件的package。
  • 在package声明之后,紧接着是专门为java指定的两个选项:java_package 以及 java_outer_classname。java_package我们已经说过,不再赘述。java_outer_classname为生成类的名字,该类包含了所有在.proto中定义的类。如果该选项不显式指明的话,会按照驼峰规则,将.proto文件的名字作为该类名。例如“addressbook.proto”将会是“Addressbook”,“address_book.proto”即为“AddressBook”
  • java指定选项后边,即为message定义。每个message是一个包含了一系列指明了类型的字段的集合。这里的字段类型包含大多数的标准简单数据类型,包括bool,int32,float,double以及string。Message中也可以定义嵌套的message,例如“Person” message 包含“PhoneNumber” message。也可以将已定义的message作为新的数据类型,例如上例中,PhoneNumber类型在Person内部定义,但他是phone的type。在需要一个字段包含预先定义的一个列表的时候,也可以定义枚举类型,例如“PhoneType”。
  • 我们注意到, 每一个message中的字段,都有“=1”,“=2”这样的标记,这可不是初始化赋值,该值是message中,该字段的唯一标示符,在二进制编码时候会用到。数字1~15的表示需求少于一个字节,所以在编码的时候,有这样一个优化,你可以用1~15标记最常使用或者重复字段元素(repeated elements)。用16或者更大的数字来标记不太常用的可选元素。再重复字段中,每一个元素都需重复编码标签数字,所以,该优化对重复字段最佳(repeat fileds)。

message的没一个字段,都要用如下的三个修饰符(modifier)来声明:

  1. required:必须赋值,不能为空,否则该条message会被认为是“uninitialized”。build一个“uninitialized” message会抛出一个RuntimeException异常,解析一条“uninitialized” message会抛出一条IOException异常。除此之外,“required”字段跟“optional”字段并无差别。
  2. optional:字段可以赋值,也可以不赋值。假如没有赋值的话,会被赋上默认值。对于简单类型,默认值可以自己设定,例如上例的PhoneNumber中的PhoneType字段。如果没有自行设定,会被赋上一个系统默认值,数字类型会被赋为0,String类型会被赋为空字符串,bool类型会被赋为false。对于内置的message,默认值为该message的默认实例或者原型,即其内所有字段均为设置。当获取没有显式设置值的optional字段的值时,就会返回该字段的默认值。
  3. repeated:该字段可以重复任意次数,包括0次。重复数据的顺序将会保存在protocol buffer中,将这个字段想象成一个可以自动设置size的数组就可以了。

 Notice:应该格外小心定义Required字段。当因为某原因要把Required字段改为Optional字段是,会有问题,老版本读取器会认为消息中没有该字段不完整,可能会拒绝或者丢弃该字段(Google文档是这么说的,但是我试了一下,将required的改为optional的,再用原来required时候的解析代码去读,如果字段赋值的话,并不会出错,但是如果字段未赋值,会报这样错误:Exception in thread “main” com.google.protobuf.InvalidProtocolBufferException: Message missing required fields:fieldname)。在设计时,尽量将这种验证放在应用程序端的完成。Google的一些工程师对此也很困惑,他们觉得,required类型坏处大于好处,应该尽量仅适用optional或者repeated的。但也并不是所有的人都这么想。

如果想深入学习.proto文件书写,可以参考Protocol Buffer Language Guide。但是不要妄想会有类似于类继承这样的机制,Protocol Buffers不做这个…

4. 编译Protocol Buffers

定义好.proto文件后,接下来,就是使用该文件,运行PB的编译器protoc,编译.proto文件,生成相关类,可以使用这些类读写“通讯簿”没得message。接下来我们要做:

  1. 如果你还没有安装PB编译器,到这里现在安装:download the package
  2. 安装后,运行protoc,结束后会发现在项目com.example.tutorial package下,生成了AddressBookProtos.java文件:
1
2
3
protoc -I=$SRC_DIR –java_out=$DST_DIR $SRC_DIR/addressbook.proto
#for example
protoc -I=G:\workspace\protobuf\message –java_out=G:\workspace\protobuf\src\main\java G:\workspace\protobuf\messages\addressbook.proto

  • -I:指明应用程序的源码位置,假如不赋值,则有当前路径(说实话,该处我是直译了,并不明白是什么意思。我做了尝试,该值不能为空,如果为空,则提示赋了一个空文件夹,如果是当前路径,请用.代替,我用.代替,又提示不对。但是可以是任何一个路径,都运行正确,只要不为空);
  • –java_out:指明目的路径,即生成代码输出路径。因为我们这里是基于java来说的,所以这里是–java_out,相对其他语言,设置为相对语言即可
  • 最后一个参数即.proto文件

Notice:此处运行完毕后,查看生成的代码,很有可能会出现一些类没有定义等错误,例如:com.google cannot be resolved to a type等。这是因为项目中缺少protocol buffers的相应library。在Protocol Buffers的源码包里,你会发现java/src/main/java,将这下边的文件拷贝到你的项目,大概可以解决问题。我只能说大概,因为当时我在弄得时候,也是刚学,各种出错,比较恶心。有一个简单的方法,呵呵,对于懒汉来说。创建一个maven的java项目,在pom.xml中,添加Protocol Buffers的依赖即可解决所有问题~在pom.xml中添加如下依赖(注意版本):

1
2
3
4
5
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
</dependency>

 5. Protocol Buffer Java API

5.1 产生的类及方法

接下来看一下PB编译器创建了那些类以及方法。首先会发现一个.java文件,其内部定义了一个AddressBookProtos类,即我们在addressbook.proto文件java_outer_classname 指定的。该类内部有一系列内部类,对应分别是我们在addressbook.proto中定义的message。每个类内部都有相应的Builder类,我们可以用它创建类的实例。生成的类及类内部的Builder类,均自动生成了获取message中字段的方法,不同的是,生成的类仅有getter方法,而生成类内部的Builder既有getter方法,又有setter方法。本例中Person类,其仅有getter方法,如图所示:

 但是Person.Builder类,既有getter方法,又有setter方法,如图:

person.builderperson.builder

从上边两张图可以看到:

  1. 每一个字段都有JavaBean风格的getter和setter
  2. 对于每一个简单类型变量,还对应都有一个has这样的一个方法,如果该字段被赋值了,则返回true,否则,返回false
  3. 对每一个变量,都有一个clear方法,用于置空字段

对于repeated字段:

repeated filedrepeated filed

从图上看:

  1. 从person.builder图上看出,对于repeated字段,还有一个特殊的getter,即getPhoneCount方法,及repeated字段还有一个特殊的count方法
  2. 其getter和setter方法根据index获取或设置一个数据项
  3. add()方法用于附加一个数据项
  4. addAll()方法来直接增加一个容器中的所有数据项

注意到一点:所有的这些方法均命名均符合驼峰规则,即使在.proto文件中是小写的。PB compiler生成的方法及字段等都是按照驼峰规则来产生,以符合基本的Java规范,当然,其他语言也尽量如此。所以,在proto文件中,命名最好使用用“_”来分割不同小写的单词。

 5.2 枚举及嵌套类

从代码中可以发现,还产生了一个枚举:PhoneType,该枚举位于Person类内部:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
public enum PhoneType
        implements com.google.protobuf.ProtocolMessageEnum {
      /**
       * <code>MOBILE = 0;</code>
       */
      MOBILE(0,0),
      /**
       * <code>HOME = 1;</code>
       */
      HOME(1,1),
      /**
       * <code>WORK = 2;</code>
       */
      WORK(2,2),
      ;
      …
}

除此之外,如我们所预料,还有一个Person.PhoneNumber内部类,嵌套在Person类中,可以自行看一下生成代码,不再粘贴。

5.3 Builders vs. Messages

由PB compiler生成的消息类是不可变的。一旦一个消息对象构建出来,他就不再能够修改,就像java中的String一样。在构建一个message之前,首先要构建一个builder,然后使用builder的setter或者add()等方法为所需字段赋值,之后调用builder对象的build方法。

在使用中会发现,这些构造message对象的builder的方法,都又会返回一个新的builder,事实上,该builder跟调用这个方法的builder是同一方法。这样做的目的,仅是为了方便而已,我们可以把所有的setter写在一行内。

如下构造一个Person实例:

1
2
3
4
5
6
7
8
9
10
11
12
Person john = Person
        .newBuilder()
        .setId(1)
        .setName(“john”)
        .setEmail(“[email protected]”)
        .addPhone(
                PhoneNumber
                .newBuilder()
                .setNumber(“1861xxxxxxx”)
                .setType(PhoneType.WORK)
                .build()
        ).build();

5.4 标准消息方法

每一个消息类及Builder类,基本都包含一些公用方法,用来检查和维护这个message,包括:

  1.  isInitialized(): 检查是否所有的required字段是否被赋值
  2. toString(): 返回一个便于阅读的message表示(本来是二进制的,不可读),尤其在debug时候比较有用
  3. mergeFrom(Message other): 仅builder有此方法,将其message的内容与此message合并,覆盖简单及重复字段
  4. clear(): 仅builder有此方法,清空所有的字段

5.5 解析及序列化

对于每一个PB类,均提供了读写二进制数据的方法:

  1. byte[] toByteArray();: 序列化message并且返回一个原始字节类型的字节数组
  2. static Person parseFrom(byte[] data);: 将给定的字节数组解析为message
  3. void writeTo(OutputStream output);: 将序列化后的message写入到输出流
  4. static Person parseFrom(InputStream input);: 读入并且将输入流解析为一个message

这里仅列出了几个解析及序列化方法,完整列表,可以参见:Message API reference

6. 使用PB生成类写入

接下来使用这些生成的PB类,初始化一些联系人,并将其写入一个文件中。

下面的程序首先从一个文件中读取一个通讯簿(AddressBook),然后添加一个新的联系人,再将新的通讯簿写回到文件。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
package com.example.tutorial;
 
import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintStream;
 
class AddPerson {
    // This function fills in a Person message based on user input.
    static Person PromptForAddress(BufferedReader stdin, PrintStream stdout)
            throws IOException {
        Person.Builder person = Person.newBuilder();
 
        stdout.print(“Enter person ID: “);
        person.setId(Integer.valueOf(stdin.readLine()));
 
        stdout.print(“Enter name: “);
        person.setName(stdin.readLine());
 
        stdout.print(“Enter email address (blank for none): “);
        String email = stdin.readLine();
        if (email.length() >0) {
            person.setEmail(email);
        }
 
        while (true) {
            stdout.print(“Enter a phone number (or leave blank to finish): “);
            String number = stdin.readLine();
            if (number.length() ==0) {
                break;
            }
 
            Person.PhoneNumber.Builder phoneNumber = Person.PhoneNumber
                    .newBuilder().setNumber(number);
 
            stdout.print(“Is this a mobile, home, or work phone? “);
            String type = stdin.readLine();
            if (type.equals(“mobile”)) {
                phoneNumber.setType(Person.PhoneType.MOBILE);
            }else if (type.equals(“home”)) {
                phoneNumber.setType(Person.PhoneType.HOME);
            }else if (type.equals(“work”)) {
                phoneNumber.setType(Person.PhoneType.WORK);
            }else {
                stdout.println(“Unknown phone type.  Using default.”);
            }
 
            person.addPhone(phoneNumber);
        }
 
        return person.build();
    }
 
    // Main function: Reads the entire address book from a file,
    // adds one person based on user input, then writes it back out to the same
    // file.
    public static void main(String[] args)throws Exception {
        if (args.length !=1) {
            System.err.println(“Usage:  AddPerson ADDRESS_BOOK_FILE”);
            System.exit(-1);
        }
 
        AddressBook.Builder addressBook = AddressBook.newBuilder();
 
        // Read the existing address book.
        try {
            addressBook.mergeFrom(new FileInputStream(args[0]));
        }catch (FileNotFoundException e) {
            System.out.println(args[0]
                    +”: File not found.  Creating a new file.”);
        }
 
        // Add an address.
        addressBook.addPerson(PromptForAddress(new BufferedReader(
                new InputStreamReader(System.in)), System.out));
 
        // Write the new address book back to disk.
        FileOutputStream output =new FileOutputStream(args[0]);
        addressBook.build().writeTo(output);
        output.close();
    }
}

 7. 使用PB生成类读取

运行第六部分程序,写入几个联系人到文件中,接下来,我们就要读取联系人。程序入下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
package com.example.tutorial;
import java.io.FileInputStream;
 
import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
 
class ListPeople {
  // Iterates though all people in the AddressBook and prints info about them.
  static void Print(AddressBook addressBook) {
    for (Person person: addressBook.getPersonList()) {
      System.out.println(“Person ID: ” + person.getId());
      System.out.println(”  Name: ” + person.getName());
      if (person.hasEmail()) {
        System.out.println(”  E-mail address: ” + person.getEmail());
      }
 
      for (Person.PhoneNumber phoneNumber : person.getPhoneList()) {
        switch (phoneNumber.getType()) {
          case MOBILE:
            System.out.print(”  Mobile phone #: “);
            break;
          case HOME:
            System.out.print(”  Home phone #: “);
            break;
          case WORK:
            System.out.print(”  Work phone #: “);
            break;
        }
        System.out.println(phoneNumber.getNumber());
      }
    }
  }
 
  // Main function:  Reads the entire address book from a file and prints all
  //   the information inside.
  public static void main(String[] args)throws Exception {
    if (args.length !=1) {
      System.err.println(“Usage:  ListPeople ADDRESS_BOOK_FILE”);
      System.exit(-1);
    }
 
    // Read the existing address book.
    AddressBook addressBook =
      AddressBook.parseFrom(new FileInputStream(args[0]));
 
    Print(addressBook);
  }
}

至此我们已经可以使用生成类写入和读取PB message。

8. 拓展PB

当产品发布后,迟早有一天我们需要改善我们的PB定义。如果要做到新的PB能够向后兼容,同时老的PB又能够向前兼容,我们必须遵守如下规则:

  1. 千万不要修改现有字段后边的数值标签
  2. 千万不要增加或者删除required字段
  3. 可以删除optional或者repeated字段
  4. 可以添加新的optional或者repeated字段,但是必须使用新的数字标签(该数字标签必须从未在该PB中使用过,包括已经删除字段的数字标签)

如果违反了这些规则,会有一些相应的异常,可参见some exceptions,但是这些异常,很少很少会被用到。

遵守这些规则,老的代码可以正确的读取新的message,但是会忽略新的字段;对于删掉的optional的字段,老代码会使用他们的默认值;对于删除的repeated字段,则把他们置为空。

新的代码也将能够透明的读取老的messages。但是必须注意,新的optional字段在老的message中是不存在的,必须显式的使用has_方法来判断其是否设置了,或者在.proto 文件中以[default = value]形式提供默认值。如果没有指定默认值的话,会按照类型默认值赋值。对于string类型,默认值是空字符串。对于bool来说,默认值是false。对于数字类型,默认值是0。

9. 高级用法

Protocol Buffers的应用远远不止简单的存取以及序列化。如果想了解更多用法,可以去研究Java API reference

Protocol Message Class提供了一个重要特性:反射。不需要再写任何特殊的message类型就可以遍历一条message的所有字段以及操作字段的值。反射的一个非常重要的应用是可以将PBmessage与其他的编码语言进行转化,例如与XML或者JSON之间。

反射另外一个更加高级的应用应该是两个同一类型message的之间的不同,或者开发一种可以成为“Protocol Buffers 正则表达式”的应用,使用它,可以编写符合一定消息内容的表达式。

除此之外,开动脑筋,你会发现,Protocol Buffers能解决远远超过你刚开始对他的期待。

译自:https://developers.google.com/protocol-buffers/docs/javatutorial

9月 162015
 

developers.google.com上的protobuf的语言入门(2.x版)

Language Guide

This guide describes how to use the protocol buffer language to structure your protocol buffer data, including .proto file syntax and how to generate data access classes from your .protofiles. It covers the proto2 version of the protocol buffers language: for information on the newerproto3 syntax, see the Proto3 Language Guide.

This is a reference guide – for a step by step example that uses many of the features described in this document, see the tutorial for your chosen language.

Defining A Message Type

First let’s look at a very simple example. Let’s say you want to define a search request message format, where each search request has a query string, the particular page of results you are interested in, and a number of results per page. Here’s the .proto file you use to define the message type.

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}

The SearchRequest message definition specifies three fields (name/value pairs), one for each piece of data that you want to include in this type of message. Each field has a name and a type.

Specifying Field Types

In the above example, all the fields are scalar types: two integers (page_number andresult_per_page) and a string (query). However, you can also specify composite types for your fields, including enumerations and other message types.

Assigning Tags

As you can see, each field in the message definition has a unique numbered tag. These tags are used to identify your fields in the message binary format, and should not be changed once your message type is in use. Note that tags with values in the range 1 through 15 take one byte to encode, including the identifying number and the field’s type (you can find out more about this inProtocol Buffer Encoding). Tags in the range 16 through 2047 take two bytes. So you should reserve the tags 1 through 15 for very frequently occurring message elements. Remember to leave some room for frequently occurring elements that might be added in the future.

The smallest tag number you can specify is 1, and the largest is 229 – 1, or 536,870,911. You also cannot use the numbers 19000 though 19999 (FieldDescriptor::kFirstReservedNumberthrough FieldDescriptor::kLastReservedNumber), as they are reserved for the Protocol Buffers implementation – the protocol buffer compiler will complain if you use one of these reserved numbers in your .proto.

Specifying Field Rules

You specify that message fields are one of the following:

  • required: a well-formed message must have exactly one of this field.
  • optional: a well-formed message can have zero or one of this field (but not more than one).
  • repeated: this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved.

For historical reasons, repeated fields of basic numeric types aren’t encoded as efficiently as they could be. New code should use the special option [packed=true] to get a more efficient encoding. For example:

repeated int32 samples = 4 [packed=true];

Required Is Forever You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optionaland repeated. However, this view is not universal.

Adding More Message Types

Multiple message types can be defined in a single .proto file. This is useful if you are defining multiple related messages – so, for example, if you wanted to define the reply message format that corresponds to your SearchResponse message type, you could add it to the same.proto:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}

message SearchResponse {
 ...
}

Adding Comments

To add comments to your .proto files, use C/C++-style // syntax.

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;// Which page number do we want?
  optional int32 result_per_page = 3;// Number of results to return per page.
}

What’s Generated From Your .proto?

When you run the protocol buffer compiler on a .proto, the compiler generates the code in your chosen language you’ll need to work with the message types you’ve described in the file, including getting and setting field values, serializing your messages to an output stream, and parsing your messages from an input stream.

For C++, the compiler generates a .h and .cc file from each .proto, with a class for each message type described in your file.

For Java, the compiler generates a .java file with a class for each message type, as well as a special Builder classes for creating message class instances.

Python is a little different – the Python compiler generates a module with a static descriptor of each message type in your .proto, which is then used with a metaclass to create the necessary Python data access class at runtime.

You can find out more about using the APIs for each language by following the tutorial for your chosen language. For even more API details, see the relevant API reference.

Scalar Value Types

A scalar message field can have one of the following types – the table shows the type specified in the .proto file, and the corresponding type in the automatically generated class:

.proto Type Notes C++ Type Java Type Python Type[2]
double double double float
float float float float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int int
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 long int/long[3]
uint32 Uses variable-length encoding. uint32 int[1] int/long[3]
uint64 Uses variable-length encoding. uint64 long[1] int/long[3]
sint32 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. int32 int int
sint64 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. int64 long int/long[3]
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 228. uint32 int[1] int
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 256. uint64 long[1] int/long[3]
sfixed32 Always four bytes. int32 int int
sfixed64 Always eight bytes. int64 long int/long[3]
bool bool boolean bool
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string String str/unicode[4]
bytes May contain any arbitrary sequence of bytes. string ByteString str

You can find out more about how these types are encoded when you serialize your message inProtocol Buffer Encoding.

[1] In Java, unsigned 32-bit and 64-bit integers are represented using their signed counterparts, with the top bit simply being stored in the sign bit.

[2] In all cases, setting values to a field will perform type checking to make sure it is valid.

[3] 64-bit or unsigned 32-bit integers are always represented as long when decoded, but can be an int if an int is given when setting the field. In all cases, the value must fit in the type represented when set. See [2].

[4] Python strings are represented as unicode on decode but can be str if an ASCII string is given (this is subject to change).

Optional Fields And Default Values

As mentioned above, elements in a message description can be labeled optional. A well-formed message may or may not contain an optional element. When a message is parsed, if it does not contain an optional element, the corresponding field in the parsed object is set to the default value for that field. The default value can be specified as part of the message description. For example, let’s say you want to provide a default value of 10 for a SearchRequest’sresult_per_page value.

optional int32 result_per_page = 3 [default = 10];

If the default value is not specified for an optional element, a type-specific default value is used instead: for strings, the default value is the empty string. For bools, the default value is false. For numeric types, the default value is zero. For enums, the default value is the first value listed in the enum’s type definition.

Enumerations

When you’re defining a message type, you might want one of its fields to only have one of a pre-defined list of values. For example, let’s say you want to add a corpus field for eachSearchRequest, where the corpus can be UNIVERSAL, WEB, IMAGES, LOCAL, NEWS,PRODUCTS or VIDEO. You can do this very simply by adding an enum to your message definition – a field with an enum type can only have one of a specified set of constants as its value (if you try to provide a different value, the parser will treat it like an unknown field). In the following example we’ve added an enum called Corpus with all the possible values, and a field of typeCorpus:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3 [default = 10];
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  optional Corpus corpus = 4 [default = UNIVERSAL];
}

You can define aliases by assigning the same value to different enum constants. To do this you need to set the allow_alias option to true, otherwise protocol compiler will generate an error message when aliases are found.

enum EnumAllowingAlias {
  option allow_alias = true;
  UNKNOWN = 0;
  STARTED = 1;
  RUNNING = 1;
}
enum EnumNotAllowingAlias {
  UNKNOWN = 0;
  STARTED = 1;
  // RUNNING = 1;  // Uncommenting this line will cause a compile error inside Google and a warning message outside.
}

Enumerator constants must be in the range of a 32-bit integer. Since enum values use varint encoding on the wire, negative values are inefficient and thus not recommended. You can defineenums within a message definition, as in the above example, or outside – these enums can be reused in any message definition in your .proto file. You can also use an enum type declared in one message as the type of a field in a different message, using the syntaxMessageType.EnumType.

When you run the protocol buffer compiler on a .proto that uses an enum, the generated code will have a corresponding enum for Java or C++, or a special EnumDescriptor class for Python that’s used to create a set of symbolic constants with integer values in the runtime-generated class.

For more information about how to work with message enums in your applications, see thegenerated code guide for your chosen language.

Using Other Message Types

You can use other message types as field types. For example, let’s say you wanted to includeResult messages in each SearchResponse message – to do this, you can define a Resultmessage type in the same .proto and then specify a field of type Result inSearchResponse:

message SearchResponse {
  repeated Result result = 1;
}

message Result {
  required string url = 1;
  optional string title = 2;
  repeated string snippets = 3;
}

Importing Definitions

In the above example, the Result message type is defined in the same file asSearchResponse – what if the message type you want to use as a field type is already defined in another .proto file?

You can use definitions from other .proto files by importing them. To import another .proto’s definitions, you add an import statement to the top of your file:

import "myproject/other_protos.proto";

By default you can only use definitions from directly imported .proto files. However, sometimes you may need to move a .proto file to a new location. Instead of moving the .proto file directly and updating all the call sites in a single change, now you can put a dummy .proto file in the old location to forward all the imports to the new location using the import public notion.import public dependencies can be transitively relied upon by anyone importing the proto contaning the import public statement. For example:

// new.proto
// All definitions are moved here
// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";
// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto

The protocol compiler searches for imported files in a set of directories specified on the protocol compiler command line using the -I/–proto_path flag. If no flag was given, it looks in the directory in which the compiler was invoked. In general you should set the –proto_path flag to the root of your project and use fully qualified names for all imports.

Using proto3 Message Types

It’s possible to import proto3 message types and use them in your proto2 messages, and vice versa. However, proto2 enums cannot be used in proto3 syntax.

Nested Types

You can define and use message types inside other message types, as in the following example – here the Result message is defined inside the SearchResponse message:

message SearchResponse {
  message Result {
    required string url = 1;
    optional string title = 2;
    repeated string snippets = 3;
  }
  repeated Result result = 1;
}

If you want to reuse this message type outside its parent message type, you refer to it asParent.Type:

message SomeOtherMessage {
  optional SearchResponse.Result result = 1;
}

You can nest messages as deeply as you like:

message Outer {                  // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      required int64 ival = 1;
      optional bool  booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      required int32 ival = 1;
      optional bool  booly = 2;
    }
  }
}

Groups

Note that this feature is deprecated and should not be used when creating new message types – use nested message types instead.

Groups are another way to nest information in your message definitions. For example, another way to specify a SearchResponse containing a number of Results is as follows:

message SearchResponse {
  repeated group Result = 1 {
    required string url = 2;
    optional string title = 3;
    repeated string snippets = 4;
  }
}

A group simply combines a nested message type and a field into a single declaration. In your code, you can treat this message just as if it had a Result type field called result (the latter name is converted to lower-case so that it does not conflict with the former). Therefore, this example is exactly equivalent to the SearchResponse above, except that the message has a different wire format.

Updating A Message Type

If an existing message type no longer meets all your needs – for example, you’d like the message format to have an extra field – but you’d still like to use code created with the old format, don’t worry! It’s very simple to update message types without breaking any of your existing code. Just remember the following rules:

  • Don’t change the numeric tags for any existing fields.
  • Any new fields that you add should be optional or repeated. This means that any messages serialized by code using your “old” message format can be parsed by your new generated code, as they won’t be missing any required elements. You should set up sensible default values for these elements so that new code can properly interact with messages generated by old code. Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. However, the unknown fields are not discarded, and if the message is later serialized, the unknown fields are serialized along with it – so if the message is passed on to new code, the new fields are still available.
  • Non-required fields can be removed, as long as the tag number is not used again in your updated message type (it may be better to rename the field instead, perhaps adding the prefix “OBSOLETE_”, so that future users of your .proto can’t accidentally reuse the number).
  • A non-required field can be converted to an extension and vice versa, as long as the type and number stay the same.
  • int32, uint32, int64, uint64, and bool are all compatible – this means you can change a field from one of these types to another without breaking forwards- or backwards-compatibility. If a number is parsed from the wire which doesn’t fit in the corresponding type, you will get the same effect as if you had cast the number to that type in C++ (e.g. if a 64-bit number is read as an int32, it will be truncated to 32 bits).
  • sint32 and sint64 are compatible with each other but are not compatible with the other integer types.
  • string and bytes are compatible as long as the bytes are valid UTF-8.
  • Embedded messages are compatible with bytes if the bytes contain an encoded version of the message.
  • fixed32 is compatible with sfixed32, and fixed64 with sfixed64.
  • optional is compatible with repeated. Given serialized data of a repeated field as input, clients that expect this field to be optional will take the last input value if it’s a primitive type field or merge all input elements if it’s a message type field.
  • Changing a default value is generally OK, as long as you remember that default values are never sent over the wire. Thus, if a program receives a message in which a particular field isn’t set, the program will see the default value as it was defined in that program’s version of the protocol. It will NOT see the default value that was defined in the sender’s code.

Extensions

Extensions let you declare that a range of field numbers in a message are available for third-party extensions. Other people can then declare new fields for your message type with those numeric tags in their own .proto files without having to edit the original file. Let’s look at an example:

message Foo {
  // ...
  extensions 100 to 199;
}

This says that the range of field numbers [100, 199] in Foo is reserved for extensions. Other users can now add new fields to Foo in their own .proto files that import your .proto, using tags within your specified range – for example:

extend Foo {
  optional int32 bar = 126;
}

This says that Foo now has an optional int32 field called bar.

When your user’s Foo messages are encoded, the wire format is exactly the same as if the user defined the new field inside Foo. However, the way you access extension fields in your application code is slightly different to accessing regular fields – your generated data access code has special accessors for working with extensions. So, for example, here’s how you set the value ofbar in C++:

		
Foo foo; foo.SetExtension(bar, 15);

Similarly, the Foo class defines templated accessors HasExtension(), ClearExtension(),GetExtension(), MutableExtension(), and AddExtension(). All have semantics matching the corresponding generated accessors for a normal field. For more information about working with extensions, see the generated code reference for your chosen language.

Note that extensions can be of any field type, including message types, but cannot be oneofs or maps.

Nested Extensions

You can declare extensions in the scope of another type:

message Baz {
  extend Foo {
    optional int32 bar = 126;
  }
  ...
}

In this case, the C++ code to access this extension is:

Foo foo;
foo.SetExtension(Baz::bar, 15);

In other words, the only effect is that bar is defined within the scope of Baz.

This is a common source of confusion: Declaring an extend block nested inside a message typedoes not imply any relationship between the outer type and the extended type. In particular, the above example does not mean that Baz is any sort of subclass of Foo. All it means is that the symbol bar is declared inside the scope of Baz; it’s simply a static member.

A common pattern is to define extensions inside the scope of the extension’s field type – for example, here’s an extension to Foo of type Baz, where the extension is defined as part of Baz:

message Baz {
  extend Foo {
    optional Baz foo_ext = 127;
  }
  ...
}

However, there is no requirement that an extension with a message type be defined inside that type. You can also do this:

message Baz {
  ...
}

// This can even be in a different file.
extend Foo {
  optional Baz foo_baz_ext = 127;
}

In fact, this syntax may be preferred to avoid confusion. As mentioned above, the nested syntax is often mistaken for subclassing by users who are not already familiar with extensions.

Choosing Extension Numbers

It’s very important to make sure that two users don’t add extensions to the same message type using the same numeric tag – data corruption can result if an extension is accidentally interpreted as the wrong type. You may want to consider defining an extension numbering convention for your project to prevent this happening.

If your numbering convention might involve extensions having very large numbers as tags, you can specify that your extension range goes up to the maximum possible field number using the maxkeyword:

message Foo {
  extensions 1000 to max;
}

max is 229 – 1, or 536,870,911.

As when choosing tag numbers in general, your numbering convention also needs to avoid field numbers 19000 though 19999 (FieldDescriptor::kFirstReservedNumber throughFieldDescriptor::kLastReservedNumber), as they are reserved for the Protocol Buffers implementation. You can define an extension range that includes this range, but the protocol compiler will not allow you to define actual extensions with these numbers.

Oneof

If you have a message with many optional fields and where at most one field will be set at the same time, you can enforce this behavior and save memory by using the oneof feature.

Oneof fields are like optional fields except all the fields in a oneof share memory, and at most one field can be set at the same time. Setting any member of the oneof automatically clears all the other members. You can check which value in a oneof is set (if any) using a special case() orWhichOneof() method, depending on your chosen language.

Using Oneof

To define a oneof in your .proto you use the oneof keyword followed by your oneof name, in this case test_oneof:

		
message SampleMessage {   oneof test_oneof {      string name = 4;      SubMessage sub_message = 9;   } }

You then add your oneof fields to the oneof definition. You can add fields of any type, but cannot use the required, optional, or repeated keywords.

In your generated code, oneof fields have the same getters and setters as regular optionalmethods. You also get a special method for checking which value (if any) in the oneof is set. You can find out more about the oneof API for your chosen language in the relevant API reference.

Oneof Features

  • Setting a oneof field will automatically clear all other members of the oneof. So if you set several oneof fields, only the last field you set will still have a value.

    				
    SampleMessage message; message.set_name("name"); CHECK(message.has_name()); message.mutable_sub_message();   // Will clear name field. CHECK(!message.has_name());
  • If the parser encounters multiple members of the same oneof on the wire, only the last member seen is used in the parsed message.
  • Extensions are not supported for oneof.
  • A oneof cannot be repeated.
  • Reflection APIs work for oneof fields.
  • If you’re using C++, make sure your code doesn’t cause memory crashes. The following sample code will crash because sub_message was already deleted by calling theset_name() method.

    				
    SampleMessage message; SubMessage* sub_message = message.mutable_sub_message(); message.set_name("name");      // Will delete sub_message sub_message->set_...            // Crashes here
  • Again in C++, if you Swap() two messages with oneofs, each message will end up with the other’s oneof case: in the example below, msg1 will have a sub_message and msg2 will have a name.

    				
    SampleMessage msg1; msg1.set_name("name"); SampleMessage msg2; msg2.mutable_sub_message(); msg1.swap(&msg2); CHECK(msg1.has_sub_message()); CHECK(msg2.has_name());

Backwards-compatibility issues

Be careful when adding or removing oneof fields. If checking the value of a oneof returnsNone/NOT_SET, it could mean that the oneof has not been set or it has been set to a field in a different version of the oneof. There is no way to tell the difference, since there’s no way to know if an unknown field on the wire is a member of the oneof.

Tag Reuse Issues

  • Move optional fields into or out of a oneof: You may lose some of your information (some fields will be cleared) after the message is serialized and parsed.
  • Delete a oneof field and add it back: This may clear your currently set oneof field after the message is serialized and parsed.
  • Split or merge oneof: This has similar issues to moving regular optional fields.

Maps

If you want to create an associative map as part of your data definition, protocol buffers provides a handy shortcut syntax:

		
map<key_type, value_type> map_field = N;

…where the key_type can be any integral or string type (so, any scalar type except for floating point types and bytes). The value_type can be any type.

So, for example, if you wanted to create a map of projects where each Project message is associated with a string key, you could define it like this:

		
map<string, Project> projects = 3;

The generated map API is currently available for all proto2 supported languages. You can find out more about the map API for your chosen language in the relevant API reference.

Maps Features

  • Extensions are not supported for maps.
  • Maps cannot be repeated, optional, or required.
  • Wire format ordering and map iteration ordering of map values is undefined, so you cannot rely on your map items being in a particular order.
  • In text format, maps are sorted by key. Numeric keys are sorted numerically.

Backwards compatibility

The map syntax is equivalent to the following on the wire, so protocol buffers implementations that do not support maps can still handle your data:

		
message MapFieldEntry {   key_type key = 1;   value_type value = 2; } repeated MapFieldEntry map_field = N;

Packages

You can add an optional package specifier to a .proto file to prevent name clashes between protocol message types.

package foo.bar;
message Open { ... }

You can then use the package specifier when defining fields of your message type:

message Foo {
  ...
  required foo.bar.Open open = 1;
  ...
}

The way a package specifier affects the generated code depends on your chosen language:

  • In C++ the generated classes are wrapped inside a C++ namespace. For example, Openwould be in the namespace foo::bar.
  • In Java, the package is used as the Java package, unless you explicitly provide a option java_package in your .proto file.
  • In Python, the package directive is ignored, since Python modules are organized according to their location in the file system.

Packages and Name Resolution

Type name resolution in the protocol buffer language works like C++: first the innermost scope is searched, then the next-innermost, and so on, with each package considered to be “inner” to its parent package. A leading ‘.’ (for example, .foo.bar.Baz) means to start from the outermost scope instead.

The protocol buffer compiler resolves all type names by parsing the imported .proto files. The code generator for each language knows how to refer to each type in that language, even if it has different scoping rules.

Defining Services

If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file and the protocol buffer compiler will generate service interface code and stubs in your chosen language. So, for example, if you want to define an RPC service with a method that takes your SearchRequest and returns a SearchResponse, you can define it in your .proto file as follows:

service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}

By default, the protocol compiler will then generate an abstract interface called SearchServiceand a corresponding “stub” implementation. The stub forwards all calls to an RpcChannel, which in turn is an abstract interface that you must define yourself in terms of your own RPC system. For example, you might implement an RpcChannel which serializes the message and sends it to a server via HTTP. In other words, the generated stub provides a type-safe interface for making protocol-buffer-based RPC calls, without locking you into any particular RPC implementation. So, in C++, you might end up with code like this:

		
using google::protobuf; protobuf::RpcChannel* channel; protobuf::RpcController* controller; SearchService* service; SearchRequest request; SearchResponse response; void DoSearch() {   // You provide classes MyRpcChannel and MyRpcController, which implement   // the abstract interfaces protobuf::RpcChannel and protobuf::RpcController.   channel = new MyRpcChannel("somehost.example.com:1234");   controller = new MyRpcController;   // The protocol compiler generates the SearchService class based on the   // definition given above.   service = new SearchService::Stub(channel);   // Set up the request.   request.set_query("protocol buffers");   // Execute the RPC.   service->Search(controller, request, response, protobuf::NewCallback(&Done)); } void Done() {   delete service;   delete channel;   delete controller; }

All service classes also implement the Service interface, which provides a way to call specific methods without knowing the method name or its input and output types at compile time. On the server side, this can be used to implement an RPC server with which you could register services.

		
using google::protobuf; class ExampleSearchService : public SearchService {  public:   void Search(protobuf::RpcController* controller,               const SearchRequest* request,               SearchResponse* response,               protobuf::Closure* done) {     if (request->query() == "google") {       response->add_result()->set_url("http://www.google.com");     } else if (request->query() == "protocol buffers") {       response->add_result()->set_url("http://protobuf.googlecode.com");     }     done->Run();   } }; int main() {   // You provide class MyRpcServer.  It does not have to implement any   // particular interface; this is just an example.   MyRpcServer server;   protobuf::Service* service = new ExampleSearchService;   server.ExportOnPort(1234, service);   server.Run();   delete service;   return 0; }

If you don’t want to plug in your own existing RPC system, you can now use gRPC: a language- and platform-neutral open source RPC system developed at Google. gRPC works particularly well with protocol buffers and lets you generate the relevant RPC code directly from your .proto files using a special protocol buffer compiler plugin. However, as there are potential compatibility issues between clients and servers generated with proto2 and proto3, we recommend that you use proto3 for defining gRPC services. You can find out more about proto3 syntax in the Proto3 Language Guide. If you do want to use proto2 with gRPC, you need to use version 3.0.0 or higher of the protocol buffers compiler and libraries.

In addition to gRPC, there are also a number of ongoing third-party projects to develop RPC implementations for Protocol Buffers. For a list of links to projects we know about, see the third-party add-ons wiki page.

Options

Individual declarations in a .proto file can be annotated with a number of options. Options do not change the overall meaning of a declaration, but may affect the way it is handled in a particular context. The complete list of available options is defined ingoogle/protobuf/descriptor.proto.

Some options are file-level options, meaning they should be written at the top-level scope, not inside any message, enum, or service definition. Some options are message-level options, meaning they should be written inside message definitions. Some options are field-level options, meaning they should be written inside field definitions. Options can also be written on enum types, enum values, service types, and service methods; however, no useful options currently exist for any of these.

Here are a few of the most commonly used options:

  • java_package (file option): The package you want to use for your generated Java classes. If no explicit java_package option is given in the .proto file, then by default the proto package (specified using the “package” keyword in the .proto file) will be used. However, proto packages generally do not make good Java packages since proto packages are not expected to start with reverse domain names. If not generating Java code, this option has no effect.

    option java_package = "com.example.foo";
  • java_outer_classname (file option): The class name for the outermost Java class (and hence the file name) you want to generate. If no explicit java_outer_classname is specified in the .proto file, the class name will be constructed by converting the .protofile name to camel-case (so foo_bar.proto becomes FooBar.java). If not generating Java code, this option has no effect.

    option java_outer_classname = "Ponycopter";
  • optimize_for (file option): Can be set to SPEED, CODE_SIZE, or LITE_RUNTIME. This affects the C++ and Java code generators (and possibly third-party generators) in the following ways:

    • SPEED (default): The protocol buffer compiler will generate code for serializing, parsing, and performing other common operations on your message types. This code is extremely highly optimized.
    • CODE_SIZE: The protocol buffer compiler will generate minimal classes and will rely on shared, reflection-based code to implement serialialization, parsing, and various other operations. The generated code will thus be much smaller than with SPEED, but operations will be slower. Classes will still implement exactly the same public API as they do in SPEED mode. This mode is most useful in apps that contain a very large number .proto files and do not need all of them to be blindingly fast.
    • LITE_RUNTIME: The protocol buffer compiler will generate classes that depend only on the “lite” runtime library (libprotobuf-lite instead of libprotobuf). The lite runtime is much smaller than the full library (around an order of magnitude smaller) but omits certain features like descriptors and reflection. This is particularly useful for apps running on constrained platforms like mobile phones. The compiler will still generate fast implementations of all methods as it does in SPEED mode. Generated classes will only implement the MessageLite interface in each language, which provides only a subset of the methods of the full Message interface.
    option optimize_for = CODE_SIZE;
  • cc_generic_services, java_generic_services, py_generic_services (file options): Whether or not the protocol buffer compiler should generate abstract service code based on services definitions in C++, Java, and Python, respectively. For legacy reasons, these default to true. However, as of version 2.3.0 (January 2010), it is considered preferrable for RPC implementations to provide code generator plugins to generate code more specific to each system, rather than rely on the “abstract” services.

    // This file relies on plugins to generate service code.
    option cc_generic_services = false;
    option java_generic_services = false;
    option py_generic_services = false;
  • cc_enable_arenas (file option): Enables arena allocation for C++ generated code.
  • message_set_wire_format (message option): If set to true, the message uses a different binary format intended to be compatible with an old format used inside Google called MessageSet. Users outside Google will probably never need to use this option. The message must be declared exactly as follows:

    message Foo {
      option message_set_wire_format = true;
      extensions 4 to max;
    }
    
  • packed (field option): If set to true on a repeated field of a basic integer type, a more compact encoding will be used. There is no downside to using this option. However, note that prior to version 2.3.0, parsers that received packed data when not expected would ignore it. Therefore, it was not possible to change an existing field to packed format without breaking wire compatibility. In 2.3.0 and later, this change is safe, as parsers for packable fields will always accept both formats, but be careful if you have to deal with old programs using old protobuf versions.

    repeated int32 samples = 4 [packed=true];
  • deprecated (field option): If set to true, indicates that the field is deprecated and should not be used by new code. In most languages this has no actual effect. In Java, this becomes a @Deprecated annotation. In the future, other language-specific code generators may generate deprecation annotations on the field’s accessors, which will in turn cause a warning to be emitted when compiling code which attempts to use the field.

    optional int32 old_field = 6 [deprecated=true];

Custom Options

Protocol Buffers even allow you to define and use your own options. Note that this is an advanced feature which most people don’t need. Since options are defined by the messages defined ingoogle/protobuf/descriptor.proto (like FileOptions or FieldOptions), defining your own options is simply a matter of extending those messages. For example:

import "google/protobuf/descriptor.proto";

extend google.protobuf.MessageOptions {
  optional string my_option = 51234;
}

message MyMessage {
  option (my_option) = "Hello world!";
}

Here we have defined a new message-level option by extending MessageOptions. When we then use the option, the option name must be enclosed in parentheses to indicate that it is an extension. We can now read the value of my_option in C++ like so:

string value = MyMessage::descriptor()->options().GetExtension(my_option);

Here, MyMessage::descriptor()->options() returns the MessageOptions protocol message for MyMessage. Reading custom options from it is just like reading any other extension.

Similarly, in Java we would write:

String value = MyProtoFile.MyMessage.getDescriptor().getOptions()
  .getExtension(MyProtoFile.myOption);

In Python it would be:

value = my_proto_file_pb2.MyMessage.DESCRIPTOR.GetOptions()
  .Extensions[my_proto_file_pb2.my_option]

Custom options can be defined for every kind of construct in the Protocol Buffers language. Here is an example that uses every kind of option:

import "google/protobuf/descriptor.proto";

extend google.protobuf.FileOptions {
  optional string my_file_option = 50000;
}
extend google.protobuf.MessageOptions {
  optional int32 my_message_option = 50001;
}
extend google.protobuf.FieldOptions {
  optional float my_field_option = 50002;
}
extend google.protobuf.EnumOptions {
  optional bool my_enum_option = 50003;
}
extend google.protobuf.EnumValueOptions {
  optional uint32 my_enum_value_option = 50004;
}
extend google.protobuf.ServiceOptions {
  optional MyEnum my_service_option = 50005;
}
extend google.protobuf.MethodOptions {
  optional MyMessage my_method_option = 50006;
}

option (my_file_option) = "Hello world!";

message MyMessage {
  option (my_message_option) = 1234;

  optional int32 foo = 1 [(my_field_option) = 4.5];
  optional string bar = 2;
}

enum MyEnum {
  option (my_enum_option) = true;

  FOO = 1 [(my_enum_value_option) = 321];
  BAR = 2;
}

message RequestType {}
message ResponseType {}

service MyService {
  option (my_service_option) = FOO;

  rpc MyMethod(RequestType) returns(ResponseType) {
    // Note:  my_method_option has type MyMessage.  We can set each field
    //   within it using a separate "option" line.
    option (my_method_option).foo = 567;
    option (my_method_option).bar = "Some string";
  }
}

Note that if you want to use a custom option in a package other than the one in which it was defined, you must prefix the option name with the package name, just as you would for type names. For example:

// foo.proto
import "google/protobuf/descriptor.proto";
package foo;
extend google.protobuf.MessageOptions {
  optional string my_option = 51234;
}
// bar.proto
import "foo.proto";
package bar;
message MyMessage {
  option (foo.my_option) = "Hello world!";
}

One last thing: Since custom options are extensions, they must be assigned field numbers like any other field or extension. In the examples above, we have used field numbers in the range 50000-99999. This range is reserved for internal use within individual organizations, so you can use numbers in this range freely for in-house applications. If you intend to use custom options in public applications, however, then it is important that you make sure that your field numbers are globally unique. To obtain globally unique field numbers, please send a request to [email protected]. Simply provide your project name (e.g. Object-C plugin) and your project website (if available). Usually you only need one extension number. You can declare multiple options with only one extension number by putting them in a sub-message:

message FooOptions {
  optional int32 opt1 = 1;
  optional string opt2 = 2;
}

extend google.protobuf.FieldOptions {
  optional FooOptions foo_options = 1234;
}

// usage:
message Bar {
  optional int32 a = 1 [(foo_options).opt1 = 123, (foo_options).opt2 = "baz"];
  // alternative aggregate syntax (uses TextFormat):
  optional int32 b = 2 [(foo_options) = { opt1: 123 opt2: "baz" }];
}

Also, note that each option type (file-level, message-level, field-level, etc.) has its own number space, so e.g. you could declare extensions of FieldOptions and MessageOptions with the same number.

Generating Your Classes

To generate the Java, Python, or C++ code you need to work with the message types defined in a.proto file, you need to run the protocol buffer compiler protoc on the .proto. If you haven’t installed the compiler, download the package and follow the instructions in the README.

The Protocol Compiler is invoked as follows:

protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR path/to/file.proto
  • IMPORT_PATH specifies a directory in which to look for .proto files when resolvingimport directives. If omitted, the current directory is used. Multiple import directories can be specified by passing the –proto_path option multiple times; they will be searched in order. -I=IMPORT_PATH can be used as a short form of –proto_path.
  • You can provide one or more output directives:

    As an extra convenience, if the DST_DIR ends in .zip or .jar, the compiler will write the output to a single ZIP-format archive file with the given name. .jar outputs will also be given a manifest file as required by the Java JAR specification. Note that if the output archive already exists, it will be overwritten; the compiler is not smart enough to add files to an existing archive.

  • You must provide one or more .proto files as input. Multiple .proto files can be specified at once. Although the files are named relative to the current directory, each file must reside in one of the IMPORT_PATHs so that the compiler can determine its canonical name.


9月 162015
 

developers.google.com上的protobuf的学习教程(java版), 给不能翻墙的朋友

Protocol Buffer Basics: Java

This tutorial provides a basic Java programmer’s introduction to working with protocol buffers. By walking through creating a simple example application, it shows you how to

  • Define message formats in a .proto file.
  • Use the protocol buffer compiler.
  • Use the Java protocol buffer API to write and read messages.

This isn’t a comprehensive guide to using protocol buffers in Java. For more detailed reference information, see the Protocol Buffer Language Guide, the Java API Reference, the Java Generated Code Guide, and the Encoding Reference.

Why Use Protocol Buffers?

The example we’re going to use is a very simple “address book” application that can read and write people’s contact details to and from a file. Each person in the address book has a name, an ID, an email address, and a contact phone number.

How do you serialize and retrieve structured data like this? There are a few ways to solve this problem:

  • Use Java Serialization. This is the default approach since it’s built into the language, but it has a host of well-known problems (see Effective Java, by Josh Bloch pp. 213), and also doesn’t work very well if you need to share data with applications written in C++ or Python.
  • You can invent an ad-hoc way to encode the data items into a single string – such as encoding 4 ints as “12:3:-23:67”. This is a simple and flexible approach, although it does require writing one-off encoding and parsing code, and the parsing imposes a small run-time cost. This works best for encoding very simple data.
  • Serialize the data to XML. This approach can be very attractive since XML is (sort of) human readable and there are binding libraries for lots of languages. This can be a good choice if you want to share data with other applications/projects. However, XML is notoriously space intensive, and encoding/decoding it can impose a huge performance penalty on applications. Also, navigating an XML DOM tree is considerably more complicated than navigating simple fields in a class normally would be.

Protocol buffers are the flexible, efficient, automated solution to solve exactly this problem. With protocol buffers, you write a .proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports the idea of extending the format over time in such a way that the code can still read data encoded with the old format.

Where to Find the Example Code

The example code is included in the source code package, under the “examples” directory. Download it here.

Defining Your Protocol Format

To create your address book application, you’ll need to start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. Here is the .proto file that defines your messages, addressbook.proto.

package tutorial;

option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

message AddressBook {
  repeated Person person = 1;
}

As you can see, the syntax is similar to C++ or Java. Let’s go through each part of the file and see what it does.

The .proto file starts with a package declaration, which helps to prevent naming conflicts between different projects. In Java, the package name is used as the Java package unless you have explicitly specified ajava_package, as we have here. Even if you do provide a java_package, you should still define a normalpackage as well to avoid name collisions in the Protocol Buffers name space as well as in non-Java languages.

After the package declaration, you can see two options that are Java-specific: java_package andjava_outer_classname. java_package specifies in what Java package name your generated classes should live. If you don’t specify this explicitly, it simply matches the package name given by the packagedeclaration, but these names usually aren’t appropriate Java package names (since they usually don’t start with a domain name). The java_outer_classname option defines the class name which should contain all of the classes in this file. If you don’t give a java_outer_classname explicitly, it will be generated by converting the file name to camel case. For example, “my_proto.proto” would, by default, use “MyProto” as the outer class name.

Next, you have your message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, andstring. You can also add further structure to your messages by using other message types as field types – in the above example the Person message contains PhoneNumber messages, while the AddressBookmessage contains Person messages. You can even define message types nested inside other messages – as you can see, the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to have one of a predefined list of values – here you want to specify that a phone number can be one of MOBILE, HOME, or WORK.

The ” = 1″, ” = 2″ markers on each element identify the unique “tag” that field uses in the binary encoding. Tag numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can decide to use those tags for the commonly used or repeated elements, leaving tags 16 and higher for less-commonly used optional elements. Each element in a repeated field requires re-encoding the tag number, so repeated fields are particularly good candidates for this optimization.

Each field must be annotated with one of the following modifiers:

  • required: a value for the field must be provided, otherwise the message will be considered “uninitialized”. Trying to build an uninitialized message will throw a RuntimeException. Parsing an uninitialized message will throw an IOException. Other than this, a required field behaves exactly like an optional field.
  • optional: the field may or may not be set. If an optional field value isn’t set, a default value is used. For simple types, you can specify your own default value, as we’ve done for the phone number type in the example. Otherwise, a system default is used: zero for numeric types, the empty string for strings, false for bools. For embedded messages, the default value is always the “default instance” or “prototype” of the message, which has none of its fields set. Calling the accessor to get the value of an optional (or required) field which has not been explicitly set always returns that field’s default value.
  • repeated: the field may be repeated any number of times (including zero). The order of the repeated values will be preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.

Required Is Forever You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use onlyoptional and repeated. However, this view is not universal.

You’ll find a complete guide to writing .proto files – including all the possible field types – in the Protocol Buffer Language Guide. Don’t go looking for facilities similar to class inheritance, though – protocol buffers don’t do that.

Compiling Your Protocol Buffers

Now that you have a .proto, the next thing you need to do is generate the classes you’ll need to read and writeAddressBook (and hence Person and PhoneNumber) messages. To do this, you need to run the protocol buffer compiler protoc on your .proto:

  1. If you haven’t installed the compiler, download the package and follow the instructions in the README.
  2. Now run the compiler, specifying the source directory (where your application’s source code lives – the current directory is used if you don’t provide a value), the destination directory (where you want the generated code to go; often the same as $SRC_DIR), and the path to your .proto. In this case, you…:

    protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto

    Because you want Java classes, you use the –java_out option – similar options are provided for other supported languages.

This generates com/example/tutorial/AddressBookProtos.java in your specified destination directory.

The Protocol Buffer API

Let’s look at some of the generated code and see what classes and methods the compiler has created for you. If you look in AddressBookProtos.java, you can see that it defines a class called AddressBookProtos, nested within which is a class for each message you specified in addressbook.proto. Each class has its ownBuilder class that you use to create instances of that class. You can find out more about builders in theBuilders vs. Messages section below.

Both messages and builders have auto-generated accessor methods for each field of the message; messages have only getters while builders have both getters and setters. Here are some of the accessors for the Personclass (implementations omitted for brevity):

	
// required string name = 1; public boolean hasName(); public String getName(); // required int32 id = 2; public boolean hasId(); public int getId(); // optional string email = 3; public boolean hasEmail(); public String getEmail(); // repeated .tutorial.Person.PhoneNumber phone = 4; public List<PhoneNumber> getPhoneList(); public int getPhoneCount(); public PhoneNumber getPhone(int index);

Meanwhile, Person.Builder has the same getters plus setters:

	
// required string name = 1; public boolean hasName(); public java.lang.String getName(); public Builder setName(String value); public Builder clearName(); // required int32 id = 2; public boolean hasId(); public int getId(); public Builder setId(int value); public Builder clearId(); // optional string email = 3; public boolean hasEmail(); public String getEmail(); public Builder setEmail(String value); public Builder clearEmail(); // repeated .tutorial.Person.PhoneNumber phone = 4; public List<PhoneNumber> getPhoneList(); public int getPhoneCount(); public PhoneNumber getPhone(int index); public Builder setPhone(int index, PhoneNumber value); public Builder addPhone(PhoneNumber value); public Builder addAllPhone(Iterable<PhoneNumber> value); public Builder clearPhone();

As you can see, there are simple JavaBeans-style getters and setters for each field. There are also has getters for each singular field which return true if that field has been set. Finally, each field has a clear method that un-sets the field back to its empty state.

Repeated fields have some extra methods – a Count method (which is just shorthand for the list’s size), getters and setters which get or set a specific element of the list by index, an add method which appends a new element to the list, and an addAll method which adds an entire container full of elements to the list.

Notice how these accessor methods use camel-case naming, even though the .proto file uses lowercase-with-underscores. This transformation is done automatically by the protocol buffer compiler so that the generated classes match standard Java style conventions. You should always use lowercase-with-underscores for field names in your .proto files; this ensures good naming practice in all the generated languages. See the style guide for more on good .proto style.

For more information on exactly what members the protocol compiler generates for any particular field definition, see the Java generated code reference.

Enums and Nested Classes

The generated code includes a PhoneType Java 5 enum, nested within Person:

	
public static enum PhoneType {   MOBILE(0, 0),   HOME(1, 1),   WORK(2, 2),   ;   ... }

The nested type Person.PhoneNumber is generated, as you’d expect, as a nested class within Person.

Builders vs. Messages

The message classes generated by the protocol buffer compiler are all immutable. Once a message object is constructed, it cannot be modified, just like a Java String. To construct a message, you must first construct a builder, set any fields you want to set to your chosen values, then call the builder’s build() method.

You may have noticed that each method of the builder which modifies the message returns another builder. The returned object is actually the same builder on which you called the method. It is returned for convenience so that you can string several setters together on a single line of code.

Here’s an example of how you would create an instance of Person:

	
Person john =   Person.newBuilder()     .setId(1234)     .setName("John Doe")     .setEmail("[email protected]")     .addPhone(       Person.PhoneNumber.newBuilder()         .setNumber("555-4321")         .setType(Person.PhoneType.HOME))     .build();

Standard Message Methods

Each message and builder class also contains a number of other methods that let you check or manipulate the entire message, including:

  • isInitialized(): checks if all the required fields have been set.
  • toString(): returns a human-readable representation of the message, particularly useful for debugging.
  • mergeFrom(Message other): (builder only) merges the contents of other into this message, overwriting singular fields and concatenating repeated ones.
  • clear(): (builder only) clears all the fields back to the empty state.

These methods implement the Message and Message.Builder interfaces shared by all Java messages and builders. For more information, see the complete API documentation for Message.

Parsing and Serialization

Finally, each protocol buffer class has methods for writing and reading messages of your chosen type using the protocol buffer binary format. These include:

  • byte[] toByteArray();: serializes the message and returns a byte array containing its raw bytes.
  • static Person parseFrom(byte[] data);: parses a message from the given byte array.
  • void writeTo(OutputStream output);: serializes the message and writes it to anOutputStream.
  • static Person parseFrom(InputStream input);: reads and parses a message from anInputStream.

These are just a couple of the options provided for parsing and serialization. Again, see the Message API reference for a complete list.

Protocol Buffers and O-O Design Protocol buffer classes are basically dumb data holders (like structs in C++); they don’t make good first class citizens in an object model. If you want to add richer behaviour to a generated class, the best way to do this is to wrap the generated protocol buffer class in an application-specific class. Wrapping protocol buffers is also a good idea if you don’t have control over the design of the .proto file (if, say, you’re reusing one from another project). In that case, you can use the wrapper class to craft an interface better suited to the unique environment of your application: hiding some data and methods, exposing convenience functions, etc. You should never add behaviour to the generated classes by inheriting from them. This will break internal mechanisms and is not good object-oriented practice anyway.

Writing A Message

Now let’s try using your protocol buffer classes. The first thing you want your address book application to be able to do is write personal details to your address book file. To do this, you need to create and populate instances of your protocol buffer classes and then write them to an output stream.

Here is a program which reads an AddressBook from a file, adds one new Person to it based on user input, and writes the new AddressBook back out to the file again. The parts which directly call or reference code generated by the protocol compiler are highlighted.

	
import com.example.tutorial.AddressBookProtos.AddressBook; import com.example.tutorial.AddressBookProtos.Person; import java.io.BufferedReader; import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.FileOutputStream; import java.io.InputStreamReader; import java.io.IOException; import java.io.PrintStream; class AddPerson {   // This function fills in a Person message based on user input.   static Person PromptForAddress(BufferedReader stdin,                                  PrintStream stdout) throws IOException {     Person.Builder person = Person.newBuilder();     stdout.print("Enter person ID: ");     person.setId(Integer.valueOf(stdin.readLine()));     stdout.print("Enter name: ");     person.setName(stdin.readLine());     stdout.print("Enter email address (blank for none): ");     String email = stdin.readLine();     if (email.length() > 0) {       person.setEmail(email);     }     while (true) {       stdout.print("Enter a phone number (or leave blank to finish): ");       String number = stdin.readLine();       if (number.length() == 0) {         break;       }       Person.PhoneNumber.Builder phoneNumber =         Person.PhoneNumber.newBuilder().setNumber(number);       stdout.print("Is this a mobile, home, or work phone? ");       String type = stdin.readLine();       if (type.equals("mobile")) {         phoneNumber.setType(Person.PhoneType.MOBILE);       } else if (type.equals("home")) {         phoneNumber.setType(Person.PhoneType.HOME);       } else if (type.equals("work")) {         phoneNumber.setType(Person.PhoneType.WORK);       } else {         stdout.println("Unknown phone type.  Using default.");       }       person.addPhone(phoneNumber);     }     return person.build();   }   // Main function:  Reads the entire address book from a file,   //   adds one person based on user input, then writes it back out to the same   //   file.   public static void main(String[] args) throws Exception {     if (args.length != 1) {       System.err.println("Usage:  AddPerson ADDRESS_BOOK_FILE");       System.exit(-1);     ; }     AddressBook.Builder addressBook = AddressBook.newBuilder();     // Read the existing address book.     try {       addressBook.mergeFrom(new FileInputStream(args[0]));     } catch (FileNotFoundException e) {       System.out.println(args[0] + ": File not found.  Creating a new file.");     }     // Add an address.     addressBook.addPerson(       PromptForAddress(new BufferedReader(new InputStreamReader(System.in)),                        System.out));     // Write the new address book back to disk.     FileOutputStream output = new FileOutputStream(args[0]);     addressBook.build().writeTo(output);     output.close();   } }

Reading A Message

Of course, an address book wouldn’t be much use if you couldn’t get any information out of it! This example reads the file created by the above example and prints all the information in it.

	
import com.example.tutorial.AddressBookProtos.AddressBook; import com.example.tutorial.AddressBookProtos.Person; import java.io.FileInputStream; import java.io.IOException; import java.io.PrintStream; class ListPeople {   // Iterates though all people in the AddressBook and prints info about them.   static void Print(AddressBook addressBook) {     for (Person person: addressBook.getPersonList()) {       System.out.println("Person ID: " + person.getId());       System.out.println("  Name: " + person.getName());       if (person.hasEmail()) {         System.out.println("  E-mail address: " + person.getEmail());       }       for (Person.PhoneNumber phoneNumber : person.getPhoneList()) {         switch (phoneNumber.getType()) {           case MOBILE:             System.out.print("  Mobile phone #: ");             break;           case HOME:             System.out.print("  Home phone #: ");             break;           case WORK:             System.out.print("  Work phone #: ");             break;         }         System.out.println(phoneNumber.getNumber());       }     }   }   // Main function:  Reads the entire address book from a file and prints all   //   the information inside.   public static void main(String[] args) throws Exception {     if (args.length != 1) {       System.err.println("Usage:  ListPeople ADDRESS_BOOK_FILE");       System.exit(-1);     }     // Read the existing address book.     AddressBook addressBook =       AddressBook.parseFrom(new FileInputStream(args[0]));     Print(addressBook);   } }

Extending a Protocol Buffer

Sooner or later after you release the code that uses your protocol buffer, you will undoubtedly want to “improve” the protocol buffer’s definition. If you want your new buffers to be backwards-compatible, and your old buffers to be forward-compatible – and you almost certainly do want this – then there are some rules you need to follow. In the new version of the protocol buffer:

  • you must not change the tag numbers of any existing fields.
  • you must not add or delete any required fields.
  • you may delete optional or repeated fields.
  • you may add new optional or repeated fields but you must use fresh tag numbers (i.e. tag numbers that were never used in this protocol buffer, not even by deleted fields).

(There are some exceptions to these rules, but they are rarely used.)

If you follow these rules, old code will happily read new messages and simply ignore any new fields. To the old code, optional fields that were deleted will simply have their default value, and deleted repeated fields will be empty. New code will also transparently read old messages. However, keep in mind that new optional fields will not be present in old messages, so you will need to either check explicitly whether they’re set with has_, or provide a reasonable default value in your .proto file with [default = value] after the tag number. If the default value is not specified for an optional element, a type-specific default value is used instead: for strings, the default value is the empty string. For booleans, the default value is false. For numeric types, the default value is zero. Note also that if you added a new repeated field, your new code will not be able to tell whether it was left empty (by new code) or never set at all (by old code) since there is no has_ flag for it.

Advanced Usage

Protocol buffers have uses that go beyond simple accessors and serialization. Be sure to explore the Java API reference to see what else you can do with them.

One key feature provided by protocol message classes is reflection. You can iterate over the fields of a message and manipulate their values without writing your code against any specific message type. One very useful way to use reflection is for converting protocol messages to and from other encodings, such as XML or JSON. A more advanced use of reflection might be to find differences between two messages of the same type, or to develop a sort of “regular expressions for protocol messages” in which you can write expressions that match certain message contents. If you use your imagination, it’s possible to apply Protocol Buffers to a much wider range of problems than you might initially expect!

Reflection is provided as part of the Message and Message.Builder interfaces.

附源码:

==============addressbook.proto文件=======================

// See README.txt for information and build instructions.

package tutorial;

option java_package = “com.example.tutorial”;
option java_outer_classname = “AddressBookProtos”;

message Person {
  required string name = 1;
  required int32 id = 2;        // Unique ID number for this person.
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

// Our address book file is just one of these.
message AddressBook {
  repeated Person person = 1;
}

===========================================================

=======AddPerson.java==============================================

package com.example.tutorial;

// See README.txt for information and build instructions.

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintStream;

class AddPerson {
  // This function fills in a Person message based on user input.
  static Person PromptForAddress(BufferedReader stdin,
                                 PrintStream stdout) throws IOException {
    Person.Builder person = Person.newBuilder();

    stdout.print(“Enter person ID: “);
    person.setId(Integer.valueOf(stdin.readLine()));

    stdout.print(“Enter name: “);
    person.setName(stdin.readLine());

    stdout.print(“Enter email address (blank for none): “);
    String email = stdin.readLine();
    if (email.length() > 0) {
      person.setEmail(email);
    }

    while (true) {
      stdout.print(“Enter a phone number (or leave blank to finish): “);
      String number = stdin.readLine();
      if (number.length() == 0) {
        break;
      }

      Person.PhoneNumber.Builder phoneNumber =
        Person.PhoneNumber.newBuilder().setNumber(number);

      stdout.print(“Is this a mobile, home, or work phone? “);
      String type = stdin.readLine();
      if (type.equals(“mobile”)) {
        phoneNumber.setType(Person.PhoneType.MOBILE);
      } else if (type.equals(“home”)) {
        phoneNumber.setType(Person.PhoneType.HOME);
      } else if (type.equals(“work”)) {
        phoneNumber.setType(Person.PhoneType.WORK);
      } else {
        stdout.println(“Unknown phone type.  Using default.”);
      }

      person.addPhone(phoneNumber);
    }

    return person.build();
  }

  // Main function:  Reads the entire address book from a file,
  //   adds one person based on user input, then writes it back out to the same
  //   file.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println(“Usage:  AddPerson ADDRESS_BOOK_FILE”);
      System.exit(-1);
    }

    AddressBook.Builder addressBook = AddressBook.newBuilder();

    // Read the existing address book.
    try {
      FileInputStream input = new FileInputStream(args[0]);
      try {
        addressBook.mergeFrom(input);
      } finally {
        try { input.close(); } catch (Throwable ignore) {}
      }
    } catch (FileNotFoundException e) {
      System.out.println(args[0] + “: File not found.  Creating a new file.”);
    }

    // Add an address.
    addressBook.addPerson(
      PromptForAddress(new BufferedReader(new InputStreamReader(System.in)),
                       System.out));

    // Write the new address book back to disk.
    FileOutputStream output = new FileOutputStream(args[0]);
    try {
      addressBook.build().writeTo(output);
    } finally {
      output.close();
    }
  }
}

======================================================

================ListPeople.java===============================

package com.example.tutorial;

// See README.txt for information and build instructions.

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintStream;

class ListPeople {
  // Iterates though all people in the AddressBook and prints info about them.
  static void Print(AddressBook addressBook) {
    for (Person person: addressBook.getPersonList()) {
      System.out.println(“Person ID: ” + person.getId());
      System.out.println(”  Name: ” + person.getName());
      if (person.hasEmail()) {
        System.out.println(”  E-mail address: ” + person.getEmail());
      }

      for (Person.PhoneNumber phoneNumber : person.getPhoneList()) {
        switch (phoneNumber.getType()) {
          case MOBILE:
            System.out.print(”  Mobile phone #: “);
            break;
          case HOME:
            System.out.print(”  Home phone #: “);
            break;
          case WORK:
            System.out.print(”  Work phone #: “);
            break;
        }
        System.out.println(phoneNumber.getNumber());
      }
    }
  }

  // Main function:  Reads the entire address book from a file and prints all
  //   the information inside.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println(“Usage:  ListPeople ADDRESS_BOOK_FILE”);
      System.exit(-1);
    }

    // Read the existing address book.
    AddressBook addressBook =
      AddressBook.parseFrom(new FileInputStream(args[0]));

    Print(addressBook);
  }
}

问题:

1. 出现如下错误:“java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.”

  • The .proto definition is converted to java using the 2.4.3 (or earlier) protoc.exe
  • You use the 2.5.0 protobuffers library

检查下使用使用了2.4.x版本的protoc,  我在使用ubuntu12.04时就碰到上面情况,以为安装了2.6,实际默认的protoc是2.4版本

2. 使用源码安装的protoc命令时,提示如下错误:error while loading shared libraries: libprotoc.so.9: cannot open shared object file: No such file or directory

    命令行界面输入:   

sudo ldconfig

DONE!!!

9月 162015
 

网上看到的一篇比较文章,分享下

 在java中socket传输数据时,数据类型往往比较难选择。可能要考虑带宽、跨语言、版本的兼容等问题。比较常见的做法有两种:一是把对象包装成JSON字符串传输,二是采用java对象的序列化和反序列化。随着Google工具protoBuf的开源,protobuf也是个不错的选择。对JSON,Object Serialize,ProtoBuf 做个对比。

定义一个待传输的对象UserVo:

Java代码  收藏代码

  1. public class UserVo{  
  2.     private String name;  
  3.     private int age;  
  4.     private long phone;  
  5.       
  6.     private List<UserVo> friends;  
  7. ……  
  8. }  

 初始化UserVo的实例src:

Java代码  收藏代码

  1. UserVo src = new UserVo();  
  2. src.setName(“Yaoming”);  
  3. src.setAge(30);  
  4. src.setPhone(13789878978L);  
  5.       
  6. UserVo f1 = new UserVo();  
  7. f1.setName(“tmac”);  
  8. f1.setAge(32);  
  9. f1.setPhone(138999898989L);  
  10. UserVo f2 = new UserVo();  
  11. f2.setName(“liuwei”);  
  12. f2.setAge(29);  
  13. f2.setPhone(138999899989L);  
  14.           
  15. List<UserVo> friends = new ArrayList<UserVo>();  
  16. friends.add(f1);  
  17. friends.add(f2);  
  18. src.setFriends(friends);  

JSON格式

采用Google的gson-2.2.2.jar 进行转义

Java代码  收藏代码

  1. Gson gson = new Gson();  
  2. String json = gson.toJson(src);  

 得到的字符串:

Js代码  收藏代码

  1. {“name”:“Yaoming”,“age”:30,“phone”:13789878978,“friends”:[{“name”:“tmac”,“age”:32,“phone”:138999898989},{“name”:“liuwei”,“age”:29,“phone”:138999899989}]}  

 字节数为153

Json的优点:明文结构一目了然,可以跨语言,属性的增加减少对解析端影响较小。缺点:字节数过多,依赖于不同的第三方类库。

 

Object Serialize

UserVo实现Serializalbe接口,提供唯一的版本号:

Java代码  收藏代码

  1. public class UserVo implements Serializable{  
  2.   
  3.     private static final long serialVersionUID = -5726374138698742258L;  
  4.     private String name;  
  5.     private int age;  
  6.     private long phone;  
  7.       
  8.     private List<UserVo> friends;  

 

序列化方法:

Java代码  收藏代码

  1. ByteArrayOutputStream bos = new ByteArrayOutputStream();  
  2. ObjectOutputStream os = new ObjectOutputStream(bos);  
  3. os.writeObject(src);  
  4. os.flush();  
  5. os.close();  
  6. byte[] b = bos.toByteArray();  
  7. bos.close();  

 字节数是238

 

反序列化:

Java代码  收藏代码

  1. ObjectInputStream ois = new ObjectInputStream(fis);  
  2. vo = (UserVo) ois.readObject();  
  3. ois.close();  
  4. fis.close();  

Object Serializalbe 优点:java原生支持,不需要提供第三方的类库,使用比较简单。缺点:无法跨语言,字节数占用比较大,某些情况下对于对象属性的变化比较敏感。 

对象在进行序列化和反序列化的时候,必须实现Serializable接口,但并不强制声明唯一的serialVersionUID

是否声明serialVersionUID对于对象序列化的向上向下的兼容性有很大的影响。我们来做个测试:

 
思路一

把UserVo中的serialVersionUID去掉,序列化保存。反序列化的时候,增加或减少个字段,看是否成功。

Java代码  收藏代码

  1. public class UserVo implements Serializable{  
  2.     private String name;  
  3.     private int age;  
  4.     private long phone;  
  5.       
  6.     private List<UserVo> friends;  

 

保存到文件中:

Java代码  收藏代码

  1. ByteArrayOutputStream bos = new ByteArrayOutputStream();  
  2. ObjectOutputStream os = new ObjectOutputStream(bos);  
  3. os.writeObject(src);  
  4. os.flush();  
  5. os.close();  
  6. byte[] b = bos.toByteArray();  
  7. bos.close();  
  8.   
  9. FileOutputStream fos = new FileOutputStream(dataFile);  
  10. fos.write(b);  
  11. fos.close();  

 

增加或者减少字段后,从文件中读出来,反序列化:

Java代码  收藏代码

  1. FileInputStream fis = new FileInputStream(dataFile);  
  2. ObjectInputStream ois = new ObjectInputStream(fis);  
  3. vo = (UserVo) ois.readObject();  
  4. ois.close();  
  5. fis.close();  

 

结果:抛出异常信息

Java代码  收藏代码

  1. Exception in thread “main” java.io.InvalidClassException: serialize.obj.UserVo; local class incompatible: stream classdesc serialVersionUID = 3305402508581390189, local class serialVersionUID = 7174371419787432394  
  2.     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:560)  
  3.     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1582)  
  4.     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)  
  5.     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)  
  6.     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)  
  7.     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)  
  8.     at serialize.obj.ObjectSerialize.read(ObjectSerialize.java:74)  
  9.     at serialize.obj.ObjectSerialize.main(ObjectSerialize.java:27)  
 
思路二

eclipse指定生成一个serialVersionUID,序列化保存,修改字段后反序列化

略去代码

结果:反序列化成功

结论

如果没有明确指定serialVersionUID,序列化的时候会根据字段和特定的算法生成一个serialVersionUID,当属性有变化时这个id发生了变化,所以反序列化的时候就会失败。抛出“本地classd的唯一id和流中class的唯一id不匹配”。

 

jdk文档关于serialVersionUID的描述:

写道
如果可序列化类未显式声明 serialVersionUID,则序列化运行时将基于该类的各个方面计算该类的默认 serialVersionUID 值,如“Java(TM) 对象序列化规范”中所述。不过,强烈建议 所有可序列化类都显式声明 serialVersionUID 值,原因是计算默认的 serialVersionUID 对类的详细信息具有较高的敏感性,根据编译器实现的不同可能千差万别,这样在反序列化过程中可能会导致意外的 InvalidClassException。因此,为保证 serialVersionUID 值跨不同 java 编译器实现的一致性,序列化类必须声明一个明确的 serialVersionUID 值。还强烈建议使用 private 修饰符显示声明 serialVersionUID(如果可能),原因是这种声明仅应用于直接声明类 — serialVersionUID 字段作为继承成员没有用处。数组类不能声明一个明确的 serialVersionUID,因此它们总是具有默认的计算值,但是数组类没有匹配 serialVersionUID 值的要求。

 

Google ProtoBuf

protocol buffers 是google内部得一种传输协议,目前项目已经开源(http://code.google.com/p/protobuf/)。它定义了一种紧凑得可扩展得二进制协议格式,适合网络传输,并且针对多个语言有不同得版本可供选择。

以protobuf-2.5.0rc1为例,准备工作:

下载源码,解压,编译,安装

Shell代码  收藏代码

  1. tar zxvf protobuf-2.5.0rc1.tar.gz  
  2. ./configure  
  3. ./make  
  4. ./make install  

 测试:

Shell代码  收藏代码

  1. MacBook-Air:~ ming$ protoc –version  
  2. libprotoc 2.5.0  

 安装成功!进入源码得java目录,用mvn工具编译生成所需得jar包,protobuf-java-2.5.0rc1.jar

 

1、编写.proto文件,命名UserVo.proto 

Text代码  收藏代码

  1. package serialize;  
  2.   
  3. option java_package = “serialize”;  
  4. option java_outer_classname=“UserVoProtos”;  
  5.   
  6. message UserVo{  
  7.     optional string name = 1;  
  8.     optional int32 age = 2;  
  9.     optional int64 phone = 3;  
  10.     repeated serialize.UserVo friends = 4;  
  11. }  

 

2、在命令行利用protoc 工具生成builder类

Shell代码  收藏代码

  1. protoc -IPATH=.proto文件所在得目录 –java_out=java文件的输出路径  .proto的名称   

 得到UserVoProtos类

 

3、编写序列化代码

Java代码  收藏代码

  1. UserVoProtos.UserVo.Builder builder = UserVoProtos.UserVo.newBuilder();  
  2. builder.setName(“Yaoming”);  
  3. builder.setAge(30);  
  4. builder.setPhone(13789878978L);  
  5.           
  6. UserVoProtos.UserVo.Builder builder1 = UserVoProtos.UserVo.newBuilder();  
  7. builder1.setName(“tmac”);  
  8. builder1.setAge(32);  
  9. builder1.setPhone(138999898989L);  
  10.           
  11. UserVoProtos.UserVo.Builder builder2 = UserVoProtos.UserVo.newBuilder();  
  12. builder2.setName(“liuwei”);  
  13. builder2.setAge(29);  
  14. builder2.setPhone(138999899989L);  
  15.           
  16. builder.addFriends(builder1);  
  17. builder.addFriends(builder2);  
  18.           
  19. UserVoProtos.UserVo vo = builder.build();  
  20.           
  21. byte[] v = vo.toByteArray();  

 字节数53

 

4、反序列化

Java代码  收藏代码

  1. UserVoProtos.UserVo uvo = UserVoProtos.UserVo.parseFrom(dstb);  
  2. System.out.println(uvo.getFriends(0).getName());  

 结果:tmac,反序列化成功

google protobuf 优点:字节数很小,适合网络传输节省io,跨语言 。缺点:需要依赖于工具生成代码。

 

工作机制

proto文件是对数据的一个描述,包括字段名称,类型,字节中的位置。protoc工具读取proto文件生成对应builder代码的类库。protoc xxxxx  –java_out=xxxxxx 生成java类库。builder类根据自己的算法把数据序列化成字节流,或者把字节流根据反射的原理反序列化成对象。官方的示例:https://developers.google.com/protocol-buffers/docs/javatutorial。

proto文件中的字段类型和java中的对应关系:

详见:https://developers.google.com/protocol-buffers/docs/proto

 .proto Type  java Type  c++ Type
double  double  double
float  float  float
int32  int  int32
int64  long   int64
uint32  int  uint32
unint64  long  uint64
sint32  int  int32
sint64  long  int64
fixed32  int  uint32
fixed64  long  uint64
sfixed32  int  int32
sfixed64  long  int64
bool  boolean  bool
string  String  string
bytes  byte  string
字段属性的描述:
写道
required: a well-formed message must have exactly one of this field.
optional: a well-formed message can have zero or one of this field (but not more than one).
repeated: this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved.

 

protobuf 在序列化和反序列化的时候,是依赖于.proto文件生成的builder类完成,字段的变化如果不表现在.proto文件中就不会影响反序列化,比较适合字段变化的情况。做个测试:
把UserVo序列化到文件中:
Java代码  收藏代码

  1. UserVoProtos.UserVo vo = builder.build();  
  2. byte[] v = vo.toByteArray();  
  3. FileOutputStream fos = new FileOutputStream(dataFile);  
  4. fos.write(vo.toByteArray());  
  5. fos.close();  

 

为UserVo增加字段,对应的.proto文件:
Text代码  收藏代码

  1. package serialize;  
  2.   
  3. option java_package = “serialize”;  
  4. option java_outer_classname=“UserVoProtos”;  
  5.   
  6. message UserVo{  
  7.     optional string name = 1;  
  8.     optional int32 age = 2;  
  9.     optional int64 phone = 3;  
  10.     repeated serialize.UserVo friends = 4;  
  11.     optional string address = 5;  
  12. }  

 

从文件中反序列化回来:
Java代码  收藏代码

  1. FileInputStream fis = new FileInputStream(dataFile);  
  2. byte[] dstb = new byte[fis.available()];  
  3. for(int i=0;i<dstb.length;i++){  
  4.     dstb[i] = (byte)fis.read();  
  5. }  
  6. fis.close();  
  7. UserVoProtos.UserVo uvo = UserVoProtos.UserVo.parseFrom(dstb);  
  8. System.out.println(uvo.getFriends(0).getName());  

 成功得到结果。

三种方式对比传输同样的数据,google protobuf只有53个字节是最少的。结论:
方式 优点 缺点
JSON

跨语言、格式清晰一目了然

字节数比较大,需要第三方类库
Object Serialize java原生方法不依赖外部类库 字节数比较大,不能跨语言
Google protobuf

跨语言、字节数比较少

编写.proto配置用protoc工具生成对应的代码

 

以上测试用例覆盖面比较窄,可能无法正确反应真实情况仅代表个人观点,欢迎随时指正和讨论。

转自:http://www.iteye.com/topic/1128881