SELECT语句

本节深入讨论SELECT语句,并且涵盖了以下EBNF范式语法:

SELECT语句的EBNF范式
selectStmt  := name "=" selectBlock
 
selectBlock := SELECT name FROM ( edgeSet | vertexSet )
                   [sampleClause]
                   [whereClause]
                   [accumClause]
                   [postAccumClause]
                   [havingClause]
                   [orderClause]
                   [limitClause]
 
vertexSet := name [":" name]
 
edgeSet   := name [":" name]
            "-" "(" [vertexEdgeType][":" name] ")" "->"
            [vertexEdgeType][":" name]
 
vertexEdgeType := "_" | ANY | name | ( "(" name ["|" name]* ")" )
 
sampleClause := SAMPLE ( expr | expr "%" ) EDGE WHEN condition
             | SAMPLE expr TARGET WHEN condition
             | SAMPLE expr "%" TARGET PINNED WHEN condition
 
whereClause := WHERE condition
 
accumClause := ACCUM DMLSubStmtList
 
postAccumClause := POST-ACCUM DMLSubStmtList
   
DMLSubStmtList := DMLSubStmt ["," DMLSubStmt]*
 
DMLSubStmt := assignStmt           // Assignment  
           | funcCallStmt         // Function Call
           | gAccumAccumStmt      // Assignment
           | vAccumFuncCall       // Function Call
           | localVarDeclStmt     // Declaration
           | DMLSubCaseStmt       // Control Flow
           | DMLSubIfStmt         // Control Flow
           | DMLSubWhileStmt      // Control Flow
           | DMLSubForEachStmt    // Control Flow
           | BREAK                // Control Flow
           | CONTINUE             // Control Flow
           | insertStmt           // Data Modification
           | DMLSubDeleteStmt     // Data Modification
           | printlnStmt          // Output
           | logStmt              // Output
 
vAccumFuncCall := name "." "@"name ("." name "(" [argList] ")")+
 
havingClause := HAVING condition
 
orderClause := ORDER BY expr [ASC | DESC]["," expr [ASC | DESC]]*
 
limitClause := LIMIT ( expr | expr "," expr | expr OFFSET expr )

SELECT语句的作用是从顶点集或边集中选择出一组顶点。 它有许多子句可供选择,通过约束顶点集、边集或结果集来定义和/或细化选择的范围。SELECT语句有两大类,顶点引导(vertex-induced )和边引导(edge-induced)。 两者都会产生一个结果顶点集,称为结果集(result set)。

输出数据大小限制

SELECT语句的结果集最大为2GB。 如果SELECT语句的结果大于2GB,则系统将不返回任何数据,也不会生成任何错误消息。

SELECT语句的数据流

SELECT语句是一个赋值语句,它的右侧是一个SELECT语句表达式。 SELECT语句有许多不同功能的子句,它们根据逻辑流程组合在一起。 总的来说,SELECT语句从一组源顶点开始,并返回一个结果集,该结果集是源顶点集的子集或其相邻顶点的子集。语句执行过程中会对选定的顶点和边进行计算。 下图以图形方式描绘了整个SELECT数据流。 虽然ACCUM和POST-ACCUM子句不直接影响结果集中包含哪些顶点,但它们会影响到附加在这些顶点上的数据(累加器)。

FROM子句:顶点集和边集

FROM子句有两个选项:vertexSet或edgeSet。 如果使用vertexSet,则查询从顶点引发。 如果使用edge,则查询从边缘引发。

FROM子句
### selectBlock := SELECT name FROM ( edgeSet | vertexSet ) ...

顶点引导的选择操作

vertexSet := name [":" name]

顶点引发的选择操作的输入为顶点集,生成的结果集是输入集的子集。 FROM参数的格式为Source:s,其中Source是顶点集。Source之后的:s参数为可选项,它是Source集合中的任何顶点的别名。

resultSet = SELECT s FROM Source:s;

此语句可以解释为“从输入顶点集Source中选择所有顶点s”。其结果是一个顶点集。

下面是顶点引导选择操作的一个简单示例。

顶点引导SELECT语句的例子
# displays all 'post'-type vertices
CREATE QUERY printAllPosts() FOR GRAPH socialNet
{
       start = {post.*};​​​​ # start is initialized with all vertices of type 'post'
       results = SELECT s FROM start:s; # select these vertices
       PRINT results;
}
查询 printAllPosts的结果
GSQL > RUN QUERY printAllPosts()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"results": [
   {
     "v_id": "0",
     "attributes": {
       "postTime": "2010-01-12 11:22:05",
       "subject": "Graphs"
     },
     "v_type": "post"
   },
   {
     "v_id": "10",
     "attributes": {
       "postTime": "2011-02-04 03:02:31",
       "subject": "cats"
     },
     "v_type": "post"
   },
   {
     "v_id": "2",
     "attributes": {
       "postTime": "2011-02-03 01:02:42",
       "subject": "query languages"
     },
     "v_type": "post"
   },
   {
     "v_id": "4",
     "attributes": {
       "postTime": "2011-02-07 05:02:51",
       "subject": "coffee"
     },
     "v_type": "post"
   },
   {
     "v_id": "9",
     "attributes": {
       "postTime": "2011-02-05 23:12:42",
       "subject": "cats"
     },
     "v_type": "post"
   },
   {
     "v_id": "3",
     "attributes": {
       "postTime": "2011-02-05 01:02:44",
       "subject": "cats"
     },
     "v_type": "post"
   },
   {
     "v_id": "5",
     "attributes": {
       "postTime": "2011-02-06 01:02:02",
       "subject": "tigergraph"
     },
     "v_type": "post"
   },
   {
     "v_id": "7",
     "attributes": {
       "postTime": "2011-02-04 17:02:41",
       "subject": "Graphs"
     },
     "v_type": "post"
   },
   {
     "v_id": "1",
     "attributes": {
       "postTime": "2011-03-03 23:02:00",
       "subject": "tigergraph"
     },
     "v_type": "post"
   },
   {
     "v_id": "11",
     "attributes": {
       "postTime": "2011-02-03 01:02:21",
       "subject": "cats"
     },
     "v_type": "post"
   },
   {
     "v_id": "8",
     "attributes": {
       "postTime": "2011-02-03 17:05:52",
       "subject": "cats"
     },
     "v_type": "post"
   },
   {
     "v_id": "6",
     "attributes": {
       "postTime": "2011-02-05 02:02:05",
       "subject": "tigergraph"
     },
     "v_type": "post"
   }
 ]}]
}

边引导的SELECT操作

edgeSet的EBNF范式,由边导出的选择操作
dgeSet   := name [":" name]
            "-" "(" [vertexEdgeType][":" name] ")" "->"
            [vertexEdgeType][":" name]
 
vertexEdgeType := "_" | ANY | name | ("(" name ["|" name]* ")")

用户可以使用分隔符“|”指定多个类型。另外,关键字“_”或“ANY”也可用于表示包含任何顶点类或边类的集合。

由边引导的SELECT操作从一组顶点开始,找到射向这些顶点的一组边,并生成也连接到这些边的顶点的集合作为结果集。通常这个操作的作用在于,从一组特定的源顶点出发,遍历某个指定类型的边类,最终得到一组目标顶点。FROM子句的参数(由EBNF范式的edgeSet规则正式定义)被制作为边的模板:Source:s-(eType:e)->tType:t 。边的模板有三个部分:源顶点集(源),边类(通过eType类型的边),以及目标顶点类型(tType)。 s和t都是顶点别名,e是边别名。该模板定义了一个模式:s→e→t,即从源顶点s,跨eType类型的边,到达tType类型的目标顶点。边别名e表示符合整个模式的任何边。同样,s和t是分别代表符合整个模式的任何源顶点和目标顶点的别名。

源顶点集(s)或目标顶点集(t)都可以用作SELECT语句的参数,该参数将决定SELECT语句的结果输出。请注意下面两个SELECT语句的细微差别。

在边引导的SELECT操作中,选择源顶点或目标顶点的区别:
resultSet1 = SELECT s FROM Source:s-(eType:e)->tType:t;   //Select from the source set
resultSet2 = SELECT t FROM Source:s-(eType:e)->tType:t;   //Select from the target set

resultSet1基于所选边的源端。 resultSet2基于所选边的目标端。 但是,resultSet1与Source顶点集不同,它只包括了Source集合中连接到eType类型的边并随之连接到tType类型顶点的那一小部分顶点。 此外,还有另一些子句可以对源集进行额外的过滤,在本章的稍后会详细介绍。

resultSet1基于所选边的源端。 resultSet2基于所选边的目标端。 但是,resultSet1与Source顶点集不同,它只包括了Source集合中连接到eType类型的边并随之连接到tType类型顶点的那一小部分顶点。 此外,还有另一些子句可以对源集进行额外的过滤,在本章的稍后会详细介绍。

我们强烈建议给FROM子句中的每个顶点和边都配一个别名,因为有某几个函数和特性只能用在顶点和边的别名上。

边集与目标顶点集的选项

FROM子句按类型选择边和目标顶点。 EBNF范式的vertexEdgeType描述了以下这些选项:

请注意,eType和tType不是必须的。 如果省略eType/tType(或者如果使用ANY或_代替),则SELECT语句将寻找所有有效的边或目标顶点(即边缘上的两个顶点之间所有有效路径,比如两个点通过一条边存在)。 对于下面的示例,如果V1和V2是只有eType类型的边可以连通,则我们可以省略标识目标顶点类型,而所有下述的SELECT语句全部等效。 系统则将在运行时推断目标顶点类型。

即便没有明确声明边/目标顶点类型,声明别名也是合法的操作。 请参阅以下示例。

目标顶点类指定
resultSet3 = SELECT v FROM Source:v-(eType:e)->(V1|V2):t;
resultSet4 = SELECT v FROM Source:v-(eType:e)->:t;
resultSet5 = SELECT v FROM Source:v-(eType:e)->ANY:t;
resultSet6 = SELECT v FROM Source:v-(eType:e)->_:t;

从优化性能的角度考虑,用户需要尽可能使用“类型”来筛选出边集和目标顶点集中需要的部分。Source中的顶点类将根据图形数据库纲目找到所有对应的边类。模式结合了eType和tType标识的类型条件,同时又结合了WHERE子句中明确标识或隐含的类型条件,最终输出一个合法的结果集,该集合满足Source →eType→tType模式 ;通过类型这个属性来筛选,用户可以自由地选择到他们想要的结果集。

类似地,GSQL引擎会在运行时推断边类。 例如,如果E1,E2和E3是唯一可以遍历到达tType类型顶点的边类,则我们可以不指定边类型。以下SELECT语句等效。

边类型的指定
resultSet7 = SELECT v FROM Source:v-((E1|E2|E3):e)->tType:t;
resultSet8 = SELECT v FROM Source:v-(:e)->tType:t;
resultSet9 = SELECT v FROM Source:v-(_:e)->tType:t;
resultSet10 = SELECT v FROM Source:v-(ANY:e)->tType:t;

下例中是一条边引发的SELECT语句的查询。 allPostsLiked和allPostsMade查询展示了如何省略目标顶点类型。allPostsLikedOrMade查询使用“|” 运算符,通过它可以选择多种类型的边。

例:边引导SELECT操作
# uses various SELECT statements (some of which are equivalent) to print out
# either the posts made by the given user, the posts liked by the given
# user, or the posts made or liked by the given user.
CREATE QUERY printAllPosts2(vertex<person> seed) FOR GRAPH socialNet
{
start = {seed}; # initialize starting set of vertices
 
 
​# --- statements produce equivalent results
​# select all 'post' vertices which can be reached from 'start' in one hop
​# ​using an edge of type 'liked'
​allPostsLiked = SELECT targetVertex FROM start -(liked:e)-> post:targetVertex;
 
​# select all vertices of any type which can be reached from 'start' in one hop
​# ​using an edge of type 'liked'
​allPostsLiked = SELECT targetVertex FROM start -(liked:e)-> :targetVertex;
​# ----
 
 
​# --- statements produce equivalent results
​# start with the vertex set from above, and traverse all edges of type "posted"
 ​# ​(locally those edges are just given a name 'e' in case they need accessed)
 ​# ​and return all vertices of type 'post' which can be reached within one-hop of 'start' vertices
​allPostsMade = SELECT targetVertex FROM start -(posted:e)-> post:targetVertex;
 
​# start with the vertex set from above, and traverse all edges of type "posted"
 ​# ​(locally those edges are just given a name 'e' in case they need accessed)
 ​# ​and return all vertices of any type which can be reached within one-hop of 'start' vertices
​allPostsMade = SELECT targetVertex FROM start -(posted:e)-> :targetVertex;
​# ----
 
 
​# --- statements produce equivalent results
​# select all vertices of type 'post' which can be reached from 'start' in one hop
​# ​using an edge of any type
​# not equivalent to any statement. because it doesn't restrict the edge type,
​# ​this will include any vertex connected by 'liked' or 'posted' edge types
​allPostsLikedOrMade = SELECT t FROM start -(:e)-> t;
 
​# select all vertices of type 'post' which can be reached from 'start' in one hop
​#​using an edge of type either 'posted' or 'liked'
​allPostsLikedOrMade = SELECT t FROM start -((posted|liked):e)-> post:t;
 
​# select all vertices of any type which can be reached from 'start' in one hop
​#​using an edge of type either 'posted' or 'liked/
​allPostsLikedOrMade = SELECT t FROM start -((posted|liked):e)-> :t;
​# ----

PRINT allPostsLiked;
   PRINT allPostsMade;
   PRINT allPostsLikedOrMade;
}
查询 printAllPosts2的结果
GSQL > RUN QUERY printAllPosts2("person2")
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [
   {"allPostsLiked": [
     {
       "v_id": "0",
       "attributes": {
         "postTime": "2010-01-12 11:22:05",
         "subject": "Graphs"
       },
       "v_type": "post"
     },
     {
       "v_id": "3",
       "attributes": {
         "postTime": "2011-02-05 01:02:44",
         "subject": "cats"
       },
       "v_type": "post"
     }
   ]},
   {"allPostsMade": [{
     "v_id": "1",
     "attributes": {
       "postTime": "2011-03-03 23:02:00",
       "subject": "tigergraph"
     },
     "v_type": "post"
   }]},
   {"allPostsLikedOrMade": [
     {
       "v_id": "0",
       "attributes": {
         "postTime": "2010-01-12 11:22:05",
         "subject": "Graphs"
       },
       "v_type": "post"
     },
     {
       "v_id": "3",
       "attributes": {
         "postTime": "2011-02-05 01:02:44",
         "subject": "cats"
       },
       "v_type": "post"
     },
     {
       "v_id": "1",
       "attributes": {
         "postTime": "2011-03-03 23:02:00",
         "subject": "tigergraph"
       },
       "v_type": "post"
     }
   ]}
 ]
}
GSQL > RUN QUERY printAllPosts2("person6")
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [
   {"allPostsLiked": [{
     "v_id": "8",
     "attributes": {
       "postTime": "2011-02-03 17:05:52",
       "subject": "cats"
     },
     "v_type": "post"
   }]},
   {"allPostsMade": [
     {
       "v_id": "10",
       "attributes": {
         "postTime": "2011-02-04 03:02:31",
         "subject": "cats"
       },
       "v_type": "post"
     },
     {
       "v_id": "5",
       "attributes": {
         "postTime": "2011-02-06 01:02:02",
         "subject": "tigergraph"
       },
       "v_type": "post"
     }
   ]},
   {"allPostsLikedOrMade": [
     {
       "v_id": "10",
       "attributes": {
         "postTime": "2011-02-04 03:02:31",
         "subject": "cats"
       },
       "v_type": "post"
     },
     {
       "v_id": "5",
       "attributes": {
         "postTime": "2011-02-06 01:02:02",
         "subject": "tigergraph"
       },
       "v_type": "post"
     },
     {
       "v_id": "8",
       "attributes": {
         "postTime": "2011-02-03 17:05:52",
         "subject": "cats"
       },
       "v_type": "post"
     }
   ]}
 ]
}

这个例子是另一个使用“|”分隔符的边选择操作示例, 其中包含多个类型的目标顶点的边。

边引导SELECT操作的示例
# uses a SELECT statement to print out everything related to a given user
# ​this includes posts that the user liked, posts that the user made, and friends
# ​of the user
CREATE QUERY printAllRelatedItems(vertex<person> seed) FOR GRAPH socialNet
{
​sourceVertex = {seed};

​# -- statements produce equivalent output
​# returns all vertices of type either 'person' or 'post' that can be reached
​# ​from the sourceVertex set using one edge of any type
​everythingRelated = SELECT v FROM sourceVertex -(:e)-> (person|post):v;
 
​# returns all vertices of any type that can be reached from the sourceVertex
​# ​using one edge of any type
​# this statement is equivalent to the above one because the graph schema only
​#​has vertex types of either 'person' or 'post'. if there were more vertex
​#​types present, these would not be equivalent.
​everythingRelated = SELECT v FROM sourceVertex -(:e)-> :v;
​# --
 
PRINT everythingRelated;
}
结果
GSQL > RUN QUERY printAllRelatedItems("person2")
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"everythingRelated": [
   {
     "v_id": "0",
     "attributes": {
       "postTime": "2010-01-12 11:22:05",
       "subject": "Graphs"
     },
     "v_type": "post"
   },
   {
     "v_id": "person3",
     "attributes": {
       "gender": "Male",
       "id": "person3"
     },
     "v_type": "person"
   },
   {
     "v_id": "person1",
     "attributes": {
       "gender": "Male",
       "id": "person1"
     },
     "v_type": "person"
   },
   {
     "v_id": "3",
     "attributes": {
       "postTime": "2011-02-05 01:02:44",
       "subject": "cats"
     },
     "v_type": "post"
   },
   {
     "v_id": "1",
     "attributes": {
       "postTime": "2011-03-03 23:02:00",
       "subject": "tigergraph"
     },
     "v_type": "post"
   }
 ]}]
}
GSQL > RUN QUERY printAllRelatedItems("person6")
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"everythingRelated": [
   {
     "v_id": "person4",
     "attributes": {
       "gender": "Female",
       "id": "person4"
     },
     "v_type": "person"
   },
   {
     "v_id": "10",
     "attributes": {
       "postTime": "2011-02-04 03:02:31",
       "subject": "cats"
     },
     "v_type": "post"
   },
   {
     "v_id": "5",
     "attributes": {
       "postTime": "2011-02-06 01:02:02",
       "subject": "tigergraph"
     },
     "v_type": "post"
   },
   {
     "v_id": "person8",
     "attributes": {
       "gender": "Male",
       "id": "person8"
     },
     "v_type": "person"
   },
   {
     "v_id": "8",
     "attributes": {
       "postTime": "2011-02-03 17:05:52",
       "subject": "cats"
     },
     "v_type": "post"
   }
 ]}]
}

顶点和边的别名

顶点和边的别名在SELECT语句的FROM子句中声明,使用字符“:”,后跟别名。 别名可以在同一个SELECT语句中的任何位置被使用和访问。 它们被用于引用某个集合中选定的顶点或某个集合中的边。 通过顶点或边的别名可以访问这些顶点或边的属性。

例如,以下代码段显示了两个不同的SELECT语句。 第一个SELECT语句从名为allVertices的顶点集开始,顶点别名v可以从allVertices访问每个单独的顶点。 第二个SELECT语句选择一组边。 它可以使用顶点别名s来引用源顶点,或使用别名t来引用目标顶点。

顶点变量
results = SELECT v FROM allVertices:v;
results = SELECT v FROM allVertices:s -()-> :t;

以下示例展示了基于边的SELECT语句,声明了边的所有三个部分的别名。 在ACCUM子句中,e和t别名被分配给局部的顶点和边变量。

边变量
results = SELECT v
         FROM allVertices:s -(:e)-> :t
         ACCUM VERTEX v = t, EDGE eg = e;

我们强烈建议对FROM子句中的每个顶点和边都配给一个别名,因为有某几个函数和特性只能用在顶点和边的别名上。

采样(SAMPLE)子句

采样子句是一个可选子句,它从FROM参数中指定的边或顶点的范围中选择一个标准随机样本。 需要说明的是,符合要求的边由满足FROM子句的所有三个参数规定(源集,边类型和目标类型)。 采样子句的作用在于给出一个代表性样本,该样本描述了连接到中心顶点(hub vertex)的边(或顶点),从而避免了处理所有的边。 中心顶点是一个具有相对较高“度数”(degree)的顶点。 (一个顶点的度数就是它所连接的边的个数。如果边是有向的,则需要区分向内度数(indegree)和向外度数(outdegree)。

注意:

目前,可以与采样子句一起使用的WHEN的条件,被严格限制为仅检查在某个顶点上调用的某个函数的结果,是否大于或大于等于某个数字。

采样子句表达式之后,用户需要指定样本大小,该大小可以是一个绝对数量,也可以是总体的百分比。 sampleClause的结果必须为正整数。 抽样方法有两种。 一种是基于边ID的采样,另一个则基于目标顶点id的采样。如果对目标顶点id进行采样,则意味着要对从该源顶点开始,到采样的目标顶点结束的所有边进行采样。

采样子句的EBNF范式
sampleClause := SAMPLE ( expr | expr "%" ) EDGE WHEN condition  # Sample an absolute number (or a percentage) of edges for each source vertex.
             | SAMPLE expr TARGET WHEN condition               # Sample an absolute number of edges incident to each target vertex.
             | SAMPLE expr "%" TARGET PINNED WHEN condition    # Sample a percentage of edges incident to each target vertex.

鉴于采样是随机的,示例查询的结果可能在每次运行时都会发生变化。

以下查询显示两种采样模式:绝对数量和百分比。 我们使用示例图computerNet(见附录D)作为源数据。 在computerNet中,有31个顶点和43条边,但只有7个顶点是源顶点。 此外,c1,c12和c23是中心顶点,每个顶点至少射出10条边。 对于绝对计数情况,我们将每个源顶点取样数设为1个边,即相当于在边中随机选择一个。 则结果一定是精确的7条边。 而对于百分比采样的情况来说,我们对具有至少3个输出边的顶点采样33%,则预计大约会输出15个边,但有可能上下会有一些变化。

sampleEx3: 基于每个源顶点上的边的样本
CREATE QUERY sampleEx3() FOR GRAPH computerNet
{
   MapAccum<STRING,ListAccum<STRING>> @@absEdges; // record each selected edge as (src->tgt)
   SumAccum<INT> @@totalAbs;
   MapAccum<STRING,ListAccum<STRING>> @@pctEdges; // record each selected edge as (src->tgt)
   SumAccum<INT> @@totalPct;
 
   start = {computer.*};
 
   # Sample one outgoing edge per source vertex = Random Walk
   absSample = SELECT v FROM start:s -(:e)-> :v
            SAMPLE 1 EDGE WHEN s.outdegree() >= 1    # sample 1 target vertex from each source vertex
            ACCUM @@absEdges += (s.id -> v.id),
                  @@totalAbs += 1;
   PRINT @@totalAbs, @@absEdges;
 
   pctSample = SELECT v FROM start:s -(:e)-> :v
            SAMPLE 33% EDGE WHEN s.outdegree() >= 3  # select ~1/3 of edges when outdegree >= 3
            ACCUM @@pctEdges += (s.id -> v.id),
                  @@totalPct += 1;
   PRINT @@totalPct, @@pctEdges;
}
sampleEx3.json
GSQL > RUN QUERY sampleEx3()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [
   {
     "@@totalAbs": 7,
     "@@absEdges": {
       "c4": ["c23"],
       "c11": ["c12"],
       "c10": ["c11"],
       "c12": ["c14"],
       "c23": ["c26"],
       "c14": ["c24"],
       "c1": ["c10"]
     }
   },
   {
     "@@totalPct": 13,
     "@@pctEdges": {
       "c4": ["c23"],
       "c11": ["c12"],
       "c10": ["c11"],
       "c12": [
         "c14",
         "c15",
         "c19"
       ],
       "c23": [
         "c29",
         "c25"
       ],
       "c14": [
         "c24",
         "c23"
       ],
       "c1": [
         "c3",
         "c8",
         "c2"
       ]
     }
   }
 ]
}

下面的示例中,SELECT语句仅遍历每个源顶点的一条边。 附属于顶点的累加器@timesTraversedNoSample和@timesTraversedWithSample用于跟踪遍历到达目标顶点的边的次数。 如果不使用采样,则每个边都只经过一次; 因此,@timesTraversedNoSample应该与顶点的向内度数相同。 而对于边的采样来说,反映在@timesTraversedWithSample累加器中,边的数量受到了限制。注意结果集中的差异。 因为在使用采样子句时,遍历仅选择每个源顶点的一条边,所以它不会到达所有的目标顶点。 例如,顶点company3有三条边,但在某一次查询的实例中,它们可能永远不会都遍历到。 此外,company2有6条入射边,但只有4个源顶点采样了到company2的一条边事件。

例:通过绝对数量对边取样
CREATE QUERY sampleEx1() FOR GRAPH workNet
{
​SumAccum<INT> @timesTraversedNoSample;
​SumAccum<INT> @timesTraversedWithSample;
​workers = {person.*};
 
​# The 'beforeSample' result set encapsulates the normal functionality of
​# a SELECT statement, where 'timesTraversedNoSample' vertex accumulator is increased for
​# each edge incident to the vertex.
​beforeSample = SELECT v FROM workers:t -(:e)-> :v
​​       ACCUM v.@timesTraversedNoSample += 1;
 
​# The 'afterSample' result set is formed by those vertices which can be
​# reached when for each source vertex, only one edge is used for traversal.
​# This is demonstrated by the values of 'timesTraversedWithSample' vertex accumulator, which
​# is increased for each edge incident to the vertex which is used in the
​# sample.
​afterSample = SELECT v FROM workers:t -(:e)-> :v
​​      SAMPLE 1 EDGE WHEN t.outdegree() >= 1​​# only use 1 edge from the source vertex
​​      ACCUM v.@timesTraversedWithSample += 1;
 
PRINT beforeSample;
PRINT afterSample;
}
sampleEx1.json
GSQL > RUN QUERY sampleEx1()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [
   {"beforeSample": [
     {
       "v_id": "company4",
       "attributes": {
         "country": "us",
         "@timesTraversedNoSample": 1,
         "@timesTraversedWithSample": 1,
         "id": "company4"
       },
       "v_type": "company"
     },
     {
       "v_id": "company5",
       "attributes": {
         "country": "can",
         "@timesTraversedNoSample": 1,
         "@timesTraversedWithSample": 1,
         "id": "company5"
       },
       "v_type": "company"
     },
     {
       "v_id": "company3",
       "attributes": {
         "country": "jp",
         "@timesTraversedNoSample": 3,
         "@timesTraversedWithSample": 3,
         "id": "company3"
       },
       "v_type": "company"
     },
     {
       "v_id": "company2",
       "attributes": {
         "country": "chn",
         "@timesTraversedNoSample": 6,
         "@timesTraversedWithSample": 4,
         "id": "company2"
       },
       "v_type": "company"
     },
     {
       "v_id": "company1",
       "attributes": {
         "country": "us",
         "@timesTraversedNoSample": 6,
         "@timesTraversedWithSample": 3,
         "id": "company1"
       },
       "v_type": "company"
     }
   ]},
   {"afterSample": [
     {
       "v_id": "company4",
       "attributes": {
         "country": "us",
         "@timesTraversedNoSample": 1,
         "@timesTraversedWithSample": 1,
         "id": "company4"
       },
       "v_type": "company"
     },
     {
       "v_id": "company5",
       "attributes": {
         "country": "can",
         "@timesTraversedNoSample": 1,
         "@timesTraversedWithSample": 1,
         "id": "company5"
       },
       "v_type": "company"
     },
     {
       "v_id": "company3",
       "attributes": {
         "country": "jp",
         "@timesTraversedNoSample": 3,
         "@timesTraversedWithSample": 3,
         "id": "company3"
       },
       "v_type": "company"
     },
     {
       "v_id": "company2",
       "attributes": {
         "country": "chn",
         "@timesTraversedNoSample": 6,
         "@timesTraversedWithSample": 4,
         "id": "company2"
       },
       "v_type": "company"
     },
     {
       "v_id": "company1",
       "attributes": {
         "country": "us",
         "@timesTraversedNoSample": 6,
         "@timesTraversedWithSample": 3,
         "id": "company1"
       },
       "v_type": "company"
     }
   ]}
 ]
}

由于PRINT语句位于查询的末尾,因此顶点集beforeSample和顶点集afterSample会几乎完全相同。因为它们都显示了累加器@timesTraversedNoSample和累加器@timesTraversedWithSample的最终值。 但它们仍然有一个区别:company3不包含在afterSample中,因为没有任何被选做样本的边到达过company3。

WHERE子句

WHERE子句是一个可选子句,它用于约束FROM和采样子句中指定的边和顶点的范围。

WHERE子句的EBNF范式
whereClause := WHERE condition

WHERE子句使用布尔条件来检测FROM子句获取的集合中的每个顶点或边(如果使用了采样子句,则检测的是采样的顶点和边集)。

如果顶点/边X的表达式求值为false,则将X从结果集中排除。它的表达式可以使用常量或SELECT语句范围内的任何变量或参数。用户可以使用算术运算符(+, - ,*,/,%),比较运算符(==,!=,<,<=,>,> =) ,布尔运算符(AND,OR,NOT),判断运算符(IN,NOT IN)以及括号来加强表达式的优先级逻辑。 WHERE条件表达式可以使用其作用域内的任何变量(全局累加器,顶点集变量,查询输入参数,FROM子句的顶点和边集(或它们的顶点和边的别名),或者任何顶点/边集的属性或累加器。)关于条件的详述,请参阅条件和表达式的EBNF范式一节。

如果用户使用了内置的顶点\边的属性和函数,例如.type和.neighbors(),WHERE子句将会在边的遍历中实施复杂的选择规则。在下面的示例中, FROM子句中没有提到边类型或顶点类型,所有的选择条件都在WHERE子句中指定

WHERE子句作为过滤器
resultSet1 = SELECT v FROM S:v-((E1|E2|E3):e)->(V1|V2):t;
resultSet2 = SELECT v FROM S:v-(:e)->:t
​​​​​ WHERE t.type IN ("V1", "V2") AND
​​​​​​   t IN v.neighbors("E1|E2|E3")

下面的示例中,WHERE子句根据顶点属性限定了对应输出的结果集的内容

基本的SELECT WHERE语法
CREATE QUERY printCatPosts() FOR GRAPH socialNet {
​posts = {post.*};
​catPosts = SELECT v FROM posts:v​​# select only those post vertices
               WHERE v.subject == "cats";  # which have a subset of 'cats'
PRINT catPosts;
}
结果:查询 printCatPosts
GSQL > RUN QUERY printCatPosts()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"catPosts": [
   {
     "v_id": "10",
     "attributes": {
       "postTime": "2011-02-04 03:02:31",
       "subject": "cats"
     },
     "v_type": "post"
   },
   {
     "v_id": "9",
     "attributes": {
       "postTime": "2011-02-05 23:12:42",
       "subject": "cats"
     },
     "v_type": "post"
   },
   {
     "v_id": "3",
     "attributes": {
       "postTime": "2011-02-05 01:02:44",
       "subject": "cats"
     },
     "v_type": "post"
   },
   {
     "v_id": "11",
     "attributes": {
       "postTime": "2011-02-03 01:02:21",
       "subject": "cats"
     },
     "v_type": "post"
   },
   {
     "v_id": "8",
     "attributes": {
       "postTime": "2011-02-03 17:05:52",
       "subject": "cats"
     },
     "v_type": "post"
   }
 ]}]
}
使用IN运算符编写SELECT WHERE
CREATE QUERY findGraphFocusedPosts() FOR GRAPH socialNet
{
​posts = {post.*};
​results = SELECT v FROM posts:v​​​​​# select only post vertices
​​WHERE v.subject IN ("Graph", "tigergraph");​# which have a subject of either 'Graph' or 'tigergraph'
PRINT results;
}
结果:查询 findGraphFocusedPosts
GSQL > RUN QUERY findGraphFocusedPosts()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"results": [
   {
     "v_id": "5",
     "attributes": {
       "postTime": "2011-02-06 01:02:02",
       "subject": "tigergraph"
     },
     "v_type": "post"
   },
   {
     "v_id": "1",
     "attributes": {
       "postTime": "2011-03-03 23:02:00",
       "subject": "tigergraph"
     },
     "v_type": "post"
   },
   {
     "v_id": "6",
     "attributes": {
       "postTime": "2011-02-05 02:02:05",
       "subject": "tigergraph"
     },
     "v_type": "post"
   }
 ]}]
}

WHERE NOT 运算符的限制

NOT运算符不能与.type属性选择器结合使用。 要检查边或顶点类型是否不等于给定类型,请使用!=运算符。 详见下面的示例。

以下示例展示了使用WHERE作为类型过滤器的等效语句及其限制。

在SELECT WHERE语句中使用AND或OR
# finds female person in the social network. all of the following statements
# are equivalent (i.e., produce the same results)
CREATE QUERY findFemaleMembers() FOR GRAPH socialNet
{
​allVertices = {ANY}; # includes all posts and person
​females = SELECT v FROM allVertices:v
​​  WHERE v.type   == "person" AND
​​  ​    v.gender != "Male";
 
​females = SELECT v FROM allVertices:v
​​  WHERE v.type   == "person" AND
​​  ​    v.gender == "Female";
 
​females = SELECT v FROM allVertices:v
​​  WHERE v.type       == "person" AND
​​  ​    NOT v.gender == "Male";
 
​females = SELECT v FROM allVertices:v
​​  WHERE v.type       != "post" AND
​​  ​    NOT v.gender == "Male";
 
 ​# does not compile. cannot use NOT operator in combination with type attribute
​#females = SELECT v FROM allVertices:v
​#​  WHERE NOT v.type   != "person" AND
​#​  ​    NOT v.gender == "Male";
 
 ​# does not compile. cannot use NOT operator in combination with type attribute
​#females = SELECT v FROM allVertices:v
​#​  WHERE NOT v.type   == "post" AND
​#​  ​    NOT v.gender == "Male";
 
​personVertices = {person.*};
​females = SELECT v FROM personVertices:v
​​   WHERE NOT v.gender == "Male";
 
​females = SELECT v FROM personVertices:v
​​   WHERE v.gender != "Male";
 
​females = SELECT v FROM personVertices:v
​​   WHERE v.gender != "Male" AND true;
 
​females = SELECT v FROM personVertices:v
​​   WHERE v.gender != "Male" OR false;
     
PRINT females;
}
结果:查询 findFemaleMembers
GSQL > RUN QUERY findFemaleMembers()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"females": [
   {
     "v_id": "person4",
     "attributes": {
       "gender": "Female",
       "id": "person4"
     },
     "v_type": "person"
   },
   {
     "v_id": "person5",
     "attributes": {
       "gender": "Female",
       "id": "person5"
     },
     "v_type": "person"
   },
   {
     "v_id": "person2",
     "attributes": {
       "gender": "Female",
       "id": "person2"
     },
     "v_type": "person"
   }
 ]}]
}

以下示例使用边属性来确定哪些人员已经注册为某些公司的全职工作人员。

WHERE子句中使用边属性
# find all workers who are full time at some company
CREATE QUERY fullTimeWorkers() FOR GRAPH workNet
{
start = {person.*};
​fullTimeWorkers = SELECT v FROM start:v -(worksFor:e)-> company:t
​​​WHERE e.fullTime;​# fullTime is a boolean attribute on the edge
 
PRINT fullTimeWorkers;
}
fullTimeWorkers 的结果
GSQL > RUN QUERY fullTimeWorkers()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"fullTimeWorkers": [
   {
     "v_id": "person4",
     "attributes": {
       "interestList": ["football"],
       "skillSet": [ 10, 1, 4 ],
       "skillList": [ 4, 1, 10 ],
       "locationId": "us",
       "interestSet": ["football"],
       "id": "person4"
     },
     "v_type": "person"
   },
   {
     "v_id": "person11",
     "attributes": {
       "interestList": [ "sport", "football" ],
       "skillSet": [10],
       "skillList": [10],
       "locationId": "can",
       "interestSet": [ "football", "sport" ],
       "id": "person11"
     },
     "v_type": "person"
   },
   {
     "v_id": "person10",
     "attributes": {
       "interestList": [ "football", "sport" ],
       "skillSet": [3],
       "skillList": [3],
       "locationId": "us",
       "interestSet": [ "sport", "football" ],
       "id": "person10"
     },
     "v_type": "person"
   },
   {
     "v_id": "person1",
     "attributes": {
       "interestList": [ "management", "financial" ],
       "skillSet": [ 3, 2, 1 ],
       "skillList": [ 1, 2, 3 ],
       "locationId": "us",
       "interestSet": [ "financial", "management" ],
       "id": "person1"
     },
     "v_type": "person"
   },
   {
     "v_id": "person6",
     "attributes": {
       "interestList": [ "music", "art" ],
       "skillSet": [ 10, 7 ],
       "skillList": [ 7, 10 ],
       "locationId": "jp",
       "interestSet": [ "art", "music" ],
       "id": "person6"
     },
     "v_type": "person"
   },
   {
     "v_id": "person2",
     "attributes": {
       "interestList": ["engineering"],
       "skillSet": [ 6, 5, 3, 2 ],
       "skillList": [ 2, 3, 5, 6 ],
       "locationId": "chn",
       "interestSet": ["engineering"],
       "id": "person2"
     },
     "v_type": "person"
   },
   {
     "v_id": "person8",
     "attributes": {
       "interestList": ["management"],
       "skillSet": [ 2, 5, 1 ],
       "skillList": [ 1, 5, 2 ],
       "locationId": "chn",
       "interestSet": ["management"],
       "id": "person8"
     },
     "v_type": "person"
   },
   {
     "v_id": "person12",
     "attributes": {
       "interestList": [
         "music",
         "engineering",
         "teaching",
         "teaching",
         "teaching"
       ],
       "skillSet": [ 2, 5, 1 ],
       "skillList": [ 1, 5, 2, 2, 2 ],
       "locationId": "jp",
       "interestSet": [ "teaching", "engineering", "music" ],
       "id": "person12"
     },
     "v_type": "person"
   },
   {
     "v_id": "person3",
     "attributes": {
       "interestList": ["teaching"],
       "skillSet": [ 6, 1, 4 ],
       "skillList": [ 4, 1, 6 ],
       "locationId": "jp",
       "interestSet": ["teaching"],
       "id": "person3"
     },
     "v_type": "person"
   },
   {
     "v_id": "person9",
     "attributes": {
       "interestList": [ "financial", "teaching" ],
       "skillSet": [ 2, 7, 4 ],
       "skillList": [ 4, 7, 2 ],
       "locationId": "us",
       "interestSet": [ "teaching", "financial" ],
       "id": "person9"
     },
     "v_type": "person"
   }
 ]}]
}

如果在边引导的SELECT操作中指定了多个边的类型,则WHERE子句应使用OR来分隔每个边类型或每个目标顶点类型。 例如,

WHERE子句中包含多个边类
CREATE QUERY multipleEdgeTypeWhereEx(vertex<person> m1) FOR GRAPH socialNet {
 allUser = {m1};
 FilteredUser = SELECT s
     FROM allUser:s - ((posted|liked|friend):e) -> (post|person):t
     # WHERE e.actionTime > epoch_to_datetime(1) AND t.gender == "Male";
     WHERE ( e.type == "liked" AND e.actionTime > epoch_to_datetime(1) ) OR
           ( e.type == "friend" AND t.gender == "Male" )
           ;
 PRINT FilteredUser;
}

上述查询是可编译的。 但是,如果我们选择使用第5行的WHERE子句(当前被注释掉),则该查询不可编译。 边类冲突检查会检测到错误,因为我们同时使用了“liked”边的属性和“friend”边的属性,但没有通过OR将它们分开。

ACCUM和POST-ACCUM子句

ACCUM和POST-ACCUM子句是可选的。 在FROM,SAMPLE和WHERE子句输出结果顶点集或边集后,可以使用ACCUM和POST-ACCUM子句在结果集上进行复杂的聚合和其他计算。一个查询可以包含它们中的一个或两个都包含。 ACCUM子句中的语句适用于边引导SELECT操作中的每个边,或顶点引导的选择操作中的每个顶点。

如果ACCUM子句中有多个语句,则语句之间用逗号分隔,并按顺序为每个选定的元素执行。TigerGraph系统使用并行计算来提高性能,即在ACCUM子句中,每条边都由一个单独的进程处理。因此,该并行处理没有固定顺序。ACCUM子句中的边处理不应视为顺序执行。 累加器是在这些进程中共享的互斥变量。在所有边都完成遍历之前,ACCUM子句中任何累积结果都不完整。所以对ACCUM的中间结果的任何检查的存在意义并不大,因为数据不完整。

对于给定的顶点或边,ACCUM子句中的语句是按顺序执行的。 但是,针对顶点集或边集,则没有固定的处理顺序。

POST-ACCUM子句也是可选的。它允许聚合计算和其他计算在先前子句选择的顶点集(但不是边)上进行。 POST-ACCUM可以在没有ACCUM的情况下使用。 但如果它前面是ACCUM子句,那么它可以用于二阶段累积计算:ACCUM中的累积计算是第一阶段, POST-ACCUM中的累积计算为第二阶段。

从v1.1开始,关键字POST-ACCUM中的短横也可写为下划线:POST_ACCUM。

POST-ACCUM子句中的每个语句都可以引用源顶点或目标顶点,但不能同时引用两者。

在边引导SELECT操作中,由于ACCUM子句在边上迭代,并且通常两个边会连接到相同的源顶点或相同的目标顶点,因此ACCUM子句可以针对一个顶点重复多次地执行。

如果希望某个操作只在每个顶点上只跑一次,那应该使用POST-ACCUM子句。

ACCUM或POST-ACCUM子句的主要目的是通过更新累加器值(通过+=或=)来收集有关图形的信息。 有关+=操作的详细信息,请参阅“累加器”章节。 但是,我们同时也允许其他类型的语句(例如,分支,迭代,本地赋值等),以支持更加复杂的计算或记录。 下面的EBNF范式定义了可以在ACCUM或POST-ACCUM中使用的语句。 DMLSubStmt列表类似于queryBodyStmt列表,该列表适用于SELECT语句之外的语句; 但请着重关注两者的差别。 每个语句的类型都在对应的章节中有详细解释。

ACCUM和POST-ACCUM子句的EBNF范式:
accumClause := ACCUM DMLSubStmtList
 
postAccumClause := POST-ACCUM DMLSubStmtList
 
DMLSubStmtList := DMLSubStmt ["," DMLSubStmt]*
 
DMLSubStmt := assignStmt           // Assignment (including vertex-attached accumulate)
           | funcCallStmt         // Function Call
           | gAccumAccumStmt      // Assignment (global accumulate)
           | vAccumFuncCall       // Function Call
           | localVarDeclStmt     // Declaration
           | DMLSubCaseStmt       // Control Flow
           | DMLSubIfStmt         // Control Flow
           | DMLSubWhileStmt      // Control Flow
           | DMLSubForEachStmt    // Control Flow
           | BREAK                // Control Flow
           | CONTINUE             // Control Flow
           | insertStmt           // Data Modification
           | DMLSubDeleteStmt     // Data Modification
           | printlnStmt          // Output
           | logStmt              // Output

请注意,DML子句不包括全局累加器赋值语句(gAccumAssignStmt),而是包含全局累加器累加语句(gAccumAccumStmt)。 全局累加器可以在这些子句中执行累加操作“+=”,但不做赋值操作“=”。

DML子句还有其他限制:

  • 在ACCUM或POST-ACCUM子句中允许全局变量赋值,但在查询完成之前不会发生值的更改。 因此,如果同一变量有多个赋值语句,则只有最后一个赋值语句才会生效。

  • ACCUM子句中不允许使用顶点属性赋值操作:“=”。 但是,允许边属性的赋值操作。 这是因为ACCUM子句会在边集上迭代。 POST-ACCUM子句中允许使用顶点属性赋值。 与所有数据更新规则一样,对应的值在查询完成之前不会变化。

别名和ACCUM/POST-ACCUM迭代模型

要引用选出的结果集的每个元素,请使用FROM子句中定义的它们的别名。 例如,假设我们有以下别名:

例:顶点和边的别名
FROM Source:s -(edgeTypes:e)-> targetTypes:t # edge-induced selection
FROM Source:v                                # vertex-induced selection

令(V1,V2,... Vn)为顶点引导SELECT操作中的顶点。 则以下伪代码模拟ACCUM子句行为。

ACCUM在顶点引导的选择操作中的行为的模型
FOREACH v in (V1,V2,...Vn) DO # iterations may occur in parallel, in unknown order
 DMLSubStmts referencing v
DONE

令E =(E1,E2,...... En)为边引导SELECT操作的结果集合中的边。 此外,令S =(S1,S1,... Sn)和T =(T1,T2,... Tn)是源顶点和目标顶点的多个集的合集(Bag),其对应于之前的边集。 S和T是Bag,因为它们可以包含重复的元素。

ACCUM在边引导SELECT操作中的行为的模型
FOREACH i in (1..n) DO # iterations may occur in parallel, in unknown order
 DMLSubStmts referencing e, s, t, which really means e_i, s_i, t_i
DONE

请注意,任何对源顶点的别名s或目标顶点的别名t的引用都是针对当前边的两端的顶点。

类似地,POST-ACCUM子句的作用类似于SELECT子句中指定的顶点结果集的FOREACH循环(例如,S或T)。

边/顶点类的推导和冲突

如果在边引导SELECT操作中指定了多个边类,则ACCUM子句中的每个ACCUM语句都会检查边类是否存在冲突。 如果只有某些边类在ACCUM语句中有效,则该语句不会在在其他边类上执行。 例如:

多重边类中的ACCUM语句检查
CREATE QUERY multipleEdgeTypeCheckEx(vertex<person> m1) FOR GRAPH socialNet {
 ListAccum<STRING> @@testList1, @@testList2, @@testList3;
 allUser = {m1};
 allUser = SELECT s
        FROM allUser:s - ((posted|liked|friend):e) -> (post|person):t
        ACCUM @@testList1 += to_string(datetime_to_epoch(e.actionTime))
             ,@@testList2 += t.gender
            #,@@testList3 += to_string(datetime_to_epoch(e.actionTime)) + t.gender # illegal
              ;
 PRINT @@testList1, @@testList2, @@testList3;
}

在上面的例子中,第6行仅在“liked”边上执行,因为“actionTime”仅仅是“liked”边的属性。 类似地,第7行仅在“friend”边上执行,因为“gender”仅仅是“person”的属性,并且只有“friend”边使用了“person”作为目标顶点。 同时,第8行会导致编译错误,因为它使用多个边,其中某些边在语句的部分区域不被支持,即“liked”边没有t.gender而“friend”边没有e.actionTime。

我们强烈建议,如果在边引导SELECT操作中指定了多个边类型,则ACCUM子句应使用CASE语句(详见“控制流语句”一章)来分隔每个边类型或每个目标顶点类型(或它们的组合)。 边类冲突检查完成后,会根据条件检查每个THEN/ELSE语句块内的ACCUM语句。 例如,

例:多重边类中的ACCUM语句检查 - 2
CREATE QUERY multipleEdgeTypeCheckEx2(vertex<person> m1) FOR GRAPH socialNet {
 ListAccum<STRING> @@testList1;
 allUser = {m1};
 allUser = SELECT s
           FROM allUser:s - ((posted|liked|friend):e) -> (post|person):t
           ACCUM CASE
                   WHEN e.type == "liked" THEN    # for liked edges
                     @@testList1 += to_string(datetime_to_epoch(e.actionTime))
                   WHEN e.type == "friend" THEN   # for friend edges
                     @@testList1 += t.gender
                   ELSE      # For the remained edge type, which is posted edges
                     @@testList1 += to_string(datetime_to_epoch(t.postTime))
                 END
           ;
 PRINT @@testList1;
}

以上查询是可编译的。 但是,如果我们把第8行和第10行交换位置,则边类冲突检查会报告错误,因为“liked”边不支持t.gender,而“friend”边缘不支持e.actionTime。

与ACCUM子句类似,如果在边引导SELECT操作中指定了多个源/目标顶点的类型,并且POST-ACCUM子句会访问这些源/目标顶点,则POST-ACCUM子句中的每个ACCUM语句都会检查这些源/目标顶点的类型是否存在冲突。 如果在POST-ACCUM语句中只有源/目标顶点类型中的某些顶点有效,则不会在其他类型的源/目标顶点上执行此语句。

与ACCUM子句类似,我们强烈建议如果在边引导的选择操作中指定了多个源/目标顶点类型,并且POST-ACCUM子句访问了这些源/目标顶点,则POST-ACCUM子句应使用CASE语句(详见“控制流语句”一章) 分隔对每个源/目标顶点类型的操作。 然后,顶点类冲突检查根据条件检查每个THEN/ELSE语句块内的ACCUM语句。

附属于顶点的累加器的更新规则

在v1.0之前,只能在ACCUM或POST-ACCUM子句中更新附属于顶点的累加器,并且只有在前面的FROM, SAMPLE或WHERE子句选择了它的顶点时才能更新。

从v1.0开始,我们增加了一些附属于顶点的累加器可以更新的情况。 如果某些顶点被之前某些已被选中顶点的附属于顶点的累加器所引用,那么这些顶点就可以在ACCUM子句中更新它们自己的附属于顶点的累加器(但不能在POST-ACCUM子句中更新)。 也就是说,一个顶点如果被之前选中的顶点所引用,则它就会被更新,但仍然有一些限制。下面的 一些例子将有助于理解这种更加复杂的情况。

  • 假设某个查询声明了一个保存了顶点信息的附属于顶点的累加器。 我们称之为顶点保持累加器(vertex-holdingaccumulator)。 它可以采取几种形式:

    • 标量累加器,例如MaxAccum <VERTEX> @maxV;

    • 集合累加器:例如,ListAccum <VERTEX> @listV;

    • 包含元组的累加器,其中元组类型包含VERTEX字段。

  • 如果一个顶点V被选中,则不仅可以更新V的累加器,而且还可以在ACCUM子句中,更新存储在V顶点的顶点保持累加器中的顶点。

  • 在使用这些间接引用的顶点之前,需要激活它们。 有两种方法可以激活间接顶点:

    • 首先将顶点保持累加器的顶点分配给局部顶点变量。 然后可以通过局部顶点变量更新顶点。

ACCUM
 VERTEX<person> mx = tgt.@maxV,   # assign to local variable
         mx.@curId += src.id      # access via local variable
  • 一个FOREACH循环可以迭代顶点保持累加器。 顶点当下可以通过循环变量被更新。

ACCUM
 FOREACH vtx IN src.@setIds DO   # iterate on collection accumulator
     vtx.@curId += tgt.id        # access via loop variable
 END

下面的使用方法在新规则下已作废

  • 间接激活的顶点可能无法在POST-ACCUM子句中或SELECT语句之外更新。

  • 将顶点作为输入参数传递给查询不是激活的途径。

  • 使用全局顶点保持累加器不是激活的途径。

  • 如果通过把顶点赋值给局部变量(例如,在ACCUM或POST-ACCUM中声明的变量)的方式间接激活顶点,请注意以下规则,该规则始终适用于所有局部变量:

    • 声明和初始化局部变量的动作只能在ACCUM语句块中做一次。 它不能在ACCUM语句块中重新声明或重新赋值。

下面的查询演示了如何更新间接激活的顶点

更新一个间接引用的顶点
CREATE QUERY vUpdateIndirectAccum() FOR GRAPH socialNet {
 
 SetAccum<VERTEX<person>> @posters;
 SetAccum<VERTEX<person>> @fellows;
 
  Persons = {person.*};
  # To each post, attach a list of persons who liked the post
  likedPosts = SELECT p
      FROM Persons:src -(liked:e)-> post:p
      ACCUM
      ​p.@posters += src;
 
   # To each person who liked a post, attach a list of everyone
   # who also liked one of this person's liked posts.
​likedPosts = SELECT src
​​FROM likedPosts:src
​​ACCUM
​​  FOREACH v IN src.@posters DO
​​    v.@fellows += src.@posters
​​  END
       ORDER BY src.subject;
​​    
​PRINT Persons[Persons.@fellows];
}
结果: 查询 vUpdateIndirectAccums
GSQL > RUN QUERY vUpdateIndirectAccess()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"Persons": [
   {
     "v_id": "person4",
     "attributes": {"Persons.@fellows": [
       "person8",
       "person4"
     ]},
     "v_type": "person"
   },
   {
     "v_id": "person3",
     "attributes": {"Persons.@fellows": [ "person2", "person1", "person3" ]},
     "v_type": "person"
   },
   {
     "v_id": "person7",
     "attributes": {"Persons.@fellows": ["person7"]},
     "v_type": "person"
   },
   {
     "v_id": "person1",
     "attributes": {"Persons.@fellows": [ "person2", "person1", "person3" ]},
     "v_type": "person"
   },
   {
     "v_id": "person5",
     "attributes": {"Persons.@fellows": ["person5"]},
     "v_type": "person"
   },
   {
     "v_id": "person6",
     "attributes": {"Persons.@fellows": ["person6"]},
     "v_type": "person"
   },
   {
     "v_id": "person2",
     "attributes": {"Persons.@fellows": [ "person2", "person1", "person3" ]},
     "v_type": "person"
   },
   {
     "v_id": "person8",
     "attributes": {"Persons.@fellows": [ "person8", "person4" ]},
     "v_type": "person"
   }
 ]}]
}

ACCUM和POST-ACCUM的例子

我们现在展示几个例子。 它们演示了如何使用ACCUM或POST-ACCUM来计算给定集合中的顶点数。

Accum 和PostAccum 的语法
#Show Accum PostAccum Behavior
CREATE QUERY accumPostAccumSemantics() FOR GRAPH workNet {
 
 SumAccum<INT> @@vertexOnlyAccum;
 SumAccum<INT> @@vertexOnlyPostAccum;
 
 SumAccum<INT> @@vertexOnlyWhereAccum;
 SumAccum<INT> @@vertexOnlyWherePostAccum;
 
 SumAccum<INT> @@sourceWithEdgeAccum;
 SumAccum<INT> @@sourceWithEdgePostAccum;
 
 SumAccum<INT> @@targetWithEdgeAccum;
 SumAccum<INT> @@targetWithEdgePostAccum;
 
 #Seed start set with all company vertices
 start = {company.*};
 
 #Select all vertices in source set start
 selectVertexSet = SELECT v from start:v
​​​​​#Happens once for each vertex discovered
​​​​​ACCUM @@vertexOnlyAccum += 1
 
​​​​​#Happens once for each vertex in the result set "v"
​​​​​POST-ACCUM @@vertexOnlyPostAccum += 1;
 
 #Select all vertices in source set start with a where constraint
 selectVertexSetWhere = SELECT v from start:v WHERE (v.country == "us")
​​​​​​#Happens once for each vertex discovered that also
​​​​​​# meets the constraint condition
​​​​​​ACCUM @@vertexOnlyWhereAccum += 1
​​​
​​​​​​#Happens once for each vertex in the result set "v"
​​​​​​POST-ACCUM @@vertexOnlyWherePostAccum += 1;
 
 #Select all source "s" vertices in set start and explore all "worksFor" edge paths
 selectSourceWithEdge = SELECT s from start:s -(worksFor)-> :t
​​  ​​       ​ #Happens once for each "worksFor" edge discovered
​​​​​​ ACCUM @@sourceWithEdgeAccum += 1
 
​​​​​​#Happens once for each vertex in result set "s" (source)
​​​​​​POST-ACCUM @@sourceWithEdgePostAccum += 1;
 
 #Select all target "t" vertices found from exploring all "worksFor" edge paths from set start
 selectTargetWithEdge = SELECT t from start:s -(worksFor)-> :t
​​​​​​ #Happens once for each "worksFor" edge discovered​
​​​​​​ ACCUM @@targetWithEdgeAccum += 1
 
​​​​​​ #Happens once for each vertex in result set "t" (target)
​​​​​​ POST-ACCUM @@targetWithEdgePostAccum += 1;
 
 PRINT @@vertexOnlyAccum;
 PRINT @@vertexOnlyPostAccum;
 
 PRINT @@vertexOnlyWhereAccum;
 PRINT @@vertexOnlyWherePostAccum;
 
 PRINT @@sourceWithEdgeAccum;
 PRINT @@sourceWithEdgePostAccum;
 
 PRINT @@targetWithEdgeAccum;
 PRINT @@targetWithEdgePostAccum;
}
accumPostAccumSemantics 的结果
GSQL > RUN QUERY accumPostAccumSemantics()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [
   {"@@vertexOnlyAccum": 5},
   {"@@vertexOnlyPostAccum": 5},
   {"@@vertexOnlyWhereAccum": 2},
   {"@@vertexOnlyWherePostAccum": 2},
   {"@@sourceWithEdgeAccum": 17},
   {"@@sourceWithEdgePostAccum": 5},
   {"@@targetWithEdgeAccum": 17},
   {"@@targetWithEdgePostAccum": 12}
 ]
}

此示例使用ACCUM语句查找用户发布的所有主题。

顶点ACCUM的例子
# For each person, make a list of all their post subjects
CREATE QUERY userPosts() FOR GRAPH socialNet {
 ListAccum<STRING> @personPosts;
 start = {person.*};
 
 # Find all user post topics and append them to the vertex list accum
 userPostings = SELECT s FROM start:s -(posted)-> :g
                ACCUM s.@personPosts += g.subject;
 
 PRINT userPostings;
}
查询 userPosts的结果
GSQL > RUN QUERY userPosts()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"userPostings": [
   {
     "v_id": "person4",
     "attributes": {
       "gender": "Female",
       "@personPosts": ["cats"],
       "id": "person4"
     },
     "v_type": "person"
   },
   {
     "v_id": "person3",
     "attributes": {
       "gender": "Male",
       "@personPosts": ["query languages"],
       "id": "person3"
     },
     "v_type": "person"
   },
   {
     "v_id": "person7",
     "attributes": {
       "gender": "Male",
       "@personPosts": [ "cats", "tigergraph" ],
       "id": "person7"
     },
     "v_type": "person"
   },
   {
     "v_id": "person1",
     "attributes": {
       "gender": "Male",
       "@personPosts": ["Graphs"],
       "id": "person1"
     },
     "v_type": "person"
   },
/*** other vertices omitted ***/
 ]}]
}

此示例展示每个人自己发布的顶点(即personposts顶点)以及每个人表达喜爱的行为方式(即liked边)。

ACCUM<VERTEX> 和ACCUM<EDGE>的例子
# Show each user's post and liked post time
CREATE QUERY userPosts2() FOR GRAPH socialNet {
 ListAccum<VERTEX> @personPosts;
 ListAccum<EDGE> @personLikedInfo;
 start = {person.*};
 
 # Find all user post topics and append them to the vertex list accum
 userPostings = SELECT s FROM start:s -(posted)-> :g
                ACCUM s.@personPosts += g;
 
 userPostings = SELECT s from start:s -(liked:e)-> :g
                ACCUM s.@personLikedInfo += e;
 
 PRINT start;
}
查询userPosts2的结果
GSQL > RUN QUERY userPosts2()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"start": [
   {
     "v_id": "person4",
     "attributes": {
       "gender": "Female",
       "@personPosts": ["3"],
       "id": "person4",
       "@personLikedInfo": [{
         "from_type": "person",
         "to_type": "post",
         "directed": true,
         "from_id": "person4",
         "to_id": "4",
         "attributes": {"actionTime": "2010-01-13 03:16:05"},
         "e_type": "liked"
       }]
     },
     "v_type": "person"
   },
   {
     "v_id": "person7",
     "attributes": {
       "gender": "Male",
       "@personPosts": [ "9", "6" ],
       "id": "person7",
       "@personLikedInfo": [{
         "from_type": "person",
         "to_type": "post",
         "directed": true,
         "from_id": "person7",
         "to_id": "10",
         "attributes": {"actionTime": "2010-01-12 11:22:05"},
         "e_type": "liked"
       }]
     },
     "v_type": "person"
   },
   {
     "v_id": "person1",
     "attributes": {
       "gender": "Male",
       "@personPosts": ["0"],
       "id": "person1",
       "@personLikedInfo": [{
         "from_type": "person",
         "to_type": "post",
         "directed": true,
         "from_id": "person1",
         "to_id": "0",
         "attributes": {"actionTime": "2010-01-11 11:32:00"},
         "e_type": "liked"
       }]
     },
     "v_type": "person"
   },
/*** other vertices omitted ***/
 ]}]
}

这个示例计算了每个话题(topic)被用过的次数总和。

全局ACCUM的例子
# Show number of total posts by topic
CREATE QUERY userPostsByTopic() FOR GRAPH socialNet {
 MapAccum<STRING, INT> @@postTopicCounts;
 start = {person.*};
 
 # Append subject and update the appearance count in the global map accum
 posts = SELECT g FROM start -(posted)-> :g
​​  ACCUM @@postTopicCounts += (g.subject -> 1);
 
 PRINT @@postTopicCounts;
}
查询userPostsByTopic的结果
GSQL > RUN QUERY userPostsByTopic()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"@@postTopicCounts": {
   "cats": 5,
   "coffee": 1,
   "query languages": 1,
   "Graphs": 2,
   "tigergraph": 3
 }}]
}

这是一个使用ACCUM和POST-ACCUM的示例。示例中, ACCUM遍历了整个图,并查找在同一国家生活和工作的所有人。在确定了这些人之后,POST-ACCUM检查每个顶点(即每个person)以确认他们是否在他们居住的地方工作。

顶点POST-ACCUM的例子
#Show all person who both work and live in the same country
CREATE QUERY residentEmployees() FOR GRAPH workNet {
 
 ListAccum<STRING> @company;
 OrAccum @worksAndLives;
 
 start = {person.*};
 
 
 employees = SELECT s FROM start:s -(worksFor)-> :c
             #If a person works for a company in the same country where they live
             # add the company to the list
             ACCUM CASE WHEN (s.locationId == c.country) THEN
                          s.@company += c.id
                        END
 
             #Check each vertex and see if a person works where they live
             POST-ACCUM CASE WHEN (s.@company.size() > 0) THEN
                          s.@worksAndLives += True
                        ELSE
                          s.@worksAndLives += False
                        END;
 
 PRINT employees WHERE (employees.@worksAndLives == True);
}
residentEmployees 的结果
GSQL > RUN QUERY residentEmployees()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"employees": [
   {
     "v_id": "person11",
     "attributes": {
       "interestList": [
         "sport",
         "football"
       ],
       "skillSet": [10],
       "skillList": [10],
       "@worksAndLives": true,
       "locationId": "can",
       "interestSet": [ "football", "sport" ],
       "id": "person11",
       "@company": ["company5"]
     },
     "v_type": "person"
   },
   {
     "v_id": "person10",
     "attributes": {
       "interestList": [ "football", "sport" ],
       "skillSet": [3],
       "skillList": [3],
       "@worksAndLives": true,
       "locationId": "us",
       "interestSet": [ "sport", "football" ],
       "id": "person10",
       "@company": ["company1"]
     },
     "v_type": "person"
   },
   {
     "v_id": "person1",
     "attributes": {
       "interestList": [ "management", "financial" ],
       "skillSet": [ 3, 2, 1 ],
       "skillList": [ 1, 2, 3 ],
       "@worksAndLives": true,
       "locationId": "us",
       "interestSet": [ "financial", "management" ],
       "id": "person1",
       "@company": ["company1"]
     },
     "v_type": "person"
   },
   {
     "v_id": "person2",
     "attributes": {
       "interestList": ["engineering"],
       "skillSet": [ 6, 5, 3, 2 ],
       "skillList": [ 2, 3, 5, 6 ],
       "@worksAndLives": true,
       "locationId": "chn",
       "interestSet": ["engineering"],
       "id": "person2",
       "@company": ["company2"]
     },
     "v_type": "person"
   }
 ]}]
}

这是POST-ACCUM的一个示例,仅计算具有特定性别的人数。

全局POST-ACCUM的示例
#Count the number of person of​a given​gender
CREATE QUERY personGender(STRING gender) FOR GRAPH socialNet {
 
 SumAccum<INT> @@genderCount;
 
 start = {ANY};
 
 # Select all person vertices and check the gender attribute
 friends = SELECT v FROM start:v
           WHERE v.type == "person"
 
           POST-ACCUM CASE WHEN (start.gender == gender) THEN
                        @@genderCount += 1
                      END;
 
 PRINT @@genderCount;
}
查询personGender的结果
GSQL > RUN QUERY personGender("Female")
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"@@genderCount": 3}]
}

HAVING子句

HAVING子句是可选的。它对SELECT的结果集进行约束。 与WHERE子句不同,HAVING语句中的约束在ACCUM和POST-ACCUM操作之后应用,而后者在ACCUM和POST-ACCUM操作之前应用。

HAVING子句的EBNF范式
havingClause := HAVING condition

只有当存在ACCUM或POST-ACCUM子句时,用户才能选用HAVING子句。 对应的筛选条件应用于SELECT结果集中的每个顶点(源顶点或目标顶点)之上,这些顶点也必须满足FROM和WHERE条件。 HAVING子句用于测试在ACCUM或POST-ACCUM子句中更新了的一个或多个累加器变量,它可以是布尔值中的一个。 如果某个顶点的判断结果为假,则该顶点从结果集中排除。

以下示例演示如何使用HAVING子句,根据在ACCUM子句中更新的顶点累加器变量来约束结果集。

例1. HAVING
# find all persons meeting a given activityThreshold, based on how many posts or likes a person has made
CREATE QUERY activeMembers(int activityThreshold) FOR GRAPH socialNet
{
       SumAccum<int> @activityAmount;
       start = {person.*};
       result = SELECT v FROM start:v -(:e)-> post:tgt
                         ACCUM v.@activityAmount +=1
​                      HAVING v.@activityAmount >= activityThreshold;
       PRINT result;
}

如果activityThreshold 参数被设置为3,则该查询的结果返回为5个顶点。

例 1 的结果
GSQL > RUN QUERY activeMembers(3)
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"result": [
   {
     "v_id": "person7",
     "attributes": {
       "gender": "Male",
       "@activityAmount": 3,
       "id": "person7"
     },
     "v_type": "person"
   },
   {
     "v_id": "person5",
     "attributes": {
       "gender": "Female",
       "@activityAmount": 3,
       "id": "person5"
     },
     "v_type": "person"
   },
   {
     "v_id": "person6",
     "attributes": {
       "gender": "Male",
       "@activityAmount": 3,
       "id": "person6"
     },
     "v_type": "person"
   },
   {
     "v_id": "person2",
     "attributes": {
       "gender": "Female",
       "@activityAmount": 3,
       "id": "person2"
     },
     "v_type": "person"
   },
   {
     "v_id": "person8",
     "attributes": {
       "gender": "Male",
       "@activityAmount": 3,
       "id": "person8"
     },
     "v_type": "person"
   }
 ]}]
}

如果activityThreshold参数设置为2,则查询将返回8个顶点。 若改成 4,则查询将不返回任何顶点。

以下示例演示了SELECT语句的等效性,其中HAVING子句的条件判断始终为真。

例2:HAVING与条件的使用
# find all person meeting a given activityThreshold, based on how many posts or likes a person has made
CREATE QUERY printMemberActivity() FOR GRAPH socialNet
{
       SumAccum<int> @activityAmount;
       start = {person.*};
 
​​### --- equivalent statements -----
       result = SELECT v FROM start:v -(:e)-> post:tgt
​​​​​      ACCUM v.@activityAmount +=1
​​​​​      HAVING true;
 
​​result = SELECT v FROM start:v -(:e)-> post:tgt
​​​​​      ACCUM v.@activityAmount +=1;
​​### -----
 
       PRINT result;
}
查询printMemberActivity的结果
GSQL > RUN QUERY printMemberActivity()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"result": [
   {
     "v_id": "person4",
     "attributes": {
       "gender": "Female",
       "@activityAmount": 4,
       "id": "person4"
     },
     "v_type": "person"
   },
   {
     "v_id": "person3",
     "attributes": {
       "gender": "Male",
       "@activityAmount": 4,
       "id": "person3"
     },
     "v_type": "person"
   },
   {
     "v_id": "person7",
     "attributes": {
       "gender": "Male",
       "@activityAmount": 6,
       "id": "person7"
     },
     "v_type": "person"
   },
   {
     "v_id": "person1",
     "attributes": {
       "gender": "Male",
       "@activityAmount": 4,
       "id": "person1"
     },
     "v_type": "person"
   },
   {
     "v_id": "person5",
     "attributes": {
       "gender": "Female",
       "@activityAmount": 6,
       "id": "person5"
     },
     "v_type": "person"
   },
   {
     "v_id": "person6",
     "attributes": {
       "gender": "Male",
       "@activityAmount": 6,
       "id": "person6"
     },
     "v_type": "person"
   },
   {
     "v_id": "person2",
     "attributes": {
       "gender": "Female",
       "@activityAmount": 6,
       "id": "person2"
     },
     "v_type": "person"
   },
   {
     "v_id": "person8",
     "attributes": {
       "gender": "Male",
       "@activityAmount": 6,
       "id": "person8"
     },
     "v_type": "person"
   }
 ]}]
}

以下示例展示了等效的WHERE与HAVING的约束后结果集。前文中提到过,用户需要在ACCUM之前使用WHERE子句,或在ACCUM之后使用HAVING子句。 两者都基于顶点必须满足的条件来约束结果集。

例 3. HAVING vs. WHERE
# Compute the total post activity for each male person.
# Because the gender of the vertex does not change, evaluating whether the person vertex
# is male before (WHERE) the ACCUM clause or after (HAVING) the ACCUM clause does not
# change the result. However, if the condition in the HAVING clause could change within
# the ACCUM clause, these statements would produce different results.
 
CREATE QUERY activeMaleMembers() FOR GRAPH socialNet
{
   SumAccum<INT> @activityAmount;
   start = {person.*};
 
   ### --- statements produce equivalent results
   result1 = SELECT v FROM start:v -(:e)-> post:tgt
                     WHERE v.gender == "Male"
                     ACCUM v.@activityAmount +=1;
 
   result2 = SELECT v FROM start:v -(:e)-> post:tgt
                     ACCUM v.@activityAmount +=1
                     HAVING v.gender == "Male";
 
   PRINT result2[result2.@activityAmount];
   PRINT result2[result2.@activityAmount];
}
查询ActiveMaleMembers的结果
GSQL > RUN QUERY activeMaleMembers()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [
   {"result2": [
     {
       "v_id": "person3",
       "attributes": {"result2.@activityAmount": 4},
       "v_type": "person"
     },
     {
       "v_id": "person7",
       "attributes": {"result2.@activityAmount": 6},
       "v_type": "person"
     },
     {
       "v_id": "person1",
       "attributes": {"result2.@activityAmount": 4},
       "v_type": "person"
     },
     {
       "v_id": "person6",
       "attributes": {"result2.@activityAmount": 6},
       "v_type": "person"
     },
     {
       "v_id": "person8",
       "attributes": {"result2.@activityAmount": 6},
       "v_type": "person"
     }
   ]},
   {"result2": [
     {
       "v_id": "person3",
       "attributes": {"result2.@activityAmount": 4},
       "v_type": "person"
     },
     {
       "v_id": "person7",
       "attributes": {"result2.@activityAmount": 6},
       "v_type": "person"
     },
     {
       "v_id": "person1",
       "attributes": {"result2.@activityAmount": 4},
       "v_type": "person"
     },
     {
       "v_id": "person6",
       "attributes": {"result2.@activityAmount": 6},
       "v_type": "person"
     },
     {
       "v_id": "person8",
       "attributes": {"result2.@activityAmount": 6},
       "v_type": "person"
     }
   ]}
 ]
}

以下示例包含编译错误,因为结果集取自源顶点,但HAVING语句的条件正在检查目标顶点。

例 4. HAVING 语句中包含错误的顶点集
# find all person having a post subject about cats
# This query is illegal because the having condition is testing the wrong vertex set
CREATE QUERY printMemberAboutCats() FOR GRAPH socialNet
{
       start = {person.*};
 
       result = SELECT v FROM start:v -(:e)-> post:tgt
                         HAVING tgt.subject == "cats";
       PRINT result;
}
printMemberAboutCats的编译错误
> gsql printMemberAboutCats.gsql
Semantic Check Error in query printMemberAboutCats (SEM-50): line 8, col 33
The SELECT block selects src, but the HAVING clause uses tgt

排序子句(ORDER BY)

排序子句是可选的,通过它可以对结果集进行排序。

排序子句的EBNF范式
orderClause := ORDER BY expr [ASC | DESC]["," expr [ASC | DESC]]*

ASC表示升序(最小值排第一),DESC表示降序(最大值排第一)。 如果两者均未指定,则默认使用升序。 每个排序项表达式(expr)必须引用结果集中成员的属性或累加器,并且排序项表达式的运算结果必须为可排序值(例如数字或字符串)。 排序子句可以包含一系列以逗号分隔的表达式,从而提供多层次排序功能,即最左边的排序项表达式最先排序。 若某个排序项表达式的排序结果中有数个元素顺序相等,则由下一个层次的表达式定义排序方式。 结果集中任何无法被排序的项(例如排序表达式与它们无关)将出现在已排序项之后。

下面的示例演示了排序子句与表达式的使用方法。 返回的顶点集首先按顶点的朋友数(numFriends)排序,然后按该顶点的同事数(numCoworkers)排序。

topPopular.gsql: 降序排序
# find the most popular people, sorting first based on the number as friends
# and then in case of a tie by the number of coworkers
CREATE QUERY topPopular() FOR GRAPH friendNet
{
​SumAccum<INT> @numFriends;
​SumAccum<INT> @numCoworkers;
​start = {person.*};
 
​result = SELECT v FROM start -((friend|coworker):e)-> person:v
​       ​ ACCUM CASE WHEN e.type == "friend" THEN v.@numFriends += 1
​​       ​    WHEN e.type == "coworker" THEN v.@numCoworkers += 1
​​       END
​​ ORDER BY v.@numFriends DESC, v.@numCoworkers DESC;
 
​PRINT result;
}
topPopular.json
GSQL > RUN QUERY topPopular()
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"result": [
   {
     "v_id": "person9",
     "attributes": {
       "@numCoworkers": 3,
       "@numFriends": 5,
       "id": "person9"
     },
     "v_type": "person"
   },
   {
     "v_id": "person8",
     "attributes": {
       "@numCoworkers": 1,
       "@numFriends": 4,
       "id": "person8"
     },
     "v_type": "person"
   },
   {
     "v_id": "person12",
     "attributes": {
       "@numCoworkers": 1,
       "@numFriends": 4,
       "id": "person12"
     },
     "v_type": "person"
   },
   {
     "v_id": "person6",
     "attributes": {
       "@numCoworkers": 4,
       "@numFriends": 3,
       "id": "person6"
     },
     "v_type": "person"
   },
   {
     "v_id": "person1",
     "attributes": {
       "@numCoworkers": 3,
       "@numFriends": 3,
       "id": "person1"
     },
     "v_type": "person"
   },
   {
     "v_id": "person4",
     "attributes": {
       "@numCoworkers": 5,
       "@numFriends": 2,
       "id": "person4"
     },
     "v_type": "person"
   },
   {
     "v_id": "person3",
     "attributes": {
       "@numCoworkers": 3,
       "@numFriends": 2,
       "id": "person3"
     },
     "v_type": "person"
   },
   {
     "v_id": "person2",
     "attributes": {
       "@numCoworkers": 3,
       "@numFriends": 2,
       "id": "person2"
     },
     "v_type": "person"
   },
   {
     "v_id": "person10",
     "attributes": {
       "@numCoworkers": 1,
       "@numFriends": 2,
       "id": "person10"
     },
     "v_type": "person"
   },
   {
     "v_id": "person7",
     "attributes": {
       "@numCoworkers": 6,
       "@numFriends": 1,
       "id": "person7"
     },
     "v_type": "person"
   },
   {
     "v_id": "person5",
     "attributes": {
       "@numCoworkers": 5,
       "@numFriends": 1,
       "id": "person5"
     },
     "v_type": "person"
   },
   {
     "v_id": "person11",
     "attributes": {
       "@numCoworkers": 1,
       "@numFriends": 1,
       "id": "person11"
     },
     "v_type": "person"
   }
 ]}]
}

限制子句(LIMIT)

LIMIT子句是可选的。它对最终结果集中包含的项目数量和排名进行约束。

LIMIT子句的EBNF范式
limitClause := LIMIT ( expr | expr "," expr | expr OFFSET expr )

每个限制条件表达式的结果必须得到一个非负整数。想要理解LIMIT子句,我们首先要假定有一系列的顶点被存储在计算机中,作为临时的结果集(即等待被处理的结果集)。 如果查询中包含了排序子句,则查询结果的顺序是已定的; 否则,该结果集的排列顺序未知。 假设我们将顶点编号为v1,v2,...,vn,则LIMIT子句会从临时的结果集中筛选出一部分的顶点组成新的集合,该筛选动作从列表中的较低位置开始,到较高的位置结束。

它有三种形式:

LIMIT子句的应用场景
result = SELECT v FROM S -(:e)-> :v LIMIT k;​​  # case 1: k = Count
result = SELECT v FROM S -(:e)-> :v LIMIT j, k; ​  # case 2: j = Offset from the start of the list, k = Count
result = SELECT v FROM S -(:e)-> :v LIMIT k OFFSET j; # case 3: k = Count, j = Offset from the start of the list

模式 1: LIMIT k

  • 当提供单个表达式时,LIMIT返回临时结果集中的前k个元素。 如果可用的元素少于k个,则将在结果集中返回所有元素。 如果k= 5并假定结果集包含至少5个元素,则最终结果列表将是[v1,v2,v3,v4,v5]。

模式 2: LIMIT j, k

  • 假设有两个表达式,当中用逗号分隔,则LIMIT会将第一个表达式的j认定为一个偏移量。 也就是说,它会跳过最先的j个元素。第二个表达式k表示包含的元素数量的最大值。假设一个列表中至少有7个元素,则LIMIT 2,5 的计算结果就是[v3,v4,v5,v6,v7]。

模式 3: LIMIT k OFFSET j

  • 模式3和模式2相同,区别仅仅在于语法的不同。关键词OFFSET分隔了2个表达式,但相比于模式2,k和j的位置对换了。假设某个列表至少有7个元素,则LIMIT 5 OFFSET 2的输出为[v3,v4,v5,v6,v7]

如果任何表达式的值为负整数,则结果无意义。

OFFSET只能用于已知顺序的结果集。 如果在没有排序子句的情况下使用OFFSET,会导致编译时错误。

以下示例演示了LIMIT子句的各种形式。

第一个示例显示LIMIT子句用作上限时返回的结果集,其中最大的大小限制为4个元素。

limitEx1.gsql: 按照某个上限执行LIMIT
CREATE QUERY limitEx1(INT k) FOR GRAPH friendNet
{
       start = {person.*};
 
       result1 = SELECT v FROM start:v
               ORDER BY v.id
               LIMIT k;
 
​​PRINT result1[result1.id]; // api v2
}
limit1Ex.json 的结果
GSQL > RUN QUERY limitEx1(4)
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"result1": [
   {
     "v_id": "person1",
     "attributes": {"result1.id": "person1"},
     "v_type": "person"
   },
   {
     "v_id": "person10",
     "attributes": {"result1.id": "person10"},
     "v_type": "person"
   },
   {
     "v_id": "person11",
     "attributes": {"result1.id": "person11"},
     "v_type": "person"
   },
   {
     "v_id": "person12",
     "attributes": {"result1.id": "person12"},
     "v_type": "person"
   }
 ]}]
}

以下示例显示如何将LIMIT子句与偏移量一起使用。

limit2Ex.gsql: 按照下限和大小执行LIMIT
CREATE QUERY limitEx2(INT j, INT k) FOR GRAPH friendNet
{
       start = {person.*};
       result2 = SELECT v FROM start:v
               ORDER BY v.id
               LIMIT j, k;
 
       PRINT result2[result2.id]; // api v2
}
limit2Ex.json 的结果
GSQL > RUN QUERY limitEx2(2,3)
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"result2": [
   {
     "v_id": "person11",
     "attributes": {"result2.id": "person11"},
     "v_type": "person"
   },
   {
     "v_id": "person12",
     "attributes": {"result2.id": "person12"},
     "v_type": "person"
   },
   {
     "v_id": "person2",
     "attributes": {"result2.id": "person2"},
     "v_type": "person"
   }
 ]}]
}

以下示例中,带有偏移量的限制子句还有另一个做法。 这次我们尝试更大的偏移量和大小值。 在大型数据集中,limitTest(5,20)应该返回20个顶点,那是若是原始数据中没有25个顶点,则输出会少于20个顶点。

limit3Ex.gsql:带有OFFSET的LIMIT
CREATE QUERY limitEx3(INT j, INT k) FOR GRAPH friendNet
{
       start = {person.*};
 
       result3 = SELECT v FROM start:v
               ORDER BY v.id
               LIMIT k OFFSET j;
 
       PRINT result3[result3.id]; // api v2
}
limit3Ex.json 的结果
GSQL > RUN QUERY limitEx3(5,20)
{
 "error": false,
 "message": "",
 "version": {
   "schema": 0,
   "api": "v2"
 },
 "results": [{"result3": [
   {
     "v_id": "person3",
     "attributes": {"result3.id": "person3"},
     "v_type": "person"
   },
   {
     "v_id": "person4",
     "attributes": {"result3.id": "person4"},
     "v_type": "person"
   },
   {
     "v_id": "person5",
     "attributes": {"result3.id": "person5"},
     "v_type": "person"
   },
   {
     "v_id": "person6",
     "attributes": {"result3.id": "person6"},
     "v_type": "person"
   },
   {
     "v_id": "person7",
     "attributes": {"result3.id": "person7"},
     "v_type": "person"
   },
   {
     "v_id": "person8",
     "attributes": {"result3.id": "person8"},
     "v_type": "person"
   },
   {
     "v_id": "person9",
     "attributes": {"result3.id": "person9"},
     "v_type": "person"
   }
 ]}]
}

Last updated