GQL

Author: Saqib Ali

  • Linear Composition in GQL

    The workhorse of Graph Query Language (GQL) is pattern matching. GQL is oblivious to how a graph is stored. The table resulting from pattern matching is manipulated by a sequence of operators that modify it in an imperative style that that is referred to as “Linear Composition” or “Sequential Composition”. In “Linear Composition” we can simply add clauses to the already existing query which apply new operations to the result of already processed clauses.

    In the following example we will demonstrate how a sequence of operators can continue after the RETURN clause.

    Let’s start with a simple Graph.

    Let’s say we interested in the nodes that have the longest path and then we want to find out all the paths between those two nodes. We can start with finding the longest path i.e. Saqib -> Angela -> Scott -> Uroosa and then use Linear Composition to add another Pattern Matching query to identify the all the paths between Saqib and Uroosa

    Let’s start by create some sample dataset:

    create graph social_graph{
    
     node person ({name string})
    , edge relationship (person)-[{type string}]->(person)
    };
    
    use social_graph;
    
    
    INSERT (n1:person {_id:'saqib', name:"Saqib"})
      , (n2:person {_id:'angela', name:"Angela"})
      , (n3:person {_id:'scott', name:"Scott"})
      , (n4:person {_id:'uroosa', name:"Uroosa"})
      , (n5:person {_id:'fatima', name:"Fatima"})
      , (n6:person {_id:'eleni', name:"Eleni"})
      , (n1)-[link1:relationship {type:"friend"}]->(n2)
      , (n2)-[link2:relationship {type:"co-worker"}]->(n3)
      , (n3)-[link3:relationship {type:"friend"}]->(n4)
      , (n1)-[link4:relationship {type:"co-worker"}]->(n5)
      , (n5)-[link5:relationship {type:"friend"}]->(n6)
      , (n1)-[link:relationship {type:"friend"}]->(n4)
    ;

    Now let’s find the Longest path using PATTERN MATCHING

    MATCH path = ALL  (a)-[links]->{1,}(b)
    RETURN a, b, PATH_LENGTH(path) as path_length
    ORDER BY path_length DESC limit 1

    This will result in:

    Next let’s add use Linear Composition to add another Pattern Matching query to identify the all the paths between Saqib and Uroosa

    
    MATCH path = ALL  (a)-[links]->{1,}(b)
    RETURN a, b, PATH_LENGTH(path) as path_length, path
    ORDER BY path_length DESC limit 1
    NEXT
    MATCH path = ALL (a)-[links]->{1,}(b)
    RETURN table(a.name, b.name, pedges(path));

    This will result in

    Observe that GQL processes results of pattern matching in sequential, or pipelined way where the output of each operation in a sequence serves as the input for the next operation. GQL standard refers to this as Linear Statement Composition or simple Linear Composition.

  • Complex Pattern Matching in GQL

    The workhorse of Graph Query Language (GQL) is pattern matching. Pattern matching is done by specifying one or more path patterns in the MATCH clause. A single path pattern matches a linear path of vertices and edges, while more complex patterns can be matched by combining multiple path patterns, separated by comma. In this blogpost we will explore complex patterns.

    We will use the following sample Graph of Restaurants, Likes, Cities and Persons

    Let’s create some sample data

    create graph restaurant_graph {
      node person({name string})
      , node restaurant({name string})
      , node country({name string})
      , node city({name string})
      , node state({name string})
      , relationship located_in ()-[]->()
      , relationship lives_in ()-[]->()
      , relationship likes ()-[]->()
    };
    
    use restaurant_graph;
    
    insert (n1:person {_id:'uroosa', name:'Uroosa'})
      , (n2:person {_id:'fatima', name:'Fatima'})
      , (n3:person {_id:'maryam', name:'Maryam'})
      , (n4:person {_id:'zainab', name:'Zainab'})
      , (n5:person {_id:'hannah', name:'Hannah'})
      , (n6:restaurant {_id:'vostok', name:'Vostok'})
      , (n7:restaurant {_id:'kusan', name:'Kusan '})
      , (n8:restaurant {_id:'navat', name:'Navat'})
      , (n9:city {_id:'Santa Clara', name:'Santa Clara'})
      , (n10:city {_id:'San Jose', name:'San Jose'})
      , (n11:city {_id:'Los Angeles', name:'Los Angeles'})
      , (n12:state {_id:'Calfornia', name:'California'})
      , (n1)-[:likes]->(n6)
      , (n4)-[:likes]->(n6)
      , (n1)-[:likes]->(n8)
      , (n2)-[:likes]->(n8)
      , (n3)-[:likes]->(n8)
    
      , (n1)-[:lives_in]->(n9)
      , (n2)-[:lives_in]->(n9)
      , (n3)-[:lives_in]->(n10)
      , (n4)-[:lives_in]->(n9)
      , (n5)-[:lives_in]->(n10)
      
      , (n6)-[:located_in]->(n9)
      , (n7)-[:located_in]->(n10)
      , (n8)-[:located_in]->(n11)
      , (n9)-[:located_in]->(n12);

    Let’s say we are interested in the finding all the Liked Restaurants where the Restaurant is in the same City where the Person who Liked it.

    match (n1:person)-[:lives_in]->(n2:city)
    , (n3:restaurant)-[:located_in]->(n2)
    , (n1)-[:likes]->(n3:restaurant)
    
    return table(n1.name, n2.name as lives_in, n3.name as restaurant);
    

    This will return

    Notice the use of 3 Paths separated to commas to match this complex pattern. Note that the MATCH pattern share the variable n1, n2, and n3 . This means that they are the same variable rather than three different variables. This is what enables us to express the original requirements as MATCH patterns.

    GQL Binding Variables (like n1, n2, and n3 in the aforementioned example) carry data between statements. When you reuse a variable name, GQL automatically ensures it refers to the same data, creating powerful join conditions.

    Now let’s say we interested in finding all the Liked Restaurants where the Restaurant is NOT in the same City where the Person who Liked it. We can use the NOT EXISTS clause for this.

    match (n1:person)-[:lives_in]->(n2:city)
    , (n1)-[:likes]->(n3:restaurant)
    WHERE NOT EXISTS {
        match (n3)-[:located_in]->(n2)
    }
    return table(n1.name, n2.name as lives_in, n3.name as restaurant);

    Observe that designing the same request in SQL using recursive Common Table Expressions would be way more complex, difficult to read, and so complicated to maintain and update.