Study Note on “Information Lattice Learning”

8 minute read

Published:

Information Lattice Learning1 (ILL) is a journal paper published in 2023 trying to address the explainability challenge of AI. Instead of predicting data X and target Y, it focuses on explaining the so-called “what makes X an X” (or “what makes X differ from Y”). Interestingly, the “rules” of the signal of this work seem to be only interpretable to domain experts, but not for everyone. This also reflects one of the notions of interpretability considered in the paper - interpretable to whom. This study note is another layer of interpretation yet hopefully “easier” to understand.

Table of Contents generated with DocToc

Background

Often traditional unsupervised learning techniques such as clustering and dimensionality reduction don’t give good human-level interpretation. For a given dataset sample, clustering has no knowledge of absent data outside of the seen dataset and forms present data-specific clusters that have no special human-level reasons to explain. Dimensionality reductions similarly provide no explanation of what the reduced dimensions mean, and how to interpret and evaluate new data is unclear. ILL states that knowledge discovery is not just compression and clustering.

Terminology

It’s not hard to find out that one of the difficulties of digesting this paper is understanding and memorizing the terminology and symbols. You may need to double-check alien terms back and forth while reading, just like the case where at some point you need to go back and check what ILL means throughout this note. Annoying, isn’t it? Somehow this is the way how humans convey ideas. And ILL provides food for thought.

Let’s jump to the terminology part and summarize them in the following.

Signal

  • A function ξ : X, and also represent any probability distribution. ξ (xi) is signal, X (Chi) is domain. One example is Xn.

Partition

  • A partition P of a set X to denote an abstraction of X. For example, the following abstractions of the six outcomes of a die roll {{1,2},{3,4},{5,6}}, {{1,2,3,4},{5,6}}, {{1,3,5},{2,4,6}} are partitions of the set X= {1,2,3,4,5,6}.

Rules

  • A rule of a signal is a “coarsened” (or merged, collectively) signal rξ : P on a partition P of X.

Partition lattice

  • Parttion lattice is the partially ordered set(or called poset) of a set containing all partitions of X, denoted as (βX,≼).

If you find the same confusion about what is poset as I do, I will try to explain it in a later section.

From the finest partition (every element is a partition) to the coarest partition (all elements are in one single partition) of a set, you can draw a Hasse diagram representing the relationships that connect partitions with direct “finer than” or “coarser than” relation. The Hasse diagram is a DAG graph.

Information lattice

  • The information lattice of a signal is the poset of all rules of that signal. Also, an information lattice of a signal is isomorphic to the underlying partition lattice via projection.

A sentence like this contains too many alien terms to understand and effectively explains nothing. One learning goal is to make this sentence sensible after explanation. See the discussion section.

Projection

  • An operator that projects a signal down to a rule.

Lifting

  • An operator that lifts rule(s) up to a signal.

What is Lattice?

You probably need an extra explanation about what is lattice. The following is an external study on 2 3 4 and more.

Lattice theory is a textbook-long content, and let’s try to understand lattice through another set of terminology….

Partial Order Relation

By memorizing the key concepts of the definition5 of it - a relation R on a set A is called a partial order relation if it is (1) reflexive, (2) anti-symmetric, (3) transitive, let’s walk through examples directly.

  • Example 1: For set A = {a, b}, relation R = {(a, a), (b, b), (a, b)} is a partial order relation.

  • Example 2: For set A = {1, 2, 3, 4}, relation R = {(3, 4), (4, 3)} is NOT a partial order relation since the anti-symmetric condition is not satisfied.

Remember that a relation can be described by a set of pairs where elements of each pair are from the set A. Of course relations like “<” that implicitly means an infinite set of pairs with such relation are intuitive alternatives, if available.

Partial Order Set (poset)

Then, a poset is a set A that is combined with a partial order relation.

set of natural numbers with divisibility

  • Example 1: Natural number set A: {1, 2, 3, …} with the relation R: “a ≤ b if a divides b” is a poset.

set of subsets

  • Example 2: A set A: {{1},{1,2},{1,2,3}} ordered by the relation R: subset inclusion ⊆, is a poset.

Lattice

Finally, lattice, denoted as (βX,≼), is a special type of poset that has the following additional properties. In a lattice, every pair of elements has both a unique least upper bound (called the join, or supremum) and a unique greatest lower bound (called the meet, or infimum).

  • Join, supremum, or least upper bound - The join of elements a and b, denoted by a ∨ b is the least element greater than or equal to both a and b.

  • Meet, infimum, or greatest lower bound - The meet of elements a and b, denoted by a ∧ b, is the greatest element less than or equal to both a and b.

Example of a lattice:

Consider the set L = {1,2,3,6} with the divisibility relation R:

Join: The join of 2 and 3 is 6 since 6 is the smallest number that is divisible by both 2 and 3. Meet: The meet of 2 and 6 is 2 since 2 is the largest number that divides both 2 and 6.

The set L with this R is a lattice.

What does Information Lattice Learning (ILL) mean?

ILL is a single optimization problem. Given a signal we want to explain, explaining means to search for a rule set such that it recovers (by projection) the signal well, and the rule set is simpler (i.e. it contains fewer and simpler rules). The search space involves the full information lattice, or isomorphically, the full partition lattice. The formal definition is skipped here. Note that a recovery may not be exact, a loss function to measure the mismatch of signal and recovered one is also introduced.

Due to not only computational concerns but also different level of interpretabilities of criterions an abstraction can made, computing all partitions of X is unrealistic. Practically, ILL can be achieved via 2 phases: (1) lattice construction and (2) lattice learning. Which I think is the key novelty of this work. The lattice construction phase contains a prior-driven stage, and such priors are universal and simpler and thus domain-agnostic. The lattice learning solves a relaxed version of an ILL problem - the to-be-searched search space is limited to antichains (whose elements are mutually incomparable) only, and this ILL work chooses an iterative approach

Discussion

1. Explain the definition of information lattice again.

Let’s break down the following sentence again:

  • “The information lattice of a signal is the poset of all rules of that signal. Also, an information lattice of a signal is isomorphic to the underlying partition lattice via projection.”

information lattice of a signal == a poset of all rules of that signal

You can imagine that a rule describes the connecting direction of two different partitions of a set. And these two partitions are comparable, i.e., have finer than or coarser than relationship to each other. Therefore, the information lattice can be represented by a DAG diagram (Hasse diagram) in which each node is a partition, and each edge is a rule.

2. Why choose lattice for the purpose of explaining AI?

I guess one simple explaination is that lattice allows the projection and lifting happen on signals. And such operations converts signals to rules back and forth, which in one sense is explaining the signal by realizing them in rules.

  1. Yu, Haizi, James A. Evans, and Lav R. Varshney. “Information lattice learning.” Journal of Artificial Intelligence Research 77 (2023): 971-1019.: https://www.jair.org/index.php/jair/article/view/14277 

  2. Lattice Theory Lecture 1: https://math.nmsu.edu/people/personal-pages/files/ESSLLI1.pdf 

  3. Partial Orders and Lattices(Set-2): https://www.geeksforgeeks.org/partial-orders-and-lattices-set-2-mathematics/ 

  4. Partial Orders and Lattices: https://www.geeksforgeeks.org/partial-orders-lattices/#lattices 

  5. Partial Order Relation on a Set: https://www.geeksforgeeks.org/partial-order-relation-on-a-set/