Non-Standard Contract Clause Detection is Easy to Build, Hard to Get Right


Software can’t accurately identify non-standard contract clauses unless it’s highly accurate at identifying contract clauses in the first place.

Some organizations need to know how executed agreements differ from their template, and where they have agreed to non-standard clauses. Two potential approaches to identifying non-standard information in contracts are:

Identifying agreements that are non-standard (i.e., are not written exactly off a template, or deviate from approved templates).
Identifying non-standard clauses across a pool of agreements.

The previous Contract Review Software Buyer’s Guide post explained how a moderately-skilled technologist could build software to identify non-standard clauses and agreements over the course of a weekend. A simple system should be able to accurately identify non-standard agreements (as long as by “accuracy” you mean “recall”; it could feature a bunch of false positives). Accurate non-standard clause detection is trickier. While it can be quite easy to build a non-standard clause identification system, it’s a lot more work to build one that is actually accurate. Essentially, non-standard contract clause detection software must be highly accurate at identifying contract clauses in the first place, or it may miss truly non-standard clause examples. If not highly accurate, a non-standard clause detection system will only identify non-standard clauses that aren’t too non-standard.

The rest of this post will explain why.

A Simple Way To Get Non-Standard Contract Clause Detection

This post will only really make sense if you understand how a simple non-standard contract clause detection system works. Good news: the previous Contract Review Software Buyer’s Guide post gives details on one simple method to build a non-standard clause detection system. Here’s the basic idea:

The first step is stocking a database with standard provisions. Next, set up a diff-based system to compare all agreements for review with the standard clauses …. Then, set the diff comparison threshold to return all text that is [90][75][60]% similar to your examples. This will allow your system to also pick up provisions that are drafted slightly differently from those in your standard provisions database. So, for example, any new provision that is >70%-similar to a provision in the “most favored customer” database will be extracted as a most favored customer provision. To here, this is the same as building an automated contract abstraction system using a comparison approach. Here’s where things change: identical hits get classed as “standard” provisions, and all other hits, those that are >70% similar but not identical (e.g., a most favored customer hit that is 85% similar to a most favored customer provision in the standard clause database), get returned as “non-standard” provisions. You could even stock an additional database with non-standard most favored customer provision examples, and have the system also identify these hits as additional non-standard most favored customer provision results.
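The approach quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular product: it uses word-level similarity from the standard library's `difflib.SequenceMatcher` as the "diff" measure, and the names `tokens`, `similarity`, `classify`, the 0.70 threshold, and the sample most favored customer wording are all hypothetical choices made for this example.

```python
import re
from difflib import SequenceMatcher

def tokens(text):
    """Lowercase word tokens; punctuation is dropped (an assumption of this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def similarity(a, b):
    """Word-level similarity between two clauses, from 0.0 to 1.0."""
    return SequenceMatcher(None, tokens(a), tokens(b), autojunk=False).ratio()

def classify(candidate, standard_clauses, threshold=0.70):
    """Label a candidate 'standard', 'non-standard', or 'no hit'."""
    # Identical wording (ignoring case and punctuation) counts as standard.
    if any(tokens(candidate) == tokens(s) for s in standard_clauses):
        return "standard"
    # Otherwise, anything above the threshold is a non-standard hit.
    best = max((similarity(candidate, s) for s in standard_clauses), default=0.0)
    return "non-standard" if best > threshold else "no hit"

# Hypothetical "most favored customer" database with a single made-up entry.
MFC_DATABASE = [
    "Supplier will offer Customer prices no higher than those offered to any other customer."
]

for text in [
    "Supplier will offer Customer prices no higher than those offered to any other customer.",
    "Supplier will offer Customer prices no higher than those offered to any other customer in Canada.",
    "This Agreement is governed by the laws of Ontario.",
]:
    print(classify(text, MFC_DATABASE), "|", text)
```

Note that the "no hit" branch is where the trouble described later in this post lives: a clause that never clears the threshold is never shown to the reviewer at all.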

The full post has more details. Note that there are other ways to do non-standard provision detection. Some should work, some should be susceptible to the problems described in the rest of this post.

The Catch: Accurate Non-Standard Clause Detection is Hard

The hard part of non-standard contract clause identification is that the accuracy of a non-standard clause detection system is derivative of its underlying clause detection accuracy. That is, if a system is not accurate at identifying clauses overall, it will not be any more accurate at identifying non-standard clauses. As discussed previously in the Contract Review Software Buyer’s Guide, clause detection itself is simple, as long as you don’t need accurate performance on unfamiliar agreements and poor quality scans. Unless your non-standard clause detection system is built on top of contract metadata extraction software that is already highly accurate at finding unfamiliar information, you will only be spotting not-especially-non-standard-non-standard clauses.

In the previous Contract Review Software Buyer’s Guide post, we explained how you could use a simple comparison-based approach to spot non-standard contract provisions. On the plus side, this way is easy to set up. The downside is that comparison-based approaches are a mediocre way to find unfamiliar contract provisions, especially in poor quality scans.

If your approach to non-standard contract provision detection is to stock a database with “standard” clauses, set a difference threshold, and call all hits that are not exact matches to standard provisions “non-standard clauses,” you will get some hits, but you will miss a bunch of non-standard clauses.

It’s easiest to explain the problem through an example. Assume the following confidentiality clause is a standard clause:

Each party will keep information conveyed to it by the other party confidential for the term of this contract and 3 years thereafter.

Let’s assume you set a wide similarity threshold of 60%, trying to ensure your system doesn’t miss any non-standard clauses. After all, misses are bad. Here, the system will return all provisions that are 60% similar to the standard clauses. Ones that are exact matches will be considered “standard” instances, and all other hits that are not exact matches will be considered “non-standard clauses.” You could even use slightly fancier methods and allow clauses with specified deviations (e.g., names, dates) to be returned as standard instances.

Your comparison-based system set to return provisions ≥60% similar should detect wordings like this as non-standard:

Each party will keep information conveyed to it by the other party confidential for the term of this contract.

You will also get lots of false positives. These clauses are both at least 60% similar to the example standard clause in wording, and not at all close in content:

Each party will keep the Goods conveyed to it by the other party secure for the term of this contract and 3 years thereafter.

Each party will endeavor to keep the other party apprised of information on Sales Opportunities for the term of this contract and 8 years thereafter.
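To make the false positive problem concrete, the two clauses above can be scored against the standard confidentiality clause. This is a sketch only: it assumes word-level similarity via Python's `difflib.SequenceMatcher`, which is one of many possible similarity measures, and the helper names `tokens` and `similarity` are made up for illustration. A different measure would give different numbers.

```python
import re
from difflib import SequenceMatcher

def tokens(text):
    """Lowercase word tokens with punctuation removed (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def similarity(a, b):
    """Word-level similarity between two clauses, from 0.0 to 1.0."""
    return SequenceMatcher(None, tokens(a), tokens(b), autojunk=False).ratio()

STANDARD = (
    "Each party will keep information conveyed to it by the other party "
    "confidential for the term of this contract and 3 years thereafter."
)

# Neither of these is a confidentiality clause.
FALSE_POSITIVES = [
    "Each party will keep the Goods conveyed to it by the other party secure "
    "for the term of this contract and 3 years thereafter.",
    "Each party will endeavor to keep the other party apprised of information "
    "on Sales Opportunities for the term of this contract and 8 years thereafter.",
]

THRESHOLD = 0.60
for clause in FALSE_POSITIVES:
    score = similarity(STANDARD, clause)
    flagged = score >= THRESHOLD
    print(f"{score:.2f} flagged={flagged} | {clause[:50]}...")
```

Under this measure both clauses clear the 60% threshold, so a system built this way would report them as non-standard confidentiality clauses even though neither deals with confidentiality.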

False positives are not great, but probably fine if not overwhelmingly prevalent. False negatives can be worse in contract review. If you use this comparison-based approach, you will also miss non-standard clauses that are not drafted like the standard clauses you stocked your standard clause database with. We’ll call these “non-standard non-standard clauses.” For example, this confidentiality clause is almost nothing like the standard confidentiality clause above in wording:

Recipient will not disclose Proprietary Information for a period of two (2) years from the date of termination of this Agreement, except as permitted in subsection (d) hereunder.

Both are confidentiality provisions, and their meaning is not that different. But their words hardly line up. A comparison-based system set at a 60% threshold (or a 30% threshold for that matter) would not identify this as a non-standard example of a confidentiality clause. So you would not be aware of this non-standard confidentiality clause among your agreements, despite having tried to identify all your non-standard confidentiality clauses.

No problem, you say: just add both clauses to the comparison database, perhaps dropping the second confidentiality clause into a separate “non-standard” clause database. Then it’ll be okay, right? Sure, for these two wordings (apart from the false positives issue). But there are lots and lots of other ways to express confidentiality, and you may keep needing to add examples without reaching accurate performance. For provisions that tend to be consistently drafted (e.g., governing law, notice, standstill), sure. Dumping clause examples in a database and maybe throwing in some human-created rules and token machine learning technology may be all you need to get accurate performance. But, in our experience of working at this since early 2011, a number of the most critical provisions get expressed in a lot of very different ways (e.g., change of control, exclusivity, non-compete, most favored nation).

The difficulty is that very non-standard provisions matter as much as slightly non-standard provisions and standard provisions. You’re not going to be let out of adhering to an exclusivity obligation just because it was drafted in a non-standard non-standard way. It takes robust provision detection software to find truly non-standard clauses, and it is hard to build software that is accurate on unfamiliar agreements and poor quality scans.
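The miss can be checked the same way as a hit. The sketch below scores the “Recipient” clause above against the standard confidentiality clause, again assuming word-level similarity from Python's `difflib.SequenceMatcher`. The helper names and the choice of similarity measure are assumptions for illustration, not a description of any vendor's system.

```python
import re
from difflib import SequenceMatcher

def tokens(text):
    """Lowercase word tokens with punctuation removed (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def similarity(a, b):
    """Word-level similarity between two clauses, from 0.0 to 1.0."""
    return SequenceMatcher(None, tokens(a), tokens(b), autojunk=False).ratio()

STANDARD = (
    "Each party will keep information conveyed to it by the other party "
    "confidential for the term of this contract and 3 years thereafter."
)

NON_STANDARD_NON_STANDARD = (
    "Recipient will not disclose Proprietary Information for a period of two (2) "
    "years from the date of termination of this Agreement, except as permitted "
    "in subsection (d) hereunder."
)

score = similarity(STANDARD, NON_STANDARD_NON_STANDARD)
for threshold in (0.60, 0.30):
    # The clause is only returned to the reviewer if it clears the threshold.
    print(f"threshold={threshold:.2f} score={score:.2f} found={score >= threshold}")
```

The two clauses share only a handful of common words, so the score stays under both thresholds and the clause is never surfaced. Lowering the threshold far enough to catch it would also pull in a large share of unrelated text.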

Software that is not highly accurate at identifying clauses in the first place will miss truly non-standard contract clauses. Making it worse, it’s hard to know how many non-standard provisions the system will miss or where they will be missed. If using semi-accurate non-standard clause detection software, what will you truly learn? You will likely find that a certain portion of your agreements have standard clauses, and a certain portion have non-standard clauses. But contracts where the software does not identify non-standard clauses may still have non-standard clauses. And there may be additional non-standard clauses in agreements where the software identified only standard clauses. This is because some agreements have multiple instances of certain clauses; termination, indemnification, and exclusivity clauses are especially likely to occur multiple times throughout agreements, and none are great provisions to miss instances of.

If you think finding non-standard contract clauses is a worthwhile exercise, you probably know that non-standard non-standard contract clauses count; if non-standard clauses didn’t matter, you wouldn’t bother trying to find them. Non-standard clause detection really should capture non-standard non-standard clauses. Highly accurate provision identification models are a critical foundation for this feature.

If a vendor claims their software identifies non-standard contract clauses:

Make sure to quiz them hard on how their software actually finds non-standard clauses in the first place. Don’t take “we include machine learning technology in our methods” as an answer—push them harder.
Test their software on your own (or other random) unfamiliar agreements and poor quality scans to be sure that it actually finds non-standard non-standard clauses.
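One way to run the test in the second point is to have a reviewer mark every non-standard clause in a sample of unfamiliar agreements, then compare those labels with what the software flagged. The sketch below assumes each clause can be identified by a (document, clause number) pair. The function name, identifiers, and sample data are all made up for illustration.

```python
def recall_and_precision(flagged, truth):
    """Compare software-flagged clauses against reviewer-labeled ones.

    Returns (recall, precision, missed), where `missed` lists the
    reviewer-labeled clauses the software never surfaced.
    """
    flagged, truth = set(flagged), set(truth)
    true_positives = flagged & truth
    recall = len(true_positives) / len(truth) if truth else 1.0
    precision = len(true_positives) / len(flagged) if flagged else 1.0
    return recall, precision, sorted(truth - flagged)

# Hypothetical sample: clauses a human reviewer marked as non-standard.
reviewer_labels = {("msa_01", 4), ("msa_02", 7), ("nda_03", 2), ("nda_03", 9)}
# Hypothetical output of the software under test on the same agreements.
software_hits = {("msa_01", 4), ("msa_02", 7), ("msa_02", 1)}

recall, precision, missed = recall_and_precision(software_hits, reviewer_labels)
print(f"recall={recall:.2f} precision={precision:.2f}")
print("missed:", missed)
```

The `missed` list is the part worth reading closely. If the clauses in it are the ones drafted least like your templates, the software is only finding non-standard clauses that aren’t too non-standard.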

If you have read the Contract Review Software Buyer’s Guide up to here, you know that it’s hard to get software to find provisions accurately on unfamiliar agreements and poor quality scans. Identifying non-standard non-standard provisions is a subset of this problem. Contract review software will be no more accurate at identifying non-standard contract clauses than it is at finding clauses in unfamiliar agreements and poor quality scans.

The solution: use highly accurate contract review software to find all clauses of a given type, then compare these hits to “standard” provisions stocked in a database. Knowledgeable technologists can even set such a system so that non-exact matches that deviate in specified acceptable ways (e.g., a different party name or date) count as “standard” clause hits.
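The second half of that approach, comparing hits to standard provisions while tolerating acceptable deviations, might look something like the sketch below. It assumes the clause hits have already been found by an accurate upstream detector, which is the hard part and is not shown. The placeholder scheme, the date pattern, the party names, and the sample clauses are hypothetical.

```python
import re

# Matches dates such as "March 15, 2021"; other date formats are not handled here.
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December)\s+\d{1,2},\s+\d{4}\b"
)

def normalize(text, party_names):
    """Replace acceptable deviations (party names, dates) with placeholders."""
    # Longest names first so a short name never clobbers part of a longer one.
    for name in sorted(party_names, key=len, reverse=True):
        text = re.sub(re.escape(name), "<PARTY>", text, flags=re.IGNORECASE)
    text = DATE_RE.sub("<DATE>", text)
    return " ".join(text.lower().split())

def label_hits(hits, standard_forms, party_names):
    """Label each detected clause as 'standard' or 'non-standard'."""
    standard = {normalize(s, party_names) for s in standard_forms}
    return [
        (hit, "standard" if normalize(hit, party_names) in standard else "non-standard")
        for hit in hits
    ]

PARTIES = ["Acme Corp", "Globex Inc"]
STANDARD_FORMS = ["Supplier shall deliver the Goods to Acme Corp by January 1, 2020."]
# In practice these would come from the clause detection step.
DETECTED = [
    "Supplier shall deliver the Goods to Globex Inc by March 15, 2021.",
    "Supplier shall use reasonable efforts to deliver the Goods to Globex Inc by March 15, 2021.",
]

for hit, label in label_hits(DETECTED, STANDARD_FORMS, PARTIES):
    print(label, "|", hit)
```

Because every detected clause of the type gets a label, a clause drafted nothing like the template still shows up as non-standard rather than disappearing below a similarity threshold.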

We see spotting deviations off form agreements as quite useful in its place, and added comparison capabilities into our system in 2013. We are not yet through the “extraction” portion of the Contract Review Software Buyer’s Guide, but as we complete it and move on to “features”, expect to read more on this (if not before).

However, we’re not done with non-standard contract clause identification yet. It turns out there is another way to generate inaccurate non-standard clause detection results. It would not come up if using the method described above, but is raised by another contract review software vendor’s approach to this problem. The next post will give details.

Postscript - Another Vendor’s Approach to Non-Standard Clause Identification May be Highly Susceptible to the Problem Described in this Post

In a recent post, another contract review software vendor described how their non-standard clause identification works:

The non-standard clause detection does not rely on global normalized data, nor does it need to know anything about the customer’s environment before deployment. It is configured separately by the customers based on their known standard terms.
Every customer has a set of previously agreed-upon standard terms and conditions, potentially within a template library or contracts they have already signed and stored as images. Seal can leverage both of these language sources to learn these examples as standard clauses, and then quickly and accurately present a filtered set of contracts based on potential non-standard terms. As such, this classification is not based on a global generalization, but rather a local customer-specific model. And as new terms are agreed to or rejected, Seal learns and updates its index, thus always giving up-to-date results that are based on information available at that second, versus a model created external to the environment months ago.

Customer-specific provision models are precisely what is described in the main body of this post. Comparison-based approaches like this one are a sub-optimal way to find unfamiliar contract provisions. This problem will be exacerbated by the limited set of training data most clients will have to feed their particular non-standard clause identification system. If you need your non-standard clause detection software to identify non-standard non-standard clauses, make sure to test it carefully.

