re:Invent 2025: How Mary Technology is Building a Legal Fact Layer for AI on AWS


Introduction

By transcribing overseas lectures into Japanese articles, we aim to make high-quality information that would otherwise stay hidden more accessible. The presentation covered in this installment of that project is:

The transcribed articles from re:Invent 2025 are summarized in this Spreadsheet, so please check it as well.

📖 AWS re:Invent 2025 - How Mary Technology is building the legal Fact Layer for agentic AI on AWS

In this video, Dan, co-founder and CEO of Mary Technology, explains the challenges large language models face in automating document review for law firms, and his company's solution. LLMs suffer from loss of meaning due to compression, a lack of sensitive training data, the nature of disputes, in which no single correct answer exists, and the context-dependency of facts. Mary has built a fact manufacturing pipeline that treats facts as first-class citizens, extracting entities, dates, and events and managing them as explainable, metadata-rich objects. This persistent and auditable fact layer has enabled major law firms such as Arnold Bloch Leibler to reduce document review time by 75% to 85%, and Mary has achieved an NPS score of 96.

https://www.youtube.com/watch?v=cptduBASjRU
※ This article was automatically generated while maintaining the content of the existing lecture as much as possible. Please note that there may be typos or incorrect information.

Main Content

Hello everyone, my name is Dan. I'm the co-founder and CEO of Mary Technology. We're a legal tech company based out of Sydney, but now we're global, and we help law firms automate document review. And this is a big problem for large language models, and I want to talk to you today about how Mary is looking to solve that. Before we begin, how many people here are head of legal ops in a large organization, or perhaps run your own law firm? Okay, understood, great.

So, let's talk about the problems. Large language models, even with Retrieval-Augmented Generation and even with agentic frameworks, are not fit for purpose for legal dispute-resolution workloads, and for a number of reasons. I'm going to talk you through four today. The first is the availability of training data: the data that law firms and litigation teams deal with day-to-day in disputes is incredibly sensitive. That information is not public, and you certainly cannot gather it and train on it when it involves confidential information from law firm clients and from internal employees.

The second problem is that there isn't a single correct answer for a large language model to converge toward, because in a dispute there are always at least two sides. You can't just say, "this is the correct answer," and optimize toward it. You need to understand potentially all of the narratives, all of the correct answers available depending on which side you're representing.

The next problem, and probably the biggest one, is that large language models are compression machines. That's what they're really good at, so let me talk you through the stages of compression. The first thing a large language model does when it receives a document is turn each page into an image, and then, whether that image contains text or is just a photograph, turn it into text. But in a legal document, particularly a text-heavy one, some of the nuances in there, the legal subtleties and important meanings, are stripped away: a handwritten note, say, or a small note in the margin.

Once it has text, it turns that into tokens, then embeddings, then compressed context, and ultimately something it can chunk and summarize. With each layer of compression you lose meaning, and that meaning and nuance are incredibly important for law firms and for anyone trying to understand a dispute and its facts; they are really the heart of the dispute.
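
To make the loss concrete, here is a toy sketch of the stages above, with whitespace splitting standing in for real tokenization and a crude "keep the lead sentence" step standing in for summarization. The example text is invented, and none of this reflects any real ingestion pipeline:

```python
# Toy illustration of the lossy stages: text -> tokens -> chunks -> summary.
# A marginal note that matters legally survives the early stages but
# vanishes by the last one. All content here is invented for the example.

page_text = (
    "The parties executed the agreement on 05/01/2021. "
    "Payment terms were net thirty days. "
    "[handwritten margin note: subject to verbal variation agreed 04/2021]"
)

tokens = page_text.split()  # stand-in for real BPE tokenization
chunks = [" ".join(tokens[i:i + 12]) for i in range(0, len(tokens), 12)]
summary = chunks[0].split(". ")[0] + "."  # naive "keep the lead sentence"

print(summary)  # → The parties executed the agreement on 05/01/2021.
```

The margin note is still present after tokenization and chunking, but the summarization layer silently drops it, which is exactly the kind of nuance a litigation team cannot afford to lose.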

But this is what they're really good at. I'm not saying large language models are bad; I'm actually saying they're really, really good. They're just good at generalist capabilities: they handle a vast range of tasks, they scale across large document corpora, and they generate fluent and plausible text without deep pre-processing. So they're generalists, and they're really good at that. And as you can probably tell, this slide was written by an LLM. It did a very good job of telling me what I should write on the slide, with nice emojis.

The third problem is that the facts are not in the data you upload. This is a slightly strange one, and it needs a bit of explaining, but facts are really the heart of a legal problem or a legal dispute. So let me give you an example. These are facts that might exist in a document: we've got a date, which is great, and we've got "A. Smith reported an error."

Now, let me give you some of the challenges as to why you cannot simply extract this data and assume that it is usable for downstream processing with AI as it currently stands. What if there are multiple people in that document corpus named A. Smith? One might be Alice, one might be Andrew. But to make this a useful and meaningful fact, you need to understand which A. Smith it actually is. And using a large language model alone, you cannot do that.
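Resolving that kind of ambiguity is, roughly, an entity-linking step. The sketch below is purely illustrative: the actor registry and the context-overlap heuristic are invented for this example and are not Mary's actual method.

```python
# Illustrative sketch only: resolve an ambiguous mention ("A. Smith")
# against known case actors by overlap with context terms from the same
# document. Both the actor data and the scoring are invented.

actors = {
    "Alice Smith":  {"payroll", "leave", "HR"},
    "Andrew Smith": {"server", "error", "deploy"},
}

def resolve(mention: str, doc_terms: set) -> str:
    surname = mention.split()[-1]
    candidates = [name for name in actors if name.endswith(surname)]
    # Prefer the candidate whose known context overlaps this document most.
    return max(candidates, key=lambda name: len(actors[name] & doc_terms))

print(resolve("A. Smith", {"server", "error", "reported"}))  # → Andrew Smith
```

A real pipeline would weigh far more signal (document metadata, email addresses, co-occurring entities), but the point stands: the raw string "A. Smith" only becomes a usable fact once it is bound to a specific person.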

We're in America today, but back in Australia where we're from, these two dates are completely different: this could be the 5th of January, or it could be the 3rd of May. So you need to understand the context of the case to actually be able to say, okay, this is probably this date. And is an error reported on this date even material to the case?
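
The same digit string on the page parses to two different dates depending on the regional convention, which Python's standard library makes easy to see (the specific date is invented for the example):

```python
from datetime import datetime

raw = "05/01/2021"  # the same digits as written on the page

as_us = datetime.strptime(raw, "%m/%d/%Y")  # US reading: 1 May 2021
as_au = datetime.strptime(raw, "%d/%m/%Y")  # AU reading: 5 January 2021

print(as_us.date(), "vs", as_au.date())  # → 2021-05-01 vs 2021-01-05
```

Both parses are syntactically valid, so nothing in the string itself resolves the ambiguity; only case context can.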

That's obviously context-dependent, so maybe this isn't even a relevant fact that you need to dig into further. It's also fragmented: how many times has this particular fact been mentioned across all of your documents, and do those mentions contradict it or corroborate it? And then finally, the provenance. What kind of document did it come from? Did it come from a primary document, for example someone detailing what a CCTV camera saw, or was it hearsay, say a statement provided to a police officer? All of these carry different weight for how relevant and meaningful the fact is in a court or in a litigation process.

So let me show you another example. This is actually taken from my co-founder Rowan's medical records, and there are a couple of challenges here; I'm hoping to show you later how Mary, our platform, addresses them. One thing I want to point out is this very common abbreviation, PT, which means patient. Now, what happens if I take this fact and put it into a large language model? It won't understand that PT refers to Rowan, the actual person mentioned, the patient themselves. So you need to resolve and correct this information so that later on, in the document review process, you can actually use it as a fact.
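
Under the illustrative assumption that the pipeline keeps a glossary of domain abbreviations and a registry of case parties, the normalization step could look roughly like this; both dictionaries and the naive string replacement are invented for the sketch:

```python
# Hypothetical sketch: expand domain abbreviations, then bind the role to
# the known case party, so the fact leaves the pipeline self-describing.
# Both dictionaries are invented for this example.

abbreviations = {"PT": "patient", "Dx": "diagnosis"}
case_parties = {"patient": "Rowan McNamee"}

def normalise(fact: str) -> str:
    for abbr, term in abbreviations.items():
        fact = fact.replace(f"{abbr} ", f"{term} ")  # naive token expansion
    for role, name in case_parties.items():
        fact = fact.replace(role, f"{name} ({role})")
    return fact

print(normalise("PT reports residual right arm swelling"))
# → Rowan McNamee (patient) reports residual right arm swelling
```

After this step, a downstream model asked about Rowan can actually match the fact, which the raw "PT" never would have allowed.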

This is a more detailed example: large language models perform significantly worse than systems designed and built to support litigation workflows, especially in document review, where, as everywhere in law, the tolerable error is extremely low. Imagine I write a letter. In it, I don't write my name and I don't say who it's addressed to, and I describe a crime I committed, but not as a simple fact. I don't say "I stole the car"; I say it in some colloquial way.

The challenge for a large language model is that if that document is put amongst 4,000 other documents and you ask, "Did Daniel steal the car?", it will never say yes, because Daniel isn't mentioned, I didn't say I stole the car, and I didn't say who the letter was addressed to. What Mary, or rather what any tool doing this kind of legal document review, needs to do is look at the handwriting in that letter. Does that handwriting appear somewhere else, in another document? Can we actually work out who wrote it? And maybe I put a date in there on which I went to the park. We need to be able to see that in this other document, Daniel said he went to the park on that date. And then, ah, we might actually be able to draw a conclusion: we should probably review whether this is Daniel, because there's corroborating evidence. So this is an example of why large language models alone are inadequate for this kind of work.

What is Needed for Litigation Workflows: Building a Fact Layer You Can Be Confident In

And the final problem is that even if I do all that fact extraction perfectly, that's not what lawyers and legal teams actually need when they're doing an investigation. What they actually need is to be truly confident about those facts and about the narrative that they're trying to present for their client or for their company. So let me give you an example. Are there any lawyers here? Okay, we've got one, great.

This is just an example, but hopefully it gives you food for thought as to why this is so important. This is an exercise, and this is a perfect bill. And let's say a large language model outputs to you and says, "This is a perfect legal document, whatever it is, and I've done the work for you. I've gone through this entire vast corpus of documents. I've extracted all the facts. I've reviewed what's relevant in the context of the case. And I'm now providing you with a perfect document. In this case, a bill. And I guarantee it's an ideal submission. It's backed by perfect evidence. It's in the template that you usually use, all of those good things. Go and submit it to the other side or to the court." Would you submit it?

That's the correct answer, good. You can't. Because ultimately you have obligations to the party that you represent. But more importantly, you have a responsibility to be confident in the action that you're about to take. And unlike a large language model which is perfectly constructed to receive a question and give you the correct answer back, what's needed for this kind of work, for this document review and litigation workflow, is something that can provide you with all of the potential stories so that you can understand, review, and verify all of the facts yourself and be confident, even when you don't know what the questions are going to be.

So how do we solve all of these problems? In a way that large language models simply don't like: it's incredibly process-heavy, and it's incredibly specific rather than a generalist task.

And the first thing you have to do is treat facts as first-class citizens. Just as a large language model stack treats an efficient embedding model as its most important component, a fact review platform needs the best fact model: take the facts, process them, and put them through an incredibly heavy manufacturing pipeline to produce something you can trust and ultimately verify. That's why you also need a world-class review and verification experience, where lawyers and the teams running the investigation can review the extracted facts, build stories, and do the rest of their work. And finally, this is probably the one element missing from what I described before: you need to take this fact layer that you're confident in and pipe it into downstream AI applications, whether that's OpenAI or other integrated interfaces.

Mary's Solution: 75-85% Reduction in Document Review Time through a Fact Manufacturing Pipeline

Now, I've got a short video to show you how we've solved these problems.

As a lawyer, when you receive a matter, your first goal is to get your facts straight. But that's never easy. It means digging through endless emails, PDFs, and records, splitting documents, cross-checking dates, and piecing together a clear timeline. It's slow, manual, and can take hours or even days before you even begin the legal work. We call this the factual chaos.

But what if, the moment a matter hits your inbox, everything springs into action? We can retrieve documents attached to emails or find documents uploaded into tools you're already using, scan them, and process them. We can split messy consolidated files into clear, structured documents. We can categorize them, rename them, and organize them back into your workflow seamlessly, right where they need to be.

But what if organizing documents is just the beginning? What if we could completely unblock you and get you to the real legal work? We can extract key entities from all of your documents, that's names, companies, and their roles in the case. So you instantly know who's important. We can find and retrieve key dates, so you understand when events occurred. We can get a concise case summary, summarizing the entire matter into a few clear paragraphs. Identify gaps that need evaluation, detect potential data breaches, build a timeline of events, and extract any other crucial details relevant to the case.

And by bringing all of these insights together into a single dashboard, anyone can get a firm grasp of a case in minutes, even if they've never seen it before. Delve deeper with a generated chronology, surfacing only what's most relevant to your case. Invite experts to collaborate in real-time with you and your colleagues, and draft directly into tools you're already using. Mary works with your existing systems, adapting as new evidence, events, and documents emerge, keeping your case aligned every step of the way. When facts are clear, decisions are faster. Factual chaos, solved.

So, in summary, what I'm trying to say here, and hopefully you saw it in the video, is that this takes a novel approach. We couldn't simply go in and extract meaningful facts for users using what most people would reach for, which is Retrieval-Augmented Generation and agentic workflows. What we're saying is that good enough isn't good enough; it has to be great.

So we built a fact manufacturing pipeline. In it, we extract all of the events, entities, actors, issues, and many other things. Imagine a fact as an object, with a lot of metadata underneath it that allows us to build relationships and build out a case, almost like building a digital case as an object. It can then tell you: does this fact conflict with that other fact? And the key here is that each piece of metadata underneath that object needs to be explainable. If we make a single decision, we surface and expose it. If we tell you a date, we tell you how we arrived at that date. If we tell you something is relevant, we tell you why it's relevant.
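
As a minimal sketch of "a fact as an object," one might model it like this; the field names and example values are assumptions for illustration, not Mary's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    summary: str       # concise, lawyer-readable statement
    date: str          # normalised ISO date
    actors: list       # resolved entities, not raw mentions
    source: str        # document and page the fact came from
    provenance: str    # e.g. "primary record" vs "hearsay"
    relevant: bool
    explanations: dict = field(default_factory=dict)  # why each decision was made

fact = Fact(
    summary="Rowan McNamee re-evaluated; residual right arm swelling",
    date="2021-01-05",
    actors=["Rowan McNamee"],
    source="medical_record.pdf, p. 3",
    provenance="primary record",
    relevant=False,
    explanations={
        "date": "read day-first, consistent with other Australian records",
        "actors": "'PT' resolved to the named patient",
        "relevant": "entry concerns an unrelated medical issue",
    },
)
# Every decision on the object carries its own explanation.
```

The essential property is that no field on the object is a bare assertion: the date, the resolved actor, and the relevance call each carry the reasoning that produced them, which is what makes the layer auditable.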

Only after we've created a high-quality fact layer do we then use the more standard technologies like RAG and agentic frameworks (not traditional, exactly, they're very new, but standard by now). And what you're left with is a persistent, auditable fact layer you can rely on, whether within the platform itself while doing the investigation, or when you're piping that information downstream into document generation and other associated legal tasks.
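
Under similar assumptions, the downstream piping could amount to serializing verified facts, with their IDs and sources, into the grounded context handed to a generation step. This is prompt assembly only, with no model call, and all the records are invented:

```python
# Hypothetical fact records from a verified fact layer (invented data).
facts = [
    {"id": "F-012", "text": "Andrew Smith reported an error on 5 January 2021",
     "source": "incident_log.pdf, p. 2"},
    {"id": "F-031", "text": "Rowan McNamee re-evaluated; residual right arm swelling",
     "source": "medical_record.pdf, p. 3"},
]

# Each fact enters the context with its ID and source, so downstream
# answers can cite back to the exact document and page.
context = "\n".join(
    f"[{f['id']}] {f['text']} (source: {f['source']})" for f in facts
)
prompt = (
    "Answer using only the facts below, citing fact IDs.\n\n"
    f"{context}\n\nQ: Has Rowan been to hospital for the arm injury?"
)
print(prompt)
```

Because each context line carries an ID and a source pointer, anything the downstream model asserts can be traced straight back to a page in the record, which is the auditability property the talk emphasizes.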

So let me very quickly show you what the platform looks like for a single fact, to highlight the patient challenge I spoke about before. At the top here we've got a fact, with the date and time: Rowan McNamee was re-evaluated and has residual right arm swelling. You'll notice that this is a very summarized, concise fact, because that's what a lawyer needs. They need to be able to scan all of these facts, because the vast majority of them are not relevant.

So now let's zoom in. And imagine I moved my mouse to the right and hovered over that relevance. And it tells me that this entry focuses on another medical issue. And bearing in mind, many of my examples are around personal injury, but this can be used in employment law, and all kinds of law. But in this particular case, it's personal injury. So it tells me why this is not relevant. But if I think something might be relevant, I can then dig in deeper.

I can pull up the actual document, the exact page and location the fact came from, and I can also have Mary give me more grounding as to how it derived the fact, and show me where the underlying details are so I can revisit the fact if I need more information. But this handwriting is terrible. Well, it's actually pretty good handwriting, but still. Mary predominantly works on unstructured data, not on documents like contracts that are easy to pull information out of; we focus on the really hard documents.

But the reason I bring this up is that if you look at that document, what's written is PT, for patient. We're not just relying on that, though; it's only one of the points where the pipeline corrects the fact as it passes through, so that it says Rowan McNamee. Ultimately, when I pipe this fact to another downstream AI capability, it knows it's looking for Rowan McNamee. So if I ask whether Rowan has been to hospital for this, it can confidently say yes, and I can go directly back to where it found that.

So let me very quickly talk about where we're at today. We're working with many of the largest firms in Australia, including Arnold Bloch Leibler, one of the largest law firms in the world, with operations not just in Australia but also in the UK and other regions. And we're bringing on new firms every week. Across all of our clients, we've reduced the time spent on document review, which is probably the biggest bottleneck in litigation, by 75% to 85%. That's an area where a lot of time is spent and a lot of costs are incurred, and we're cutting it by a significant amount.

And overall, we've achieved a Net Promoter Score of 96. People really love using Mary, because this is one of the hardest, most frustrating jobs in this entire process. So people are loving it. I'll just put this up on screen. It's a bit of an Australianism, and I've had to edit out the name, so there's a little dot there, but this is one of our clients talking about how they're using Mary.

So with that, I will take any questions, but that's Mary Technology, and that's how we're building this fact layer. Any questions? No? Alright, thank you very much.


※ This article was automatically generated using Amazon Bedrock, aiming to retain as much information from the original video as possible.
