Leveraging Plaintiff Data in Mass Tort Litigation: Structuring Data for Powerful Insights
Litigate with Insight: The LMI Podcast | S:1 E:3
In this episode of Litigate with Insight: The LMI Podcast, we discuss ways to leverage data collected in mass tort litigation and strategies for approaching data collection for powerful reporting and insights.
Join Megan Pizor, LMI’s General Counsel and Chief Data Officer, and Angela Browning, LMI’s Chief Strategy Officer, distinguished legal professionals with extensive experience in mass tort litigation, as they dive deep into the science behind data modeling and best practices for data collection and management to get to the root of allegations at the outset of litigation.
Angela:
Hi, and welcome to Litigate with Insight, the LMI podcast, a podcast about the intersection of data, analytics, technology, and innovation in the legal industry. In each episode, we bring together experts, thought leaders, and industry pioneers to explore how these cutting-edge topics impact litigation. I'm your host, Angela Browning, Chief Strategy Officer at LMI. On today's episode, we're going to take a deep dive into data management and data normalization with General Counsel and Chief Data Officer for LMI, Megan Pizor. Welcome back, Megan. Thanks for joining us today.
Megan:
It's great to be here.
Angela:
I'm really excited about today's discussion because we have the opportunity to enter your data tech world and see things from your perspective.
Megan:
And I get the opportunity to try to explain the overly technical side of litigation support.
Angela:
I think that's what you do a really good job at: explaining things in terms that laypeople, us normal folks, can understand, kind of bringing us into your world in a way that's really easy to digest. And I think that's what our clients see too. But I'm going to start out with the overarching question: when we're talking about data in litigation, what is it that we're talking about?
Megan:
Certainly. Well, during the discovery process, there is a lot of different evidence that might be collected to substantiate or defend against a claim, or multiple claims if more than one is at issue. This can include information about the person or business that is filing the complaint, the person or business against whom the complaint is being filed, the allegations at issue, potential fact witnesses, and really any other information relevant to the matter. Requests for discovery can come through subpoenas, interrogatories, discovery questionnaires, or requests for production of documents or data. So the data itself can include anything from emails to business records, mobile phone usage, messaging data, social media content, medical and employment records, and the list goes on from there.
Angela:
Let's get a little more into the weeds, into our own area of mass tort litigation. What are the different types and the different uses of data in mass torts?
Megan:
Mass tort litigation can be a bit unique in that there could be hundreds or even thousands of plaintiffs. As we have discussed in prior episodes, there are some unique tools for consistently and efficiently gathering data across multiple parties, and again, that can be hundreds or even thousands of plaintiffs, and sometimes multiple or even hundreds of defendants. In many mass tort cases, data about specific plaintiffs, and sometimes defendants, is gathered by way of a tool referred to as a fact sheet, or sometimes a profile form. This is basically a uniform set of interrogatories, sometimes called a discovery questionnaire, completed by all of the plaintiffs to determine the basis for their specific claims. For plaintiffs, these fact sheets typically ask questions about general demographics, medical background, proof of use of or exposure to a product or substance, and proof of the injury being alleged. Historically, fact sheets were completed individually by plaintiffs or their legal representatives and gathered in hard copy, or sometimes in an electronic format such as a fillable PDF. Increasingly, however, fact sheet completion is taking place in a centralized online repository, where it's much easier to capture data in a way that allows for more consistency, which ultimately leads to better metrics, reports, and general data insight.
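To make the idea of a structured fact sheet concrete, here is a small, purely hypothetical sketch in Python of how one plaintiff's responses might be captured as discrete, typed fields rather than free-form text. The field names and values are assumptions for illustration, not an actual court-approved fact sheet.

    # Hypothetical sketch of a plaintiff fact sheet captured as structured data.
    # Field names and choices are illustrative only, not an actual approved form.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class PlaintiffFactSheet:
        plaintiff_id: str
        date_of_birth: str                  # ISO date, e.g. "1962-04-09"
        state_of_residence: str             # two-letter code, e.g. "OH"
        product_used: bool                  # proof-of-use question as a yes/no field
        first_use_year: Optional[int]
        alleged_injuries: List[str] = field(default_factory=list)  # from a controlled list

    sample = PlaintiffFactSheet(
        plaintiff_id="P-00123",
        date_of_birth="1962-04-09",
        state_of_residence="OH",
        product_used=True,
        first_use_year=2015,
        alleged_injuries=["Heart Attack"],
    )

Structured fields like these are what later make consistent reporting possible, because every plaintiff's answer to the same question lives in the same place, in the same format.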
Angela:
Well, let's talk about that. Once you have the data, and I mentioned this in the intro, data modeling and data normalization, what do you think of when you hear those terms? Bestow your knowledge on us and explain what they mean in the easiest terms possible.
Megan:
Good question. I'm going to actually break that into two terms: I'll talk about data modeling, and I'll also talk about normalization. Sometimes the terms are used somewhat interchangeably, but there can be differences, so I'll speak to both. Data modeling is basically a process of analyzing and defining the various types of data being collected, stored, and used. It often involves creating some type of visual representation of an information system in order to better communicate trends or relationships between data points. To do this, data needs to be defined and structured in a way that essentially maximizes desired outcomes. There are a couple of ways to do this, but for purposes of today's discussion, the basics involve organizing the data, meaning making sure the data is logically structured, and then defining the relationships, establishing how different data elements interact with one another. Data normalization may have some overlap, but it's essentially organizing the data to ensure similarity and consistency across all records, all systems, and all fields. The goal here is to ensure that the data is logically stored and easy to find or retrieve, but more to the point, to generally improve overall data integrity, allowing for more usable and reliable data.
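As a rough illustration of what normalization looks like in practice, here is a minimal Python sketch that maps inconsistently entered yes/no answers and dates to a single canonical form. The field names, accepted values, and date formats are assumptions made for the example, not part of any particular system.

    # Minimal, hypothetical sketch of data normalization: make the same answer
    # look the same across every record. Field names and values are illustrative.
    from datetime import datetime

    YES_VALUES = {"y", "yes", "true", "1"}
    NO_VALUES = {"n", "no", "false", "0"}

    def normalize_yes_no(raw):
        value = str(raw).strip().lower()
        if value in YES_VALUES:
            return "Yes"
        if value in NO_VALUES:
            return "No"
        return "Unknown"  # flag for follow-up rather than guessing

    def normalize_date(raw):
        # Accept a few common formats and store one canonical ISO format.
        for fmt in ("%m/%d/%Y", "%m-%d-%Y", "%Y-%m-%d"):
            try:
                return datetime.strptime(raw.strip(), fmt).date().isoformat()
            except ValueError:
                continue
        return None  # could not parse; route to manual review

    record = {"used_product": "YES", "first_use_date": "03/15/2019"}
    record["used_product"] = normalize_yes_no(record["used_product"])
    record["first_use_date"] = normalize_date(record["first_use_date"])
    # record is now {'used_product': 'Yes', 'first_use_date': '2019-03-15'}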
Angela:
Now, Megan, I think that's a really great way to describe the difference between modeling and normalization. For me, that makes perfect sense. I'm a visual person, so I get the modeling piece: I'm actually going to get to see something as a result of the data that's in the system. Normalization is more about making sure that any data being entered is entered in the same way across the different forms and across the different plaintiffs, so that if you're able to do any reporting, the data is clean.
Megan:
Exactly.
Angela:
And I'll leave it at that. That's my layman's way of reiterating what you just said to make sure that I'm on the same page with you.
Megan:
You did great.
Angela:
Good. So now that the data is normalized and we have it, how do you prepare data for analysis?
Megan:
That's another very good question, and a critical step in the process of getting to what I'll call actionable data that can really be utilized for maximum insight. I think normalization is part of that process, along with data modeling. But there are some other key steps worth breaking down without going too far into the tech data rabbit hole. Again, there's some overlap between these steps, and some variation may occur depending on what the ultimate endgame is, what the goals are, and what type of data is being prepared. But I'll try to define this at a high level, with the understanding that there's overlap amongst all of it.

The very first thing to do is generally to define the overall objectives and requirements. What are the end goals? What are we ultimately trying to achieve with the data and any potential associated insights? What do we want to learn? What do we want to identify? What kind of reports or metrics might we be seeking?

Second, we'd actually gather the data. This could be through a mass export of a company's database or system, or it could be through the discovery questionnaires we previously referred to.

Next, we would clean the data. This part is very critical; I can't stress it enough. This is the world where I live. We want data to be consistent, available, and accurate, and this touches back on some of the prior points about normalization and modeling. How will missing values be treated? Are there data fields that are going to be required versus ones that are optional? Are there ways to reduce duplicates? In litigation, this could be not just duplicate data fields, but actual duplicate respondents to, say, a discovery questionnaire. How will errors or exceptions be handled or corrected? All of this is part of the data cleaning stage.

Then there's a data transformation stage, where the data is converted and structured into a usable format to best support analytics and general decision making. This could involve preparing the data to match a specific system, or making modifications to fit a particular purpose or context.

Somewhat related, but I'll pull it out as a slightly separate step, is data validation. This is where data is checked for quality and accuracy before it is used, imported, or otherwise processed.

Then there's data formatting. Again, there's some overlap, but this is where data is organized according to certain preset rules. It allows for maximum standardization, getting back to that normalization of the data that we talked about. Think radio buttons or selecting from a drop-down menu, as opposed to using highly inconsistent free-text fields.

And then you get to what I'll call data exploration, which might actually be the first stage of the analysis piece. It involves using data visualization tools. You mentioned being a visual person; this is where we start to make the data visual so we can identify similarities, trends, patterns, and maybe even outliers. In short, it allows data analysts to begin identifying potential relationships between data sets and variables.
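For anyone who wants to see a few of these steps in code, here is a minimal sketch using pandas that covers gathering, cleaning, validation, and a simple exploration step. The file name, column names, and rules are hypothetical assumptions, and a real matter would involve far more careful handling.

    # Minimal sketch of a few preparation steps (gather, clean, validate, explore)
    # using pandas; the file, column names, and rules are hypothetical examples.
    import pandas as pd

    df = pd.read_csv("fact_sheet_export.csv")  # gather: assumed export file

    # clean: drop duplicate respondents and standardize casing
    df = df.drop_duplicates(subset=["plaintiff_id"])
    df["alleged_injury"] = df["alleged_injury"].str.strip().str.title()

    # validate: required fields must be present, dates must be parseable
    required = ["plaintiff_id", "alleged_injury", "first_use_date"]
    flagged = df[df[required].isna().any(axis=1)]
    df["first_use_date"] = pd.to_datetime(df["first_use_date"], errors="coerce")

    # explore: simple counts that would feed a dashboard or chart
    print(df["alleged_injury"].value_counts().head())
    print(f"{len(flagged)} records flagged for missing required fields")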
Angela:
You touched on a number of things that I think outline the challenges of what can happen with data import, normalization, and modeling. I'd love for you to talk about some of the challenges that you've seen, or that maybe others have experienced, in your realm.
Megan:
Where shall I start? In my experience, data quality is the biggest challenge when it comes to data collection. In litigation, when parties initially contemplate what data they wish to gather, it can be very hard to know what format is going to be best later in the process, when it comes time to query or actually use the data. The information needed as litigation evolves can really change a lot, which makes it hard to predict. But there are some best practices that can help ensure data will continue to provide maximum value throughout the litigation, and I think the big ones are data structuring and maybe data integration. For data structuring, think of this as how the data is being formatted when it's initially captured. I mentioned the challenges of free-text fields a minute ago. Say parties wish to query a database later to identify, for example, every instance where a plaintiff is alleging a heart attack as their injury. Heart attack could be noted in a free-text field as heart attack. It could be myocardial infarction. It could even be abbreviated as MI. It can be really challenging to identify all the instances of heart attack if every plaintiff is denoting it differently in their respective discovery questionnaires. However, if you use required formatting, heart attack could be a dropdown option or selected with a checkbox or radio button, which allows it to be consistent across all plaintiffs and therefore much, much easier to identify and report on. Data integration is another aspect of data quality, and there can be a lot of complexities that arise when combining data sets from multiple sources. Think about a matter that involves multiple plaintiff law firms, each of which may have a different intake platform they are using to gather data about their respective claimants. The firms certainly don't want to have to rekey all the same data into another system when it comes time to respond to a discovery questionnaire. So, very understandably, they would prefer to export whatever data they have from their intake system and set it up in such a way that it can be ingested into whatever centralized system is managing the actual discovery questionnaire process. But if every firm's intake system is set up differently, with different dropdown or checkbox fields and values, all of that data needs to become consistent before a new system can fully recognize it. This is where really good data preparation comes in: the data cleaning and normalization that we previously talked about.
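Here is a minimal Python sketch of the kind of synonym mapping described in the heart attack example: free-text entries are reconciled to one standard term, and anything unmapped is routed to review rather than guessed at. The mapping and helper function are illustrative assumptions, not an actual coding dictionary.

    # Hypothetical sketch: reconcile free-text injury entries to one standard term.
    INJURY_SYNONYMS = {
        "heart attack": "Heart Attack",
        "myocardial infarction": "Heart Attack",
        "mi": "Heart Attack",
    }

    def standardize_injury(raw: str) -> str:
        key = raw.strip().lower().replace(".", "")
        # Unmapped values are flagged for human review instead of being guessed.
        return INJURY_SYNONYMS.get(key, f"UNMAPPED: {raw}")

    answers = ["Heart attack", "Myocardial Infarction", "MI", "m.i."]
    print([standardize_injury(a) for a in answers])
    # ['Heart Attack', 'Heart Attack', 'Heart Attack', 'Heart Attack']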
Angela:
One of the challenges I know we still see is the adoption of this online PFS platform versus a fillable PDF versus, say, a hard copy that's been filled out by a plaintiff or their representative. Talk about the benefits of the electronic form versus the other ways, because we definitely still see these more old-school ways of filling out a fact sheet or a questionnaire, where somebody then has to do something with it to get it into a format that can be reported on, normalized, all of that. Tell us about some of the benefits you see in going electronic versus not.
Megan:
Sure. A fillable PDF form is pretty much exactly what it sounds like. It's a form, completed electronically, and it can even include some tools to help structure the data, as we've talked about: checkboxes, drop-down lists. The challenge, however, is ingesting the data from potentially hundreds or thousands of individual forms in a manner that is really going to most effectively allow for reporting and analytics. If all the users, the plaintiffs, the claimants, the law firms, were using the exact same software to view and complete the form, this would help quite a bit. But that is almost never the case. And even if they were, it can still be a challenge to ensure consistency and ingest data from so many unique sources. I think that's where one of the biggest benefits of a centralized platform comes in. These fact sheet platforms are literally designed to ensure consistency and standardization across all forms. They're designed to process and analyze data. They are designed for analytics and dashboards as the end goal. Law firms, both plaintiff and defense, can start identifying trends and patterns within the data and better understand relationships that they may not have previously considered. For example, consider a toxic exposure case where plaintiffs from multiple jurisdictions are alleging different injuries that may be related to their locations. Identifying that those injuries might be related to location might not happen without having good data in a source that allows for insight, dashboarding, and analytics. Maybe different age groups have certain commonalities that are uncovered via graphical trends on a dashboard. And lastly, in addition to being designed for exactly the purpose they are intended for, fact sheet platforms are usually designed to accommodate the changing needs of the parties as litigation evolves, with the added bonus of probably being a more secure way of collecting private and confidential data.
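As a small illustration of the kind of trend a dashboard might surface, here is a hypothetical pandas sketch that cross-tabulates alleged injuries by jurisdiction; a concentration of one injury type in one location would stand out immediately. The data and column names are invented for the example.

    # Hypothetical sketch: count alleged injuries by jurisdiction to spot clusters.
    import pandas as pd

    df = pd.DataFrame({
        "jurisdiction": ["OH", "OH", "PA", "PA", "PA", "WV"],
        "alleged_injury": ["Respiratory", "Respiratory", "Dermal",
                           "Respiratory", "Dermal", "Respiratory"],
    })

    print(pd.crosstab(df["jurisdiction"], df["alleged_injury"]))
    # A large count in a single cell suggests an injury clustered in one location,
    # which is the sort of relationship a dashboard makes easy to see.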
Angela:
There's your general counsel hat on, that's for sure.
Megan:
Definitely.
Angela:
One of the things that you mentioned earlier was these intake platforms. Let's talk about data transfer from a spreadsheet. Let's say these firms are tracking these things in a spreadsheet. How does that information get into a centralized PFS platform?
Megan:
I actually spoke with one of LMI's data analysts in preparation for this episode to see if she had any additional insight she wanted to add. She did, and she had some excellent commentary. But her initial response to this specific question made me laugh. How does data transfer from a spreadsheet to another program? "Not well," were her exact words. But joking aside, there are ways to transfer data from a spreadsheet to another system or program, and I think advanced preparation is key. It goes a long way toward making sure the spreadsheet aligns with the ingesting system in terms of formatting and overall structure. There will typically be a process involved to help map which spreadsheet fields go with which database fields. And there are also some technical integration tools that can help with the simpler copy-paste type transfers. But there are definitely drawbacks, as the data analyst implied. Data can certainly be corrupted when trying to convert formats that don't align exactly. There can be timeout errors if, say, a small special character is used and is not recognized by the ingesting system. Again, I think advanced planning and preparation can really go a long way to mitigate this. And for large data sets, what LMI typically does, and I think it's a common best practice, is to use some kind of pre-made spreadsheet template if any bulk import via spreadsheet is contemplated, where the fields and formats can be locked down in advance to reduce data entry errors.
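Here is a minimal sketch of the mapping step described above: spreadsheet headers are translated to a platform's field names and required values are checked before a bulk import. The file name, headers, and field names are assumptions for illustration, not any particular intake system or platform.

    # Hypothetical sketch: map spreadsheet columns to platform fields and flag
    # rows with missing required values before a bulk import.
    import csv

    COLUMN_MAP = {            # spreadsheet header -> platform field
        "Claimant Name": "plaintiff_name",
        "DOB": "date_of_birth",
        "Injury": "alleged_injury",
    }
    REQUIRED_FIELDS = {"plaintiff_name", "date_of_birth"}

    def map_row(row):
        mapped = {COLUMN_MAP[col]: (val or "").strip()
                  for col, val in row.items() if col in COLUMN_MAP}
        missing = REQUIRED_FIELDS - {k for k, v in mapped.items() if v}
        if missing:
            mapped["import_errors"] = "missing: " + ", ".join(sorted(missing))
        return mapped

    with open("firm_intake_export.csv", newline="", encoding="utf-8") as f:
        staged = [map_row(row) for row in csv.DictReader(f)]
    # 'staged' rows can now be reviewed and loaded into the centralized platform.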
Angela:
Now, you've touched on some of this, some of the best practices for data management. One of the big things I've heard from you, and I think it's a theme throughout this episode, is thinking ahead about what you might need in the future.
Megan:
Absolutely.
Angela:
And I think that's somewhere we are always advising our clients: doing that, asking the questions, anticipating what the next five steps are going to be. Tell us more about some of these best practices for data management and how our listeners can implement them today.
Megan:
Absolutely. And I think I've probably spoken quite a bit to this throughout the episode. As you mentioned, advanced preparation is really a key piece of this. There are a lot of factors to consider and implement, but I think that advanced planning, and then clean, structured, consistent data, is really the foundation for any good data management program, whether in general or for litigation purposes, you name it. For litigation, think about what data will be gathered in advance, and come up with a plan for how best to capture that data in a way that will allow for actionable insight during later phases. In my experience, rushing through that initial planning phase can lead to a lot of data headaches later in the matter.
Angela:
And I know we see that a lot with litigation: it's hurry up and then wait. The hurry up is implementing these systems, and then the waiting happens because something needs to be reworked that wasn't contemplated at the outset. I think that's a really nice segue into my final question: what are the benefits of working with a legal tech partner to manage data? Firms may not have the infrastructure that's needed when there's large-scale litigation at play, with a number of players involved on both the defense and plaintiff sides. I'd love to hear your thoughts on the benefits of working with a legal tech partner in this space.
Megan:
I think legal tech partners specialize in this. They specialize in data: data capture, data processing, data management, and how to get the best insight from litigation data. These partners can help at the outset with advanced planning, as we discussed, and they can work with legal teams to get maximum value from the data as the litigation evolves toward trial or, ultimately, resolution. They will have the tools that help ensure data integrity, as well as the more robust analytical tools and dashboards. And lastly, legal tech partners can often bring real peace of mind with respect to ensuring that the data remains secure, that it's accessible only at the right times to the right parties, and that it's as impactful as it can possibly be throughout the litigation.
Angela:
I'd like to thank Megan for being our guest on today's episode, as well as you, our listeners. If you enjoyed the show, subscribe to Litigate with Insight, the LMI podcast, on your favorite podcast app, share it with your colleagues, or on LinkedIn. You can learn more about LMI at lmiweb.com. Have a topic you'd like us to discuss? Send us a message on our website or LinkedIn. This has been a co-production of Evergreen Podcasts and LMI. Special thanks to our contributor Kaylee Sabnick. Our producers are Brigid Coyne and Sean Rule Hoffman. Our audio engineer is Sean Rule Hoffman. I'm your host, Angela Browning. Thanks for listening.