Some BrainSpace Considerations
1. The current version of BrainSpace requires that documents be in a text-readable format; it does not read or process native files or non-text documents.
2. BrainSpace requires you to point it at a data set, but it doesn’t care how large the data set is or what type of data it contains. Semantic queries can be formulated to run across virtually any document population, including intranets, extranets, enterprise content management systems, portals, and email archives.
3. BrainSpace does not create an index. Rather, it creates containers of document "intelligence" with similar concepts which allow you to quickly examine them for relevance.
4. Within approximately an hour of "brain building" (loading) a dataset that includes 1 million documents, you could begin to examine the contents by clustered document container.
By quickly returning easy-to-understand visual maps of the contents of your data, BrainSpace lets you review, edit, tag and parse information as your knowledge of the contents increases.
• Improved document recall
Focused semantic search quickly returns more results that are related to your search query, whether or not you used the specific terms in your initial search.
• Greater precision
By combining Boolean and focused semantic search technology, your results will be highly relevant to the topics that are important to you.
• Increased transparency
The terms and concepts suggested by the semantic technology are returned to you for your review as the actual search is run. You know exactly what the system is searching on.
5. The BrainSpace query takes a plain language question that you are asking, converts it into a conceptual Boolean search statement, and then searches the data set. Results are displayed in concept clusters which allow you to easily identify documents that are responsive to questions you didn’t even know to ask.
6. BrainSpace maintains the Boolean statements it created so that results can be repeated and described to the opposition or the court, revealing how the results were obtained in a defensible manner. It makes transparent to the user what search statement is being sent to the data set.
7. Functionality built into the product:
a. The Semantic Near Dupe Identification Engine detects and groups near-duplicate documents, identifying redundant documents that differ only slightly, which reduces review time.
b. It also contains text-based deduping, which goes beyond hash value in identifying duplicate documents by comparing the text of documents, exclusive of metadata differences.
c. Concept clustering is displayed as a "Focus" wheel of relevant containers that can be continually parsed into concept subsets, while always displaying the concept that brings the documents together. Users can quickly generate visual maps of responsive documents, identify and tag key areas of hot documents, and store and/or export these subsets for immediate review by a team.
d. BrainSpace assigns a PDID Document Tagger number that goes beyond Bates numbering. It is a semantic Bates number because it assigns documents with contextual similarity a similar PDID number. This allows the user to sort documents from related containers by using the PDID number.
e. Users are given the ability to add, delete, increase or decrease the importance of all query words in a unique visual query interface as they are examining results.
8. Containers of conceptually related documents that a quick review shows to be relevant, and likely to need further analysis, can be tagged and then exported into any document review platform or predictive coding system for secondary analysis.
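To make the difference between text-based deduping and near-duplicate grouping (items 7a and 7b) more concrete, here is a minimal Python sketch. It is not BrainSpace's engine, whose internals are not described here. It assumes a common lexical approach: a hash of the normalized text for exact text duplicates, and Jaccard similarity over three-word shingles for near duplicates. The function names and the shingle size are illustrative assumptions.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def text_fingerprint(text: str) -> str:
    """Hash the normalized text only; file metadata never enters the hash."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word sequences in the normalized text."""
    words = normalize(text).split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Share of k-word shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Illustrative documents (made up for this sketch)
doc_a = "The quarterly report is attached.\nPlease review by Friday."
doc_b = "the quarterly report is attached.  please review by friday."
doc_c = "The quarterly report is attached. Please review by Monday."

# Same text, different layout: treated as text duplicates
same_text = text_fingerprint(doc_a) == text_fingerprint(doc_b)

# One word changed: not a duplicate, but a high similarity score
similarity = jaccard(doc_a, doc_c)
```

In this sketch a reviewer would pick a similarity threshold (an assumption, for example 0.7) above which documents are grouped as near duplicates and reviewed together.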
Why is this type of analysis different?
One of the most important features of the software is not really a feature, but rather forms the basis for a new approach to using semantic search. BrainSpace has been designed to be an interactive experience or conversation between the user of the system and the machine learning concept search.
The goal of the system is to encourage continued interaction with the data so that the search results continue to educate you about the information in the documents, quickly increasing your level of knowledge. It is less transactional and more interactive and hands on.
Creating a tool that is easy to use and understand means that attorneys can easily spend time with the data and quickly educate themselves about the concepts and documents that may be relevant to their ongoing discovery. The interaction creates a place to work and learn, rather than a long result set of documents that are similar to a keyword that was preselected.
This level of understanding is made possible by transforming your queries into a QueryCloud, which is a visual portrayal of the newly generated semantic query. It effectively places the user in the center of the transaction, encouraging interaction between the query and the data. Each user query is transformed into a list that shows the most relevant extracted and inferred words and phrases (which are
The goal of litigators in handling the large volume of data in today’s discovery is to provide the most cost-effective and comprehensive solution to analyzing the data that is potentially involved in EDiscovery and discover what is relevant and why. And the earlier this is done in the process, the better! When used in conjunction with other data management and review tools, semantic search can improve the state of EDiscovery. I have listed 4 key factors that indicate why and how semantic search can be used to improve your handling of Big Data in EDiscovery. It is time to take a long look at how this can impact EDiscovery:
1. Know your data – you have to be aware of what data you have, what it means and how it might impact your case as early in the process as possible. What you learn may lead you to pursue a settlement rather than proceed to trial. Including semantic searching in your plan dramatically reduces your learning curve by pointing you towards information that is likely relevant, more quickly and easily than other methods.
2. Semantic search improves your results – Semantic search queries take plain language questions that you are asking and convert them to a conceptual Boolean search statement which then examines the data set.
3. Explain your approach – you need to provide an explanation to the opposition and the judge about how you have achieved your search results and why the document population you are turning over is in fact responsive and relevant to the discovery request. This level of search transparency is at the heart of the semantic search product Pure Discovery which turns all the plain English search requests into a conceptual Boolean statement which can be clearly understood and replicated when necessary.
4. Be transparent and cooperative – Judges require parties to come to the meet and confer with definitive plans that have been worked out between the parties. They are looking for reasonable and well thought out approaches to discovery that are based on some degree of proportionality.
Results are displayed in concept clusters which allow you to easily identify documents that are responsive to questions you didn’t even know to ask. Semantic searches are dynamic, with the ability to continually update results as new information is introduced to the system. The better knowledge you have, the better you are able to negotiate during the meet and confer to limit document production, understand your case and determine litigation strategies.
Using semantic search as part of your overall preliminary document strategy will help improve your knowledge about the document population and allow you to improve everyone’s understanding of what and how documents have been selected. You will not be taken by surprise at the meet and confer since you will be in control of the information on behalf of your client.
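The conversion of a plain language question into a conceptual Boolean statement, described in factors 2 and 3, can be sketched roughly as follows. This is an illustration only, not the product's actual method: the stopword list and the `RELATED` table of related terms are hypothetical stand-ins for what a semantic engine would infer from the data set itself.

```python
import re

# Hypothetical stopword list for this sketch
STOPWORDS = {"a", "an", "the", "who", "what", "when", "did", "about",
             "of", "to", "in", "and", "or", "was", "were", "is", "with"}

# Hypothetical related-terms table; a real semantic engine would
# derive related concepts from the documents, not from a fixed map.
RELATED = {
    "pricing": ["price", "discount"],
    "competitors": ["rivals"],
}


def to_boolean(question: str, related: dict = RELATED) -> str:
    """Turn a plain language question into a readable Boolean statement."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    terms = []
    for word in words:
        # Keep each meaningful word once, in the order it was asked
        if word not in STOPWORDS and word not in terms:
            terms.append(word)
    groups = []
    for term in terms:
        alternatives = [term] + related.get(term, [])
        if len(alternatives) > 1:
            groups.append("(" + " OR ".join(alternatives) + ")")
        else:
            groups.append(term)
    return " AND ".join(groups)


statement = to_boolean("Who discussed pricing with competitors?")
# statement == "discussed AND (pricing OR price OR discount) AND (competitors OR rivals)"
```

Because the output is an ordinary Boolean string, it can be saved alongside the results, rerun later, and shown to the opposing party or the court, which is the transparency point made above.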
1) The Basics of Predictive Coding
Without going into a complete discussion about predictive coding, the essential element that is relevant to understand is that predictive coding is based on some type of document seeding in order for the machine to "learn" what kinds of things you are interested in finding. The legal team puts together several representative populations of documents dealing with key areas of interest and the machine begins to locate documents of a similar nature. Predictive Coding requires:
• Input from case experts: both experts on the substantive legal issues and software consultants
• Keyword analytics to first locate important documents and create seed sets for the machine to use as its matching sets.
• A defined workflow that includes strong statistical sampling analysis to help ensure accurate results
• Iterative rounds of machine "learning" (augmented by software and case experts) to find other documents that are "like this" based on keywords and some concepts.
Predictive coding is not designed to replace human review of documents; it is meant to optimize the review and help reduce the volume of documents that must be examined during discovery. During discovery, predictive coding takes all the documents the computer identifies as "related" to an issue identified by the case experts, then ranks and tags them so that they can be reviewed by humans for relevance and responsiveness.
One of the advantages of this technology is that you are using human decisions to "teach" the computer to locate documents, increasing the accuracy and relevancy of search sets over time. Whether you call it predictive coding, computer-aided review, or technology-assisted review, it employs a combination of human beings and computer algorithms that are used to determine relevant documents by creating "seed sets" -- and then using the seeds (controlled by algorithms) to have computers produce subsets of responsive documents.
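The seed-set idea can be illustrated with a small Python sketch. It is a deliberate simplification and an assumption on my part: commercial predictive coding tools use trained classifiers, statistical sampling and several review rounds, while this example performs a single pass that ranks unreviewed documents by word-count cosine similarity to a combined seed profile. All names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Count the words in a document (a simple bag-of-words vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_seed(seed_docs: list, candidates: list) -> list:
    """Rank candidates by similarity to the combined seed set, best first."""
    profile = Counter()
    for doc in seed_docs:
        profile.update(vectorize(doc))
    scored = [(cosine(profile, vectorize(doc)), doc) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Seed set chosen by the case experts (made-up examples)
seeds = ["price fixing agreement with competitor",
         "agreed prices with competitor"]

# Unreviewed documents to be ranked for human review
unreviewed = ["lunch menu for friday",
              "competitor agreement on price"]

ranked = rank_by_seed(seeds, unreviewed)
```

In a fuller workflow, reviewers would code the top-ranked documents, add the confirmed ones to the seed set, and rank again, which corresponds to the iterative rounds of machine "learning" listed above.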
About the Author
Jeffrey Parkhurst, Consultant, Studeo Legal
Jeff provides consulting and business development leadership to legal service providers regarding new business opportunities and increased service offerings. He advises clients on EDiscovery procedures, processes and software alternatives for processing discovery data. He writes a weekly blog, Support for Litigation, on EDiscovery and litigation support issues, highlighting discovery trends, consulting services and the impact of recent court rulings on the practice of law.