DBTA WEBINAR PANEL

Implementing and Managing a Data Fabric: Steps to Success

Recorded: May 18, 2023

About

As data environments continue to grow more diverse, distributed, and complex, the need for agile data management has increased proportionally. To maximize the value of their data, organizations need to make it available to users where and when it's needed—fast, easy access is the mandate. This requires rethinking traditional data management practices.

In response, data fabric has emerged as a new design concept aimed at solving the persistent challenges of integrating, governing, and sharing data across the enterprise. Through a unified architecture and combination of data services and technologies, organizations can create an integrated layer of connected data to facilitate self-service data consumption.

The benefits of this scenario are easy to understand, but getting there is a different story. There is no out-of-the-box or one-size-fits-all solution, and it's not a one-and-done project either.

This webinar helps IT decision makers and data professionals gain a deeper understanding of data fabric, how to start, and how to maintain success.

 

You might also be interested in

 

The Rise of the Knowledge Graph ebook


 

Learn more about Cambridge Semantics, Inc. solutions. 

Transcript

As Stephen said, my name is Sam Chance, and my company is Cambridge Semantics. We're based in Boston, and we offer the Anzo knowledge graph software platform, which serves as a backbone for data fabric and data mesh architectures.

Today, I'm going to talk about implementing your data fabric using knowledge graph technology as the underlying data model, and I think that choice is your key to success. The assertion on this slide is that not all data fabric implementations are created equal in terms of information accessibility, usability, scalability, and adaptability.

A shared goal of data fabrics, however, is easy access to information on demand. In other words, we seek to shift humans and software from transactional activities such as data gathering, preparation, and aggregation to more cognitive activities such as analysis and insight creation. Toward that end, the data model becomes arguably the key determinant of success. I'm biased, of course, but based on my experience in this technology space, I think data fabrics built on W3C Semantic Web standards, namely the Web Ontology Language and the Resource Description Framework, also called OWL and RDF, will prove superior over time.

These standards remove structural and format heterogeneity and add human- and machine-understandable meaning, i.e., semantics. OWL/RDF-based knowledge graphs readily adapt to change in uncertain times due to what's called the open world assumption. That's a topic for another time, but you can research the concept. Further, information in semantic knowledge graphs is modeled and presented in business-oriented terms instead of physical schemas, which lets subject matter experts and all your users interact with information more intuitively, in a way more aligned with how they think. Also, OWL is machine understandable and enables machine reasoning services that create new facts from existing facts. Your knowledge graph should grow, and as it grows it represents everything you know. At least with Anzo, it is made completely and arbitrarily accessible to answer known and unanticipated questions on demand, with no indexes required. Knowledge graphs move more intelligence into the metadata, which makes data more interoperable and makes software appear more intelligent.
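
To make the reasoning point concrete, here is a minimal sketch using the open-source rdflib and owlrl Python libraries (not Anzo's tooling). The ontology, namespace, and facts are hypothetical; the pattern shown, assert triples plus an ontology and then let a reasoner derive new facts, is the idea described above.

```python
# Minimal sketch (not Anzo): rdflib + owlrl illustrate how RDF triples plus an
# OWL/RDFS ontology let a reasoner derive new facts from existing ones.
from rdflib import Graph, Namespace, RDF, RDFS, Literal
import owlrl

EX = Namespace("http://example.com/ontology#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# Ontology: a Customer is a kind of Person (business-oriented terms).
g.add((EX.Customer, RDFS.subClassOf, EX.Person))

# Data: asserted facts about John Q. Public.
g.add((EX.JohnQPublic, RDF.type, EX.Customer))
g.add((EX.JohnQPublic, EX.hasName, Literal("John Q. Public")))

# Reasoning: expand the graph with RDFS entailments.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

# "JohnQPublic is a Person" was never asserted, only inferred.
print((EX.JohnQPublic, RDF.type, EX.Person) in g)  # True
```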

A concept that has increasingly gained awareness is the idea of data as a product. That is, we want to require data providers or publishers to take responsibility for curating and managing their datasets. Curated knowledge graphs can be logically, and in part physically, partitioned to represent different data products. Of course, knowledge graphs are aligned with the FAIR principles: findable, accessible, interoperable, and reusable. This slide depicts what we want: starting from the bottom, to take all of these different data systems, repositories, files, services, et cetera, get past all the different formats, structures, and syntaxes, and integrate them in a way that brings together information based on the meaning of that data. For example, you may have information about John Q. Public in ten data sources, but the question is, what do I know about my customer, John Q.? What we want to do is make all of that available to end users and automated applications in a way that is more conceptual, fully integrated, and seamless to the client, be they human or machine.
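
As a rough illustration of the John Q. Public example, the sketch below (rdflib again; all URIs, properties, and source names are made up) shows how two records from different systems can be declared to describe the same individual, so a single query answers "what do I know about my customer?"

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import OWL

CRM = Namespace("http://example.com/crm#")          # hypothetical source 1
BILLING = Namespace("http://example.com/billing#")  # hypothetical source 2
EX = Namespace("http://example.com/ontology#")

g = Graph()
# Facts about the same person arriving from two systems.
g.add((CRM.cust_1001, EX.hasEmail, Literal("jq.public@example.com")))
g.add((BILLING.acct_77, EX.hasBalance, Literal(42.50)))

# Entity resolution step: assert the two identifiers denote one individual.
g.add((CRM.cust_1001, OWL.sameAs, BILLING.acct_77))

# One query over the linked graph answers "what do I know about this customer?"
q = """
SELECT ?p ?o WHERE {
  { <http://example.com/crm#cust_1001> ?p ?o }
  UNION
  { ?same ?p ?o .
    <http://example.com/crm#cust_1001> <http://www.w3.org/2002/07/owl#sameAs> ?same }
}
"""
for row in g.query(q):
    print(row.p, row.o)
```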

So how do we get there? This slide is it, for better or worse. It's a little bit product oriented, but the idea is that on the left, your input side, you have all of your heterogeneity, a.k.a. the chaos, and you want to be able to bring that in, meaning you connect to it or collect it, however you prefer. Then you begin to normalize it, in our case into RDF, the graph; I'm pushing knowledge graph as your foundation. You convert the metadata from those sources into ontologies, the OWL metadata, and then you convert the data in those sources into what's called RDF triples. Now you have your initial raw knowledge graph representations, and you apply standard or enterprise models, OWL ontologies, to link the auto-generated concepts and data into more intuitive, business-aligned knowledge graph representations. At the bottom you can see where you're blending the data; this is where you might perform your ETL, your transformations, computations, et cetera, and maybe apply those inference services I mentioned earlier. Finally, you have prepared data sets that can serve as your data products.
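
The "connect or collect, then normalize into RDF" step might look something like the following sketch. Anzo automates this, so treat the field names, namespaces, and mapping below purely as hypothetical illustrations of a source record becoming RDF triples typed against an ontology.

```python
from rdflib import Graph, Namespace, RDF, Literal

EX = Namespace("http://example.com/ontology#")   # hypothetical enterprise ontology
DATA = Namespace("http://example.com/data/")     # hypothetical instance namespace

# A row as it might arrive from a source system (structure and names assumed).
row = {"customer_id": "1001", "name": "John Q. Public", "city": "Boston"}

g = Graph()
subject = DATA[f"customer/{row['customer_id']}"]

# Normalize the record: each column becomes a property on a typed node.
g.add((subject, RDF.type, EX.Customer))
g.add((subject, EX.hasName, Literal(row["name"])))
g.add((subject, EX.locatedIn, Literal(row["city"])))

print(g.serialize(format="turtle"))
```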

Further to the right, under data access, you can see on the top right there's Anzo. Like I said, it's kind of a product slide, but the overall flow is not unique to Anzo. You want to make available tools from your solution, your data fabric, but you also want to let users use their own preferred tools, so that you minimize the impact on them and help promote adoption.

And this applies to unstructured data too. Again, this is somewhat Anzo specific, but it's not just our product; the idea is that you want to be able to account for unstructured content as well. In that second stage there, where we say configured annotators, that's basically an NLP function. What you want to do is extract the entities, relationships, and facts that you care about, represent them in the knowledge graph as well, and link them to your structured data through the ontology. Again, the ontologies are the formal definitions of your domain or your application areas, and all of your structured and unstructured data is linked through the knowledge graph.
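
A hedged sketch of that second stage follows, with spaCy standing in for the configured annotators. The document URI, ontology classes, and mentions property are assumptions; the point is only that extracted entities land in the same graph, linked through the same ontology, as the structured data.

```python
# Sketch only: spaCy stands in for the "configured annotators" described above.
# Assumes the en_core_web_sm model is installed; URIs and classes are hypothetical.
import spacy
from rdflib import Graph, Namespace, RDF, Literal

EX = Namespace("http://example.com/ontology#")
DOC = Namespace("http://example.com/documents/")

nlp = spacy.load("en_core_web_sm")
text = "John Q. Public visited the Boston office on May 18."
doc_uri = DOC["memo-001"]

g = Graph()
g.add((doc_uri, RDF.type, EX.Document))

# Extract entities from the unstructured text and attach them to the document node.
for ent in nlp(text).ents:
    g.add((doc_uri, EX.mentions, Literal(f"{ent.text} ({ent.label_})")))

print(g.serialize(format="turtle"))
```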

So let's get going. If you are interested in knowledge graph as a foundation, as your premise, a great resource to get started is The Rise of the Knowledge Graph ebook, which provides great conceptual and practical information with guidance to get you underway. If you're thinking about a data fabric for your company or your team, you are already thinking big. I recommend starting with two or three data sets to create an initial knowledge graph. Research commercial offerings so you're prepared to scale fast. And depending on the scope of your project, you will likely need experts to get you going in the right direction and minimize risk.

And I'd say before you run around shouting the data fabric message, educate yourself and, if possible, build your first knowledge graph around a specific problem that you know others will care about. While it's true that one can create a knowledge graph using a text editor, that approach is tantamount to boiling the ocean.

Again, an effective knowledge graph software platform will keep you and your team from needing specialized skills to be successful. It will also help you focus your efforts where they matter and provide the quickest and most likely path to value.

When you develop your UIs, be careful not to just show those dense graph "fuzzballs." People don't understand them or don't care about them, though they can help people see that the data is now highly connected.

Anyway, figure out how to display the data from multiple sources in one view. People, especially end users, typically get it when they see data from all over the place being brought together in one logically consistent view, a single pane of glass.
But a harsh reality is that some may not understand the significance of how you created your demo.
They just might not get it. As such, it's important to find and engage people who will understand why your approach is so important. We're talking architecture, so users typically don't know or care.

Your data fabric is something of a journey. You start with a couple of data sets, create your initial knowledge graph, then expand and iterate. It's very well aligned with the agile development process, and over time your data fabric becomes the primary information access source. By the way, it's worth mentioning that your data fabric should not disrupt current operations; it should be an overlay on existing investments. Additionally, it should allow end users to continue using their own tools, like I mentioned earlier, perhaps in addition to something you might provide. As we mentioned earlier, treat your data as a product with quality metrics, service level agreements, change management processes, and so on.

In contrast to early and even some contemporary data architectures, we are intentionally thinking about others and how we can contribute to the larger community or network of information. So think more ontologically and model data based more on its inherent meaning, not on known questions or known use cases. Consumers will customize it if and as needed. We published a knowledge graph best practices article on our website which elaborates on everything I just said on this slide, and it has a video and commentary as well. You might find that useful.

For the last slide, I wanted to say a word about large language models and knowledge graphs. It's probably a statement of the obvious, but generative AI is front and center now, and there are serious concerns with AI in general. In fact, reactions range from some people calling for bombing data centers all the way to other people saying they want to create an AI god. So you get the sense that you want to proceed, but you want to proceed with your head on a swivel, thinking about what you're doing. The GIGO principle still applies; in other words, garbage in still leads to garbage out. And although the G in ChatGPT stands for generative, a lot of effort is going into generating the right context.

So suddenly a spotlight is on knowledge graphs because, among other things, they are fully contextualized and provide great input for AI technology. In fact, if you research the Semantic Web, capital S, capital W, you'll soon discover it was originally envisioned for machines. As we will soon see, being able to explain the outputs of AI as efficiently and effectively as possible will be critical to competition and regulatory compliance. And superior knowledge graph implementations naturally support verification and validation functions. This is typically enabled by what we call provenance.
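
Provenance is often captured with named graphs, as in the sketch below. The graph identifier, the PROV-O properties chosen, and the source description are illustrative assumptions, not a statement of how Anzo implements it.

```python
from rdflib import Dataset, Namespace, URIRef, Literal, RDF

EX = Namespace("http://example.com/ontology#")
PROV = Namespace("http://www.w3.org/ns/prov#")   # W3C PROV-O vocabulary

ds = Dataset()
source_graph = URIRef("urn:graph:crm-load-2023-05-18")  # hypothetical load id

# Facts land in a named graph identified by the load that produced them.
g = ds.graph(source_graph)
g.add((EX.JohnQPublic, RDF.type, EX.Customer))

# Provenance about that named graph: where it came from and when.
ds.add((source_graph, PROV.wasDerivedFrom, Literal("crm.customers table")))
ds.add((source_graph, PROV.generatedAtTime, Literal("2023-05-18T10:00:00Z")))

# Any fact can now be traced back to its originating load for verification.
for s, p, o, ctx in ds.quads((EX.JohnQPublic, None, None, None)):
    print(ctx, s, p, o)
```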

So finally, in my opinion, data fabrics built using knowledge graph technology provide supercharged inputs for AI. In other words, knowledge graphs enable AI. And that concludes my presentation.


How do I combine data fabric and data mesh architectures?

Great question. So first of all, we've been talking about data fabric, obviously. A word about data mesh: based on what I've read and observed, it is almost as much about process and policy as it is about technology. One of the key points coming out of the mesh is data producers or publishers being responsible for curating their data and offering it as a product. That is not a technical thing, but it does imply that the data publishers are all over the place, so you have this notion of distributed and maybe decentralized.

And the data fabric is seeking to provide that frictionless access to all of this data. So what you want to do, and we have articles on this on the website and LinkedIn, is use the knowledge graph as the connective tissue between both of those constructs; more specifically, the W3C Semantic Web standards are built for distributed, decentralized environments. Hopefully that helps.

Will generative AI make the knowledge graph slicing, dicing, and layering efforts obsolete?

As one might imagine, that question has been an elephant in the room at our company, and therefore a lot of thought has gone into it, though obviously we can't know or guarantee the answer. But the current situation with generative AI is that it uses these pre-trained data sets to try to get the right context based on what the user is asking. So the knowledge graph comes along and says, well, I'm fully contextualized and I'm the best input you can have so that you get the best output. At this point, it looks like this could be a boon for knowledge graph technology. We're very optimistic, and like others in our space, we're developing capabilities that use the knowledge graph as the fuel, if you will, for generative AI. So we're kind of optimistic right now.
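
One way this "knowledge graph as fuel" pattern is commonly sketched is retrieving graph facts into the model's context. In the example below, build_grounded_prompt and call_llm are hypothetical placeholders, not any real product API.

```python
from rdflib import Graph

def build_grounded_prompt(graph: Graph, question: str) -> str:
    """Pull facts from the knowledge graph and prepend them as grounding context."""
    # Hypothetical retrieval step: in practice you would select only the facts
    # relevant to the question; here we simply take every triple in a small graph.
    facts = "\n".join(f"{s} {p} {o}." for s, p, o in graph)
    return f"Use only these facts to answer:\n{facts}\n\nQuestion: {question}"

# prompt = build_grounded_prompt(g, "What do we know about John Q. Public?")
# answer = call_llm(prompt)  # call_llm is a placeholder for whatever model API you use
```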


What about the latency of fetching data from diverse data sources into the fabric, and the time to analyze and produce insights, versus traditional data lake-based analysis?

There are two aspects of the latency issue. One, I mentioned the W3C Semantic Web standards; one of those is the query language, and if you dig a little deeper, there's an element of that standard that is a protocol for managing federated queries. The standard itself has measures built in to mitigate latency issues. And then, this is an implementation detail, but the way our platform copes with that and achieves robust data virtualization is by applying functions in the AnzoGraph query engine in memory, so that it's more dynamic and handles latency better. So there are two things: one is in your implementation, and the other is in the standards you select, for example, standards that are designed for this kind of data access pattern.
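
The federation element referred to here is SPARQL 1.1 Federated Query, whose SERVICE keyword lets one query span multiple endpoints. The sketch below uses the open-source SPARQLWrapper client and placeholder endpoint URLs; it is not Anzo-specific.

```python
# Sketch of a SPARQL 1.1 federated query (the standard mentioned above).
# Endpoint URLs are placeholders; SPARQLWrapper is an open-source client, not Anzo.
from SPARQLWrapper import SPARQLWrapper, JSON

query = """
SELECT ?customer ?order WHERE {
  ?customer a <http://example.com/ontology#Customer> .
  # The SERVICE clause pushes part of the query to a second, remote endpoint.
  SERVICE <http://orders.example.com/sparql> {
    ?order <http://example.com/ontology#placedBy> ?customer .
  }
}
"""

endpoint = SPARQLWrapper("http://customers.example.com/sparql")
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["customer"]["value"], binding["order"]["value"])
```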

Are there any required roles or skills necessary to get started with Anzo or knowledge graph technology in general?

If you start a data fabric project using Anzo as a foundation, then you would have to learn what I call knobology. In other words, you would have to learn the Anzo interfaces, but you would not have to learn any new technologies, so your existing personnel and skill sets can be successful at creating solutions. I've been focusing on knowledge graphs with OWL and RDF as the foundation for your data fabric, so if you start with open source technologies that implement these standards, then you're going to have to have those skills, especially SPARQL, the protocol and query language.

Are there any disadvantages to using a data fabric?

No, there really aren't. I mean, it's an evolution. We went from a data warehouse to a data lake, which could also be considered a lakehouse, but what you're now doing is really bringing data together. The only disadvantage is if you don't have consistent definitions throughout your organization. As I think Evan brought up earlier with the distributed data warehouse, if you're bringing a bunch of distributed data warehouses together, you've got to make sure that you have consistent definitions across them, from your data glossary to what you're looking at, so that you're getting consistent data throughout your organization. But I don't see that there's really any disadvantage; I think the advantages really outweigh it. You can bring data together and provide it almost in real time, getting data to you when you need it, which is really the goal of analytics.

Absolutely. Sam, would you like to add to this?

I would just say I love that answer. I always tell people we're basically overcoming a problem that we unwittingly created. What we did, in my opinion, was "my database, my spreadsheet, my data," and we also assumed that I was right and my schema was right. Fast forward to our time: we're the generations trying to overcome that problem and, this is kind of the big hairy goal, let the World Wide Web become a queryable database, so that humans can answer questions and collectively we can know what we know. It's not all stovepiped; it's democratized.

If there's one thing you'd like our attendees today to walk away keeping in mind, what would that be?

I love those answers, and I'll simply say: digital transformation is coming. So think big, start small, scale fast. I'll leave it at that.