In 2023, the world of data will continue to evolve and transform, driven by the growing demand for data-driven insights and by security needs. In this chapter, we'll take a closer look at the key tech trends related to data in 2023, including improved approaches to data management and the importance of data in driving ESG initiatives.
Check out the previous chapters of the Tech Trends 2023 series: Web3 and Enterprise. Keep an eye out for our upcoming insights on development & infrastructure and a few other topics worth following.
Note: All pictures are AI-generated with Jasper Art.
Data collection
As we said earlier, 2022 was an eye-opening year for hacking, and a lot of personal data was stolen, sold, and abused. In 2023, we expect more scrutiny from governments and harsher penalties for enterprises that lose personal data.
Data management: less is more
For a long time, companies have been data hoarders: “Get the data first, and we'll figure out what to do with it later… and keep it forever, just in case we need it in the future.” However, this approach is starting to backfire as data privacy and security become increasingly important.
In 2023, we anticipate that companies will take data privacy much more seriously, starting with reducing the data they collect and process. Regardless of how many security measures you put in place, there's a simple, universal truth: The data you don’t have cannot be stolen from you! Additionally, sensitive data you don’t have is a reputation you can’t lose.
Better data management methods
We also expect other ways of handling data to become more prevalent in 2023: aggregated data, hashed or tokenised data, homomorphic encryption, and other, more exotic data-management techniques.
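As a concrete illustration of one of these techniques, here is a minimal Python sketch of tokenisation (pseudonymisation): a raw identifier is swapped for a random token, and the mapping is held in a separate, more tightly controlled store. The `TokenVault` class and the record fields are hypothetical, not a production design.

```python
import secrets

class TokenVault:
    """Illustrative token vault mapping random tokens back to raw values.

    In practice the mapping would live in a separate, access-controlled
    store, not in application memory.
    """

    def __init__(self):
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenise(self, value: str) -> str:
        # Reuse an existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_urlsafe(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenise(self, token: str) -> str:
        return self._token_to_value[token]


vault = TokenVault()
record = {"customer_email": "alice@example.com", "amount": 42.0}
safe_record = {**record, "customer_email": vault.tokenise(record["customer_email"])}
print(safe_record)  # downstream analytics only ever sees the token
```

The analytics platform works with tokens only; the vault, and with it the ability to re-identify anyone, stays inside a much smaller and better-protected perimeter.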
Think about rolling backup policies: due to space constraints, we don't keep every backup forever. Daily backups are retained for a few weeks, monthly ones for five years, and yearly ones for 20 years. The same logic can apply to data: after a few weeks, records are aggregated and the detailed data is removed.
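Here is a minimal sketch of what such a rolling policy could look like for data, assuming a pandas DataFrame of events with hypothetical `customer_id`, `timestamp` and `amount` columns: detail older than the retention window is rolled up into daily aggregates, and only the aggregates are kept.

```python
from datetime import datetime, timedelta, timezone
import pandas as pd

DETAIL_RETENTION = timedelta(days=30)  # keep raw events for roughly a few weeks

def roll_up(events: pd.DataFrame, now: datetime) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Aggregate old events per customer and day; the raw detail is not kept."""
    cutoff = now - DETAIL_RETENTION
    old = events[events["timestamp"] < cutoff]
    recent = events[events["timestamp"] >= cutoff]

    daily = (
        old.assign(day=old["timestamp"].dt.date)
           .groupby(["customer_id", "day"], as_index=False)
           .agg(event_count=("amount", "size"), total_amount=("amount", "sum"))
    )
    return recent, daily

events = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "timestamp": pd.to_datetime(["2023-01-02", "2023-03-01", "2023-01-05"], utc=True),
    "amount": [10.0, 25.0, 7.5],
})
recent, daily = roll_up(events, datetime(2023, 3, 15, tzinfo=timezone.utc))
print(daily)  # only aggregates survive past the retention window
```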
Outside of legal retention requirements, unused data should simply be deleted. Data classified as sensitive for ex-customers (e.g. passport data, bank details, credit card information) should be removed or hashed after a defined period.
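For example, here is a small sketch of such a scrubbing step, using only the standard library; the field names, retention window and salt handling are illustrative assumptions.

```python
import hashlib
from datetime import date, timedelta

RETENTION = timedelta(days=365)        # illustrative retention window
SALT = b"per-deployment-secret-salt"   # in practice, pulled from a secrets manager

def hash_value(value: str) -> str:
    """One-way, salted hash: values can still be matched, but not read back."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

def scrub_ex_customer(record: dict, today: date) -> dict:
    """Hash sensitive fields once an ex-customer has been gone long enough."""
    if record["status"] == "ex-customer" and today - record["left_on"] > RETENTION:
        for field in ("passport_number", "iban", "card_number"):
            if record.get(field):
                record[field] = hash_value(record[field])
    return record

record = {"status": "ex-customer", "left_on": date(2021, 6, 1),
          "passport_number": "X1234567", "iban": "CH9300000000000000000", "card_number": None}
print(scrub_ex_customer(record, date(2023, 1, 15)))
```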
When you're thinking about data privacy in 2023, start by asking yourself: How much data do I really need? What's the risk, and what would be the cost in case of a major issue? How long should I keep the data (hint: it's not forever), and what alternatives can I find?
AI and ChatGPT
Unless you have been living in a bunker for the past six months, you have heard of (and maybe played with) ChatGPT. The conversational AI has been used for everything from jokes to research, blog posts, and even theses (prompting deep introspection among teachers about how assignments should be evaluated).
Incorporating AI chatbots into businesses
By the end of 2023, GPT-3 and ChatGPT will have become commonplace in enterprises. These AI chatbot technologies will be used to provide customer service, execute tasks and analyse data. Further still, they can be incorporated into existing AI service pipelines to fit use cases newly imagined or previously out of reach. However, with the rise of GPT-3 and ChatGPT come several important questions regarding regulation, copyright, attribution, and risk.
Businesses need to understand that GPT-3 is not a magical tool that writes perfect natural language; there are still limitations that may affect the accuracy or success of GPT-generated content. For example, GPT has difficulty understanding context when it comes to slang terms or dialects. Additionally, GPT may struggle to accurately attribute sources and credit content creators appropriately.
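To make the "pipeline" point concrete, here is a hypothetical sketch of how a chatbot model could be slotted into an existing customer-service flow while guarding against the limitations above. The `call_llm` function is a placeholder for whichever model or API you actually use, and the routing rules are illustrative, not a recommendation.

```python
from dataclasses import dataclass

@dataclass
class Draft:
    answer: str
    confidence: float   # 0..1, however your model or heuristics estimate it
    sources: list[str]  # documents the answer claims to be based on

def call_llm(ticket_text: str) -> Draft:
    """Placeholder for a real model call (hosted API or self-hosted model)."""
    raise NotImplementedError

def escalate_to_agent(ticket_text: str, draft: Draft) -> str:
    # In a real pipeline this would open a ticket for a human reviewer.
    return "A support agent will get back to you shortly."

def handle_ticket(ticket_text: str) -> str:
    draft = call_llm(ticket_text)
    # Guardrails for the limitations discussed above: low confidence or
    # unattributable claims go to a human instead of straight to the customer.
    if draft.confidence < 0.7 or not draft.sources:
        return escalate_to_agent(ticket_text, draft)
    return draft.answer
```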
The rise of copyright and attribution issues
From a creator's point of view, this raises many questions. If I ask GPT to generate a text, who owns the copyright? The original authors of the texts used in the training data? The enterprise that owns GPT? Myself? Somewhere in the middle?
Can we consider GPT an aid to writing, like a grammar checker or a thesaurus? I suppose it depends a lot on how it is used. For instance, GPT could rewrite specific paragraphs or flesh out ideas. But how would this be tracked? It's very far from how we currently detect plagiarised documents. Of course, software to detect AI-generated content will appear. Still, I think it will be unreliable, full of false positives, and relatively easy to circumvent.
AI images (like those used in this article) also pose many questions regarding attribution: if I create images "in the style of" someone else, is the result still my own work? What about the pictures originally used as training data? Does the author or artist have a say in how they are used? What's the minimum amount of borrowed material for something to count as plagiarism? These questions have been (legally) answered for music and the written word, but not for works derived from hundreds of sources and contributors.
If this seems far-fetched now, think again. Last year, free and open-source collectives were not very happy with GitHub's decision to train Copilot (an AI that can generate code and help with software development) on the open-source software available on the platform. They claimed that the licences covering that code were violated when Copilot was used to create closed-source software.
And the issue will soon compound when AI-generated content is used as training data alongside human-created content. It's a complex problem that requires innovative thinking from all stakeholders.
We are not here to judge whether this is fair use, but it raises many questions regarding copyright, attribution, licences and legality – better left to lawyers and lawmakers.
Today, ChatGPT is far from perfect, and its limitations become apparent once you use it more. But tomorrow? Microsoft has already launched GPT capabilities in some of its products. This will dramatically increase the amount of data available to GPT and boost the engine's capabilities... and create more headaches for the people who have to figure out the legality and fairness of these tools.
It will be interesting to see how GPT-related regulation, copyright and attribution issues are addressed in the coming years. Whatever is decided, it's clear that GPT-3 and ChatGPT have changed the game in AI, and their impact on enterprise use shouldn't be underestimated. And this goes double for GPT-related risks and legal implications.
Data marketplace and clean rooms
Functional datasets have always been highly sought-after commodities, in both the commercial and research spaces. Today, the tools and mindset to build and share quality datasets on central exchanges exist, and more companies are incorporating the need to build data products into their strategic roadmaps.
Facilitating collaboration and ensuring quality
Data quality and access controls are crucial to the ideal vision of companies searching a global marketplace for quality data and using it to enhance their own analytical workloads. Data quality processes are being automated via Business Process Management (BPM) so that consumers of these datasets can trust them in their analyses. The monetisation of data access is a by-product of building data products, and companies are increasingly leveraging the tools available. The use of clean rooms will increase as companies seek to collaborate with other firms on each other's anonymised datasets to provide common clients with relevant product offerings.
Automation and audit trails, delivered through DevOps practices, will be key to achieving all of the above with quality at scale.
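As a rough illustration of what this automation could look like, here is a sketch of a pre-publication quality gate that also writes an append-only audit record for each run. The checks, column names and file-based log are simplifications; a real setup would be orchestrated by your BPM and CI/CD tooling and write to a proper audit store.

```python
import json
from datetime import datetime, timezone

import pandas as pd

def quality_checks(df: pd.DataFrame) -> dict:
    """A few basic checks a marketplace listing might require (illustrative)."""
    return {
        "row_count": int(len(df)),
        "null_rate_ok": bool(df.isna().mean().max() < 0.05),  # <5% nulls per column
        "unique_key_ok": bool(df["record_id"].is_unique),      # hypothetical key column
        "schema_ok": set(df.columns) >= {"record_id", "country", "value"},
    }

def publish_if_clean(df: pd.DataFrame, dataset_name: str, audit_log_path: str) -> bool:
    results = quality_checks(df)
    passed = all(v for k, v in results.items() if k != "row_count")
    # Append-only audit trail: what was checked, when, and with what outcome.
    with open(audit_log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({
            "dataset": dataset_name,
            "checked_at": datetime.now(timezone.utc).isoformat(),
            "results": results,
            "published": passed,
        }) + "\n")
    return passed  # the actual publish step (upload, catalogue entry) would follow
```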
Our clients have used data marketplaces and clean rooms to facilitate collaboration in the banking, FSI and retail spaces. For 2023, we foresee the same trend continuing in the ESG, green finance and systemic sustainability spaces, on top of the domains mentioned above.
ESG data and reporting
Quality data has an important role to play in defining ESG strategy. For example, in the carbon credits space, there is clear demand for quality-rated credits that companies can trust and rely on as a key part of their ESG strategy. There are even startups in the space looking to add further transparency to the calculation of GHG (greenhouse gas) emissions, while also benefitting local communities.
Reinforcing ESG efforts with data analytics
Sustainability officers will work hand-in-hand with your data analytics team to define your ESG roadmap and back it up with meaningful data. With the help of analytics, you’ll be able to identify which areas need improvement and where progress has already been made. Your ESG strategy can then be tailored to meet those specific needs.
Using data-driven analytics, you'll also be able to track the progress of your ESG strategy over time. This will enable you to measure the success rates of your ESG initiatives and make adjustments where needed. In addition, analytics can help you understand how different stakeholders perceive and engage with your ESG efforts. Knowing your target in each area—and being able to display and explain it to everyone in a simple, clear and meaningful way—is essential to the success of your strategy.
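As a small, hypothetical example of this kind of tracking: computing an emissions-intensity metric per quarter and comparing it against an internal target. The figures, metric and target below are invented purely for illustration.

```python
import pandas as pd

# Hypothetical quarterly figures; in practice these come from your ESG data pipeline.
esg = pd.DataFrame({
    "quarter": ["2022Q1", "2022Q2", "2022Q3", "2022Q4"],
    "scope2_emissions_t": [420.0, 405.0, 390.0, 398.0],  # tonnes CO2e
    "revenue_musd": [10.2, 10.8, 11.1, 11.5],
})

# Emissions intensity keeps progress comparable as the business grows.
esg["intensity_t_per_musd"] = esg["scope2_emissions_t"] / esg["revenue_musd"]
esg["qoq_change_pct"] = esg["intensity_t_per_musd"].pct_change() * 100

TARGET_INTENSITY = 33.0  # illustrative internal target
esg["on_track"] = esg["intensity_t_per_musd"] <= TARGET_INTENSITY

print(esg[["quarter", "intensity_t_per_musd", "qoq_change_pct", "on_track"]].round(2))
```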
That's why we suggest that your ESG "dream team", which defines and explains your ESG strategy, includes data analysts, marketing specialists and designers. Getting everyone involved in defining and monitoring your ESG strategy is critical to successful implementation. By better understanding data-driven insights, you will be able to ensure that your ESG initiatives are meaningful, sustainable and effective. At the end of the day, this ensures that you're doing what's necessary to genuinely improve your impact footprint.
Quality ESG data directly leads to gains in systemic sustainability. Whether it's making more sustainable choices in your company's supply chain and operations or creating sustainable investment portfolios, sustainability officers will need to consider quality ESG data a key tool for achieving their goals.
Cost management and agility
Another critical aspect of data practice is cost, together with the ability to quickly develop and adopt new technologies as they arrive.
The move to cloud for data workloads
To address this, and alongside FinOps (which itself requires data, cloud engineering and finance teams to work hand-in-hand), data practices will look for solutions that reduce cost and improve flexibility. The cloud is the obvious choice. This is why, in 2023, we foresee more and more companies migrating their SAP, data processing and storage capabilities to the cloud to leverage economies of scale.
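By way of illustration, here is a minimal FinOps-style sketch: aggregating a cloud billing export by team tag and flagging month-over-month jumps, as a starting point for the conversation between data, cloud and finance teams. The export format and column names are assumptions; real billing exports differ by provider.

```python
import pandas as pd

# Assumed billing-export columns; real exports (AWS CUR, GCP billing, ...) differ.
billing = pd.read_csv("billing_export.csv",
                      usecols=["month", "team_tag", "service", "cost_usd"])

by_team = (billing.groupby(["month", "team_tag"], as_index=False)["cost_usd"].sum()
                  .sort_values(["team_tag", "month"]))

# Flag teams whose spend grew by more than 20% month over month.
by_team["mom_growth_pct"] = by_team.groupby("team_tag")["cost_usd"].pct_change() * 100
alerts = by_team[by_team["mom_growth_pct"] > 20]

print(alerts)  # candidates for a closer look by the FinOps team
```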
The ability to scale and experiment with technology such as GPUs on EMR is also a key driver for moving big data and ML workloads. To migrate your data workloads to the cloud successfully, without impacting the velocity of your current projects, strong partnerships with cloud-native technology vendors in the warehousing and integration spaces are key.
Your data migration strategy also needs to address questions of data access, sovereignty, access control and audit, and, last but not least, the selection of the appropriate cloud provider, as providers differ greatly in their offerings and costs (and another company's cloud selection criteria will be very different from yours).
Next week, in part 4 of the Tech Trends 2023 series, we will explore the topic of Development and Infrastructure – highlighting tools and practices to be aware of in 2023.
If you’re interested in how your company can better manage data or utilise data for your ESG efforts, get in touch with us to see how we can help.
Contributors: Leo Arkhipov, Kevin Aubry, Michael Biallas, Ian Carter, Dominic Eales, Leonardo Diaz Deramond, Kevin Lawrence, Faisal Ramay, Yudesh Soobrayan, Dan Wheaton