OS, AI, Mythbusters -- 🦛 💌 Hippogram #14
The 14th edition of the Hippogram focuses on the myths of open source
I'm Bart de Witte, and I've been inside the health technology industry for more than 20 years as a social entrepreneur. During that time, I've witnessed and been part of the evolution of technologies that are changing the face of healthcare, business models, and culture in unexpected ways.
This newsletter is intended to share knowledge and insights about building a more equitable and sustainable global digital health ecosystem. Sharing knowledge is also the focus of the Hippo AI Foundation, named after Hippocrates, and it is an essential part of a modern Hippocratic oath. Know-how will increasingly result from the data we produce, so it's crucial to share it within our digital health systems.
With all the excitement around the democratisation of AI, I stumbled on a tweet from a researcher I highly respect, Carissa Véliz. She is the author of a must-read book called ‘Privacy Is Power’. Her tweet, in which she expressed her concerns and referred to an article published in Slate, prompted me to write this newsletter.
The argument that bad actors can manipulate open source code regularly pops up in my conversations and debates on open source. It's as old as open source itself. Or one could argue it goes back to when Gutenberg invented the printing press. Printing decentralised the gatekeepers of knowledge, which transformed religion, science, and politics, and gave more people access to knowledge, deception, and power. It weakened authority and hierarchy, setting groups against one another. It disrupted and liberated. The printing press deserves credit for democracy and the Enlightenment, while at the same time it deserves some of the blame for chaos and slaughter. Yes, some of those responsible for the slaughter were horrible actors. Nevertheless, the drive to eradicate ignorance and give people access to information and understanding became a force behind the uprisings that overthrew feudal regimes.
Top 9 Open Source AI Mythbusters
1. Bad Actors Could Manipulate The Code
This straw man argument assumes that closed source code and closed datasets cannot be manipulated. In the past, proprietary and closed software has been abused and controlled. Those who witnessed WannaCry, Stuxnet, or ILOVEYOU know that closed, proprietary software is just as vulnerable. With closed AI models, people are more likely to be manipulated by algorithms. Closed and proprietary AI models have been using our social media newsfeeds to control people. AlgorithmWatch in Berlin sought to expose Instagram's algorithm but was intimidated by the corporation and had to shut down the project. Open-sourced AI is a safer way to defend humanity from bad actors because of the many-eyes principle: in open source, the more sets of eyes you have on your code, the more likely you are to find issues. This allows for peer review by a base of knowledgeable and expert supporters.
2. Open Source AI Is Less Secure Than Proprietary AI
How the data and source code for AI models are licensed has nothing to do with security. Security risks can come from both closed source and open source. In the end, it is developers who determine whether code is secure, not its licence. Most open source projects have active communities that support them and continually check for bugs.
Also, developers are concerned about their reputations and want to showcase code that adheres to the best standards. In addition, by open sourcing their work, they wish to identify and correct security flaws. For example, by making their training data available, they enable others to review the data for potential biases or to augment it with more diverse data. When IBM published a dataset of annotations on more than 1 million images to improve its understanding of bias in facial analysis, the company wanted to encourage research on this important topic and accelerate efforts to develop fairer and more accurate facial recognition systems. The public release earned IBM fierce backlash because its dataset violated privacy laws, but it also sparked a public debate about how to handle training data published under Creative Commons licences. More importantly, the public release allowed the community to review the data and to use it to benchmark existing AI services. The resulting research eventually led IBM to shut down its facial recognition business. This form of openness is to be celebrated, not criticised.
According to recent research papers, most current commercial AI systems in healthcare are trained on biased datasets, making them less safe for minority populations. Closed AI models are a legal black box, and public access to their training data is prohibited. Google fired brilliant researchers who warned that the data its models are trained on is prone to geographical bias. It seems ethics go only as far as the business model allows.
3. Open Source Development Is Done by Amateurs and Students
Some people think of open source contributors as non-expert developers coding at home in their free time. Twenty years of open source history have taught us that open source has become fundamental to technology innovation. Microsoft, which formerly described open source as a form of communism or a cancer, has since integrated its developers deeply into the open source ecosystem. Within the AI community, open source AI frameworks such as Google's TensorFlow and Meta's PyTorch were created by leading AI researchers working at some of the world's most valuable companies. Without access to these frameworks, almost none of the applied AI startups now on the market would ever have seen the light of day.
4. It’s Open Source, So I Can Do Whatever I Want With It
Open source code is licensed code. Open source describes a broad range of software licences that make source code available to the general public with relaxed or non-existent copyright restrictions. However, while most permissive open source licences grant users the right to copy, modify, and redistribute the code, they do not allow licensees to patent it.
As AI is much more complex, the open source community is still learning how to adapt the open source rights and freedoms that act as a north star for the community and allow everyone to shape the technology. AI is a configuration of algorithms that self-modify and create new algorithms in response to input and data. The new Responsible AI License (RAIL) from BigScience, used to publish its large language model BLOOM, therefore also covers all derivatives created by the model and places them under the same licence. This means that any other AI model created or initialised by transferring the weights, parameters, activations, or outputs of the original model, in order to make the other model perform similarly, falls under the same use-based restrictions. The licence goes beyond classic copyright restrictions and adds a set of ethical principles that restrict how the model may be used.
5. Open Source Is Not Sustainable
In August, 1,000 researchers from 25 countries published their work as open source. 73 million people use GitHub. According to the most recent Octoverse report, more than half of all contributors are employed by a private business, supporting what I said in the last newsletter: open source is not an attribute but an integral element of the R&D strategy of any IT organisation.
Leading IT businesses have supported and participated in open source projects for 20 years. Open source projects operate on principles analogous to Darwinism, following a 'survival of the fittest' ethos. This may appear inefficient in terms of software engineering resources, but the collaborative, consensual approach simply creates better software. Allowing others to reuse data and trained models, by contrast, is anything but a waste of resources.
Sustainable projects need sustainable communities. To adapt to a broader, more competitive open source ecosystem, enterprises must engage in community development. This requires a view of the availability of source code that is intrinsically linked to the social activities of individuals and companies in open source projects. Numerous corporations increasingly see open source community interaction as a "sociotechnical" (social and technical) investment.
6. Open Source Communities Cannot Define Global AI Standards
Open source drives global democratisation. Developers worldwide can contribute to open source eco-systems rather than ego-systems, which means development is consensus-based, and consensus is what makes standards. This global collaboration increases scalability and reduces local biases. Rapid innovation, changing standards, and everyday-life criticality make open source and open-source-based solutions the top option in emerging technologies. Membership-based standardisation bodies such as ISO, or HL7/FHIR in healthcare, which sell their standards under licence contracts and their documents as PDFs for hundreds of dollars, are the opposite of open standards, which can act in an agile way and adapt to a rapidly changing environment.
7. When I Use Open Source AI I Won’t Get Funded
The initial public offering (IPO) of Red Hat, a corporation that built its empire on open source Linux, took place more than twenty years ago. IBM acquired the company for US$34 billion four years ago. Although the internet would not exist without open source, it took that long for commercial open source to start eating the world. Few investors appear to comprehend this momentum. Why? Most firms want to add customers or a community to their technology assets, whereas open source enterprises contribute technology to the community. Closed AI companies value the asset, while open-sourced companies value their community. Community and technology are tough to combine, but working with your community will help your business grow. Health AI firms are part of the healthcare community, which has held established ethical values since the Hippocratic oath more than 2,000 years ago. Companies that push new ethical standards into established eco-systems will struggle to scale and achieve acceptance, hence the delayed adoption of AI in medicine. If you want to build a startup with a short-term exit, this may not matter much, since your investor will concentrate on generating sellable assets; but if you want to conquer the globe, you may want to reconsider.
8. Open-Sourced AI And Free Software Are The Same
The metaphor that "data is the new oil" has deluded many in the industry into believing that the value of their product is produced entirely by the unique training dataset and AI model they have developed. Any product manager knows this is pure nonsense. Physicians and researchers won't promote an AI-based clinical decision support product or service with poor usability (lack of workflow integration), missing functionality, or no independent peer-reviewed publications. Your net promoter score or customer satisfaction score might even increase when using open-sourced AI. With two-thirds of German startups having difficulty accessing data, open-sourced AI needs to be seen as a layer of shared R&D that will lead to greater access to data and resources. Open source's Darwinian-like characteristics may even result in faster and more widely recognised industry reference models.
9. Open Source Data And AI Ecosystems Are Fatal To Human Prosperity
If Gutenberg's investor, Johann Fust, had patented and licensed every use of the alphabet, we might still be in the dark ages. The Dark Ages weren't dark because they were bad, but because our knowledge of them is limited. Those who think an economy thrives when information is restricted and knowledge sits behind paywalls need to go back to school. They think like medieval rulers and have not understood that all of today's digital opportunities exist because a few people within CERN released the World Wide Web software under an open licence, the surest way to maximise its dissemination. These actions allowed the web to flourish. Today, open, accessible and democratised medical knowledge is the "problem" that data capitalism strives to solve. If we are serious about creating AI that benefits all of humankind and building the kind of future we want our children to inherit, openness is the only realistic course forward.
Share Knowledge, Empower Humans
Thank you for reading the Hippogram. You can help with our mission and share knowledge as well - forward this email to at least one person, or share it across your favourite social network.
Please Help Us With Your Donation!
We started our humanitarian mission to create artificial medical intelligence as a common good. To achieve this goal, we need your support, and we are incredibly grateful to all our supporters who believe in our mission. Every donation counts. Building data and AI commons is made possible by your donations.
Thank you for your support!
About Bart de Witte
Bart de Witte is a leading and distinguished expert on digital transformation in healthcare in Europe, and one of the most progressive thought leaders in his field. He focuses on developing alternative strategies for creating a more desirable future for the post-modern world and all of us. With his co-founder, Viktoria Prantauer, he founded the non-profit organisation Hippo AI Foundation, located in Berlin.
About Hippo AI Foundation
The Hippo AI Foundation is a non-profit that accelerates the development of open-sourced medical AI by creating data and AI commons (i.e., data and AI as digital common goods / open source). As an altruistic "data trustee", Hippo unites, cleanses, and de-identifies data from individual and institutional data donations. This data is made available, without reward, for open-source use that benefits communities or society at large, such as the use of breast cancer data to improve global access to breast cancer diagnostics.