<style type="text/css">
  li, h2, h3, h4, .markdown-body.slides, .reveal .slides { text-align: left; }
  .alert { padding: 10px; }
  li { line-height: 1.22; }
  h3 { padding-top: 0px; margin-bottom: 0px; }
  .medskip { height: 20px; }
</style>

<div class="alert alert-info">

# Six decades
# of libre scientific software

</div>

<div class="medskip"/>

[Nicolas M. Thiéry](https://Nicolas.Thiery.name/)

<div class="medskip"/>

Laboratoire Interdisciplinaire des Sciences du Numérique
Université Paris-Saclay

<div class="medskip"/>

<div style="text-align: center;">

Open Science Days @ UGA
Codes et logiciels de recherche
December 13-15, 2022, Grenoble, France

</div>

---

Note:

## Abstract

Libre software has had a deep relationship with the research world for decades: as a tool, as a product, and as an object of study. It would be presumptuous to attempt an exhaustive overview in a few dozen minutes, all the more so as needs and practices vary considerably from one field to another; that will be the work of historians. We will therefore limit ourselves to highlighting a few underlying trends (typology of software, development and funding models, ...) that have punctuated this history, in order to initiate the sharing of experience with the participants.

The title of this talk is itself a nod to another talk, given at the Journées du Logiciel Libre in Lyon, organized in 1999 by the ALDIL.

---

## Caveats and biases

<div class="alert alert-warning fragment">

**Nothing but a personal testimony**

</div>

<div class="alert alert-warning fragment">

**My world, my biases**

- Reborn to Libre Software in 1992
- Jean Thiéry @ ALDIL 1999: quatre décennies de logiciels [...] scientifiques libres
- Computer Science for Mathematics
- Software as a **research tool** more than a **research outcome** or **research object**
- GNU/Linux, Python, SageMath, Emacs, Conda, Jupyter, Mutt, ...
</div>

---

## Another anecdote

<div style="text-align: left;">
<img class="r-stretch" src="https://Nicolas.Thiery.name/haut2.jpg">
</div>

<div class="alert-success fragment">

**Lesson learned the hard way**

Typical code for research: a thin layer of pixie dust on top of a pile of generic stuff

</div>

----

### Lesson learned the hard way: when you fail to be FAIR

<div class="alert alert-danger fragment">

- I could not **Find** my own best friend's code! :cry:
- It needed generalization
- I could not **Access** his code:
    - It was not published
    - I did not have a Maple license
- Anyway Maple and MuPAD were not **Interoperable**
- Hence, we could not **Reuse** each other's code

</div>

<div class="medskip"/>

<div class="alert alert-danger fragment">

**A shame**: by **sharing** we could have saved ~50% of development time

Meaning **more research** :microscope: :medal: (and more juggling :tada:)

</div>

----

## `*`-Combinat: Sharing Algebraic Combinatorics software since 2000

<div class="alert alert-success">

- Apply induction from two to a community!
- By bringing libre software and best practices to research software

</div>

<div class="medskip"/>

<div class="fragment">

Don't get me started on this ...

</div>

---

## A brief historical perspective

----

### 1960's: primordial *structured* libre research software

<div class="alert alert-info">

- [FORTRAN](https://en.wikipedia.org/wiki/Fortran#Evolution) (Formula Translating System) gains adoption
  $\Longrightarrow$ portability and simplicity (**Interoperability** :+1:, **Reuse** :+1:)
- Punch cards $\rightarrow$ Tapes

</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

**A pioneer: [QCPE: Quantum Chemistry Program Exchange](https://www.theochem.ru.nl/files/local/bk-2013-1122.ch008.pdf)**

- Mission: index, archive and distribute programs in Quantum Chemistry, and beyond!
- A newsletter advertises new additions (**Find** :+1:)
- Ships copies of the programs for a cost-of-operation fee (**Access** :+1:)
- Builds a community, organizes workshops (hackathons!)
- Most programs were effectively libre software (**Reuse** :+1:):
  freedom to use, scrutinize, modify, and distribute modifications
- Hundreds of programs (typically 1-2 authors, 100 lines)

</div>

<div class="medskip"/>

<div class="alert alert-danger fragment">

- **Urgent task: collect and archive the QCPE software!!!**

</div>

Note:
- ...
- ...
- ...

----

### 1970's: Early libraries

<div class="alert alert-danger">

**Practical limitation** (**Reuse** :-1:)

- Sharing pattern: distribute programs
- Reuse pattern: copy and adapt

$\Longrightarrow$ <span class="fragment">**does not scale!!!** </span>

</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

**Example: LinPack**

- A collection of FORTRAN subroutines for Linear Algebra
- Modularity (**Reuse** :+1:)

</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

**Example: BLAS** (1979)

- Basic Linear Algebra Subprograms (library $\longrightarrow$ interface)
- Even more modularity (**Reuse** :+1:)

</div>

Note:
- ...
- ...
- ...

----

### 1980's: Scientific computing at the fingertips of researchers

<div class="alert alert-info">

- Personal computers become widespread
  $\Longrightarrow$ A researcher can have a desktop computer and use it for interactive computation.

</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

**Example: Voyons (Jean Thiéry, CEA)**

- Integrated interactive software for statistics, modeling, simulations, visualization
- **Innovations:**
    - **Target** non-specialists
    - **Co-construct** by participating in the research
    - **Reuse** across diverse research projects: NMR spectrography, agronomy, ...
    - **Open source:** complete control over the algorithms
    - **Credit by citation**
    - Early forms of **Agile development** and **Research Software Engineering**

</div>

<div class="medskip"/>

<div class="alert alert-danger fragment">

**How to scale?**

- Requires reaching a critical mass
- Lack of means of collaboration $\Longrightarrow$ requires a collocated team

</div>

Note:
- ...
- ...
- ...

----

### 1980's: Scientific computing at the fingertips of researchers (continued)

<div class="alert alert-info">

- Computers on anyone's desk
- Technology ripe for "user friendly" programming languages

</div>

<div class="medskip"/>

<div class="alert alert-info fragment">

$\Longrightarrow$ Potential for a **mass of users**!

$\Longrightarrow$ It's worth **investing**

</div>

<div class="medskip"/>

<div class="alert fragment">

***Archetype: MatLab turns into a commercial product***

- General purpose numerical computing environment
- Wraps numerical libraries (LinPack, ...)
- In a tailored programming language
- $\Longrightarrow$ brings computing to the masses, e.g. teaching :+1:
- $\Longrightarrow$ generates revenue to fund a collocated team of developers :+1:
- $\Longrightarrow$ critical mass

</div>

<div class="medskip"/>

<div class="alert alert-danger fragment">

- $\Longrightarrow$ silo between developers and users :-1:
- $\Longrightarrow$ environment silo :-1:

</div>

<div class="medskip"/>

<div class="alert alert-info fragment">

"the hardware is the product" $\longrightarrow$ "the software is the product"

</div>

Note:
- ...
- ...
- ...
----

### 1980's: A new Hope

<div class="alert alert-success fragment">

***Example: GAP: Groups, Algorithms, Programming***

- A community gets together and decides to share
- Developed by users for users
- Dedicated programming language
- Library
- Packages

</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

**Libre software is formalized**

- A response to the closing of sources, which hurts both **ethics** and **practice**
- Freedom to **use**, **scrutinize**, **modify** and **redistribute modifications**
- Remember: copyright is about balancing the needs of both authors and users

</div>

Note:
- ...
- ...
- ...

----

### 1990's: Scientific computing for the masses

<div class="alert alert-info">

- Internet for the masses: web, chat, forums, mailing lists, ...

</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

- Systems gain momentum (**Access** :+1:, **Reuse** :+1:)
- Much easier to build communities and user groups
- Online archives of user contributions (**Access** :+1:, **Reuse** :+1:):
  CPAN, CRAN, CTAN, ... Maple shared library, ...

</div>

Note:
- ...
- ...
- ...

----

### Late 1990's: A growing frustration

#### Ethical concerns

<div class="alert alert-danger">

> You can read Sylow's Theorem and its proof in Huppert's book in the library, then you can use Sylow's Theorem for the rest of your life free of charge, but for many computer algebra systems license fees have to be paid regularly ...
>
> With this situation two of the most basic rules of conduct in mathematics are **violated**: In mathematics **information is passed on free of charge** and **everything is laid open for checking**.
>
> [name=Joachim Neubüser (started GAP in 1986)] [time=1995]

</div>

Note:
- ...
- ...
- ...
----

### Late 1990's: A growing frustration (continued)

#### Practical concerns

<div class="alert alert-danger">

- Silos by system: license, language, community
- Silos by role: developers / users
- Silos by institution and physical location

$\Longrightarrow$ **Fragments the community and its forces**

</div>

<div class="medskip"></div>

<div class="alert alert-danger fragment">

- Increasing institutional pressure to **valorize** research software **as commercial products**, **as closed source**

$\Longrightarrow$ **Killed many cool pieces of software** :skull:

</div>

<div class="medskip"/>

<div class="alert alert-danger fragment">

**Reuse** :-1: **Sustainability** :-1:

</div>

Note:
- ...
- ...
- ...

----

### 2000's: The return of libre computing

<div class="alert alert-info fragment">

- "User friendly" general purpose programming languages: Python, Perl, ...
- Software Forges (SourceForge, ..., GitHub, GitLab, ...)
  + more best practices
  + physical ubiquity
  $\Longrightarrow$ massive collaboration

</div>

<div class="medskip"/>

<div class="alert-secondary fragment">

**Question**:

- Viable libre software development models for large systems?
- "by users for users"?

</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

- The Scientific Python stack challenges Matlab
- SageMath challenges Maple and Mathematica
- R challenges S, SAS, ...
- ...

</div>

Note:
- ...
- ...
- ...

----

### 2010's: libre scientific software at scale

<div class="alert alert-info">

- Social networks, cloud infrastructure, and services for the masses
- More best practices
- Open Science gains momentum and recognition by institutions
- Multiplication of devices (tablets, "smartphones", ...)
</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

**A massive international collaboration across academia, industry, and more**

**On digital commons**

</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

**Supported by infrastructures, best practices, funding, ...**

- Massive modularity across systems (**Compose**, **Reuse** :+1:)
- Software forges (**Find**, **Access** :+1:) and collaborative tools (**Community** :+1:)
- Package management and hosting (conda, guix, pip, node, ...) (**Find**, **Access**, **Reproduce** :+1:)
- Archival: Software Heritage (**Find**, **Access**, **Credit**, **Legacy** :+1:)
- Virtual environments (**Access** :+1:)
- Literate Computing (**Access**, **Reproduce** :+1:)
- Community building: training, workshops, hackathons (**Community** :+1:, **Environment** :-1:)

</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

**And the [Research Software Engineer](https://society-rse.org/) (RSE) movement!!!**

</div>

Note:
- ...
- ...
- ...

----

## 2020's: The next challenges?

<div class="alert alert-info">

- From physical ubiquity to virtual ubiquity (**Environment** :+1:, **Joy** :-1:, **Community** ?)
- Massification of Machine Learning; impact?
- Growing digital Commons of tens of thousands of packages to compose from :+1:
  Built by hundreds of thousands of developers worldwide :+1:

</div>

<div class="medskip"/>

<div class="alert alert-warning fragment">

**Will that scale?**

- Complexity (**Reuse** :-1:)
- Potential compatibility nightmare (**Sustainability** :-1:)
- Reliability? [e.g. nodejs' breakages](https://en.wikipedia.org/wiki/Npm_(software)#Notable_breakages) (**Sustainability** :-1:)
- More Silos (**Reuse** :-1:)
- Environmental impact???

</div>

<div class="medskip"/>

<div class="alert alert-warning fragment">

**Libre software is ubiquitous**, but too often under the hood

- What about the applicative layer?
- What about fair and ethical services?
**In particular for less tech-savvy audiences**

</div>

<!--
Challenges of complexity, quality, training
The Research Software Engineers movement
How to fund?
Reconciling general-purpose versus high-performance, interpreted versus compiled

TODO
- User <-> developer continuum
- ...
!-->

Note:
- ...
- ...
- ...

---

## <a name="messages"></a>Take home messages

----

### Open Science and research software

<div class="alert alert-success">

- **A decades-long joint history**; finally recognized by institutions!

</div>

<div class="alert alert-success fragment">

- **Software raises very specific Open Science challenges**
  (it's not just another type of data)

  Notably:
    - Software is a social construction
    - Software is a living object ($\Longrightarrow$ ecosystems of software)
    - Software is complex, composed
    - ...

</div>

Note:
- ...
- ...
- ...

----

#### A long track record of Open Science Best Practices for software

<div class="alert alert-success">

- **Findable:**
  Barriers: complexity: which function XXX of package YYY solves problem ZZZ?
  Levers: documentation, introspection, training, social networks, ...
- **Accessible:**
  Barriers: complexity, institutions, ...
  Levers: public forges, package managers, repositories, archives, training, time, ...
- **Interoperable:**
  Barriers: architecture, languages, systems, institutions, ...
  Levers: source code, standards (e.g. WebAssembly), virtual environments, remote procedure calls (bind & adapt), semantics, commons, ...
- **Reusable:**
  Modularity, quality, **build reproducibility**, **execution reproducibility**, **literate computing**, training, ...

</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

Beyond FAIR: **Sustainable**, ARDC, ... see Roberto's talk

</div>

Note:
- ...
- ...
- ...
<!--TODO:
- cite paper on initial work of FAIR for software
!-->

---

### Open Science policies for research software

<div class="alert alert-success fragment" style="text-align:center">

- Given appropriate means and training, scientists are in general sympathetic to Open Science, if not enthusiastic about it

</div>

<div class="medskip"/>

<div class="alert alert-danger fragment" style="text-align:center">

- Which best practices are relevant depends enormously on the piece of software

</div>

<div class="medskip"/>

<div class="alert alert-info fragment" style="text-align:center">

**Support and foster Open Science Best practices for Software**

**Don't impose any of them**
unless absolutely necessary to counterbalance other higher forces

</div>

<div class="medskip"/>

<div class="alert alert-success fragment">

If in doubt, consult the Software Charter of the CoSO (Comité pour la Science Ouverte)

</div>

----

### Research Software Engineers

<div class="alert alert-info">

- Research software development **by-users-for-users** can work very well
- However, support from **Research Software Engineers** makes a huge difference:
    - train the community
    - give advice
    - tackle highly technical tasks
    - maintain
    - ...

$\Longrightarrow$ A continuum between research software engineers and researchers

</div>

<div class="alert alert-info fragment" style="text-align:center;">

**Recognize software development by all**

**Ease flexible access at all time scales to Research Software Engineers**

**Promote career paths for Research Software Engineers**

</div>

----

### Funding

<div class="alert alert-info" style="text-align:center;">

**Fund basic scientific software development and in particular <em>Software Maintenance</em>**

</div>

<div class="alert alert-danger fragment">

Project-based funding has its limits:

- Unpredictable
- Tension with career paths
- Huge overhead for the community

</div>

<div class="alert alert-info fragment" style="text-align:center;">

**Promote recurrent funding**

</div>

<!--
## Challenges

Why reuse

Barriers to reuse
- Easy to find
- Accessible:
- Interoperability challenges
- Silo challenges
- Training challenges
- Institutional challenges
- Intellectual property challenges
- Quality challenges
- Adaptability challenges

## Content

- Evolution of how colleagues and institutions view this
- Evolution toward greater complexity: which organization has the capacity to:
    - carry the development?
    - carry the distribution
- Evolution from small independent programs -> modular ecosystems
- Silo challenges
- Challenges for institutions?
!-->
{"tags":"Software, Libre software, Scientific software, History, Talk","type":"slide","slideOptions":{"transition":"none","theme":"white","width":"95%","height":"100%","margin":0,"minScale":1,"maxScale":1,"centered":false,"slideNumber":true}}