Using NotebookLM for research with digitised historical newspapers
This is the third post in a series of reflections on how historians can engage with generative AI. In the first, I argued that we should abandon an attitude of pure suspicion towards students (as if their only aim were to cheat) and instead focus on understanding how different generative AI tools actually work, what they can and cannot do. The second post explored Google’s NotebookLM, highlighting its ability to avoid “hallucinations” by grounding answers only in uploaded sources, while also noting the risks it poses to the skill of critical reading. In this post I return to NotebookLM, but I move from its use with secondary sources (monographs, academic articles, etc.) to its capabilities with primary sources, in particular digitised socialist newspapers from the late nineteenth century. This post also summarises the content of a paper I’m presenting at the 55th Annual Conference of the International Association of Labour History Institutions (IALHI), held at the International Institute of Social History in Amsterdam in September 2025.
It’s easy to forget how radically mass digitisation changed historical research in less than a generation. When I started doing research, around 2007, almost every primary source I could use was physically bound to place, clustered in archives and libraries. What I could work with depended on being physically present in those archives and libraries, and of course on their holdings, which meant I was largely confined to the limited collections of underfunded repositories in Buenos Aires. Consulting a collection of European or Asian newspapers was simply unthinkable. Beyond the fact that I could carry a laptop to the archive and take photographs of a primary source to analyse later (another crucial revolution, one that took place a decade or two earlier), my relationship with the archive was not fundamentally different from that of a historian working in 1975 or 1950.
Over the last two decades, an explosion of digitisation projects has made vast amounts of material available remotely. Furthermore, the development of Optical Character Recognition (OCR) turned many of these scanned sources into searchable ones. In a brilliant article, Lara Putnam reflected almost ten years ago on the epistemological and methodological consequences of this transformation. She argued that this “digitized turn” had been the “unacknowledged handmaiden of transnational history”. Suddenly it became possible to retrieve primary sources from across the globe without ever travelling to the archive. Moreover, historians were no longer limited to browsing the scanned images, as we do with paper or microfilmed copies, but could also search for terms within the collection. The ability to locate a name, a phrase, or an idea across thousands of publications at once dramatically increased the speed and scope of discovery.
Putnam also warned about the “shadows” cast by these new methods. She argued that the very efficiency of digital search across OCRed collections brings the risk of overemphasising the importance of what is most easily found, while undermining the slower, context-bound forms of knowledge that only deep engagement with a physical archive can provide. Digital discovery allows us to bypass the slow process of place-based learning that was once crucial to our practice. In doing so, we risk producing work that is radically decontextualised and losing the multi-causal awareness that came from immersive reading in an analogue archive.
Apart from these epistemological challenges, keyword searching also involves technical risks. False negatives are common: an occurrence may be missed because of poor OCR quality, or simply because contemporaries used a different term. The primary sources I usually work with, historical newspapers from the late nineteenth century, often have poor OCR accuracy due to old typefaces, ink bleed, paper degradation, and complex layouts. Keywords get mangled: “labor” becomes “1abor”. Moreover, a keyword search for “unemployment” might miss articles about “men out of work,” “idle labourers,” or “the surplus population.” False positives are equally frequent: a keyword search for “strike” returns theatre reviews mentioning actors who “strike a pose.”
In the context of current generative AI developments, the arrival of Retrieval-Augmented Generation (RAG) systems like NotebookLM represents the next, and arguably more profound, stage of this transformation. The established practice of searching for keywords in large databases of newspapers can now be supplemented by semantic querying through generative AI. The key difference is that it is now possible to operate not just on keywords but on semantic meaning: instead of finding exact strings of characters, a tool can retrieve concepts. What follows are some reflections from my initial experiments. It is important to stress that these are based on work with digitised historical newspapers rather than handwritten sources. Significant progress is also being made with handwritten material, but that is beyond the scope of this post.
***
The technical threshold for experimenting with these tools is strikingly low. NotebookLM offers an intuitive, ready-to-use interface that requires no coding or specialised software. You only need to upload a PDF and can start “asking questions” about it. It even has text recognition capacities (i.e. you can upload a source that has no OCR and NotebookLM will do a decent job of recognising the text). Yet the devil is in the details. To use the tool effectively with large collections (without which it makes little sense as a discovery tool), several time-consuming preparation steps are unavoidable. To start with, you need to have the sources as PDFs, ready to upload. Second, NotebookLM has limits on how large each file can be, how many PDFs can be uploaded in a single notebook, and so on. Furthermore, it is strongly recommended to use a specialised OCR tool (if the sources are not already OCRed) before uploading. In practice, this meant downloading full runs of several historical newspapers to my local hard drive, applying baseline OCR to those that lacked it, and then splitting multi-year sequences into smaller chunks that fit within NotebookLM’s constraints on file size and source count.
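For those who want to replicate this preparation step, the sketch below shows how the splitting might be scripted. It is only a minimal example, assuming the newspaper runs are already OCRed PDFs on the local drive (if not, a command-line tool such as ocrmypdf can add a text layer beforehand); the folder name, the file-naming scheme, and the 200-page chunk size are illustrative placeholders, not NotebookLM’s actual limits.

```python
# Minimal sketch: split large OCRed newspaper PDFs into smaller chunks.
# Folder name and chunk size are hypothetical, not NotebookLM's real limits.
from pathlib import Path
from pypdf import PdfReader, PdfWriter

def split_pdf(source: Path, pages_per_chunk: int = 200) -> None:
    """Write sequentially numbered smaller PDFs next to the source file."""
    reader = PdfReader(source)
    for start in range(0, len(reader.pages), pages_per_chunk):
        writer = PdfWriter()
        for page in reader.pages[start:start + pages_per_chunk]:
            writer.add_page(page)
        part = start // pages_per_chunk + 1
        out = source.with_name(f"{source.stem}_part{part:02d}.pdf")
        with open(out, "wb") as fh:
            writer.write(fh)

# Hypothetical folder containing one PDF per year of the newspaper run.
for pdf in sorted(Path("vorwaerts_pdfs").glob("*.pdf")):
    split_pdf(pdf)
```

Whatever splitting scheme you adopt, keeping file names tied to years or issue ranges pays off later, because it makes it much easier to trace a citation in NotebookLM back to the original issue.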
Rather than searching for specific words, I could now ask NotebookLM open-ended questions on topics of interest. Imagine, for instance, uploading the full collection of Die Neue Zeit (the theoretical journal of German social democracy, edited by Karl Kautsky since 1883) alongside the complete run of Vorwärts, the party’s main daily newspaper, and posing a question such as: “What were the stances of these publications on unfree labour?” It is a complex question in many respects, and one that certainly cannot be posed using a text-search tool.
The initial results were impressive. The answer generated in the notebook containing a couple of years of Die Neue Zeit issues pointed to pieces on Russian forced labour in the Siberian gold mines; on the “sweating system” in England; on chattel slavery in the southern United States; on the “truck system” and “wage fraud” in Argentina; and on corvée labour in Romania, but also to texts that characterised the capitalist labour contract itself as a “relationship of domination and servitude”. In the case of Vorwärts, the answer distinguished between practical campaigns and “ideological stances,” pointing, among many other items, to articles about agitation against the “miserable conditions in child labour,” reports on excessive working hours and hazardous trades, and critiques of liberal attempts to reconcile the “complete unfreedom of the worker” with principles of freedom. All references included direct citations and clickable links to the original sources. Note also that the “conversation” with the tool was in English, while the sources are in German and printed in Fraktur type.
Just consider how long it would have taken to locate all these discussions by manually browsing the pages of the journal (unless, of course, one is based at a wealthy university in the Global North with access to a team of multilingual research assistants). Or how difficult, complex, and error-prone it would have been to perform keyword searches to discover all these occurrences. In a matter of seconds, the tool produced a trove of relevant references that might otherwise remain undiscovered.
***
But here is where Putnam’s warnings about the “shadows” become critical. Most obviously, we are “discovering” occurrences without mastering the contexts (even the language!) in which they appear. The risk of misunderstanding an article is high, as is the temptation to cherry-pick the snippets that fit our argument. Putnam analysed this danger convincingly. Yet there is also a new risk, one that emerges specifically from using NotebookLM with digitised newspapers.
My first experiments revealed that outcomes vary dramatically depending on the format and layout of the source. A theoretical journal such as Die Neue Zeit performs remarkably well in NotebookLM. Its structure resembles that of a book: each issue consists of long, self-contained articles, printed sequentially with clear titles and bylines. When NotebookLM “chunks” a source like this, each chunk almost always corresponds to a coherent argument. The linear, single-column layout preserves structural integrity, making summaries and analysis reliable.
The book-like, single-column format of Die Neue Zeit. This linear structure preserves context, making it a more reliable source for AI-driven semantic analysis.
The dense, multi-column layout of Vorwärts. An AI tool, blind to visual structure, may incorrectly merge text from an editorial and an adjacent advertisement, leading to “context collapse”.
The situation is different when we turn to a daily mass-circulation newspaper like Vorwärts. A newspaper page is a carefully constructed visual object in which layout conveys meaning: a single page contains a multi-column collage of unrelated items (editorials, strike reports, reader letters, advertisements, even serialised fiction) arranged side by side. For the AI, however, the typical page is simply a dense “wall of text”. The tool ingests it by “chunking” it into decontextualised passages, a process that destroys the contextual integrity of the page, because the chunking struggles to separate one story from another.
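NotebookLM’s internal chunking procedure is not public, so the following is only a toy illustration, with invented page content, of how naive fixed-size chunking over the raw OCR output of a multi-column page can fuse unrelated items into a single passage.

```python
# Toy illustration only: NotebookLM's real chunking is not documented.
# The page content below is invented; the point is that fixed-size chunks
# ignore column boundaries and can fuse unrelated items together.
page_ocr_output = (
    "EDITORIAL: The party renews its demand for the eight-hour day ... "  # column 1
    "ADVERTISEMENT: Dr. Meyer's tonic cures every ailment ... "           # column 2
    "LETTER: A reader objects to last week's editorial line ..."          # column 3
)

def chunk(text: str, size: int = 80) -> list[str]:
    """Split text into fixed-size character windows, ignoring layout."""
    return [text[i:i + size] for i in range(0, len(text), size)]

for c in chunk(page_ocr_output):
    print(repr(c))  # note how chunk boundaries ignore where one item ends
```

Most of the resulting chunks mix material from adjacent columns, and it is precisely on such passages that the retrieval step then builds its answers.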
The consequences can be serious. NotebookLM can confidently summarise a political stance by stitching together sentences from two different articles: for instance, a party editorial and a critical letter to the editor, or, even worse, the speech of a socialist leader in Parliament and the reply of a conservative senator, both printed on the same page. Temporal flattening is equally common: the tool can blend quotes and events from different days, weeks, or even years, erasing their historical specificity.
These shortcomings can be partially avoided by fine-tuning the prompts. Rather than the prompt mentioned above, “What were the stances of these publications on unfree labour?”, a better option is something like “Identify passages discussing unfree labour. Treat each passage separately. For each one, provide a summary of the topic, the main argument, and the context”. The result is a much more useful list of occurrences, one that avoids conflation and allows for further exploration through close reading of the source itself.
Yet even with carefully adjusted prompts, NotebookLM’s analytical capacity remains significantly weaker with large corpora of digitised newspapers than with monographs or academic articles. Being aware of these limits puts us in a better position to appreciate its strong research potential. My main conclusion is that NotebookLM is very powerful as a discovery tool, even for sources with layout problems. It outperforms keyword search in surfacing unexpected themes and connections, though keyword search remains indispensable as a complement. Perhaps the best practice for discovery is a hybrid one: use semantic search to surface themes and connections you hadn’t considered, then use keyword search to systematically verify and complete your findings within specific timeframes.
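To make the verification half of that hybrid workflow concrete, here is a small sketch of the keyword step, assuming the OCRed text of each issue has been exported to plain-text files; the folder name and the specific search patterns are hypothetical and would need to be adapted to one’s own corpus and research question.

```python
# Minimal sketch of keyword verification over plain-text OCR exports.
# Folder name and patterns are hypothetical examples.
import re
from pathlib import Path

# Variants cover synonyms and common OCR substitutions (e.g. "1" for "l").
PATTERNS = [
    re.compile(r"Zwangsarbeit", re.IGNORECASE),
    re.compile(r"unfreie[rn]?\s+Arbeit", re.IGNORECASE),
    re.compile(r"[l1]abou?r", re.IGNORECASE),  # also catches "1abor"-style OCR errors
]

def find_matches(corpus_dir: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for every line matching any pattern."""
    hits = []
    for txt in sorted(Path(corpus_dir).glob("*.txt")):
        for lineno, line in enumerate(txt.read_text(encoding="utf-8").splitlines(), 1):
            if any(p.search(line) for p in PATTERNS):
                hits.append((txt.name, lineno, line.strip()))
    return hits

for name, lineno, line in find_matches("vorwaerts_txt"):
    print(f"{name}:{lineno}: {line}")
```

Building OCR-error variants (such as “1” for “l”) directly into the patterns is a cheap way to reduce the false negatives mentioned earlier, while the semantic pass in NotebookLM catches the formulations no pattern would anticipate.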
For the analysis itself, close reading remains indispensable. RAG tools like NotebookLM can operate as an extremely smart, polyglot research assistant that brings up leads, themes, and overlooked material across oceans of print. But once a promising snippet is found (one that would otherwise go unnoticed, given the sheer size of the collections), the logical next step is to click the citation, return to the full page, and engage in close reading and contextual interpretation.





Thanks, Lucas, for another helpful post! I believe that historians engaging with computational methodologies should develop and adopt rigorous validation protocols, aligning with established best practices in the field of Computational Humanities. For instance, after the automated analysis of a substantial textual corpus (e.g., 1,000 pages), a crucial step would be to perform a qualitative assessment on a randomly selected subset of the data (e.g., 50 pages). This process involves a meticulous close reading by the researcher to juxtapose their own findings with the output generated by the tool. Such a comparison allows for an evaluation of the algorithm's accuracy, including its potential for false positives or negatives. Consequently, scholarly articles leveraging these technologies should include a methodological section detailing this verification process to ensure transparency and reproducibility.