Posted by admin on March 22, 2019

[Cover image: "Working as a Data Librarian: A Practical Guide"]

A data librarian is a librarian who specializes in helping people find and use data.  While I'm not a librarian myself, I do help librarians find and use data, so when I saw this book mentioned on Twitter, I was quick to hop on our local library's OPAC and request a copy.  Quick summary: a fantastic introduction to the strategies and tools used by data librarians!

The author of "Working as a Data Librarian: A Practical Guide" is Eric O. Johnson, a librarian at Kent State University.  Johnson starts the book with an introduction to the specialty of data librarianship and the various tasks entailed.  It's a relatively rare specialty, and the book mentions some of the other titles associated with the work, including "data services librarian," "data management librarian," and "digital projects librarian."

Chapter Two dives into the topic of finding data (and helping patrons find data) with an introduction to the data lifecycle, data management plans, and more.  There is a broad survey of online data sources (government, academic, corporate, etc.) and a short tour of using ASCII codebook files.  I particularly enjoyed the discussion of working with patrons to narrow down the data requirements for their project.  Chapter Three is entitled "Database Design" and provides a basic introduction to that topic, covering normal forms and then basic SQL commands.  There are examples included, but I feel that readers who don't already know SQL would benefit from more visual aids: it's a tough topic to teach so briefly in this format.

"Data Collection Techniques" is Chapter Four, and it covers a very wide range of topics, beginning with collecting data online using spiders and scrapers and then moving on to APIs (Application Programming Interfaces).  There are plenty of examples included for readers to examine.  There is then a fairly comprehensive section on designing surveys for research purposes, including both practical and ethical concerns.  The chapter concludes with an impressive deep dive into statistical analysis: basic concepts are illustrated with numerous example tables and charts, and popular software tools are covered.

I do have a mild complaint about Chapter Five, "Using Data for Visualization": a chapter on visual representations of data includes only two images, a single diagram and a single map.  The content is great - various types of charts and diagrams are discussed, along with the appropriate uses of each - but some examples would have helped.  This chapter also includes an interesting overview of digital humanities, where I learned a lot.  My previous experience with the field was limited to georeferencing historical maps and the associated data, but here I also learned about textual analysis via computerized mining of historical texts.

Chapter Five goes on to cover my favorite topic, geographic information systems (GIS) and mapping.  I consider myself something of an expert on this topic, and I found the coverage generally solid for an introductory text.  Underlying geographical concepts like datums and projections are explained, as well as fun tech stuff like georeferencing and geocoding.  Various types of map visualizations (heat maps, choropleths, etc.) are discussed - but as mentioned above, only one example is included.  One other small complaint: the coverage of actual GIS software is limited to just a couple of paragraphs explaining Esri's dominance and listing a few free open source titles.  I think the book would have benefited from some coverage of the open source GIS desktop apps as well as the newer crop of web services, such as Mapbox and Carto.

As somebody who works almost exclusively with public libraries and not academic libraries, I expected that I wouldn't get much from Chapter Six - "Using Data for Scholarship" - but I was pleasantly surprised to see that there was a lot of great content that applies.  The general discussion of topics like metadata, copyright, and intellectual property is solid, and I especially appreciated the Creative Commons and licensing discussion.  There is also a great introduction to institutional repositories - something that I didn't think applied to public libraries but that actually has some relevance: libraries and their patron communities are creating content, data, and other works that should be preserved just like scholarship in academic libraries.  A discussion of the long-term digital preservation of these works completes this chapter.

Chapter Seven - "Public Programs" - might have the most appeal for public librarians who are not huge data geeks like me.  There's a brief history of libraries and reference desks and a discussion of various public programming options, but the bulk of this chapter is a deep dive into 3-D modeling and printing for makerspaces and fab labs.  Everything from file types in modeling software to specific materials for printers is discussed - for a newbie like me, it was very helpful.

The book ends with a concluding chapter that wraps things up and includes a departmental data strategy plan which could easily be adopted for a library data program.  There are also some very helpful appendices, including a ton of data catalogs that I hadn't heard of before and a listing of tools and resources.

If you couldn't already tell, I'm a big fan of this book.  If your library is considering a data program or project, it will definitely be your second most helpful resource -- after us, of course!