Blog Post #5 – The End


THE END

Looking Back

History 303, The Digital History, has almost come to a wrap. After revisiting the first post, I have realized that my appreciation for the current state of digital history has drastically increased. The tools mentioned in the first post, such as Captcha, reCaptcha, and Duolingo, have become an integral part of my daily experience in academics and leisure. The power of crowdsourcing, Zotero’s effectiveness in extracting citations, and the other online research platforms introduced in the course have completely changed my research methods and efficiency. Before History 303, I had never used Google Books for research. To look for academic sources, the University of Waterloo’s library search was the only platform I had used. WorldCat and JSTOR were great additions to my arsenal. However, one of the biggest lessons I learned in this course was the importance of copyright. Using Creative Commons licences to find legitimately usable pictures on Flickr, and conscientiously citing online sources no matter how easy they are to access, are important habits to keep in mind.

Stone, Parchment, Paper, 0 and 1


Preserving records online had never seemed controversial or troublesome to me before. But after going through all the readings on the problems that digital preservation inherently carries, my awareness of this subject has changed. Ian Milligan said, “preserving records become more difficult as technology develops”. For example, the jump from floppy disks to CDs forced a British institution to rebuild a compatible reading device to extract the data. Imagine the technological gap between now and twenty years down the road. Because online records are vulnerable to human intervention, natural disasters, and political and historical agendas, history on digital platforms doesn’t appear to be safe. However, isn’t the same true of paper?

Whether or not we agree that physical libraries will disappear in the future, digital history and its paradigm will continue to develop and evolve; the gateway will be widened by the new generation. Information technology will continue to change the way we learn, preserve, and redefine history. Since the new platform allows for more cooperative scholarship, unlike conventional history fixed on paper, history will be revised more frequently. Granted, popular history will gain the majority of the interest and sources. Economics and politics will determine the lens through which future generations see. But again, history has always been a record of the winners. At least digital history could allow a more freely engaging, interactive platform for history enthusiasts to exchange their views.

Digital preservation is not the first platform revolution that has taken place in history. The transitions from stone tablets to parchment and then to paper were as controversial as the advent of 0 and 1. As writing materials, stone tablets and parchment last much longer than paper, but in the end, paper won the battle. Ancient writers loved parchment despite the relatively early advent of paper. It was light in colour, flexible, smooth, sturdy, and ink could even be erased from it! The transition from parchment to paper caused a similar discourse in medieval times. It is important to note that a change of platform relies on the social, economic, and technological settings of its time.

“By about 1400 it had become a relatively common medium for little volumes of sermons, cheap textbooks, popular tracts, and so on. As late as 1480 a ruling of the University of Cambridge stipulated that only books on parchment could be accepted as security for loans. Paper was evidently thought to be too insignificant. It was the invention of printing in the 1450s which transformed the need for paper, and by the later fifteenth century it had become so infinitely cheaper than parchment that it was used for all but the most luxurious books.” – Materials and Techniques of Manuscript Production

     Digital history is currently winning in all three of these categories: social, economic, and technological. History is written by the winners, and digital history will prevail.


Blog Post #4 – Venom of Python


 

 

Encountering a Wild Python: First Ever Computer Programming

     My first ever encounter with a computer language was a disaster. As a classmate in History 303 confessed, it was difficult to even grasp what Python was.

Fig. 1

          From simply “printing” Hello World and creating an HTML page, the level of difficulty suddenly jumped. The real confusion started with defining functions. While I understood what defining a function does and means, the pace of the class left me behind despite long personal reviews after class. But more importantly, personal interest played a bigger part. For example, I was able to completely understand the process of pulling National Basketball Association (NBA) stats from the web into HTML and plain-text formats.

Fig. 2

ESPN NBA’s all-time MVP (most valuable player) award list in a text format.

Fig. 3

ESPN NBA’s all-time MVP (most valuable player) award list copied and pasted without Python.

Comparing Fig. 2 and Fig. 3 shows that, by using Python, I can obtain statistical data in a text format that can easily be modified and posted online, unlike Fig. 3, where the excess clutter has to be edited out. Many sites do not allow copying and pasting, and this method can get around such restrictions. However, the real purpose of the above “exercise” was to pick out the MVP winners in NBA history who scored fewer than 30 points per game in their MVP season. Unfortunately, I did not possess the necessary programming knowledge to accomplish that task.
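For anyone curious what that last filtering step might look like, here is a minimal sketch. It assumes the scraped text has already been tidied into one line per season in a "season | player | points per game" layout, which is a made-up format and not what ESPN actually returns. The rows, the function names, and the 30-point threshold parameter are all placeholders for illustration, not real MVP data.

```python
# Hypothetical sample rows in "season | player | points per game" form.
# The names and numbers are placeholders, not real ESPN data.
SAMPLE_TEXT = """\
Season A | Player One | 31.2
Season B | Player Two | 24.8
Season C | Player Three | 29.9
"""


def parse_rows(text):
    """Split each non-empty line into (season, player, points_per_game)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        season, player, ppg = [part.strip() for part in line.split("|")]
        rows.append((season, player, float(ppg)))
    return rows


def under_threshold(rows, threshold=30.0):
    """Keep only the rows whose points per game fall below the threshold."""
    return [row for row in rows if row[2] < threshold]


if __name__ == "__main__":
    for season, player, ppg in under_threshold(parse_rows(SAMPLE_TEXT)):
        print(season, player, ppg)
```

The hard part in practice is usually the parsing, since real pages rarely come in such a clean layout; once each season is a tuple, the filter itself is a single line.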

Fig. 4

http://programminghistorian.org/lessons/manipulating-strings-in-python

Fig. 5

Practicing string manipulations.

      The most difficult aspect of Python was manipulating strings. I followed the guide on The Programming Historian and understood what the manipulations do. However, I didn’t understand their meaning and importance. Personal interest played a big role, but I failed to apply string manipulation to anything meaningful. My lack of a fundamental understanding of programming was clear.
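To make the idea a little more concrete, here is a small sketch of the kind of string manipulation such lessons cover, followed by one way it could be put to use. The variable names and the word_counts helper are my own illustration and are not taken from the lesson; the string methods themselves are standard Python.

```python
message = "Hello World"  # the classic first string

# Basic operations of the kind covered in introductory lessons
lowered = message.lower()                          # all lower case
length = len(message)                              # number of characters
position = message.find("World")                   # index where "World" starts
swapped = message.replace("World", "History 303")  # substitute one piece of text
words = message.split(" ")                         # break the string into a list of words


def word_counts(text):
    """Count how often each word appears, ignoring case and basic punctuation."""
    counts = {}
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'")  # trim punctuation from both ends
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts
```

Seen this way, lower(), split(), and strip() are less about strings for their own sake and more about getting messy text into a shape that can be counted, which is where the historical questions start.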

 

 

Programming for Historians

     A driver does not need to know the mechanical blueprint of a car to drive it. However, anyone would benefit from knowing how to change a tire or the engine oil. Programming is the same. There is no need to become a master programmer to be called a digital historian, but it is, nonetheless, beneficial to harness the power of a computer language.

     Indeed, we are living in a digital age.

     Two of the biggest challenges in learning a new language are age and familiarity. If students had been exposed to programming since elementary school, the programming portion of History 303 would have been easier to understand. Or too easy, according to a classmate, Shaun.

     Programming is a valuable approach for historians and for the discipline of history. At this moment, millions of records are being digitized. However, the programs that historians use to extract digital resources are written by engineers with no historical background. Which keywords should be extracted? How should the program categorize and visualize historical content for users? These are all important questions to ask. Simply playing with WordPress.com, Neatline, and Google Maps might not be enough in the future.

      A great advantage for programming historians is not having to dig through hundreds of books for content. Although reading through The Programming Historian took many hours just to follow the basic instructions, as soon as the language clicks in our brains, the “puzzle” should come together.

     Is computer language merely a culture? No, information technology platforms are extending beyond their original field. More of the world is falling under their dominance, and that dominance will only continue to grow. Massive demand for programming knowledge, met by short supply, is inevitable.

     In the near future, to be called a digital historian, one will be “required” to possess a programming background.

Blog Post #3 Visualization of Data

 Science Article: Quantitative Analysis of Culture Using Millions of Digitized Books


     Simplicity is good. But how simple can the study of history become? The great minds of the past were venerated through their literature; now we aim to categorize, define, simplify, and claim great understanding of their culture and language. This is the objective of quantitative analysis of digitized text.

     Culture decides which subjects become prominent, and language influences the vocabulary we use to speak about and record them (Jean-Baptiste Michel et al.). Culturomics focuses on cultural and linguistic phenomena in the English language throughout history. With this method, we can learn new history out of the old. Without manual labour, we can ask great questions like: When did the F-word appear? When did slavery become a social issue in Europe? How was women’s sexuality represented in Victorian times? Who was the most popular person in the 1800s? How did Nazi Germany censor literature, and what can we learn about Germans’ true feelings toward Hitler during World War Two? All interesting questions. However, there is but one problem.

     We are limited by the “selection” of sources. Who selects which sources? For example, the article stated that 4% of all books ever printed have been digitized into a corpus. Since the majority of the scanned texts have been selected and scanned by a third party, we are bound by their interests. Our knowledge, therefore, is shaped by selective historians. Or are they historians? They might be engineers for all we know. Culturomics will create a biased perspective when “authorities” decide the value of sources. However, what isn’t biased and subjective? Protagoras once said, “man is the measure of all things.”

     My favourite part of the article was about fame. According to the article, people are more likely to become famous now than in the past. Makes sense. High literacy, the development of media, and technology mean a higher probability of fame. Let’s just remember to avoid becoming mathematicians as a “route to fame” (Jean-Baptiste Michel et al.). The public doesn’t seem to favour them.

 

Google N-Gram

Fig 1. Women in university

“The first woman to gain honours in a University examination which was intended to be equivalent to that taken by men for a degree was Annie Mary Anne Henley Rogers. In 1877 she gained first class honours in Latin and Greek in the Second Examination for Honours in the recently instituted ‘Examination of Women’.” – Oxford Online Archive

     According to the Oxford University Archive, women started attending classes and taking examinations in the late 1870s. This was the later part of the Victorian age, when democratic values were strengthening. The frequency suddenly drops between 1910 and 1920, reflecting the shift of social interest to World War One, and skyrockets afterwards. Historians recognize that women’s rights expanded dramatically after the war.

Fig 2. Isaac Newton

The one who came up with calculus during school vacation. Mr. Gravity’s graph is very interesting. According to the graph, he was alive in 1608, which can’t be right. There must be a namesake. I was quite surprised at how often a namesake was mentioned in literature. The high spike in the graph shows belated appreciation for Isaac Newton’s work near and after his death.

Problem with N-Gram:

     A slight problem we can see here is the emphasis on general cultural phenomena over historic persons. The phrase “women + university” was never used before the 1800s. No scholars would have written about it. Therefore, it’s easy to connect the dots for “the first women to graduate university,” which I was looking for. However, Isaac Newton’s results surprised me. A namesake means there is a chance of other “doppelgangers” appearing before or after the historic person in question was born. N-Gram is better suited to cultural phenomena than to names.

     Lack of content is another predictable issue. Did the public really appreciate Isaac Newton’s contributions, or did they create big controversy? If women’s social participation grew after World War One, which departments were they steered toward, and which were they forbidden to enter?
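Both problems are easier to see once you look at what an n-gram count actually is. Below is a minimal sketch of the general idea, not Google's implementation: it counts how often a phrase appears per year, relative to all phrases of the same length. The tiny corpus is invented placeholder text, and the function names are my own.

```python
# Hypothetical mini-corpus: year -> list of sentences. Placeholder text only.
CORPUS = {
    1700: ["isaac newton wrote on optics", "the harvest was poor"],
    1730: ["isaac newton is praised", "works of isaac newton", "a new bridge opened"],
}


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def yearly_frequency(corpus, phrase):
    """Relative frequency of a phrase per year: matches / all n-grams of that length."""
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, sentences in corpus.items():
        total = 0
        hits = 0
        for sentence in sentences:
            grams = ngrams(sentence.lower().split(), n)
            total += len(grams)
            hits += grams.count(target)
        result[year] = hits / total if total else 0.0
    return result


if __name__ == "__main__":
    print(yearly_frequency(CORPUS, "Isaac Newton"))
```

The count only matches the words "isaac newton"; it has no way of knowing which Isaac Newton a sentence refers to, or whether the sentence praises or attacks him. That is the namesake problem and the lack-of-content problem in miniature.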

Mining the Dispatch


     According to Robert K. Nelson, “topic modeling and other distant reading methods are most valuable not when they allow us to see patterns that we can easily explain but when they reveal patterns that we can’t, patterns that surprise us and that prompt interesting and useful research questions.”

      Mining the Dispatch used MALLET software to explore the “rhythm” of daily life in Richmond by topic modelling the online archive of the Richmond Daily Dispatch newspaper. In conventional American history, there hasn’t been much discussion of the daily lives of Richmond residents. How resilient were the slaves during the war? How stable was the slave market? Did they receive relative opportunities to seize their freedom? Topic modelling allows us to venture these questions.

     A total of forty topics are divided into nine categories; under the nine categories, there are subcategories chosen by the frequency of keywords. For example, under the category of “Slavery,” there is “For Hire and Wanted Ads.” We are presented with a graph showing how frequently “For Hire and Wanted” ads appeared between January 1861 and January 1864.

However, according to Nelson, topic modelling has flaws.

    Ranaway.—$10 reward.

    —Ranaway from the subscriber, on the 3d inst., my slave woman Parthena. Had on a dark brown and white calico dress. She is of a ginger-bread color; medium size; the right fore-finger shortened and crooked, from a whitlow. I think she is harbored somewhere in or near Duvall’s addition. For her delivery to me I will pay $10.
    de 6—ts G. W. H. Tyler.

http://dsl.richmond.edu/dispatch/pages/intro

     This ad appeared in “the Dispatch in December 1861,” during the first year of the American Civil War and well before emancipation. Programs are not perfect: associating keywords with topics is a matter of probability, as the above example shows. According to the model, only “90% of this piece comes from the fugitive slave ad topic and the other 10% from two other topics.”

Mining the Dispatch has done a fascinating job as a data mining site. Although the American Civil War is not among my interests, this method will help capture unnoticed patterns among piles of documents. Nelson’s work is a great example of large data and close reading.

Blog Post #2 And Then There Were Three

 

And Then There Were Three
Three musketeers

 

What is the best example of public history? One that utilizes crowdsourcing to its full extent to preserve and present history. The three sites below are great public history communities that collect, preserve, and present history with public contributions.

“The September 11 Digital Archive”
“Hurricane Digital Memory Bank”
“Occupy Archive”

They are not perfect. Crowdsourcing has its flaws; just ask WIKIPEDIA! We will examine and evaluate each site.

 

The September 11 Digital Archive

Remember 9/11? If not, check this out. CLICK

Hope that refreshed your memory. The 9/11 digital archive is the most informative of the three musketeers and preserves the most content. Unlike the other two, its FAQs link to major online news sites, such as CNN, the BBC, and the New York Times, for the chronology and timeline of the attack. It doesn’t assume users’ knowledge of the event. Users are well informed about what the site is about.

Most organized website ever? Probably not. Most organized out of the three? It sure is. Under Collections, the categories are visually appealing, which makes navigation faster. Titles and descriptions are easy to locate and read. Most importantly, diverse metadata is offered for every collection, without having to glare at the screen to find the appropriate data. Thank goodness the web designers were restrained with colouring and underlining. Aesthetics are important.

The featuring function was my personal favourite. I was looking for official government-related documents on 9/11. Since the major problems of crowdsourcing are presumed to be the validity of information and its organization, I didn’t know where to start. Featuring became my best friend. The FDNY Incident Action Plans (click title) and the Michael Ragsdale Flyer Collection received the spotlight on the main page. They were featured so that users could better understand the day-to-day work of various groups, and the social and political transitions taking place during and after the event.

However, the site deserves some criticism. It provides diverse metadata terms, but often without actual content: many of the fields were listed as unknown (click). Whether this comes from lack of effort or lack of data, I am not sure.

What about the accuracy of the posted content? Is it reliable? This is not a big concern. If an anonymous person uploads a fake story, people might question the site’s reliability; the tragedy of 9/11 and its impact on the American people, however, will not change. A George Bush administration conspiracy theory is one thing, but exaggerating a personal story of 9/11 will only develop more empathy for the community. Digital vandalism, some call it; fake uploads have a small chance of ever appearing on such an American and patriotic archive.

Personal accounts, combined with posts about how 9/11 will be remembered personally, transformed the site into a relatable community for Americans. This is crucial to sustaining participation. The archive has found a home, representing not only the historical event of 9/11, but also the people involved.

 

Hurricane Digital Memory Bank and Occupy Archive

Allow me to become a poet.

 

Amazed at 9/11 archive.
So easy and it looks tight.
Hurricane and Occupy, left me to ponder.
And I walked with confidence,
Far away from the stinkers.

     Ok, maybe stinker was a little harsh. Honestly, I couldn’t find a good rhyme. Anyhow, Hurricane Digital Memory Bank (from here on, HDMB) and Occupy Archive (from here on, OA) weren’t my favourites. One, they were difficult to navigate. Two, OA’s multiple posts had the same red headings but different content; this was confusing. Finally, aesthetics. Is it just me, or does the emphasis on the colour red bring up images of totalitarian governments? The keyboard warrior within me was uncomfortable.

The 9/11 Archive felt easy. The main reason was the FAQs’ chronological breakdown, which catered to those with limited knowledge of the event. However, history online is supposed to be about breaking the linear barriers that plagued history for so long. User participation and the chance to “DO” history are key elements in constructing successful public history. From that perspective, HDMB and OA weren’t terrible.

HDMB, the ongoing archive of Hurricanes Katrina and Rita, offered a sense of relief. Collections and posts focused on the aftermath and the recovery of the victims: for example, reconstructed houses, making a new best friend during evacuation, and opening new businesses for the better. Out of the three, HDMB is the least likely to ever compromise its purity. The site is simply about the natural disaster and its victims. No political controversy or activists are involved. It’s a place to share and recover. Most importantly, lots of pictures!

     OA, not just an archive but an arbiter for the Occupy movement, had consistent metadata content compared to the 9/11 archive’s unknowns.

Unknown

(http://bulbapedia.bulbagarden.net/wiki/Unown_%28Pok%C3%A9mon%29)

No, I’m not talking about you, Unknown

     Although there were too many subjects and tags, its controlled vocabulary meant better categorization. Tags were available before clicking into posts; the 9/11 archive, on the other hand, required deeper navigation to reach the tags.

     I question OA’s lasting purity. Occupy movements aim to bring about better economic distribution. From their perspective, the greatest evil is the government. Such a movement has some probability of turning into an institution with specific political intent. We’ve all heard stories of passionate protesters who fought against the government and then entered politics on their newly gained renown. Of course, this is far-fetched. THIS PICTURE of Stephanie Keith shows their perspective and story; was she really manhandled?

      Overall, I was satisfied with the three musketeers: great public history sites that allow participants to alter and add to digital history. Unlike analog history, contributions are constant. Like gravity, we history enthusiasts are pulled in, creating community platforms that preserve, present, and produce history.

Week of January 12th: Digital Civilization


 

DIGITAL CIVILIZATION

 

Welcome to davenmello’s first post for History 303

Click me!! (What Recaptcha is!)

 

In 2015, more than ever, history has become digitized. Unlike past historians, modern scholars have access to abundant resources floating around the web. Therefore, we are encouraged to develop and harness easier tools to apply online resources to history. In the midst of this digital revolution of our past, multiple notable projects were born that would forever change how we preserve, produce, and present history.

The word “digital” fascinated me enough to sign up for History 303 at the University of Waterloo. Have you ever tried to download an mp3 file and been asked to type in distorted letters? Had to refresh numerous times because you couldn’t recognize them? It is frustrating. However, those 10 seconds of your time were well spent in preserving human history. Recaptcha, it’s called. The story of this remarkable project reshaped my view of digital history.

At first, it was called Captcha, invented to prevent computers from mass-producing meaningless comments, links, and other web garbage. Then it turned into Recaptcha: you were presented with two distorted words and required to type them correctly. This procedure would take, on average, 10 to 15 seconds. Precious time in the modern age. Programmers wanted to find a way to use that time more efficiently. The answer was simple: combine the process of typing correct letters with preserving history online.

Preserving historical records has always been important. With computers as our tool, it became easier. Scan and upload. However, computers could not recognize all letters and words, urging bright minds to combine the functions of stopping spam and preserving records into one. Here came Recaptcha. You would type one scanned word that computers could not recognize, and a second word to prevent spam. Working in the same spirit as digitization efforts like Project Gutenberg, it used those 10 seconds to their full potential. That is, preserving history online and developing the new medium, paradigm, and tools that History 303 will help me to understand.

Duolingo, a free language-learning and translation platform, is another example of digital history; in a bigger spectrum, it is part of the digital humanities. First, it presents a sentence in English, and an answer is given in a foreign language. The platform then curates the input data and stacks it, increasing the accuracy of translation. The result far surpasses the accuracy of machines. This is the power of digitized sources and online participation, called crowdsourcing. Easy and accessible: that is what digital sources are for.

As seen from above, we have contributed greatly to digitizing our records. Ironically, we have yet to fully harness the ability and intent to maximize the new medium. Great tools like Recaptcha have helped to preserve indispensable records. Duolingo produced and presented the digital humanities in a new light. We, the students, instructors, and digital humanities enthusiasts, are a big part of this transition. The web is like an open ocean in the age of discovery, vast and unknown. It is, however, only a matter of time before that ocean is conquered by the continuous writing of our traces, brightening our paths in the new age of digital history.