Data Management
From Archaeological Methods
Section 9: Data Management
As previously discussed in Section 6, archaeological data exists in many forms, each with its own management concerns. Archaeologists use the term data loosely to encompass a wide range of meanings. Common types of data include lithic assemblages, flotation samples, point provenience information, geophysical survey grids, and images. Data can be physical (objects) or, as the last two examples illustrate, digital. How archaeological data is gathered, processed, and managed needs to be addressed, especially in the present technological age. Computers, and the storage, computation, and analytical tools they provide, have revolutionized the use of information. Archaeologists now have vast amounts of information that can be readily accessed and used for research.
Technology provides these enhanced capabilities at a cost. The knowledge required to use the digital environment to its full potential keeps growing, and new technologies present new methodological and theoretical challenges for the researcher. Creating and maintaining databases is one such challenge, and it is compounded by the pace at which technology advances. New programs, instruments, and software versions replace their predecessors, and newer software may no longer read the formats in which older data was saved. Without periodic reformatting, digital data can therefore be lost. Digital data is best viewed as a fragile commodity: it must either be reformatted continuously or be stored in a format that is compatible with a wide range of existing and future software.
Researchers must now consider how best to preserve digital data so that it remains available for future applications. These concerns are not entirely novel, since hard copy forms of data also face preservation and conservation issues.
9.2 Hard Copy Formats
Before discussing digital data in detail, we must consider the hard copy formats that supply most of the data that is later digitized. Hard copy data takes a variety of forms. The New Archaeology of the 1960s and 1970s sought to ground archaeological method and theory in the scientific process. Up until this time, empiricism had driven data collection and interpretation. One by-product of the New Archaeology was the systematization of how archaeological data is collected and processed. This is reflected in the copious paperwork that fieldwork now generates and in a reliance on deductive reasoning: the data can support or refute a hypothesis, and in that way directly influences theory and method.
To facilitate verifiable and replicable interpretations, data must be recorded in a consistent manner. Hard copy forms such as spreadsheets, photo logs, soil profiles, feature forms, shovel test probe forms, and artifact catalogues must use consistent formats and terminology. A high level of internal consistency allows data to be recorded quickly and interpreted the same way by everyone who reads it, and it reduces the loss or degradation of data during processing. Hard copy forms should also be brief and free of redundancy. Brevity streamlines data collection in the field and cuts processing time in the lab.
Data recorded on hard copy must also be transferable. As discussed above, data is no longer a static entity but can be quickly accessed, manipulated, and analyzed. Information recorded on hard copy will most likely be entered into a digital format that turns it into a dynamic body of data, so standard record forms must be designed with that transfer in mind.
When developing record forms, maps, catalogues, and similar documents, researchers must collect data in a manner conducive to further manipulation. The type of data recorded on a hard copy form depends on the phenomena being studied. The following is a brief discussion of the forms commonly used in archaeological survey.
A map is a hard copy form that conveys the location of archaeological resources or sites. Certain standard elements must be present to aid interpretation: scale, orientation, title, and legend. These give the researcher a frame of reference in which to interpret the map and the data it contains. Maps also show features and the locations of specific objects of interest.
Area forms should convey the size of the area studied, the methodology used to test it, the presence of archaeological materials, and a general description of the soils, land cover, and boundaries. The information recorded on an area form assists the eventual writing of a report and aids the interpretation of finds.
The shovel test record form complements the area form by presenting data on specific locations within the study area. It is probably the most recognizable hard copy form that archaeologists use on a regular basis. Its principal components are the provenience information, the depth and type descriptions of the soil profile, and the presence and location of any cultural materials found.
Photographic logs are another important form of hard copy documentation. Photo logs are typically organized as rows, each documenting a particular exposure or digital image along with a brief description of what it shows. The log should also indicate the direction of view of each photograph.
Other hard copy documents on which data is recorded include feature forms, field notes, and artifact catalogues. These forms should be brief and consistent in format so that the data can be transferred to other media.
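The requirements above (consistent terminology, brevity, and transferability) can be illustrated with a minimal Python sketch of a shovel test record. The field names, the soil texture vocabulary, and the sample values are all hypothetical and chosen only for illustration; an actual project would substitute the fields and terms from its own forms.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

# Hypothetical controlled vocabulary; a real project would define its own terms.
SOIL_TEXTURES = {"sand", "silt", "clay", "loam", "sandy loam", "silty clay"}


@dataclass
class ShovelTestRecord:
    """One row of a shovel test record form (field names are illustrative)."""
    test_id: str           # provenience: unique shovel test identifier
    easting_m: float       # provenience: grid easting in metres
    northing_m: float      # provenience: grid northing in metres
    depth_cm: float        # depth of the soil profile
    soil_texture: str      # must be a term from SOIL_TEXTURES
    artifacts_found: bool  # presence of cultural material

    def __post_init__(self):
        # Normalize terminology so "Loam " and "loam" are recorded identically.
        self.soil_texture = self.soil_texture.strip().lower()
        if self.soil_texture not in SOIL_TEXTURES:
            raise ValueError(f"unknown soil texture: {self.soil_texture!r}")
        if self.depth_cm <= 0:
            raise ValueError("depth_cm must be positive")


def records_to_csv(records):
    """Serialize records to CSV text with one consistent header row."""
    buffer = io.StringIO()
    names = [f.name for f in fields(ShovelTestRecord)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Invented sample data for demonstration only.
records = [
    ShovelTestRecord("STP_001", 500.0, 1000.0, 45.0, "Sandy Loam", True),
    ShovelTestRecord("STP_002", 515.0, 1000.0, 50.0, "clay", False),
]
csv_text = records_to_csv(records)
print(csv_text)
```

Rejecting unknown terms at entry time is one possible design choice; a project could instead flag them for later review. Either way, every record shares the same columns and vocabulary, which is what makes the later move into a spreadsheet or database straightforward.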
9.3 Spreadsheets
One characteristic that all hard copy data formats should share is that they can be integrated into a digital database. The foundation on which data rests within a database is the table, most familiar as the spreadsheet. A spreadsheet consists of rows and columns whose intersections form individual cells in which attribute and numerical data can be stored. Spreadsheets also support formulas that propagate across cells and a range of display formats. Artifact catalogues are usually created using a spreadsheet. Column headings could include provenience information as well as various attributes, and each row might hold an individual artifact. Numerical measurements such as width, length, and weight are easily stored.
Probably the most recognizable programs for tabular data are those in the Microsoft Office suite. Microsoft Excel and Microsoft Access in particular are in widespread use among archaeological researchers. Microsoft Excel is a spreadsheet program that offers a number of display, design, and quantitative tools in addition to a format for simple data entry. Depending on the nature of the data, the researcher can quantify particular characteristics and traits using simple calculation functions. Large amounts of data can be entered and stored on a single worksheet, or the researcher can create multiple worksheets within a workbook. Worksheets are compatible with one another, so data on one sheet may be evaluated against data on another. Spreadsheets are also searchable and dynamic: entries can be updated or erased as more data becomes available.
Microsoft Access, a database program, takes this concept a step further by linking each entry or row to an individual form. Access allows researchers to design their own forms. These often mimic the hard copy formats; however, the digital version makes it easy to impose structure on the responses entered. The designer can pre-set the types of values accepted for a particular field. Limiting the possible responses gives the data set a level of consistency and speeds up manual entry. Fields may offer a predefined set of values selected from a drop-down list. The drawback is that attributes must be well defined and show little variation. In return, large amounts of data can be entered quickly and grouped by the attributes used, and the researcher can isolate objects with a particular trait.
Tabular data also supplies the raw attribute data of Geographic Information Systems (GIS). A GIS provides a dynamic environment in which large amounts of stored digital data can be compiled and presented as spatial phenomena. The attribute table is essentially the backbone of the program: large amounts of attribute and numerical data are stored and displayed in a format that allows the user to create, manipulate, and interpret them. Like other tabular data, GIS tables can be created, updated, and merged with other bodies of information.
The relational database is an integral part of a GIS. A relational database essentially breaks down a flat-file database into multiple tables (Conolly and Lake 2006:52). Tables are linked by a column they share, usually a sequential identity field, and independent tables can be joined through this common field, often called a key. The data can be searched using Structured Query Language (SQL) (Conolly and Lake 2006:53). The relational database provides a simple and versatile way of managing large amounts of data.
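The ideas of linked tables, a shared identity field, restricted field values, and SQL queries can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, material categories, and sample rows are hypothetical, and SQLite stands in for whatever database program a project actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE provenience (
    prov_id INTEGER PRIMARY KEY,     -- sequential identity field
    site    TEXT NOT NULL,
    unit    TEXT NOT NULL,
    level   INTEGER NOT NULL
);
CREATE TABLE artifact (
    artifact_id INTEGER PRIMARY KEY,
    prov_id     INTEGER NOT NULL REFERENCES provenience(prov_id),
    -- restricted values, comparable to a drop-down list on an entry form
    material    TEXT NOT NULL
        CHECK (material IN ('lithic', 'ceramic', 'bone', 'shell')),
    weight_g    REAL NOT NULL
);
""")

# Invented sample rows for demonstration only.
conn.executemany("INSERT INTO provenience VALUES (?, ?, ?, ?)", [
    (1, "Site A", "N100 E50", 1),
    (2, "Site A", "N100 E50", 2),
])
conn.executemany("INSERT INTO artifact VALUES (?, ?, ?, ?)", [
    (1, 1, "lithic", 12.5),
    (2, 1, "ceramic", 8.0),
    (3, 2, "lithic", 3.5),
])

# Join the two tables on the shared prov_id field and summarize lithics by level.
rows = conn.execute("""
    SELECT p.unit, p.level, COUNT(*) AS n, SUM(a.weight_g) AS total_g
    FROM artifact AS a
    JOIN provenience AS p ON p.prov_id = a.prov_id
    WHERE a.material = 'lithic'
    GROUP BY p.unit, p.level
    ORDER BY p.level
""").fetchall()
for row in rows:
    print(row)
```

Storing provenience once and referring to it by `prov_id` avoids retyping the same unit and level for every artifact, which is the practical gain over a single flat table.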
9.4 Database Creation
A database can be described as a number of individual data sets that are organized and stored together. Databases have grown, technologically speaking, from the days of index cards being fed into a computer. Currently we think of digital databases as programs that collect, store, manipulate, and retrieve bodies of information (Lock 2003). Databases must be accessible, searchable, and structured.
A database must be organized in a manner that gives the researcher unobstructed access to the information it contains. A researcher may wish to create a database for a particular region, such as all of the archaeological resources within Owens Valley, California. The advantage of having this data in one location becomes obvious when the researcher is looking for trends or patterns within it. These large reservoirs of information can also be treated as individual entities and compared and contrasted according to the questions being asked.
A database must be searchable. Searching gives the researcher a means of rapidly accessing particular information, saving the time otherwise spent wading through mounds of data in search of specific items. Comparisons among data subsets can be performed in rapid succession because of the relative ease with which information can be accessed.
Finally, a database must be dynamic. This is probably its most important characteristic. The dynamic nature of a digital database allows the researcher to store, access, search, organize, manipulate, and interpret a large amount of information. The same data stored in hard copy is unwieldy and, to a great degree, inaccessible.
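As a rough illustration of what searchable means in practice, the following Python sketch filters a small catalogue by any combination of fields and then compares two subsets. The catalogue entries, site names, and function names are hypothetical; a real regional database would hold far more records and would normally be queried through its own database software.

```python
from collections import Counter

# Hypothetical regional catalogue; every value here is invented for illustration.
catalogue = [
    {"site": "Site A", "material": "lithic", "type": "flake", "weight_g": 2.1},
    {"site": "Site A", "material": "lithic", "type": "biface", "weight_g": 14.0},
    {"site": "Site A", "material": "ceramic", "type": "sherd", "weight_g": 6.3},
    {"site": "Site B", "material": "lithic", "type": "flake", "weight_g": 1.8},
    {"site": "Site B", "material": "bone", "type": "fragment", "weight_g": 0.9},
]


def search(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


def counts_by(records, field):
    """Count how many records fall under each value of the given field."""
    return Counter(record[field] for record in records)


# Retrieve one specific kind of item without reading the whole catalogue.
flakes = search(catalogue, material="lithic", type="flake")

# Compare two subsets in quick succession.
site_a = counts_by(search(catalogue, site="Site A"), "material")
site_b = counts_by(search(catalogue, site="Site B"), "material")
print(len(flakes), dict(site_a), dict(site_b))
```

The same comparison on paper forms would mean sorting through every sheet by hand for each new question.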
9.5 Storage
One primary component of data management is storage, together with the related issues of preservation and conservation. These issues must be balanced with the needs of ongoing research. The data needs to be available for study but will degrade with use. Certainly this is the case with hard copy formats, but what about digital data? Digital data may often be more fragile than hard copy for the reasons elaborated above. Data must therefore be stored in a way that ensures it will exist for future researchers.
Given the need for continued preservation, digital formats must be chosen that can be recognized by a wide range of software. A digital format that enjoys extensive use across computer systems is the Portable Document Format (PDF) developed by Adobe Systems. Because PDFs can be opened with widely distributed free reader software, PDF remains one of the primary formats for digital documents. Currently anything from journal articles to technical reports is converted to PDF. A second widely transferable format is plain text encoded in the American Standard Code for Information Interchange (ASCII). ASCII files can be opened by many different programs, including spreadsheet programs and basic word processors, and are well suited to numerical data.
Regardless of the format in which digital data is stored, it must be backed up in a separate location. Large amounts of digital storage space can be purchased relatively cheaply, making remote backups of data and databases possible. It is desirable to back up files periodically in case they become corrupted or are lost due to technical malfunction or natural disaster. Hard copy data may be recovered or conserved using special chemical treatments or controlled atmospheric conditions. Digital data, when lost, usually cannot be recovered. It is therefore advisable to make backup copies of data part of routine practice.
Data can exist in many varieties. The careful organization, compilation, and presentation of data in formats that are conducive to research will greatly aid our ability to make meaningful interpretations. Data should not be viewed as a static entity but rather as a dynamic body. If gathered and managed properly, the information we obtain may aid current research and promote future research. By taking a fresh look at archaeological data that was gathered in the past, we might see this body of research in a new light and use it in new ways.
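The two storage practices described in this section, saving data in a widely readable plain-text format and keeping a verified backup, can be sketched in Python as follows. The file names, column names, and sample rows are hypothetical, and the backup folder here sits in a temporary directory purely for demonstration; in practice it would be on a separate device or remote location.

```python
import csv
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def export_ascii_csv(rows, header, path):
    """Write rows as comma-separated plain text restricted to ASCII."""
    # encoding="ascii" raises UnicodeEncodeError on any non-ASCII character.
    with open(path, "w", newline="", encoding="ascii") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def backup_and_verify(source, backup_dir):
    """Copy source into backup_dir and confirm the copy is byte-identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} does not match the original")
    return target


with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "catalogue.csv"
    # Invented sample rows for demonstration only.
    export_ascii_csv(
        [[1, "lithic", 12.5], [2, "ceramic", 8.0]],
        ["artifact_id", "material", "weight_g"],
        original,
    )
    backup_path = backup_and_verify(original, Path(workdir) / "backup")
    backup_name = backup_path.name
    backup_ok = backup_path.read_bytes() == original.read_bytes()
    print(backup_name, backup_ok)
```

Comparing checksums after each copy is one simple way to catch a corrupted backup before the original is lost.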
9.6 Digital Data
It is becoming more and more common for archaeologists to use software such as ArcGIS, Survey Pro, ArcPad, ArcheoSurveyor, Microsoft Office, and Adobe Creative Suite. Some simple guidelines should always be followed when processing data in a digital context.
1. Make sure all file names are 16 characters or less and contain no spaces or special characters. Some programs will not save a file whose name is longer than 16 characters or contains characters other than letters, numbers, and underscores.
2. Give every file a name that makes its contents easy to identify. This not only helps you locate files more quickly, but also helps any future researchers who may need to use your data. It helps to create a project folder with sub-folders for the different types of data you are storing. Creating these folders takes extra time at the start of each project, but well organized data makes editing and manipulation much easier.
3. To separate words within a file name, use underscores ( _ ) instead of spaces.
4. Save all data to the hard drive you are working on. Processing data stored on an external device can take roughly twice as long, because transferring data to or from the device may write a temporary file to the hard drive in use, in effect creating two copies of every file saved or used.
5. Always keep multiple copies of your data on separate devices. This ensures that if a file is corrupted or lost, there will always be a backup. Make sure that every copy is updated each time you finish working on a project.
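The naming guidelines above can be checked automatically. The following Python sketch is one possible interpretation rather than a standard: it assumes the 16-character limit applies to the name without its extension, and the function names and sample file names are hypothetical.

```python
import re
from pathlib import PurePath

# Limit from guideline 1; applying it to the name without the extension is an assumption.
MAX_STEM_LENGTH = 16
ALLOWED = re.compile(r"[A-Za-z0-9_]+")


def is_valid_name(filename):
    """Check a file name against guidelines 1 and 3."""
    stem = PurePath(filename).stem
    return len(stem) <= MAX_STEM_LENGTH and ALLOWED.fullmatch(stem) is not None


def suggest_name(filename):
    """Propose a compliant name: underscores for spaces, other characters dropped."""
    path = PurePath(filename)
    stem = re.sub(r"\s+", "_", path.stem.strip())  # spaces become underscores
    stem = re.sub(r"[^A-Za-z0-9_]", "", stem)      # drop remaining special characters
    return stem[:MAX_STEM_LENGTH] + path.suffix    # truncate to the length limit


for name in ["stp_grid_a.csv", "shovel test.csv", "site 12 photo-log (final).xlsx"]:
    print(name, is_valid_name(name), suggest_name(name))
```

Truncation can make two different names collide, so any suggested name should still be reviewed by hand to confirm it remains unique and easy to identify, as guideline 2 requires.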
References
Conolly, James, and Mark Lake
2006 Geographical Information Systems in Archaeology. Cambridge University Press, Cambridge.
Lock, Gary
2003 Using Computers in Archaeology: Towards Virtual Pasts. Routledge, New York.
Robinson, William
2010 Archaeological data management and analysis at Blandwood mansion. Thesis, The University of North Carolina, Greensboro.
http://libres.uncg.edu/ir/uncg/listing.aspx?id=3697.
Useful Websites
http://www.nmhistoricpreservation.org/PROGRAMS/arm.html
http://www.esri.com/industries/archaeology/business/journal.html