Extract Data for Social Network Analysis
Relational data can be found in many data sets but it is often tedious to extract. To help you find the networks hidden in your data sets, we've written a script for the Excel-plugin DataNitro that can extract information about nodes and the relationships between them.
To get started:
- download and install DataNitro (Windows only) and install the included Python package
- download the script
- download the Excel-file example
The following is a demonstration of how the script is used to extract a list of nodes and a list of edges from a data set with incorporated relational data. Followed by a description of how to adapt the script to your own code.
Example: Running the Script
This script will construct a traditional network (an one-mode network) where all node can be connected directly to one another. See the last paragraph for scripts for two-mode networks.
Once you've opened the Excel-file, navigate to the DataNitro pane in the top and click 'Import script'. Load the Python script (sna_extract.py) you downloaded earlier and click 'Run'.
A prompt will open to show which step the script is currently executing:
As the prompt closes you'll be shown the 'Controls' sheet. This sheet gives you instructions on how to proceed and also allows you to adjust thresholds for the ratings. Ratings are used to show the strength of a relationship. The column on the far right titled 'Rating Counts' shows how many of each rating can currently be found in the list of relationships. This last column is updated as the thresholds are changed.
Example: Using the Data
To use the extracted data export the 'Attributes' and 'Edgelist' sheets as Excel files.
These Excel files are imported as shown in the Data Entry guide
Once the two files have been imported into Socilyzer they can be visualized like this (here colored by cluster):
Modifying the Script to Your Data
The script is easy to adapt to your own data. Simply open the script in a text editor (e.g. Nodepad) or a code editor and change the variables listed near the top:
should equal any cell in the column where start nodes are listed, i.e., the origin of a relationship.
rated_nodes similar to the variable above, rated_nodes should be any cell in the column listing end nodes, i.e., the destionation of a relationship.
ignore_first_line should be set to True if headlines are used in the data sheet.
By default the script only extracts the name of the nodes but it can easily be changed to group (rater) nodes. This is done by setting "include_groups" to "True" and setting "groups" to a cell in the column with group names. This produces the following output in the Controls sheet:
Relational data can also be extracted directly from a database using SQL. The example queries below use this publicly available dataset.
SELECT "CUSTOMER-" || CustomerID as "Customer", "EMPLOYEE-" || EmployeeID as "Employee" FROM Orders;
SELECT "CUSTOMER-" || CustomerID as "id", ContactName || ", " || CustomerName as "name", country as "group" FROM Customers UNION SELECT "EMPLOYEE-" || EmployeeID, FirstName || " " || LastName, "Employee" FROM Employees;
Scripts for Two-Mode Networks
For data sets with two node sets (e.g., clients and sales people) you can construct a two-mode network where each node set is connected through the other node set. To help you extract such information you can use this DataNitro script.
A two-mode network can also be converted into an one-mode network where the number of shared intermediate nodes (second node set) determine the intensity of the relationship between the two nodes. As an example, two sales people can be strongly connected because they share many clients. This script helps you extract such information.
It's as easy as that. Should you run into any problems using the script with Socilyzer, feel free to reach out to us at email@example.com