In my previous post, Social Learning Journal - Classification, we created a classifier to see how the number of learning events was distributed across the year.
This is a good start, but not all events are created equal, which leads us to the question of how to evaluate the density of each event. As I have pondered this over the course of the project, I have landed on a weighted time value: Knowledge Consumption Velocity (kcv). This is a made-up unit into which each learning medium can be converted. In future posts, I will go into detail about each learning medium, but for now, let's quickly look at one example.
Book vs. Audiobook Example
How could we compare a book that is read to another event, like listening to a book?
Book
We have one constant: the number of pages in the book. In the picture above, this is annotated as "^432p". I use the ^ as an indicator of density, and the unit p stands for "pages". So, given this event, a 432-page book was finished.
Our other two variables are a bit more subjective: words per page and reading speed. Publishers typically recommend 250-300 words per page. Additionally, we need to assume an average reading speed, around 150 words per minute for a non-fiction technical book. With this information, we can do a little bit of math to calculate a weighted time value (aka kcv).
given:
- 432 pages (p)
assumed:
- 250 words per page (wpp)
- 150 read words per minute (rwpm)
kcv = (p * wpp) / rwpm
kcv = (432 * 250) / 150
kcv = 720
With our above example, we land on a value of 720, an estimate of the weighted time in minutes that can be used to compare against other events.
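To make the arithmetic concrete, here is a minimal Python sketch of the book calculation. The helper name book_kcv is my own for illustration; the WORDS_PER_PAGE and READ_WORDS_PER_MINUTE constants mirror the assumptions above.

WORDS_PER_PAGE = 250          # assumed publisher average
READ_WORDS_PER_MINUTE = 150   # assumed pace for a non-fiction technical book

def book_kcv(pages: int,
             words_per_page: int = WORDS_PER_PAGE,
             read_words_per_minute: int = READ_WORDS_PER_MINUTE) -> float:
    # Convert a page count into an estimated reading time in minutes.
    return (pages * words_per_page) / read_words_per_minute

print(book_kcv(432))  # 720.0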
Audiobook
Audiobooks are a little easier. Here the duration of the production is the constant. This event is annotated as "^595m"; the unit m stands for minutes. Given this event, we have finished a 595-minute audiobook, and we take that duration directly as our kcv value.
Comparison
We can now compare the book and audiobook based upon a shared variable. This calculation is not perfect, and tweaking the book variables slightly can produce vastly different results. In addition, people sometimes listen to audiobooks at 1.x speed.
What if we compared the physical book Pragmatic Programmer 2nd Edition against its audiobook counterpart?
We know the audiobook has a value of 595. Let's quickly run the calculation for the book.
given:
- 352 pages (p)
kcv = (p * wpp) / rwpm
kcv = (352 * 250) / 150
kcv = 586.66
This is not a perfect comparison, but you can see we have a similar value for the book (586) and audiobook (595).
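To play with the playback-speed caveat mentioned above, here is a small sketch; audiobook_kcv and its playback_speed parameter are illustrative names of my own, not part of the journal's annotation format.

def audiobook_kcv(duration_minutes: float, playback_speed: float = 1.0) -> float:
    # Listening faster reduces the actual minutes spent with the material.
    return duration_minutes / playback_speed

print(audiobook_kcv(595))        # 595.0 at normal speed
print(audiobook_kcv(595, 1.25))  # 476.0 at 1.25x, now below the book's ~587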
Outcomes
The goal now is to add weight to the classified learning events. I am going to assume that the start and finish of an event denote a single value and ignore the distribution of kcv over time (a future post). In addition, a plain tweet counts as a static value of 2. I have played with the idea of scraping referenced URLs and deriving a value from them, but that is beyond the scope of today.
Extracting Density
Instead of just adding one to each classified event, we are going to slightly change our data model and create a new extractor that will parse the kcv value for each event.
# tweet_event_model
from typing import List


class TweetEventModel:
    hashtags = None
    text = None

    def __init__(self, hashtags: List, text: str):
        self.hashtags = hashtags
        self.text = text
I pushed the classification code into its own extractor class - classification_extractor.py. Then, based upon the file classify-tweets-for-this-year.py, I created a new script:
# tweet-density-for-this-year.py
import json
from functools import reduce

# TweetEventModel, ClassificationExtractor, TimeExtractor, and the constants
# (DATA_SEED_TWITTER_PATH, classifications, current_year) come from the rest of
# the project; see the full script linked below.


def reduce_classifications(result: dict, tweet_event: TweetEventModel) -> dict:
    classification_extractor = ClassificationExtractor(tweet_event)
    classification = classification_extractor.classify()
    result[classification] = result[classification] + 1
    return result


def hashtags_from_tweet(tweet: dict):
    return [hashtag_entity['text'].lower()
            for hashtag_entity in tweet['tweet']['entities']['hashtags']]


def full_text_from_tweet(tweet: dict):
    return tweet['tweet']['full_text']


if __name__ == "__main__":
    with open(DATA_SEED_TWITTER_PATH) as data_seed:
        data = json.load(data_seed)

    time_extractor = TimeExtractor(data)
    tweets_from_this_year = time_extractor.tweets_for_year(current_year)
    tweet_events_from_this_year = [
        TweetEventModel(hashtags_from_tweet(tweet), full_text_from_tweet(tweet))
        for tweet in tweets_from_this_year]
    classified_tweets = reduce(reduce_classifications, tweet_events_from_this_year, classifications)
    for key, value in classified_tweets.items():
        print(f'{key}: {value}')
This gives us the same results as the classification script, but it utilizes the model (instead of raw lists) and the classification extractor. Now we need to adjust the reduce_classifications function to account for our weighted time annotations.
First, we need to do some string manipulation to find the annotation and its value:
if "^" in tweet_event.text:
kcv_annotation = tweet_event.text[tweet_event.text.index("^") + 1:].split("\n")[0]
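As a quick illustration (the sample text here is made up), this slicing pulls out just the annotation token:

text = "Finished the book today ^432p\n#learning"
kcv_annotation = text[text.index("^") + 1:].split("\n")[0]
print(kcv_annotation)  # 432p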
Next, depending upon the ending, we either take the int value of the result (minus the unit) or perform our pages-to-minutes calculation for books.
if kcv_annotation.endswith("m"):
kcv_value = int(kcv_annotation.replace("m", ""))
elif kcv_annotation.endswith("p"):
pages = int(kcv_annotation.replace("p", ""))
kcv_value = (pages * WORDS_PER_PAGE) / READ_WORDS_PER_MINUTE
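Putting the pieces together, a sketch of the adjusted reducer might look like the following. The exact structure lives in the linked script below; the DEFAULT_TWEET_KCV name is my own, though the static value of 2 for plain tweets comes from the outcomes above.

WORDS_PER_PAGE = 250
READ_WORDS_PER_MINUTE = 150
DEFAULT_TWEET_KCV = 2  # static value for tweets with no density annotation

def reduce_classifications(result: dict, tweet_event: TweetEventModel) -> dict:
    classification = ClassificationExtractor(tweet_event).classify()
    kcv_value = DEFAULT_TWEET_KCV
    if "^" in tweet_event.text:
        kcv_annotation = tweet_event.text[tweet_event.text.index("^") + 1:].split("\n")[0]
        if kcv_annotation.endswith("m"):
            kcv_value = int(kcv_annotation.replace("m", ""))
        elif kcv_annotation.endswith("p"):
            pages = int(kcv_annotation.replace("p", ""))
            kcv_value = (pages * WORDS_PER_PAGE) / READ_WORDS_PER_MINUTE
    result[classification] = result[classification] + kcv_value
    return result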
The full script can be found here: tweet-density-for-this-year
Conclusion
By applying a density calculation to the journaled events, we can quantify the value of each one. In comparison to the "raw" count, we end up with a much different distribution of learning events over the year.
2020 Tweet Density (kcv)
- Engineering: 2783
- Agile: 9557
- Leadership: 3199
- Other: 1414.33
Pushing everything into a time variable lets us quantify the learning data in ways that are easy for the human mind to process. The COVID-19 pandemic has been difficult for us all. I found myself indulging more in my family and video games, and less in my career, than in previous years. Given that there are 1440 minutes in a day, we can do a raw comparison of the "days" spent Sharpening the Saw.
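As a quick sanity check on that conversion, summing the 2020 densities listed above and dividing by the minutes in a day reproduces the figure below:

total_kcv_2020 = 2783 + 9557 + 3199 + 1414.33  # per-classification densities above
print(round(total_kcv_2020 / 1440, 2))         # 11.77 "days" of learning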
Time Spent Learning
- 2020: 11.77 days
- 2019: 14.72 days