Tuesday, January 30, 2018

The birth of SpongeGuyParkFeld

I joined my college's data science club this quarter.  It's very fun so far, though I haven't done much more than pick a project and join a team.

My team's project (well, we're one big group but split into two teams for easier management) is natural-language processing.  Specifically, sentiment analysis of transcripts of TV shows.  The goal is to map character relationships, vocabulary shifts over time or due to new writers, and vocabulary richness based on target audiences.  And stuff like that.  We're planning on refining the goals more as we progress in the project, but that's the general direction.
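For a concrete sense of what "vocabulary richness" could look like in code, here's a minimal sketch using the type-token ratio (unique words divided by total words).  This is only one common measure, not necessarily the one the project will end up using.  The function name, the simple regex tokenizer, and the sample text are all assumptions made up for illustration.

```python
import re


def type_token_ratio(text):
    """Return unique words / total words for a piece of text (0.0 if empty)."""
    # Lowercase and keep runs of letters and apostrophes as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample text, not taken from any real transcript.
sample = "I am ready. I am ready. I am ready to go!"
print(round(type_token_ratio(sample), 2))
```

One caveat: the ratio tends to drop as texts get longer, so comparing shows fairly would probably mean comparing samples of similar length.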

The shows we have chosen to analyze are SpongeBob, Family Guy, South Park, and Seinfeld.  We sort of mashed all the names together to create a single word to represent our project.  Hence, the birth of SpongeGuyParkFeld!
It's a memorable name, and will definitely raise a few eyebrows.  Hopefully in a good way.

I'm excited to do some 'real' computational linguistics.  I've used Python and R before, as well as Beautiful Soup to scrape things.  But not in this capacity, where it's mostly self-directed and open-ended.  And automating the scraping of an entire website rather than a single page will probably require some stuff I've never done before.
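Going from one page to a whole site usually starts with collecting every link from an index page, then visiting each one in a loop.  Here's a rough sketch of that first step using only Python's standard library so it runs without extra installs (Beautiful Soup's find_all("a") would do the same job).  The example.com URL, the page layout, and the names LinkCollector and collect_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative links into full URLs.
            self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical index page; a real site will be laid out differently.
index_html = """
<ul>
  <li><a href="/transcripts/episode-1">Episode 1</a></li>
  <li><a href="/transcripts/episode-2">Episode 2</a></li>
</ul>
"""
links = collect_links(index_html, "https://example.com/")
for link in links:
    print(link)
```

Each collected link could then be downloaded and parsed the same way a single page would be, ideally with a short pause between requests to be polite to the site.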

Another thing that's new to me is using git with multiple people.  I have my own GitHub, use it semi-regularly, and I'm comfortable with managing it with Sourcetree.  But I also mess up commits and organization.  So I hope that I won't mess up my entire group's project somehow.

I think I'll have to learn as I go, and ask a lot of questions.  Only two people in my group have used GitHub with multiple contributors before, though, so I'll be learning along with everyone else.  Hopefully they don't mess it up either!

The first part of the project is just going to be web scraping and cleaning datasets.  The transcripts we have for the target shows are the best ones we can find, but there are still some problems and inconsistencies with them.
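To give an idea of what cleaning might involve, here's a minimal sketch that assumes transcript lines look like "SPEAKER: dialogue" with stage directions in brackets or parentheses.  That format is an assumption, since real transcripts vary a lot between sites and shows, and clean_line is a made-up helper name.

```python
import re


def clean_line(raw):
    """Split a 'Speaker: dialogue' line into (speaker, dialogue), or return None."""
    # Drop bracketed or parenthesized stage directions such as [laughs].
    text = re.sub(r"\[[^\]]*\]|\([^)]*\)", "", raw)
    # Collapse the extra whitespace left behind by the removal.
    text = re.sub(r"\s+", " ", text).strip()
    speaker, sep, dialogue = text.partition(":")
    if not sep or not dialogue.strip():
        # No speaker label or no spoken words: not a dialogue line.
        return None
    # Uppercase speaker names so "Jerry" and "JERRY" count as the same character.
    return speaker.strip().upper(), dialogue.strip()


# Hypothetical lines written for illustration only.
print(clean_line("Jerry :  What's the deal?  (shrugs)"))
print(clean_line("[Scene: the coffee shop]"))
```

Lines that come back as None (scene headings, pure stage directions) can be set aside instead of thrown away, in case they turn out to be useful later.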

I don't know why I didn't join this club earlier, honestly.  It's great hands-on experience.  The club is also a great way to network with people from more technical majors and with different interests.  My group has a statistics major, two physics majors, a couple of computer science people, and me, a linguistics major.

I'm excited to start working on this project.  Every aspect of it will strengthen my skills and challenge me.  Especially the teamwork aspect of it.  People are sometimes more frustrating than code.